Hello, First of all, thank you for a nice repository. I am however a

Ordering of sentences trained on matters for the inferred vectors. about fast_sentence_embeddings HOT 4 CLOSED

oborchers commented on May 22, 2024

Ordering of sentences trained on matters for the inferred vectors.

from fast_sentence_embeddings.

Comments (4)

Filco306 commented on May 22, 2024

Hello there! Very nice, thank you for this!

Filco306 commented on May 22, 2024

I can add that this is the case, even if I set a seed and run these two separately.

grantmwilliams commented on May 22, 2024

Hey @Filco306, I was curious about this and gave it a look, and it looks to me like this is simply a floating-point precision issue. If, instead of np.all(vecs == vecs2), you use assert_allclose(vecs, vecs2, atol=1e-5) from NumPy's testing library, you'll see that the assertion passes.

As an example, here is what I get from print(vecs - vecs2):

[[ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   7.4505806e-09  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  3.7252903e-09
   0.0000000e+00  0.0000000e+00  1.4901161e-08  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00 -9.3132257e-10
   0.0000000e+00  0.0000000e+00 -3.7252903e-09  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  7.4505806e-09  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00 -1.8626451e-09  0.0000000e+00
   1.8626451e-09  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   1.8626451e-09  3.7252903e-09  0.0000000e+00 -7.4505806e-09
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   2.2351742e-08  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -9.3132257e-10  0.0000000e+00  0.0000000e+00  0.0000000e+00
  -9.3132257e-10  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  1.8626451e-09  0.0000000e+00 -9.3132257e-10
   0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00
   0.0000000e+00  0.0000000e+00  3.7252903e-09  7.4505806e-09
   0.0000000e+00  0.0000000e+00  4.4703484e-08  2.3283064e-10
   3.7252903e-09  1.8626451e-09  0.0000000e+00  0.0000000e+00]]
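The gap between the two comparisons can be sketched with a small stand-in example. The arrays below are hypothetical (they are not the vectors from the issue); one float32 entry is nudged by a single rounding step, which is the same order of magnitude as the nonzero differences printed above.

```python
import numpy as np
from numpy.testing import assert_allclose

# Hypothetical stand-ins for the two sets of inferred vectors.
vecs = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)

# Copy the array and move one entry to the next representable float32 value.
vecs2 = vecs.copy()
vecs2[0, 2] = np.nextafter(vecs2[0, 2], np.float32(1.0))

# Exact element-wise equality fails because of that single rounding step.
exact_match = bool(np.all(vecs == vecs2))

# The largest absolute difference is tiny compared with the tolerance.
max_abs_diff = float(np.max(np.abs(vecs - vecs2)))

# Tolerance-based comparison: raises AssertionError only if the arrays
# differ by more than atol + rtol * abs(vecs2) anywhere.
assert_allclose(vecs, vecs2, atol=1e-5)
close = bool(np.allclose(vecs, vecs2, atol=1e-5))

print(exact_match, close, max_abs_diff)
```

np.allclose returns a boolean instead of raising, which is convenient outside of tests.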

I suspect this is because, under the hood, fastText uses asynchronous stochastic gradient descent (Hogwild) as its optimization algorithm. The Gensim documentation notes that setting the seed alone isn't enough to guarantee perfect reproducibility: you also need to limit the number of worker threads to 1 and possibly set the PYTHONHASHSEED environment variable.

seed (int, optional) – Seed for the random number generator. Initial vectors for each word are seeded with a hash of the concatenation of word + str(seed). Note that for a fully deterministically-reproducible run, you must also limit the model to a single worker thread (workers=1), to eliminate ordering jitter from OS thread scheduling. (In Python 3, reproducibility between interpreter launches also requires use of the PYTHONHASHSEED environment variable to control hash randomization).
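The PYTHONHASHSEED part of that note can be illustrated with the standard library alone. This is a minimal sketch, not Gensim code: the helper name string_hash_in_fresh_interpreter is made up for this example, and it only shows that string hashes agree between interpreter launches once the variable is fixed.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(text, hash_seed):
    """Return hash(text) as computed by a newly launched Python interpreter."""
    # PYTHONHASHSEED must be in the environment before the interpreter starts.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate launches with the same fixed hash seed give the same hash.
first = string_hash_in_fresh_interpreter("sentence", 0)
second = string_hash_in_fresh_interpreter("sentence", 0)
same_with_fixed_seed = first == second
print(same_with_fixed_seed)
```

For a training run, the variable has to be set when the interpreter is launched (for example PYTHONHASHSEED=0 python train.py, where train.py is a hypothetical script), together with the seed and workers=1 settings described in the quoted documentation.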

Filco306 commented on May 22, 2024

Then I will consider this closed :)
