Comments (12)

Sosuke115 commented on August 12, 2024

Hi

The issue might be due to the pretrained Wikipedia2Vec vectors.
I used float32 vectors for my experiments but converted them to float16 to fit within Google Drive's storage limit when publishing them.
You could try training the Wikipedia2Vec vectors yourself.

S2="enwiki.fp16.768.vec";

https://wikipedia2vec.github.io/wikipedia2vec/

Sosuke115 commented on August 12, 2024

Yes.
Note that I set the vector dimension (--dim-size) to 768 to align with BERT's hidden size, as mentioned in the paper.
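
For reference, a minimal sketch of how such a run might look with the wikipedia2vec command-line tool (file names are placeholders; all other options are left at their defaults, as in the paper):

pip install wikipedia2vec

# Train 768-dimensional vectors from an English Wikipedia dump.
# The dump file name matches the one discussed below; --dim-size matches BERT's hidden size.
wikipedia2vec train enwiki-20190120-pages-articles.xml.bz2 enwiki.768.vec --dim-size 768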

Sosuke115 commented on August 12, 2024

Thank you for sharing the information.
I've forgotten the detailed results for each hyperparameter setting when using EASE, but it may indeed be true that EASE is more sensitive to these hyperparameters than SimCSE.
As you've demonstrated, it seems there might be more optimal hyperparameters when using the float16 version of Wikipedia2Vec.

kimwongyuda commented on August 12, 2024

Oh, I understand your current situation.

I used lr = 3e-05 & batch size = 64, and the results are below.
[results screenshot]

Also, when using lr = 3e-05 & batch size = 128, the results are below.
[results screenshot]

Please don't go to too much trouble to provide access to the pretrained vectors!
I really appreciate your opinions and feedback.
Thank you!

kimwongyuda commented on August 12, 2024

I got the Wikipedia data (enwiki-20190120-pages-articles.xml.bz2) from https://archive.org/download/enwiki-20190120. Then, I'm going to train Wikipedia2Vec vectors by following https://wikipedia2vec.github.io/wikipedia2vec/commands/.

However, there are some hyperparameters. Could you let me know the hyperparameters, settings, or code you used for training the float32 vectors?

Thank you.

Edit: I found in the paper that you used the default hyperparameters. Thank you.

kimwongyuda commented on August 12, 2024

I also share the results below for batch size = 64 (per_device_train_batch_size=8 & gradient_accumulation_steps=8) with enwiki.fp16.768.vec.

[results screenshot]

I modified only the batch size and gained an improvement of 2.77 points on average compared to batch size 128 (the default value in train_monolingual_ease.sh).

In contrast to the SimCSE paper's statement that "We find that SimCSE is not sensitive to batch sizes as long as tuning the learning rates accordingly", it looks like EASE is sensitive to batch size.
However, it is possible that an appropriate learning rate was not used when batch size 128 was applied.
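
For anyone reproducing this, a hedged sketch of the kind of invocation involved (the script name and flags are assumptions following standard HuggingFace Trainer conventions, not taken from the EASE repository; the effective batch size is per-device batch size × gradient accumulation steps × number of GPUs):

# Sketch only: effective batch size = 8 * 8 * 1 GPU = 64.
python train.py \
    --learning_rate 3e-05 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 8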

kimwongyuda commented on August 12, 2024

I trained entity vectors using the Wikipedia data from https://archive.org/download/enwiki-20190120, but the results based on these vectors were not as good as those in the paper.

Would it be possible for you to release the pretrained entity vectors you used?

Thank you.

Sosuke115 commented on August 12, 2024

I see.
The pretrained vectors I used during my experiments are on the server that I may not be able to access at the moment.
I will check later to see if I can access them.

Could you please show me the performance of EASE using that vector?
Also, since the performance might depend on the slight randomness of Wikipedia2Vec, could you please try training EASE with different hyperparameters?
(e.g., learning rate ∈ {3e-05, 5e-05}, batch size ∈ {64, 128})

Furthermore, there's a possibility that Wikipedia2Vec might not be the cause.
(At the moment, I'm not sure what could be the cause...)
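
If it helps, a small sketch of sweeping that grid (script and flag names are assumptions following HuggingFace Trainer conventions; a per-device batch size of 8 is assumed, with gradient accumulation making up the difference):

# Sketch only: sweep the suggested learning rates and effective batch sizes.
for LR in 3e-05 5e-05; do
  for BS in 64 128; do
    python train.py \
        --learning_rate "$LR" \
        --per_device_train_batch_size 8 \
        --gradient_accumulation_steps $((BS / 8)) \
        --output_dir "result/ease-lr${LR}-bs${BS}"
  done
done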

Sosuke115 commented on August 12, 2024

I see.
Given that such a significant performance gap remains even with the fp32 vectors, it's unlikely that the pretrained Wikipedia2Vec vectors are the cause.
I'll inspect the execution settings later to determine what might be causing the issue.
Thank you for sharing this information.

Sosuke115 commented on August 12, 2024

@kimwongyuda

I have identified a potential cause.
Please try selecting --pooler_type cls during training and --pooler_type cls_before_pooler during evaluation.
In monolingual EASE, a linear layer should be added during training only, the same as in unsupervised SimCSE. As the paper states:

"We add a linear layer after the output sentence embeddings only during training, as in Gao et al. (2021)."
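
For concreteness, a sketch of the two invocations (the script names are placeholders; the --pooler_type values are exactly those suggested above):

# Sketch only: the linear layer over [CLS] is used during training only.
python train.py --pooler_type cls                        # training: [CLS] + linear layer
python evaluation.py --pooler_type cls_before_pooler     # evaluation: raw [CLS] output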

kimwongyuda commented on August 12, 2024

Thank you for your help!

I think we can close this issue.

Thank you!
