Comments (12)
Hi
The issue might be due to the pretrained Wikipedia2Vec vectors.
I used float32 vectors for my experiments but converted them to float16 to fit within Google Drive's storage limit when publishing them.
You could try training the Wikipedia2Vec vectors yourself.
See line 14 in commit 37a811e.
https://wikipedia2vec.github.io/wikipedia2vec/
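To illustrate the float32 → float16 trade-off mentioned above, here is a minimal standard-library sketch (the specific value is arbitrary) showing that half precision halves per-dimension storage at the cost of a small rounding error, which is one way the published fp16 vectors could shift downstream results:

```python
import struct

# One embedding value stored at the two precisions discussed above.
x = 0.123456789

f32 = struct.pack("<f", x)  # float32: 4 bytes per dimension
f16 = struct.pack("<e", x)  # float16: 2 bytes per dimension

# Storage halves: for N entities at 768 dims, N * 768 * 2 bytes vs N * 768 * 4.
assert len(f32) == 4 and len(f16) == 2

# Round-tripping through float16 introduces a small but nonzero rounding error.
x16 = struct.unpack("<e", f16)[0]
print(abs(x - x16))
```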
from ease.
Yes.
Note that I set the vector dimension (--dim-size) to 768 to align with BERT's hidden size, as mentioned in the paper.
Thank you for sharing the information.
I've forgotten the detailed results for each hyperparameter when using EASE, but it may indeed be true that EASE is more sensitive to these hyperparameters than SimCSE.
As you've demonstrated, it seems there might be more optimal hyperparameters when using the float16 version of Wikipedia2Vec.
Oh, I understand your current situation.
I used lr = 3e-05 and batch size = 64; the result is below.
Also, with lr = 3e-05 and batch size = 128, the result is below.
Please don't go to too much trouble to access the pretrained vectors!
I really appreciate your opinions and feedback.
Thank you!
I got wikipedia data (enwiki-20190120-pages-articles.xml.bz2) from https://archive.org/download/enwiki-20190120.
Then, I'm going to train Wikipedia2Vec vectors by following https://wikipedia2vec.github.io/wikipedia2vec/commands/.
However, there are some hyperparameters.
So, could you let me know the hyperparameters, settings, or code you used to train the float32 vectors?
Thank you.
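As a side note, once training finishes, the saved text vectors can be sanity-checked with a small parser. This is only a sketch: it assumes the standard word2vec text layout (a "count dim" header, then one token per line) that Wikipedia2Vec's text output uses, and the `ENTITY/` tokens shown are illustrative.

```python
import io

def load_vec(fileobj):
    # Parse word2vec-style text vectors: a "count dim" header line,
    # then "token v1 v2 ... vd" on each following line.
    n, dim = map(int, fileobj.readline().split())
    vectors = {}
    for line in fileobj:
        parts = line.rstrip("\n").split(" ")
        values = [float(v) for v in parts[1:]]
        assert len(values) == dim, "dimension mismatch in vector file"
        vectors[parts[0]] = values
    return vectors

# Tiny in-memory example standing in for the enwiki .vec output.
sample = io.StringIO("2 3\nENTITY/Tokyo 0.1 0.2 0.3\nENTITY/Kyoto 0.4 0.5 0.6\n")
vecs = load_vec(sample)
print(len(vecs), len(vecs["ENTITY/Tokyo"]))  # 2 3
```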
I found that you used the default hyperparameters, as described in the paper. Thank you.
I also share the result below using batch size = 64 (per_device_train_batch_size = 8 and gradient_accumulation_steps = 8) and enwiki.fp16.768.vec.
I modified only the batch size and gained an average improvement of 2.77 points compared to batch size 128 (the default value in train_monolingual_ease.sh).
In contrast to the SimCSE paper's observation that "SimCSE is not sensitive to batch sizes as long as tuning the learning rates accordingly", EASE appears to be sensitive to batch size.
However, it is possible that an appropriate learning rate was not used when batch size 128 was applied.
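For clarity, the effective batch size above comes from multiplying the per-device batch size by the gradient accumulation steps, mirroring HuggingFace Trainer semantics. The helper below is a hypothetical illustration, not EASE code, and the 16-per-device setting for the 128 case is an assumption:

```python
def effective_batch_size(per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    # Effective batch = per-device batch x accumulation steps x devices.
    return per_device * grad_accum * n_devices

print(effective_batch_size(8, 8))   # the setting above: 64
print(effective_batch_size(16, 8))  # 128, assuming 16 per device
```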
I trained entity vectors using the Wikipedia data at https://archive.org/download/enwiki-20190120, but the results based on these vectors were worse than those reported in the paper.
Is there any way you could release the pretrained entity vectors?
Thank you.
I see.
The pretrained vectors I used during my experiments are on the server that I may not be able to access at the moment.
I will check later to see if I can access them.
Could you please show me the performance of EASE using that vector?
Also, since the performance might depend on the slight randomness of Wikipedia2Vec, could you please try training EASE with different hyperparameters?
(e.g., learning rate ∈ {3e-05, 5e-05}, batch size ∈ {64, 128})
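The suggested sweep can be enumerated explicitly (values from the comment above; this is just an illustrative loop, not the repo's tooling):

```python
from itertools import product

learning_rates = [3e-05, 5e-05]
batch_sizes = [64, 128]

# Four (lr, batch size) configurations to try.
for lr, bs in product(learning_rates, batch_sizes):
    print(f"lr={lr:.0e}, batch_size={bs}")
```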
Furthermore, there's a possibility that Wikipedia2Vec might not be the cause.
(At the moment, I'm not sure what could be the cause...)
I see.
Given such a significant performance difference with the fp32 vector, it's unlikely that the pretrained Wikipedia2Vec vector is the cause.
I'll inspect the execution settings later to determine what might be causing the issue.
Thank you for sharing this information.
I have identified a potential cause.
Please try selecting --pooler_type cls during training and --pooler_type cls_before_pooler during evaluation.
In monolingual EASE, a linear layer should be added during training only, the same as in unsupervised SimCSE: "We add a linear layer after the output sentence embeddings only during training, as in Gao et al. (2021)."
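A minimal sketch of the distinction between the two pooler settings, in plain Python with hypothetical helper names. In SimCSE/EASE the extra layer is a learned linear layer plus activation inside the model, so this only illustrates the control flow, not the actual implementation:

```python
import math

def linear(x, W, b):
    # Plain matrix-vector product: y_i = sum_j W[i][j] * x[j] + b[i].
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def pool(cls_embedding, mode, W, b):
    # "cls": pass the [CLS] embedding through the extra linear layer
    # (+ tanh here, for illustration) -- used during training only.
    # "cls_before_pooler": return the raw [CLS] embedding -- used at evaluation.
    if mode == "cls":
        return [math.tanh(v) for v in linear(cls_embedding, W, b)]
    if mode == "cls_before_pooler":
        return cls_embedding
    raise ValueError(f"unknown pooler_type: {mode}")

cls_emb = [0.5, -0.25]
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # identity layer for illustration
print(pool(cls_emb, "cls_before_pooler", W, b))  # [0.5, -0.25]
print(pool(cls_emb, "cls", W, b))                # tanh applied elementwise
```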
Thank you for your help!
I think we can close this issue.
Thank you!