Comments (12)
Hi
The issue might be due to the pretrained Wikipedia2Vec vectors.
I used float32 vectors for my experiments but converted them to float16 to fit within Google Drive's storage limit when publishing them.
You could try training the Wikipedia2Vec vectors yourself.
See line 14 in commit 37a811e.
https://wikipedia2vec.github.io/wikipedia2vec/
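To illustrate the float32 → float16 trade-off mentioned above, here is a minimal standard-library sketch (the specific value is arbitrary) showing that half precision halves per-dimension storage at the cost of a small rounding error, which is one way the published fp16 vectors could shift downstream results:

```python
import struct

# One embedding value stored at the two precisions discussed above.
x = 0.123456789

f32 = struct.pack("<f", x)  # float32: 4 bytes per dimension
f16 = struct.pack("<e", x)  # float16: 2 bytes per dimension

# Storage halves: for N entities at 768 dims, N * 768 * 2 bytes vs N * 768 * 4.
assert len(f32) == 4 and len(f16) == 2

# Round-tripping through float16 introduces a small but nonzero rounding error.
x16 = struct.unpack("<e", f16)[0]
print(abs(x - x16))
```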
from ease.
Yes.
Note that I set the vector dimension (--dim-size) to 768 to align with BERT's hidden size, as mentioned in the paper.
Thank you for sharing the information.
I've forgotten the detailed results for each hyperparameter when using EASE, but it may indeed be true that EASE is more sensitive to these hyperparameters than SimCSE.
As you've demonstrated, it seems there might be more optimal hyperparameters when using the float16 version of Wikipedia2Vec.
Oh, I understand your current situation.
I used lr = 3e-05 and batch size = 64; the result is below.
Also, with lr = 3e-05 and batch size = 128, the result is below.
Please don't go to too much trouble to access the pretrained vectors!
I really appreciate your opinions and feedback.
Thank you!
I got wikipedia data (enwiki-20190120-pages-articles.xml.bz2) from https://archive.org/download/enwiki-20190120.
Then, I'm going to train Wikipedia2Vec vectors by following https://wikipedia2vec.github.io/wikipedia2vec/commands/.
However, there are some hyperparameters.
So, could you let me know the hyperparameters, settings, or code you used to train the float32 vectors?
Thank you.
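As a side note, once training finishes, the saved text vectors can be sanity-checked with a small parser. This is only a sketch: it assumes the standard word2vec text layout (a "count dim" header, then one token per line) that Wikipedia2Vec's text output uses, and the `ENTITY/` tokens shown are illustrative.

```python
import io

def load_vec(fileobj):
    # Parse word2vec-style text vectors: a "count dim" header line,
    # then "token v1 v2 ... vd" on each following line.
    n, dim = map(int, fileobj.readline().split())
    vectors = {}
    for line in fileobj:
        parts = line.rstrip("\n").split(" ")
        values = [float(v) for v in parts[1:]]
        assert len(values) == dim, "dimension mismatch in vector file"
        vectors[parts[0]] = values
    return vectors

# Tiny in-memory example standing in for the enwiki .vec output.
sample = io.StringIO("2 3\nENTITY/Tokyo 0.1 0.2 0.3\nENTITY/Kyoto 0.4 0.5 0.6\n")
vecs = load_vec(sample)
print(len(vecs), len(vecs["ENTITY/Tokyo"]))  # 2 3
```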
I found that you used the default hyperparameters, as described in the paper. Thank you.
I also share the result below using batch size = 64 (per_device_train_batch_size = 8 and gradient_accumulation_steps = 8) and enwiki.fp16.768.vec.
I modified only the batch size and gained an average improvement of 2.77 points compared to batch size 128 (the default value in train_monolingual_ease.sh).
In contrast to the SimCSE paper's observation that "SimCSE is not sensitive to batch sizes as long as tuning the learning rates accordingly", EASE appears to be sensitive to batch size.
However, it is possible that an appropriate learning rate was not used when batch size 128 was applied.
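For clarity, the effective batch size above comes from multiplying the per-device batch size by the gradient accumulation steps, mirroring HuggingFace Trainer semantics. The helper below is a hypothetical illustration, not EASE code, and the 16-per-device setting for the 128 case is an assumption:

```python
def effective_batch_size(per_device: int, grad_accum: int, n_devices: int = 1) -> int:
    # Effective batch = per-device batch x accumulation steps x devices.
    return per_device * grad_accum * n_devices

print(effective_batch_size(8, 8))   # the setting above: 64
print(effective_batch_size(16, 8))  # 128, assuming 16 per device
```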
I trained entity vectors using the Wikipedia data at https://archive.org/download/enwiki-20190120, but the results based on these vectors were worse than those reported in the paper.
Is there any way you could release the pretrained entity vectors?
Thank you.
I see.
The pretrained vectors I used during my experiments are on the server that I may not be able to access at the moment.
I will check later to see if I can access them.
Could you please show me the performance of EASE using that vector?
Also, since the performance might depend on the slight randomness of Wikipedia2Vec, could you please try training EASE with different hyperparameters?
(e.g., learning rate ∈ {3e-05, 5e-05}, batch size ∈ {64, 128})
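The suggested sweep can be enumerated explicitly (values from the comment above; this is just an illustrative loop, not the repo's tooling):

```python
from itertools import product

learning_rates = [3e-05, 5e-05]
batch_sizes = [64, 128]

# Four (lr, batch size) configurations to try.
for lr, bs in product(learning_rates, batch_sizes):
    print(f"lr={lr:.0e}, batch_size={bs}")
```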
Furthermore, there's a possibility that Wikipedia2Vec might not be the cause.
(At the moment, I'm not sure what could be the cause...)
I see.
Given such a significant performance difference with the fp32 vector, it's unlikely that the pretrained Wikipedia2Vec vector is the cause.
I'll inspect the execution settings later to determine what might be causing the issue.
Thank you for sharing this information.
I have identified a potential cause.
Please try selecting --pooler_type cls during training and --pooler_type cls_before_pooler during evaluation.
In monolingual EASE, a linear layer should be added during training only, the same as in unsupervised SimCSE: "We add a linear layer after the output sentence embeddings only during training, as in Gao et al. (2021)."
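A minimal sketch of the distinction between the two pooler settings, in plain Python with hypothetical helper names. In SimCSE/EASE the extra layer is a learned linear layer plus activation inside the model, so this only illustrates the control flow, not the actual implementation:

```python
import math

def linear(x, W, b):
    # Plain matrix-vector product: y_i = sum_j W[i][j] * x[j] + b[i].
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def pool(cls_embedding, mode, W, b):
    # "cls": pass the [CLS] embedding through the extra linear layer
    # (+ tanh here, for illustration) -- used during training only.
    # "cls_before_pooler": return the raw [CLS] embedding -- used at evaluation.
    if mode == "cls":
        return [math.tanh(v) for v in linear(cls_embedding, W, b)]
    if mode == "cls_before_pooler":
        return cls_embedding
    raise ValueError(f"unknown pooler_type: {mode}")

cls_emb = [0.5, -0.25]
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # identity layer for illustration
print(pool(cls_emb, "cls_before_pooler", W, b))  # [0.5, -0.25]
print(pool(cls_emb, "cls", W, b))                # tanh applied elementwise
```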
Thank you for your help!
I think we can close this issue.
Thank you!