chijames / poly-encoder
License: MIT License
Dear @chijames ,
I came across your Poly-Encoder and would like to adapt it for some work purposes. I was told that I can't use it unless it is released under an open-source license. I was wondering whether you would be willing to allow that, perhaps under the MIT license?
https://choosealicense.com/licenses/mit/
Hope to hear from you and thank you very much!
Best Regards,
Chor Seng
Hi! Thanks a lot for sharing your code!
I have a question about the performance.
You report the performance of your code on DSTC7 with the bi-encoder as follows,
However, in the original paper, the performance of the bi-encoder on DSTC7 is
With your code we get an R@1 of 0.437, but the performance in the original paper is 0.565 on the dev set and 0.668 on the test set. I read your code carefully but found little difference from the setting in the original paper. I also changed your default single BERT to two different BERTs for the bi-encoder, but still cannot match the performance reported in the original paper. Why is that?
First of all, I really appreciate the nice repo.
The t_total in run.py is calculated by

t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

and t_total is then passed into transformers.get_linear_schedule_with_warmup. It indicates the total number of steps of the training process.
However, I believe the total number of steps should be the number of batches * the number of epochs. Therefore, I think the code for calculating t_total should be

t_total = len(train_dataloader) // (args.train_batch_size * args.gradient_accumulation_steps) * args.num_train_epochs

If I'm wrong, please let me know what I am missing.
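(For context, the two formulas above differ exactly in whether len(train_dataloader) already counts batches. A quick check with illustrative sizes:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative sizes only: 100 samples, batch size 32.
dataset = TensorDataset(torch.zeros(100, 4))
loader = DataLoader(dataset, batch_size=32)

# len() on a DataLoader counts batches per epoch, not samples,
# which is the quantity the two t_total formulas disagree on.
print(len(dataset), len(loader))  # -> 100 4
```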
Hi @chijames, thanks so much for this wonderful project!
After digging into the code, I have two questions:
Is there any special reason why masking is not implemented in this section?
Lines 72 to 78 in e5299e3
Can we speed up the construction of poly_code_embeddings by using nn.Parameter? That way, we wouldn't need to create poly_ids and move it to the GPU in every batch.
Thanks for your reply!
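(To illustrate the suggestion, a minimal sketch of the two options; poly_m and hidden are illustrative sizes, not taken from the repo:)

```python
import torch
import torch.nn as nn

poly_m, hidden = 16, 64

# Option 1 (as in the snippet referenced above): an nn.Embedding looked up
# with a poly_ids tensor that is rebuilt (and moved to GPU) every batch.
codes = nn.Embedding(poly_m, hidden)
poly_ids = torch.arange(poly_m)
emb_lookup = codes(poly_ids)                      # (poly_m, hidden)

# Option 2 (the suggestion): a plain learned nn.Parameter, no ids needed.
poly_codes = nn.Parameter(torch.randn(poly_m, hidden) * 0.02)
emb_param = poly_codes                            # (poly_m, hidden)

print(emb_lookup.shape, emb_param.shape)
```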
(To be honest, I'm not used to "deep learning coding" (PyTorch, Hugging Face, etc.), so this might be a silly question; please keep in mind I'm a beginner.)
The original paper says that the context encoder and the candidate encoder are trained separately.
However, I found in your code that both transformers are invoked as self.bert().
https://github.com/chijames/Poly-Encoder/blob/master/encoder.py#L20-L27
Is that OK? I suspect the two encoders should end up with different weights after training.
FYI: in the official implementation of the BLINK paper (https://arxiv.org/pdf/1911.03814.pdf), they prepare separate encoder modules: https://github.com/facebookresearch/BLINK/blob/master/blink/biencoder/biencoder.py#L37-L48
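(A minimal sketch of the difference I mean, with a tiny nn.Linear standing in for the BERT encoder; the class and attribute names are illustrative, not from either repo:)

```python
import copy
import torch.nn as nn

class BiEncoder(nn.Module):
    """Sketch only: `encoder` stands in for a transformers BERT model."""
    def __init__(self, encoder, share_weights=False):
        super().__init__()
        self.context_encoder = encoder
        # Sharing makes both attributes point at the same module, so the
        # two towers can never diverge; deep-copying gives the candidate
        # tower the same initialization but independent weights.
        self.cand_encoder = encoder if share_weights else copy.deepcopy(encoder)

shared = BiEncoder(nn.Linear(8, 8), share_weights=True)
separate = BiEncoder(nn.Linear(8, 8), share_weights=False)
print(shared.context_encoder is shared.cand_encoder)      # -> True
print(separate.context_encoder is separate.cand_encoder)  # -> False
```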
Hi, the work is really great.
I am just trying to understand: if labels are None, the encoders output matrices instead of scalars, but you have not made any provision for this in your code.
Also, what is neg in the cross-encoder? Could you please add comments explaining the variables you use?
Also, why do the bi-encoder and poly-encoder models use 3-dimensional response inputs?
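(My guess at the 3-D shape in question, sketched with illustrative sizes; this is an assumption, not taken from the repo: each context is scored against several candidates at once, so responses arrive as (batch, n_candidates, seq_len) and are flattened before encoding.)

```python
import torch

batch, n_cand, seq_len = 2, 5, 7   # illustrative sizes
responses = torch.randint(0, 100, (batch, n_cand, seq_len))

# Flatten the candidates into the batch dimension so a standard
# 2-D encoder can process them; scores are reshaped back afterwards.
flat = responses.view(batch * n_cand, seq_len)
print(flat.shape)  # -> torch.Size([10, 7])
```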
Hi! I'm using your code and want to reproduce your result on the DSTC 7 dataset.
When training the cross-encoder, I use BERT-small (uncased_L-4_H-512_A-8.zip) and leave all hyperparameters unchanged as in run.py (batch size = 32, max context length = 128, max response length = 32). However, I ran into an OOM error on my Tesla M40 GPU, which has 11 GB of memory.
I wonder how you trained the cross-encoder on your GPU. I guess the default hyperparameters in run.py are intended for training the bi-encoder and poly-encoder. Could you please share the hyperparameters you used when training the cross-encoder?
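(In case it helps others hitting the same OOM: a sketch of trading per-step batch size for gradient accumulation, keeping the effective batch at 32; the model and all numbers are illustrative, not the repo's settings.)

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum, micro_bs = 4, 8                 # 4 micro-batches of 8 == effective batch 32

data, target = torch.randn(32, 10), torch.randn(32, 1)
opt.zero_grad()
for i in range(accum):
    xb = data[i * micro_bs:(i + 1) * micro_bs]
    yb = target[i * micro_bs:(i + 1) * micro_bs]
    # Divide by accum so the summed gradients match a single 32-sample step.
    loss = nn.functional.mse_loss(model(xb), yb) / accum
    loss.backward()                    # gradients accumulate across micro-batches
opt.step()                             # one optimizer step per effective batch
print(model.weight.grad.shape)
```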
Why do you convert the Google BERT weights instead of directly using the Hugging Face BERT weights? Is there any performance difference between the two?
# converted weight from google-bert
bert = BertModelClass.from_pretrained(args.bert_model, state_dict=model_state_dict)
# huggingface weight
bert = BertModelClass.from_pretrained('bert-base-uncased')
All input samples are positive, so the training is meaningless.
Lines 20 to 27 in 6f0d9c4
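(For what it's worth, a common scheme that makes positives-only input files meaningful is in-batch negatives, where every non-matching (context_i, response_j) pair in a batch serves as a negative; I'm not certain this is what the repo does, but a sketch looks like:)

```python
import torch
import torch.nn.functional as F

# Illustrative embeddings: row i of ctx pairs with row i of rsp (its positive).
ctx = torch.randn(4, 8)
rsp = torch.randn(4, 8)

scores = ctx @ rsp.t()        # (4, 4): diagonal = positives, off-diagonal = negatives
labels = torch.arange(4)      # each context's positive sits on its own row
loss = F.cross_entropy(scores, labels)
print(scores.shape, loss.item() >= 0)
```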
I have noticed that in parse.py, the candidate response is concatenated to the context with '\t'. This leads to a mistake when reading the record for training. Consider the case where the candidate response is "", which actually occurs in the DSTC7 dataset: when the record is split by '\t' to extract the response, the last utterance in the context is chosen instead.
(This is my first time submitting an issue; I hope I have depicted the bug clearly.)
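To make the bug above concrete, a minimal reproduction (the record layout is my assumption about what parse.py produces: context utterances and the response joined by '\t'):

```python
record = "hello\thow are you\t"   # context: 2 utterances, response: ""

# A typical read loop strips each line; strip() also removes the
# trailing tab, so the empty response field disappears and the last
# context utterance is returned as the "response".
fields_stripped = record.strip().split("\t")
print(fields_stripped[-1])  # -> how are you

# Removing only the newline keeps the trailing empty field intact.
fields_kept = record.rstrip("\n").split("\t")
print(fields_kept[-1])      # -> (empty string)
```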
There doesn't seem to be a config file in the repo for running this code, so I'm curious what values you used, specifically the hidden size referenced in the poly-encoder section to calculate your m.