tigerchen52 / love
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
License: MIT License
Hi,
I was wondering: when generating typos for the extrinsic tasks (Tables 3 and 4) in the paper, did you corrupt both the train and test datasets, or only the test set?
Thank you!
Hello, I have recently been trying to use BERT and LOVE for text classification. I have some questions about your latest released code:
Hi!
I'm trying to reproduce results from your ACL22 paper for MLRC2022.
In your paper, you show that LOVE+FastText and LOVE+BERT are more robust to OOV words than their respective baselines.
However, I found no instructions on how to produce the typos or a corrupted dataset, except for the details provided in Appendix B.3, where you describe simulating post-OCR errors.
Would you be able to provide more insights on simulated typos?
I also found these paths in extrinsic/rnn_ner/gen_vocab.py:
train_path = 'input/train.txt'
dev_path = 'typo_data/typo_dev.txt'
test_path = 'typo_data/typo_test.txt'
out_path = 'typo_data/typo_vocab.txt'
which makes me believe that some folders or scripts for simulating typos are missing.
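In the meantime, to make the question concrete, here is the kind of character-level corruption I would try myself. The edit operations (swap, delete, substitute, insert) and the noise rate are my own assumptions, not taken from the paper or the repository:

```python
import random

def add_typos(sentence: str, rate: float = 0.1, seed: int = 42) -> str:
    """Corrupt roughly `rate` of the words with one random character edit.

    The edit operations are an assumption: Appendix B.3 mentions simulating
    post-OCR errors but does not spell out the exact procedure.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words = sentence.split()
    for i, word in enumerate(words):
        if len(word) < 2 or rng.random() > rate:
            continue
        pos = rng.randrange(len(word) - 1)
        op = rng.choice(["swap", "delete", "substitute", "insert"])
        if op == "swap":
            words[i] = word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
        elif op == "delete":
            words[i] = word[:pos] + word[pos + 1:]
        elif op == "substitute":
            words[i] = word[:pos] + rng.choice(alphabet) + word[pos + 1:]
        else:  # insert
            words[i] = word[:pos] + rng.choice(alphabet) + word[pos:]
    return " ".join(words)

print(add_typos("the quick brown fox jumps over the lazy dog", rate=0.5))
```

Is this roughly what your missing typo_data scripts do?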
Thank you!
Hello! I am currently trying to reproduce the LOVE model, but I have encountered an issue with data augmentation.
Specifically, the paper mentions that one of the data augmentation strategies is to replace the original word with a synonym. However, I noticed that the 'data/synonym.txt' file does not cover the full 2M-word vocabulary as expected.
Could you please provide the complete 'data/synonym.txt' file or, alternatively, share the code that can be used to generate this file? Thank you for your assistance!
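P.S. For concreteness, this is how I imagine the file could be generated. Using WordNet via NLTK, and the tab-separated output format, are both my assumptions; the real source and format of data/synonym.txt may differ:

```python
# Sketch: build a word -> synonyms file from WordNet (an assumption).
# Setup: pip install nltk; then run nltk.download("wordnet") once.
from nltk.corpus import wordnet

def synonyms(word):
    """Collect WordNet lemma names for `word`, excluding the word itself."""
    found = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                found.add(name)
    return sorted(found)

with open("synonym.txt", "w", encoding="utf-8") as out:
    for word in ["happy", "quick", "house"]:  # stand-in for the 2M vocabulary
        syns = synonyms(word)
        if syns:
            out.write(word + "\t" + ", ".join(syns) + "\n")
```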
Hi,
I was wondering about the hard negative samples: LOVE extracts the top-100 similar words for each target word to serve as hard negatives.
Do you build hard negative samples for every target word in the vector file?
And how can one create the similar words by edit distance for each target word?
Alternatively, could you please provide the hard negative sample file used in the LOVE framework?
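To illustrate what I mean, here is a minimal sketch of mining edit-distance neighbours as hard negative candidates; this is my guess at the procedure, not code from the repository:

```python
def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def hard_negatives(target, vocab, k=100):
    """Return the k vocabulary words closest to `target` by edit distance."""
    candidates = [w for w in vocab if w != target]
    return sorted(candidates, key=lambda w: edit_distance(target, w))[:k]

vocab = ["misspell", "mispell", "misspelt", "dispel", "spell", "miss"]
print(hard_negatives("misspell", vocab, k=3))
```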
Thank you!
Hi,
I've been trying to reproduce the performance reported in the paper for the SST2 task, using the 'BERT+LOVE' embeddings you provided.
I tried changing various hyper-parameters in the model and modifying the code.
However, I failed to reproduce the performance reported in the paper.
My reproduced performance is shown below.
Could you provide the code that performed the SST2 Task?
Thank you!
Hello! I am currently trying to reproduce the LOVE model, and I have a question about this sentence in your paper: "For ease of implementation, we learn only from the words that are not separated into pieces."
As I understand it, in the vocab.txt file you provided, you excluded special tokens (e.g. "[PAD]", "[UNK]") and separated word pieces (e.g. ##ir, ##di).
Is my understanding correct?
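In code, my reading of the filter is roughly the following; the exact rule used to build vocab.txt (for example how "[unused...]" slots are handled) is my guess:

```python
# Sketch of the vocabulary filtering I have in mind, applied to a
# standard BERT WordPiece vocab.txt (one token per line).
SPECIAL_TOKENS = {"[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"}

def is_whole_word(token):
    """True for whole words: not a special token, not a '##' word piece."""
    if token in SPECIAL_TOKENS or token.startswith("[unused"):
        return False
    return not token.startswith("##")

with open("vocab.txt", encoding="utf-8") as f:
    whole_words = [t for t in (line.strip() for line in f) if is_whole_word(t)]
print(len(whole_words))
```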
Thank you!
Hello! I have a question about a detail of the FastText baseline in Tables 2-4.
For this baseline, there are two choices for handling OOV words:
1) assign a null (zero) vector to every OOV word;
2) compose the OOV word's vector from its subword n-grams.
In the context of Bojanowski et al. (2017)[1], the first option corresponds to the "sisg-" setting, while the second aligns with the "sisg" setting.
Could you please specify which option was utilized in your experiments?
My conjecture leans towards option 1), because option 2) does not seem to follow a mimick-like model.
Nonetheless, I would greatly appreciate your guidance on this matter.
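For clarity, the distinction I am asking about looks roughly like this in gensim (whether your pipeline uses gensim at all is an assumption on my part; the toy training data is only there to make the sketch runnable):

```python
import numpy as np
from gensim.models import FastText

# Toy model just to make the sketch self-contained; the real experiments
# would load pretrained FastText vectors instead.
model = FastText(sentences=[["hello", "world"], ["hello", "there"]],
                 vector_size=10, min_count=1, epochs=5)

def vector_sisg_minus(word):
    """Option 1 ("sisg-"): OOV words get a null vector."""
    if word in model.wv.key_to_index:
        return model.wv[word]
    return np.zeros(model.vector_size, dtype=np.float32)

def vector_sisg(word):
    """Option 2 ("sisg"): OOV vectors are composed from subword n-grams."""
    return model.wv[word]  # gensim does this automatically for OOV words

print(vector_sisg_minus("helo").any())  # False: null vector
print(vector_sisg("helo").any())        # True: built from n-grams
```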
Thank you in advance for your help!
[1] Bojanowski et al., Enriching Word Vectors with Subword Information, TACL 2017.
Hi,
I was wondering how you obtained the intrinsic scores for KVQ-FH in Table 2. The scores reported in their paper (Table 4) are much higher than the scores you report, and higher than those reported for LOVE.
Kind regards,
Stéphan
Hi,
I was wondering where I could find the testing/training code for the extrinsic tasks, i.e., SST-2 and CoNLL-03.
Also, are the models included in the repository the models for which you report the scores in the paper?
Thanks!