Comments (20)
Working on this for Thai with a new network trained with a QRNN. I got worse perplexity on Wikipedia, but I think it might not matter much.
from multifit.
I've started setting up basic data processing for XNLI at a new branch: https://github.com/n-waves/ulmfit-multilingual/tree/xnli
e.g. XNLI download script here
If I understand correctly, the goal is to get it to the point where we can use pretrain_lm
with XNLI in the same way we use it with Wikidumps?
from multifit.
I think the best thing to try is:
- Use a special delimiter token between the premise and the hypothesis and treat premise + delimiter + hypothesis as a single input as we would for classification (this is what is done in the OpenAI paper).
We could also run the RNN over the premise and the hypothesis separately, then feed the concatenation of the sum, the difference, and the product of their representations into a classifier (this is done in a lot of entailment papers). I think this would perform worse than the first approach, though, as there would be no interaction between the hypothesis and the premise.
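For reference, the second option amounts to something like this toy sketch (made-up vectors; pair_features is a hypothetical helper, not code from the repo):

```python
def pair_features(u, v):
    # Concatenate the elementwise sum, difference, and product of the
    # two sentence encodings, as in the entailment-paper recipe above.
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    p = [a * b for a, b in zip(u, v)]
    return s + d + p

u = [0.5, -1.0, 2.0]   # stand-in premise encoding
v = [1.5,  0.5, 2.0]   # stand-in hypothesis encoding
print(pair_features(u, v))
# [2.0, -0.5, 4.0, -1.0, -1.5, 0.0, 0.75, -0.5, 4.0]
```

In practice u and v would be the final RNN hidden states for premise and hypothesis, and the concatenated vector would feed a small classifier head.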
from multifit.
Agreed. It'd be cool if we could try this later. That way, we could also position our approach as allowing faster experimentation with such things compared to BERT.
from multifit.
I tried to test an Arabic model and got poor results, then tried English using the WT103_1 model and it didn't work well either. Not sure what's wrong. Code for En here. I used 2 cols (premise and hypo) with mark_fields=True.
from multifit.
Superb! @cstorm125 let me add you to this repo so you can write directly.
from multifit.
@cstorm125 can you describe how you want to have this implemented?
from multifit.
Thanks for opening this issue. I think setting up the data processing and general evaluation on XNLI will be necessary for submitting the paper and should be largely independent of the other things we're doing.
from multifit.
Thanks, Nirant! The goal would be to fine-tune the pretrained language model and train the classifier on the XNLI data and to then evaluate on it, so that we can compare to multilingual BERT. Does that make sense?
from multifit.
@sebastianruder how do you want to differentiate between the text and the hypothesis? Using different xxfld fields, or using separate RNNs, one to read the text and another to read the hypothesis?
from multifit.
Ok, if we want to use fields, we should most likely fix the order in which they appear in the backward pass: currently they appear as xxfld 1 text in the forward LM and text 1 xxfld in the backward LM.
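A toy illustration of the problem: the backward LM simply reverses the token stream, so the field marker ends up after its field number instead of before it:

```python
# Forward-LM token stream with a field marker at the start.
fwd = ["xxfld", "1", "the", "premise", "tokens"]

# The backward LM trains on the reversed stream, which flips the
# marker/number order ("... 1 xxfld" instead of "xxfld 1 ...").
bwd = list(reversed(fwd))
print(bwd)  # ['tokens', 'premise', 'the', '1', 'xxfld']
```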
from multifit.
I think it'd be good if we could get rid of fields in favour of <bos> (beginning of sentence/example) and <eos> (end of sentence/example) tokens, which are more standard in the literature. We're already adding <eos> tokens to the vocabulary to replace the newlines in the Wikipedia data, so it seems we just need to add the <bos> tokens.
from multifit.
So you would like to have <bos> premise text <eos> <bos> hypothesis <eos>? How would that work in the backward LM? BERT adds an additional trainable vector to all words in the second sentence; I thought they do that because it works better than what OpenAI suggested.
But I think we can experiment with a few different markups.
from multifit.
Basically, except we'd add another special token for entailment, so that the input would be:
<bos> premise text <eos> <sep> <bos> hypothesis <eos>
Representations for <bos> and <eos> should already be learned during language modeling. During fine-tuning, we then only need to learn an embedding for <sep>.
Yeah, in their case they also pretrain the segment embeddings for sentence A and sentence B with their next sentence prediction task. As we don't do that, I'm not sure if this will be better than the first approach, but it's worth a try.
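A minimal sketch of building that markup (mark_pair is a hypothetical helper, assuming pre-tokenized sentences):

```python
def mark_pair(premise_toks, hypothesis_toks):
    # Wrap each sentence in <bos>/<eos> and join premise and
    # hypothesis with the proposed <sep> token.
    return (["<bos>"] + premise_toks + ["<eos>", "<sep>", "<bos>"]
            + hypothesis_toks + ["<eos>"])

print(mark_pair(["a", "man", "sleeps"], ["a", "person", "rests"]))
# ['<bos>', 'a', 'man', 'sleeps', '<eos>', '<sep>', '<bos>',
#  'a', 'person', 'rests', '<eos>']
```

Only <sep> is new at fine-tuning time; <bos> and <eos> embeddings come from the pretrained LM.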
from multifit.
Yeah, in their case they also pre-train the segment embeddings for sentence A and sentence B with their next sentence prediction task
Nothing stops us from doing this as an intermediate step; that way <sep> would also be learned. But that's for later.
from multifit.
Context: I am trying to use train_clas to fine-tune and classify for XNLI-English. I am using read_xnli from fastai_contrib.
I am able to fine tune the Wikitext-103 LM from fastai for 2 epochs with the following:
$python ulmfit/train_clas.py --data_dir=./data --dataset=xnli --bs=20
...
2 4.168341 4.187915 0.326962
Starting classifier training
epoch train_loss valid_loss accuracy
... ... ... ...
At the classifier step (when trying to validate?), at the end of the first epoch:
Traceback (most recent call last):
File "ulmfit/train_clas.py", line 162, in <module>
File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "ulmfit/train_clas.py", line 145, in new_train_clas
File "/home/nirant/fastai/fastai/train.py", line 20, in fit_one_cycle
learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
File "/home/nirant/fastai/fastai/basic_train.py", line 162, in fit
callbacks=self.callbacks+callbacks)
File "/home/nirant/fastai/fastai/basic_train.py", line 94, in fit
raise e
File "/home/nirant/fastai/fastai/basic_train.py", line 89, in fit
cb_handler=cb_handler, pbar=pbar)
File "/home/nirant/fastai/fastai/basic_train.py", line 49, in validate
for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 65, in __iter__
for i,o in enumerate(self._gen):
File "/home/nirant/fastai/fastai/basic_data.py", line 47, in __iter__
for b in self.dl:
File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/home/nirant/fastai/fastai/text/data.py", line 92, in pad_collate
return res, tensor([s[1] for s in samples])
File "/home/nirant/fastai/fastai/torch_core.py", line 68, in tensor
return torch.tensor(x) if is_listy(x) else as_tensor(x)
RuntimeError: Could not infer dtype of NoneType
This happens at fit_one_cycle for classification. Is this because of a vocabulary mismatch? Or because we do not automatically update from 2 predicted classes (as in IMDb) to 3 classes (as in XNLI)?
What are other possible points of error which I should check?
from multifit.
Another source of error might be that we're not mapping the XNLI labels to int at the moment.
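Something like this should rule that out (the exact label names and int order here are an assumption, not taken from the repo); fastai's pad_collate raises "Could not infer dtype of NoneType" when labels stay as strings/None instead of ints:

```python
# Hypothetical mapping for the three XNLI labels.
LABEL2INT = {"entailment": 0, "neutral": 1, "contradiction": 2}

labels = ["neutral", "entailment", "contradiction"]
print([LABEL2INT[l] for l in labels])  # [1, 0, 2]
```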
from multifit.
Would that not fail on the first iteration of the classifier if our mapping is not accepted by fit_one_cycle?
from multifit.
Guys, can anyone have a look at the refactoring branch and try to implement a working XNLI setup that respects the recent changes in fastai, so we get a baseline that we can later try to improve upon?
from multifit.
Let me close this for the time being, as it is unlikely we will play with XNLI anytime soon.
from multifit.