
Comments (20)

cstorm125 commented on August 14, 2024

Working on this for Thai with a new QRNN-based network. I got worse perplexity on Wikipedia, but I think it might not matter much.


NirantK commented on August 14, 2024

I've started setting up basic data processing for XNLI at a new branch: https://github.com/n-waves/ulmfit-multilingual/tree/xnli

e.g. XNLI download script here

If I understand correctly, the goal is to get it to the point where we can use pretrain_lm with XNLI in the same way we use it with Wikipedia dumps?


sebastianruder commented on August 14, 2024

I think the best thing to try is:

  • Use a special delimiter token between the premise and the hypothesis and treat premise + delimiter + hypothesis as a single input, as we would for classification (this is what is done in the OpenAI paper).
  • Alternatively, run the RNN over the premise and the hypothesis separately and then feed the concatenation of the sum, the difference, and the product of their representations into a classifier (this is done in a lot of entailment papers). I think this would perform worse than the first approach, though, as we don't have any interaction between hypothesis and premise. Both options are sketched below.
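A minimal PyTorch sketch of the two options (hypothetical names and shapes, not code from this repo): option 1 builds a single token sequence with a delimiter, option 2 combines separately encoded premise and hypothesis representations.

import torch
import torch.nn as nn

# Option 1: premise + delimiter + hypothesis as one sequence of token ids,
# classified exactly like any other document.
def concat_with_delimiter(premise_ids, hypothesis_ids, delim_id):
    return premise_ids + [delim_id] + hypothesis_ids

# Option 2: encode premise and hypothesis separately (u and v are the pooled
# RNN outputs), then classify on the concatenation of sum, difference, product.
class PairClassifier(nn.Module):
    def __init__(self, enc_dim, n_classes=3):
        super().__init__()
        self.head = nn.Linear(3 * enc_dim, n_classes)

    def forward(self, u, v):
        features = torch.cat([u + v, u - v, u * v], dim=-1)
        return self.head(features)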


sebastianruder commented on August 14, 2024

Agreed. It'd be cool if we could try this later. That way, we could also position our approach as allowing faster experimentation with such things compared to BERT.


abedkhooli commented on August 14, 2024

I tried to test with the Arabic model and got poor results, then tried English using the WT103_1 model and it didn't work well either. Not sure what's wrong. Code for English is here. I used two columns (premise and hypothesis) with mark_fields=True.
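For reference, a minimal sketch (assuming fastai v1; file paths and column names are hypothetical, not the repo's actual schema) of how a two-column premise/hypothesis setup with mark_fields=True could be wired up:

import pandas as pd
from fastai.text import TextClasDataBunch

# Hypothetical XNLI TSVs; column names are assumptions for illustration only.
train_df = pd.read_csv('data/xnli/train.tsv', sep='\t')[['label', 'premise', 'hypo']]
valid_df = pd.read_csv('data/xnli/valid.tsv', sep='\t')[['label', 'premise', 'hypo']]

# mark_fields=True prepends xxfld markers so the model can tell the premise
# column from the hypothesis column after the two are concatenated.
data_clas = TextClasDataBunch.from_df(
    'data/xnli', train_df, valid_df,
    text_cols=['premise', 'hypo'], label_cols='label',
    mark_fields=True, bs=20)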


PiotrCzapla commented on August 14, 2024

Superb! @cstorm125 let me add you to this repo so you can write directly.


PiotrCzapla commented on August 14, 2024

@cstorm125 can you describe how you want to have this implemented?


sebastianruder commented on August 14, 2024

Thanks for opening this issue. I think setting up the data processing and general evaluation on XNLI will be necessary for submitting the paper and should be largely independent of the other things we're doing.


sebastianruder commented on August 14, 2024

Thanks, Nirant! The goal would be to fine-tune the pretrained language model and train the classifier on the XNLI data and to then evaluate on it, so that we can compare to multilingual BERT. Does that make sense?


PiotrCzapla commented on August 14, 2024

@sebastianruder how do you want to differentiate between the text and the hypothesis? Using different xxfld markers, or separate RNNs, one to read the text and another to read the hypothesis?


PiotrCzapla commented on August 14, 2024

OK, if we want to use fields, we should most likely fix the order in which they appear in the backward pass: currently they appear as xxfld 1 text in the forward LM and as text 1 xxfld in the backward LM.
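A small plain-Python illustration (hypothetical helper, not repo code) of why the marker order flips and one possible fix, i.e. reversing only the content tokens while keeping the xxfld prefix in place:

tokens = ['xxfld', '1', 'the', 'premise', 'text']

backward_naive = list(reversed(tokens))
# -> ['text', 'premise', 'the', '1', 'xxfld'], the "text 1 xxfld" order described above

def reverse_keeping_fields(toks, marker='xxfld'):
    """Reverse the tokens but keep an 'xxfld N' prefix at the front."""
    if len(toks) >= 2 and toks[0] == marker:
        return toks[:2] + list(reversed(toks[2:]))
    return list(reversed(toks))

backward_fixed = reverse_keeping_fields(tokens)
# -> ['xxfld', '1', 'text', 'premise', 'the']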


sebastianruder commented on August 14, 2024

I think it'd be good if we could get rid of fields in favour of <bos> (beginning of sentence/example) and <eos> (end of sentence/example) tokens, which are more standard in the literature. We're already adding <eos> tokens to the vocabulary to replace the newlines in the Wikipedia data, so it seems we just need to add the <bos> tokens.


PiotrCzapla commented on August 14, 2024

So you would like to have <bos> premise text <eos> <bos> hypothesis <eos>? How would that work in the backward LM? BERT adds an additional trainable vector to all words in the second sentence; I thought they do that because it works better than what OpenAI suggested.

But I think we can experiment with a few different markups.


sebastianruder commented on August 14, 2024

Basically, except we'd add another special token for entailment, so that the input would be:
<bos> premise text <eos> <sep> <bos> hypothesis <eos>
Representations for <bos> and <eos> should already be learned during language modeling. During fine-tuning, we then only need to learn an embedding for <sep>.
Yeah, in their case they also pretrain the segment embeddings for sentence A and sentence B with their next sentence prediction task. As we don't do that, I'm not sure if this will be better than the first approach, but it's worth a try.
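A minimal PyTorch sketch (hypothetical names, not code from this repo) of assembling that input and adding a new <sep> row to a pretrained embedding matrix, initialised here to the mean of the existing rows:

import torch.nn as nn

def make_nli_input(premise_ids, hypothesis_ids, bos, eos, sep):
    """Build <bos> premise <eos> <sep> <bos> hypothesis <eos> as token ids."""
    return [bos] + premise_ids + [eos, sep, bos] + hypothesis_ids + [eos]

def add_sep_embedding(pretrained: nn.Embedding) -> nn.Embedding:
    """Return an embedding with one extra row for <sep>; the new row starts
    at the mean of the pretrained rows so fine-tuning has a sane init."""
    w = pretrained.weight.data
    new = nn.Embedding(w.size(0) + 1, w.size(1))
    new.weight.data[:-1] = w
    new.weight.data[-1] = w.mean(dim=0)
    return new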


PiotrCzapla commented on August 14, 2024

Yeah, in their case they also pre-train the segment embeddings for sentence A and sentence B with their next sentence prediction task

Nothing stops us from doing this as an intermediate step; that way <sep> would also be learned. But that's for later.


NirantK commented on August 14, 2024

Context: I am trying to use train_clas to fine-tune and classify for XNLI-English. I am using read_xnli from fastai_contrib.

I am able to fine-tune the WikiText-103 LM from fastai for 2 epochs with the following:

$python ulmfit/train_clas.py --data_dir=./data --dataset=xnli --bs=20
...
2      4.168341    4.187915    0.326962
Starting classifier training
epoch  train_loss  valid_loss  accuracy
... ... ... ...

At the classifier step (when trying to validate?), at the end of the first epoch:

Traceback (most recent call last):
  File "ulmfit/train_clas.py", line 162, in <module>
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "ulmfit/train_clas.py", line 145, in new_train_clas

  File "/home/nirant/fastai/fastai/train.py", line 20, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 162, in fit
    callbacks=self.callbacks+callbacks)
  File "/home/nirant/fastai/fastai/basic_train.py", line 94, in fit
    raise e
  File "/home/nirant/fastai/fastai/basic_train.py", line 89, in fit
    cb_handler=cb_handler, pbar=pbar)
  File "/home/nirant/fastai/fastai/basic_train.py", line 49, in validate
    for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 65, in __iter__
    for i,o in enumerate(self._gen):
  File "/home/nirant/fastai/fastai/basic_data.py", line 47, in __iter__
    for b in self.dl:
  File "/home/nirant/anaconda3/envs/ulmfit/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/nirant/fastai/fastai/text/data.py", line 92, in pad_collate
    return res, tensor([s[1] for s in samples])
  File "/home/nirant/fastai/fastai/torch_core.py", line 68, in tensor
    return torch.tensor(x) if is_listy(x) else as_tensor(x)
RuntimeError: Could not infer dtype of NoneType

This happens at fit_one_cycle for classification. Is this because of a vocabulary mismatch? Or because we do not automatically update from the 2 predicted classes in IMDb to the 3 classes in XNLI?
What other possible points of error should I check?


sebastianruder commented on August 14, 2024

Another source of error might be that we're not mapping the XNLI labels to int at the moment.
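A minimal sketch (assuming the standard XNLI label strings; the column name is hypothetical) of that mapping on a pandas DataFrame; an unmapped label would otherwise yield None and trigger the "Could not infer dtype of NoneType" error above:

LABEL_MAP = {'entailment': 0, 'neutral': 1, 'contradiction': 2}

def encode_labels(df, col='gold_label'):
    """Map XNLI string labels to ints, dropping rows whose label
    is not one of the three expected classes."""
    df = df[df[col].isin(LABEL_MAP)].copy()
    df[col] = df[col].map(LABEL_MAP).astype(int)
    return df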


NirantK commented on August 14, 2024

Would that not fail on the first iteration of the classifier if our mapping is not accepted by fit_one_cycle?


PiotrCzapla commented on August 14, 2024

Guys, can anyone have a look at the refactoring branch and try to implement a working XNLI setup that respects the recent changes in fastai, so we get a baseline that we can later try to improve upon?


PiotrCzapla commented on August 14, 2024

Let me close this for now, as it is unlikely we will play with XNLI anytime soon.

