instadeepai / tunbert

Stars: 104 | Watchers: 9 | Forks: 35 | Size: 169 KB

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect, trained on a Tunisian Common-Crawl-based dataset. TunBERT was applied to three downstream NLP tasks: Sentiment Analysis (SA), Tunisian Dialect Identification (TDI), and Reading Comprehension Question-Answering (RCQA).

License: MIT License

Python 98.37% Shell 1.63%
nlp bert-models question-answering sentiment-analysis dialect-identification

tunbert's People

Contributors

ak-instadeep, ferchichinourchene, w3st3ry

tunbert's Issues

Pretraining data sharing

Hello, thank you for sharing the models.

I was wondering if you could share details about the pre-training data. Would it be possible to share the pre-training data itself?

Thank you in advance.

Links to pre-trained weights are broken

Hello! I'd love to use your model, but the links to the pre-trained weights (both PyTorch and TensorFlow) are showing the following error:

<Error>
  <Code>ProjectNotFound</Code>
  <Message>The requested project was not found.</Message>
  <Details>The requested project was not found.</Details>
</Error>

If you'd like, I can help you get the weights up on the Hugging Face Hub. We have documentation on how to do so, but I'm more than happy to help 😄

cc: @osanseviero

The prediction function skips a row when outputting "test_results.tsv" results

After training the model with finetuning_sa_tdid.sh and then running predictions on test.tsv, I noticed that the output written to test_results.tsv does not have the same number of rows as the test file.

For example, the tunbert/dev-data/sentiment_analysis_tsac/test.tsv file that comes with your repo has 5 sentences/rows. When I run the predictions, tunbert/finetuning_tsac/test_results.tsv contains only 4 sentences/rows.

Update: After spending more time tinkering with the model, it turns out that I had missed a minor issue in how I was handling headers. Sorry for the inconvenience and the false issue. Great work, I really enjoyed it ^^
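
For anyone who runs into the same mismatch, a quick way to check whether a header line explains it is to count rows with and without the header. This is only a sketch: the helper names `count_rows` and `compare_row_counts` are hypothetical, and whether either file carries a header line is an assumption you have to supply yourself, since the issue does not say how the repo's scripts treat the first line.

```python
import csv


def count_rows(path, has_header=False):
    """Count non-empty TSV rows, optionally excluding one header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps stray quote characters in sentences from merging rows.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]  # blank lines parse as []
    if has_header and rows:
        return len(rows) - 1
    return len(rows)


def compare_row_counts(test_path, results_path, test_has_header, results_has_header):
    """Return (test_rows, result_rows, difference) for the two files."""
    test_rows = count_rows(test_path, test_has_header)
    result_rows = count_rows(results_path, results_has_header)
    return test_rows, result_rows, test_rows - result_rows
```

Calling `compare_row_counts("test.tsv", "test_results.tsv", True, False)` and then again with both flags set to `False` shows whether the one-row difference disappears once the header is accounted for.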

ModuleNotFoundError: No module named 'nemo'

While running the following command to fine-tune TunBERT-PyTorch on the SA task, I got the error below:

    python models/bert-nvidia/bert_finetuning_SA_DC.py \
        --config-name "sentiment_analysis_config" \
        model.language_model.lm_checkpoint="/path/to/checkpoints/PretrainingBERTFromText--end.ckpt" \
        model.train_ds.file_path="/path/to/train.tsv" \
        model.validation_ds.file_path="/path/to/valid.tsv" \
        model.test_ds.file_path="/path/to/test.tsv"

[image: screenshot of the error]
I tried to install NeMo several times, but the error still persists.
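
One thing worth ruling out, assuming the cause is an environment mismatch (the issue does not confirm this), is that NeMo was installed into a different Python environment than the one running the script. The sketch below uses only the standard library to report whether a module can be found by the current interpreter; `module_status` and `describe` are hypothetical helper names. As far as I know, NVIDIA NeMo is published on PyPI as `nemo_toolkit` while being imported as `nemo`, so installing a package named `nemo` may not provide the expected module.

```python
import importlib.util
import sys


def module_status(name):
    """Return (found, location) for a top-level module in this interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return False, None
    # origin can be None for namespace packages; the module is still importable.
    return True, spec.origin


def describe(name):
    """Build a one-line report naming the interpreter that was checked."""
    found, location = module_status(name)
    if found:
        return f"{name} found at {location}"
    return f"{name} not importable from {sys.executable}"
```

Running `print(describe("nemo"))` with the same `python` executable used for the fine-tuning command shows which interpreter is being checked and whether it can see the module.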

Number of Steps/Epochs, and Runtime

Could you please provide the number of steps/epochs and the total run time taken to train your PyTorch model? Also, do you have a published paper or preprint by any chance? Thanks in advance.
