aub-mind / arabic-empathetic-chatbot

Seq2Seq-based open domain empathetic conversational model for Arabic: Dataset & Model

Python 14.54% Jupyter Notebook 85.46%

arabic-empathetic-chatbot's Issues

cannot reproduce the test result

Thanks for your great work!

I am having difficulty reproducing the BLEU score on the test set for the BERT2BERT model, which should be 5.58 according to your paper.

Here is what I have done:

I downloaded your pretrained BERT2BERT model from Hugging Face and ran prediction on the test set. I also tried tuning top_k, top_p, and other generation arguments, following your instructions. The highest BLEU score I could get on the test set is 0.91.

I also tried training the model from scratch using the train-bert2bert.ipynb file; however, the highest BLEU score I could get was less than 2.

Both results are far from 5.58. Could you please share the prediction code, as well as the values of all arguments and hyperparameters (e.g. top_k, top_p, temperature, length_penalty...) needed to reproduce your test result?

Thanks a lot!

Google Colab

Can you please tell me the steps to run the code on Colab?

Error in BLEU calculation

Hi,

Thanks for sharing the code and models for your great work!!!

I tried to reproduce the BLEU scores using your code but could not get the 5% BLEU score you reported in the paper.
Based on your code and experiments we ran internally, it seems that you calculate the BLEU score on the segmented text (with Farasa segmentation). This gives a BLEU score roughly 2.5 times larger than it should be. After extensive hyperparameter tuning (more than 40 experiments), we found that the best BLEU score is around 1.6-1.9.

The bug is in the compute_metrics method:

def compute_metrics(pred):
  labels_ids = pred.label_ids
  #pred_ids = torch.argmax(pred.predictions,dim=2)
  pred_ids = pred.predictions  

  # all unnecessary tokens are removed
  pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
  labels_ids[labels_ids == -100] = tokenizer.pad_token_id
  label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

  #########################################################
  ### Add these lines to calculate the BLEU score correctly
  #########################################################
  def clean(response):
    response = response.replace("\'", '')
    response = response.replace("[[CLS]", '')
    response = response.replace("[SEP]]", '')
    return str(arabert_prep.desegment(response)) # this is the most important one

  # Desegment both predictions and references before scoring
  pred_str = [clean(s) for s in pred_str]
  label_str = [clean(s) for s in label_str]
  #########################################################

  return {"bleu": round(corpus_bleu(pred_str , [label_str]).score, 4)}

Here is an example of how BLEU scores differ between the segmented and desegmented versions of the same text:

t1 = "أنا متأكد من أنها ستكون بخير."
ppt1 = arabert_prep.preprocess(t1)
print(ppt1) # أنا متأكد من أن +ها س+ تكون ب+ خير .

t2 = "أنا متأكد ستكون بخير."
ppt2 = arabert_prep.preprocess(t2)
print(ppt2) # أنا متأكد س+ تكون ب+ خير .


print("Wrong BLEU on segmented text", sacrebleu.sentence_bleu(ppt2, [ppt1]).score) # 51.51425457345961
print("Correct BLEU on unsegmented text", sacrebleu.sentence_bleu(t2, [t1]).score) # 33.51600230178196
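
For anyone who wants to see the inflation without installing sacrebleu or Farasa, below is a rough, dependency-free sketch. It is not the sacrebleu implementation: it only computes clipped n-gram precision, the quantity BLEU is built on, for the two segmented strings printed above. `naive_desegment` and `ngram_precision` are hypothetical helpers written for this illustration; `naive_desegment` simply glues `+` markers back together and is not the arabert `desegment` method.

```python
from collections import Counter

def naive_desegment(text):
    # Hypothetical stand-in for arabert's desegment: rejoin prefix pieces
    # ("س+ تكون") and suffix pieces ("أن +ها") into whole words.
    return text.replace("+ ", "").replace(" +", "")

def ngram_precision(hyp, ref, n):
    # Clipped n-gram precision over whitespace tokens.
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

# Segmented strings copied from the preprocess() output above.
ref_seg = "أنا متأكد من أن +ها س+ تكون ب+ خير ."
hyp_seg = "أنا متأكد س+ تكون ب+ خير ."

ref_plain = naive_desegment(ref_seg)
hyp_plain = naive_desegment(hyp_seg)

for n in (2, 3):
    print(n, "segmented:", ngram_precision(hyp_seg, ref_seg, n),
          "desegmented:", ngram_precision(hyp_plain, ref_plain, n))
```

Splitting each word into several pieces adds extra tokens that match trivially (for example `س+` followed by `تكون`), so the higher-order precisions come out larger on the segmented text even though the sentences are the same.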

Thanks,

Epoch Training Loss Validation Loss

First, I would like to thank you for the dataset; it helped me a lot in my thesis. Good work. Second, when I run the model on Colab, it crashes with the message "Your session crashed after using all available RAM".

I would like to know whether the model requires specific computational specs, since none are mentioned.

Best Regards.
