ictnlp / dialoflow

Code for ACL 2021 main conference paper "Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances".

License: MIT License

Python 99.87% Shell 0.13%

dialogue-systems dialogue-generation dialogue-evaluation dialogue-pretraining flow-score

dialoflow's People

lynxgsm dumpmemory chateval qroam soundarya98 sunminyu awaller12 gongel chenyangjun45 azizullah2017

dialoflow's Issues

train

Hello, I have some questions about running generation.py. Could you please provide the environment requirements for running the model?

No module named '_regex'

First of all, kudos for this nice work; I really liked it.
I am trying to reproduce the results of your paper. It would be very helpful if you could share the evaluation script for the automated metrics. The paper states that "We employ the evaluation scripts used by DialoGPT." Could you please point out which DialoGPT file you used for evaluation?

Question about constructing test set

First of all, thanks for your great work!
I'm trying to run the code and model on the DailyDialog dataset to better understand your work, but I can't figure out how to construct the model input from /data/test.json. Also, the file 'test.refs.txt', which appears in generate.py, is not provided in this repository.
I tried to construct the model input from /data/test.json myself, but I was confused because I couldn't find multiple references for every example.
Could the code used to preprocess /data/test.json be released?

ModuleNotFoundError: No module named '_regex'

How to reproduce

I reproduced DialoFlow-base on DailyDialog, and the evaluation results are:
NIST: [2.9148, 3.3919, 3.5077, 3.5375]
BLEU: [0.4535, 0.2323, 0.1367, 0.086]
METEOR: 0.1479778034868275
Entropy: [6.250107671407306, 8.663223223839859, 9.603956363262926, 9.959120587252972]
Distinct: [0.08599954617653732, 0.32188216456202917]
avg_len: 9.154005934718102

These results are lower than those reported in the paper.

Could you share the details of fine-tuning on DailyDialog?

My settings are:
Training:
batch_size 16 (4 GPUs, per_gpu_batch_size=4)
gradient_accumulation_steps 1
epoch 50

The best validation loss is 7.5168, at epoch 34.

Generation:
The config parameters are the defaults, and I set beam_size=5.

I did not use Apex.

Pre-trained model release date?

We are very interested in using DialoFlow in our research on chatbots and their influence on psychological well-being. Our experiments should start within 2-3 weeks. Will the pre-trained DialoFlow model be available by then?

logger/log.out Issue

Hello,
I followed the README instruction "bash fine-tune.sh" but got the error "fine-tune.sh: line 1: logger/log.out: No such file or directory".

How can I solve it? Thanks for your help!
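That message is what bash prints when the parent directory of a redirection target does not exist, which suggests fine-tune.sh writes its output to logger/log.out relative to the working directory and the logger folder is missing. A minimal sketch, assuming that is the cause: create the directory before launching the script. The helper name ensure_log_dir is hypothetical.

```python
import os


def ensure_log_dir(log_file: str) -> str:
    """Create the parent directory of a log file if it does not exist yet.

    Hypothetical helper: fine-tune.sh appears to redirect its output to
    logger/log.out, and the shell will not create a missing 'logger' folder.
    """
    log_dir = os.path.dirname(log_file) or "."
    os.makedirs(log_dir, exist_ok=True)  # no error if it already exists
    return log_dir
```

Calling ensure_log_dir("logger/log.out") from the directory that contains fine-tune.sh (or running mkdir -p logger there) before bash fine-tune.sh should give the redirection a place to write.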

About the Chinese version

Hello, when will the Chinese version be open-sourced?

Cannot access Google Drive

Hi, Dr. Li,
Thank you for your code.
However, I cannot access Google Drive for certain reasons.
Could you provide another location to download your pre-trained models, such as DialoFlow-base?
Thanks in advance for your reply.

No module named '_regex'

Hello,

I followed the README but got some errors.
I want to use Flow score for evaluation metric analysis.
When I run the code:

from flow_score import *
MODEL_PATH = "models/DialoFlow_large.bin"
FLOW_SCORE = FlowScore(MODEL_PATH)
dialogues = ["hello", "Hi there. tell me about yourself.", "Well I'm a college student who loves learning about the world around me!", "verry good !"]
flow_score = FLOW_SCORE.score(dialogues)

the following error occurs: ModuleNotFoundError: No module named '_regex'
Environment: python==3.7, torch==1.7.0, transformers==3.0.2, pickle==4.0
I don't know what happened. How should I fix the version or environment problem?
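_regex is most likely the compiled extension of the third-party regex package, which transformers' GPT-2 tokenizer imports, so this error usually points to a broken or mismatched regex installation rather than to DialoFlow itself. A minimal diagnostic sketch (the helper name find_modules is hypothetical) that reports which of the relevant modules the current interpreter can locate:

```python
import importlib.util
from typing import Dict, Iterable


def find_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each module can be located by the current interpreter.

    find_spec() on a dotted name imports the parent package first, so a
    missing or broken parent raises instead of returning None; that case
    is reported as False here.
    """
    status: Dict[str, bool] = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status
```

For example, print(find_modules(["regex", "_regex", "regex._regex"])) run inside the same environment shows whether the extension is visible under either layout. If regex is found but neither extension module is, reinstalling regex in that environment (for example pip install --force-reinstall regex) is a reasonable first thing to try.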

Getting 'nan' flow_score

I am trying to run the code:

from flow_score import *
MODEL_PATH = "models/DialoFlow_large.bin"
FLOW_SCORE = FlowScore(MODEL_PATH)
dialogues = ["hello", "Hi there. tell me about yourself.", "Well I'm a college student who loves learning about the world around me!"]
flow_score = FLOW_SCORE.score(dialogues)

I am using torch==1.7.1 and transformers==3.0.2.

The flow_score I get is 'nan', and I get many warnings when loading the model:

/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'model.DPKSModel' has changed. Saved a reverse patch to DPKSModel.patch. Run `patch -p0 < DPKSModel.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'transformers.modeling_gpt2.GPT2Model' has changed. Saved a reverse patch to GPT2Model.patch. Run `patch -p0 < GPT2Model.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.sparse.Embedding' has changed. Saved a reverse patch to Embedding.patch. Run `patch -p0 < Embedding.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. Saved a reverse patch to Dropout.patch. Run `patch -p0 < Dropout.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. Saved a reverse patch to ModuleList.patch. Run `patch -p0 < ModuleList.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.normalization.LayerNorm' has changed. Saved a reverse patch to LayerNorm.patch. Run `patch -p0 < LayerNorm.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'model.PlanModel' has changed. Saved a reverse patch to PlanModel.patch. Run `patch -p0 < PlanModel.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.loss.MSELoss' has changed. Saved a reverse patch to MSELoss.patch. Run `patch -p0 < MSELoss.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. Saved a reverse patch to Linear.patch. Run `patch -p0 < Linear.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.activation.Sigmoid' has changed. Saved a reverse patch to Sigmoid.patch. Run `patch -p0 < Sigmoid.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)
/DialoFlow/FlowScore/dialoflow_venv/lib/python3.9/site-packages/torch/serialization.py:658: SourceChangeWarning: source code of class 'torch.nn.modules.loss.CrossEntropyLoss' has changed. Saved a reverse patch to CrossEntropyLoss.patch. Run `patch -p0 < CrossEntropyLoss.patch` to revert your changes.
  warnings.warn(msg, SourceChangeWarning)

How can I fix this and get numerical scores? Could you share the requirements.txt? Maybe other packages are causing the issue.

Turn-level DSTC9 annotation data

I noticed that only dialogue-level DSTC9 data is used in your work. Has the turn-level DSTC9 annotation data been made public, and if so, how can I get it? Thank you very much!

About the data tokenizer

Hello, the input sequence in the paper is:
[u1] [C] [u2] [C] [res] [C]

but the sequence built by the dataset code is:
[speaker1] [u1] [eos] [speaker2] [u2] [eos] [bos] [res] [eos]

Is there any difference between the two? Which works better?
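For comparison, the layout the issue attributes to the dataset code can be sketched as follows. The special-token strings are copied from the issue and serve as placeholders; the repository's actual token names and speaker assignment may differ, and build_input_tokens is a hypothetical helper.

```python
from typing import List


def build_input_tokens(history: List[List[str]], response: List[str]) -> List[str]:
    """Flatten a dialogue into
    [speaker1] u1 [eos] [speaker2] u2 [eos] ... [bos] res [eos].

    Assumes the history utterances alternate between two speakers,
    starting with speaker 1, and that every utterance is already tokenized.
    """
    tokens: List[str] = []
    for i, utterance in enumerate(history):
        speaker = "[speaker1]" if i % 2 == 0 else "[speaker2]"
        tokens += [speaker, *utterance, "[eos]"]
    # The response is wrapped in [bos] ... [eos] after the history.
    tokens += ["[bos]", *response, "[eos]"]
    return tokens
```

Placed side by side, each [eos] sits where the paper's notation has [C], while the speaker and [bos] tokens have no counterpart in the paper's notation.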

config.json not found

I got the following error when running fine-tune.sh:

Traceback (most recent call last):
  File "D:\python3.7\lib\site-packages\transformers\configuration_utils.py", line 238, in get_config_dict
    local_files_only=local_files_only,
  File "D:\python3.7\lib\site-packages\transformers\file_utils.py", line 578, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file ../models/DialoFlow_base/config.json not found

Could you tell me where your model's config file is? I couldn't find it. Thanks.
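The traceback shows transformers' get_config_dict looking for config.json inside ../models/DialoFlow_base/, a path resolved relative to the directory the script is launched from. A minimal pre-flight check, assuming that directory layout; missing_model_files is a hypothetical helper and the list of required files is an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def missing_model_files(model_dir: str,
                        required: Iterable[str] = ("config.json",)) -> List[str]:
    """Return the required files that are absent from model_dir.

    Relative paths such as '../models/DialoFlow_base' are resolved against
    the current working directory, so the result depends on where the
    script is launched from.
    """
    base = Path(model_dir)
    return [name for name in required if not (base / name).is_file()]
```

Running missing_model_files("../models/DialoFlow_base") from the same directory as fine-tune.sh shows whether the problem is a missing file or a different working directory.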

Clarification on post-processing generated result

Hi. Kudos for this nice work. I am trying to reproduce the results on the DailyDialog dataset, and it would be very helpful if you could clarify the following details.
In Issue #13, you mentioned using "nltk.word_tokenize() to tokenize the sentence and then concatenate the tokens" to make the format of the generated dialogue the same as the reference response. I have two questions here:

Did you apply any post-processing to the reference files?
Did you try only nltk.word_tokenize(), or other tokenizers as well?

It would be very useful if you could briefly describe your post-processing steps.

Unable to reproduce?

I hope the code and README can be fixed, because many errors occur when running them again. Thank you very much.

Why doesn't PlanModel use a mask?

for i, block in enumerate(self.h):
    outputs = block(hidden_states)
    hidden_states, present = outputs[:2]

In this code, PlanModel does not use a mask, so each position in the sequence can attend to future context.
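For reference, a causal (lower-triangular) mask restricts position i to attend only to positions j <= i. The pure-Python sketch below shows that effect on attention weights; it is illustrative only and does not reproduce PlanModel. Whether PlanModel actually leaks future context depends on what the blocks in self.h are: if they are transformers' GPT-2 Block modules, GPT-2's attention layer applies its own lower-triangular mask internally, and the attention_mask argument is an additional additive mask typically used for padding.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax over attention scores with a causal mask.

    Position i may only attend to positions j <= i: scores for j > i are
    replaced by -inf, so they receive zero weight after the softmax.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        m = max(masked)  # finite, because position j == i is never masked
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With uniform scores over three positions, the first row puts all weight on position 0 and the last row spreads it evenly over all three; a large score on a future position has no effect.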

ModuleNotFoundError: No module named 'flow_score'

Hello,
I tried to follow the README, but I get a ModuleNotFoundError. I downloaded the model from Google Drive and put it in a custom model path. I don't see why this error would appear. Do you know where the issue stems from?
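flow_score.py lives in the repository's FlowScore/ folder (DialoFlow/FlowScore/flow_score.py), so from flow_score import * only works when that folder is on Python's module search path, for example when the script is run from inside FlowScore/. Where the model file is stored does not affect this error. A minimal sketch; import_from_directory is a hypothetical helper.

```python
import importlib
import sys
from types import ModuleType


def import_from_directory(directory: str, module_name: str) -> ModuleType:
    """Import module_name after putting directory at the front of sys.path.

    Hypothetical helper: for DialoFlow the directory would be the
    repository's FlowScore/ folder and the module name 'flow_score'.
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # notice files created after startup
    return importlib.import_module(module_name)
```

For example, flow_score = import_from_directory("/path/to/DialoFlow/FlowScore", "flow_score"), where the path is a placeholder for the local clone.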

code

Does "empty" in the code represent C in the paper?
What does the special token "info" mean?

When encoding, only the history is added. Is the response that follows it not used?

Can't load the model!

Greetings,

Actually, I'm surprised that such an error came up. My problem lies with this line:

model = torch.load("models/DialoFlow_large/model.bin")

model.bin is placed correctly, and the EC2 instance runs CUDA 11.2 and PyTorch 1.9.

Where could the problem come from?

Thanks in advance

About the automatic evaluation

First of all, thank you very much for your help. I have encountered a problem and hope you can answer it. I implemented the automatic evaluation metrics myself, but the results are quite different from those in the paper. Could you please release the implementation code of your evaluation metrics?
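Distinct-n is commonly defined as the number of unique n-grams divided by the total number of n-grams across all generated responses, and the entropy metric as the Shannon entropy of the n-gram frequency distribution. The sketch below implements those common definitions with whitespace tokenization and the natural logarithm; both choices are assumptions, and this is not the paper's or DialoGPT's evaluation script. Differences in tokenization or log base alone can shift these numbers noticeably.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def ngram_counts(sentences: Sequence[str], n: int) -> Counter:
    """Count whitespace-token n-grams over all sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def distinct_n(sentences: Sequence[str], n: int) -> float:
    """Unique n-grams divided by total n-grams (corpus level)."""
    counts = ngram_counts(sentences, n)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def entropy_n(sentences: Sequence[str], n: int) -> float:
    """Shannon entropy (natural log) of the n-gram frequency distribution."""
    counts = ngram_counts(sentences, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

For the two responses "a b" and "a c", distinct_n(..., 1) is 3 unique unigrams out of 4, that is 0.75, and distinct_n(..., 2) is 1.0.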

NaN when following the README instructions

Hi,

I followed "How to use?" in the README but got strange results:
flow_score is nan.

The only change I made concerns the .cuda() calls that put these tensors on the GPU, because my environment has no GPU.

DialoFlow/FlowScore/flow_score.py

Lines 99 to 101 in f4a69fd

        conv_seq = conv_seq.unsqueeze(0).cuda()
        sentence_index = sentence_index.unsqueeze(0).cuda()
        token_type_seq = token_type_seq.unsqueeze(0).cuda()

Environment: python==3.7 torch==1.7.0 transformers==3.0.2 regex==2017.4.5

Could you please provide some guidance on how to solve this?
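The quoted lines call .cuda() unconditionally, which cannot work on a machine without a GPU. A common device-agnostic pattern is to pick the device once and move tensors with .to(device); the sketch below applies it to the same three tensors. The helper name add_batch_dim is hypothetical, the model itself would also need to be on the same device (for example by loading it with torch.load(..., map_location=DEVICE)), and this change alone may not explain the NaN.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def add_batch_dim(*tensors: torch.Tensor, device: torch.device = DEVICE):
    """Add a leading batch dimension and move each tensor to `device`.

    Hypothetical replacement for the three `.unsqueeze(0).cuda()` calls.
    """
    return tuple(t.unsqueeze(0).to(device) for t in tensors)


# Usage inside the scoring code (names taken from the quoted lines):
# conv_seq, sentence_index, token_type_seq = add_batch_dim(
#     conv_seq, sentence_index, token_type_seq)
```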

model

When will the Chinese version be released?
