ufal-dsg / tgen

Statistical NLG for spoken dialogue systems

License: Other

Makefile 9.39% Python 70.73% Perl 19.84% Shell 0.05%

natural-language-generation dialogue seq2seq python tgen dialogue-systems seq2seq-generation computational-linguistics

tgen's Issues

Allow lexicalization using lexicalized DAs

Abstraction files shouldn't be the only way

ValueError: need more than 1 value to unpack

Hello,

I trained the sfx-restaurant experiment with the command:

./run_tgen.py seq2seq_train sfx-restaurant/config/seq2seq.py sfx-restaurant/input/data-das.txt sfx-restaurant/input/data-text.txt model.pickle.gz

and it succeeded.

The problem arose when running the model with:

./run_tgen.py seq2seq_gen -w out.txt model.pickle.gz sfx-restaurant/input/data-text.txt

I also tried without -w out.txt, but the outcome is identical:

Mon Jul 17 09:52:06 2017 INFO: Running on CPython version 2.7.12
Mon Jul 17 09:52:06 2017 INFO: Loading generator from model.pickle.gz...
Mon Jul 17 09:52:06 2017 INFO: Loading reranker from model.tftreecl.pickle.gz...
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Mon Jul 17 09:52:11 2017 INFO: Loading lexicalizer from model.lexic.pickle.gz...
Mon Jul 17 09:52:24 2017 INFO: Loading saved TF session from /home/gfrison/ws/tgen/model.tfsess...
Traceback (most recent call last):
  File "./run_tgen.py", line 593, in <module>
    seq2seq_gen(args)
  File "./run_tgen.py", line 466, in seq2seq_gen
    das = read_das(args.da_test_file)
  File "/home/gfrison/ws/tgen/tgen/futil.py", line 47, in read_das
    da = DA.parse(line.strip())
  File "/home/gfrison/ws/tgen/tgen/data.py", line 131, in parse
    da.append(DAI.parse(dai_text + ')'))
  File "/home/gfrison/ws/tgen/tgen/data.py", line 66, in parse
    da_type, svp = dai_text[:-1].split('(', 1)
ValueError: need more than 1 value to unpack

What did I do wrong?

Thank you,
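
The last frame of the traceback unpacks the result of `split('(', 1)` into two names. The sketch below reproduces only that statement to show when it fails: a line with no opening parenthesis, such as a plain sentence, splits into a single part. This may be what happens when a text file is passed in the position read as `args.da_test_file`. The function name `parse_dai` is a hypothetical stand-in, not tgen's actual code.

```python
def parse_dai(dai_text):
    """Hypothetical stand-in for the statement shown in the traceback."""
    # Drop the trailing ')' and split on the first '(' only.
    da_type, svp = dai_text[:-1].split('(', 1)
    return da_type, svp


# A dialogue-act item has the form type(slot=value), so the split gives two parts.
print(parse_dai("inform(name=X-name)"))

# A plain sentence has no '(', so the split gives one part and unpacking fails.
try:
    parse_dai("the name of the X-type is X-name .")
except ValueError as err:
    print("ValueError:", err)
```

On Python 2 the message reads "need more than 1 value to unpack"; Python 3 words the same error differently.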

E2E NLG Challenge

Hi @tuetschek,
The baseline outputs provided here are not aligned with the devset.
I think some random seed shuffles the data when the -m flag is passed to input/convert.py.
Could you provide devel-*.txt/sgm for the unshuffled devset so that we can compare our results against the baseline (or provide outputs.txt for the unshuffled data)?

Installation with PyTreex

Installation was found not to work properly; the culprit seems to be missing version info in PyTreex.

requirements.txt needs to include pandas and future

The convert.py scripts for the e2e-challenge data require the pandas and future libraries to be installed.

Python 3.7 does not work

import cPickle as pickle
ModuleNotFoundError: No module named 'cPickle'
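
The `cPickle` module exists only in Python 2; in Python 3 its functionality is part of `pickle`. A common compatibility pattern is sketched below. It addresses only this one import, so other Python 2 constructs in the code base may still fail under Python 3.

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle module
except ImportError:
    import pickle  # Python 3: cPickle was merged into pickle

# Round-trip a small object to confirm that the imported module works.
data = {"da": "inform(name=X-name)"}
restored = pickle.loads(pickle.dumps(data))
print(restored == data)
```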

Remove the num_hidden_units setting

It doesn't have any real usage anymore.

TensorFlow 0.7+ compatibility

The current version is apparently only compatible with 0.6

Regarding SFX restaurant data

As mentioned in the README, I downloaded the dataset from the provided link, but the data I obtained is not in the sfxrestaurant.emnlp.json format; it is provided as train+test+dev.json. How should we split the dataset and convert it into a usable form?

How to run seq2seq training?

I ran ./run_tgen.py seq2seq_train -d ./debug1 -w ./workdir -e 10 config train-das train-trees seq2seq-model
and got the error: IOError: [Errno 2] No such file or directory: 'config'
I installed PyTreex but not Treex.

Fix context-aware encoder with TF1.0.1

It looks like there are some bugs left.

Allow empty lines in training files

Empty lines or comments are probably reasonable.
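
A minimal sketch of such filtering follows. The helper name `read_nonempty_lines` and the `#` comment prefix are assumptions for illustration, not part of tgen.

```python
def read_nonempty_lines(lines, comment_prefix='#'):
    """Yield stripped lines, skipping blank lines and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment_prefix):
            continue
        yield line


# Any iterable of lines works here, including an open file object.
sample = ["inform(name=X-name)\n", "\n", "# a comment\n", "request(area)\n"]
print(list(read_nonempty_lines(sample)))
```

If the DA file and the text file are aligned line by line, the same lines would have to be skipped in both files, otherwise the pairs drift apart.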

Add warnings for using Treex

Morce::English is very hard to install

Mention the required Python version (3.7) in the README

TensorFlow v1 is not compatible with Python 3.8+.

sfx-restaurant step by step

Hello,
could you please give a short step-by-step tutorial on how to use sfx-restaurant?
I am not sure whether I did the preprocessing and training correctly, because I get the error:

NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint.
Key rerank-/rnn/embedding_wrapper/basic_lstm_cell/bias not found in checkpoint

Can this be caused by not having installed Treex?

Using TreeClassif as a stopping criterion

It does not seem to work, classifying everything as covering the current DA

Fix dependencies

Add requirements.txt, add kenlm

Control workers' RAM in parallel processing

4G is not big enough for larger data

Reduce ranker size in parallel processing

There is no way the ranker should have 900MB; why does it take ages to transfer?

How to run the program?

Hi. I'm trying to run the run_tgen.py script. For the candgen_train() function, there are these 3 parameters:
fname_da_train, fname_ttrees_train, fname_cand_model

Is bagel-data/input/toy-das.txt, which contains the dialogue acts, the correct file for fname_da_train?
And I guess the fname_ttrees_train refers to the deep-syntactic dependency trees from the input meaning representation. It seems like it needs a pickle or yaml file, but I'm not sure which is the exact file it is referring to.

Thanks!

Typo in Makefile

In line 64 of the Makefile in bagel-data/input, please fix
mv $$CV/train.0-bagel.txt > $$CV/train-bagel.txt ; \

to
mv $$CV/train.0-bagel.txt $$CV/train-bagel.txt ; \

About the generated text: how to tell which X is which slot

Example:
The generated text is:
Take X line X from X to X at X .
The DA is:
inform(line=X-line)&inform(direction=X-direction)&inform(from_stop=X-from_stop)&inform(departure_time=X-departure_time)&inform(vehicle=X-vehicle)

How can I tell which X corresponds to which slot?
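
As a starting point, a DA string in the format shown in this issue can be split into (type, slot, value) triples. The function `parse_da` below is a hypothetical sketch rather than tgen's own parser, and it assumes that values contain neither `&` nor `=`. It only lists the slots in the DA; it does not by itself decide which bare X in the text belongs to which slot.

```python
def parse_da(da_text):
    """Split a DA string such as 'inform(a=b)&inform(c=d)' into triples."""
    triples = []
    for item in da_text.strip().split('&'):
        # Everything before the first '(' is the DA type.
        da_type, _, rest = item.partition('(')
        # The remainder is 'slot=value)' or just ')' for items without a slot.
        slot, _, value = rest.rstrip(')').partition('=')
        triples.append((da_type, slot or None, value or None))
    return triples


da = ("inform(line=X-line)&inform(from_stop=X-from_stop)"
      "&inform(departure_time=X-departure_time)")
for da_type, slot, value in parse_da(da):
    print(da_type, slot, value)
```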

Allow lexicalizer with ensembles

Training ensembles does not support lexicalizer yet

Add more helpful help

There should be a simple tutorial, and the command itself should provide more information on how to proceed

Improve delexicalization

better fuzzy matching (~ allow 1 typo)
allow for repeated values
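
One possible reading of "allow 1 typo" is a match within edit distance one (a single insertion, deletion, or substitution). The sketch below illustrates that idea and returns every matching position, which also covers repeated values. The names `within_one_edit` and `find_fuzzy` are hypothetical and do not describe how tgen delexicalizes.

```python
def within_one_edit(a, b):
    """Return True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substituted character
    return a[i:] == b[i + 1:]  # exactly one extra character in b


def find_fuzzy(value, tokens):
    """Return indices of all tokens that match value with at most one typo."""
    value = value.lower()
    return [i for i, tok in enumerate(tokens) if within_one_edit(value, tok.lower())]


tokens = "the restarant is near the river".split()
print(find_fuzzy("restaurant", tokens))
```

For very short values this criterion is too permissive (for example, "in" and "is" are one edit apart), so a minimum length would probably be needed in practice.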

Use DA and Abst structures consistently

Modify the convert utils of all datasets + all code that uses DAs to use the newly created classes. Remove dependency on Alex.

Does not work without validation

Fails if you turn off validation data

Measuring slot error seems to be off with contexts + tokens?

Slot error is always 1:

Thu Mar 30 20:03:11 2017 INFO: Slot error: 1.000000 (M: 17, S: 0, T: 17)

checkpoint at every epoch

Hello,
I'm working on the E2E Challenge dataset and I'm wondering if there is a way to create a checkpoint of the system at every epoch during training. This way it will be possible to calculate the metrics after every epoch.

Thanks

How to produce output on the test set of the e2e-challenge dataset

Does not work with generating texts?

Sun Mar 26 17:09:38 2017 INFO: Loading saved TF session from model.tfsess...
Sun Mar 26 17:09:47 2017 INFO: Generating...
Sun Mar 26 17:13:09 2017 INFO: Generated tree 100
Sun Mar 26 17:16:07 2017 INFO: Generated tree 200
Sun Mar 26 17:19:08 2017 INFO: Generated tree 300
Sun Mar 26 17:21:12 2017 INFO: Slot error: 1.000000 (M: 691, S: 0, T: 691)
Sun Mar 26 17:21:12 2017 INFO: Writing output...
Traceback (most recent call last):
  File "../run_tgen.py", line 592, in <module>
    seq2seq_gen(args)
  File "../run_tgen.py", line 513, in seq2seq_gen
    postprocess_tokens(gen_toks, das)
  File "/home/gila/anaconda2/lib/python2.7/tgen-ACL2016/tgen/futil.py", line 249, in postprocess_tokens
    final_punct = '?' if da[0].da_type[0] == '?' else '.'  # '?' for '?request...'
AttributeError: 'list' object has no attribute 'da_type'

sfx-restaurant test output problem

Hello,
I'm trying to train and test your model on the sfx-restaurant dataset.
I'm following your short HOWTO to train the model and test it on development data.
Unfortunately, I'm not able to get lexicalized text in the output.txt file.
If I run ../run_tgen.py seq2seq_gen --eval-file data/devel-ref.txt --abstr-file data/devel-abst.txt --output-file output.txt model.pickle.gz data/devel-das.txt I get these results in the terminal:

The output file shows results like:

can i confirm you are seeking option near X-near ?
the name of the X-type is X-name .

I think I'm missing some stage in the training/test pipeline, but I don't know exactly what.
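
For illustration only, the X-slot placeholders in output like the above can be filled in by a simple substitution once slot values are known. The function `lexicalize` and the `values` dictionary are hypothetical and the slot values are made up; this is not tgen's lexicalizer, and it does not explain why the pipeline left the placeholders in.

```python
def lexicalize(text, values):
    """Replace X-slot tokens with values; unknown placeholders are left as-is."""
    out = []
    for tok in text.split():
        if tok.startswith('X-'):
            # Look up the slot name that follows the 'X-' prefix.
            out.append(values.get(tok[2:], tok))
        else:
            out.append(tok)
    return ' '.join(out)


values = {"type": "restaurant", "name": "Example Cafe"}  # made-up slot values
print(lexicalize("the name of the X-type is X-name .", values))
```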

In seq2seq_train mode, what is the function of rerank_classifier?

When I run this command:
./run_tgen.py seq2seq_train config-file.py train-das.txt train-text.txt model.pickle.gz
and then run this command to generate text:
./run_tgen.py seq2seq_gen [-w out-text.txt] model.pickle.gz test-das.txt

I only see the reranker being trained, but I don't see the seq2seq model using it in either training or generation.
Is that right? If not, could you explain the reranker's function?
Thanks.

Will a GPU speed up seq2seq in e2e-challenge?

Hi, I'm trying to run the e2e-challenge (in order to understand how it works) as described here https://github.com/UFAL-DSG/tgen/tree/master/e2e-challenge

Everything works, but of course it takes time.

I would like to know whether a configuration with an Nvidia GPU would improve performance (compared to a CPU-only configuration) and reduce the running time, or whether it would be useless. I'm a beginner; I read that LSTMs may not benefit from a GPU, which is why I'm asking.

Thank you!