atpaino / deep-text-corrector
Deep learning models trained to correct input errors in short, message-like text
License: Apache License 2.0
In the example at the end of the README, decode is called with test_path but not train_path. (That makes sense to me.)
However, in correct_text.py main, FLAGS.train_path is still required even for the code path that runs when FLAGS.decode is true.
Should I change the README, or correct_text.py?
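Whichever file ends up being changed, one option is to validate the flags up front so the requirement is explicit. The following is only a sketch: it uses argparse instead of the tf.app.flags mechanism the script uses, the function name parse_flags is hypothetical, and it assumes (as described above) that the data reader is built from train_path even when decoding.

```python
import argparse


def parse_flags(argv):
    """Parse a subset of the script's flags and fail early with clear messages."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_path")
    parser.add_argument("--test_path")
    parser.add_argument("--decode", action="store_true")
    flags = parser.parse_args(argv)
    # Assumption: the data reader is constructed from train_path on every
    # code path, so a missing value is reported here, not as a late file error.
    if flags.train_path is None:
        parser.error("--train_path is required, including with --decode")
    if flags.decode and flags.test_path is None:
        parser.error("--test_path is required with --decode")
    return flags
```

With this check, running with --decode but without --train_path exits immediately with a usage message.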
The code works in my local environment, but training was too slow, so I moved it to Google Colab. There I got 'Variable proj_w already exists, disallowed.' while the 4th code block was executing.
I searched and found that tf.get_variable is normally used inside a tf.variable_scope block, so I thought it might work if I changed tf.get_variable to tf.Variable, but it didn't. The error became:
ValueError Traceback (most recent call last)
in ()
----> 1 train(data_reader, train_path, val_path, model_path)
/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in train(data_reader, train_path, test_path, model_path)
145 "Creating %d layers of %d units." % (
146 config.num_layers, config.size))
--> 147 model = create_model(sess, False, model_path, config=config)
148
149 # Read data into buckets and compute their sizes.
/content/drive/My Drive/ColabNotebooks/grammarCorrection/correct_text.py in create_model(session, forward_only, model_path, config)
122 use_lstm=config.use_lstm,
123 forward_only=forward_only,
--> 124 config=config)
125 ckpt = tf.train.get_checkpoint_state(model_path)
126 if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
/content/drive/My Drive/ColabNotebooks/grammarCorrection/text_corrector_models.py in __init__(self, source_vocab_size, target_vocab_size, buckets, size, num_layers, max_gradient_norm, batch_size, learning_rate, learning_rate_decay_factor, use_lstm, num_samples, forward_only, config, corrective_tokens_mask)
108 if self.target_vocab_size > num_samples > 0:
109 # w = tf.get_variable("proj_w", [size, self.target_vocab_size])
--> 110 w = tf.Variable([size, self.target_vocab_size], 'proj_w')
111 w_t = tf.transpose(w)
112 # b = tf.get_variable("proj_b", [self.target_vocab_size])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1485 constraint=constraint,
1486 synchronization=synchronization,
-> 1487 aggregation=aggregation)
1488
1489
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
1235 constraint=constraint,
1236 synchronization=synchronization,
-> 1237 aggregation=aggregation)
1238
1239 def _get_partitioned_variable(self,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
538 constraint=constraint,
539 synchronization=synchronization,
--> 540 aggregation=aggregation)
541
542 def _get_partitioned_variable(self,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
490 constraint=constraint,
491 synchronization=synchronization,
--> 492 aggregation=aggregation)
493
494 # Set trainable value based on synchronization value.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
877 raise ValueError("Variable %s does not exist, or was not created with "
878 "tf.get_variable(). Did you mean to set "
--> 879 "reuse=tf.AUTO_REUSE in VarScope?" % name)
880
881 # Create the tensor to initialize the variable with default value.
ValueError: Variable proj_w does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
I'm still stuck on this error. Can anyone help?
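Two things may be going on here. First, re-running a notebook cell builds the model again in the same default graph, so the second tf.get_variable("proj_w", ...) call finds the name already taken; resetting the default graph before rebuilding avoids that. Second, tf.Variable takes an initial value as its first argument and trainable as its second, so tf.Variable([size, self.target_vocab_size], 'proj_w') creates a two-element vector, not a matrix of that shape named proj_w. The sketch below assumes a TensorFlow 1.x-style graph API, reached through tensorflow.compat.v1 so that it also runs on newer installs. The function name build_projection and the sizes are placeholders, not part of the repository.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use graph mode, as TF 1.x does by default


def build_projection(size, target_vocab_size):
    """Create proj_w and proj_b in a fresh default graph."""
    # Drop variables left over from an earlier run of the same notebook cell.
    tf.reset_default_graph()
    w = tf.get_variable("proj_w", [size, target_vocab_size])
    b = tf.get_variable("proj_b", [target_vocab_size])
    return w, b


w, b = build_projection(8, 20)
w, b = build_projection(8, 20)  # a second call does not raise "already exists"
```

In the script itself, the equivalent would be to call tf.reset_default_graph() before the session is created and before create_model runs, or simply to restart the Colab runtime.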
I get this error when I run the preprocessing Python script itself.
Hi atpaino,
I have run your project, but I cannot get the right result like the examples you give. My result looks like this:
Input: you must have girlfriend
Output: you must have
Could you help me analyze the reason for this?
Thanks a lot
(env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# ./predict.sh
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "correct_text.py", line 439, in <module>
tf.app.run()
File "/home/env/python3.5/env-0.12.0/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "correct_text.py", line 414, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 82, in __init__
dataset_copies=dataset_copies)
File "/home/github/deep-text-corrector/data_reader.py", line 32, in __init__
for tokens in self.read_tokens(train_path):
File "/home/github/deep-text-corrector/text_corrector_data_readers.py", line 114, in read_tokens
with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'train'
The run script:
python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode
Why does this happen? I did not pass --train_path, so FLAGS.train_path should be None.
I am getting this error:
In Seq2seq
828 top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
829 for e in encoder_outputs]
--> 830 attention_states = array_ops.concat(1, top_states)
Can someone provide me with a compiled, executable version of the project? I cannot run it myself because it fails with a module-not-found error for tensorflow, and I need the project urgently.
@atpaino This error occurred when running text_corrector_models.py
When I tried running
python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt --out_file preprocessed_movie_lines.txt
it gave me this error:
/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 24, in <module>
tf.app.run()
File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 18, in main
s = dialog_line.strip().lower().decode("utf-8", "ignore")
AttributeError: 'str' object has no attribute 'decode'
This is expected, since each line is already a string, but if I remove the decode call it doesn't work.
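In Python 3, decoding happens when a file is opened in text mode, so one way around both problems is to drop the per-line decode call and ask open() to ignore undecodable bytes instead. The sketch below covers only the strip/lower step visible in the traceback; the function name clean_lines is hypothetical, and the real preprocessing script may do more per line.

```python
import os
import tempfile


def clean_lines(raw_path):
    """Yield stripped, lower-cased lines; undecodable bytes are dropped."""
    # errors="ignore" plays the role of the old .decode("utf-8", "ignore").
    with open(raw_path, "r", encoding="utf-8", errors="ignore") as raw:
        for dialog_line in raw:
            yield dialog_line.strip().lower()


# Usage with a small temporary file that contains one invalid UTF-8 byte.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Hello W\xe9orld\nSecond LINE\n")
cleaned = list(clean_lines(tmp.name))
os.remove(tmp.name)
```

If removing decode alone led to a UnicodeDecodeError while reading, the errors="ignore" argument is the part that addresses it.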
Hi Alex, thanks for your great work! I tried executing your main notebook ("TextCorrector.ipynb"), but I keep getting this error message: AttributeError: module 'tensorflow.python.ops.rnn_cell' has no attribute '_linear'. I ran your code in Jupyter Notebook with Python 3.5 and tensorflow 1.2.1 (both the latest versions). I don't understand why it says the module lacks an attribute that your code needs. Could you please help explain why this happens, Alex?
Hello,
I have an issue
decoded = decode_sentence(sess, model, data_reader, "you must have girlfriend", corrective_tokens=corrective_tokens)
Input: you must have girlfriend
Output: you you you you you you you you you you
Does anyone have an idea, please?
Many thanks
When I run this command:
python correct_text.py --train_path /movie_dialog_train.txt \
                       --val_path /movie_dialog_val.txt \
                       --config DefaultMovieDialogConfig \
                       --data_reader_type MovieDialogReader \
                       --model_path /movie_dialog_model
this error shows up:
IOError: [Errno 2] No such file or directory: '/movie_dialog_train.txt'
Am I missing something here? I cannot find this text file in the Cornell corpus either. I'm trying to build a grammar checker for my project. Can anyone help me with this issue?
def __init__(self, config, train_path=None, token_to_id=None,
             dropout_prob=0.25, replacement_prob=0.25, dataset_copies=2):
    super(MovieDialogReader, self).__init__(
        config, train_path=train_path, token_to_id=token_to_id,
        special_tokens=[
            PAD_TOKEN, GO_TOKEN, EOS_TOKEN,
            MovieDialogReader.UNKNOWN_TOKEN],
        dataset_copies=dataset_copies)
    self.dropout_prob = dropout_prob
    self.replacement_prob = replacement_prob
    self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]
# The last line gives an error.
# I don't understand where UNKNOWN_ID is coming from and what token_to_id actually is.
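As a rough illustration of the two names: token_to_id is a dictionary from token strings to integer ids, and UNKNOWN_ID is the id stored for the unknown-word token, used as a fallback for words outside the vocabulary. The sketch below shows how such a mapping is commonly used in sequence-to-sequence data readers. The vocabulary contents and the encode helper are made up for the example, and the repository's reader may differ in detail.

```python
UNKNOWN_TOKEN = "UNK"

# Hypothetical vocabulary: special tokens first, then words from the training data.
token_to_id = {"PAD": 0, "GO": 1, "EOS": 2, UNKNOWN_TOKEN: 3,
               "you": 4, "must": 5, "have": 6}
UNKNOWN_ID = token_to_id[UNKNOWN_TOKEN]


def encode(tokens):
    """Map tokens to ids, falling back to UNKNOWN_ID for unseen words."""
    return [token_to_id.get(token, UNKNOWN_ID) for token in tokens]


ids = encode(["you", "must", "have", "girlfriend"])
```

The lookup on the last line of the constructor can only succeed if the unknown token was actually inserted into token_to_id when the vocabulary was built.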
I tried to play with TextCorrector.ipynb but it doesn't work.
After line
from text_correcter_data_readers import PTBDataReader, MovieDialogReader
I got the following error:
ModuleNotFoundError: No module named 'text_correcter_data_readers'
I tried to fix it by adding a path:
import sys
sys.path.append('C:\\my_path\\deep-text-corrector-master')
And by adding an empty __init__.py file in the 'deep-text-corrector-master' directory.
But it didn't help either.
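Other tracebacks in this list show the file as text_corrector_data_readers.py (with "corrector"), so the import name spelled "correcter" may simply not match any file, in which case adding paths would not help. A quick way to check which spelling Python can find, sketched with a hypothetical helper name:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


# The standard-library json module is used here only as a known-good example.
json_found = module_available("json")
misspelled_found = module_available("text_correcter_data_readers")
```

After the sys.path.append call, module_available("text_corrector_data_readers") should report whether the corrected spelling resolves.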
I have the same problem as here
I changed line 46 to self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])
But still got the error:
44 full_token_and_id = zip(vocabulary, range(len(vocabulary)))
45 self.full_token_to_id = dict(full_token_and_id)
---> 46 self.token_to_id = dict((k, self.full_token_to_id[k]) for k in list(self.full_token_to_id.keys())[:max_vocabulary_size])
47
48 self.id_to_token = {v: k for k, v in self.token_to_id.items()}
TypeError: 'zip' object is not subscriptable
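In Python 3, zip returns a one-shot iterator that supports neither slicing nor a second pass, so the pairs need to be materialized once before they are used twice. A minimal sketch of lines 44 to 46 rewritten that way follows. The wrapper function build_vocab and the sample tokens are placeholders (only 'UNK' appears in this thread).

```python
def build_vocab(vocabulary, max_vocabulary_size):
    """Return (full_token_to_id, token_to_id) from an ordered token list."""
    # list(...) turns the zip iterator into a list that can be reused and sliced.
    full_token_and_id = list(zip(vocabulary, range(len(vocabulary))))
    full_token_to_id = dict(full_token_and_id)
    token_to_id = dict(full_token_and_id[:max_vocabulary_size])
    return full_token_to_id, token_to_id


full, truncated = build_vocab(["PAD", "GO", "EOS", "UNK", "the", "a"], 5)
```

Because the list is created before the first dict call, the truncated vocabulary keeps the first max_vocabulary_size entries in their original order.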
How do I create cleaned_dialog_val.txt, cleaned_dialog_test.txt, and this model: dialog_correcter_model_testnltk?
Hi, I like your model, but I want to know how to train it on customized words, like:
U.S..S.A -> U.S.A
When I run your project, this error occurs. How can I solve this problem?
Traceback (most recent call last):
File "correct_text.py", line 438, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "correct_text.py", line 413, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/opt/yangzhanku/correct_text/deep-text-corrector-master/text_corrector_data_readers.py", line 88, in __init__
self.UNKNOWN_ID = self.token_to_id[MovieDialogReader.UNKNOWN_TOKEN]
KeyError: 'UNK'
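One possible cause, which this traceback alone does not confirm: under Python 3, if data_reader.py was patched so that the zip object is wrapped in list() only after dict(full_token_and_id) has already consumed it, the second pass sees an exhausted iterator, token_to_id ends up empty, and the 'UNK' lookup fails. The snippet below reproduces that effect with placeholder tokens.

```python
vocabulary = ["UNK", "hello", "world"]  # placeholder tokens
pairs = zip(vocabulary, range(len(vocabulary)))

full_token_to_id = dict(pairs)       # consumes the iterator
token_to_id = dict(list(pairs)[:2])  # pairs is exhausted, so this is {}

try:
    token_to_id["UNK"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Other causes, such as an empty or unreadable training file, would need to be ruled out separately.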
I have run it for 30K steps, but I am not getting a corrected output. I get the same output as what's fed in as input.
Input : this is table
Output : this is table
I am expecting it to insert the article and give me "this is a table"
How many more steps should I run it for?
I am getting this error when I try to run data_reader.
"TypeError: 'zip' object is not subscriptable"
When I tried running
python correct_text.py --train_path /movie_dialog_train.txt \
                       --val_path /movie_dialog_val.txt \
                       --config DefaultMovieDialogConfig \
                       --data_reader_type MovieDialogReader \
                       --model_path /movie_dialog_model
it gave me this error:
File "correct_text.py", line 438, in <module>
tf.app.run()
File "/home/abhinavsingh/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "correct_text.py", line 413, in main
data_reader = MovieDialogReader(config, FLAGS.train_path)
File "/home/abhinavsingh/deep-text-corrector-master/text_corrector_data_readers.py", line 82, in __init__
dataset_copies=dataset_copies)
File "/home/abhinavsingh/deep-text-corrector-master/data_reader.py", line 46, in __init__
self.token_to_id = dict(full_token_and_id[:max_vocabulary_size])
TypeError: 'zip' object is not subscriptable
(env-0.12.0) root@op-System-Product-Name:/home/github/deep-text-corrector# cat predict.sh
python correct_text.py --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode
May I run "python correct_text.py --train_path ./movie_dialog_train.txt --test_path ./test.txt --config DefaultMovieDialogConfig --data_reader_type MovieDialogReader --model_path ./movie_dialog_model --decode" instead?
That is, should I add --train_path ./movie_dialog_train.txt?
I trained the model as specified in the readme but cannot replicate the results. The following is what I get.
Input: you must have girlfriend
Output: than than than than than than than than than than
Is this because of the training/dataset?
When I try to run python preprocessors/preprocess_movie_dialogs.py --raw_data movie_lines.txt
it gives me a file-not-found error:
Traceback (most recent call last):
File "preprocessors/preprocess_movie_dialogs.py", line 23, in <module>
tf.app.run()
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\Sarve\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "preprocessors/preprocess_movie_dialogs.py", line 14, in main
open(FLAGS.out_file, "w") as out:
FileNotFoundError: [Errno 2] No such file or directory: ''
Can you explain to me how to run this?
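The command above passes --raw_data but no --out_file, and the traceback ends in open(FLAGS.out_file, "w") with an empty path, which suggests the flag fell back to an empty string. Passing --out_file with a file name, as in the other reports in this list, is the likely fix. The sketch below reproduces the error and shows a guard that would report it more clearly. The helper name open_output is hypothetical.

```python
def open_output(out_file):
    """Open the output file, rejecting an empty path with a clear message."""
    if not out_file:
        raise ValueError("out_file is empty; pass --out_file <path>")
    return open(out_file, "w")


# Opening an empty path reproduces the error class from the traceback.
try:
    open("", "w")
    empty_path_error = None
except FileNotFoundError as err:
    empty_path_error = err
```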
Can someone tell me how to run this project?
I noticed that the code lowercases text during preprocessing.
https://github.com/atpaino/deep-text-corrector/search?utf8=%E2%9C%93&q=lower%28%29&type=
Because of this:
Did you first try it without lowercasing and run into problems?
(My instinct would be to avoid canonicalisation, and fight the out-of-dataset tokens with data.)
Hi
I tried running this code with multiple tensorflow versions (1.13, 1.1, 0.12), but it keeps giving one error or another, specifically related to rnn_cell (cannot import name rnn_cell). Even if I resolve that by using the contrib package, I keep getting subsequent errors.
Can someone please tell me which version of tensorflow this code works with without any errors?
Also, does it work with a specific version of python as well?
Thanks
Aayushee