nusnlp / crosentgec
Code for cross-sentence grammatical error correction using multilayer convolutional seq2seq models (ACL 2019)
License: GNU General Public License v3.0
The language model is too big and I would prefer not to use it. What adjustments do I need to make to the weight file?
Hi Shamil Chollampatt, Weiqi Wang, and Hwee Tou Ng,
I'm referring to the CrossSent paper and trying out this approach. The code ran perfectly on a smaller dataset, but when I increased the dataset size tenfold I got the following error:
pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
File "fairseq/train.py", line 352, in <module>
multiprocessing_main(args)
File "fairseq/multiprocessing_train.py", line 40, in main
p.join()
File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
res = self._popen.wait(timeout)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
File "fairseq/multiprocessing_train.py", line 82, in signal_handler
raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --
Traceback (most recent call last):
File "fairseq/multiprocessing_train.py", line 46, in run
single_process_main(args)
File "fairseq/train.py", line 87, in main
train(args, trainer, task, epoch_itr)
File "fairseq/train.py", line 125, in train
log_output = trainer.train_step(sample, update_params=True)
File "fairseq/fairseq/trainer.py", line 117, in train_step
loss, sample_size, logging_output, oom_fwd = self._forward(sample)
File "fairseq/fairseq/trainer.py", line 205, in _forward
raise e
File "fairseq/fairseq/trainer.py", line 197, in _forward
loss, sample_size, logging_output_ = self.task.get_loss(self.model, self.criterion, sample)
File "fairseq/fairseq/tasks/fairseq_task.py", line 49, in get_loss
return criterion(model, sample)
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
net_output = model(**sample['net_input'])
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/models/fairseq_model.py", line 146, in forward
auxencoder_out = self.auxencoder(ctx_tokens, ctx_lengths)
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 193, in forward
if not encoder_padding_mask.any():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317
As suggested in several GitHub issues and PyTorch forum threads, I reran the code with CUDA_LAUNCH_BLOCKING=1, and the following is the error log:
Traceback (most recent call last):
File "/fairseq/train.py", line 352, in <module>
multiprocessing_main(args)
File "/fairseq/multiprocessing_train.py", line 40, in main
p.join()
File "/opt/anaconda/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/fairseq/multiprocessing_train.py", line 82, in signal_handler
raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --
Traceback (most recent call last):
File "/fairseq/multiprocessing_train.py", line 46, in run
single_process_main(args)
File "/fairseq/train.py", line 35, in main
load_dataset_splits(args, task, ['train', 'valid'])
File "/fairseq/train.py", line 333, in load_dataset_splits
task.load_dataset(split_k)
File "/fairseq/fairseq/tasks/translation_ctx.py", line 105, in load_dataset
ctx_dataset = indexed_dataset(prefix + 'ctx', self.ctx_dict)
File "/fairseq/fairseq/tasks/translation_ctx.py", line 98, in indexed_dataset
return IndexedRawTextDataset(path, dictionary)
File "/fairseq/fairseq/data/indexed_dataset.py", line 130, in __init__
self.read_data(path, dictionary)
File "/fairseq/fairseq/data/indexed_dataset.py", line 136, in read_data
self.lines.append(line.strip('\n'))
MemoryError
In my understanding, if memory were the constraint, CUDA should throw an out-of-memory error rather than this one. From my reading, cuda runtime error (59) : device-side assert triggered is caused by an out-of-bounds index or a faulty loss function. That shouldn't be the case here, because the same code runs smoothly on the smaller dataset and fails only on the large one. Hence I'm asking here for further help.
Is there anything I need to check to resolve this issue, or something I am missing?
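The `srcIndex < srcSelectDimSize` assertion in the first log is raised by an index-select lookup (an embedding lookup is one example) when an index is not smaller than the size of the table being indexed. A minimal CPU-side sketch for locating such indices before training, assuming the token IDs are available as lists of integers; `find_out_of_range` and its arguments are hypothetical names, not part of fairseq:

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range(
    sentences: Iterable[Sequence[int]], table_size: int
) -> List[Tuple[int, int, int]]:
    """Return (sentence_no, position, index) for every index outside [0, table_size)."""
    bad = []
    for sent_no, sent in enumerate(sentences):
        for pos, idx in enumerate(sent):
            if idx < 0 or idx >= table_size:
                bad.append((sent_no, pos, idx))
    return bad


if __name__ == "__main__":
    # Toy data: a table with 5 rows accepts indices 0..4 only.
    batches = [[0, 1, 4], [2, 5, 3], [4, 4, 7]]
    for sent_no, pos, idx in find_out_of_range(batches, table_size=5):
        print(f"sentence {sent_no}, position {pos}: index {idx} is out of range")
```

If a check like this flags indices only in the larger dataset, it may be worth comparing the dictionaries used to encode the data with the ones the model was built from, and the longest sentence against the model's maximum number of positions. Both are possibilities to verify, not a confirmed diagnosis.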
Installation method (pip, source): pip
Any help, support, or direction would be highly appreciated.
Thanks
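The second log ends in a `MemoryError` while `read_data` appends every line of the context file to a Python list, which suggests that host RAM, rather than GPU memory, ran out at that point. One general way to reduce that footprint is to store only byte offsets and read lines on demand. A minimal sketch of that idea, assuming a UTF-8 text file with one sentence per line; `LazyLineFile` is a hypothetical class and not a drop-in replacement for fairseq's `IndexedRawTextDataset`:

```python
import os
import tempfile
from array import array


class LazyLineFile:
    """Random access to the lines of a text file without holding them all in memory."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Keep only the byte offset of each line start (8 bytes per line).
        self.offsets = array("q")
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < len(self.offsets):
            raise IndexError(i)
        # Reopening per access keeps the sketch simple; a real loader
        # would probably keep the file handle open.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return f.readline().decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(
        "w", suffix=".ctx", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("first context\nsecond context\nthird\n")
    lines = LazyLineFile(tmp.name)
    print(len(lines), lines[1])
    os.remove(tmp.name)
```

The trade-off is extra disk reads per sample in exchange for memory use that grows with the number of lines rather than with the total text size.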
Do we need to install CUDA? Does version 10 work well with the trained model?
Thanks.
Hi there,
When training the Cross Sentence GEC model on 1 GPU, it proceeds successfully. As soon as I try to use more than 1 GPU, I get the following error:
Traceback (most recent call last):
File "/usr/lib/python3.6/pdb.py", line 1667, in main
pdb._runscript(mainpyfile)
File "/usr/lib/python3.6/pdb.py", line 1548, in _runscript
self.run(statement)
File "/usr/lib/python3.6/bdb.py", line 434, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/mnt/efs/crossSentGEC/fairseq/train.py", line 9, in <module>
import collections
File "/mnt/efs/crossSentGEC/fairseq/multiprocessing_train.py", line 37, in main
procs[i].start()
File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/lib/python3.6/multiprocessing/spawn.py", line 172, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
I am pretty sure that the underlying fairseq and Python multiprocessing modules are operational as I ran some sanity checks.
Any suggestions on how to solve or bypass this error, and on how to run training on multiple GPUs?
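The traceback above passes through `pdb.py`, and the failing line reads the `__spec__` attribute of the `__main__` module while preparing a spawned worker. When a script is started under pdb on this Python version, `__main__` can lack that attribute. A commonly suggested workaround is to define it before any process is started. The sketch below assumes that this is the cause here; `ensure_main_spec` is a hypothetical helper, not part of fairseq. Running the same command without the debugger is the other obvious thing to try.

```python
import sys
import types
from typing import Optional


def ensure_main_spec(module: Optional[types.ModuleType] = None) -> bool:
    """Give `module` (default: __main__) a `__spec__` attribute if it has none.

    Returns True if the attribute had to be added.
    """
    if module is None:
        module = sys.modules["__main__"]
    if hasattr(module, "__spec__"):
        return False
    # None is also the value a script has when run directly as `python script.py`.
    module.__spec__ = None
    return True


if __name__ == "__main__":
    # Call this before creating any multiprocessing.Process objects.
    added = ensure_main_spec()
    print("added __spec__" if added else "__spec__ already present")
```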
Thank you all!
./decode.sh conll13st-test models/bpe/mlconvgec_aaai18_bpe.model models/dicts
This is the error
Traceback (most recent call last):
File "fairseq/interactive_multi.py", line 195, in <module>
main(args)
File "fairseq/interactive_multi.py", line 102, in main
models, model_args = utils.load_ensemble_for_inference(model_paths, task)
File "/home/liferay172/Documents/SundeepPidugu/crosentgec/fairseq/fairseq/utils.py", line 153, in load_ensemble_for_inference
state = torch.load(filename, map_location=lambda s, l: default_restore_location(s, 'cpu'))
File "/home/liferay172/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/liferay172/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 603, in _load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK
Please let me know whether I am passing the parameters correctly. An example would be much appreciated.
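`could not find MARK` is raised by the unpickler inside `torch.load`, which usually indicates that the file being loaded is not a serialized checkpoint (a different kind of file, or an incomplete download, are two possibilities). Note that the command above passes a BPE model file as the second argument, whereas the other decoding report on this page passes a directory containing `checkpoint_best.pt` in that position. A quick way to sanity-check a file before loading is to inspect its first bytes; `looks_like_checkpoint` is a hypothetical helper, and the heuristic is an assumption rather than a guarantee:

```python
import os
import pickle
import tempfile


def looks_like_checkpoint(path: str) -> bool:
    """Heuristic: does the file start like a pickle (protocol >= 2) or a zip archive?"""
    with open(path, "rb") as f:
        head = f.read(4)
    is_pickle = head[:1] == b"\x80"   # PROTO opcode that opens protocol >= 2 pickles
    is_zip = head == b"PK\x03\x04"    # zip local file header (used by recent PyTorch)
    return is_pickle or is_zip


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    good = os.path.join(tmpdir, "good.pt")
    bad = os.path.join(tmpdir, "bad.model")
    with open(good, "wb") as f:
        pickle.dump({"args": None}, f, protocol=2)
    with open(bad, "w", encoding="utf-8") as f:
        f.write("plain text, not a checkpoint\n")
    print(looks_like_checkpoint(good), looks_like_checkpoint(bad))
```

A file that passes this check can still fail to load; a file that fails it is very unlikely to be something `torch.load` can read.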
I received the following error when running the decoder: "./decode.sh conll13st-test models/crosent/model1 models/dicts 1"
++ CUDA_VISIBLE_DEVICES=1
++ python fairseq/interactive_multi.py --no-progress-bar --path models/crosent/model1/checkpoint_best.pt --beam 12 --nbest 12 --replace-unk --source-lang src --target-lang trg --input-files models/crosent/model1/outputs/tmp.conll13st-test.1565940766/input.src models/crosent/model1/outputs/tmp.conll13st-test.1565940766/input.ctx --num-shards 12 --task translation_ctx models/dicts
Traceback (most recent call last):
File "fairseq/interactive_multi.py", line 195, in <module>
main(args)
File "fairseq/interactive_multi.py", line 102, in main
models, model_args = utils.load_ensemble_for_inference(model_paths, task)
File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 163, in load_ensemble_for_inference
model = task.build_model(state['args'])
File "/home/hdeng/nsu/fairseq/fairseq/tasks/fairseq_task.py", line 43, in build_model
return models.build_model(args, self)
File "/home/hdeng/nsu/fairseq/fairseq/models/__init__.py", line 25, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "/home/hdeng/nsu/fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 76, in build_model
encoder_embed_dict = utils.parse_embedding(args.encoder_embed_path)
File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 267, in parse_embedding
embed_dict[pieces[0]] = torch.Tensor([float(weight) for weight in pieces[1:]])
File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 267, in <listcomp>
embed_dict[pieces[0]] = torch.Tensor([float(weight) for weight in pieces[1:]])
ValueError: could not convert string to float: 'Not'
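The last frame shows `parse_embedding` converting every field after the first on a line to `float`, so the error means that some line of the file given as `encoder_embed_path` contains the non-numeric field `'Not'`. A small sketch for locating such lines, assuming a space-separated text format with one token followed by its weights per line and a header on the first line; `find_bad_embedding_lines` is a hypothetical helper and does not reproduce fairseq's parser exactly:

```python
from typing import Iterable, List, Tuple


def find_bad_embedding_lines(
    lines: Iterable[str], skip_header: bool = True
) -> List[Tuple[int, str]]:
    """Return (line_number, offending_field) for lines with a non-float weight."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        if skip_header and line_no == 1:
            continue  # assumed "<vocab size> <dimension>" header line
        pieces = line.rstrip().split(" ")
        for field in pieces[1:]:
            try:
                float(field)
            except ValueError:
                bad.append((line_no, field))
                break  # one report per line is enough
    return bad


if __name__ == "__main__":
    sample = [
        "2 3",
        "the 0.1 0.2 0.3",
        "404 Not Found",
        "cat 0.4 0.5 0.6",
    ]
    for line_no, field in find_bad_embedding_lines(sample):
        print(f"line {line_no}: cannot convert {field!r} to float")
```

With a real file, an open file object can be passed in place of the list. A field such as 'Not' may mean that an error page was saved in place of the embeddings, or that the path stored in the checkpoint points at a different file on this machine; both are guesses to check against the actual file.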