nusnlp / crosentgec

Code for cross-sentence grammatical error correction using multilayer convolutional seq2seq models (ACL 2019)

License: GNU General Public License v3.0

Shell 3.81% Python 95.17% C++ 0.46% Lua 0.57%

crosentgec's Issues

Language model

The language model is too large, so I would rather not use it. What adjustments do I need to make to the weight file to run without it?

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317

❓ Questions and Help

Hi Shamil Chollampatt, Weiqi Wang, and Hwee Tou Ng

I'm referring to the CrossSent paper and trying out this approach. The code ran perfectly on a smaller dataset, but when I increased my dataset tenfold I got the following error.

pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "fairseq/train.py", line 352, in <module>
    multiprocessing_main(args)
  File "fairseq/multiprocessing_train.py", line 40, in main
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "fairseq/multiprocessing_train.py", line 82, in signal_handler
    raise Exception(msg)
Exception:

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "fairseq/multiprocessing_train.py", line 46, in run
    single_process_main(args)
  File "fairseq/train.py", line 87, in main
    train(args, trainer, task, epoch_itr)
  File "fairseq/train.py", line 125, in train
    log_output = trainer.train_step(sample, update_params=True)
  File "fairseq/fairseq/trainer.py", line 117, in train_step
    loss, sample_size, logging_output, oom_fwd = self._forward(sample)
  File "fairseq/fairseq/trainer.py", line 205, in _forward
    raise e
  File "fairseq/fairseq/trainer.py", line 197, in _forward
    loss, sample_size, logging_output_ = self.task.get_loss(self.model, self.criterion, sample)
  File "fairseq/fairseq/tasks/fairseq_task.py", line 49, in get_loss
    return criterion(model, sample)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
    net_output = model(**sample['net_input'])
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/models/fairseq_model.py", line 146, in forward
    auxencoder_out = self.auxencoder(ctx_tokens, ctx_lengths)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 193, in forward
    if not encoder_padding_mask.any():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317

What have you tried?

As suggested in several GitHub issues and PyTorch forum questions, I reran my code with CUDA_LAUNCH_BLOCKING=1. The following is the resulting error log:

Traceback (most recent call last):
  File "/fairseq/train.py", line 352, in <module>
    multiprocessing_main(args)
  File "/fairseq/multiprocessing_train.py", line 40, in main
    p.join()
  File "/opt/anaconda/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/fairseq/multiprocessing_train.py", line 82, in signal_handler
    raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "/fairseq/multiprocessing_train.py", line 46, in run
    single_process_main(args)
  File "/fairseq/train.py", line 35, in main
    load_dataset_splits(args, task, ['train', 'valid'])
  File "/fairseq/train.py", line 333, in load_dataset_splits
    task.load_dataset(split_k)
  File "/fairseq/fairseq/tasks/translation_ctx.py", line 105, in load_dataset
    ctx_dataset = indexed_dataset(prefix + 'ctx', self.ctx_dict)
  File "/fairseq/fairseq/tasks/translation_ctx.py", line 98, in indexed_dataset
    return IndexedRawTextDataset(path, dictionary)
  File "/fairseq/fairseq/data/indexed_dataset.py", line 130, in __init__
    self.read_data(path, dictionary)
  File "/fairseq/fairseq/data/indexed_dataset.py", line 136, in read_data
    self.lines.append(line.strip('\n'))
MemoryError

Question/Help?

As I understand it, a memory constraint should make CUDA throw an out-of-memory error, not this one. From what I have read, cuda runtime error (59) : device-side assert triggered is caused by an out-of-bounds index or a faulty loss function. That shouldn't be the case here, because the same code runs smoothly on the smaller dataset and fails only on the large one. Hence I'm asking here for further help.

Is there anything I should check to resolve this issue, or is there something I am missing?
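
The assertion `srcIndex < srcSelectDimSize` in the first log comes from an index-select kernel, which is typically what an embedding lookup runs on the GPU. One thing worth ruling out is therefore a token ID that is not smaller than the number of rows in the embedding table, for example if the larger dataset was encoded with a different dictionary than the one the model was built with. That is an assumption about the cause, not a confirmed diagnosis. A minimal sketch of such a check follows; `find_out_of_range_ids` is a hypothetical helper, not part of fairseq.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range_ids(
    batches: Iterable[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (sentence_index, position, token_id) for every invalid ID.

    An ID is invalid if it is negative or >= vocab_size, i.e. it would
    index past the end of an embedding table with vocab_size rows.
    """
    bad = []
    for sent_idx, token_ids in enumerate(batches):
        for pos, tok in enumerate(token_ids):
            if tok < 0 or tok >= vocab_size:
                bad.append((sent_idx, pos, tok))
    return bad


if __name__ == "__main__":
    # Toy data: a vocabulary of 5 entries and one sentence containing ID 7.
    sentences = [[0, 1, 4], [2, 7, 3]]
    print(find_out_of_range_ids(sentences, vocab_size=5))  # [(1, 1, 7)]
```

Note that the second log ends in a host-side MemoryError raised while the context file is being read into a Python list, which is a separate failure from the device-side assert in the first log.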

What's your environment?

fairseq Version (e.g., 1.0 or master): 0.5
PyTorch Version (e.g., 1.0) : 0.4.1
OS (e.g., Linux): Ubuntu 18.04
How you installed fairseq (pip, source): pip
Build command you used (if compiling from source): NA
Python version: 3.6
CUDA/cuDNN version: 9.2
GPU models and configuration: 8 GPUs (V100)

Any help, support, or direction would be highly appreciated.

Thanks

How to prepare a new testing dataset?

Is there any particular format to create that?
Is there any prewritten script that can be used?
How do I create a document-level context for any document?
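
One way to picture a document-level context is to pair every sentence with the sentences that precede it in the same document. The sketch below assumes the context is the previous two sentences and that source and context are kept line-aligned (in the spirit of the input.src and input.ctx files that appear in a decoding command elsewhere on this page). Both points are assumptions, and `build_context` is a hypothetical helper rather than a script from the repository.

```python
from typing import List, Tuple


def build_context(sentences: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each sentence with up to `window` preceding sentences.

    Returns (source, context) tuples; the first sentence of a document
    gets an empty context string.
    """
    pairs = []
    for i, sent in enumerate(sentences):
        context = " ".join(sentences[max(0, i - window):i])
        pairs.append((sent, context))
    return pairs


if __name__ == "__main__":
    doc = ["I has a dog .", "He are big .", "We plays every day ."]
    for src, ctx in build_context(doc):
        print(repr(src), "<-", repr(ctx))
```

Writing the first element of each pair to one file and the second to another, one per line, would give two line-aligned files; whether the repository's scripts expect exactly that layout should be checked against its README.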

Pretrained Models not found

GPU environment

Do we need to install CUDA? Does version 10 work well with the trained models?

Thanks.

Multiprocessing error when attempting to run training on multiple GPUs

Hi there,

When training the Cross Sentence GEC model on 1 GPU, it proceeds successfully. As soon as I try to use more than 1 GPU, I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/pdb.py", line 1667, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.6/pdb.py", line 1548, in _runscript
    self.run(statement)
  File "/usr/lib/python3.6/bdb.py", line 434, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/mnt/efs/crossSentGEC/fairseq/train.py", line 9, in <module>
    import collections
  File "/mnt/efs/crossSentGEC/fairseq/multiprocessing_train.py", line 37, in main
    procs[i].start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 172, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'

I am pretty sure that the underlying fairseq and Python multiprocessing modules are operational as I ran some sanity checks.

Any suggestions on how to solve or bypass this error, please? And how can I run this on multiple GPUs?
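
The traceback starts in pdb.py and ends in multiprocessing/spawn.py, where the main module's spec attribute (`__main__.__spec__`) is read. This suggests the script was launched under the debugger with the spawn start method, and a script run through pdb may have no `__spec__` attribute on its `__main__` module. A commonly suggested workaround is to define `__spec__ = None` before any process is started, or simply to run without pdb. The sketch below is illustrative only; `ensure_main_spec` is a hypothetical helper and has not been tested against this repository.

```python
import sys
import types
from typing import Optional


def ensure_main_spec(module: Optional[types.ModuleType] = None) -> bool:
    """Give `module` (default: __main__) a `__spec__` attribute if it lacks one.

    Returns True if the attribute had to be added, False otherwise.
    """
    if module is None:
        module = sys.modules["__main__"]
    if hasattr(module, "__spec__"):
        return False
    # spawn only needs the attribute to exist; None means "no import spec".
    module.__spec__ = None
    return True


if __name__ == "__main__":
    # Call this before starting any multiprocessing.Process.
    ensure_main_spec()
```

Since training on one GPU works, running the same multi-GPU command outside the debugger is probably the quickest way to tell whether pdb is the trigger.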

Thank you all!

Decode development/test sets issues

./decode.sh conll13st-test models/bpe/mlconvgec_aaai18_bpe.model models/dicts
This is the error

Traceback (most recent call last):
  File "fairseq/interactive_multi.py", line 195, in <module>
    main(args)
  File "fairseq/interactive_multi.py", line 102, in main
    models, model_args = utils.load_ensemble_for_inference(model_paths, task)
  File "/home/liferay172/Documents/SundeepPidugu/crosentgec/fairseq/fairseq/utils.py", line 153, in load_ensemble_for_inference
    state = torch.load(filename, map_location=lambda s, l: default_restore_location(s, 'cpu'))
  File "/home/liferay172/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/liferay172/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 603, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: could not find MARK

Please let me know whether I am passing the parameters correctly. An example would be much appreciated.
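
`could not find MARK` is raised by the pickle module when the bytes it receives are not a valid pickle stream, so it may be worth confirming that the file handed to torch.load is really a checkpoint and not, say, a text file or an incomplete download. The sketch below peeks at the first bytes of a file. The heuristics (a binary pickle beginning with byte 0x80, a zip archive beginning with `PK`) and the helper name `sniff_file` are assumptions of this example, not behaviour documented by the repository.

```python
import os
import pickle
import tempfile


def sniff_file(path: str) -> str:
    """Guess what kind of file `path` is from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    if not head:
        return "empty"
    if head.startswith(b"\x80"):
        return "binary pickle"
    if head.startswith(b"PK"):
        return "zip archive"
    if head.lstrip().startswith(b"<"):
        return "markup (possibly a saved web page)"
    return "unknown"


if __name__ == "__main__":
    # Demonstrate on a throwaway pickle file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(pickle.dumps({"args": None}, protocol=2))
    print(sniff_file(tmp.name))
    os.remove(tmp.name)
```

In the other decoding command quoted on this page, the second argument is a directory (models/crosent/model1) and the script loads checkpoint_best.pt from it, so it may also be worth checking whether a BPE model file was passed here where a model directory is expected.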

Training unable to detect GPU

I tried running the training on a p2.xlarge instance, but it could not identify the GPU and always reports that it can't run on a CPU machine. I even tried using $optionalgpu, with no luck.

How do I fix this?

Embedded model error message

I received the following error when running the decoder with "./decode.sh conll13st-test models/crosent/model1 models/dicts 1":
++ CUDA_VISIBLE_DEVICES=1
++ python fairseq/interactive_multi.py --no-progress-bar --path models/crosent/model1/checkpoint_best.pt --beam 12 --nbest 12 --replace-unk --source-lang src --target-lang trg --input-files models/crosent/model1/outputs/tmp.conll13st-test.1565940766/input.src models/crosent/model1/outputs/tmp.conll13st-test.1565940766/input.ctx --num-shards 12 --task translation_ctx models/dicts
Traceback (most recent call last):
  File "fairseq/interactive_multi.py", line 195, in <module>
    main(args)
  File "fairseq/interactive_multi.py", line 102, in main
    models, model_args = utils.load_ensemble_for_inference(model_paths, task)
  File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 163, in load_ensemble_for_inference
    model = task.build_model(state['args'])
  File "/home/hdeng/nsu/fairseq/fairseq/tasks/fairseq_task.py", line 43, in build_model
    return models.build_model(args, self)
  File "/home/hdeng/nsu/fairseq/fairseq/models/__init__.py", line 25, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/home/hdeng/nsu/fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 76, in build_model
    encoder_embed_dict = utils.parse_embedding(args.encoder_embed_path)
  File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 267, in parse_embedding
    embed_dict[pieces[0]] = torch.Tensor([float(weight) for weight in pieces[1:]])
  File "/home/hdeng/nsu/fairseq/fairseq/utils.py", line 267, in <listcomp>
    embed_dict[pieces[0]] = torch.Tensor([float(weight) for weight in pieces[1:]])
ValueError: could not convert string to float: 'Not'
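
The failing line converts every field after the first into a float, so the error means some line of the embedding file at args.encoder_embed_path has the word `Not` where a number was expected (one possibility is an error message saved in place of the real file). A small scan like the one below can locate such lines. It assumes space-separated `token value value ...` lines, as the traceback suggests, and `find_bad_embedding_lines` is a hypothetical helper, not part of fairseq.

```python
from typing import Iterable, List, Tuple


def find_bad_embedding_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, offending_field) for lines whose values are not floats."""
    bad = []
    for number, line in enumerate(lines, start=1):
        pieces = line.rstrip().split(" ")
        for field in pieces[1:]:
            try:
                float(field)
            except ValueError:
                bad.append((number, field))
                break  # one report per line is enough
    return bad


if __name__ == "__main__":
    sample = ["the 0.1 -0.2 0.3", "404: Not Found", "cat 1e-3 0.5 0.0"]
    print(find_bad_embedding_lines(sample))  # [(2, 'Not')]
```

To scan a real file, pass an open file object, for example `with open(path, encoding="utf-8") as f: print(find_bad_embedding_lines(f))`.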

Environment setup for trained model
