cofe-ai / fast-gector


License: Apache License 2.0

Python 97.60% Shell 2.40%


fast-gector's Issues

Could not use GPU for training

Hi,

I'm having trouble training on an RTX 3090. No matter what I try, training does not seem to run on the GPU.

When I start training, there is a log line saying it is using the GPU (setup device: cuda:0), but GPU memory usage does not change at all, and CPU usage is very high on one core.

I'm using conda with Python 3.7 and torch 1.11.0, which I set up with this command:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

Am I missing anything? Please help me if possible. Thank you!
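
A quick way to narrow this down is to ask PyTorch itself what it can see from inside the same conda environment. The sketch below is only a diagnostic, not part of fast-gector; the helper name cuda_report is made up for illustration, and it falls back gracefully when torch is not importable.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    # Skip the torch-specific checks if the package is not importable here.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_build is None or cuda_available is False, the installed build cannot use the GPU regardless of what the training log prints. If both look fine, the bottleneck may be elsewhere, for example in single-process data preprocessing.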

special_tokens_fix

Is the special_tokens_fix parameter specific to RoBERTa? If I use a BERT model, should I set it to 0?

Run inference on a GPU EC2 instance

Hello Jason,

I am using the fine-tuned RoBERTa model from the GECToR repo to run inference over a file of sentences. I am able to make predictions using the original GECToR. However, I am having issues with fast-gector.

This is my predict.sh:

#!/bin/bash
mkdir result
deepspeed --include localhost:0 --master_port 42991 predict.py \
    --batch_size 256 \
    --iteration_count 5 \
    --min_len 3 \
    --max_len 128 \
    --min_error_probability 0.0 \
    --additional_confidence 0.0 \
    --sub_token_mode "average" \
    --max_pieces_per_token 5 \
    --model_dir "/home/ec2-user/fast-gector" \
    --ckpt_id "roberta_1_gectorv2.th" \
    --detect_vocab_path "./data/vocabulary/d_tags.txt" \
    --correct_vocab_path "./data/vocabulary/labels.txt" \
    --pretrained_transformer_path "roberta-base" \
    --input_path "sentences_sample_100.txt" \
    --out_path "result/sentences_sample_100.preds" \
    --special_tokens_fix 1 \
    --detokenize 1 \
    --deepspeed_config "./configs/ds_config_zero1.json"

The error message is listed below. It looks like DeepSpeed is looking for .pt files under the model directory. How can I use the fine-tuned model to make predictions in this case?

Traceback (most recent call last):
File "/home/ec2-user/fast-gector/predict.py", line 101, in <module>
main(args)
File "/home/ec2-user/fast-gector/predict.py", line 39, in main
predictor = Predictor(args)
File "/home/ec2-user/fast-gector/src/predictor.py", line 38, in __init__
self.model = self.init_model(args)
File "/home/ec2-user/fast-gector/src/predictor.py", line 59, in init_model
ds_engine.load_checkpoint(args.model_dir, args.ckpt_id)
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2759, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2793, in _load_checkpoint
sd_loader = SDLoaderFactory.get_sd_loader(
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 44, in get_sd_loader
return MegatronSDLoader(ckpt_list, version, checkpoint_engine)
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 216, in __init__
super().__init__(ckpt_list, version, checkpoint_engine)
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 56, in __init__
self.check_ckpt_list()
File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 179, in check_ckpt_list
assert len(self.ckpt_list) > 0
AssertionError
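
The failing assertion is len(self.ckpt_list) > 0, so the loader found no checkpoint files to read. The sketch below checks a directory before calling load_checkpoint. It assumes, and this should be verified against the installed DeepSpeed version, that the engine looks for files named mp_rank_*_model_states.pt inside a sub-directory named after the checkpoint id. The helper name find_deepspeed_shards is hypothetical and not part of DeepSpeed or fast-gector.

```python
import glob
import os


def find_deepspeed_shards(model_dir, ckpt_id):
    """Return model-state files under <model_dir>/<ckpt_id>/.

    The file name pattern is an assumption about DeepSpeed's checkpoint
    layout, not something confirmed by the issue text.
    """
    pattern = os.path.join(model_dir, str(ckpt_id), "mp_rank_*_model_states.pt")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    shards = find_deepspeed_shards("/home/ec2-user/fast-gector", "roberta_1_gectorv2.th")
    print(shards or "no DeepSpeed-style shards found")
```

With the arguments from the script above, ckpt_id names a single .th file rather than a directory, so this check would return an empty list, which would be consistent with the AssertionError.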

Run inference on cpu

Hey Jason,

Thanks for making this AllenNLP-free GECToR library.
I am trying to run inference on a CPU-only machine, and DeepSpeed raised an error saying that no GPU resources are available.
Does this mean I have to remove the DeepSpeed dependency from the codebase if I want to run on the CPU only?

How to export to ONNX

How can I export the model to ONNX for inference?

Reproduced results

Hello. What are your reproduced results on the test sets?

Error in inference

0it [00:00, ?it/s]
/home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [0,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from c10_cuda_check_implementation at /home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/c10/cuda/CUDAException.cpp:31 (most recent call first):
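
The srcIndex < srcSelectDimSize assertion comes from an index lookup on the GPU and generally means that some index is not smaller than the size of the table being indexed, for example a token id or label id that exceeds the embedding or vocabulary size. Checking the ids on the CPU before the forward pass gives a readable answer. This is a plain-Python sketch; find_out_of_range_ids is a made-up helper, and the batch is assumed to be a list of lists of integer ids.

```python
def find_out_of_range_ids(batch_ids, num_embeddings):
    """Return (row, column, id) for every id outside [0, num_embeddings)."""
    bad = []
    for row, ids in enumerate(batch_ids):
        for col, token_id in enumerate(ids):
            if not 0 <= token_id < num_embeddings:
                bad.append((row, col, token_id))
    return bad


if __name__ == "__main__":
    # Illustrative values only: a table with 50265 rows accepts ids 0..50264.
    print(find_out_of_range_ids([[0, 5, 50265], [1, 2, 3]], 50265))
```

Any reported triple points at the sentence and position that would trip the device-side assert. A common cause is a mismatch between the tokenizer or label vocabulary and the checkpoint being loaded, but the log alone does not confirm that here.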

Train the model

@Jason3900
Hi, first of all, thank you for removing the AllenNLP dependency from GECToR, because I had problems using it.
I started running your version but ran into some problems and have a few questions. I would be thankful if you could answer them.

I didn't use CUDA and NVIDIA. Does that cause a problem when running? I should say that I'm running the code in Colab on a T4 GPU.
I also decided to use only part of the synthetic data, just to see whether it works, but it has been about 2 hours and it is still running. Is that normal?
After it finished, it gave this error:

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2330722it [1:57:15, 26.72s/it]
[2023-10-25 16:50:16,631] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 6518
[2023-10-25 16:50:16,632] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--deepspeed', '--deepspeed_config', 'configs/ds_config_zero1_fp16.json', '--num_epochs', '10', '--max_num_tokens', '128', '--valid_batch_size', '256', '--cold_step_count', '0', '--warmup', '0.1', '--cold_lr', '1e-3', '--skip_correct', '0', '--skip_complex', '0', '--sub_token_mode', 'average', '--special_tokens_fix', '1', '--unk2keep', '0', '--tp_prob', '1', '--tn_prob', '1', '--detect_vocab_path', './data/vocabulary/d_tags.txt', '--correct_vocab_path', './data/vocabulary/labels.txt', '--do_eval', '--train_path', '/content/drive/MyDrive/GEC_test/gector/OUTPUT_FILE', '--valid_path', '/content/drive/MyDrive/GEC_test/gector/OUTPUT_Test', '--save_dir', 'ckpts/ckpt_20231025_14:52:16', '--use_cache', '0', '--log_interval', '1', '--eval_interval', '50', '--save_interval', '50', '--pretrained_transformer_path', 'roberta-base', '--tensorboard_dir', 'logs/tb/gector_20231025_14:52:16'] exits with return code = -9
Should I change ckpt_path="ckpts/globalstep-xxxx" in predict.sh? What should it be?

Maybe my questions are too naive, but I'm fairly new to GEC, so your help would mean a lot.
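
On reading the log: "exits with return code = -9" follows the usual subprocess convention, where a negative return code means the child process was terminated by the signal with that number. The small sketch below decodes such a code using only the standard library; describe_return_code is a made-up name, and the output assumes a POSIX system such as the Linux machines Colab runs on.

```python
import signal


def describe_return_code(code):
    """Name the signal behind a negative subprocess return code."""
    if code >= 0:
        return f"exited normally with status {code}"
    return f"killed by {signal.Signals(-code).name}"


if __name__ == "__main__":
    print(describe_return_code(-9))
```

Signal 9 is SIGKILL. On a memory-limited host this is often the kernel stopping a process that ran out of RAM, for instance while the whole training file is being loaded, but the log shown here does not confirm the cause.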

What dataset?

Hello sir,
I was wondering how fast-gector captures the context of long sentences when it is trained only on some English tokens/words.

Questions about the code

Hello! First of all, thank you for this implementation.
I'm currently trying to apply it to Arabic, and I have a few questions.
If you could answer them, I would be super thankful!

In the get_target_sent_by_levels function, the very first edit level (level 0) is not considered, hence the statement rest_labels = label_list[1:]. Why is that?
In my data, there is only one level of edits (one error per sentence), so is it wrong to take label_list[0:]?
How is the labels vocab generated? Should I take only the words that appear in my training data, or can I use a vocabulary from another source? What would you recommend?

Sorry if I'm asking too many questions! I just want to make sure that I'm training my model correctly.
Thank you so much in advance!
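
On the vocabulary question, one option is to count the edit labels that actually occur in the preprocessed training data and keep the frequent ones. The sketch below is not how this repository builds labels.txt. It assumes each training line holds space-separated items of the form token + "SEPL|||SEPR" + labels, with multiple labels joined by "SEPL__SEPR", which is my understanding of the original GECToR preprocessing output and should be checked against your own files. The function name build_label_vocab and the min_count parameter are made up for illustration.

```python
from collections import Counter


def build_label_vocab(lines, label_sep="SEPL|||SEPR", op_sep="SEPL__SEPR",
                      min_count=1):
    """Return labels sorted by frequency, dropping those seen < min_count times.

    Both separators are assumptions about the tagged-data format.
    """
    counts = Counter()
    for line in lines:
        for item in line.split():
            if label_sep not in item:
                continue  # not a token/label pair in the assumed format
            _, labels = item.rsplit(label_sep, 1)
            counts.update(labels.split(op_sep))
    return [label for label, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = ["$STARTSEPL|||SEPR$KEEP HeSEPL|||SEPR$KEEP goSEPL|||SEPR$REPLACE_goes"]
    print(build_label_vocab(sample))
```

A frequency cutoff keeps the label set small for a new language such as Arabic, at the cost of mapping rare edits to an unknown label.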

Inference error

I received the error message below when running inference.
It looks like MisMatchedTokenizer now takes three positional arguments, but only two are given in the predictor class.

Traceback (most recent call last):
File "/home/ec2-user/gector/fast-gector/predict.py", line 101, in <module>
main(args)
File "/home/ec2-user/gector/fast-gector/predict.py", line 49, in main
pred_batch, cnt = predictor.handle_batch(batch_text)
File "/home/ec2-user/gector/fast-gector/src/predictor.py", line 76, in handle_batch
batch_input_dict = self.preprocess(ori_batch)
File "/home/ec2-user/gector/fast-gector/src/predictor.py", line 114, in preprocess
input_ids, offsets = self.mismatched_tokenizer.encode(tokens)
File "/home/ec2-user/gector/fast-gector/utils/mismatched_utils.py", line 22, in encode
wordpiece_ids = [self.tokenizer_vocab[wordpiece]
File "/home/ec2-user/gector/fast-gector/utils/mismatched_utils.py", line 22, in <listcomp>
wordpiece_ids = [self.tokenizer_vocab[wordpiece]
TypeError: 'int' object is not subscriptable
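
The last frame fails on self.tokenizer_vocab[wordpiece], so tokenizer_vocab must have been an int rather than a mapping by the time encode ran. A guard at construction time would surface that much earlier and with a clearer message. The sketch below is a generic check, not code from the repository; require_mapping is a hypothetical helper.

```python
from collections.abc import Mapping


def require_mapping(name, value):
    """Return value unchanged, or raise a descriptive TypeError if it is not dict-like."""
    if not isinstance(value, Mapping):
        raise TypeError(
            f"{name} must be a mapping from wordpiece to id, "
            f"got {type(value).__name__}: {value!r}"
        )
    return value


if __name__ == "__main__":
    vocab = require_mapping("tokenizer_vocab", {"<s>": 0, "</s>": 2})
    print(len(vocab))
```

Passing constructor arguments by keyword rather than by position is another way to avoid this kind of silent misalignment when a signature changes.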

Slow Seq2EditDataset creation

Thank you for offering this AllenNLP-free version of GECToR. I was trying it out but found that building the Seq2EditDataset is quite slow, at roughly 8 it/s, which makes it impractical to process the original dataset used by the paper (9M sentences for pretraining). Is this normal, or am I missing something important that would speed it up?
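
To put the numbers in perspective, the arithmetic can be written out as a tiny helper. It takes only the two figures quoted above (9M items, about 8 items per second); estimate_hours is a made-up name.

```python
def estimate_hours(num_items, items_per_second):
    """Estimate wall-clock hours needed to process num_items at a fixed rate."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return num_items / items_per_second / 3600


if __name__ == "__main__":
    hours = estimate_hours(9_000_000, 8)
    print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

At that rate a single pass over 9M sentences takes 312.5 hours, roughly 13 days, so a single-process run would only be workable with a much higher per-item rate or with the work split across processes.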