
fast-gector's People

Contributors

jason3900, keshuichonglx, thuwyq

fast-gector's Issues

Could not use GPU for training

Hi,

I'm having trouble with training using an RTX 3090. No matter what I tried, it doesn't seem that training is happening on the GPU.

When I start training, there is a log line saying it is using the GPU (setup device: cuda:0), but GPU memory usage does not change at all, and CPU usage is very high on one core.

I'm using conda with Python 3.7 and torch 1.11.0, which I set up with this command:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

Am I missing anything? Please help me if possible. Thank you!
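
A quick way to narrow this down is to ask PyTorch directly what it sees, independently of the training script. The following is a minimal sketch, not part of fast-gector; the function name describe_torch_device is made up for illustration, and it only uses standard torch.cuda calls. If built_with_cuda prints None, the installed build is CPU-only regardless of the conda command used.

```python
import importlib.util


def describe_torch_device():
    """Return a small report on whether PyTorch can see a CUDA device."""
    # Hypothetical helper for illustration; not part of fast-gector.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Bytes currently allocated by tensors on device 0 in this process.
        report["memory_allocated_bytes"] = torch.cuda.memory_allocated(0)
    return report


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

Running this inside the same conda environment as the training job shows whether the problem is in the PyTorch install or in how the job uses the device.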

special_tokens_fix

Is the special_tokens_fix parameter specific to RoBERTa? If I use a BERT model, should I set it to 0?

Run inference on a GPU EC2 instance

Hello Jason,

I am using the fine-tuned RoBERTa model from the GECToR repo to run inference over a file of sentences. I am able to make predictions using the original GECToR. However, I am having issues with fast-gector.

This is what I have for the predict.sh:

#!/bin/bash
# Create the output directory, then launch prediction on GPU 0 via deepspeed.
mkdir result
deepspeed --include localhost:0 --master_port 42991 predict.py \
    --batch_size 256 \
    --iteration_count 5 \
    --min_len 3 \
    --max_len 128 \
    --min_error_probability 0.0 \
    --additional_confidence 0.0 \
    --sub_token_mode "average" \
    --max_pieces_per_token 5 \
    --model_dir "/home/ec2-user/fast-gector" \
    --ckpt_id "roberta_1_gectorv2.th" \
    --detect_vocab_path "./data/vocabulary/d_tags.txt" \
    --correct_vocab_path "./data/vocabulary/labels.txt" \
    --pretrained_transformer_path "roberta-base" \
    --input_path "sentences_sample_100.txt" \
    --out_path "result/sentences_sample_100.preds" \
    --special_tokens_fix 1 \
    --detokenize 1 \
    --deepspeed_config "./configs/ds_config_zero1.json"

The error message is listed below. It looks like deepspeed is looking for .pt files under the model directory. How can I use the fine-tuned model to make predictions in this case?

Traceback (most recent call last):
  File "/home/ec2-user/fast-gector/predict.py", line 101, in <module>
    main(args)
  File "/home/ec2-user/fast-gector/predict.py", line 39, in main
    predictor = Predictor(args)
  File "/home/ec2-user/fast-gector/src/predictor.py", line 38, in __init__
    self.model = self.init_model(args)
  File "/home/ec2-user/fast-gector/src/predictor.py", line 59, in init_model
    ds_engine.load_checkpoint(args.model_dir, args.ckpt_id)
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2759, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2793, in _load_checkpoint
    sd_loader = SDLoaderFactory.get_sd_loader(
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 44, in get_sd_loader
    return MegatronSDLoader(ckpt_list, version, checkpoint_engine)
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 216, in __init__
    super().__init__(ckpt_list, version, checkpoint_engine)
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 56, in __init__
    self.check_ckpt_list()
  File "/home/ec2-user/miniconda/envs/gector_env/lib/python3.9/site-packages/deepspeed/runtime/state_dict_factory.py", line 179, in check_ckpt_list
    assert len(self.ckpt_list) > 0
AssertionError
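
The failing assertion says the list of checkpoint files DeepSpeed collected was empty. As far as I understand it, load_checkpoint(load_dir, tag) treats the second argument as a subdirectory name under load_dir and looks for files ending in model_states.pt inside it, so a single .th file passed as ckpt_id would not match. The sketch below is a stdlib-only check under that assumption; the helper name and the glob pattern are mine, not DeepSpeed's API.

```python
from pathlib import Path


def find_model_state_files(model_dir, ckpt_id):
    """List files that look like DeepSpeed model-state shards.

    Assumption (not taken from the fast-gector source): DeepSpeed expects
    <model_dir>/<ckpt_id>/ to be a directory containing files whose names
    end in "model_states.pt".
    """
    ckpt_dir = Path(model_dir) / ckpt_id
    if not ckpt_dir.is_dir():
        # ckpt_id points at a plain file (or nothing), so nothing can match.
        return []
    return sorted(ckpt_dir.glob("*model_states.pt"))


if __name__ == "__main__":
    # Paths taken from the predict.sh shown in this issue.
    found = find_model_state_files("/home/ec2-user/fast-gector",
                                   "roberta_1_gectorv2.th")
    print(found if found else "no DeepSpeed-style checkpoint files found")
```

An empty result here would be consistent with the AssertionError: the original GECToR weights are a plain PyTorch state file, not a DeepSpeed checkpoint directory.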

Run inference on cpu

Hey Jason,

Thanks for making this AllenNLP-free GECToR library.
I am trying to run inference on a CPU-only machine, and I received an error message from deepspeed saying that no GPU resources are available.
Does this mean I have to remove the deepspeed dependency from the codebase if I want to run on CPU only?

Error in inference

0it [00:00, ?it/s]/home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [0,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from c10_cuda_check_implementation at /home/coulombc/wheels_builder/tmp.17380/python-3.10/torch/c10/cuda/CUDAException.cpp:31 (most recent call first):
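
The assertion srcIndex < srcSelectDimSize is raised by an index-select kernel when an index is not smaller than the size of the dimension being indexed. With embedding lookups, one common cause is a token id that falls outside the embedding table, for example when the tokenizer vocabulary and the model's embedding size do not match. Whether that is what happened here is not confirmed by the log. A minimal sketch for checking a batch before it reaches the GPU (the helper name is hypothetical, and 50265 is the vocabulary size of roberta-base, used only as an example):

```python
def find_out_of_range_ids(batch_input_ids, num_embeddings):
    """Return (row, column, token_id) for every id outside [0, num_embeddings)."""
    # Hypothetical helper for illustration; not part of fast-gector.
    bad = []
    for row, ids in enumerate(batch_input_ids):
        for col, token_id in enumerate(ids):
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, col, token_id))
    return bad


if __name__ == "__main__":
    # An id equal to the table size is already out of range (valid ids
    # run from 0 to num_embeddings - 1).
    batch = [[0, 713, 16, 2], [0, 50265, 2]]
    print(find_out_of_range_ids(batch, num_embeddings=50265))
```

Running the same batch on CPU, or with CUDA_LAUNCH_BLOCKING=1 as the log suggests, usually gives a clearer error location than the asynchronous CUDA report.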

Train the model

@Jason3900
Hi, first of all, thank you for removing the AllenNLP dependency from GECToR; I had problems using it.
I have started running your version, but I ran into some problems and have a few questions. I would be thankful if you could answer them.

  1. I did not use CUDA and NVIDIA. Does that cause problems when running? I should add that I am running the code in Colab with a T4 GPU.

  2. I decided to use only part of the synthetic data, just to see whether it works, but it has been running for about 2 hours and is still going. Is that normal?

  3. After it finished, it gave this error:
    Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    2330722it [1:57:15, 26.72s/it][2023-10-25 16:50:16,631] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 6518
    [2023-10-25 16:50:16,632] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--deepspeed', '--deepspeed_config', 'configs/ds_config_zero1_fp16.json', '--num_epochs', '10', '--max_num_tokens', '128', '--valid_batch_size', '256', '--cold_step_count', '0', '--warmup', '0.1', '--cold_lr', '1e-3', '--skip_correct', '0', '--skip_complex', '0', '--sub_token_mode', 'average', '--special_tokens_fix', '1', '--unk2keep', '0', '--tp_prob', '1', '--tn_prob', '1', '--detect_vocab_path', './data/vocabulary/d_tags.txt', '--correct_vocab_path', './data/vocabulary/labels.txt', '--do_eval', '--train_path', '/content/drive/MyDrive/GEC_test/gector/OUTPUT_FILE', '--valid_path', '/content/drive/MyDrive/GEC_test/gector/OUTPUT_Test', '--save_dir', 'ckpts/ckpt_20231025_14:52:16', '--use_cache', '0', '--log_interval', '1', '--eval_interval', '50', '--save_interval', '50', '--pretrained_transformer_path', 'roberta-base', '--tensorboard_dir', 'logs/tb/gector_20231025_14:52:16'] exits with return code = -9

  4. Should I change ckpt_path="ckpts/globalstep-xxxx" in predict.sh? What should it be?

Maybe my questions are too naive, but I'm fairly new to GEC, so your help would mean a lot.
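
On the "exits with return code = -9" line in item 3: by the usual subprocess convention, a negative return code means the process was killed by the signal with that number, so -9 corresponds to SIGKILL. On memory-constrained hosts this is often the kernel's out-of-memory killer, though nothing in the log above confirms that here. A minimal sketch for decoding such codes (the helper name is made up for illustration):

```python
import signal


def describe_return_code(code):
    """Translate a subprocess return code into a short description."""
    if code == 0:
        return "exited normally"
    if code < 0:
        # Negative codes follow the subprocess convention: -N means
        # "terminated by signal N".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_return_code(-9))
```

If memory is the cause, watching host RAM while the dataset is being built would show it climbing until the process disappears.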

What dataset?

Hello sir,
I was wondering how fast-gector captures the context of long sentences when it is only trained on some English tokens/words.

Questions about the code

Hello! First of all, thank you for this implementation.
I'm currently trying to apply it to Arabic, and I have a few questions.
If you could answer them, I would be very thankful!

  1. In the get_target_sent_by_levels function, the very first level of edits (level 0) is not considered, hence the statement rest_labels = label_list[1:]. Why is that?
    In my data there is only one level of edits (one error per sentence), so is it wrong to take label_list[0:]?

  2. How is the labels vocab generated? Should I just take the words that appear in my training data, or can I use a vocabulary from another source? What would you recommend?

Sorry if these are too many questions! I just want to make sure that I'm training my model correctly.
Thank you so much in advance!
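
For question 1, it may help to see what the two slices produce when there is only a single level. This is plain Python slicing with made-up label strings; it does not claim to reproduce what get_target_sent_by_levels intends, only why label_list[1:] leaves nothing to process in the one-level case.

```python
def split_levels(label_list):
    """Split a list into its first element (as a list) and the remainder."""
    # Hypothetical helper; the label strings below are illustrative only.
    first_level = label_list[:1]
    rest_levels = label_list[1:]
    return first_level, rest_levels


if __name__ == "__main__":
    one_level = ["$REPLACE_the"]
    two_levels = ["$REPLACE_the", "$APPEND_cat"]
    print(split_levels(one_level))   # the remainder is an empty list
    print(split_levels(two_levels))  # the remainder holds the second label
    # label_list[0:] is simply a copy of the whole list.
    print(one_level[0:] == one_level)
```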

Inference error

I received the following error message when running inference.
It looks like MisMatchedTokenizer now takes three positional arguments, but only two are given in the predictor class.

Traceback (most recent call last):
  File "/home/ec2-user/gector/fast-gector/predict.py", line 101, in <module>
    main(args)
  File "/home/ec2-user/gector/fast-gector/predict.py", line 49, in main
    pred_batch, cnt = predictor.handle_batch(batch_text)
  File "/home/ec2-user/gector/fast-gector/src/predictor.py", line 76, in handle_batch
    batch_input_dict = self.preprocess(ori_batch)
  File "/home/ec2-user/gector/fast-gector/src/predictor.py", line 114, in preprocess
    input_ids, offsets = self.mismatched_tokenizer.encode(tokens)
  File "/home/ec2-user/gector/fast-gector/utils/mismatched_utils.py", line 22, in encode
    wordpiece_ids = [self.tokenizer_vocab[wordpiece]
  File "/home/ec2-user/gector/fast-gector/utils/mismatched_utils.py", line 22, in <listcomp>
    wordpiece_ids = [self.tokenizer_vocab[wordpiece]
TypeError: 'int' object is not subscriptable

Slow Seq2EditDataset creation

Thank you for offering this AllenNLP-free version of GECToR. I was trying it out but found that building the Seq2EditDataset is quite slow, at about 8 it/s, which makes it impractical to process the original dataset used by the paper (9M sentences for pretraining). Is this normal, or am I missing something important that would speed it up?
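
To put the numbers above in perspective, here is the arithmetic as a small sketch. It assumes the rate stays constant at 8 items per second and that "9M" means 9,000,000 items; the function name is made up for illustration.

```python
def estimate_hours(num_items, items_per_second):
    """Estimated wall-clock hours to process num_items at a constant rate."""
    seconds = num_items / items_per_second
    return seconds / 3600


if __name__ == "__main__":
    hours = estimate_hours(9_000_000, 8)
    # 9,000,000 / 8 = 1,125,000 seconds = 312.5 hours, about 13 days.
    print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

At that rate a single pass over the pretraining data would take roughly two weeks on one process.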

Result for another language

Hi,
I want to use this method for another language and build the dataset myself.
Can I use it for that, or do this method and model only give good results for high-resource languages with large datasets? I would be thankful if you could answer.

Deepspeed-free branch

Hello,
Would it be possible to publish a branch without the deepspeed dependency? Our use case requires running on CPU, which I'm afraid is not compatible with deepspeed. Thanks!

How to obtain the Data

Hi,
How can I get the data mentioned in the prepare data script?
SUBSET="train-stage2"
SOURCE="../gec_private_train_data/${SUBSET}.src"
TARGET="../gec_private_train_data/${SUBSET}.trg"
OUTPUT="../gec_private_train_data/${SUBSET}.edits"
