geekjuruo / lead


A Chinese Spell Checking Model Released on EMNLP2022.

Python 99.81% Shell 0.19%


lead's Issues

How to train the model on multiple GPUs?

Hi, I want to train this model on multiple GPUs, but I cannot find a configuration option for using multiple GPUs during training.
In addition, GPU utilization reaches only 10% on my NVIDIA 3090.

Has the code been released?

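Code reading: contrastive learning and the correction model

The paper's ideas on contrastive learning and on incorporating a dictionary are impressive. If I want to understand how the code implements them, which part of the code should I focus on?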

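Correspondence between the paper's model figure and the code

Model figure from the paper

My understanding is:
In the figure, "CSC Encoder EC Fine-Tuned with ℒCSC" corresponds to SpellBERT.py in the screenshot below.
Ev corresponds to GlyphClassifier.py and Ed corresponds to WordClassifier.py. Which part of the code does Ep correspond to? Or is my overall understanding wrong? The paper says that the phonetic encoder and the dictionary encoder use the same BERT, while the glyph encoder is GCC, so I cannot fully match the code to the model section of the paper. Could you explain how the model figure maps to the code? Thank you very much for your answer!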

TypeError:

The line "from reader.BasicReaderWithDict import BasicReader" in WordDictReader.py raises an error because the module "reader.BasicReaderWithDict" cannot be found. After I changed it to "from reader import BasicReader", I got a different error:
File "/home/nlp/MyProject/LEAD/LEAD-main/LEAD-main/reader/HybridReader.py", line 6, in <module>
from reader.WordDictReader import WordDictReader
File "/home/nlp/MyProject/LEAD/LEAD-main/LEAD-main/reader/WordDictReader.py", line 15, in <module>
class WordDictReader(BasicReader):
TypeError: module() takes at most 2 arguments (3 given)
Why does this happen, and how can I solve it?
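
This TypeError is what Python raises when a class statement names a module, rather than a class, as its base. The message indicates that "from reader import BasicReader" bound the name BasicReader to a module object. The sketch below reproduces the situation with a stand-in module built via types.ModuleType. The layout it mimics (a file reader/BasicReader.py that defines a class named BasicReader) is an assumption and is not confirmed by the issue.

```python
import types

# Stand-in for a file such as reader/BasicReader.py (hypothetical layout):
# a module object that contains a class with the same name as the module.
basic_reader_module = types.ModuleType("BasicReader")


class _BasicReader:
    def read(self):
        return "ok"


basic_reader_module.BasicReader = _BasicReader


def try_subclass(base):
    """Define a subclass of `base`; return the new class, or the TypeError raised."""
    try:
        class WordDictReader(base):
            pass
    except TypeError as exc:
        return exc
    return WordDictReader


# Mirrors `from reader import BasicReader`: the base is the module itself.
wrong = try_subclass(basic_reader_module)

# Mirrors `from reader.BasicReader import BasicReader`: the base is the class.
right = try_subclass(basic_reader_module.BasicReader)

print(type(wrong).__name__, "|", right.__mro__[1].__name__)
```

If the repository matches that assumed layout, importing the class explicitly ("from reader.BasicReader import BasicReader") should avoid the error.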

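Question about a formula

In the loss function Lk, s indicates that the s-th character in the mini-batch is erroneous. Is this s the position annotated in the dataset, or is it determined by some other method?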

EOF error

Runtime error

I get the following error at runtime. How should I handle it?
100%|██████████| 284201/284201 [00:01<00:00, 192911.68it/s]
Train Size: 284201, Valid Train Size: 0
Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1496, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/remote-home/cs_tcci_renjun/RENJUN/LEAD-main/main.py", line 49, in <module>
pipeline.initialize()
File "/remote-home/cs_tcci_renjun/RENJUN/LEAD-main/pipeline/BasicPipeline.py", line 107, in initialize
self.get_loader()
File "/remote-home/cs_tcci_renjun/RENJUN/LEAD-main/pipeline/BasicPipeline.py", line 69, in get_loader
self.data_loaders[key] = DataLoader(dataset=value, collate_fn=lambda data: self.processor.process(data, key), shuffle=shuffle, drop_last=False, batch_size=batch_size)
File "/remote-home/cs_tcci_renjun/envs/rjlead/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 277, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore[arg-type]
File "/remote-home/cs_tcci_renjun/envs/rjlead/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 97, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
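
The log line "Valid Train Size: 0" suggests that the dataset handed to DataLoader is empty, which is why RandomSampler rejects it. A minimal guard, sketched in plain Python, makes that failure explicit before any loader is built. The helper name ensure_non_empty and the sample data are hypothetical and are not part of the LEAD code.

```python
def ensure_non_empty(name, dataset):
    """Return len(dataset), raising a descriptive error if it is zero."""
    size = len(dataset)
    if size == 0:
        raise ValueError(
            f"Dataset '{name}' has no samples; check the data path and any "
            "filtering applied while loading."
        )
    return size


# Hypothetical splits: 'train' mimics the empty dataset from the log.
datasets = {"train": [], "valid": ["sample"]}

problems = []
for key, value in datasets.items():
    try:
        ensure_non_empty(key, value)
    except ValueError as exc:
        problems.append(str(exc))

print(problems)
```

A check like this could run on each split just before the DataLoader is constructed, so the error names the empty split instead of surfacing inside the sampler.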

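GCC model parameter mismatch

Download the glyph-enhanced pretrained model from GCC and put the model files in resources/glyph
The pytorch_model.bin file downloaded from that URL raises a model parameter mismatch error. If I ignore the error, the correction output contains many unk tokens and the results are very poor. Is there another download URL for model parameters that can replace this one?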

run error

Runtime error:
Traceback (most recent call last):
File "/home/nlp/MyProject/LEAD/main.py", line 49, in <module>
pipeline.initialize()
File "/home/nlp/MyProject/LEAD/pipeline/BasicPipeline.py", line 104, in initialize
self.init_model()
File "/home/nlp/MyProject/LEAD/pipeline/MultiModelPipeline.py", line 17, in init_model
super(MultiModelPipeline, self).init_model()
File "/home/nlp/MyProject/LEAD/pipeline/BasicPipeline.py", line 84, in init_model
model = model_class()
File "/home/nlp/MyProject/LEAD/model/GlyphClassifier.py", line 13, in __init__
self.bert = GlyphEncoder()
File "/home/nlp/MyProject/LEAD/model/GlyphEncoder.py", line 16, in __init__
bert_config = BertConfig.from_pretrained(os.path.join(glyph_path, "config.json"))
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/transformers/configuration_utils.py", line 501, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/transformers/configuration_utils.py", line 550, in get_config_dict
configuration_file = get_configuration_file(
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/transformers/configuration_utils.py", line 841, in get_configuration_file
all_files = get_list_of_files(
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/transformers/file_utils.py", line 1952, in get_list_of_files
return list_repo_files(path_or_repo, revision=revision, token=token)
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 112, in _inner_fn
validate_repo_id(arg_value)
File "/home/nlp/anaconda3/envs/lead/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 160, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './resources/glyph/config.json'. Use repo_type argument if needed.
It seems the error occurs because the file './resources/glyph/config.json' does not exist. Which glyph-enhanced pretrained model exactly should be downloaded: the pretrained GCC model from the link, or a GCC model after fine-tuning? And where can I get the required configuration files? Could you provide them?
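
The traceback shows transformers treating './resources/glyph/config.json' as a Hub repo id, which is consistent with the file not existing at that local path. A small pre-flight check, sketched with only the standard library, reports which files are missing before any model code runs. The helper name missing_files and the list of expected files are assumptions: this issue only confirms config.json, and pytorch_model.bin is mentioned in another issue.

```python
import os
import tempfile


def missing_files(directory, expected=("config.json", "pytorch_model.bin")):
    """Return the expected file names that are absent from `directory`."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(directory, name))
    ]


# Demo on a temporary directory standing in for resources/glyph.
with tempfile.TemporaryDirectory() as glyph_dir:
    before = missing_files(glyph_dir)

    # Simulate having downloaded only the configuration file.
    with open(os.path.join(glyph_dir, "config.json"), "w") as handle:
        handle.write("{}")

    after = missing_files(glyph_dir)

print(before, after)
```

Running such a check against the real resources/glyph directory would show whether the download step placed the files where the code expects them.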
