hemingkx / wordseg

A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese Word Segmentation (中文分词).

Python 100.00%

bert pytorch roberta chinese-word-segmentation bilstm-crf bert-crf

wordseg's Introduction

Hi there, I'm Heming Xia 👋

Hi, I'm Heming Xia, a Ph.D. student at The Hong Kong Polytechnic University 🇭🇰🇨🇳, supervised by Prof. Wenjie Li.

My research mainly focuses on 1) efficient and effective NLP, 2) tool learning, and 3) cross vision and language understanding. Previously, I completed my master's degree at Peking University, advised by Prof. Zhifang Sui, and my bachelor's degree at the School of Physics, Peking University.

🍻 Ph.D. at NLP Group @PolyU.
🔭 I'm interested in Natural Language Processing.
💻 Coding with Python, familiar with PyTorch, Fairseq, Keras, TensorFlow.
💬 Welcome to contact me by email ↙️.

wordseg's Issues

Single-GPU BERT training raises a tensor dimension error. Where should I change the code?

RuntimeError: The expanded size of the tensor (513) must match the existing size (512) at non-singleton dimension 1. Target sizes: [6, 513]. Tensor sizes: [1, 512]
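
The 512 in this message matches the maximum number of positions that bert-base style checkpoints usually support, so a likely cause (an assumption, not confirmed in the issue) is an input that reaches 513 tokens once the special tokens are counted. A minimal sketch of clipping a sample before it reaches the model; `truncate_sample`, `max_len`, and `num_special` are hypothetical names, not part of the repository:

```python
def truncate_sample(tokens, labels, max_len=512, num_special=2):
    """Clip one sample so that tokens plus special tokens fit in max_len.

    Assumes one label per token and that [CLS]/[SEP] (num_special=2)
    are added later by the data pipeline.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    budget = max_len - num_special  # room left for real tokens
    return tokens[:budget], labels[:budget]


tokens = list("字" * 600)
labels = ["S"] * 600
short_tokens, short_labels = truncate_sample(tokens, labels)
print(len(short_tokens), len(short_labels))  # 510 510
```

Truncation drops the text past the limit; splitting long sentences into several samples is an alternative if the tail matters.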

No such file or directory: '/BiLSTM-CRF/experiments/model_5.pth'

There is no file called "model_5.pth" in the experiments directory. Could you please tell me how to find this file or where to download it? Thank you so much!

Please provide the trained model

Hello, thank you for providing such great open-source code. Will you release the trained model later?

AttributeError

When training BERT-BiLSTM-CRF, I keep getting an AttributeError. Are there any suggestions for resolving it? Thanks.

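Problem running the BERT model, asking for help

While running the BERT + softmax model, I keep hitting an error that I have not been able to resolve. Has anyone else run into it, and does anyone know how to fix it? Thanks, everyone.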
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
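
With the `env://` init method, `torch.distributed` reads its rendezvous settings from the environment variables `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`, which a distributed launcher normally sets. A hedged sketch for a single-process run that fills in defaults before the process group is initialized; the helper name and the port value are assumptions for illustration:

```python
import os


def set_single_process_defaults(port="29500"):
    """Provide env:// rendezvous variables for a one-process run.

    Existing values are kept, so a launcher's settings still win.
    """
    defaults = {
        "RANK": "0",
        "WORLD_SIZE": "1",
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": port,
    }
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
    return {key: os.environ[key] for key in defaults}


env = set_single_process_defaults()
print(env["RANK"], env["WORLD_SIZE"])
```

This would be called before `torch.distributed.init_process_group(...)`. Starting the script through a distributed launcher such as `torchrun` is another option, since the launcher sets these variables itself.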

Test error: ValueError: cannot copy sequence with size 19 to array axis with dimension 18

  File "run.py", line 49, in test
    val_metrics = evaluate(test_loader, model, mode='test')
  File "/zhangleisx4614/code/WordSeg-main/BERT-CRF/train.py", line 85, in evaluate
    for idx, batch_samples in enumerate(dev_loader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/zhangleisx4614/code/WordSeg-main/BERT-CRF/data_loader.py", line 96, in collate_fn
    batch_labels[j][:cur_tags_len] = labels[j]
ValueError: cannot copy sequence with size 19 to array axis with dimension 18
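
The message says a label sequence of length 19 was assigned into a row with room for only 18 entries, so the padded width was presumably computed from something shorter than the labels. One way to narrow this down, sketched under the assumption that each sample is a `(tokens, labels)` pair (the repository's actual sample layout may differ), is to scan the dataset for length mismatches before batching:

```python
def find_length_mismatches(samples):
    """Return (index, n_tokens, n_labels) for samples whose lengths differ."""
    mismatches = []
    for idx, (tokens, labels) in enumerate(samples):
        if len(tokens) != len(labels):
            mismatches.append((idx, len(tokens), len(labels)))
    return mismatches


samples = [
    (["我", "爱", "你"], ["S", "S", "S"]),
    (["北", "京"], ["B", "E", "S"]),  # one label too many
]
print(find_length_mismatches(samples))  # [(1, 2, 3)]
```

Any index this reports points at a test sentence worth inspecting, for example one containing characters that the tokenizer drops or merges.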

Question about an operation in data_loader

Hello, in collate_fn, why is it necessary to compute max_label_len?
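
As a general illustration (not a description of the author's exact code): the label sequences in a batch usually have different lengths, while a batch array must be rectangular, so a collate function finds the longest label sequence and pads the others to that width. A minimal sketch using plain lists, where `pad_label_batch` and `pad_value` are hypothetical names:

```python
def pad_label_batch(labels, pad_value=-1):
    """Pad variable-length label sequences to one common width."""
    max_label_len = max(len(seq) for seq in labels)
    # Start from a rectangular batch filled with the padding value.
    batch = [[pad_value] * max_label_len for _ in labels]
    for j, seq in enumerate(labels):
        # Overwrite the leading positions with the real labels.
        batch[j][:len(seq)] = seq
    return batch


print(pad_label_batch([[1, 2, 3], [4]]))  # [[1, 2, 3], [4, -1, -1]]
```

The padding value is typically one that the loss or the CRF mask ignores, so the padded positions do not affect training.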

Hello, the cloud-drive link for the dataset has expired. Could you provide a new one? Thanks (^▽^)

FileNotFoundError: [Errno 2] No such file or directory:

Traceback (most recent call last):
  File "E:\Program Files\PycharmProjects\bert\WordSeg-main\BERT-Softmax\metrics.py", line 189, in <module>
    output2res()
  File "E:\Program Files\PycharmProjects\bert\WordSeg-main\BERT-Softmax\metrics.py", line 167, in output2res
    with open(config.output_dir, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'E:\Program Files\PycharmProjects\bert\WordSeg-main\BERT-Softmax/data/output.txt'

For this error, do I need to create an output .txt file beforehand?
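
Opening a file in mode `'r'` never creates it, so the path has to exist before `output2res()` runs. The file at `config.output_dir` is presumably written by an earlier prediction or test step (an assumption; the issue does not say), in which case an empty hand-made file would silence the error without giving the metric anything to score. A small sketch of a clearer guard, with `read_output_lines` as a hypothetical helper:

```python
import os
import tempfile


def read_output_lines(path):
    """Read prediction lines, failing with a hint if the file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; run the step that writes predictions first"
        )
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Demo with a temporary file standing in for the real output file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "output.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("我 爱 北京\n")
    demo_lines = read_output_lines(demo_path)

print(demo_lines)  # ['我 爱 北京']
```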

error

pretrained_bert_models/bert-base-chinese/
bert_config.json
bert_model.ckpt.index
bert_model.ckpt.meta
config.json
pytorch_model.bin
readme
vocab.txt

device: cpu
--------Process Done!--------
Model name 'pretrained_bert_models/bert-base-chinese/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming 'pretrained_bert_models/bert-base-chinese/' is a path or url to a directory containing tokenizer files.
Didn't find file pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it.
Didn't find file pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it.
Didn't find file pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.
loading file pretrained_bert_models/bert-base-chinese/vocab.txt
loading file None
loading file None
loading file None
Model name 'pretrained_bert_models/bert-base-chinese/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming 'pretrained_bert_models/bert-base-chinese/' is a path or url to a directory containing tokenizer files.
Didn't find file pretrained_bert_models/bert-base-chinese/added_tokens.json. We won't load it.
Didn't find file pretrained_bert_models/bert-base-chinese/special_tokens_map.json. We won't load it.
Didn't find file pretrained_bert_models/bert-base-chinese/tokenizer_config.json. We won't load it.
loading file pretrained_bert_models/bert-base-chinese/vocab.txt
loading file None
loading file None
loading file None
--------Dataset Build!--------