When I run the following command,
python -u crf.py train -b -c configs/conll12.crf.srl.bert.ini -d 0 -p exp/conll12.crf.srl.bert/model --batch-size=1000 --encoder bert --bert bert-large-cased --cache --binarize
it fails with the error below:
tdteach@tdteach-u2004:~/workspace/crfsrl$ python -u crf.py train -b -c configs/conll12.crf.srl.bert.ini -d 0 -p exp/conll12.crf.srl.bert/model --batch-size=1000 --encoder bert --bert bert-large-cased --cache --binarize
2022-07-16 17:21:18 INFO
---------------------+-------------------------------
Param | Value
---------------------+-------------------------------
bert | bert-large-cased
n_bert_layers | 4
mix_dropout | 0.0
bert_pooling | mean
encoder_dropout | 0.1
n_edge_mlp | 500
n_role_mlp | 100
mlp_dropout | 0.1
lr | 5e-05
lr_rate | 20
mu | 0.9
nu | 0.9
eps | 1e-12
weight_decay | 0
clip | 5.0
min_freq | 2
fix_len | 20
epochs | 10
warmup | 0.1
update_steps | 1
batch_size | 1000
prd | False
mode | train
path | exp/conll12.crf.srl.bert/model
device | 0
seed | 1
threads | 16
workers | 0
cache | True
binarize | True
amp | False
feat | None
build | True
checkpoint | False
finetune | False
encoder | bert
max_len | None
buckets | 32
train | /home/tdteach/.cache/supar/data/srl/conll12/train.conllu
dev | /home/tdteach/.cache/supar/data/srl/conll12/dev.conllu
test | /home/tdteach/.cache/supar/data/srl/conll12/test.conllu
embed | glove-6b-100
---------------------+-------------------------------
2022-07-16 17:21:18 INFO Building the fields
Using bos_token, but it is not set yet.
2022-07-16 17:21:19 INFO CoNLL(
(words): SubwordField(vocab_size=28996, pad=[PAD], unk=[UNK], bos=[CLS])
(edges): ChartField(use_vocab=False)
(roles): ChartField(vocab_size=1, unk=O)
(spans): RawField()
)
2022-07-16 17:21:19 INFO Building the model
2022-07-16 17:21:25 INFO CRFSemanticRoleLabelingModel(
(encoder): TransformerEmbedding(bert-large-cased, n_layers=4, n_out=1024, stride=256, pooling=mean, pad_index=0, finetune=True)
(encoder_dropout): Dropout(p=0.1, inplace=False)
(edge_mlp_d): MLP(n_in=1024, n_out=500, dropout=0.1)
(edge_mlp_h): MLP(n_in=1024, n_out=500, dropout=0.1)
(role_mlp_d): MLP(n_in=1024, n_out=100, dropout=0.1)
(role_mlp_h): MLP(n_in=1024, n_out=100, dropout=0.1)
(edge_attn): Biaffine(n_in=500, bias_x=True)
(role_attn): Biaffine(n_in=100, bias_x=True, bias_y=True)
)
2022-07-16 17:21:27 INFO Loading the data
2022-07-16 17:21:27 INFO Seeking to cache the data to /home/tdteach/.cache/supar/data/srl/conll12/train.conllu.pt first
Traceback (most recent call last):
File "crf.py", line 43, in
main()
File "crf.py", line 39, in main
init(parser)
File "/media/tdteach/LinuxWork/workspace/crfsrl/supar/cmds/cmd.py", line 34, in init
parse(0 if torch.cuda.is_available() else -1, args)
File "/media/tdteach/LinuxWork/workspace/crfsrl/supar/cmds/cmd.py", line 54, in parse
parser.train(**args)
File "/media/tdteach/LinuxWork/workspace/crfsrl/crfsrl/parser.py", line 72, in train
train = Dataset(self.transform, args.train, **args).build(batch_size, buckets, True, dist.is_initialized(), workers)
File "/media/tdteach/LinuxWork/workspace/crfsrl/supar/utils/data.py", line 160, in build
with cache(self.transform.load(self.data, **self.kwargs)) as chunks, mp.Pool(32) as pool:
File "/home/tdteach/anaconda3/envs/allennlp/lib/python3.8/contextlib.py", line 113, in enter
return next(self.gen)
File "/media/tdteach/LinuxWork/workspace/crfsrl/supar/utils/data.py", line 145, in cache
sentences = binarize({'sentences': progress_bar(sentences)}, fs)[1]['sentences']
KeyError: 'sentences'
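For context, the last frame shows where the lookup fails: the result of binarize(...) is indexed as [1]['sentences'], so the KeyError means the second element of whatever binarize returns has no 'sentences' entry. A minimal, self-contained sketch of that failure mode (the binarize stub below is purely hypothetical and is not the real supar.utils.fn.binarize, it only mimics a return value whose second element lacks the expected key):

# Hypothetical reproduction of the failing pattern in supar/utils/data.py:145.
def binarize(data, path):
    # Stub: pretend the data was written to `path`, but return an index dict
    # that is keyed differently than the caller expects.
    return path, {}

sentences = ['sent1', 'sent2']
try:
    cached = binarize({'sentences': sentences}, 'train.conllu.pt')[1]['sentences']
except KeyError as e:
    print(f'KeyError: {e}')  # reproduces: KeyError: 'sentences'

This suggests a mismatch between what the caller in crfsrl/supar/utils/data.py expects binarize to return and what the binarize actually being imported returns (for instance, if a separately installed supar package shadows the vendored copy), though I have not confirmed the cause.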