cliang1453 / bond Goto Github PK
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
License: Apache License 2.0
Regarding the data format, it is said that we can transform a file, e.g., BIO-format data, into JSON by referring to dataset/BC5CDR-chem/turn.py (in the semi_script dir). However, this file is not available. Could you help me with that?
Many thanks in advance.
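Since turn.py itself is missing, here is a minimal sketch of such a converter, assuming the layout seen in the shipped datasets (a JSON list of dicts with "str_words" and integer "tags") and a tag-to-id map you supply; it is a hypothetical reconstruction, not the original script.

```python
import json

def bio_to_json(bio_lines, tag_to_id):
    """Convert BIO lines ("token<space>tag", blank line = sentence break)
    into the list-of-dicts layout used by the BOND datasets.
    Hypothetical reconstruction of the missing turn.py."""
    sentences, words, tags = [], [], []
    for line in bio_lines:
        line = line.strip()
        if not line:  # sentence boundary
            if words:
                sentences.append({"str_words": words, "tags": tags})
                words, tags = [], []
            continue
        parts = line.split()
        words.append(parts[0])          # the token
        tags.append(tag_to_id[parts[-1]])  # the BIO tag, mapped to an id
    if words:  # flush the last sentence
        sentences.append({"str_words": words, "tags": tags})
    return sentences

# Example: a two-sentence BIO snippet
lines = ["EU B-ORG", "rejects O", "", "Peter B-PER"]
data = bio_to_json(lines, {"O": 0, "B-ORG": 1, "B-PER": 2})
print(json.dumps(data))
```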
Thank you very much for the work! I have a question:
If I keep your framework unchanged and just use BERT+CRF instead of BERT+Linear_layer as the NER model, would that work? Thank you very much!
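For reference, the objective a BERT+CRF head would optimize is the linear-chain CRF log-likelihood over the encoder's per-token emission scores. A minimal NumPy sketch of that objective (for illustration only; in a real swap one would more likely use an existing package such as pytorch-crf on top of the BERT logits):

```python
import numpy as np

def crf_log_likelihood(emissions, transitions, tags):
    """Log-likelihood of a tag sequence under a linear-chain CRF.
    emissions: (T, K) per-token scores from the encoder (e.g. BERT logits)
    transitions: (K, K) learned tag-transition scores
    tags: gold tag sequence of length T"""
    T, K = emissions.shape
    # Score of the gold path: emissions along the path plus transitions.
    score = emissions[0, tags[0]]
    for t in range(1, T):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log partition function via the forward algorithm.
    alpha = emissions[0].copy()
    for t in range(1, T):
        # m[i, j]: score of being in tag i at t-1 and tag j at t.
        m = alpha[:, None] + transitions + emissions[t][None, :]
        alpha = np.log(np.exp(m - m.max(0)).sum(0)) + m.max(0)  # logsumexp over i
    log_z = np.log(np.exp(alpha - alpha.max()).sum()) + alpha.max()
    return score - log_z
```

The negated return value would serve as the training loss in place of the per-token cross-entropy of the linear head.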
In the paper, it's mentioned that 'we select samples based on the prediction confidence of
the student model to further improve the quality of soft labels.' But it's also mentioned that 'we discard all pseudo-labels from the (t-1)-th iteration, and only train the student model using pseudo-labels generated by the teacher model at the t-th iteration'.
Is the first statement saying that the student model's loss is calculated only on the high-confidence pseudo-labels, or is it something else? In the code I couldn't find any other justification for this line. Please suggest.
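A minimal sketch of that reading, i.e. masking out tokens whose teacher probability is below a threshold so the student loss is averaged only over confident pseudo-labels (one possible interpretation of the paper's confidence-based selection, not the repo's exact code):

```python
import numpy as np

def high_confidence_mask(soft_labels, threshold=0.9):
    """soft_labels: (num_tokens, num_classes) teacher probabilities.
    Returns a boolean mask selecting tokens whose top predicted
    probability exceeds the threshold; the student loss would then
    be computed only over the selected tokens."""
    return soft_labels.max(axis=-1) > threshold

probs = np.array([[0.97, 0.02, 0.01],   # confident token -> kept
                  [0.50, 0.30, 0.20]])  # uncertain token -> dropped
mask = high_confidence_mask(probs)
```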
Thank you for the great work. Would you be able to provide the Gazetteers data and distant label generation code? I would like to try BOND on new datasets. @cliang1453 @HMJiangGatech
Thank you in advance
What does self_training_hp_label in the code mean? And what does hp_label in https://github.com/cliang1453/BOND/blob/32f26988a58ee44eb4f50772c6d6c6eb116c83cf/data_utils.py#L111 mean?
Congratulations on this paper getting accepted into KDD 2020!
I'm a Computer Science master's student at the National University of Singapore. I'm hoping to explore whether a better F1 score can be produced by improving the process of distant label generation. Would you have the gazetteers and code used to generate the distant labels to share?
Happy to share my progress with this as it is a semester project I am looking at. I am reachable at [email protected]
Thank you,
Jeanne
Hi,
Thanks for providing the code.
The code works fine with the datasets given in the repository, but if I want to use it to process some other dataset, how should I construct the "words" list seen in your current datasets, from which the attention mask is derived?
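For reference, here is a minimal sketch of one entry in the layout the shipped datasets use (my reading of the provided train.json files): the attention mask is not stored in the file at all, but built later when the tokenizer converts the words to features.

```python
import json

# One sentence in the BOND dataset layout: "str_words" holds raw tokens,
# "tags" holds one integer label id per token (distant labels). The
# attention mask is NOT part of the file; it is produced by the BERT
# tokenizer during feature conversion.
example = {
    "str_words": ["EU", "rejects", "German", "call"],
    "tags": [1, 0, 2, 0],
}
print(json.dumps(example))
```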
Thanks for sharing! I want to ask two questions:
1. The teacher model is just initialized from the student model and generates pseudo-labels, so why use it at all? Why not just use the student model to generate the pseudo-labels?
2. If I have a small fully annotated dataset, how can I combine it with your model?
thanks
Hi,
In /data_utils.py there is a strange point I would like to ask about before trying to implement stage 1.
In the function "read_examples_from_file", there is a line: hp_labels = item["tags_hp"]
However, I cannot find "tags_hp" in the train.json of any of the five datasets you provided.
Would you mind explaining the function of this 'tags_hp'? Thanks a lot.
import json
import os

def read_examples_from_file(data_dir, mode):
    file_path = os.path.join(data_dir, "{}.json".format(mode))
    guid_index = 1
    examples = []
    with open(file_path, 'r') as f:
        data = json.load(f)
        for item in data:
            words = item["str_words"]
            labels = item["tags"]
            # "tags_hp" is a key of the item dict, not of the labels list
            if "tags_hp" in item:
                hp_labels = item["tags_hp"]
            else:
                hp_labels = [None] * len(labels)
            examples.append(InputExample(guid="{}-{}".format(mode, guid_index),
                                         words=words, labels=labels,
                                         hp_labels=hp_labels))
            guid_index += 1
    return examples
Marcus
Thank you for the great work! Just wondering if you would consider comparing BOND with the paper "Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning", since you share similar motivations and the challenge of partially matched labels in distantly supervised NER.
I did notice that you may have used different entity dictionaries and datasets, but it would be great if you can share some comparison results.
Thanks!
I'm having trouble reproducing your results on the CoNLL dataset.
All I have changed is batch sizes:
TRAIN_BATCH=8
EVAL_BATCH=8
The eval loss consistently goes up during the second stage of self-training. Can you help me figure out what I am doing wrong?
Hi,
Thanks for the amazing work.
Just one simple question: if I would like to use the ELECTRA model from the transformers library, and the BertConfig, tokenizer, and model loading are all converted to their ELECTRA counterparts, is there any other code I have to edit in order to make ELECTRA work in the BOND model?
Thanks a lot.
Marcus
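One other place that typically needs editing is the script's model registry keyed by --model_type, assuming run_self_training_ner.py follows the usual MODEL_CLASSES pattern of the transformers examples. A sketch of the edit (the class names are the real HuggingFace transformers ones, shown as strings here only so the sketch runs without the library installed; in the actual file you would import and reference the classes directly):

```python
# Hypothetical sketch: register ELECTRA next to BERT in the script's
# (config, model, tokenizer) registry, then run with --model_type electra
# and an ELECTRA checkpoint. In real code these would be the imported
# classes ElectraConfig, ElectraForTokenClassification, ElectraTokenizer.
MODEL_CLASSES = {
    "bert": ("BertConfig", "BertForTokenClassification", "BertTokenizer"),
    "electra": ("ElectraConfig", "ElectraForTokenClassification",
                "ElectraTokenizer"),
}
```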
Thanks for your great work! Could you please provide the gazetteers information and distant label generation code?
Thank you for the great work. Would you be able to provide the Gazetteers data and distant label generation code? I would like to try BOND on new datasets. My email address is [email protected].
Thank you in advance!
When I was running my own dataset, the following problem arose:
Traceback (most recent call last):
File "run_self_training_ner.py", line 752, in <module>
main()
File "run_self_training_ner.py", line 681, in main
model, global_step, tr_loss, best_dev, best_test = train(args, train_dataset, model_class, config, tokenizer, labels, pad_token_label_id)
File "run_self_training_ner.py", line 353, in train
loss.backward()
File "/anaconda3/envs/bond/lib/python3.7/site-packages/torch/tensor.py", line 150, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/anaconda3/envs/bond/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered
Are there any specific requirements for the data format, such as the sentence length?
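In my experience this kind of device-side assert is usually a tag id outside [0, num_labels) rather than sentence length (length is handled by truncation). A hypothetical helper, not part of the repo, that checks a dataset file before training, assuming the "str_words"/"tags" layout of the provided datasets:

```python
import json

def validate_dataset(path, num_labels, max_len=None):
    """Sanity-check a BOND-style json file. Returns a list of
    (sentence_index, message) problems; an out-of-range tag id is
    the classic cause of 'device-side assert triggered'."""
    problems = []
    with open(path) as f:
        data = json.load(f)
    for i, item in enumerate(data):
        if len(item["str_words"]) != len(item["tags"]):
            problems.append((i, "words/tags length mismatch"))
        for t in item["tags"]:
            if not 0 <= t < num_labels:
                problems.append((i, "tag id %d out of range" % t))
        if max_len and len(item["str_words"]) > max_len:
            problems.append((i, "sentence longer than max_len"))
    return problems
```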
Hi, thanks for the work! I have some questions about the implementation of stage 2.
https://github.com/cliang1453/BOND/blob/master/run_self_training_ner.py#L204-L215
From the code, I can see stage 1 and stage 2 share the same scheduler, which means the learning rate for stage 2 is very small. Is this designed deliberately? The alternative would be to first train a baseline teacher model and pass it to stage 2, so that stage 2 can have its own learning rate scheduler.
I am asking because I think learning rate is very important to BERT model training. Thanks.
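For what it's worth, the alternative amounts to calling transformers' get_linear_schedule_with_warmup again on a fresh step counter at the start of stage 2. The learning-rate multiplier that schedule implements can be sketched in plain Python:

```python
def linear_schedule(step, warmup_steps, total_steps):
    """LR multiplier for a linear warmup/decay schedule. Restarting
    the step counter at 0 for stage 2 makes the multiplier climb back
    to 1.0 instead of staying near the decayed tail of stage 1.
    Sketch only; transformers provides this as
    get_linear_schedule_with_warmup."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```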
Hello, thanks for your good work. I am a beginner in NLP. I want to know how to reproduce the results reported in the paper. What version of transformers was used?
What is the format of the dataset? What are the JSON keys, and how are the IDs assigned? Do these correspond to the BERT tokenizer or something else?
How can I convert a new dataset into this format?
(Line 240 in commit 3651a92)
Hello, a few questions came to mind when I read the code dealing with "soft labels", and I wonder if you could kindly help:
What's the difference between "label_mask" and "attention_mask" here? Since BERT's forward doesn't take "label_mask" as input, including the keyword "label_mask" seems to cause an unexpected-keyword-argument error when inputs is passed to the model.
It seems to me that pred_labels is of shape (batch_size, sequence_length, num_labels), which differs from what BERT's labels argument accepts, i.e. (batch_size, sequence_length) (thus a potential source of a size-mismatch error when passed to BERT's forward). According to the paper, losses associated with soft labels are calculated differently from the BERT loss, but in the code the soft-label loss is also calculated by passing inputs to the BERT model, so I'm a bit confused.
I would be grateful if you could kindly help. Have a nice day!
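On the shape question: a (batch, seq_len, num_labels) soft-label tensor indeed cannot go through the model's own labels argument; the soft-label loss has to be computed outside the forward pass, on the student's logits. A minimal NumPy sketch of that loss (my reading of the approach, not the repo's exact code):

```python
import numpy as np

def soft_label_loss(student_logits, teacher_probs):
    """Per-token cross-entropy against the teacher's soft labels,
    averaged over tokens (equivalent to KL divergence up to the
    teacher's entropy). Both inputs: (num_tokens, num_labels)."""
    # Numerically stable log-softmax of the student logits.
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(teacher_probs * log_probs).sum(axis=-1).mean()
```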
Could you please provide the codes for matching distant labels?