This codebase implements three baseline methods for the EMNLP paper ``Few-Shot Named Entity Recognition: An Empirical Baseline Study''.
Install the required packages with the following command:
$ pip3 install -r requirements.txt
Download the checkpoint of the models pre-trained on WiFine (Wikipedia) and place it in src/pretrained_models/.
To load the model pre-trained on WiFine (Wikipedia) and fine-tune it on the CoNLL-2003 dataset, run:
cd src
bash ./train_lc.sh
By default, this runs 10 rounds of experiments, each with a different set of 5-shot seeds, and enables self-training on the whole dataset.
To run multiple rounds of experiments on various few-shot seeds (e.g., 10 rounds), set
--train_text few_shot_5 --train_ner few_shot_5 --few_shot_sets 10
in the command. ''few_shot_5'' is the common file name of the seed files. The average F1 score over all rounds is printed at the end.
If only one round is needed, set the complete file names for training:
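As a rough illustration of the averaging step, the sketch below computes the mean and sample standard deviation of per-round F1 scores. The function name `summarize_f1` and the score values are hypothetical placeholders, not part of this codebase and not results from the paper.

```python
from statistics import mean, stdev


def summarize_f1(scores):
    """Return (mean, sample standard deviation) of per-round F1 scores."""
    if not scores:
        raise ValueError("need at least one F1 score")
    # The sample standard deviation is undefined for a single round.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Hypothetical F1 scores from 10 rounds (placeholders, not real results).
round_f1 = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
avg, spread = summarize_f1(round_f1)
print(f"mean F1 = {avg:.4f}, std = {spread:.4f}")
```

Reporting the spread alongside the mean is a design choice made here, since few-shot results can vary noticeably from one seed set to another.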
--train_text train.words --train_ner train.ner
Set the files for self-training with:
--unsup_text train.words --unsup_ner train.ner
The labels in ''unsup_ner'' are not used in training. They are used only for evaluation before self-training, to indicate how much improvement self-training could potentially bring.
To disable self-training, remove these two flags.
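The pre-self-training evaluation described above can be pictured as an entity-level F1 between the model's predictions on the unlabeled text and the held-back labels. The following is a minimal sketch under the assumption that tags follow the BIO scheme; `extract_spans` and `entity_f1` are hypothetical helpers, not functions from this repository, and the actual evaluation code may differ.

```python
def extract_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, etype = set(), None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # Open a span on "B-", or leniently on an "I-" with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example: the prediction finds the person but misses the location.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(f"F1 = {entity_f1(gold, pred):.4f}")
```

A low score at this stage suggests the pseudo-labels used for self-training will be noisy; a high score suggests there is less room left to improve.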
If you want to load your own pre-trained model, set
--load_model True --load_model_name path/to/your/model
If you want to load the original pre-trained RoBERTa model (https://arxiv.org/abs/1907.11692) instead, set
--load_model False
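Because `--load_model` takes the literal strings `True` and `False`, it helps to know how such a flag can be parsed safely. The sketch below is an assumption about one reasonable way to do it with `argparse`, not the parser used in this repository: `str2bool`, `build_parser`, and the default values are hypothetical. A plain `type=bool` would treat the string `"False"` as true, which is why an explicit converter is used.

```python
import argparse


def str2bool(value):
    """Convert a command-line string such as 'True' or 'False' to a bool."""
    text = value.strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Build a parser for the two model-loading flags (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--load_model", type=str2bool, default=False)
    parser.add_argument("--load_model_name", type=str, default=None)
    return parser


args = build_parser().parse_args(
    ["--load_model", "True", "--load_model_name", "path/to/your/model"]
)
print(args.load_model, args.load_model_name)
```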
To use the prototype-based methods, run the following command:
bash ./train_proto.sh
In this script, you can also enable or disable multiple runs and customize the pre-trained model.
In our paper, we studied results on 10 benchmark datasets. For the public ones, we provide our few-shot seed sets and the whole dataset here. For the other datasets, which require a license for access, please first obtain the license for the whole dataset and then ask the first author for the sampled few-shot seeds if you want the same seed sets.
Dataset | Domain | Included here |
---|---|---|
CoNLL | News | ✔️ |
Onto | General | ✖️ |
WikiGold | General | ✔️ |
WNUT17 | Social Media | ✔️ |
MITMovie | Review | ✔️ |
MITRestaurant | Review | ✔️ |
SNIPS | Dialogue | ✔️ |
ATIS | Dialogue | ✔️ |
Multiwoz | Dialogue | ✔️ |
i2b2 | Medical | ✖️ |