
titipata / detecting-scientific-claim

Extracting scientific claims from biomedical abstracts (powered by AllenNLP).

Home Page: http://dolores.seas.upenn.edu:5001/

Python 76.35% Shell 0.16% CSS 11.03% HTML 12.45%
natural-language-processing deep-learning allennlp sentence-classification

detecting-scientific-claim's Introduction

Claim Extraction for Scientific Publications

Detecting claims in scientific publications using a discourse model and transfer learning. Models are trained with the AllenNLP library.

Installing as a package

You can install the package with pip, which lets you use the discourse classes as a module:

pip install git+https://github.com/titipata/detecting-scientific-claim.git

After installation, you can use them as follows:

import discourse
predictor = discourse.DiscourseCRFClassifierPredictor()
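
Once constructed, the predictor can presumably be used through AllenNLP's standard Predictor interface. A minimal sketch; the predict_json input key "abstract" is an assumption, not something the README specifies:

import discourse

# hedged sketch: assumes the predictor exposes AllenNLP's standard predict_json,
# and that the expected input key is "abstract" (an assumption)
predictor = discourse.DiscourseCRFClassifierPredictor()
output = predictor.predict_json({"abstract": "We tested drug X in a randomized trial. Drug X reduced mortality."})
print(output)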

Training discourse model

Run AllenNLP to train a discourse model on the PubMed RCT dataset as follows:

allennlp train experiments/pubmed_rct.json -s output --include-package discourse

The data location in pubmed_rct.json points directly to Amazon S3, so you do not need to download the data locally. Change cuda_device to -1 in pubmed_rct.json if you want to run on CPU. More experiments are available in the experiments folder.
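
If you prefer to edit the config programmatically, here is a minimal sketch (it assumes pubmed_rct.json contains the usual AllenNLP "trainer" section):

import json

# switch the experiment config to CPU before training;
# assumes the standard AllenNLP "trainer" block in the experiment file
with open("experiments/pubmed_rct.json") as f:
    config = json.load(f)
config["trainer"]["cuda_device"] = -1
with open("experiments/pubmed_rct.json", "w") as f:
    json.dump(config, f, indent=2)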

Note that you have to remove the output folder before running.

Predicting discourse

We trained a bidirectional LSTM model on structured abstracts from PubMed to predict the discourse probability (RESULTS, METHODS, CONCLUSIONS, BACKGROUND, OBJECTIVE) of a given sentence. You can download the trained model from Amazon S3:

wget https://s3-us-west-2.amazonaws.com/pubmed-rct/model.tar.gz # or model_crf.tar.gz for pretrained model with CRF layer

and run the web service for the discourse prediction task as follows:

bash web_service.sh

To test the trained model with the provided examples in fixtures.json, simply run the following to predict labels:

allennlp predict model.tar.gz \
    pubmed-rct/PubMed_200k_RCT/fixtures.json \
    --include-package discourse \
    --predictor discourse_predictor

or run the following for the CRF model:

allennlp predict model_crf.tar.gz \
    pubmed-rct/PubMed_200k_RCT/fixtures_crf.json \
    --include-package discourse \
    --predictor discourse_crf_predictor

To evaluate the discourse model, you can run the following command:

allennlp evaluate model.tar.gz \
  https://s3-us-west-2.amazonaws.com/pubmed-rct/test.json \
  --include-package discourse
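
The archive can also be loaded programmatically. A minimal sketch, assuming the predictor names match the --predictor flags above:

from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor
import discourse  # importing the package registers the custom model and predictor classes

archive = load_archive("model.tar.gz")
predictor = Predictor.from_archive(archive, "discourse_predictor")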

Predicting claim (web service)

We use transfer learning with fine-tuning to train the claim extraction model from the pre-trained discourse model.

You can run the demo web application to detect claims as follows:

export FLASK_APP=main.py
flask run --host=0.0.0.0 # this will serve at port 5000
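
Once the server is up, you can query it over HTTP. A hypothetical client sketch; the route and form field name below are assumptions, not taken from the repository (check main.py for the actual endpoint):

import requests

# hypothetical endpoint and field name, not confirmed by the repository
resp = requests.post(
    "http://localhost:5000/",
    data={"abstract": "We tested drug X in a randomized trial. Drug X reduced mortality."},
)
print(resp.status_code)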

In the demo interface, highlighted sentences are predicted claims, and the tag after each sentence is the discourse prediction.

Expertly annotated dataset

We release a dataset of 1,500 annotated abstracts containing 11,702 sentences (2,276 annotated as claim sentences), sampled from 110 biomedical journals. The final labels are the majority vote of three expert annotators. The annotations are hosted on Amazon S3 and can be found at the given URLs.
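
As a toy illustration of the majority-vote aggregation (the labels below are made up):

from collections import Counter

# three experts' claim (1) / non-claim (0) votes for one sentence
expert_labels = [1, 1, 0]
majority = Counter(expert_labels).most_common(1)[0][0]  # -> 1, i.e. a claim sentence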

Requirements

Citing the repository

You can cite our paper available on arXiv as

Achakulvisut, Titipat, Chandra Bhagavatula, Daniel Acuna, and Konrad Kording. "Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning." arXiv preprint arXiv:1907.00962 (2019).

or using BibTeX

@article{achakulvisut2019claim,
  title={Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning},
  author={Achakulvisut, Titipat and Bhagavatula, Chandra and Acuna, Daniel and Kording, Konrad},
  journal={arXiv preprint arXiv:1907.00962},
  year={2019}
}

Acknowledgement

This project was done at the Allen Institute for Artificial Intelligence and the Konrad Kording lab, University of Pennsylvania.

detecting-scientific-claim's People

Contributors

97harsh, bluenex, titipata


detecting-scientific-claim's Issues

Minor html fix for demo page

  • Add a footnote with the GitHub link and organizations
  • Change the font as in Bootstrap 4
  • Make the navigation header a bit bigger
  • Add space between the text input and the header
  • Output the prediction with a bar
  • Color the box for the discourse prediction (e.g. OBJECTIVE) - use the color palette from the spaCy demo: BACKGROUND: #f6f3b4, OBJECTIVE: #fb607f, METHODS: #7cb9e8, CONCLUSIONS: #ffa525, RESULTS: #81ff76
  • Remove the highlight when hovering over or clicking the button.

@bluenex!

TODOs for the experiment

List of improvements and things to do, as @csbhagav suggested:

  • Create a snippet to prepare the dataset for annotation
  • Loss for the CRF layer is too high >> check the loss function
  • Add "evaluate_on_test": true to the experiment JSON file
  • Try trainable = true for the GloVe embeddings
  • Add a learning-rate schedule to the experiment
  • Add sequence prediction output for CRF models
  • Alternate training between discourse and claim

Trouble running web_service.sh and following examples from README

I've been trying to use your classifier but encountered a couple of problems:

  1. In your README, the line wget https://s3-us-west-2.amazonaws.com/pubmed-rct/model.tar.gz downloads the model to the main dir, not output/, and the later commands assume the model will be in the output dir.
  2. The commands (and web_service.sh) all use discourse_classifier as a predictor, where I believe they should use discourse_predictor instead; discourse_classifier gives the error allennlp.common.checks.ConfigurationError: 'discourse_classifier is not a registered name for Predictor'
  3. While I was able to detect the previous problems quite quickly and fixing them was trivial, I stumbled across a problem I couldn't overcome: the commands for starting the web server do not work for me. I get
(p37) detecting-scientific-claim$ flask run --host=0.0.0.0
 * Serving Flask app "main.py"
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
/home/student/anaconda/envs/p37/lib/python3.7/site-packages/sklearn/utils/linear_assignment_.py:21: DeprecationWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  DeprecationWarning)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 355287325/355287325 [01:43<00:00, 3420728.39B/s]
/home/student/anaconda/envs/p37/lib/python3.7/site-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 381931118/381931118 [01:36<00:00, 3946648.30B/s]
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

or

(p37) student@gpunode2:~/byczynskaa/detecting-scientific-claim$ bash web_service.sh 
/home/student/anaconda/envs/p37/lib/python3.7/site-packages/sklearn/utils/linear_assignment_.py:21: DeprecationWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  DeprecationWarning)
/home/student/anaconda/envs/p37/lib/python3.7/site-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
Model loaded, serving demo on port 8000

But when I go to localhost:port, I get "This site can't be reached" :(

  4. Also, since I downloaded the latest allennlp, the flags might have changed; the evaluate command should be of the form allennlp evaluate model.tar.gz https://s3-us-west-2.amazonaws.com/pubmed-rct/test.json --include-package discourse

Could you help me fix issue 3?
Thanks!

BNN Classifier

I cannot find any reference to DiscourseBNNClassifier (used in e.g. pubmed_rct_bnn.json) in the "Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning" paper. Can you provide more information or links to papers about this architecture?

'DiscourseCrfClassifier' object has no attribute 'classifier_feedforward'

So I tried running your transfer_learning_crf.py script and it throws back this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-fd6f08fec54e> in <module>
     47     num_classes, constraints, include_start_end_transitions = 2, None, False
     48     model.classifier_feedforward._linear_layers = ModuleList([torch.nn.Linear(2 * EMBEDDING_DIM, EMBEDDING_DIM), 
---> 49                                                               torch.nn.Linear(EMBEDDING_DIM, num_classes)])
     50     model.crf = ConditionalRandomField(num_classes, constraints, 
     51                                        include_start_end_transitions=include_start_end_transitions)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    592                 return modules[name]
    593         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 594             type(self).__name__, name))
    595 
    596     def __setattr__(self, name, value):

AttributeError: 'DiscourseCrfClassifier' object has no attribute 'classifier_feedforward'

In fact, DiscourseCrfClassifier doesn't have that attribute, as it was removed in an earlier commit.

I tried commenting out the line that tries to use the attribute, but it then gives me a different error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-331d1897d4d1> in <module>
    102         cuda_device=-1
    103     )
--> 104     trainer.train()
    105 
    106     # unfreeze most layers and continue training

~/anaconda3/lib/python3.7/site-packages/allennlp/training/trainer.py in train(self)
    476         for epoch in range(epoch_counter, self._num_epochs):
    477             epoch_start_time = time.time()
--> 478             train_metrics = self._train_epoch(epoch)
    479 
    480             # get peak of memory usage

~/anaconda3/lib/python3.7/site-packages/allennlp/training/trainer.py in _train_epoch(self, epoch)
    318             self.optimizer.zero_grad()
    319 
--> 320             loss = self.batch_loss(batch_group, for_training=True)
    321 
    322             if torch.isnan(loss):

~/anaconda3/lib/python3.7/site-packages/allennlp/training/trainer.py in batch_loss(self, batch_group, for_training)
    259             batch = batch_group[0]
    260             batch = nn_util.move_to_device(batch, self._cuda_devices[0])
--> 261             output_dict = self.model(**batch)
    262 
    263         try:

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/media/sf_COVID19KTool/COVID19KTool/discourse/models/discourse_crf_model.py in forward(self, sentences, labels)
     77 
     78         # CRF prediction
---> 79         logits = self.label_projection_layer(encoded_sentences) # size: (n_batch, n_sents, n_classes)
     80         best_paths = self.crf.viterbi_tags(logits, sentence_masks)
     81         predicted_labels = [x for x, y in best_paths]

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/allennlp/modules/time_distributed.py in forward(self, pass_through, *inputs, **kwargs)
     49             reshaped_kwargs[key] = value
     50 
---> 51         reshaped_outputs = self._module(*reshaped_inputs, **reshaped_kwargs)
     52 
     53         if some_input is None:

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1608     if input.dim() == 2 and bias is not None:
   1609         # fused op is marginally faster
-> 1610         ret = torch.addmm(bias, input, weight.t())
   1611     else:
   1612         output = input.matmul(weight.t())

RuntimeError: size mismatch, m1: [640 x 600], m2: [400 x 2] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41

Am I doing something wrong? And could you update the transfer learning script to fit this change?

Thanks.
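
For reference, a hedged sketch of adapting the head replacement to the current model, which exposes label_projection_layer (visible in the traceback) instead of the removed classifier_feedforward; the 600-dimensional encoder output is an assumption inferred from the size-mismatch error (m1: [... x 600]):

import torch
from allennlp.models.archival import load_archive
from allennlp.modules.time_distributed import TimeDistributed
from allennlp.modules.conditional_random_field import ConditionalRandomField
import discourse  # registers DiscourseCrfClassifier

# load the pretrained CRF discourse model
model = load_archive("model_crf.tar.gz").model

# replace the output head for the 2-class claim task
num_classes = 2
model.label_projection_layer = TimeDistributed(torch.nn.Linear(600, num_classes))
model.crf = ConditionalRandomField(num_classes, None, include_start_end_transitions=False)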

ConfigurationError: 'key "type" is required at location "dataset_reader."

I'm trying to replicate the results in this paper. I installed allennlp via pip and tried running the command allennlp train experiments/pubmed_rct.json -s output --include-package discourse but I keep getting the following error:

File "/Users/praghav/anaconda3/lib/python3.6/site-packages/allennlp/commands/train.py", line 138, in datasets_from_params
    dataset_reader = DatasetReader.from_params(params.pop('dataset_reader'))
  File "/Users/praghav/anaconda3/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 115, in from_params
    return cls.by_name(choice).from_params(params)
  File "/Users/praghav/anaconda3/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 114, in from_params
    choice = params.pop_choice('type', cls.list_available())
  File "/Users/praghav/anaconda3/lib/python3.6/site-packages/allennlp/common/params.py", line 180, in pop_choice
    value = self.pop(key, default)
  File "/Users/praghav/anaconda3/lib/python3.6/site-packages/allennlp/common/params.py", line 96, in pop
    raise ConfigurationError("key \"{}\" is required at location \"{}\"".format(key, self.history))
allennlp.common.checks.ConfigurationError: 'key "type" is required at location "dataset_reader."'

Why isn't it reading the dataset_reader from the discourse package as I guess was intended? Any idea how to fix this?

Edit annotation tool

Now, we have a light version in the annotation_tool folder.

TODOs

  • Move the start-tagging button to the middle
  • Move the submit button on the tagging page to the middle
  • Fix the nav bar on laptop and mobile screens
  • Add a progress bar showing how many tags a user has already done
  • Fix the log-in page, which has a weird color right now
  • Display the CLAIM tag on a sentence after clicking, or highlight the sentence

I want to install the discourse classes, using PyPi

I was trying to install detecting-scientific-claim using pip, i.e. using pip install git+

I noticed there is no setup file defined.
Can I open a PR with this change, so that the package can be installed using the said method?

Running demo in environment

This is just a note on running the demo in the given Python 3.6 environment:

source activate py34
export FLASK_APP=main.py
python3 -m flask run --host=0.0.0.0 --port 5000 # in case default is python 2

Wrong masking in sentences to vector encoding?

Here, it seems that you want to extract the mask per sentence rather than across sentences:

sentence_masks = util.get_text_field_mask(sentences)
# get sentence embedding
encoded_sentences = []
n_sents = embedded_sentences.size()[1]  # size: (n_batch, n_sents, n_tokens, n_embedding)
for i in range(n_sents):
    encoded_sentences.append(self.sentence_encoder(embedded_sentences[:, i, :, :], sentence_masks))

This can be done by changing line 62 to

sentence_masks = util.get_text_field_mask(sentences, 1)

and line 68 to

encoded_sentences.append(self.sentence_encoder(embedded_sentences[:, i, :, :], sentence_masks[i]))
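
Putting the two changes together, a sketch of the proposed per-sentence masking; note that with num_wrapping_dims=1 the mask has shape (n_batch, n_sents, n_tokens), so selecting sentence i across the batch arguably needs [:, i, :] rather than [i] (this indexing is my assumption):

sentence_masks = util.get_text_field_mask(sentences, num_wrapping_dims=1)
# get sentence embedding, masking each sentence separately
encoded_sentences = []
n_sents = embedded_sentences.size()[1]  # size: (n_batch, n_sents, n_tokens, n_embedding)
for i in range(n_sents):
    encoded_sentences.append(
        self.sentence_encoder(embedded_sentences[:, i, :, :], sentence_masks[:, i, :])
    )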

Collect corrections

It would be so cool if people could see the result of the parsing, with a drop-down menu to correct the results. Those corrections could then be used regularly to improve the parsing, so the model gets better over time.

Error loading model using load_archive from local path

This is the same as: allenai/allennlp#4431 (comment)

Description

I'm trying to load a model saved to a local path in the Kaggle environment using the allennlp.models.load_archive() method, but I get the error below:
"ConfigurationError: discourse_classifier is not a registered name for Model. You probably need to use the --include-package flag to load your custom code. Alternatively, you can specify your choices using fully-qualified paths, e.g. {"model": "my_module.models.MyModel"} in which case they will be automatically imported correctly."

Model location: https://s3-us-west-2.amazonaws.com/pubmed-rct/model.tar.gz

Traceback:

<ipython-input-21-1d9004d63a1a> in <module>
----> 1 archive = load_archive("/kaggle/input/modelbase") ## available at github
      2 predictor = Predictor.from_archive(archive, 'discourse_crf_predictor')
      3 gc.collect()

/opt/conda/lib/python3.7/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, opt_level, overrides, weights_file)
    195         serialization_dir=serialization_dir,
    196         cuda_device=cuda_device,
--> 197         opt_level=opt_level,
    198     )
    199 

/opt/conda/lib/python3.7/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device, opt_level)
    389         # This allows subclasses of Model to override _load.
    390 
--> 391         model_class: Type[Model] = cls.by_name(model_type)  # type: ignore
    392         if not isinstance(model_class, type):
    393             # If you're using from_archive to specify your model (e.g., for fine tuning), then you

/opt/conda/lib/python3.7/site-packages/allennlp/common/registrable.py in by_name(cls, name)
    135         """
    136         logger.debug(f"instantiating registered subclass {name} of {cls}")
--> 137         subclass, constructor = cls.resolve_class_name(name)
    138         if not constructor:
    139             return subclass

/opt/conda/lib/python3.7/site-packages/allennlp/common/registrable.py in resolve_class_name(cls, name)
    183             # is not a qualified class name
    184             raise ConfigurationError(
--> 185                 f"{name} is not a registered name for {cls.__name__}. "
    186                 "You probably need to use the --include-package flag "
    187                 "to load your custom code. Alternatively, you can specify your choices "

Environment

OS: Kaggle kernel

Python version: 3.7.6

Output of pip freeze:

absl-py==0.9.0
adal==1.2.2
affine==2.3.0
aiohttp==3.6.2
alabaster==0.7.12
albumentations==0.4.5
alembic==1.4.2
allennlp==1.0.0
altair==4.1.0
anaconda-client==1.7.2
anaconda-project==0.8.3
annoy==1.16.3
ansiwrap==0.8.4
appdirs==1.4.3
argh==0.26.2
arrow==0.15.5
arviz==0.8.3
asn1crypto==1.3.0
astroid==2.3.3
astropy==4.0.1.post1
astunparse==1.6.3
async-generator==1.10
async-timeout==3.0.1
atomicwrites==1.3.0
attrs==19.3.0
audioread==2.1.8
autopep8==1.5.1
Babel==2.8.0
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
Baker==1.3
basemap==1.2.1
bayesian-optimization @ git+https://github.com/fmfn/BayesianOptimization.git@cb28df83f757c5c7406b2730eac3a67a2d0270a5
bayespy==0.5.19
bcolz==1.2.1
beautifulsoup4==4.9.0
binaryornot==0.4.4
biopython==1.77
bitarray==1.2.1
bkcharts==0.2
black==19.10b0
bleach==3.1.4
blinker==1.4
blis==0.4.1
bokeh==2.0.1
Boruta==0.3
boto==2.49.0
boto3==1.14.6
botocore==1.17.6
Bottleneck==1.3.2
-e git+https://github.com/SohierDane/BigQuery_Helper@8615a7f6c1663e7f2d48aa2b32c2dbcb600a440f#egg=bq_helper
bqplot==0.12.12
branca==0.4.1
brewer2mpl==1.4.1
brotlipy==0.7.0
cachetools==3.1.1
cairocffi==1.1.0
CairoSVG==2.4.2
Cartopy @ file:///home/conda/feedstock_root/build_artifacts/cartopy_1588596947365/work
catalogue==1.0.0
catalyst==20.6
catboost==0.23.2
category-encoders @ git+https://github.com/scikit-learn-contrib/categorical-encoding.git@ea9428a896bef77baf8b26159f7030cee924e916
certifi==2020.4.5.2
cesium==0.9.12
cffi==1.14.0
cftime==1.1.3
chainer==7.4.0
chainer-chemistry==0.7.0
chainercv==0.13.1
chardet==3.0.4
cleverhans==3.0.1
click==7.1.1
click-plugins==1.1.1
cliff==3.3.0
cligj==0.5.0
cloud-tpu-client==0.10
cloudpickle==1.3.0
clyent==1.2.2
cmaes==0.5.0
cmd2==1.1.0
cmdstanpy==0.4.0
cmudict==0.4.4
colorama==0.4.3
colorcet==2.0.2
colorlog==4.1.0
colorlover==0.3.0
conda==4.8.3
conda-package-handling==1.6.0
ConfigArgParse==1.2.3
configparser==5.0.0
confuse==1.1.0
contextily==1.0.0
contextlib2==0.6.0.post1
convertdate==2.2.1
conx==3.7.10
cookiecutter==1.7.0
coverage==5.1
crc32c==2.0
cryptography==2.8
cssselect2==0.3.0
cufflinks==0.17.3
CVXcanon==0.1.1
cvxpy==1.1.1
cycler==0.10.0
cymem==2.0.3
cysignals==1.10.2
Cython==0.29.20
cytoolz==0.10.1
dask==2.18.1
dask-glm==0.2.0
dask-ml==1.5.0
dask-xgboost==0.1.10
datashader==0.11.0
datashape==0.5.2
deap==1.3.1
decorator==4.4.2
deepdish==0.3.6
defusedxml==0.6.0
Delorean==1.0.0
Deprecated @ file:///home/conda/feedstock_root/build_artifacts/deprecated_1589409885623/work
deprecation==2.1.0
descartes==1.1.0
diff-match-patch==20181111
dill==0.3.2
dipy==1.1.1
distributed==2.14.0
dlib==19.20.0
docker==4.2.0
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.15.2
earthengine-api==0.1.226
ecos==2.0.7.post1
eli5==0.10.1
emoji==0.5.4
en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.2.5/en_core_web_lg-2.2.5.tar.gz
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
entrypoints==0.3
ephem==3.7.7.1
essentia==2.1b6.dev234
et-xmlfile==1.0.1
fancyimpute==0.5.4
fastai==1.0.61
fastcache==1.1.0
fastprogress==0.2.3
fasttext==0.9.2
fbpca==1.0
fbprophet==0.6
feather-format==0.4.1
featuretools==0.16.0
filelock==3.0.10
Fiona==1.8.13
fitter==1.2.1
flake8==3.7.9
flashtext==2.7
Flask==1.1.2
folium==0.11.0
fsspec==0.7.2
funcy==1.14
fury==0.5.1
future==0.18.2
fuzzywuzzy==0.18.0
gast==0.3.3
gatspy==0.3
gcsfs==0.6.1
GDAL==3.0.4
gensim==3.8.3
geographiclib==1.50
Geohash==1.0
geojson==2.5.0
geopandas==0.6.3
geoplot==0.4.1
geopy==1.22.0
geoviews==1.8.1
gevent==1.5.0
ggplot @ https://github.com/hbasria/ggpy/archive/0.11.5.zip
gitdb==4.0.4
GitPython==3.1.1
glob2==0.7
gluoncv==0.7.0
gluonnlp==0.9.1
gmpy2==2.1.0b1
google==2.0.3
google-api-core==1.17.0
google-api-python-client==1.8.0
google-auth==1.14.0
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.1
google-cloud-automl==0.10.0
google-cloud-bigquery==1.12.1
google-cloud-bigtable==1.2.1
google-cloud-core==1.3.0
google-cloud-dataproc==0.7.0
google-cloud-datastore==1.12.0
google-cloud-firestore==1.6.2
google-cloud-kms==1.4.0
google-cloud-language==1.3.0
google-cloud-logging==1.15.0
google-cloud-pubsub==1.4.3
google-cloud-scheduler==1.2.1
google-cloud-spanner==1.15.1
google-cloud-speech==1.3.2
google-cloud-storage==1.27.0
google-cloud-tasks==1.5.0
google-cloud-translate==2.0.1
google-cloud-videointelligence==1.14.0
google-cloud-vision==1.0.0
google-pasta==0.2.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
gplearn==0.4.1
gpxpy==1.4.1
gql==0.2.0
graphql-core==1.1
graphviz==0.8.4
greenlet==0.4.15
grpc-google-iam-v1==0.12.3
grpcio==1.29.0
grpcio-gcp==0.2.2
gym==0.17.2
h2o==3.30.0.5
h5py==2.10.0
haversine==2.2.0
heamy==0.0.7
HeapDict==1.0.1
hep-ml==0.6.1
hmmlearn==0.2.3
holidays==0.10.2
holoviews==1.13.2
hpsklearn==0.1.0
html5lib==1.0.1
htmlmin==0.1.12
httplib2==0.17.2
httplib2shim==0.0.3
humanize==2.4.0
hunspell==0.5.5
husl==4.0.3
hyperopt==0.2.4
hypertools==0.6.2
hypothesis==5.10.0
ibis-framework==1.3.0
idna==2.9
imagecodecs==2020.5.30
ImageHash==4.1.0
imageio==2.8.0
imagesize==1.2.0
imbalanced-learn==0.7.0
imgaug==0.2.6
implicit==0.4.2
importlib-metadata==1.6.0
intervaltree==3.0.2
ipykernel==5.1.1
ipython==7.13.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.5.1
iso3166==1.0.1
isort==4.3.21
isoweek==1.3.3
itsdangerous==1.1.0
Janome==0.3.10
jdcal==1.4.1
jedi==0.15.2
jeepney==0.4.3
jieba==0.42.1
Jinja2==2.11.2
jinja2-time==0.2.0
jmespath==0.10.0
joblib==0.14.1
json5==0.9.0
jsonnet==0.16.0
jsonpickle==1.4.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-aihub-deploy-extension==0.1
jupyter-client==6.1.3
jupyter-console==6.1.0
jupyter-core==4.6.3
jupyter-http-over-ws==0.0.8
jupyterlab==1.2.10
jupyterlab-git==0.10.0
jupyterlab-server==1.1.1
kaggle==1.5.6
kaggle-environments==1.0.8
Keras==2.4.0
Keras-Preprocessing==1.1.2
keras-tuner==1.0.1
keyring==21.1.1
kiwisolver==1.2.0
kmapper==1.2.0
kmeans-smote==0.1.2
kmodes==0.10.2
knnimpute==0.1.0
korean-lunar-calendar==0.2.1
kornia==0.3.1
kubernetes==10.1.0
langid==1.1.6
Lasagne @ git+git://github.com/Lasagne/Lasagne.git@5d3c63cb315c50b1cbd27a6bc8664b406f34dd99
lazy-object-proxy==1.4.3
learntools @ git+https://github.com/Kaggle/learntools@886f5c215de079287f21e2d3a92bd852fb95d105
leven==1.0.4
libarchive-c==2.9
librosa==0.7.2
lief==0.9.0
lightfm==1.15
lightgbm==2.3.1
lime==0.2.0.0
line-profiler==3.0.2
llvmlite==0.31.0
lml==0.0.9
locket==0.2.0
LunarCalendar==0.0.9
lxml==4.5.0
Mako==1.1.3
mapclassify==2.3.0
marisa-trie==0.7.5
Markdown==3.2.1
markovify==0.8.2
MarkupSafe==1.1.1
matplotlib==3.2.1
matplotlib-venn==0.11.5
mccabe==0.6.1
memory-profiler==0.57.0
mercantile==1.1.5
missingno==0.4.2
mistune==0.8.4
mizani==0.7.1
mkl-fft==1.1.0
mkl-random==1.1.0
mkl-service==2.3.0
ml-metrics==0.1.4
mlcrate==0.2.0
mlens==0.2.3
mlxtend==0.17.2
mmh3==2.5.1
mne==0.20.7
mnist==0.2.2
mock==3.0.5
more-itertools==8.2.0
mpld3==0.5.1
mplleaflet==0.0.5
mpmath==1.1.0
msgpack==0.6.2
msgpack-numpy==0.4.6.post0
multidict==4.7.6
multipledispatch==0.6.0
multiprocess==0.70.10
munch==2.5.0
murmurhash==1.0.2
mxnet==1.6.0
mypy-extensions==0.4.3
nb-conda==2.2.1
nb-conda-kernels==2.2.3
nbclient==0.2.0
nbconvert==5.6.1
nbdime==2.0.0
nbformat==5.0.6
nbpresent==3.0.2
nervananeon @ file:///usr/local/src/neon
nest-asyncio==1.3.2
netCDF4==1.5.3
networkx==2.4
nibabel==3.1.0
nilearn==0.6.2
nltk==3.2.4
nnabla==1.8.0
nolearn==0.6.1
nose==1.3.7
notebook==5.5.0
notebook-executor==0.2
numba==0.48.0
numexpr==2.7.1
numpy==1.18.1
numpydoc==0.9.2
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.0.1
odfpy==1.4.1
olefile==0.46
onnx==1.7.0
opencv-python==4.2.0.34
openpyxl==3.0.3
openslide-python @ git+git://github.com/rosbo/openslide-python.git@6bb6e3dbae448fe9ccf21a5a2078e9d7e890153c
opt-einsum==3.2.1
optuna==1.5.0
orderedmultidict==1.0.1
ortools==7.7.7810
osmnx==0.14.1
osqp==0.6.1
overrides==3.0.0
OWSLib @ file:///home/conda/feedstock_root/build_artifacts/owslib_1591376955812/work
packaging==20.1
palettable==3.3.0
pandas==1.0.3
pandas-datareader==0.8.1
pandas-profiling==2.6.0
pandas-summary==0.0.7
pandasql==0.7.3
pandoc==1.0.2
pandocfilters==1.4.2
panel==0.9.5
papermill==2.1.0
param==1.9.3
parso==0.5.2
partd==1.1.0
path==13.1.0
path.py==12.4.0
pathlib2==2.3.5
pathos==0.2.6
pathspec==0.8.0
pathtools==0.1.2
patsy==0.5.1
pbr==5.4.5
pdf2image==1.13.1
PDPbox @ git+https://github.com/SauceCat/PDPbox@73c69665f1663b53984e187c7bc8996e25fea18e
pep8==1.7.1
pexpect==4.8.0
phik==0.9.11
pickleshare==0.7.5
Pillow==5.4.1
pkginfo==1.5.0.1
plac==1.1.3
plotly==4.8.1
plotly-express==0.4.1
plotnine==0.7.0
pluggy==0.13.0
ply==3.11
polyglot==16.7.4
portalocker==1.7.0
posix-ipc==1.0.4
pox==0.2.8
poyo==0.5.0
ppca==0.0.4
ppft==1.6.6.2
preprocessing==0.1.13
preshed==3.0.2
prettytable==0.7.2
prometheus-client==0.7.1
promise==2.3
prompt-toolkit==3.0.5
pronouncing==0.2.0
protobuf==3.12.2
psutil==5.7.0
ptyprocess==0.6.0
pudb==2019.2
py==1.8.1
py-cpuinfo==6.0.0
py-lz4framed==0.14.0
py-spy==0.3.3
py-stringmatching==0.4.1
py-stringsimjoin==0.3.1
pyahocorasick==1.4.0
pyaml==20.4.0
PyArabic==0.6.7
pyarrow==0.16.0
pyasn1==0.4.8
pyasn1-modules==0.2.7
PyAstronomy==0.15.0
pybind11==2.5.0
PyBrain==0.3
pycairo==1.19.1
pycodestyle==2.5.0
pycosat==0.6.3
pycountry==19.8.18
pycparser==2.20
pycrypto==2.6.1
pyct==0.4.6
pycurl==7.43.0.5
pydash==4.8.0
pydicom==2.0.0
pydocstyle==5.0.2
pydot==1.4.1
pydub==0.24.1
pyemd==0.5.1
pyepsg==0.4.0
pyexcel-io==0.5.20
pyexcel-ods==0.5.6
pyfasttext==0.4.6
pyflakes==2.1.1
pyglet==1.5.0
Pygments==2.6.1
PyJWT==1.7.1
pykalman==0.9.5
pyLDAvis==2.1.2
pylint==2.4.4
pymc3==3.9.1
PyMeeus==0.3.7
pymongo==3.10.1
Pympler==0.8
pyocr==0.7.2
pyodbc==4.0.30
pyOpenSSL==19.1.0
pypandoc==1.5
pyparsing==2.4.7
pyPdf==1.13
pyperclip==1.8.0
PyPrind==2.11.2
pyproj @ file:///home/conda/feedstock_root/build_artifacts/pyproj_1588596070165/work
PyQt5==5.12.3
PyQt5-sip==4.19.18
PyQtWebEngine==5.12.1
pyrsistent==0.16.0
pysal==2.1.0
pyshp==2.1.0
PySocks==1.7.1
pystan==2.19.1.1
pytagcloud==0.3.5
pytesseract==0.3.4
pytest==5.4.1
pytest-arraydiff==0.3
pytest-astropy==0.7.0
pytest-astropy-header==0.1.2
pytest-cov==2.10.0
pytest-doctestplus==0.4.0
pytest-mock==3.1.1
pytest-openfiles==0.4.0
pytest-remotedata==0.3.1
pytext-nlp==0.1.2
python-dateutil==2.8.1
python-editor==1.0.4
python-igraph @ file:///home/conda/feedstock_root/build_artifacts/python-igraph_1588168236405/work
python-jsonrpc-server==0.3.4
python-language-server==0.31.10
python-Levenshtein==0.12.0
python-louvain==0.14
python-slugify==4.0.0
pytorch-ignite==0.3.0
pytz==2019.3
PyUpSet==0.1.1.post7
pyviz-comms==0.7.5
PyWavelets==1.1.1
pyxdg==0.26
PyYAML==5.3.1
pyzmq==19.0.0
QDarkStyle==2.8.1
qgrid==1.3.1
QtAwesome==0.7.1
qtconsole==4.7.3
QtPy==1.9.0
randomgen==1.16.6
rasterio==1.1.5
ray==0.8.5
redis==3.4.1
regex==2020.4.4
requests==2.23.0
requests-oauthlib==1.2.0
resampy==0.2.2
retrying==1.3.3
rgf-python==3.8.0
rope==0.16.0
rsa==4.0
Rtree==0.9.4
ruamel-yaml==0.15.80
s2sphere==0.2.5
s3fs==0.4.2
s3transfer==0.3.3
sacred==0.8.1
sacremoses==0.0.43
scattertext==0.0.2.65
scikit-image==0.16.2
scikit-learn==0.23.1
scikit-multilearn==0.2.0
scikit-optimize==0.7.4
scikit-plot==0.3.7
scikit-surprise==1.1.0
scipy==1.4.1
scs==2.1.2
seaborn==0.10.0
SecretStorage==3.1.2
Send2Trash==1.5.0
sentencepiece==0.1.91
sentry-sdk==0.15.1
setuptools-git==1.2
shap==0.35.0
Shapely==1.7.0
shortuuid==1.0.1
simplegeneric==0.8.1
SimpleITK==1.2.4
simplejson==3.17.0
singledispatch==3.4.0.3
sip==4.19.20
six==1.14.0
sklearn==0.0
sklearn-contrib-py-earth @ git+git://github.com/scikit-learn-contrib/py-earth.git@dde5f899255411a7b9cbbabf93a817eff4b02e5e
sklearn-pandas==1.8.0
smart-open==2.0.0
smhasher==0.150.1
smmap==3.0.2
snowballstemmer==2.0.0
snuggs==1.4.7
sortedcollections==1.1.2
sortedcontainers==2.1.0
SoundFile==0.10.3.post1
soupsieve==1.9.4
spacy==2.2.4
spectral==0.21
Sphinx==3.0.2
sphinx-rtd-theme==0.2.4
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
sphinxcontrib-websupport==1.2.1
spyder==4.1.2
spyder-kernels==1.9.0
SQLAlchemy==1.3.16
sqlparse==0.3.1
squarify==0.4.3
srsly==1.0.2
statsmodels==0.11.1
stemming==1.0.1
stevedore==2.0.0
stop-words==2018.7.23
stopit==1.1.2
subprocess32==3.5.4
svgwrite==1.4
sympy==1.5.1
tables==3.6.1
tabulate==0.8.7
tangled-up-in-unicode==0.0.4
tblib==1.6.0
tenacity==6.1.0
tensorboard==2.2.2
tensorboard-plugin-wit==1.6.0.post3
tensorboardX==2.0
tensorflow @ file:///tmp/tensorflow_cpu/tensorflow-2.2.0-cp37-cp37m-linux_x86_64.whl
tensorflow-addons @ file:///tmp/tfa_cpu/tensorflow_addons-0.10.0-cp37-cp37m-linux_x86_64.whl
tensorflow-datasets==3.1.0
tensorflow-estimator==2.2.0
tensorflow-gcs-config @ file:///tmp/tensorflow_gcs_config/tensorflow_gcs_config-2.1.7-py3-none-any.whl
tensorflow-hub==0.8.0
tensorflow-metadata==0.22.2
tensorflow-probability==0.10.0
Tensorforce==0.5.5
tensorpack==0.10.1
termcolor==1.1.0
terminado==0.8.3
terminalplot==0.3.0
terminaltables==3.1.0
testpath==0.4.4
text-unidecode==1.3
textblob==0.15.3
texttable==1.6.2
textwrap3==0.9.2
Theano==1.0.4
thinc==7.4.0
threadpoolctl==2.1.0
tifffile==2020.6.3
tinycss2==1.0.2
tokenizers==0.7.0
toml==0.10.0
toolz==0.10.0
torch==1.5.0
torchaudio==0.5.0a0+738ccba
torchtext==0.6.0
torchvision==0.6.0a0+35d732a
tornado==5.0.2
TPOT==0.11.5
tqdm==4.45.0
traitlets==4.3.3
traittypes==0.2.1
transformers==2.11.0
trueskill==0.4.5
tsfresh==0.16.0
typed-ast==1.4.1
typeguard==2.9.1
typing==3.7.4.1
typing-extensions==3.7.4.1
tzlocal==2.1
ujson==1.35
umap-learn @ https://api.github.com/repos/lmcinnes/umap/tarball/5f9488a9540d1e0ac149e2dd42ebf03c39706110
unicodecsv==0.14.1
Unidecode==1.1.1
update-checker==0.17
uritemplate==3.0.1
urllib3==1.24.3
urwid==2.1.0
vecstack==0.4.0
visions==0.4.1
vowpalwabbit==8.8.1
vtk==8.1.2
Wand==0.5.3
wandb==0.9.1
wasabi==0.6.0
watchdog==0.10.2
wavio==0.0.4
wcwidth==0.1.9
webencodings==0.5.1
websocket-client==0.57.0
Werkzeug==1.0.1
wfdb==3.0.1
whichcraft==0.6.1
widgetsnbextension==3.5.1
Wordbatch==1.4.6
wordcloud==1.7.0
wordsegment==1.3.1
wrapt==1.11.2
wurlitzer==2.0.0
xarray==0.15.1
xgboost==1.1.1
xlrd==1.2.0
XlsxWriter==1.2.8
xlwt==1.3.0
xvfbwrapper==0.2.9
yapf==0.29.0
yarl==1.4.2
yellowbrick==1.1
zict==2.0.0
zipp==3.1.0

Steps to reproduce

Example source:

from allennlp.models.archival import load_archive
archive = load_archive(<insert-model-path>)
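
A hedged sketch of the programmatic equivalent of --include-package, which the error message itself suggests: import the discourse package before loading, so its registered names are available:

from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor
import discourse  # registers "discourse_classifier" and the custom predictors

archive = load_archive("/kaggle/input/modelbase")  # local path from the traceback above
predictor = Predictor.from_archive(archive, "discourse_crf_predictor")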

Annotation crashes from bad request

We got some bad requests like the following:

165.123.5.225 - - [18/Sep/2018 08:40:36] code 400, message Bad request syntax ('\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x01ûΩ?^?ß¾sk\x92þ%[årÅkbwðW\x103\x9cð"\x81lqÛÅ\x00\x00\x94À\x14À')
165.123.5.225 - - [18/Sep/2018 08:40:36] "üûΩ?^?ß¾skþ%[årÅkbwðW3ð"lqÛÅÀÀ" HTTPStatus.BAD_REQUEST -
165.123.5.225 - - [18/Sep/2018 08:40:51] code 400, message Bad HTTP/0.9 request type ('\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03*²Ë\x18V\x10sé')
WïÏ(Î(=ER§&3½Qm4ª+äÀ0À,À(À$ÀÀ" HTTPStatus.BAD_REQUEST -
165.123.5.225 - - [18/Sep/2018 08:41:02] code 400, message Bad request version ('\x8a¼¦Ö§\x91\x92~Ð\x9cB\x97\x05§\x92\x14l\x03:\x00\x00\x94À\x14À')
165.123.5.225 - - [18/Sep/2018 08:41:02] "üû-ç
@e¾¼¦Ö§~ÐB§l:ÀÀ" HTTPStatus.BAD_REQUEST -

crf_pubmed_rct config

Should crf_pubmed_rct.json have the following?

"classifier_feedforward": {
"input_dim": 600,
"num_layers": 2,
"hidden_dims": [300, 5],
"activations": ["relu", "linear"],
"dropout": [0.2, 0.0]
}

The config runs only after removing this; otherwise it gives the error allennlp.common.checks.ConfigurationError: "Extra parameters passed to DiscourseCrfClassifier:

Also, do any of the config JSON files correspond to a Bi-LSTM CRF? They seem to be either a Bi-LSTM or a CRF.
