
sapienzanlp / extend

172 stars · 6 forks · 12 issues · 73 KB

Entity Disambiguation as text extraction (ACL 2022)

License: Other

Dockerfile 1.37% Shell 5.63% Python 93.00%
natural-language-processing nlp entity-disambiguation entity-linking entity-disambiguation-models text-extraction pytorch acl acl2022

extend's People

Contributors

poccio


extend's Issues

Web service

Hi! I just followed the steps for running the Docker image. The UI works on port 22001, but I can't reach the API on port 22002: I get "Empty reply from server" both when sending data and when opening /docs.
Any idea how to solve it?
Thanks
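
For reference, a request of the failing shape (the payload format is assumed from the requests example in the "REST service not working" issue below; the port is the one mapped by the Docker setup):

import requests

# Hypothetical reproduction of the failing call against the dockerized API.
resp = requests.post(
    url="http://127.0.0.1:22002/",
    data='[{"text": "Bob Dylan is a famous singer."}]',
)
print(resp.status_code, resp.text)  # per the report, this fails with an empty reply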

FileNotFoundError while adding extend to the spaCy nlp pipeline

I am using the same classy version mentioned in requirements.txt.

Traceback (most recent call last):
  File "spacy_extend.py", line 13, in <module>
    nlp.add_pipe("extend", after="ner", config=extend_config)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/spacy/language.py", line 792, in add_pipe
    pipe_component = self.create_pipe(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/spacy/language.py", line 674, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 746, in resolve
    resolved, _ = cls._make(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 795, in _make
    filled, _, resolved = cls._fill(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 867, in _fill
    getter_result = getter(*args, **kwargs)
  File "/home/vasista/extend/extend/spacy_component.py", line 86, in __init__
    self.model = load_checkpoint(checkpoint_path, device)
  File "/home/vasista/extend/extend/spacy_component.py", line 22, in load_checkpoint
    model = load_classy_module_from_checkpoint(checkpoint_path)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/classy/utils/lightning.py", line 57, in load_classy_module_from_checkpoint
    conf = load_training_conf_from_checkpoint(checkpoint_path)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/classy/utils/lightning.py", line 23, in load_training_conf_from_checkpoint
    conf = OmegaConf.load(f"{experiment_folder}/.hydra/{conf_file}")
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 183, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/vasista/extend/.hydra/config.yaml'
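
Judging from the traceback, classy resolves the training configuration as {experiment_folder}/.hydra/config.yaml, with experiment_folder derived from checkpoint_path, so the checkpoint must stay inside its original experiment directory. A rough sketch of the expected layout (directory names taken from the usage example in the next issue; the exact tree is an assumption):

extend-longformer-large/
└── 2021-10-22/
    └── 09-11-39/
        ├── .hydra/
        │   └── config.yaml      # training conf that classy tries to load
        └── checkpoints/
            └── best.ckpt        # pass the full path to this file as checkpoint_path

The error path '/home/vasista/extend/.hydra/config.yaml' suggests checkpoint_path pointed outside such a directory.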

spaCy example returns None

Hi, I'm trying out your system. After installing everything (some packages needed to be downgraded or upgraded), I ran the spaCy example with the Longformer model and Le & Titov's candidate sets.

The code is below, and the warning messages it prints follow it.

The second warning (from classy.data.dataset.base) says the dataset is empty. Might that be the problem?

Output: [('Japan', None), ('Syria', None), ('Friday', None)]

import spacy
from extend import spacy_component

nlp = spacy.load("en_core_web_sm")

extend_config = dict(
    checkpoint_path="../extend-longformer-large/2021-10-22/09-11-39/checkpoints/best.ckpt",
    mentions_inventory_path="../le-and-titov-2018-inventory.min-count-2.sqlite3",
    device=0,
    tokens_per_batch=4000,
)

nlp.add_pipe("extend", after="ner", config=extend_config)

input_sentence = "Japan began the defence of their title " \
                 "with a lucky 2-1 win against Syria " \
                 "in a championship match on Friday."

doc = nlp(input_sentence)

# [(Japan, Japan National Football Team), (Syria, Syria National Football Team)]
disambiguated_entities = [(ent.text, ent._.disambiguated_entity) for ent in doc.ents]






2022-05-05 13:39:07.458 WARNING classy.data.dataset.base: Token batch size 4000 < max length 4096. This might result in batches with only 1 sample that contain more token than the specified token batch size
2022-05-05 13:39:07.459 WARNING classy.data.dataset.base: Dataset empty
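
For what it's worth, the first warning flags that tokens_per_batch (4000) is below the model's maximum length (4096). One thing to try, purely as an assumption drawn from that warning and not a confirmed fix, is raising it to at least the maximum length:

extend_config = dict(
    checkpoint_path="../extend-longformer-large/2021-10-22/09-11-39/checkpoints/best.ckpt",
    mentions_inventory_path="../le-and-titov-2018-inventory.min-count-2.sqlite3",
    device=0,
    tokens_per_batch=4096,  # >= the model max length reported in the warning
)

The "Dataset empty" warning, however, suggests the component received no samples at all, which would explain the None outputs.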

Error when training the model

Hi, I have tried to train the model with the command you provided:

classy train qa ./Datasets/aida -n my-model-name --profile aida-longformer-large-gam -pd extend

I have also prepared the training data with the right name and extension, but the command gives me this error:
Traceback (most recent call last):
  File "/opt/miniconda3/envs/extend/bin/classy", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/classy/scripts/cli/__init__.py", line 141, in main
    import_module_and_submodules(to_import)
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/classy/scripts/cli/utils.py", line 201, in import_module_and_submodules
    import_module_and_submodules(subpackage, exclude=exclude)
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/classy/scripts/cli/utils.py", line 201, in import_module_and_submodules
    import_module_and_submodules(subpackage, exclude=exclude)
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/classy/scripts/cli/utils.py", line 190, in import_module_and_submodules
    module = importlib.import_module(package_name)
  File "/opt/miniconda3/envs/extend/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/agazzi/extend/extend/demo/serve.py", line 2, in <module>
    import spacy
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/pipeline/__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy/pipeline/pipe.pyx", line 8, in init spacy.pipeline.pipe
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/training/__init__.py", line 11, in <module>
    from .callbacks import create_copy_from_base_model  # noqa: F401
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/training/callbacks.py", line 3, in <module>
    from ..language import Language
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/language.py", line 25, in <module>
    from .training.initialize import init_vocab, init_tok2vec
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/training/initialize.py", line 14, in <module>
    from .pretrain import get_tok2vec_ref
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/training/pretrain.py", line 16, in <module>
    from ..schemas import ConfigSchemaPretrain
  File "/opt/miniconda3/envs/extend/lib/python3.9/site-packages/spacy/schemas.py", line 216, in <module>
    class TokenPattern(BaseModel):
  File "pydantic/main.py", line 299, in pydantic.main.ModelMetaclass.__new__
  File "pydantic/fields.py", line 411, in pydantic.fields.ModelField.infer
  File "pydantic/fields.py", line 342, in pydantic.fields.ModelField.__init__
  File "pydantic/fields.py", line 451, in pydantic.fields.ModelField.prepare
  File "pydantic/fields.py", line 545, in pydantic.fields.ModelField._type_analysis
  File "pydantic/fields.py", line 550, in pydantic.fields.ModelField._type_analysis
  File "/opt/miniconda3/envs/extend/lib/python3.9/typing.py", line 852, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

REST service not working

Hi,

I tried to run the REST service with the Docker image you make available. However, I do not receive any results, even when I try to disambiguate simple sentences.

Example:

resp = requests.post(url="http://127.0.0.1:22002/", data='[{"text":"Bob Dylan is a famous singer."}]')

The JSON result is

[{'text': 'Bob Dylan is a singer.', 'disambiguated_entities': []}]

train the model

I am very interested in your work. I followed the training script from the README, "classy train qa data/aida -n ex --profile aida-longformer-large-gam -pd extend", but it fails with "FileExistsError: [Errno 17] File exists: '/home/ta/zgb/extend-main/experiments/ex/2023-08-03/23-40-46/resources/allenai/longformer-large-4096'". I did not create such a file beforehand. Can you help me with this question? Thank you very much.

Can we use this for entity linking of non-ambiguous entities?

Hi,

How can I use this code to link entities that are non-ambiguous and disambiguate entities that are ambiguous? Currently the code seems to resolve only ambiguous entities; I want it to also link the non-ambiguous ones.
Please suggest.

Torch not compiled with CUDA enabled

Hi,

I ran spaCy with the "extend" component but got the error below.

(extend) ubuntu@ip-172-31-3-241:~/extend$ python spacy_test.py 
Some weights of the model checkpoint at allenai/longformer-large-4096 were not used when initializing LongformerForQuestionAnswering: ['lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LongformerForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LongformerForQuestionAnswering were not initialized from the model checkpoint at allenai/longformer-large-4096 and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "spacy_test.py", line 13, in <module>
    nlp.add_pipe("extend", after="ner", config=extend_config)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/spacy/language.py", line 792, in add_pipe
    pipe_component = self.create_pipe(
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/spacy/language.py", line 674, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/thinc/config.py", line 746, in resolve
    resolved, _ = cls._make(
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/thinc/config.py", line 795, in _make
    filled, _, resolved = cls._fill(
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/thinc/config.py", line 867, in _fill
    getter_result = getter(*args, **kwargs)
  File "/home/ubuntu/extend/extend/spacy_component.py", line 86, in __init__
    self.model = load_checkpoint(checkpoint_path, device)
  File "/home/ubuntu/extend/extend/spacy_component.py", line 24, in load_checkpoint
    model.to(torch.device(device))
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 111, in to
    return super().to(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 852, in to
    return self._apply(convert)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/nn/modules/module.py", line 850, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/ubuntu/miniconda3/envs/extend/lib/python3.8/site-packages/torch/cuda/__init__.py", line 166, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I think we can't install CUDA without a GPU on the machine.
Can we solve this without a GPU?

Thanks
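
A possible CPU-only configuration, assuming the component forwards device directly to torch.device (which the model.to(torch.device(device)) call in the traceback suggests):

extend_config = dict(
    checkpoint_path="../extend-longformer-large/2021-10-22/09-11-39/checkpoints/best.ckpt",
    mentions_inventory_path="../le-and-titov-2018-inventory.min-count-2.sqlite3",
    device="cpu",  # assumption: any value accepted by torch.device(), e.g. "cpu", should work here
    tokens_per_batch=4000,
)

This is a sketch, not a confirmed fix; inference will be considerably slower on CPU.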

Failure to train the model using classy

Hi, nice work!

I would like to reproduce your training process. I installed the dependencies and downloaded the dataset following the README, and renamed aida-train-kilt.jsonl to train.jsonl, etc. I then ran the following command from the root directory

classy train qa data/aida -n my-model-name --profile aida-longformer-large-gam -pd extend

and got the following error

Error executing job with overrides: ['device=cuda', 'exp_name=my-model-name', 'data.datamodule.dataset_path=data/aida']
Traceback (most recent call last):
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/scripts/cli/train.py", line 620, in <lambda>
    lambda cfg: _main_mock(cfg, blames=blames if args.print else None)
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/scripts/cli/train.py", line 208, in _main_mock
    train(cfg)
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/scripts/model/train.py", line 22, in train
    pl_data_module.prepare_data()
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
    fn(*args, **kwargs)
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/data/data_modules.py", line 161, in prepare_data
    shuffle_and_store_dataset(
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/utils/data.py", line 39, in shuffle_and_store_dataset
    samples = shuffle_dataset(dataset_path, data_driver)
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/utils/data.py", line 29, in shuffle_dataset
    samples = load_dataset(dataset_path, data_driver)
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/utils/data.py", line 22, in load_dataset
    return list(data_driver.read_from_path(dataset_path))
  File "/home/ICT2000/jxu/miniconda3/envs/extend/lib/python3.8/site-packages/classy/data/data_drivers.py", line 620, in read
    yield QASample(**json.loads(line))
TypeError: __init__() missing 2 required positional arguments: 'context' and 'question'

I see that extend/data ships its own data_drivers, but classy still used its stock version. Since I am new to classy, I am not sure how to proceed.
Thank you!
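
For context, the failing line shows classy's stock QA data driver building QASample(**json.loads(line)), so with that driver each line of train.jsonl would need at least the fields its constructor requires, along the lines of (a minimal guess based only on the error message; extend's own driver in extend/data presumably expects the downloaded KILT format instead):

{"context": "...", "question": "..."}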
