dfki-nlp / fewie
Few-shot named entity recognition
License: MIT License
When the project is initially set up (git clone, create environment, pip install), the command python evaluate.py --help fails with:
Traceback (most recent call last):
File "/mnt/DATA/DEVELOPING/dfki/lenovo/fewie/evaluate.py", line 26, in <module>
evaluate()
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/main.py", line 32, in decorated_main
_run_hydra(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/utils.py", line 327, in _run_hydra
hydra.app_help(config_name=config_name, args_parser=args_parser, args=args)
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 328, in app_help
cfg = self.compose_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
cfg = self.config_loader.load_configuration(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
return self._load_configuration(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
cfg = self._merge_defaults_into_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 805, in _merge_defaults_into_config
hydra_cfg = merge_defaults_list_into_config(hydra_cfg, user_list)
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 777, in merge_defaults_list_into_config
merged_cfg = self._merge_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 715, in _merge_config
raise MissingConfigException(msg, new_cfg, options)
hydra.errors.MissingConfigException: Could not load dataset_processor/transformers.
Available options:
bert
spanbert
transformer
Interestingly, python evaluate.py --help works once another command, such as python evaluate.py dataset=conll2003 dataset_processor=bert encoder=bert evaluation/dataset=nway_kshot_5_1, has completed successfully at least once.
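For context, Hydra resolves an override like dataset_processor=<name> to a YAML file of that exact name inside the corresponding config group, so the error suggests the defaults list references a file that does not exist. A sketch of the likely layout, inferred only from the error message (directory and file names are assumptions):

```yaml
# conf/dataset_processor/ (illustrative layout, inferred from "Available options")
#   bert.yaml
#   spanbert.yaml
#   transformer.yaml   # note: singular -- "transformers" has no matching file
```

If that reading is right, changing the defaults entry (or the override) from transformers to transformer should make the --help invocation succeed.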
Error executing job with overrides: ['dataset=smartdata', 'encoder=gottbert-base', 'dataset_processor=gottbert-base', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
File "evaluate.py", line 20, in evaluate
evaluation_results = evaluate_config(cfg)
File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
processed_dataset = dataset_processor(dataset)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 36, in __call__
return dataset.map(self.tokenize_and_align_labels, batched=True)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
batch = apply_function_on_filtered_inputs(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 39, in tokenize_and_align_labels
tokenized_inputs = self.tokenizer(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2368, in __call__
return self.batch_encode_plus(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2553, in batch_encode_plus
return self._batch_encode_plus(
File "/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/tokenization_gpt2_fast.py", line 158, in _batch_encode_plus
assert self.add_prefix_space or not is_split_into_words, (
AssertionError: You need to instantiate RobertaTokenizerFast with add_prefix_space=True to use it with pretokenized inputs.
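GottBERT uses a RoBERTa-style fast tokenizer, and those only accept pretokenized input (is_split_into_words=True) when constructed with add_prefix_space=True. Assuming the processor builds its tokenizer from the Hydra config, one plausible fix is to thread that flag through the config; the keys and class path below are hypothetical, not taken from the repository:

```yaml
# conf/dataset_processor/gottbert-base.yaml (illustrative; actual keys may differ)
_target_: fewie.dataset_processors.gottbert.GottbertProcessor  # assumed class path
tokenizer_name_or_path: uklfr/gottbert-base
add_prefix_space: true  # forwarded to the tokenizer so pretokenized input is accepted
```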
Error executing job with overrides: ['dataset=smartdata', 'encoder=xlm-ende', 'dataset_processor=xlm-ende', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
File "evaluate.py", line 20, in evaluate
evaluation_results = evaluate_config(cfg)
File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
processed_dataset = dataset_processor(dataset)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 36, in __call__
return dataset.map(self.tokenize_and_align_labels, batched=True)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
batch = apply_function_on_filtered_inputs(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 50, in tokenize_and_align_labels
word_ids = tokenized_inputs.word_ids(batch_index=i)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 353, in word_ids
raise ValueError("word_ids() is not available when using Python-based tokenizers")
ValueError: word_ids() is not available when using Python-based tokenizers
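When only a Python-based (slow) tokenizer is available and word_ids() raises, one workaround is to build the word-to-subword map yourself by tokenizing word by word. A minimal sketch, not the project's actual code; tokenize_word stands in for any per-word subword tokenizer:

```python
def tokenize_and_align(words, labels, tokenize_word, ignore_index=-100):
    """Tokenize each word separately, keep the label on the first subword,
    and mask the remaining subwords -- a manual replacement for word_ids()."""
    tokens, aligned_labels = [], []
    for word, label in zip(words, labels):
        subwords = tokenize_word(word)  # subword pieces for this single word
        tokens.extend(subwords)
        # Only the first piece carries the label; the rest are ignored by the loss.
        aligned_labels.extend([label] + [ignore_index] * (len(subwords) - 1))
    return tokens, aligned_labels

# Toy subword splitter standing in for a real slow tokenizer:
split = lambda w: [w[:2], "##" + w[2:]] if len(w) > 2 else [w]
tokens, aligned = tokenize_and_align(["Berlin", "is"], [3, 0], split)
# tokens  == ['Be', '##rlin', 'is']
# aligned == [3, -100, 0]
```

The -100 sentinel matches the default ignore_index of PyTorch's cross-entropy loss, so masked subword positions do not contribute to training.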
I was looking at your code and had a question, so I opened an issue.
I want to apply this code to a new dataset I created.
However, I see that only evaluate.py exists. If you have training code such as train.py, could you share it with us?
Thank you