dfki-nlp / fewie
Few-shot named entity recognition
License: MIT License
When the project is initially set up (git clone, create environment, pip install), the command python evaluate.py --help fails with:
Traceback (most recent call last):
File "/mnt/DATA/DEVELOPING/dfki/lenovo/fewie/evaluate.py", line 26, in <module>
evaluate()
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/main.py", line 32, in decorated_main
_run_hydra(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/utils.py", line 327, in _run_hydra
hydra.app_help(config_name=config_name, args_parser=args_parser, args=args)
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 328, in app_help
cfg = self.compose_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
cfg = self.config_loader.load_configuration(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
return self._load_configuration(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
cfg = self._merge_defaults_into_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 805, in _merge_defaults_into_config
hydra_cfg = merge_defaults_list_into_config(hydra_cfg, user_list)
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 777, in merge_defaults_list_into_config
merged_cfg = self._merge_config(
File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 715, in _merge_config
raise MissingConfigException(msg, new_cfg, options)
hydra.errors.MissingConfigException: Could not load dataset_processor/transformers.
Available options:
bert
spanbert
transformer
Interestingly, python evaluate.py --help works once another command, such as python evaluate.py dataset=conll2003 dataset_processor=bert encoder=bert evaluation/dataset=nway_kshot_5_1, has completed successfully at least once.
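For context, Hydra resolves an override like dataset_processor=<name> to a YAML file of that exact name inside the corresponding config group, so the error suggests the defaults list references a file that does not exist. A sketch of the likely layout, inferred only from the error message (directory and file names are assumptions):

```yaml
# conf/dataset_processor/ (illustrative layout, inferred from "Available options")
#   bert.yaml
#   spanbert.yaml
#   transformer.yaml   # note: singular -- "transformers" has no matching file
```

If that reading is right, changing the defaults entry (or the override) from transformers to transformer should make the --help invocation succeed.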
Error executing job with overrides: ['dataset=smartdata', 'encoder=gottbert-base', 'dataset_processor=gottbert-base', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
File "evaluate.py", line 20, in evaluate
evaluation_results = evaluate_config(cfg)
File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
processed_dataset = dataset_processor(dataset)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 36, in __call__
return dataset.map(self.tokenize_and_align_labels, batched=True)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
batch = apply_function_on_filtered_inputs(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 39, in tokenize_and_align_labels
tokenized_inputs = self.tokenizer(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2368, in __call__
return self.batch_encode_plus(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2553, in batch_encode_plus
return self._batch_encode_plus(
File "/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/tokenization_gpt2_fast.py", line 158, in _batch_encode_plus
assert self.add_prefix_space or not is_split_into_words, (
AssertionError: You need to instantiate RobertaTokenizerFast with add_prefix_space=True to use it with pretokenized inputs.
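GottBERT uses a RoBERTa-style fast tokenizer, and those only accept pretokenized input (is_split_into_words=True) when constructed with add_prefix_space=True. Assuming the processor builds its tokenizer from the Hydra config, one plausible fix is to thread that flag through the config; the keys and class path below are hypothetical, not taken from the repository:

```yaml
# conf/dataset_processor/gottbert-base.yaml (illustrative; actual keys may differ)
_target_: fewie.dataset_processors.gottbert.GottbertProcessor  # assumed class path
tokenizer_name_or_path: uklfr/gottbert-base
add_prefix_space: true  # forwarded to the tokenizer so pretokenized input is accepted
```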
Error executing job with overrides: ['dataset=smartdata', 'encoder=xlm-ende', 'dataset_processor=xlm-ende', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
File "evaluate.py", line 20, in evaluate
evaluation_results = evaluate_config(cfg)
File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
processed_dataset = dataset_processor(dataset)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 36, in __call__
return dataset.map(self.tokenize_and_align_labels, batched=True)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
batch = apply_function_on_filtered_inputs(
File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 50, in tokenize_and_align_labels
word_ids = tokenized_inputs.word_ids(batch_index=i)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 353, in word_ids
raise ValueError("word_ids() is not available when using Python-based tokenizers")
ValueError: word_ids() is not available when using Python-based tokenizers
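When only a Python-based (slow) tokenizer is available and word_ids() raises, one workaround is to build the word-to-subword map yourself by tokenizing word by word. A minimal sketch, not the project's actual code; tokenize_word stands in for any per-word subword tokenizer:

```python
def tokenize_and_align(words, labels, tokenize_word, ignore_index=-100):
    """Tokenize each word separately, keep the label on the first subword,
    and mask the remaining subwords -- a manual replacement for word_ids()."""
    tokens, aligned_labels = [], []
    for word, label in zip(words, labels):
        subwords = tokenize_word(word)  # subword pieces for this single word
        tokens.extend(subwords)
        # Only the first piece carries the label; the rest are ignored by the loss.
        aligned_labels.extend([label] + [ignore_index] * (len(subwords) - 1))
    return tokens, aligned_labels

# Toy subword splitter standing in for a real slow tokenizer:
split = lambda w: [w[:2], "##" + w[2:]] if len(w) > 2 else [w]
tokens, aligned = tokenize_and_align(["Berlin", "is"], [3, 0], split)
# tokens  == ['Be', '##rlin', 'is']
# aligned == [3, -100, 0]
```

The -100 sentinel matches the default ignore_index of PyTorch's cross-entropy loss, so masked subword positions do not contribute to training.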
I was looking at your code and had a question, so I opened an issue.
I want to apply this code to a new dataset I created.
However, I see that only evaluate.py exists. If you have training code such as train.py, could you share it with us?
Thank you