
fewie's People

Stargazers

10 stargazers

Watchers

3 watchers

Forkers

sanjanasri

fewie's Issues

initial "python evaluate.py --help" fails

When the project is initially set up (git clone, create the environment, pip install), the command python evaluate.py --help fails with:

Traceback (most recent call last):
  File "/mnt/DATA/DEVELOPING/dfki/lenovo/fewie/evaluate.py", line 26, in <module>
    evaluate()
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/main.py", line 32, in decorated_main
    _run_hydra(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/utils.py", line 327, in _run_hydra
    hydra.app_help(config_name=config_name, args_parser=args_parser, args=args)
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 328, in app_help
    cfg = self.compose_config(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 256, in _load_configuration
    cfg = self._merge_defaults_into_config(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 805, in _merge_defaults_into_config
    hydra_cfg = merge_defaults_list_into_config(hydra_cfg, user_list)
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 777, in merge_defaults_list_into_config
    merged_cfg = self._merge_config(
  File "/home/arne/miniconda3/envs/fewie/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 715, in _merge_config
    raise MissingConfigException(msg, new_cfg, options)
hydra.errors.MissingConfigException: Could not load dataset_processor/transformers.
Available options:
	bert
	spanbert
	transformer

Interestingly, the command works if another invocation such as python evaluate.py dataset=conll2003 dataset_processor=bert encoder=bert evaluation/dataset=nway_kshot_5_1 has been run successfully at least once.
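
A plausible explanation (an assumption, not verified against the repository) is that the primary config's defaults list requests dataset_processor: transformers, a group option for which no config file exists, so Hydra cannot resolve the defaults until the group is overridden on the command line. Below is a minimal sketch that reproduces the lookup with Hydra's compose API, assuming the config directory is named conf and the primary config is config.yaml (Hydra 1.1+ exposes compose/initialize at the top level; older versions keep them under hydra.experimental):

from hydra import compose, initialize
from hydra.errors import MissingConfigException

# Assumed layout: conf/config.yaml whose defaults list contains
#   - dataset_processor: transformers
# while the group only provides bert.yaml, spanbert.yaml and transformer.yaml.
with initialize(config_path="conf"):
    try:
        # Without an override, Hydra tries to load the missing "transformers" option.
        compose(config_name="config")
    except MissingConfigException as err:
        print(err)  # Could not load dataset_processor/transformers. ...

    # Overriding the group with an existing option resolves the defaults list,
    # which matches the observation that the full command line above works.
    cfg = compose(config_name="config", overrides=["dataset_processor=transformer"])

If this is the cause, renaming the default to an existing option (or adding a transformers.yaml to the group) should make the bare --help call work as well.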

Fix German encoders

  • gottbert-base
    using uklfr/gottbert-base; implemented, but it seems to require a custom tokenizer/processor
    Update: fixed by inheriting from the Roberta tokenizer (see the tokenizer sketch after the tracebacks below).
    Error executing job with overrides: ['dataset=smartdata', 'encoder=gottbert-base', 'dataset_processor=gottbert-base', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
  File "evaluate.py", line 20, in evaluate
    evaluation_results = evaluate_config(cfg)
  File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
    processed_dataset = dataset_processor(dataset)
  File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 36, in __call__
    return dataset.map(self.tokenize_and_align_labels, batched=True)
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
    return self._map_single(
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/gottbert.py", line 39, in tokenize_and_align_labels
    tokenized_inputs = self.tokenizer(
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2368, in __call__
    return self.batch_encode_plus(
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2553, in batch_encode_plus
    return self._batch_encode_plus(
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/gpt2/tokenization_gpt2_fast.py", line 158, in _batch_encode_plus
    assert self.add_prefix_space or not is_split_into_words, (
AssertionError: You need to instantiate RobertaTokenizerFast with add_prefix_space=True to use it with pretokenized inputs.
  • xlm-clm-ende-1024
    implemented
    Update: this encoder is ignored, since assigning word_ids manually is troublesome (its Python-based tokenizer does not provide word_ids(); see the traceback below).
    Error executing job with overrides: ['dataset=smartdata', 'encoder=xlm-ende', 'dataset_processor=xlm-ende', 'evaluation/dataset=nway_kshot_5_1']
Traceback (most recent call last):
  File "evaluate.py", line 20, in evaluate
    evaluation_results = evaluate_config(cfg)
  File "/opt/conda/lib/python3.8/site-packages/fewie/eval.py", line 37, in evaluate_config
    processed_dataset = dataset_processor(dataset)
  File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 36, in __call__
    return dataset.map(self.tokenize_and_align_labels, batched=True)
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
    return self._map_single(
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2016, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/opt/conda/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/fewie/dataset_processors/xlm-ende.py", line 50, in tokenize_and_align_labels
    word_ids = tokenized_inputs.word_ids(batch_index=i)
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 353, in word_ids
    raise ValueError("word_ids() is not available when using Python-based tokenizers")
ValueError: word_ids() is not available when using Python-based tokenizers
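
For reference, the add_prefix_space requirement from the first traceback can be satisfied directly when loading the tokenizer. The snippet below is a minimal sketch, not fewie code: the model name matches the encoder above, but the example tokens are made up. It also shows the word_ids() call that Python-based (slow) tokenizers such as the xlm-clm-ende-1024 one do not support:

from transformers import AutoTokenizer

# RobertaTokenizerFast (used by uklfr/gottbert-base) rejects pretokenized input
# unless add_prefix_space=True, which is exactly what the assertion above demands.
tokenizer = AutoTokenizer.from_pretrained("uklfr/gottbert-base", add_prefix_space=True)

tokens = ["Das", "ist", "ein", "Beispiel", "."]  # hypothetical pretokenized input
encoded = tokenizer([tokens], is_split_into_words=True, truncation=True)

# word_ids() maps each subword back to its source token index; only fast tokenizers
# implement it, which is why the XLM encoder was dropped rather than worked around.
print(encoded.word_ids(batch_index=0))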

Questions about how to make your code work

I was looking at your code and had a question, so I opened an issue.

I want to apply this code to a new dataset I created.

However, I saw that only evaluate.py exists. If you have code such as a train.py, could you share it with us?

Thank you
