allenai / catwalk

This project studies the performance and robustness of language models and task-adaptation methods.

License: Apache License 2.0
In the `Fewshot` branch of Catwalk, run `python experiments/num_shots.py -w beaker://ai2/catwalk`. It will fail:
```
[06/08/22 20:48:05] ERROR Uncaught exception    logging.py:373
Traceback (most recent call last):
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/integrations/beaker/workspace.py", line 91, in step_info
    dataset = self.beaker.dataset.get(step_dataset_name(step_or_unique_id))
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/beaker/services/dataset.py", line 51, in get
    return _get(dataset)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/beaker/services/dataset.py", line 43, in _get
    self.request(
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/beaker/services/service_client.py", line 98, in request
    raise exceptions_for_status[response.status_code]
beaker.exceptions.DatasetNotFound: 'tango-step-PredictStep-001-YVpCvU4dbvUqC3z2ARe1r8kecyqZuipg': Make sure you're using a valid Beaker dataset ID or the *full* name of the dataset (with the account prefix, e.g. 'username/dataset_name')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dirkg/catwalk/experiments/num_shots.py", line 73, in <module>
    main()
  File "/home/dirkg/catwalk/experiments/num_shots.py", line 59, in main
    result = metrics.result(workspace)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 534, in result
    return self._run_with_work_dir(workspace, needed_by=needed_by)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 362, in _run_with_work_dir
    kwargs = self._replace_steps_with_results(self.kwargs, workspace)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 509, in _replace_steps_with_results
    return {
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 510, in <dictcomp>
    key: self._replace_steps_with_results(value, workspace) for key, value in o.items()
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 493, in _replace_steps_with_results
    return o.result(workspace, self)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 534, in result
    return self._run_with_work_dir(workspace, needed_by=needed_by)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/step.py", line 376, in _run_with_work_dir
    workspace.step_starting(self)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/integrations/beaker/workspace.py", line 118, in step_starting
    step_info = self.step_info(step)
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/integrations/beaker/workspace.py", line 106, in step_info
    raise KeyError(step_or_unique_id)
KeyError: <catwalk.steps.PredictStep object at 0x7f38e2d834c0>
```
Motivation: Full fine-tuning is a baseline, or rather an upper bound, in many zero-shot and few-shot experiments. @pdasigi has explicitly asked for this.
As part of this work, we'll add a new Tango step to Catwalk that trains a model on a given task/dataset, or on multiple tasks/datasets at the same time. It should call into Tango's training functions to do so. We'll also need to add a method or two to Catwalk's Model
class to make this happen. Then we'll do a full evaluation on all reasonable tasks and all reasonable models, to establish good baselines across the board. This might make for a good blog post, too.
As a stretch goal, we should also try to train adaptation methods like prompt tuning, prefix tuning, or even IA3. There are some very nice implementations of some methods at https://github.com/r-three/t-few/tree/master/src.
@lolipopshock wrote us a nice guide on how to get started here: https://docs.google.com/document/d/1lBHt5S0wMfLlNNNbXCSaNE4kJRTOjrY7GJ5nWG958FI
The short of it is, we'll do this on top of the Megatron/metaseq codebase, not on top of Huggingface. We should make it an optional "integration" for catwalk, like we have in Tango, because of all the dependencies it pulls in.
@dirkgr noticed that the use of `mypy_extensions.KwArg` here could instead use `typing.Protocol`, as in here. This may allow us to support Python >= 3.8. See this SO post for a quick example.
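As a minimal sketch of the idea (the `ScoreFn` name and the callable's signature are illustrative, not Catwalk's actual types): a callable type that takes arbitrary keyword arguments, which `mypy_extensions` spells as `Callable[[KwArg(Any)], int]`, can instead be a stdlib `Protocol` with a `__call__` method.

```python
from typing import Protocol


class ScoreFn(Protocol):
    # Equivalent to Callable[[KwArg(Any)], int] from mypy_extensions,
    # but using only the stdlib typing module (Python >= 3.8).
    def __call__(self, **kwargs) -> int: ...


def count_kwargs(**kwargs) -> int:
    """A function that structurally satisfies the ScoreFn protocol."""
    return len(kwargs)


fn: ScoreFn = count_kwargs
print(fn(a=1, b=2))  # → 2
```

Because `Protocol` matching is structural, any function with a compatible signature type-checks against `ScoreFn` with no registration step.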
This should probably T0?
We'll need a way to specify default prompts for model/task combinations. #2 (comment)
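One possible shape for this (all names and prompt strings here are hypothetical, not Catwalk's API): a table keyed by `(model, task)` pairs, with a `None` model entry serving as the task-level default.

```python
# Hypothetical sketch: default prompts keyed by (model, task), with a
# task-level fallback when no model-specific prompt is registered.
from typing import Dict, Optional, Tuple

DEFAULT_PROMPTS: Dict[Tuple[Optional[str], str], str] = {
    # (model, task) -> prompt; a None model means "any model"
    ("bigscience/T0", "rte"): "Does {premise} imply {hypothesis}? Yes or no?",
    (None, "rte"): "{premise} Question: {hypothesis} True or False?",
}


def default_prompt(model: str, task: str) -> Optional[str]:
    """Prefer a model-specific prompt, then fall back to the task default."""
    return DEFAULT_PROMPTS.get((model, task), DEFAULT_PROMPTS.get((None, task)))


print(default_prompt("bigscience/T0", "rte"))  # model-specific prompt
print(default_prompt("gpt2", "rte"))           # task-level fallback
```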
... official numbers from EAI, that is
This probably depends on #4.
I have a reliable repro where the WandB workspace puts itself into a state that it can't recover from. In this repo, on the `Fewshot` branch, I run `python experiments/num_shots.py -w wandb://allennlp/catwalk --batch_size 4`. It will complain like this:
```
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/tango/integrations/wandb/workspace.py", line 147, in step_starting
    raise StepStateError(
tango.common.exceptions.StepStateError: Step 'PredictStep-001-aUzzaKky7tw1rXhp38mMWuboeag8cCGw' is in unexpected state 'running'. If you are certain the step is not running somewhere else, delete the lock file at /home/dirkg/.cache/tango/wandb_workspace/PredictStep-001-aUzzaKky7tw1rXhp38mMWuboeag8cCGw/lock.
Exception ignored in: <function BaseFileLock.__del__ at 0x7f401c167dc0>
Traceback (most recent call last):
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/filelock/_api.py", line 234, in __del__
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/filelock/_api.py", line 204, in release
  File "/home/dirkg/miniconda3/envs/catwalk/lib/python3.9/site-packages/filelock/_unix.py", line 49, in _release
TypeError: 'NoneType' object is not callable
```
When I remove `/home/dirkg/.cache/tango/wandb_workspace/PredictStep-001-aUzzaKky7tw1rXhp38mMWuboeag8cCGw/lock`, the error comes back.
As noticed by @epwalsh, the `load_weights` argument, which is documented in `cached_transformers.get()`, is not actually implemented. I checked, and it is also not a supported kwarg of `AutoModel.from_pretrained()`. I'll look into supporting it or removing it.
Motivation: It's a good baseline that should be easy to implement in the catwalk context, but nobody has asked for it.
Described by Liu et al. like this:
Min et al. [21] proposed ensemble ICL, where instead of using the output probability from concatenating the k training examples, the output probabilities of the model on each training example (i.e. 1-shot ICL for each of the k examples) are multiplied together. This lowers the memory cost by a factor of k/2 but increases the computational cost by a factor of 2. In terms of task performance, Min et al. [21] find that ensemble ICL outperforms the standard concatenative variant.
This depends on first getting normal few-shot ICL working on Catwalk.
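The scoring rule quoted above can be sketched in a few lines. This is only an illustration of the ensemble ICL idea (score each label with k separate 1-shot prompts and multiply the probabilities, rather than one prompt with all k examples concatenated); `one_shot_prob` and the toy model are stand-ins, not Catwalk code.

```python
# Illustrative sketch of ensemble ICL as described by Min et al.:
# multiply the model's 1-shot output probabilities across the k training
# examples, computed in log space for numerical stability.
import math
from typing import Callable, List


def ensemble_icl_score(
    one_shot_prob: Callable[[str, str, str], float],
    train_examples: List[str],
    test_input: str,
    label: str,
) -> float:
    """Product of per-example 1-shot probabilities for one candidate label."""
    log_prob = sum(
        math.log(one_shot_prob(example, test_input, label))
        for example in train_examples
    )
    return math.exp(log_prob)


# Toy stand-in for a real model call: a fixed probability per label.
def toy_model(example: str, test_input: str, label: str) -> float:
    return 0.9 if label == "positive" else 0.2


score = ensemble_icl_score(toy_model, ["ex1", "ex2", "ex3"], "input", "positive")
print(round(score, 3))  # → 0.729  (i.e. 0.9 ** 3)
```

Note the memory/compute trade-off from the quote: each prompt now holds one training example instead of k, but the model runs k times per test input.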
Took a while to debug why `google/t5-v1_1-small` wasn't working even though it's registered in `models/__init__.py`. It's not obvious what the shortened name buys us, whereas the cost is that it's hard to know the right model names to pass to Catwalk. I recommend either removing this shortener or at least always supporting the full name of the model as well. I haven't tested this with Tasks, but I imagine they have a similar issue.
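A minimal sketch of the "always support the full name" suggestion (the registry contents and function name are illustrative, not Catwalk's actual `MODELS` dict): resolve a short alias if one exists, but pass full names through unchanged.

```python
# Hypothetical sketch: accept either a short registered alias or the full
# Hugging Face model name, so users never have to guess which one works.
from typing import Dict

MODELS: Dict[str, str] = {
    "t5-v1_1-small": "google/t5-v1_1-small",  # short alias -> full name
}


def resolve_model(name: str) -> str:
    """Accept either the short alias or the full name."""
    if name in MODELS:
        return MODELS[name]
    if name in MODELS.values():
        return name  # already a full name; pass it through
    raise KeyError(f"Unknown model: {name}")


print(resolve_model("t5-v1_1-small"))         # → google/t5-v1_1-small
print(resolve_model("google/t5-v1_1-small"))  # → google/t5-v1_1-small
```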
Motivation: Various people have asked for various additions to Catwalk already. It's risky because nobody is using Catwalk yet. But we have several people who said they want to (Pradeep, Matt/Hamish, Iz?, Ludwig).
Here are the sub-projects in order of importance:
The `trainable_copy` method of `RankClassificationModel` should probably be initialized with its `self.override_weights_file`. This would look like:

```python
def trainable_copy(self) -> TrainableModel:
    return TrainableRankClassificationModel(
        self._make_model(self.pretrained_model_name_or_path, override_weights_file=self.override_weights_file),
        cached_transformers.get_tokenizer(AutoTokenizer, self.pretrained_model_name_or_path),
        self.predict_chunk,
    )
```

I need to investigate how `trainable_copy` is used, though, to make sure this is correct.
CrossFit has a somewhat unified format for their tasks. We could use it to get a bunch of tasks with very little code.
Here is a list of patterns that @ibeltagy found in CrossFit:
classification
- plan input / output: https://github.com/INK-USC/CrossFit/blob/master/tasks/ade_classification.py
- title: .... [SEP] content: ... https://github.com/INK-USC/CrossFit/blob/master/tasks/amazon_polarity.py
- premise: ... [SEP] hypothesis: .... https://github.com/INK-USC/CrossFit/blob/master/tasks/anli.py
- observation1: ...[SEP] observation2: ... [SEP] hypothesis1: .... ..... https://github.com/INK-USC/CrossFit/blob/master/tasks/art.py
- question: .... [SEP] context: .... https://github.com/INK-USC/CrossFit/blob/master/tasks/boolq.py
- ... [SEP] .... https://github.com/INK-USC/CrossFit/blob/master/tasks/scicite.py
- ... and many more similar to above with different field names
text to text
- summarize: .....
- https://github.com/INK-USC/CrossFit/blob/master/tasks/gigaword.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/multi_news.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/reddit_tifu.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/samsum.py
- question: ... context: ...
- https://github.com/INK-USC/CrossFit/blob/master/tasks/adversarial_qa.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/ropes.py
- (Most follow this template)
- question: ... [SEP] category: ...
- https://github.com/INK-USC/CrossFit/blob/master/tasks/jeopardy.py
- very few follow this template
- ... [SEP] ....
- https://github.com/INK-USC/CrossFit/blob/master/tasks/ade_effect.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/definite_pronoun_resolution.py
- ..<question string>.. [SEP] ..<context string>.. [SEP] ..<choices>... https://github.com/INK-USC/CrossFit/blob/master/tasks/cosmos_qa.py
- <question string>. <choices>.
- https://github.com/INK-USC/CrossFit/blob/master/tasks/ai2_arc.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/hellaswag.py
- should have been converted to classification
- (multiple choice datasets are a huge mess)
- question: ... https://github.com/INK-USC/CrossFit/blob/master/tasks/break.py
sequence tagging:
- ... [SEP] acronym: .... https://github.com/INK-USC/CrossFit/blob/master/tasks/acronym_identification.py
- <string>
- input: <string>
- output: <entity> [SEP] <entity> ....
- https://github.com/INK-USC/CrossFit/blob/master/tasks/limit.py
regression
- review: ... https://github.com/INK-USC/CrossFit/blob/master/tasks/app_reviews.py
- https://github.com/INK-USC/CrossFit/blob/master/tasks/google_wellformed_query.py
- question: .... [SEP] context: ... https://github.com/INK-USC/CrossFit/blob/master/tasks/mocha.py
Other:
- https://github.com/INK-USC/CrossFit/blob/master/tasks/numer_sense.py
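Since most of the patterns above are just named fields joined with `[SEP]`, one generic converter could cover a whole family of CrossFit tasks with very little code. This is only a sketch; the field names and the example instance are hypothetical, not taken from CrossFit's actual task files.

```python
# Illustrative sketch: render a CrossFit-style instance as
# "field1: ... [SEP] field2: ..." from a list of field names.
from typing import Dict, List


def crossfit_to_text(fields: List[str], instance: Dict[str, str]) -> str:
    """Join named fields in order, CrossFit-style."""
    return " [SEP] ".join(f"{name}: {instance[name]}" for name in fields)


instance = {"question": "Is water wet?", "context": "Water is a liquid."}
print(crossfit_to_text(["question", "context"], instance))
# → question: Is water wet? [SEP] context: Water is a liquid.
```

The per-task code would then reduce to choosing the field list (and, for classification tasks, the label set).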
EAI tasks that are not on CrossFit.
Total task files in EAI: 47
Missing from CrossFit: 28
Not on HF dataset
Available on HF dataset