huggingface / optimum-graphcore

Blazing fast training of 🤗 Transformers on Graphcore IPUs

License: Apache License 2.0

Makefile 0.07% Python 50.90% Jupyter Notebook 48.34% C++ 0.69%

machine-learning pytorch training transformers graphcore fine-tuning

optimum-graphcore's Introduction

Optimum Graphcore

🤗 Optimum Graphcore is the interface between the 🤗 Transformers library and Graphcore IPUs. It provides a set of tools enabling model parallelization and loading on IPUs, training, fine-tuning and inference on all the tasks already supported by 🤗 Transformers while being compatible with the 🤗 Hub and every model available on it out of the box.

What is an Intelligence Processing Unit (IPU)?

Quote from the Hugging Face blog post:

IPUs are the processors that power Graphcore’s IPU-POD datacenter compute systems. This new type of processor is designed to support the very specific computational requirements of AI and machine learning. Characteristics such as fine-grained parallelism, low precision arithmetic, and the ability to handle sparsity have been built into our silicon.

Instead of adopting a SIMD/SIMT architecture like GPUs, Graphcore’s IPU uses a massively parallel, MIMD architecture, with ultra-high bandwidth memory placed adjacent to the processor cores, right on the silicon die.

This design delivers high performance and new levels of efficiency, whether running today’s most popular models, such as BERT and EfficientNet, or exploring next-generation AI applications.

Poplar SDK setup

A Poplar SDK environment needs to be enabled to use this library. Please refer to Graphcore's Getting Started guides.

Install

To install the latest release of this package:

pip install optimum-graphcore

Optimum Graphcore is a fast-moving project, and you may want to install from source.

pip install git+https://github.com/huggingface/optimum-graphcore.git

Installing in developer mode

If you are working on the optimum-graphcore code then you should use an editable install by cloning and installing optimum and optimum-graphcore:

git clone https://github.com/huggingface/optimum --branch v1.6.1-release
git clone https://github.com/huggingface/optimum-graphcore
pip install -e optimum -e optimum-graphcore

Now whenever you change the code, you'll be able to run with those changes instantly.

Running the examples

There are a number of examples provided in the examples directory. Each of these contains a README with command lines for running them on IPUs with Optimum Graphcore.

Please install the requirements for every example:

cd <example-folder>
pip install -r requirements.txt

How to use Optimum Graphcore

🤗 Optimum Graphcore was designed with one goal in mind: make training and evaluation straightforward for any 🤗 Transformers user while leveraging the complete power of IPUs. It requires minimal changes if you are already using 🤗 Transformers.

To immediately use a model on a given input (text, image, audio, ...), we support the pipeline API:

->>> from transformers import pipeline
+>>> from optimum.graphcore import pipeline

# Allocate a pipeline for sentiment-analysis
->>> classifier = pipeline('sentiment-analysis', model="distilbert-base-uncased-finetuned-sst-2-english")
+>>> classifier = pipeline('sentiment-analysis', model="distilbert-base-uncased-finetuned-sst-2-english", ipu_config = "Graphcore/distilbert-base-ipu")
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996947050094604}]

It is also super easy to use the Trainer API:

-from transformers import Trainer, TrainingArguments
+from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments

-training_args = TrainingArguments(
+training_args = IPUTrainingArguments(
     per_device_train_batch_size=4,
     learning_rate=1e-4,
+    # Any IPUConfig on the Hub or stored locally
+    ipu_config_name="Graphcore/bert-base-ipu",
+)
+
+# Loading the IPUConfig needed by the IPUTrainer to compile and train the model on IPUs
+ipu_config = IPUConfig.from_pretrained(
+    training_args.ipu_config_name,
 )

 # Initialize our Trainer
-trainer = Trainer(
+trainer = IPUTrainer(
     model=model,
+    ipu_config=ipu_config,
     args=training_args,
     train_dataset=train_dataset if training_args.do_train else None,
     ...  # Other arguments
 )

For more information, refer to the full 🤗 Optimum Graphcore documentation.

Supported models

The following model architectures and tasks are currently supported by 🤗 Optimum Graphcore:

	Pre-Training	Masked LM	Causal LM	Seq2Seq LM (Summarization, Translation, etc)	Sequence Classification	Token Classification	Question Answering	Multiple Choice	Image Classification	CTC
BART	✅		❌	✅	✅		❌
BERT	✅	✅	❌		✅	✅	✅	✅
ConvNeXt	✅								✅
DeBERTa	✅	✅			✅	✅	✅
DistilBERT	❌	✅			✅	✅	✅	✅
GPT-2	✅		✅		✅	✅
GroupBERT	✅	✅	❌		✅	✅	✅	✅
HuBERT	❌				✅					✅
LXMERT	❌						✅
RoBERTa	✅	✅	❌		✅	✅	✅	✅
T5	✅			✅
ViT	❌								✅
Wav2Vec2	✅									✅
Whisper	❌			✅

If you find any issue while using those, please open an issue or a pull request.

optimum-graphcore's Issues

Allow setting max-weight-norm optimizer parameter

[From AlexC in GC]
The reference implementation uses a value of 10 for this parameter. However, the implementation in GC-Optimum passes None, and the user has no way to set it.

Create a training command-line option --max-weight-norm; it can default to None for backwards compatibility.
In IPUTrainer.create_optimizer, pass this argument to the LAMB optimizer as part of optimizer_kwargs.
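
The two steps above could be sketched roughly as follows. This is a minimal illustration and not the project's code: the parser, the --learning-rate option and the build_lamb_kwargs helper are hypothetical, and the real training arguments are defined differently. Only the --max-weight-norm option name and the idea of forwarding it through optimizer_kwargs come from the issue.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Defaults to None so existing command lines behave exactly as before.
    parser.add_argument("--max-weight-norm", type=float, default=None)
    parser.add_argument("--learning-rate", type=float, default=1e-4)
    return parser.parse_args(argv)


def build_lamb_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # Hypothetical helper: collects the keyword arguments that
    # IPUTrainer.create_optimizer would forward to the LAMB optimizer.
    return {"lr": args.learning_rate, "max_weight_norm": args.max_weight_norm}


print(build_lamb_kwargs(parse_args(["--max-weight-norm", "10"])))
print(build_lamb_kwargs(parse_args([])))
```

With no option given, max_weight_norm stays None, which matches the current behaviour described in the issue.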

Bug in saving and loading BERT checkpoints during training

I found a bug in our BERT checkpointing, related to deparallelize and parallelize. If checkpoints are written during training, the weights on the host are somehow corrupted and the eval scores are really bad.
After many experiments I’ve narrowed it down to the fused-qkv replacement we do in parallelize.

Can reproduce by running BERT for SQuAD with checkpointing enabled:

python examples/question-answering/run_qa.py \
  --model_name_or_path bert-base-uncased \
  --ipu_config_name Graphcore/bert-base-ipu \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --learning_rate 6e-5 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --seed 42 \
  --lr_scheduler_type linear \
  --loss_scaling 64 \
  --weight_decay 0.01 \
  --warmup_ratio 0.1 \
  --logging_steps 10 \
  --save_steps 100 \
  --dataloader_num_workers 64 \
  --output_dir squad_bert_base \
  --overwrite_output_dir

If you set --save_steps -1, checkpointing is disabled and you should get a reasonable validation score. If you set it to, say, 100, the eval score should be broken, with F1 around 50%.

error of 'WhisperConfig for this kind of AutoModel: AutoModelForCTC'

Hi, I followed the README to run the Whisper example program:
pip install optimum[graphcore] optuna
My Python packages are as follows:
transformers==4.25.1
optimum==1.6.1
optimum-graphcore==0.6.1
But when I run run_whisper_pipeline.py, I get this error:

[05:08:47.252] [poptorch::python] [critical] ValueError: Unrecognized configuration class <class 'transformers.models.whisper.configuration_whisper.WhisperConfig'> for this kind of AutoModel: AutoModelForCTC.
Model type should be one of Data2VecAudioConfig, HubertConfig, MCTCTConfig, SEWConfig, SEWDConfig, UniSpeechConfig, UniSpeechSatConfig, Wav2Vec2Config, Wav2Vec2ConformerConfig, WavLMConfig.

Traceback (most recent call last):
  File "run_whisper_pipeline.py", line 104, in <module>
    ipu_pipeline = pipeline(
  File "/localdata/cn-customer-engineering/xudongz/onnxfs2/fs2_/lib/python3.8/site-packages/optimum/graphcore/pipelines/__init__.py", line 366, in pipeline
    model = SUPPORTED_TASKS[targeted_task]["class"][0].from_pretrained(model_id, revision=revision)
  File "/localdata/cn-customer-engineering/xudongz/onnxfs2/fs2_/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 466, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.whisper.configuration_whisper.WhisperConfig'> for this kind of AutoModel: AutoModelForCTC.
Model type should be one of Data2VecAudioConfig, HubertConfig, MCTCTConfig, SEWConfig, SEWDConfig, UniSpeechConfig, UniSpeechSatConfig, Wav2Vec2Config, Wav2Vec2ConformerConfig, WavLMConfig.

Define metrics to validate performance of BERT on the pretraining task

Let's run the full training to measure:

The time it takes to reach different milestones (seqlen 128, seqlen 384, seqlen 512) [minutes]
The (min, max) loss values within all the milestones [loss - N/A]
The ppl (min, max) values within all the milestones [perplexity - N/A]
The avg speed within all the milestones [samples/s]

Optional:

Device usage [percent %]

Add DeBERTa for Fine-tuning

Tasks:

Sequence Classification
Token Classification
Question Answering

Integrate BERT example

Let's reuse what Graphcore has already done towards BERT training. The goal of this enablement is to have the structure in place and everything working without changing too much from the GC implementation.

Modeling optimum.graphcore.models.XBertForY (with X: GraphCore or IPU, wdyt @whobbes @jimypbr? and Y: the task)
Data optimum.graphcore.data (let's use this package for interoperability with datasets and GC Wikipedia)

Error when running BART/T5 example evaluation

Running example:

python examples/translation/run_translation.py \
    --model_name_or_path t5-small \
    --ipu_config_name Graphcore/t5-small-ipu \
    --do_train False \
    --do_eval \
    --source_lang en \
    --target_lang ro \
    --source_prefix "translate English to Romanian: " \
    --dataset_name wmt16 \
    --dataset_config_name ro-en \
    --output_dir /tmp/tst-translation \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --pod_type pod16 \
    --overwrite_output_dir \
    --predict_with_generate

Training works, but at evaluation I get this error at the start of compilation:

  File "examples/translation/run_translation.py", line 626, in <module>
    main()
  File "examples/translation/run_translation.py", line 567, in main
    metrics = trainer.evaluate(max_length=max_length, num_beams=num_beams, metric_key_prefix="eval")
  File "/localdata/jamesbr/dev/optimum-graphcore/optimum/graphcore/trainer_seq2seq.py", line 69, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/localdata/jamesbr/dev/optimum-graphcore/optimum/graphcore/trainer.py", line 1731, in evaluate
    metric_key_prefix=metric_key_prefix,
  File "/localdata/jamesbr/dev/optimum-graphcore/optimum/graphcore/trainer.py", line 1826, in evaluation_loop
    self._compile_model(model, next(iter(dataloader)), log=True)
  File "/localdata/jamesbr/dev/optimum-graphcore/optimum/graphcore/trainer.py", line 329, in _compile_model
    start_compile = time.perf_counter()
  File "/localdata/jamesbr/sdks/venv/poplar_sdk-ubuntu_18_04-2.5.0+929-ff2d91ab1f/2.5.0+929_poptorch/lib/python3.6/site-packages/poptorch/_poplar_executor.py", line 645, in compile
    in_tensors = self._args_parser(args, kwargs, False)
  File "/localdata/jamesbr/sdks/venv/poplar_sdk-ubuntu_18_04-2.5.0+929-ff2d91ab1f/2.5.0+929_poptorch/lib/python3.6/site-packages/poptorch/_args_parser.py", line 95, in __call__
    len(args) + len(kwargs))
AssertionError: Too many arguments provided: expected ['encoder_outputs', 'decoder_input_ids', 'attention_mask'] (3) but got 4

Add LXMERT model

Inference speed metrics take compilation time into account

When recording the speed of an inference loop, the start_time used for the speed metrics is recorded before the call to evaluation_loop (https://github.com/huggingface/optimum-graphcore/blob/main/optimum/graphcore/trainer.py#L1440).
However, evaluation_loop contains the model compilation (self._wrap_and_compile_model_for_evaluation), which can take a lot longer than the actual execution of the evaluation loop. (In the case of training, the compilation occurs before start_time is recorded.)
Another thing: the very first call to the dataloader,
for step, inputs in enumerate(dataloader):
can take several seconds to start. Again, this can be a lot compared to the actual execution time of the evaluation loop, especially when there are just a few steps to execute. I'm not sure if this should be excluded from the total time as well.
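
One way to keep both costs out of the measurement is sketched below. It is an illustration only, not the IPUTrainer code: timed_evaluation, compile_fn, step_fn and the injectable clock are hypothetical names. Whether the first dataloader fetch should really be excluded is the open question raised above; the sketch simply shows what excluding it would look like.

```python
import time
from typing import Callable, Dict, Iterable, List


def timed_evaluation(
    compile_fn: Callable[[], None],
    batches: Iterable[List[int]],
    step_fn: Callable[[List[int]], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time an evaluation loop, excluding compilation and dataloader start-up."""
    compile_fn()  # compilation happens before the timer starts
    iterator = iter(batches)
    batch = next(iterator, None)  # fetching the first batch absorbs the warm-up
    start = clock()  # only the steady-state loop is timed from here on
    num_samples = 0
    while batch is not None:
        step_fn(batch)
        num_samples += len(batch)
        batch = next(iterator, None)
    runtime = clock() - start
    return {
        "runtime": runtime,
        "samples_per_second": num_samples / runtime if runtime > 0 else 0.0,
    }


stats = timed_evaluation(lambda: time.sleep(0.05), [[1, 2], [3, 4]], lambda b: None)
print(stats)
```

The clock parameter exists only so the function can be checked with a fake timer; in normal use the default time.perf_counter is enough.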

Enable GraphCloud CI integration

Install Github Runner client and remote controller on the target machine(s)
Deploy .github.yaml actions to be run on a per-commit basis for "fast" tests
Deploy .github.yaml actions to be run on a scheduled basis for "slow" tests (checked periodicity with GC)
Include basic set of tests to ensure the platform is correctly discovered on the remote

Support all the poptorch optimizers

We currently only support AdamW and LAMB.

In IPUTrainer:

    def _pytorch_optimizer_to_poptorch(self, optimizer: optim.Optimizer):
        # TODO: implement this function
        pytorch_to_poptorch_mapping = {
            optim.SGD: poptorch.optim.SGD,
            optim.Adam: poptorch.optim.Adam,
            optim.AdamW: poptorch.optim.AdamW,
            optim.RMSprop: poptorch.optim.RMSprop,
        }
        pass
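
A possible shape for the missing lookup is sketched below. To stay runnable without torch or poptorch, it uses two stand-in classes (TorchSGD and IpuSGD, both hypothetical) in place of the real optimizers, and convert_optimizer is likewise an illustrative name. It also assumes that the target class can be rebuilt from the source optimizer's param_groups, which has not been checked against poptorch here.

```python
from typing import Any, Dict, List, Type


class TorchSGD:
    """Stand-in for torch.optim.SGD (hypothetical, keeps the sketch dependency-free)."""

    def __init__(self, param_groups: List[Dict[str, Any]]):
        self.param_groups = param_groups


class IpuSGD(TorchSGD):
    """Stand-in for the matching poptorch.optim class."""


def convert_optimizer(optimizer: Any, mapping: Dict[Type, Type]) -> Any:
    """Rebuild `optimizer` with the class that `mapping` associates with its type."""
    target_cls = mapping.get(type(optimizer))
    if target_cls is None:
        supported = ", ".join(cls.__name__ for cls in mapping)
        raise KeyError(
            f"{type(optimizer).__name__} has no IPU equivalent (supported: {supported})"
        )
    # Copy the param groups so hyperparameters such as lr carry over unchanged.
    return target_cls([dict(group) for group in optimizer.param_groups])


mapping = {TorchSGD: IpuSGD}
converted = convert_optimizer(TorchSGD([{"params": [], "lr": 0.1}]), mapping)
print(type(converted).__name__, converted.param_groups[0]["lr"])
```

Raising on an unknown optimizer type, rather than silently returning the original, makes an unsupported optimizer visible before compilation starts.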

Is there a quick or convenient way to convert from the original model of transformers to PipelinedDebertaV2/V3ForSequenceClassification

Hi,
Is there a quick or convenient way to convert the original transformers model to PipelinedDebertaV2/V3ForSequenceClassification?
Something very close already exists, namely deberta-base:

# ipu_config = IPUConfig.from_pretrained("Graphcore/deberta-base-ipu")

logger = logging.get_logger(__name__)

class PipelinedDebertaV2ForSequenceClassification(DebertaV2ForSequenceClassification, PipelineMixin):
    def parallelize(self):
        super().parallelize()
        logger.info("---------- Device Allocation -----------")
        logger.info("Embedding  --> IPU 0")
        self.deberta.embeddings = poptorch.BeginBlock(self.deberta.embeddings, "Embedding", ipu_id=0)

        # layer_ipu = get_layer_ipu(self.ipu_config.layers_per_ipu)
        layer_ipu = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
        # print(layer_ipu)
        for index, layer in enumerate(self.deberta.encoder.layer):
            if self.ipu_config.recompute_checkpoint_every_layer:
                # Put checkpoints on every encoder layer
                h = recomputation_checkpoint(layer)
                self._hooks.append(h)
            # print(index)
            ipu = layer_ipu[index]
            logger.info(f"Encoder {index:<2} --> IPU {ipu}")
            self.deberta.encoder.layer[index] = poptorch.BeginBlock(layer, f"Encoder{index}", ipu_id=ipu)

        last_ipu = self.ipu_config.ipus_per_replica - 1
        logger.info(f"Head       --> IPU {last_ipu}")
        logger.info("---------------------------------------")
        # self.deberta.layernorm = poptorch.BeginBlock(self.deberta.layernorm, "LayerNorm", ipu_id=last_ipu)
        self.classifier = poptorch.BeginBlock(self.classifier, "Classifier", ipu_id=last_ipu)
        return self

num_labels = 273
model = PipelinedDebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v3-base", num_labels=num_labels)

but trainer.train() raised this error:
AssertionError: Torch doesn't support passing tensors (labels) after the following parameters have defaulted to None. position_ids (3), inputs_embeds (4)

Set LayerNorm's eps to a number that is larger than 6e-5

Currently, many LayerNorm eps values are smaller than 6.1e-5 (the smallest positive normal fp16 value), which might cause underflow.
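
The effect can be checked without any IPU, using only the half-precision format in Python's struct module. The eps values below are picked for illustration (1e-12 is, for example, the default layer_norm_eps in BERT's configuration), and to_fp16 is a helper written for this sketch.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


SMALLEST_NORMAL_FP16 = 2.0 ** -14  # about 6.1e-5

for eps in (1e-12, 1e-7, 1e-5, 1e-4):
    rounded = to_fp16(eps)
    if rounded == 0.0:
        kind = "flushed to zero"
    elif rounded < SMALLEST_NORMAL_FP16:
        kind = "subnormal, reduced precision"
    else:
        kind = "normal"
    print(f"eps={eps:g} -> fp16 {rounded:g} ({kind})")
```

An eps that rounds to zero no longer protects the division by the standard deviation, which is why a value above the smallest normal fp16 number is the safer choice.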

compile_only doesn't make the program exit after compilation

[From AlexC in GC] As the title says. The execution keeps on going, suggesting that the exposed argument doesn't work.

Is there fine-tuning for stable diffusion 1.5?

HuBERT integration available?

When I try training HubertForCTC, I get the following error from IPUTrainer:

KeyError: 'HubertForCTC pipelined version not found in registry.'

Add BART model

Set more wandb options in training_args.py

It's not clear how to enable wandb logging from optimum. By default it's off, but it can be turned on with the environment variable WANDB_DISABLED=false. Other envvars are detailed in this page. Ideally we'd like a --wandb flag in training_args.py that enables it. I've noticed that there's already one wandb-related flag in training_args.py, lines 166-168:

run_name: Optional[str] = field(
        default=None, metadata={"help": "An optional descriptor for the run. Notably used for wandb logging."}
    )

Could we add two other wandb args? One to enable it (e.g. wandb) and another to set the project name (e.g. project_name). Thanks.
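
Until such arguments exist, a workaround is to set the environment variables from Python before the trainer is created. WANDB_DISABLED=false is taken from this issue. WANDB_PROJECT is assumed here to be the variable wandb reads for the project name, and my-ipu-project is a placeholder.

```python
import os

# Must run before the trainer is created, because the logging integration
# reads the environment when it is set up.
os.environ["WANDB_DISABLED"] = "false"  # from the issue: turns wandb logging on
os.environ["WANDB_PROJECT"] = "my-ipu-project"  # placeholder project name

print({key: value for key, value in os.environ.items() if key.startswith("WANDB_")})
```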

from optimum.version import __version__: ModuleNotFoundError: No module named 'optimum.version'

[21:09:16.438] [poptorch::python] [critical] ModuleNotFoundError: No module named 'optimum.version'

Traceback (most recent call last):
  File "run_qa.py", line 31, in <module>
    from optimum.graphcore import IPUConfig
  File "/usr/local/lib/python3.6/dist-packages/optimum/graphcore/__init__.py", line 21, in <module>
    from .trainer import IPUTrainer
  File "/usr/local/lib/python3.6/dist-packages/optimum/graphcore/trainer.py", line 33, in <module>
    from optimum.version import __version__
ModuleNotFoundError: No module named 'optimum.version'

Fix:
In dist-packages/optimum/graphcore/trainer.py, replace
from optimum.version import __version__
with
from optimum.graphcore.version import __version__

Add missing tests for generation

Add test_generation_beam_search.py
Add test_generation_beam_constraints.py

DeBERTa base multiclass text classification: trainer.train() fails with Target size (torch.Size([16])) must be the same as input size

System Info

Graphcore instance
import transformers
import optimum.graphcore
print(transformers.__version__)
print(optimum.graphcore.__version__)
4.20.0
0.3.1

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

DeBERTa base multiclass text classification: trainer.train() fails with Target size (torch.Size([16])) must be the same as input size

Expected behavior

The loss function may be the key.

Allow setting betas parameters for LAMB optimizer

[From AlexC in GC]
Currently it is only possible to set betas for ADAMW optimizer. We would like to do it also for LAMB.

Rename training argument from adam_beta1 and adam_beta2 to optimizer_beta1 and optimizer_beta2
In IPUTrainer.create_optimizer pass the value of the above parameters to betas as part of optimizer_kwargs

Bonus point: while you are at it you could rename the parameter adam_epsilon to optimizer_epsilon, since this value is also used by LAMB.
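
The renaming could look roughly like this. OptimizerArgs and build_optimizer_kwargs are hypothetical names used only for this sketch; the field names are the ones proposed above, and the default values are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class OptimizerArgs:
    # Optimizer-agnostic names proposed in the issue (defaults are placeholders).
    optimizer_beta1: float = 0.9
    optimizer_beta2: float = 0.999
    optimizer_epsilon: float = 1e-8


def build_optimizer_kwargs(args: OptimizerArgs) -> Dict[str, Any]:
    # The same betas and eps are forwarded whichever optimizer is selected,
    # so LAMB receives them just like AdamW does.
    return {
        "betas": (args.optimizer_beta1, args.optimizer_beta2),
        "eps": args.optimizer_epsilon,
    }


print(build_optimizer_kwargs(OptimizerArgs(optimizer_beta2=0.98)))
```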

Implement the different generation methods using poptorch.for_loop to make things much faster

Support other models as well

What would it require to support LLaMA-based models as well?

Fix transformers version to match with Graphcore's SDK

Graphcore's SDK is tested against certain versions of transformers (at the moment the maximum is 4.17.0). However, optimum requires the latest version of transformers and otherwise throws an error; see for instance this line in examples/question-answering/run_qa.py, where it checks for the most recent version:

# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.18.0.dev0")

Is there a way of setting a certain version of transformers in optimum? Ideally we'd like to be able to checkout a certain transformers version and not master. Thanks.

Fix test-examples CI

test-examples is failing on the last test, convnext.
I suspect that the CI machine is running out of disk space, which makes the last test fail.
If we isolate all the tests so that the datasets are downloaded and the runs happen inside a temporary directory, there should be no side effects, which hopefully fixes the issue.
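
A minimal sketch of that isolation, under stated assumptions: run_isolated is a hypothetical helper, the command is a trivial stand-in for an example script, and redirecting HF_DATASETS_CACHE is assumed to be enough to keep downloaded datasets inside the temporary directory.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def run_isolated(command: List[str]) -> str:
    """Run `command` inside a throwaway directory and return that directory's path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        env = dict(os.environ)
        # Assumption: pointing the datasets cache at the temporary directory
        # keeps downloaded data from piling up on the CI machine.
        env["HF_DATASETS_CACHE"] = os.path.join(tmp_dir, "datasets")
        subprocess.run(command, cwd=tmp_dir, env=env, check=True)
        return tmp_dir  # the directory is deleted when the `with` block exits


used_dir = run_isolated([sys.executable, "-c", "open('output.txt', 'w').write('done')"])
print(os.path.exists(used_dir))
```

Because everything the test writes lives under the temporary directory, disk usage returns to its starting point after each example, whether the test passes or fails.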

Running ViT run_image_classification_on_local_data.py produces error on replication factor

[19:14:25.212] [poptorch::python] [critical] AssertionError: Unexpected type <class 'dict'> for option replication_factor. Expected <class 'int'>

Traceback (most recent call last):
  File "run_image_classification_on_local_data.py", line 369, in <module>
    main()
  File "run_image_classification_on_local_data.py", line 333, in main
    data_collator=collate_fn,
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 225, in __init__
    self.opts = self.ipu_config.to_options()
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 251, in to_options
    return self.for_pod_type(pod_type)._to_options(for_inference=for_inference, compile_only=compile_only)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 136, in _to_options
    opts.replicationFactor(self.inference_replication_factor if for_inference else self.replication_factor)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/poptorch/options.py", line 1381, in replicationFactor
    self.set(replication_factor=replication_factor)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/poptorch/_options_impl.py", line 67, in set
    type(value), option, type(self._values[option]))
AssertionError: Unexpected type <class 'dict'> for option replication_factor. Expected <class 'int'>

03/25/2022 19:14:25 - critical - poptorch::python - AssertionError: Unexpected type <class 'dict'> for option replication_factor. Expected <class 'int'>

Traceback (most recent call last):
  File "run_image_classification_on_local_data.py", line 369, in <module>
    main()
  File "run_image_classification_on_local_data.py", line 333, in main
    data_collator=collate_fn,
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 225, in __init__
    self.opts = self.ipu_config.to_options()
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 251, in to_options
    return self.for_pod_type(pod_type)._to_options(for_inference=for_inference, compile_only=compile_only)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 136, in _to_options
    opts.replicationFactor(self.inference_replication_factor if for_inference else self.replication_factor)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/poptorch/options.py", line 1381, in replicationFactor
    self.set(replication_factor=replication_factor)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/poptorch/_options_impl.py", line 67, in set
    type(value), option, type(self._values[option]))
AssertionError: Unexpected type <class 'dict'> for option replication_factor. Expected <class 'int'>

I tried overriding ipu_config_overrides="replication_factor=4,inference_replication_factor=4" and it returns another error:

Overriding IPU config: replication_factor=4,inference_replication_factor=4
[19:19:46.420] [poptorch::python] [critical] ValueError: You can only update int, float, bool, list or string values in the config, got 4 for key replication_factor

Traceback (most recent call last):
  File "run_image_classification_on_local_data.py", line 369, in <module>
    main()
  File "run_image_classification_on_local_data.py", line 333, in main
    data_collator=collate_fn,
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 223, in __init__
    self.ipu_config.update_from_string(args.ipu_config_overrides)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 307, in update_from_string
    f"You can only update int, float, bool, list or string values in the config, got {v} for key {k}"
ValueError: You can only update int, float, bool, list or string values in the config, got 4 for key replication_factor

03/25/2022 19:19:46 - critical - poptorch::python - ValueError: You can only update int, float, bool, list or string values in the config, got 4 for key replication_factor

Traceback (most recent call last):
  File "run_image_classification_on_local_data.py", line 369, in <module>
    main()
  File "run_image_classification_on_local_data.py", line 333, in main
    data_collator=collate_fn,
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 223, in __init__
    self.ipu_config.update_from_string(args.ipu_config_overrides)
  File "/mnt/poddata/workspace/vit-hf/lib/python3.6/site-packages/optimum/graphcore/ipu_configuration.py", line 307, in update_from_string
    f"You can only update int, float, bool, list or string values in the config, got {v} for key {k}"
ValueError: You can only update int, float, bool, list or string values in the config, got 4 for key replication_factor

Add Masked Language modeling for RoBERTa and BERT

Language modelling on custom dataset

Hi.

In examples/language-modelling you say:
The following examples, will run on datasets hosted on our hub or with your own text files for training and validation. We give examples of both below.

I believe the text file example is missing, and it would be a great addition!

Cheers.

Add Paperspace Gradient links to all notebooks

Hi there 👋 !

I'm adding links to all the Graphcore notebooks in the Optimum docs and I noticed that only 3 notebooks have links to run on Paperspace. Any chance we can have links for all notebooks?

Add T5 model

Enable and validate the missing (commented) tests in test_generation_utils.py

There are many tests that were temporarily disabled, either because they did not pass and were not critical, or because they could not be enabled until other PRs were merged (for instance the ones needing GPT-2). These tests need to be enabled.

bert and roberta models

Hi,

BERT-base, RoBERTa-base and most of the other models are no longer available.
Is it planned to add them soon? Is it possible to use older versions of those?

Cheers,

SQuAD eval: list index out of range in postprocess_qa_predictions

I get this error when post-processing the SQuAD results:

02/21/2022 22:13:33 - critical - poptorch::python - IndexError: list index out of range

Traceback (most recent call last):
  File "examples/question-answering/run_qa.py", line 671, in <module>
    main()
  File "examples/question-answering/run_qa.py", line 628, in main
    metrics = trainer.evaluate()
  File "/localdata/jamesbr/dev/optimum-graphcore/examples/question-answering/trainer_qa.py", line 57, in evaluate
    eval_preds = self.post_process_function(eval_examples, eval_dataset, output.predictions)
  File "examples/question-answering/run_qa.py", line 573, in post_processing_function
    prefix=stage,
  File "/localdata/jamesbr/dev/optimum-graphcore/examples/question-answering/utils_qa.py", line 152, in postprocess_qa_predictions
    "offsets": (offset_mapping[start_index][0], offset_mapping[end_index][1]),
IndexError: list index out of range

Reproducer:

python examples/question-answering/run_qa.py \
  --ipu_config_name Graphcore/roberta-base-ipu \
  --model_name_or_path roberta-base \
  --dataset_name squad_v2 \
  --version_2_with_negative \
  --do_train False \
  --do_eval \
  --num_train_epochs 2 \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 2 \
  --learning_rate 6e-5 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --seed 1984 \
  --lr_scheduler_type cosine \
  --loss_scaling 64 \
  --weight_decay 0.01 \
  --warmup_ratio 0.25 \
  --logging_steps 10 \
  --save_steps 200 \
  --dataloader_num_workers 64 \
  --output_dir squad_roberta_base \
  --overwrite_output_dir

Language modeling example fails

Using the latest source install (d8b082a), the language modeling example fails.

transformers.__version__
'4.16.2'
datasets.__version__
'1.18.3'
tokenizers.__version__
'1.11.5'

python run_pretraining.py   --config_name bert-base-uncased   --tokenizer_name bert-base-uncased   --ipu_config_name Graphcore/bert-base-ipu   --dataset_name Graphcore/wikipedia-bert-128   --do_train   --do_eval   --output_dir ./output/test-pretraining --cache_dir /localdata/juliens

02/17/2022 09:25:13 - INFO - __main__ - Training new model from scratch
Running tokenizer on every text in dataset:   0%|                                                                                                                                                            | 0/36728 [00:00<?, ?ba/s]
[09:25:16.219] [poptorch::python] [critical] ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

Traceback (most recent call last):
  File "run_pretraining.py", line 626, in <module>
    main()
  File "run_pretraining.py", line 484, in main
    desc="Running tokenizer on every text in dataset",
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/dataset_dict.py", line 512, in map
    for k, dataset in self.items()
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/dataset_dict.py", line 512, in <dictcomp>
    for k, dataset in self.items()
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2120, in map
    desc=desc,
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 485, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/fingerprint.py", line 413, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2485, in _map_single
    offset=offset,
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2367, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2062, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "run_pretraining.py", line 475, in tokenize_function
    return tokenizer(examples[text_column_name], return_special_tokens_mask=True)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2418, in __call__
    "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

02/17/2022 09:25:16 - critical - poptorch::python - ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

Traceback (most recent call last):
  File "run_pretraining.py", line 626, in <module>
    main()
  File "run_pretraining.py", line 484, in main
    desc="Running tokenizer on every text in dataset",
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/dataset_dict.py", line 512, in map
    for k, dataset in self.items()
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/dataset_dict.py", line 512, in <dictcomp>
    for k, dataset in self.items()
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2120, in map
    desc=desc,
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 485, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/fingerprint.py", line 413, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2485, in _map_single
    offset=offset,
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2367, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 2062, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "run_pretraining.py", line 475, in tokenize_function
    return tokenizer(examples[text_column_name], return_special_tokens_mask=True)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2418, in __call__
    "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
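
The traceback does not say which rows triggered the `ValueError`. One way to narrow it down, sketched here under the assumption that the text column contains entries that are not strings (for example `None` for empty rows), is to scan a batch before handing it to the tokenizer. The helper name `find_non_string_rows` and the sample batch are hypothetical and are not part of `run_pretraining.py`.

```python
from typing import Any, List, Sequence, Tuple


def find_non_string_rows(texts: Sequence[Any]) -> List[Tuple[int, str]]:
    """Return (index, type name) for every entry that is not a str."""
    return [
        (index, type(text).__name__)
        for index, text in enumerate(texts)
        if not isinstance(text, str)
    ]


# Hypothetical batch, standing in for examples[text_column_name].
batch = ["first sentence", None, "third sentence", 42]
bad_rows = find_non_string_rows(batch)
print(bad_rows)
```

If this reports any rows, filtering or converting them before tokenization would be the next thing to try.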

async rebatched is broken on wav2vec2 pretraining

Add HuBERT for Sequence Classification

Fine-tuning with superb-ks dataset
Fine-tuning with commonlang dataset
Optimise memory and throughput with recomputation checkpoints.

Image classification (cats & dogs) fails

It fails with vanilla transformers and the error is identical, so that's probably not a Graphcore issue at all. Adding it here for visibility.

huggingface/transformers#15698

Allow the encoder outputs to be computed on the CPU for generation

Currently, during generation, the encoder outputs can only be computed after compiling the encoder separately.
It would be nice to compute the encoder outputs directly on the CPU and run only the decoding loop on the IPUs.
The IPUConfig attribute for this and some related features have already been added, but the feature as a whole does not work yet.
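
The requested split can be pictured with a toy sketch: the encoder runs once, and only the per-token decoding step is called in a loop. Everything here is a stand-in. `generate`, `toy_encode` and `toy_decode_step` are hypothetical names, no device placement happens, and this is not how optimum-graphcore implements generation.

```python
from typing import Callable, List, Sequence


def generate(
    encode: Callable[[Sequence[int]], List[float]],
    decode_step: Callable[[List[float], List[int]], int],
    input_ids: Sequence[int],
    bos_token_id: int,
    eos_token_id: int,
    max_new_tokens: int,
) -> List[int]:
    # The encoder runs exactly once; in the proposal this would happen on the CPU.
    encoder_outputs = encode(input_ids)
    generated = [bos_token_id]
    # Only this loop would need to run on the IPUs.
    for _ in range(max_new_tokens):
        next_token = decode_step(encoder_outputs, generated)
        generated.append(next_token)
        if next_token == eos_token_id:
            break
    return generated


def toy_encode(input_ids: Sequence[int]) -> List[float]:
    # Stand-in encoder: turns token ids into floats.
    return [float(token_id) for token_id in input_ids]


def toy_decode_step(encoder_outputs: List[float], generated: List[int]) -> int:
    # Stand-in decoder: echoes the source tokens one by one, then emits EOS (0).
    position = len(generated) - 1
    if position < len(encoder_outputs):
        return int(encoder_outputs[position])
    return 0


output = generate(
    toy_encode, toy_decode_step, [5, 6, 7],
    bos_token_id=1, eos_token_id=0, max_new_tokens=10,
)
print(output)
```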

Text classification example fails on several tasks

Using the latest source install (d8b082a), the first snippet in the text classification example fails on three tasks (mrpc, rte and wnli) with the same error message.

transformers.__version__
'4.16.2'
datasets.__version__
'1.18.3'

Other tasks run fine on the IPU. The three impacted tasks also run fine with vanilla transformers.

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --ipu_config_name Graphcore/bert-base-ipu \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output/$TASK_NAME/

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
The following columns in the training set  don't have a corresponding argument in `PipelinedBertForSequenceClassification.forward` and have been ignored: sentence1, sentence2, idx.
[08:55:10.779] [poptorch::python] [critical] StopIteration

02/17/2022 08:55:10 - critical - poptorch::python - StopIteration

Traceback (most recent call last):
  File "run_glue.py", line 568, in <module>
    main()
  File "run_glue.py", line 486, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/optimum/graphcore/trainer.py", line 986, in train
    self._compile_model(model, next(iter(train_dataloader)), log=True)
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/juliens/optimum-env/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
StopIteration
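
The traceback shows `next(iter(train_dataloader))` raising `StopIteration`, which is what happens when a dataloader yields no batches at all. One possible cause, not confirmed in this report, is a dataset smaller than the combined batch size while incomplete batches are dropped. The sketch below uses made-up numbers and hypothetical helper names (`num_batches`, `first_batch_or_error`) to show the arithmetic and a clearer failure mode.

```python
from typing import Any, Iterable


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader would yield for the given sizes."""
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division


def first_batch_or_error(batches: Iterable[Any]) -> Any:
    """Fetch the first batch, turning a bare StopIteration into a clear error."""
    try:
        return next(iter(batches))
    except StopIteration:
        raise ValueError(
            "The dataloader is empty: the dataset may be smaller than the "
            "combined batch size."
        ) from None


# Hypothetical numbers: a small training set and a large combined batch size
# (per-device batch size x replication x gradient accumulation, all assumed).
small_dataset_size = 600
combined_batch_size = 32 * 4 * 16

print(num_batches(small_dataset_size, combined_batch_size, drop_last=True))
print(num_batches(small_dataset_size, combined_batch_size, drop_last=False))
```

With these assumed values, dropping the last incomplete batch leaves nothing to iterate over, which would match the fact that only the smaller tasks fail.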

Add GPT2 model

Fine-tuning tasks (small)
Fine-tuning tasks (medium)
CLM (small)
CLM (medium)

Resuming from checkpoint gives wrong speed metrics results.

When using the resume_from_checkpoint argument to resume training, the speed metrics can be wrong. In trainer.py:
metrics = speed_metrics("train", start_time, num_samples=num_train_samples, num_steps=self.state.max_steps)
start_time is not checkpointed and is reset to a new value when training restarts, while max_steps and num_train_samples seem to be the total number of steps and samples for the whole run. The elapsed time therefore covers only the resumed part while the counts cover the whole run, so the reported throughput is too high.
Possible fix:
I suppose we could either:

use the actual remaining number of steps and samples, taken from the checkpointed Trainer state,
or checkpoint the start_time.
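
The first option can be sketched as follows. This is a minimal illustration, not the library's `speed_metrics` function: `resumed_speed_metrics` and its parameters are hypothetical, and the elapsed time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict


def resumed_speed_metrics(
    split: str,
    runtime_seconds: float,
    total_samples: int,
    total_steps: int,
    resumed_samples: int,
    resumed_steps: int,
) -> Dict[str, float]:
    """Throughput computed only from the work done since resuming."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    # Subtract what had already been processed before the checkpoint.
    samples_this_run = total_samples - resumed_samples
    steps_this_run = total_steps - resumed_steps
    return {
        f"{split}_runtime": round(runtime_seconds, 4),
        f"{split}_samples_per_second": round(samples_this_run / runtime_seconds, 3),
        f"{split}_steps_per_second": round(steps_this_run / runtime_seconds, 3),
    }


# Made-up run: 1000 steps of 32 samples, resumed at step 600, 100 s since resuming.
metrics = resumed_speed_metrics(
    "train",
    runtime_seconds=100.0,
    total_samples=32_000,
    total_steps=1_000,
    resumed_samples=19_200,
    resumed_steps=600,
)
print(metrics)
```

With the same made-up numbers, dividing the full 32,000 samples by the 100 seconds since resuming would report 320 samples per second instead of 128.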

"Run on Paperspace" buttons still point to SDK 2.6

They need updating to SDK 3.0.

Compiler error in BERT MLM pretraining

Hi, I'm trying to run the BERT MLM example (examples/language-modeling/run_mlm.py) on an on-premise POD16 and am hitting a compiler error. I'm trying to debug on my side, but I thought I'd open an issue here to see if you can spot anything obvious.

Tech stack:
OS: CentOS 7
Python version: 3.6.8
Poplar version: 2.4.0
optimum-graphcore commit: 32eee51
transformers version: 4.18.0

Full stacktrace:

Graph compilation: 18%|████████████████████████████████████████████████████████▋ | 18/100 [01:02<03:23]2022-06-08T17:10:24.574645Z popart:popart 11633.11633 E: snap::Tensor Accum___model.bert.embeddings.word_embeddings.weight of unexpected Type. Poplar tensor type : half. Expected (Ir) tensor type : float. This for tensor Accum___model.bert.embeddings.word_embeddings.weight

[0] popart::popx::PopTensors::insert(std::string, snap::Tensor const&)
[1] popart::popx::InitTensorCreator::initTensor(popart::popx::IrLowering&) const

[17:10:25.174] [poptorch::python] [critical] poptorch.poptorch_core.Error: In poptorch/python/poptorch.cpp:1319: 'popart_exception': snap::Tensor Accum___model.bert.embeddings.word_embeddings.weight of unexpected Type. Poplar tensor type : half. Expected (Ir) tensor type : float. This for tensor Accum___model.bert.embeddings.word_embeddings.weight
Error raised in:
[0] popart::popx::PopTensors::insert(std::string, snap::Tensor const&)
[1] popart::popx::InitTensorCreator::initTensor(popart::popx::IrLowering&) const
[2] popart::Session::prepareDevice: Poplar compilation
[3] Compiler::compileAndPrepareDevice
[4] LowerToPopart::compile

Traceback (most recent call last):
  File "examples/language-modeling/run_mlm.py", line 604, in <module>
    main()
  File "examples/language-modeling/run_mlm.py", line 553, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/oscark/PycharmProjects/huggingface--optimum-graphcore/optimum/graphcore/trainer.py", line 887, in train
    self._compile_model(model, next(iter(train_dataloader)), log=True)
  File "/home/oscark/PycharmProjects/huggingface--optimum-graphcore/optimum/graphcore/trainer.py", line 355, in _compile_model
    model.compile(**sample_batch)
  File "/home/oscark/venvs/graphcore24/lib64/python3.6/site-packages/poptorch/_poplar_executor.py", line 622, in compile
    self._compile(in_tensors)
  File "/home/oscark/venvs/graphcore24/lib64/python3.6/site-packages/poptorch/_poplar_executor.py", line 505, in _compile
    *trace_args)
poptorch.poptorch_core.Error: In poptorch/python/poptorch.cpp:1319: 'popart_exception': snap::Tensor Accum___model.bert.embeddings.word_embeddings.weight of unexpected Type. Poplar tensor type : half. Expected (Ir) tensor type : float. This for tensor Accum___model.bert.embeddings.word_embeddings.weight
Error raised in:
[0] popart::popx::PopTensors::insert(std::string, snap::Tensor const&)
[1] popart::popx::InitTensorCreator::initTensor(popart::popx::IrLowering&) const
[2] popart::Session::prepareDevice: Poplar compilation
[3] Compiler::compileAndPrepareDevice
[4] LowerToPopart::compile

optuna is not in the requirements.txt

Add optuna to requirements.txt

Test automatic loss scaling and set up option for it

Automatic Loss Scaling (ALS) is a feature in the Poplar SDK which brings stability to training large models in half precision, especially when gradient accumulation and reduction across replicas also happen in half precision.

The objective of this issue is to let the user turn it on for any model in optimum. Before landing, we need to test that optimum models converge as well with ALS as they do with a manually tuned static loss scaling.

More details about automatic loss scaling in: https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.options._TrainingOptions.setAutomaticLossScaling
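
As background on why loss scaling matters in half precision, here is a small standard-library sketch of static loss scaling. The gradient value and the scale are made-up numbers, `to_half` is a hypothetical helper, and nothing here calls PopTorch. Automatic loss scaling would adjust the scale during training rather than fixing it by hand.

```python
import struct


def to_half(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


gradient = 1e-8       # made-up small gradient value
loss_scale = 1024.0   # made-up static loss scale

# Stored directly in half precision, the gradient underflows to zero.
unscaled = to_half(gradient)

# Scaling the loss (and therefore the gradient) keeps it representable.
scaled = to_half(gradient * loss_scale)

# The scale is divided out again in higher precision before the weight update.
recovered = scaled / loss_scale

print(unscaled, scaled, recovered)
```

A static scale has to be tuned per model: too small and gradients still underflow, too large and they overflow, which is the tuning effort ALS is meant to remove.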