๐ŸŽ๏ธ Accelerate training and inference of ๐Ÿค— Transformers with easy to use hardware optimization tools

Home Page: https://huggingface.co/docs/optimum/

License: Apache License 2.0


Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum
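
To check that the installation succeeded, you can print the installed version; a quick sanity check (assuming the package exposes it in the optimum.version module, as in the current source tree):

python -c "from optimum.version import __version__; print(__version__)"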

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator | Installation
ONNX Runtime | pip install --upgrade-strategy eager optimum[onnxruntime]
Intel Neural Compressor | pip install --upgrade-strategy eager optimum[neural-compressor]
OpenVINO | pip install --upgrade-strategy eager optimum[openvino,nncf]
AMD Instinct GPUs and Ryzen AI NPU | pip install --upgrade-strategy eager optimum[amd]
Habana Gaudi Processor (HPU) | pip install --upgrade-strategy eager optimum[habana]
FuriosaAI | pip install --upgrade-strategy eager optimum[furiosa]

The --upgrade-strategy eager option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, append optimum[accelerator_type] to the above command:

python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git

Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems, such as ONNX Runtime, Intel Neural Compressor, OpenVINO, and TensorFlow Lite.

The export and optimizations can be done both programmatically and with the command line.

Features summary

Features | ONNX Runtime | Neural Compressor | OpenVINO | TensorFlow Lite
Graph optimization | ✔️ | N/A | ✔️ | N/A
Post-training dynamic quantization | ✔️ | ✔️ | N/A | ✔️
Post-training static quantization | ✔️ | ✔️ | ✔️ | ✔️
Quantization Aware Training (QAT) | N/A | ✔️ | ✔️ | N/A
FP16 (half precision) | ✔️ | N/A | ✔️ | ✔️
Pruning | N/A | ✔️ | ✔️ | N/A
Knowledge Distillation | N/A | ✔️ | ✔️ | N/A

OpenVINO

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade-strategy eager optimum[openvino,nncf]

It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:

optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov

If you add --int8, the weights will be quantized to INT8. Static quantization can also be applied to the activations using NNCF; more information can be found in the documentation.
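
For example, to export the same model with INT8-quantized weights (the output directory name distilbert_sst2_ov_int8 is only illustrative):

optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english --int8 distilbert_sst2_ov_int8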

To load a model and run inference with OpenVINO Runtime, you can just replace your AutoModelForXxx class with the corresponding OVModelForXxx class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set export=True when loading your model.

- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
  from transformers import AutoTokenizer, pipeline

  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained("distilbert_sst2_ov")

  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
  results = classifier("He's a dreadful magician.")
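
As a minimal sketch of the on-the-fly conversion mentioned above (reusing the same checkpoint name):

from optimum.intel import OVModelForSequenceClassification

# Load the PyTorch checkpoint and convert it to the OpenVINO format on the fly
model = OVModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# Optionally save the converted model for later reuse
model.save_pretrained("distilbert_sst2_ov")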

You can find more examples in the documentation and in the examples.

Neural Compressor

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade-strategy eager optimum[neural-compressor]

Dynamic quantization can be applied to your model:

optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert

To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 Hub, you can do as follows:

from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
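
You can then run inference as usual; a minimal sketch using the standard transformers pipeline API:

from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("He's a dreadful magician.")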

You can find more examples in the documentation and in the examples.

ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters,onnxruntime]

It is possible to export 🤗 Transformers and Diffusers models to the ONNX format and perform graph optimization as well as quantization easily:

optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx

The model can then be quantized using onnxruntime:

optimum-cli onnxruntime quantize \
  --avx512 \
  --onnx_model roberta_base_qa_onnx \
  -o quantized_roberta_base_qa_onnx

These commands export deepset/roberta-base-squad2, perform O2 graph optimization on the exported model, and finally quantize it with the avx512 configuration.
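
The quantization step can also be done programmatically; a minimal sketch, assuming the ORTQuantizer and AutoQuantizationConfig APIs from optimum.onnxruntime:

from optimum.onnxruntime import ORTModelForQuestionAnswering, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the PyTorch checkpoint to ONNX on the fly and save it
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", export=True)
model.save_pretrained("roberta_base_qa_onnx")

# Apply dynamic quantization with an avx512 configuration
quantizer = ORTQuantizer.from_pretrained("roberta_base_qa_onnx")
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)
quantizer.quantize(save_dir="quantized_roberta_base_qa_onnx", quantization_config=qconfig)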

For more information on the ONNX export, please check the documentation.

Run the exported model using ONNX Runtime

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using ONNX Runtime in the backend:

- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
  from transformers import AutoTokenizer, pipeline

  model_id = "deepset/roberta-base-squad2"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
  qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
  question = "What's Optimum?"
  context = "Optimum is an awesome library everyone should use!"
  results = qa_pipe(question=question, context=context)

More details on how to run ONNX models with the ORTModelForXXX classes can be found in the documentation.

TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters-tf]

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them:

optimum-cli export tflite \
  -m deepset/roberta-base-squad2 \
  --sequence_length 384 \
  --quantize int8-dynamic roberta_tflite_model
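
The exported model can then be loaded with the standard TensorFlow Lite interpreter; a minimal sketch, assuming the export writes a model.tflite file into the roberta_tflite_model directory:

import tensorflow as tf

# Load the exported (and quantized) TFLite model
interpreter = tf.lite.Interpreter(model_path="roberta_tflite_model/model.tflite")
interpreter.allocate_tensors()

# Inspect the expected inputs (e.g. input_ids, attention_mask)
print(interpreter.get_input_details())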

Accelerated training

🤗 Optimum provides wrappers around the original 🤗 Transformers Trainer to enable training on powerful hardware easily. We support many providers:

  • Habana's Gaudi processors
  • ONNX Runtime (optimized for GPUs)

Habana

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade-strategy eager optimum[habana]

- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForXxx.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
      output_dir="path/to/save/folder/",
+     use_habana=True,
+     use_lazy_mode=True,
+     gaudi_config_name="Habana/bert-base-uncased",
      ...
  )

  # Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use Habana Gaudi processor for training!
  trainer.train()

You can find more examples in the documentation and in the examples.

ONNX Runtime

- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
      output_dir="path/to/save/folder/",
      optim="adamw_ort_fused",
      ...
  )

  # Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use ONNX Runtime for training!
  trainer.train()
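
As with the standard Trainer, the fine-tuned model can then be saved; a minimal sketch using the usual transformers API:

# Persist the fine-tuned model and its configuration
trainer.save_model("path/to/save/folder/")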

You can find more examples in the documentation and in the examples.
