
SHARK

High Performance Machine Learning Distribution

We are currently rebuilding SHARK to take advantage of Turbine. Until that is complete, please use an .exe release or a checkout of the SHARK-1.0 branch for a working SHARK.

(CI status badges: Nightly Release · Validate torch-models on Shark Runtime)

Prerequisites - Drivers

Install your Windows hardware drivers

  • [AMD RDNA Users] Download the latest driver (23.2.1 is the oldest supported) here.
  • [macOS Users] Download and install the 1.3.216 Vulkan SDK from here. Newer versions of the SDK will not work.
  • [Nvidia Users] Download and install the latest CUDA / Vulkan drivers from here

Linux Drivers

  • MESA / RADV drivers won't work with FP16. Please use the latest AMDGPU-PRO drivers (the non-pro OSS drivers also won't work) or the latest NVIDIA Linux drivers.

Other users: please ensure you have the latest vendor drivers and the Vulkan SDK from here, and if you are using Vulkan, check that vulkaninfo works in a terminal window.
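For example, on Linux you can do a quick sanity check that the Vulkan loader can see your device (the device name shown will vary by system):

vulkaninfo | grep -i deviceName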

Quick Start for SHARK Stable Diffusion for Windows 10/11 Users

Install the Driver from [Prerequisites](https://github.com/nod-ai/SHARK#install-your-hardware-drivers) above

Download the stable release or the most recent SHARK 1.0 pre-release.

Double-click the .exe, or run it from the command line (recommended), and the UI should open in your browser.

If you have custom models, put them in a models/ directory next to the .exe.

Enjoy.

More installation notes

  • We recommend that you download the EXE into a new folder whenever you download a new EXE version. If you download it into the same folder as a previous install, you must delete the old `*.vmfb` files with `rm *.vmfb`. You can also pass the `--clear_all` flag once to clean all the old files.
  • If you recently updated the driver or this binary (EXE file), we recommend clearing all the local artifacts with `--clear_all`.
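For example, from the folder containing the EXE (the EXE name below is a placeholder; use whichever release file you downloaded):

# Delete stale compiled artifacts after a driver or EXE update
rm *.vmfb
# Or let SHARK clean everything up once on the next launch
.\shark_studio.exe --clear_all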

Running

  • Open a Command Prompt or PowerShell terminal, change folder (cd) to the .exe folder, then run the EXE from the command prompt. That way, if an error occurs, you'll be able to cut and paste it when asking for help. (If it always works for you without error, you may simply double-click the EXE.)
  • The first run may take a few minutes while the models are downloaded and compiled. Your patience is appreciated. The download could be about 5GB.
  • You will likely see a Windows Defender message asking you to give permission to open a web server port. Accept it.
  • Open a browser to access the Stable Diffusion web server. By default, the port is 8080, so you can go to http://localhost:8080/.
  • If you prefer to always run in the browser, use the --ui=web command-line argument when running the EXE.
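Putting the bullets above together, a typical first launch from PowerShell looks like this (the EXE name is a placeholder; substitute the file you downloaded):

cd C:\path\to\shark
.\shark_studio.exe --ui=web
# then open http://localhost:8080/ in your browser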

Stopping

  • Select the command prompt that's running the EXE. Press CTRL-C and wait a moment or close the terminal.
Advanced Installation (Only for developers)

Advanced Installation (Windows, Linux and macOS) for developers

Windows 10/11 Users

  • Install Git for Windows from here if you don't already have it.

Check out the code

git clone https://github.com/nod-ai/SHARK.git
cd SHARK

Switch to the Correct Branch (IMPORTANT!)

Currently SHARK is being rebuilt for Turbine on the main branch. For now, you are strongly discouraged from using main unless you are working on the rebuild effort, and you should not expect the code there to produce a working application for image generation. So for now, switch over to the SHARK-1.0 branch and use the stable code.

git checkout SHARK-1.0

The following setup instructions assume you are on this branch.

Setup your Python VirtualEnvironment and Dependencies

Windows 10/11 Users

  • Install the latest Python 3.11.x version from here

Allow the install script to run in Powershell

set-executionpolicy remotesigned

Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...)

./setup_venv.ps1 #You can re-run this script to get the latest version

Linux / macOS Users

./setup_venv.sh
source shark1.venv/bin/activate

Run Stable Diffusion on your device - WebUI

Windows 10/11 Users

(shark1.venv) PS C:\g\shark> cd .\apps\stable_diffusion\web\
(shark1.venv) PS C:\g\shark\apps\stable_diffusion\web> python .\index.py

Linux / macOS Users

(shark1.venv) > cd apps/stable_diffusion/web
(shark1.venv) > python index.py

Access Stable Diffusion on http://localhost:8080/?__theme=dark

(Screenshot: SHARK Stable Diffusion web UI)

Run Stable Diffusion on your device - Commandline

Windows 10/11 Users

(shark1.venv) PS C:\g\shark> python .\apps\stable_diffusion\scripts\main.py --app="txt2img" --precision="fp16" --prompt="tajmahal, snow, sunflowers, oil on canvas" --device="vulkan"

Linux / macOS Users

python3.11 apps/stable_diffusion/scripts/main.py --app=txt2img --precision=fp16 --device=vulkan --prompt="tajmahal, oil on canvas, sunflowers, 4k, uhd"

You can replace vulkan with cpu to run on your CPU, or with cuda to run on CUDA devices. If you have multiple Vulkan devices, you can address them with --device=vulkan://1, etc.
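For example, the same command can be retargeted just by changing the --device flag:

# CPU
python3.11 apps/stable_diffusion/scripts/main.py --app=txt2img --precision=fp16 --device=cpu --prompt="tajmahal, oil on canvas, sunflowers, 4k, uhd"
# CUDA
python3.11 apps/stable_diffusion/scripts/main.py --app=txt2img --precision=fp16 --device=cuda --prompt="tajmahal, oil on canvas, sunflowers, 4k, uhd"
# Second Vulkan device on a multi-GPU system
python3.11 apps/stable_diffusion/scripts/main.py --app=txt2img --precision=fp16 --device=vulkan://1 --prompt="tajmahal, oil on canvas, sunflowers, 4k, uhd"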

The output on an AMD 7900XTX would look something like:

Average step time: 47.19188690185547ms/it
Clip Inference time (ms) = 109.531
VAE Inference time (ms): 78.590

Total image generation time: 2.5788655281066895sec

Here are some samples generated:

(Sample images: "tajmahal, snow, sunflowers, oil on canvas" and "a photo of a crab playing a trumpet")

Find us on the SHARK Discord server if you have any trouble running it on your hardware.

Binary Installation

Setup a new pip Virtual Environment

This step sets up a new VirtualEnv for Python

python --version #Check you have 3.11 on Linux, macOS or Windows Powershell
python -m venv shark_venv
source shark_venv/bin/activate   # Use shark_venv/Scripts/activate on Windows

# If you are using conda create and activate a new conda env

# Some older pip installs may not be able to handle the recent PyTorch deps
python -m pip install --upgrade pip

macOS Metal users: please install the Vulkan SDK from https://sdk.lunarg.com/sdk/download/latest/mac/vulkan-sdk.dmg and enable "System wide install"

Install SHARK

This step pip-installs SHARK and related packages (Linux: Python 3.8, 3.10, and 3.11; macOS / Windows: Python 3.11)

pip install nodai-shark -f https://nod-ai.github.io/SHARK/package-index/ -f https://llvm.github.io/torch-mlir/package-index/ -f  https://nod-ai.github.io/SRT/pip-release-links.html --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Run shark tank model tests.

pytest tank/test_models.py

See tank/README.md for a more detailed walkthrough of our pytest suite and CLI.

Download and run Resnet50 sample

curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
#Install deps for test script
pip install --pre torch torchvision torchaudio tqdm pillow gsutil --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal

Download and run BERT (MiniLM) sample

curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/minilm_jit.py
#Install deps for test script
pip install transformers torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./minilm_jit.py --device="cpu"  #use cuda or vulkan or metal
Development, Testing and Benchmarks

If you want to use Python 3.11 and the TF import tools, you can use environment variables like the ones shown below. Set USE_IREE=1 to use upstream IREE.

# PYTHON=python3.11 VENV_DIR=0617_venv IMPORTER=1 ./setup_venv.sh

Run any of the hundreds of SHARK tank models via the test framework

python -m  shark.examples.shark_inference.resnet50_script --device="cpu" # Use gpu | vulkan
# Or a pytest
pytest tank/test_models.py -k "MiniLM"

How to use your locally built IREE / Torch-MLIR with SHARK

If you are a Torch-MLIR developer or an IREE developer and want to test local changes, you can uninstall the provided packages with pip uninstall torch-mlir and / or pip uninstall iree-compiler iree-runtime, build locally with Python bindings, and set your PYTHONPATH as mentioned here for IREE and here for Torch-MLIR.
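As a rough sketch for the IREE side (the bindings paths below are the usual CMake build locations, but may differ for your checkout and build options):

pip uninstall iree-compiler iree-runtime
# From your local IREE build directory, built with -DIREE_BUILD_PYTHON_BINDINGS=ON
export PYTHONPATH=$PWD/compiler/bindings/python:$PWD/runtime/bindings/python:$PYTHONPATH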

How to use your locally built Torch-MLIR with SHARK:

1.) Run `./setup_venv.sh` in SHARK and activate the `shark.venv` virtual env.
2.) Run `pip uninstall torch-mlir`.
3.) Go to your local Torch-MLIR directory.
4.) Activate the `mlir_venv` virtual environment.
5.) Run `pip uninstall -r requirements.txt`.
6.) Run `pip install -r requirements.txt`.
7.) Build Torch-MLIR.
8.) Activate the `shark.venv` virtual environment from the Torch-MLIR directory.
9.) Run `export PYTHONPATH=$(pwd)/build/tools/torch-mlir/python_packages/torch_mlir:$(pwd)/examples` in the Torch-MLIR directory.
10.) Go back to the SHARK directory.

Now SHARK will use your locally built Torch-MLIR repo.

Benchmarking Dispatches

To produce benchmarks of individual dispatches, you can add --dispatch_benchmarks=All --dispatch_benchmarks_dir=<output_dir> to your pytest command line. If you only want to compile specific dispatches, you can specify them with a space-separated string instead of "All", e.g. --dispatch_benchmarks="0 1 2 10"

For example, to generate and run dispatch benchmarks for MiniLM on CUDA:

pytest -k "MiniLM and torch and static and cuda" --dispatch_benchmarks=All -s --dispatch_benchmarks_dir=./my_dispatch_benchmarks

The given command will populate <dispatch_benchmarks_dir>/<model_name>/ with an ordered_dispatches.txt that lists the dispatches and their latencies in order, as well as a folder for each dispatch containing the .mlir, the .vmfb, and the benchmark results for that dispatch.

If you want to incorporate this into a Python script instead, you can pass the dispatch_benchmarks and dispatch_benchmarks_dir arguments when initializing SharkInference, and the benchmarks will be generated at compile time, e.g.:

shark_module = SharkInference(
    mlir_model,
    device=args.device,
    mlir_dialect="tm_tensor",
    dispatch_benchmarks="all",
    dispatch_benchmarks_dir="results",
)

Output will include:

  • An ordered list, ordered_dispatches.txt, of all the dispatches with their runtimes
  • Inside the specified directory, there will be a directory for each dispatch (there will be mlir files for all dispatches, but only compiled binaries and benchmark data for the specified dispatches)
  • An .mlir file containing the dispatch benchmark
  • A compiled .vmfb file containing the dispatch benchmark
  • An .mlir file containing just the hal executable
  • A compiled .vmfb file of the hal executable
  • A .txt file containing benchmark output

See tank/README.md for further instructions on how to run model tests and benchmarks from the SHARK tank.

API Reference

Shark Inference API


from shark.shark_importer import SharkImporter

# SharkImporter imports an MLIR module from a torch, tensorflow, or tf-lite module.

mlir_importer = SharkImporter(
    torch_module,
    (input),
    frontend="torch",  #tf, #tf-lite
)
torch_mlir, func_name = mlir_importer.import_mlir(tracing_required=True)

# SharkInference accepts mlir in linalg, mhlo, and tosa dialect.

from shark.shark_inference import SharkInference
shark_module = SharkInference(torch_mlir, device="cpu", mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input))
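As a minimal end-to-end sketch of the same flow (assuming torch and torchvision are installed; everything else follows the snippet above):

import torch
import torchvision.models as models
from shark.shark_importer import SharkImporter
from shark.shark_inference import SharkInference

# Trace a torchvision ResNet18 and run it through SHARK on CPU.
torch_module = models.resnet18(pretrained=True).eval()
input = torch.randn(1, 3, 224, 224)

mlir_importer = SharkImporter(torch_module, (input,), frontend="torch")
torch_mlir, func_name = mlir_importer.import_mlir(tracing_required=True)

shark_module = SharkInference(torch_mlir, device="cpu", mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input,))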

Example demonstrating running MHLO IR.

from shark.shark_inference import SharkInference
import numpy as np

mhlo_ir = r"""builtin.module  {
      func.func @forward(%arg0: tensor<1x4xf32>, %arg1: tensor<4x1xf32>) -> tensor<4x4xf32> {
        %0 = chlo.broadcast_add %arg0, %arg1 : (tensor<1x4xf32>, tensor<4x1xf32>) -> tensor<4x4xf32>
        %1 = "mhlo.abs"(%0) : (tensor<4x4xf32>) -> tensor<4x4xf32>
        return %1 : tensor<4x4xf32>
      }
}"""

arg0 = np.ones((1, 4)).astype(np.float32)
arg1 = np.ones((4, 1)).astype(np.float32)
shark_module = SharkInference(mhlo_ir, device="cpu", mlir_dialect="mhlo")
shark_module.compile()
result = shark_module.forward((arg0, arg1))

Examples Using the REST API

Supported and Validated Models

SHARK is maintained to support the latest innovations in ML Models:

TF HuggingFace Models    SHARK-CPU    SHARK-CUDA    SHARK-METAL
BERT                     💚           💚           💚
DistilBERT               💚           💚           💚
GPT2                     💚           💚           💚
BLOOM                    💚           💚           💚
Stable Diffusion         💚           💚           💚
Vision Transformer       💚           💚           💚
ResNet50                 💚           💚           💚

For a complete list of the models supported in SHARK, please refer to tank/README.md.

Communication Channels

Related Projects

IREE Project Channels
MLIR and Torch-MLIR Project Channels

License

nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.

shark's People

Contributors

abhishek-varma, amoslewis, ayaanshah2204, cstueckrath, dan-garvey, dependabot[bot], eliasj42, fraserhum, godot73, gpetters-amd, gpetters94, jinchen62, kuhar, m68k-fr, makslevental, mariecwhite, monorimet, one-lithe-rune, pashu123, phaneeshb, powderluv, qedawkins, raikonenfnu, ranvirsv, shukla-gaurav, sogartar, stellaraccident, vivekkhandelwal1, xzuyn, yzhang93


shark's Issues

Torchvision Models failing for dynamic case on Vulkan backend.

Error log for resnet101 that is similar (if not identical) to the error messages produced from the dynamic vulkan case on a few of our PyTorch models: gist

This error is also encountered for the dynamic vulkan case on the following models:

  1. alexnet_torch
  2. mobilenet_v3_small_torch
  3. resnet101_torch
  4. resnet18_torch
  5. resnet50_torch
  6. squeezenet1_0_torch
  7. wide_resnet50_2

These cases will be xfailed.

Locally generated shark_tank artifacts are not usable for pytests.

In Shark Downloader, we check whether the local hash for shark_tank artifacts matches the upstream hash; if it doesn't, all artifacts are downloaded from gs://shark_tank for the latest upstream hash, replacing the local files.
This becomes a problem if one uses generate_sharktank.py to populate a local shark_tank and run tests, as the upstream artifacts are used instead of the local artifacts (in my case, with significant changes).

I think this is of critical importance to our correspondence with the IREE team as well as our SHARK team's development process.
We have a few options to handle this:

  1. add a pytest option to use local files (avoid SHARK downloader)
  2. have shark downloader look in SHARK/gen_shark_tank/ before doing anything with google storage -- if locally generated artifacts are present, don't touch gs://shark_tank and simply use the local artifacts (a rough sketch of this check is shown below).
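A rough sketch of option 2, written as a hypothetical helper inside the downloader (the directory and function names here are illustrative only, not the actual SHARK API):

import os

def resolve_artifact_dir(model_name, local_root="gen_shark_tank", fallback_download=None):
    """Prefer locally generated shark_tank artifacts over gs://shark_tank."""
    local_dir = os.path.join(local_root, model_name)
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # Locally generated artifacts exist: skip the upstream hash check and download.
        return local_dir
    # Otherwise fall back to the existing downloader path (gs://shark_tank).
    return fallback_download(model_name) if fallback_download else None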

to reproduce:

python generate_sharktank.py
pytest -s tank/MiniLM-L12-H384-uncased/

it will be evident that the artifacts are replaced by contents of gs://shark_tank/microsoft_MiniLM-L12-H384-uncased_tf/

CUDA memory is not released after individual test cases.

Our SHARK model tests (all gpu cases) do not free some (maybe all) allocated CUDA memory after test execution is completed.

ERROR    root:system_api.py:88 Could not create default driver device cuda
Traceback (most recent call last):
  File "/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 86, in _create_default_iree_driver
    device = driver.create_default_device()
RuntimeError: Error creating default device: iree/runtime/src/iree/hal/drivers/cuda/cuda_device.c:146: INTERNAL; CUDA driver error 'CUDA_ERROR_OUT_OF_MEMORY' (2): out of memory

To reproduce:

  • Setup a system+environment to run GPU tests for SHARK.
  • Run:
pytest tank -k "gpu"
  • (optional but highly recommended to run watch nvidia-smi concurrently to observe in real-time)

TF tiny-random-flaubert numerics issue on Vulkan. (A100)

To reproduce:

pytest -s tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py -k "vulkan"

SHARK results fail to validate against TF golden values:

E       assert True == False
E        +  where False = compare_tensors_tf(<tf.Tensor: shape=(1, 16, 68729), dtype=float32, numpy=\narray([[[ 0.53806955,  0.14671442,  0.        , ..., -0.2818507 ,\n          0.08806332,  0.14761735],\n        [-0.00822675, -0.0385315 ,  0.        , ...,  0.00425125,\n          0.06710303, -0.04765199],\n        [-0.1951161 , -0.1519102 ,  0.        , ...,  0.1955705 ,\n          0.13747491, -0.2091976 ],\n        ...,\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ]]], dtype=float32)>, array([[[ 0.5380049 ,  0.13949418,  0.        , ..., -0.28169703,\n          0.08681311,  0.14958172],\n        [-0.00976601, -0.03920554,  0.        , ...,  0.00616576,\n          0.06795865, -0.0488795 ],\n        [-0.1871761 , -0.15056488,  0.        , ...,  0.19165687,\n          0.13996662, -0.20523356],\n        ...,\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ],\n        [ 0.        ,  0.        ,  0.        , ...,  0.        ,\n          0.        ,  0.        ]]], dtype=float32))

tank/tf/hf_masked_lm/tiny-random-flaubert_tf_test.py:86: AssertionError

HF transformers 4.19.x is broken

(new_dylib_venv) anush@nod-shared-a100-3:~/github/shark$ pytest tank/pytorch/tests/resnet101_test.py::Resnet101ModuleTest::test_module_static_cpu
================================================================================================= test session starts ==================================================================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0 -- /home/anush/github/shark/new_dylib_venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0, typeguard-2.13.3
collecting ... Fatal Python error: Aborted

Current thread 0x00007efd4103a1c0 (most recent call first):
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 304 in _constant_eager_impl
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 279 in _constant_impl
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 267 in constant
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py", line 343 in _constant_tensor_conversion_function
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 1623 in convert_to_tensor
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/profiler/trace.py", line 183 in wrapped
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 264 in args_to_matching_eager
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 77 in non_deterministic_ints_eager_fallback
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_stateful_random_ops.py", line 50 in non_deterministic_ints
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 80 in non_deterministic_ints
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/tensorflow/python/ops/stateful_random_ops.py", line 381 in from_non_deterministic_state
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 349 in TFGenerationMixin
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/generation_tf_utils.py", line 344 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 41 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 38 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 872 in _get_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 862 in __getattr__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 863 in __getattr__
  File "<frozen importlib._bootstrap>", line 1075 in _handle_fromlist
  File "/home/anush/github/shark/tank/pytorch/tests/test_utils.py", line 7 in <module>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/home/anush/github/shark/tank/pytorch/tests/resnet101_test.py", line 3 in <module>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1050 in _gcd_import
  File "/usr/lib/python3.10/importlib/__init__.py", line 126 in import_module
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/pathlib.py", line 533 in import_path
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 608 in _importtestmodule
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 519 in _getobj
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 301 in obj
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 536 in _inject_setup_module_fixture
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/python.py", line 522 in collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 768 in collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in <lambda>
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/runner.py", line 537 in collect_one_node
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 643 in perform_collect
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 332 in pytest_collection
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 321 in _main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/home/anush/github/shark/new_dylib_venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/home/anush/github/shark/new_dylib_venv/bin/pytest", line 8 in <module>

Extension modules: torch._C, torch._C._fft, torch._C._linalg, torch._C._nn, torch._C._sparse, torch._C._special, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, scipy.sparse._csparsetools, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.strptime, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pandas._libs.ops, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg.cython_lapack, scipy.linalg._decomp_update, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label, sentencepiece._sentencepiece (total: 116)
Aborted (core dumped)

Pinning to 4.18 as a workaround

iree-compile fails on some vision transformers with GPU

FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_dynamic_gpu
FAILED tank/facebook_convnext-tiny-224_tf/facebook_convnext-tiny-224_tf_test.py::ConvNextTinyModuleTest::test_module_static_gpu
FAILED tank/facebook_deit-small-distilled-patch16-224_torch/facebook_deit-small-distilled-patch16-224_torch_test.py::DeitModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_dynamic_gpu
FAILED tank/google_vit-base-patch16-224_tf/google_vit-base-patch16-224_tf_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/google_vit-base-patch16-224_torch/google_vit-base-patch16-224_torch_test.py::VitBaseModuleTest::test_module_static_gpu
FAILED tank/nvidia_mit-b0_torch/nvidia_mit-b0_torch_test.py::MitModuleTest::test_module_static_gpu

Error Log (common for cases shown above):

E         iree.compiler.tools.binaries.CompilerToolError: Error invoking IREE compiler tool iree-compile
E         Diagnostics:
E         
E         
E         Invoked with:
E          iree-compile /data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-compile - --iree-input-type=none --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=cuda --iree-llvm-embedded-linker-path=/data/anush/actions-runner/_work/SHARK/SHARK/shark.venv/lib/python3.10/site-packages/iree/compiler/tools/../_mlir_libs/iree-lld --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false --iree-llvm-target-cpu-features=host --iree-hal-cuda-disable-loop-nounroll-wa --iree-stream-resource-index-bits=64 --iree-vm-target-index-bits=64
E         
E         Need more information? Set IREE_SAVE_TEMPS=/some/dir in your environment to save all artifacts and reproducers.

Fix tensorflow GPU memory management for pytest runs.

Currently, two cases of GPU memory management issues appear when running pytests for Tensorflow masked_lm models.

When running gpu tests for albert_base_v2, the static_gpu case (currently included in this issue) passes if tolerance values for compare_tensors_tf are increased to rtol=1e-02 and atol=1e-01. All of the tests mentioned in that issue pass with the increased tolerances. This isn't really acceptable accuracy, but we are waiting on the IREE team, so we can work around it for now to get memory management squared away.

TF albert on CPU passes for dynamic and static cases only if the tests are run individually. Tensorflow's allocated memory in CUDA does not free up for the second GPU test whether the first passes or not.

If we try bert_static_gpu, however, cuda runs out of memory even when the test is run by itself -- TF allocates ~39GB of gpu memory for the model at the beginning of the test and we run into cuda OOM when shark_module.compile() is called (hal allocation in IREE).

All of the TF model tests in tank/tf/hf_masked_lm/ share this issue.

Alexnet failures on AMD

Alexnet seems to fail static cases on AMD for some reason - but it looks like something in the test script rather than the underlying infra.


anush@alderlake ~/github/shark
 % pytest tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan
================================================================================= test session starts =================================================================================
platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/anush/github/shark, configfile: pytest.ini
plugins: forked-1.4.0, xdist-2.5.0
collected 1 item                                                                                                                                                                      

tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan FAILED                                                                                   [100%]

====================================================================================== FAILURES =======================================================================================
_____________________________________________________________________ AlexnetModuleTest.test_module_static_vulkan _____________________________________________________________________

a = (<alexnet_torch_test.AlexnetModuleTest testMethod=test_module_static_vulkan>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

shark.venv/lib/python3.10/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tank/alexnet_torch/alexnet_torch_test.py:78: in test_module
    self.module_tester.create_and_check_module(dynamic, device)
tank/alexnet_torch/alexnet_torch_test.py:43: in create_and_check_module
    shark_module.compile()
shark/shark_inference.py:87: in compile
    self.shark_runner = SharkRunner(
shark/shark_runner.py:81: in __init__
    ) = get_iree_compiled_module(
shark/iree_utils/compile_utils.py:122: in get_iree_compiled_module
    return get_iree_module(flatbuffer_blob, device, func_name)
shark/iree_utils/compile_utils.py:106: in get_iree_module
    ctx.add_vm_module(vm_module)
shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:255: in add_vm_module
    self.add_vm_modules((vm_module,))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <iree.runtime.system_api.SystemContext object at 0x7f62b0332e60>, vm_modules = (<VmModule module : [forward, __init]>,)

    def add_vm_modules(self, vm_modules):
      assert self._is_dynamic, "Cannot 'add_module' on a static context"
      for m in vm_modules:
        if m.name in self._bound_modules:
          raise ValueError(f"Attempt to register duplicate VmModule: '{m.name}'")
        bound_module = BoundModule(self, m)
        self._bound_modules[m.name] = bound_module
        if self._tracer:
          self._tracer.add_module(bound_module.traced_module)
>     self._vm_context.register_modules(vm_modules)
E     RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/native_executable.cc:127: UNAVAILABLE; VK_ERROR_INITIALIZATION_FAILED; while invoking native function hal.executable.create; while calling import; 
E     [ 1]   native hal.executable.create:0 -
E     [ 0] bytecode module.__init:1788 <stdin>:134:11
E           at <stdin>:9:3

shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py:252: RuntimeError
-------------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------------
Found Radeon XT Device. Using rdna2-unknown-linux
The models are present in the /home/anush/.local/shark_tank/. If you want a fresh 
                download, consider deleting the directory.
Found Radeon XT Device. Using rdna2-unknown-linux
-------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------
Copying gs://shark_tank/latest/alexnet_torch/hash.npy...
/ [1 files][  640.0 B/  640.0 B]                                                
Operation completed over 1 objects/640.0 B.                                      
'DISPLAY' environment variable not set... skipping surface info
=============================================================================== short test summary info ===============================================================================
FAILED tank/alexnet_torch/alexnet_torch_test.py::AlexnetModuleTest::test_module_static_vulkan - RuntimeError: Error registering modules: iree/runtime/src/iree/hal/drivers/vulkan/na...
================================================================================= 1 failed in 12.42s ==================================================================================

DistilBert fails to lower through torch-mlir pass pipeline (illegal ops)

Error output:

error: failed to legalize operation 'torch.aten.view' that was explicitly marked illegal
note: see current operation: %416 = "torch.aten.view"(%414, %415) : (!torch.vtensor<[?,?,768],f32>, !torch.list<int>) -> !torch.vtensor<[?,?,12,64],f32>                                                                                   
Traceback (most recent call last):
  File "/home/ean/SHARK/generate_sharktank.py", line 180, in <module>
    save_torch_model(args.torch_model_csv)
  File "/home/ean/SHARK/generate_sharktank.py", line 68, in save_torch_model
    mlir_importer.import_debug(
  File "/home/ean/SHARK/shark/shark_importer.py", line 163, in import_debug
    imported_mlir = self.import_mlir(
  File "/home/ean/SHARK/shark/shark_importer.py", line 109, in import_mlir
    return self._torch_mlir(is_dynamic, tracing_required), func_name
  File "/home/ean/SHARK/shark/shark_importer.py", line 74, in _torch_mlir
    return get_torch_mlir_module(
  File "/home/ean/SHARK/shark/torch_mlir_utils.py", line 150, in get_torch_mlir_module
    pm.run(mb.module)
RuntimeError: Failure while executing pass pipeline.

Reproduce:

  • add distilbert-base-uncased,True,hf to tank/pytorch/torch_model_list.csv
  • run python generate_sharktank.py

undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index when running resnet50_script.py

I just made a fresh Python venv and followed the readme instructions to run resnet50_script.py.

curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
#Install deps for test script
pip install pillow requests tqdm torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu"  #use cuda or vulkan or metal 

I got this error:

Traceback (most recent call last):
  File "./resnet50_script.py", line 7, in <module>
    from shark.shark_inference import SharkInference
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 12, in <module>
    from shark.torch_mlir_utils import get_torch_mlir_module, run_on_refbackend
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/torch_mlir_utils.py", line 22, in <module>
    from torch_mlir.dialects.torch.importer.jit_ir import (
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/__init__.py", line 13, in <module>
    from torch_mlir.dialects.torch.importer.jit_ir import ClassAnnotator, ModuleBuilder
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/dialects/torch/importer/jit_ir/__init__.py", line 14, in <module>
    from ....._mlir_libs._jit_ir_importer import *
ImportError: /home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1022getCustomClassTypeImplERKSt10type_index

Add Option for setup_venv.sh to choose frontends

Many users have their favourite deep learning framework of choice and do not use the others. setup_venv.sh should have an option to choose whether the user intends to use the torch frontend, the tf frontend, or both. This way users can have a leaner environment!
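A rough sketch of what such an option could look like inside setup_venv.sh (the FRONTEND variable and the package choices below are illustrative only):

# Hypothetical frontend selector: FRONTEND=torch|tf|all ./setup_venv.sh
FRONTEND="${FRONTEND:-all}"
if [ "$FRONTEND" = "torch" ] || [ "$FRONTEND" = "all" ]; then
  python -m pip install torch-mlir -f https://llvm.github.io/torch-mlir/package-index/
fi
if [ "$FRONTEND" = "tf" ] || [ "$FRONTEND" = "all" ]; then
  python -m pip install tensorflow
fi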

some tflite tests fail on macOS

(shark.venv) anush@MacStudio shark % pytest tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu 
========================================================================================================================== test session starts ===========================================================================================================================
platform darwin -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0 -- /Users/anush/github/shark/shark.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/anush/github/shark, configfile: pytest.ini
plugins: xdist-2.5.0, forked-1.4.0
collected 1 item                                                                                                                                                                                                                                                         

tank/mobilebert/mobilebert_tflite_test.py::MobilebertTfliteModuleTest::test_module_static_cpu Fatal Python error: Segmentation fault

Current thread 0x0000000104f34580 (most recent call first):
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 75 in _create_default_iree_driver
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/iree/runtime/system_api.py", line 115 in __init__
  File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 102 in get_iree_module
  File "/Users/anush/github/shark/shark/iree_utils/compile_utils.py", line 120 in get_iree_compiled_module
  File "/Users/anush/github/shark/shark/shark_runner.py", line 80 in __init__
  File "/Users/anush/github/shark/shark/shark_inference.py", line 73 in compile
  File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 111 in create_and_check_module
  File "/Users/anush/github/shark/tank/mobilebert/mobilebert_tflite_test.py", line 137 in test_module_static_cpu
  File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
  File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 591 in run
  File "/opt/homebrew/Cellar/python@3.10/3.10.5/Frameworks/Python.framework/Versions/3.10/lib/python3.10/unittest/case.py", line 650 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/unittest.py", line 327 in runtest
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 166 in pytest_runtest_call
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 259 in <lambda>
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 338 in from_call
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 258 in call_runtest_hook
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 219 in call_and_report
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 130 in runtestprotocol
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/runner.py", line 111 in pytest_runtest_protocol
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 347 in pytest_runtestloop
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 322 in _main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/Users/anush/github/shark/shark.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/Users/anush/github/shark/shark.venv/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, tensorflow.python.framework.fast_tensor_util, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, PIL._imaging (total: 40)
zsh: segmentation fault  pytest 

CUDA needs to default to sm_80 and use devicearrays

We need to model the CUDA backend in SHARK to be similar to:

https://github.com/nod-ai/transformer-benchmarks/blob/435984a420a2f285f717aa4752c14c0cabfd8c96/benchmark.py#L397-L437


    if use_gpu:
        backend = "cuda"
        backend_config = "cuda"
        args = ["--iree-cuda-llvm-target-arch=sm_80", "--iree-hal-cuda-disable-loop-nounroll-wa"]
        ireert.flags.FUNCTION_INPUT_VALIDATION = False
        ireert.flags.parse_flags("--cuda_allow_inline_execution")

...

    # Setting up input on host and moving to device.
    host_inputs =[encoded_input["input_ids"], encoded_input["attention_mask"], encoded_input["token_type_ids"]]
    if use_gpu:
        device_inputs = [ireert.asdevicearray(config.device, a) for a in host_inputs]
    else:
        device_inputs = host_inputs

Incompatible version range of IREE dependency

The nodai-shark pip package specifies version dependencies on iree-runtime and iree-compiler that are too old.

$ pipdeptree -p nodai-shark
nodai-SHARK==20220810.173
  - iree-compiler [required: >=20220427.13, installed: 20220714.204]
    - numpy [required: Any, installed: 1.22.4]
    - PyYAML [required: Any, installed: 6.0]
  - iree-runtime [required: >=20220427.13, installed: 20220714.204]
    - numpy [required: Any, installed: 1.22.4]
    - PyYAML [required: Any, installed: 6.0]
  - numpy [required: Any, installed: 1.22.4]
  - PyYAML [required: Any, installed: 6.0]
  - torch-mlir [required: >=20220428.420, installed: 20220606.495]
    - numpy [required: Any, installed: 1.22.4]
    - torch [required: ==1.13.0.dev20220606+cpu, installed: 1.13.0.dev20220606+cpu]
      - typing-extensions [required: Any, installed: 4.2.0]

When running with

iree-compiler      20220604.24
iree-runtime       20220604.24

I get this error

$ python ./resnet50_script.py --device="cpu"
/home/petkantchin/.local/shark_tank/
load image from https://upload.wikimedia.org/wikipedia/commons/2/26/YellowLabradorLooking_new.jpg
Copying gs://shark_tank/274650f/resnet50_torch/function_name.npy...
Copying gs://shark_tank/274650f/resnet50_torch/golden_out.npz...                
Copying gs://shark_tank/274650f/resnet50_torch/hash.npy...                      
Copying gs://shark_tank/274650f/resnet50_torch/inputs.npz...                    
\ [4 files][593.2 KiB/593.2 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://shark_tank/274650f/resnet50_torch/resnet50_dynamic_torch.mlir...
Copying gs://shark_tank/274650f/resnet50_torch/resnet50_torch.mlir...           
- [6 files][391.5 MiB/391.5 MiB]   10.7 MiB/s                                   
Operation completed over 6 objects/391.5 MiB.                                    
Target triple found:x86_64-linux-gnu
ERROR:root:Could not create driver local-task (not registered)
Traceback (most recent call last):
  File "./resnet50_script.py", line 72, in <module>
    shark_module.compile()
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_inference.py", line 87, in compile
    self.shark_runner = SharkRunner(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/shark_runner.py", line 81, in __init__
    ) = get_iree_compiled_module(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 120, in get_iree_compiled_module
    return get_iree_module(flatbuffer_blob, device, func_name)
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/shark/iree_utils/compile_utils.py", line 102, in get_iree_module
    config = ireert.Config(IREE_DEVICE_MAP[device])
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 115, in __init__
    self.driver = _create_default_iree_driver(
  File "/home/petkantchin/ws/nodlabs/shark/shark_venv/lib/python3.8/site-packages/iree/runtime/system_api.py", line 97, in _create_default_iree_driver
    raise RuntimeError(
RuntimeError: Could not create any requested driver ['local-task'] (available=['cuda', 'dylib', 'dylib-sync', 'vmvx', 'vmvx-sync', 'vulkan']) : {}

Updating IREE to 20220714.204 fixed the issue.

iree-compiler      20220714.204
iree-runtime       20220714.204

I suspect that the dependency version requirements have to be fixed. Other earlier versions may be OK as well; I have not checked.

GPU benchmarks for PyTorch tests benchmark on CPU instead.

Currently, there is no support for benchmarking pytorch models on CUDA via pytest.

SHARK/setup_venv.sh should be updated with a GPU_BENCHMARKS flag to uninstall the CPU version of PyTorch Nightly and replace it with the CUDA version.

SHARK/shark/shark_benchmark_runner.py::SharkBenchmarkRunner has a torch_benchmark method that should be updated to run with GPU/CUDA for gpu pytest cases.

TF tapas-base import requirements aren't met with IMPORTER=1 ./setup_venv.sh

To reproduce:

pytest tank/tf/hf_masked_lm/tapas-base_tf_test.py -k "static_cpu"

Error output:

E           ImportError: 
E           TFTapasMainLayer requires the tensorflow_probability library but it was not found in your environment. You can install it with pip as
E           explained here: https://github.com/tensorflow/probability.

I wasn't able to get this to work by pip installing tfp-nightly -- if we can get it to work let's make it run out of the box for IMPORTER=1.

fix xlm-roberta lowering

pytest benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu]

fails with:

====================================================== short test summary info =======================================================
FAILED benchmarks/tests/test_benchmark.py::test_bench_xlm_roberta[False-cpu] - OSError: Can't load tokenizer for 'xlm-roberta-base'...
========================================================= 1 failed in 12.31s =========================================================

minilm_jit example doesn't work

(shark.venv) a@debian-1:~/github/dshark$ python -m  shark.examples.minilm_jit
/home/a/github/dshark/shark.venv/lib/python3.7/site-packages/torch/nn/modules/module.py:1403: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  " and ".join(warn_msg) + " are deprecated. nn.Module.state_dict will not accept them in the future. "
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at microsoft/MiniLM-L12-H384-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Target triple found:x86_64-linux-gnu
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(shark.venv) a@debian-1:~/github/dshark$    

Enhancements/Fixes to HF Benchmark Runtime

The HF Benchmarker is a module within SHARK that enables easy testing of HF models with ONNX, Torch, TF, and of course SHARK-RT. This work is based on the SharkBenchmarker for the MLIR part and on the Microsoft Transformer Benchmark.
EDIT: nightly ORT did not fix GPU, nor did it fix TF.

Some issues/enhancements that need fixing:

1. Integrate running of TF in HF-Benchmarker.

Has some runtime issues, namely RuntimeError: Intra op parallelism cannot be modified after initialization. and RuntimeError: Visible devices cannot be modified after being initialized. See https://github.com/microsoft/onnxruntime/issues/11751 for more details.

2. Fix up HF Benchmark Runtime with GPU

Currently the only supported device is CPU, since we get OOM on GPU. The problem is that importing onnxruntime loads 39GB of data onto the GPU, which leaves very little space to load our model or run anything.

TF roberta/XLM roberta numerics issues on A100 if num_iterations >= 100

XLM-roberta assert failure:

>       np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.01
E       
E       Mismatched elements: 5505 / 4000032 (0.138%)
E       Max absolute difference: 0.09074688
E       Max relative difference: 3171.7234
E        x: array([[[ 2.683771,  0.183121, 10.453473, ...,  6.315439,  2.047505,
E                 3.32532 ],
E               [-0.482143,  0.061366,  9.494564, ...,  6.593861,  1.620899,...
E        y: array([[[ 2.671124,  0.182537, 10.456981, ...,  6.322483,  2.051546,
E                 3.322179],
E               [-0.481575,  0.061454,  9.495419, ...,  6.59101 ,  1.619549,...

roberta-base-tf assert failure:

>       np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-02)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.01
E       
E       Mismatched elements: 453 / 804240 (0.0563%)
E       Max absolute difference: 0.04533577
E       Max relative difference: 763.70135
E        x: array([[[33.55235 , -3.827327, 18.863625, ...,  3.420343,  6.171632,
E                11.648125],
E               [-0.598835, -4.141003, 14.904708, ..., -4.515923, -1.790529,...
E        y: array([[[33.567413, -3.829913, 18.870962, ...,  3.422938,  6.174327,
E                11.656706],
E               [-0.58585 , -4.141752, 14.913631, ..., -4.516505, -1.788759,...

To reproduce:

On a100 instance,

  • remove xfail for gpu case in tank/roberta-base_tf/roberta-base_tf_test.py
  • remove xfail for gpu case in tank/xlm-roberta-base_tf/xlm-roberta-base_tf.py
  • run: pytest tank/*roberta -k "gpu"

Checkpoint model

Can we get a save method to checkpoint the model / save the vmfb, such that we do not need to recompile from scratch every time we run the script?
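A minimal sketch of what that could look like, assuming hypothetical save_module / load_module helpers on SharkInference (these names, and the model/func_name/inputs placeholders, are illustrative, not existing SHARK API):

```python
import os

VMFB_PATH = "model.vmfb"

if os.path.exists(VMFB_PATH):
    # Reuse the previously compiled artifact instead of recompiling from scratch.
    shark_module = SharkInference.load_module(VMFB_PATH)  # hypothetical helper
else:
    shark_module = SharkInference(
        model, func_name, device="vulkan", mlir_dialect="linalg"
    )
    shark_module.compile()
    shark_module.save_module(VMFB_PATH)  # hypothetical helper

result = shark_module.forward(inputs)
```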

add option to --save_mlir to pytest runs

Add support to save mlir files when running the tests

(new_dylib_venv) 139 anush@nod-shared-a100-3:~/github/shark$ IREE_SAVE_TEMPS=iree_temps_bert_dynamic  pytest tank/pytorch/bert_test.py::BertModuleTest::test_module_dynamic_cpu --save_mlir
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --save_mlir
  inifile: /home/anush/github/shark/pytest.ini
  rootdir: /home/anush/github/shark

Feature Request: Flag for showing result of each dispatch

It would be helpful for debugging to be able to see the result of each dispatch when running a module. IREE already does this, e.g.

$ iree-run-module --device=vulkan --entry_function=forward --function_input=1x4xf32=1.0 --module_file=model.vmfb
EXEC @forward
=== forward_dispatch_0::forward_dispatch_0_generic_3x4 inputs ===

=== forward_dispatch_0::forward_dispatch_0_generic_3x4 outputs ===
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]

=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 inputs ===
1x4xf32=[1 1 1 1]
4x3xf32=[1 0 0][0 0 0][0 0 0][0 0 0]

=== forward_dispatch_1::forward_dispatch_1_matmul_1x3x4 outputs ===
1x3xf32=[1 0 0]

result[0]: hal.buffer_view
1x3xf32=[1 0 0]

This could be a flag such as

shark_module = SharkInference(
    model, func_name, device="vulkan", mlir_dialect="linalg", print_dispatches=True
)

and would give a numpy view of the dispatch results, or just show the IREE output.

Longformer-base-4096 fails import to IREE (illegal ops)

All cases fail with the following error on TensorFlow longformer:

E         <unknown>:0: error: The following illegal operations still remain: 
E               tf.BatchMatMulV2 (count: 24)
E               tf.StridedSlice (count: 24)
E               tf.Tile (count: 12)
E               tf.TensorScatterAdd (count: 36)
E               tf.Where (count: 3)

To reproduce:

pytest tank/tf/hf_masked_lm/longformer-base-4096_tf_test.py -k "static_cpu"

(errors are the same for all cases, so one test case should be sufficient for repro purposes.)

Improvements to pytest --benchmark option.

Several features/improvements to SHARK's pytest --benchmark option are tracked in this issue:

  • Improve "frontend" / MLIR dialect argument transmission through SharkBenchmarkRunner
  • Verify benchmark results for PyTorch+CUDA on Vision Models.
  • Benchmarks in CI should upload bench_results.csv to gs://iree-shared-files/nod-perf/bench_results/{Y-M-D}/bench_results_{cpu/gpu}_{github-SHA}.csv (#241)
  • Update README with benchmarking instructions. (#239)
  • Enable pytest --benchmark for TensorFlow shark tank module tests.
  • Add options to setup_venv.sh for ONNX benchmarking requirements
  • Benchmarks should be able to produce ONNX results and provide better data in generated results. (see: nod-ai/transformer-benchmarks)
  • Make benchmark results more accessible -- upload to gs://shark-public/builder/...
  • Thread counts
  • save compile-time flags
  • useful logs, traces, etc.
  • metadata
  • comparison %'s

"is_zero" is undefined running resnet50 script.

To reproduce...

Cloned the repo and tried running the examples, both resnet and minilm.

I keep getting

RuntimeError: required keyword attribute 'is_zero' is undefined

Seems to have something to do with ModuleBuilder -> mb.import_module(module._c, class_annotator).
Environment:
Using the Apple Silicon M1 snapshot version of torch-mlir.
Running on an M1 MacBook, Python 3.9.

Attached a screenshot of both resnet50_script and minilm.

Disclaimer: new to torch-mlir


CI - improvement to-do list

  1. Because of hash checking, local artifacts from the nightly build aren't being tested; the existing latest artifacts are used instead. This has the downstream effect of making it impossible to automatically pass checks when a change to the tank is made.

  2. Add a CI job that tests the generated pip packages in an end-user style.

Intel macOS crashes with loading libtorch twice

Upstream issue is here: llvm/torch-mlir#853

Workaround:

# Replace shark_venv with whatever your venv is
cd shark_venv/lib/python3.10/site-packages/torch_mlir/.dylibs
rm *.dylib
ln -s ../../torch/lib/libc10.dylib
ln -s ../../torch/lib/libshm.dylib
ln -s ../../torch/lib/libtorch.dylib
ln -s ../../torch/lib/libtorch_cpu.dylib
ln -s ../../torch/lib/libtorch_python.dylib

Numerical Errors Due to Reduced Precision from TF32

With iree-org/iree#9975 and other upcoming changes, we'll be looking to enable TensorCore on more kernels for performance. This may change results in some tests enough to fail assertions checking for correctness.

EX: distilbert tf

========================================================================== FAILURES ===========================================================================
_________________________________________________________ DistilBertModuleTest.test_module_static_gpu _________________________________________________________

self = <distilbert-base-uncased_tf_test.DistilBertModuleTest testMethod=test_module_static_gpu>

    @pytest.mark.skipif(
        check_device_drivers("gpu"), reason=device_driver_info("gpu")
    )
    def test_module_static_gpu(self):
        dynamic = False
        device = "gpu"
>       self.module_tester.create_and_check_module(dynamic, device)

tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <distilbert-base-uncased_tf_test.DistilBertModuleTester object at 0x7fcf101fb430>, dynamic = False, device = 'gpu'

    def create_and_check_module(self, dynamic, device):
        model, func_name, inputs, golden_out = download_tf_model(
            "distilbert-base-uncased"
        )
    
        shark_module = SharkInference(
            model, func_name, device=device, mlir_dialect="mhlo"
        )
        shark_module.compile()
        result = shark_module.forward(inputs)
>       np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)
E       AssertionError: 
E       Not equal to tolerance rtol=0.01, atol=0.001
E       
E       Mismatched elements: 4292 / 488352 (0.879%)
E       Max absolute difference: 0.02955437
E       Max relative difference: 48.456425
E        x: array([[[ -6.442754,  -6.393649,  -6.419188, ...,  -5.638614,
E                 -5.491579,  -3.414548],
E               [ -7.036943,  -6.988676,  -7.100483, ...,  -6.865986,...
E        y: array([[[ -6.442857,  -6.394039,  -6.419235, ...,  -5.639162,
E                 -5.492108,  -3.414864],
E               [ -7.039788,  -6.991871,  -7.102982, ...,  -6.868385,...

tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py:28: AssertionError
=================================================================== short test summary info ===================================================================
FAILED tank/distilbert-base-uncased_tf/distilbert-base-uncased_tf_test.py::DistilBertModuleTest::test_module_static_gpu - AssertionError: 
===================================================================== 1 failed in 45.37s ======================================================================

Looks like the difference from the expected value is 0.02955437, just above the 0.01 tolerance. This and other tolerances may need to be updated.
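If the looser precision is expected on TensorCore-enabled GPUs, one option is to relax the assertion only for the GPU path; a sketch with illustrative values:

```python
# Illustrative sketch: widen tolerances only on the GPU backend, where TF32
# reduces precision. The exact values would need tuning per model.
if device == "gpu":
    np.testing.assert_allclose(golden_out, result, rtol=1e-01, atol=1e-01)
else:
    np.testing.assert_allclose(golden_out, result, rtol=1e-02, atol=1e-03)
```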

Change frontend from strings to enum

Currently backend selection is done with strings. While this works for now, it can be unclear which backends are valid, and it may produce bugs later on (for example, a typo in the string can flow through the compile phase and compile "something", only to produce an error later; that can be hard to debug if nobody notices the typo).
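A minimal sketch of what enum-based selection could look like (the Device enum below is illustrative, not an existing SHARK type):

```python
from enum import Enum

class Device(Enum):
    CPU = "cpu"
    GPU = "gpu"
    VULKAN = "vulkan"

# A typo such as Device.VULKAM now fails immediately with an AttributeError,
# instead of silently compiling for an unintended backend.
shark_module = SharkInference(
    model, func_name, device=Device.VULKAN.value, mlir_dialect="linalg"
)
```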
