
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX

Home Page: https://qonnx.readthedocs.io/

License: Apache License 2.0

Python 92.58% PureBasic 0.01% Jupyter Notebook 7.42%
deep-learning fpga inference machine-learning onnx quantization quantized-neural-networks

qonnx's Introduction

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX


QONNX example

QONNX (Quantized ONNX) introduces three new custom operators -- Quant, BipolarQuant, and Trunc -- in order to represent arbitrary-precision uniform quantization in ONNX. This enables:

  • Representation of binary, ternary, 3-bit, 4-bit, 6-bit or any other quantization.
  • Quantization is an operator itself, and can be applied to any parameter or layer input.
  • Flexible choices for scaling factor and zero-point granularity.
  • Quantized values are carried using standard float datatypes to remain ONNX protobuf-compatible.

This repository contains a set of Python utilities to work with QONNX models, including but not limited to:

  • executing QONNX models for (slow) functional verification
  • shape inference, constant folding and other basic optimizations
  • summarizing the inference cost of a QONNX model in terms of mixed-precision MACs, parameter and activation volume
  • Python infrastructure for writing transformations and defining executable, shape-inferencable custom ops
  • (experimental) data layout conversion from standard ONNX NCHW to custom QONNX NHWC ops

Quickstart

Operator definitions

  • Quant for 2-to-arbitrary-bit quantization, with scaling and zero-point
  • BipolarQuant for 1-bit (bipolar) quantization, with scaling and zero-point
  • Trunc for truncating to a specified number of bits, with scaling and zero-point
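As an illustration of how these custom ops appear in a graph, here is a hedged sketch of building a Quant node with onnx.helper; the input order, attribute names and custom-op domain are written from memory and should be checked against the operator definitions above:

import onnx.helper as oh

# Quant consumes the tensor to be quantized plus scale, zero-point and bitwidth
# inputs, and carries signed/narrow/rounding_mode attributes (names assumed).
quant_node = oh.make_node(
    "Quant",
    inputs=["act_in", "act_scale", "act_zeropt", "act_bitwidth"],
    outputs=["act_quantized"],
    domain="qonnx.custom_op.general",  # assumed custom-op domain
    signed=1,
    narrow=0,
    rounding_mode="ROUND",
)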

Installation

pip install qonnx

Export, Import and Model Zoo

The following quantization-aware training (QAT) frameworks support exporting to QONNX:

The following NN inference frameworks support importing QONNX models for deployment:

Head to the QONNX model zoo to download pre-trained QONNX models on various datasets.

Model Visualization

We recommend Netron for visualizing QONNX models.

Executing ONNX graph with QONNX custom nodes

Using the qonnx-exec command line utility, with top-level inputs supplied from in0.npy and in1.npy:

qonnx-exec my-qonnx-model.onnx in0.npy in1.npy

Using the Python API:

import numpy as np

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.onnx_exec import execute_onnx

model = ModelWrapper("my-qonnx-model.onnx")
idict = {"in0": np.load("in0.npy"), "in1": np.load("in1.npy")}
odict = execute_onnx(model, idict)

Calculate inference cost for QONNX model

Using the qonnx-inference-cost command line utility for the CNV_2W2A example:

qonnx-inference-cost CNV_2W2A.onnx

This will print an inference cost dictionary like the following:

Inference cost for CNV_2W2A.onnx
{
  "discount_sparsity": true,    # discount MAC counts by layer sparsity (disregard zero-valued MACs and params)
  # mem_o_X: number of layer outputs with datatype X
  "mem_o_INT32": 142602.0,       # number of INT32 output elements
  # mem_w_X: number of layer parameters (weights) with datatype X
  "mem_w_INT2": 908033.0,      # number of INT2 parameters (weights)
  # op_mac_X_Y: number of MAC operations, datatype X by datatype Y
  # scaled integer datatypes have a tensor- or channelwise scale factor
  "op_mac_SCALEDINT<8>_INT2": 1345500.0, # number of scaled int8 x int2 MACs
  "op_mac_INT2_INT2": 35615771.0,   # number of int2 x int2 MACs
  "total_bops": 163991084.0,        # total number of MACs normalized to bit-ops (BOPS)
  "total_mem_o_bits": 4563264.0,    # total number of bits for layer outputs
  "total_mem_w_bits": 1816066.0,    # total number of bits for layer parameters
  "unsupported": "set()"
}

You can use the --cost-breakdown option to generate a more detailed report that covers per-node (by name) and per-op-type information. You can read more about the BOPS metric in this paper, Section 4.2 Bit Operations.
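As a rough sanity check of the numbers above (my reading of the metric, assuming BOPS = MACs x bits of operand A x bits of operand B, as described in Section 4.2 of the paper):

# Recompute total_bops from the per-datatype MAC counts reported above.
macs = {(8, 2): 1345500.0,      # op_mac_SCALEDINT<8>_INT2
        (2, 2): 35615771.0}     # op_mac_INT2_INT2
total_bops = sum(count * bits_a * bits_b for (bits_a, bits_b), count in macs.items())
print(total_bops)  # 163991084.0, matching "total_bops" in the report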

Convert between different quantization representations

Using the qonnx-convert command line utility you can convert from QONNX to QCDQ-style quantization:

qonnx-convert CNV_2W2A.onnx

This will convert Quant nodes to QuantizeLinear -> Clip -> DequantizeLinear nodes where possible. Please see the documentation of the QuantToQCDQ transformation to learn more about the limitations.
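The same conversion can presumably be driven from Python. This is a hedged sketch; the import path of QuantToQCDQ is an assumption, so check the documentation linked above for the exact location:

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.qonnx_to_qcdq import QuantToQCDQ  # module path assumed

model = ModelWrapper("CNV_2W2A.onnx")
model = model.transform(QuantToQCDQ())
model.save("CNV_2W2A_qcdq.onnx")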

Development

Install in editable mode in a Python virtual environment:

git clone https://github.com/fastmachinelearning/qonnx
cd qonnx
virtualenv -p python3.8 venv
source venv/bin/activate
pip install --upgrade pip
pip install -e .[qkeras,testing]

Running tests

Run entire test suite, parallelized across CPU cores:

pytest -n auto --verbose

Run a particular test and fall into pdb if it fails:

pytest --pdb -k "test_extend_partition.py::test_extend_partition[extend_id1-2]"

Linting

If you plan to make pull requests to the qonnx repo, linting will be required. We use a pre-commit hook to auto-format Python code and check for issues. See https://pre-commit.com/ for installation. Once you have pre-commit, you can install the hooks into your local clone of the qonnx repo:

cd qonnx
source venv/bin/activate
pip install pre-commit
pre-commit install

Every time you commit some code, the pre-commit hooks will first run, performing various checks and fixes. In some cases pre-commit won't be able to fix the issues and you may have to fix them manually, then run git commit once again. The checks are configured in .pre-commit-config.yaml under the repo root.
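To run the same checks on demand over the whole repository without making a commit, the hooks can also be invoked manually:

pre-commit run --all-files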

Why QONNX?

The QONNX representation has several advantages compared to other alternatives, as summarized in the table below. These include a compact but flexible, single-node quantization representation that avoids operator duplication and can support arbitrary precision up to the container datatype limit.

QONNX comparison table

Community

The QONNX efforts were started by the FINN and hls4ml communities working together to create a common, arbitrary-precision representation that both frameworks could ingest. Beyond that, QONNX aims to build an open-source community for practitioners and researchers working with mixed-precision quantized neural networks by providing useful tools and a discussion forum.


Resources

You can read more about QONNX in this paper. If you find QONNX useful in your work, please consider citing:

@inproceedings{Pappalardo:2022nxk,
    author = "Pappalardo, Alessandro and Umuroglu, Yaman and Blott, Michaela and Mitrevski, Jovan and Hawks, Ben and Tran, Nhan and Loncar, Vladimir and Summers, Sioni and Borras, Hendrik and Muhizi, Jules and Trahms, Matthew and Hsu, Shih-Chieh Hsu and Hauck, Scott and Duarte, Javier"
    title = "{QONNX: Representing Arbitrary-Precision Quantized Neural Networks}",
    booktitle = "{4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference}",
    eprint = "2206.07527",
    archivePrefix = "arXiv",
    primaryClass = "cs.LG",
    reportNumber = "FERMILAB-CONF-22-471-SCD",
    month = "6",
    year = "2022",
    url = "https://accml.dcs.gla.ac.uk/papers/2022/4thAccML_paper_1(12).pdf"
}

@software{yaman_umuroglu_2023_7622236,
  author       = "Umuroglu, Yaman and Borras, Hendrik and Loncar, Vladimir and Summers, Sioni and Duarte, Javier",
  title        = "fastmachinelearning/qonnx",
  month        = {06},
  year         = 2022,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.7622236},
  url          = {https://github.com/fastmachinelearning/qonnx}
}

qonnx's People

Contributors

auphelia, harsh9650, heborras, i-colbert, iksnagreb, jicampos, jmduarte, jmitrevs, maltanar, mmrahorovic, thephysicsboi, thesps, vloncar


qonnx's Issues

Changed behavior after BatchNormToAffine transformation

Quick summary

A standalone BatchNormalization node with certain settings (see .onnx file in .zip folder: bn_model.zip) changes its functional behavior when transformed with the BatchNormToAffine transformation.

Steps to Reproduce

  1. The issue was observed when using the FINN docker container, but with the current main branch of qonnx (commit hash: 12c96a3ded06beacab08e0f554e4ed014476c0aa).
  2. Run transformation BatchNormToAffine on ONNX file.
  3. Execute the model before and after the transformation with random floating point input (x = gen_finn_dt_tensor(DataType["FLOAT32"], (1, 64, 64, 64)); inp_dict = {"global_in": x})
  4. Compare execution of the model before and after the transformation.

Expected behavior

The functional behavior should not change due to the transformation.

Actual behavior

The outputs before and after the transformation do not match.

Possible fix

It seems to be a rounding error, coming from this calculation: A = scale / np.sqrt(epsilon + variance)

Support `QConv2DBatchnorm` and `QDenseBatchnorm` layers

Details

Currently, QConv2DBatchnorm and QDenseBatchnorm are not supported for conversion in #7

New behavior

Support QConv2DBatchnorm and QDenseBatchnorm layers for QKeras to QONNX conversion

Motivation

This would allow MLPerf Tiny models, which use these QKeras layers, to be converted from QKeras to QONNX.

Parts of QONNX being affected

Mainly (Q)Keras conversion files src/qonnx/converters/qkeras/qlayers.py and src/qonnx/converters/keras.py

@rushtheboy @nhanvtran @julesmuhizi

Change NodeLocalTransformation Pool to use spawn instead of fork

Prerequisites

Current main commit: db969e6

Quick summary

The use of mp.Pool in qonnx/transformation/base.py for NodeLocalTransform can cause deadlocks in certain cases.

Details

During work on FINN, I encountered an issue where calling HLSSynthIP() (which inherits from NodeLocalTransformation) in a multithreaded context could deadlock the processes from the multiprocessing pool. This is very likely caused by Python's start method defaulting to 'fork'. It is a well-known issue, and the usual solution is to change the start method either globally or locally via get_context("spawn").Pool(...), as sketched below. Arguably, a transform designed to be parallelized should not itself be invoked from multiple threads, but the default start method will be switched to spawn in Python 3.14 anyway, and changing it to spawn manually for earlier versions has no negative impact and may prevent issues.
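A minimal sketch of the suggested change (illustrative only, not the actual qonnx/transformation/base.py code): build the worker pool from an explicit "spawn" context instead of relying on the platform default.

import multiprocessing as mp

def apply_node_local(node_index):
    # stand-in for the per-node work a NodeLocalTransformation would do
    return node_index

if __name__ == "__main__":
    # use an explicit "spawn" context instead of the default ("fork" on Linux)
    with mp.get_context("spawn").Pool(processes=4) as pool:
        results = pool.map(apply_node_local, range(8))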

inference_cost_matmul: Confusion or Bug regarding the MAC-count of Scaled Dot-Product Attention

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out. 6ca8f8e
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

While working on our characterization of the transformer data-flow we encountered some discrepancies when validating against the QONNX inference_cost estimations of the MatMul operator within the attention mechanism. We are not entirely sure whether this is indeed a bug on the QONNX side or still some confusion/error on our side. Thus we would like to start a discussion to understand this issue.

Details

Multi-Head Scaled Dot-Product Attention involves two consecutive MatMul operations where both inputs dynamically depend on the model inputs. The heads are independent of each other and typically treated in a way similar to a batch dimension. Our cost model assumes HxTxTxd MAC operations for each of the two MatMuls, i.e. H heads each producing a TxT attention matrix (T is the sequence length) where each element is the result of a d-dimensional dot-product. However, the QONNX analysis function inference_cost_matmul seems to be off by an additional factor of H (i.e. HxHxTxTxd), indicating the heads are not treated like a batch dimension.

My suspicion is further raised by the following lines from the QONNX inference_cost_matmul function:

# exclude common dim (last axis) from one side to avoid duplication
n_macs = np.prod(i_shape[:-1]) * np.prod(w_shape)

Is this actually always the case? At least for the model graph I have attached it seems like the last axis is not the common dimension.

In the following, I provide a minimal working example of scaled dot-product attention in isolation in PyTorch, exported to an ONNX graph. I have also attached the already preprocessed graph, which in particular already includes the InferShapes transform. Note that running the qonnx.util.inference_cost script on the PyTorch ONNX export breaks at the FoldConstants transform due to an IndexError, which is probably unrelated and should be investigated separately (I have "fixed" it by removing that transformation step for now).

Steps to Reproduce

The following code produces a minimal example of scaled dot-product attention and exports to ONNX.

import torch


# Minimal working example of the Scaled Dot-Product Attention mechanism
class ScaleDotProductAttention(torch.nn.Module):
    # Initializes the module parameters
    def __init__(self, num_heads):
        # Initialize the PyTorch base Module
        super().__init__()
        # Set the number of attention heads
        self.num_heads = num_heads

    # Forward pass computing scaled dot-product attention between q, k and v
    def forward(self, q, k, v):
        # Assume the most simple case of q, k and v all having the same
        # dimensions
        assert q.shape == k.shape == v.shape, \
            "Q, K and V must have the same shape"
        # Embedding dimension must be divisible by number of heads
        assert q.shape[-1] % self.num_heads == 0, \
            f"Dimensions must be divisible by heads ({self.num_heads})"

        # Assume sequence first layout and get the sizes per axis
        s, b, d = q.shape
        # Number of heads and dimension per head
        n_head, d_head = self.num_heads, d // self.num_heads

        # Reshape tensors to treat the heads like batch dimensions
        q = q.reshape(s, b, n_head, d_head).reshape(s, b * n_head, d_head)
        k = k.reshape(s, b, n_head, d_head).reshape(s, b * n_head, d_head)
        v = v.reshape(s, b, n_head, d_head).reshape(s, b * n_head, d_head)
        # Compute the not-yet-normalized attentions matrices for each head.
        #   Note: permute brings batch x heads to front and transposes k
        a = torch.matmul(q.permute(1, 0, 2), k.permute(1, 2, 0))
        # Scale and normalize the attention matrix
        a = torch.softmax(a * (d_head ** -0.5), dim=-1)
        # Apply the attention matrices to the value projection
        #   Note: Second permute brings sequence dimension back to front
        o = torch.matmul(a, v.permute(1, 0, 2)).permute(1, 0, 2)
        # Reshape heads into feature dimension
        o = o.reshape(s, b, n_head, d_head).reshape(s, b, n_head * d_head)

        # Return the scaled dot-product attention output
        return o


# Script entrypoint
if __name__ == '__main__':
    # Instantiate a scale dot-product attention with 4 attention heads
    sdp = ScaleDotProductAttention(num_heads=4)
    # Generate random query, key and value tensors
    #   Note: Sequence of length 64, single instance batch, 128 dim embeddings
    q, k, v = torch.randn(3, 64, 1, 128)
    # Export the attention module to ONNX
    torch.onnx.export(sdp, args=(q, k, v), f='sdp.onnx')

Get MAC operation counts by running

python -m qonnx.util.inference_cost sdp.onnx

Outputs something like

{'op_mac_FLOAT32_FLOAT32': 4194304.0, 'mem_w_FLOAT32': 0.0, 'mem_o_FLOAT32': 24576.0, 'unsupported': "{'Softmax', 'Pow', 'Constant'}", 'discount_sparsity': True, 'total_bops': 4294967296.0, 'total_mem_w_bits': 0.0, 'total_mem_o_bits': 786432.0}

Expected behavior

According to our cost model, the MAC count should be 2x HxTxTxd, which for the given example model is 2x 4x64x64x32 = 1048576.

Actual behavior

The MAC count is reported as 4194304, which is 4x (Hx) our expectation, indicating a cost function of 2x HxHxTxTxd.
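For reference, the arithmetic behind the two numbers, using the example's dimensions (H=4 heads, T=64 sequence length, d=32 per-head dimension):

# Reporter's cost model vs. the value reported by inference_cost
H, T, d = 4, 64, 32                 # heads, sequence length, per-head dimension
expected = 2 * H * T * T * d        # heads treated as a batch dim -> 1048576
reported = 2 * H * H * T * T * d    # extra factor of H            -> 4194304
print(expected, reported)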

Attached ONNX Graph

sdp.costs.onnx.zip
sdp onnx

Transformations should return whether a model was modified or not

Details

Currently a transformation on a model looks like this:

model = model.transform(MoveChanLastUpstream())

The executed transformation, in this case MoveChanLastUpstream, may change the model, or it may not. In some cases it would be good to know whether a transformation had any effect on the model, for example in the channelsLast transformation.
There I need to check whether the transformation has converged in order to stop iterating over the model. Currently this is solved by converting the ONNX model to a string at the beginning of the loop: https://github.com/fastmachinelearning/qonnx/blob/feature/qonnx_chan_last/src/qonnx/transformation/channelsLast.py#L46
And then checking at the end whether anything changed: https://github.com/fastmachinelearning/qonnx/blob/feature/qonnx_chan_last/src/qonnx/transformation/channelsLast.py#L69
While this works fine, it might incur large compute and memory penalties for larger models (a minimal sketch of this workaround is shown below).
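A minimal sketch of that workaround (illustrative, not the exact channelsLast code; the import path of MoveChanLastUpstream is assumed):

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.channels_last import MoveChanLastUpstream  # path assumed

def transform_and_check(model: ModelWrapper):
    # serialize the protobuf before and after to detect whether anything changed
    before = model.model.SerializeToString()
    model = model.transform(MoveChanLastUpstream())
    modified = model.model.SerializeToString() != before
    return model, modified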

New behavior

When the ModelWrapper class is ported over to QONNX it would be nice if the behavior of the transform member function could change.
In particular, it would be nice if the function could optionally return whether a model was modified, for example like this:

model, modified = model.transform(MoveChanLastUpstream(), return_was_modified=True)
model = model.transform(MoveChanLastUpstream(), return_was_modified=False)

With the default set as return_was_modified=False, this would keep the interface stable.

Motivation

This change would simplify development and make parts of QONNX faster. In particular, the channelsLast transformation could benefit from this change: https://github.com/fastmachinelearning/qonnx/blob/feature/qonnx_chan_last/src/qonnx/transformation/channelsLast.py#L34

Parts of QONNX being affected

This change would affect all transformations. In particular all transformations would then be required to return a boolean flag to indicate whether they modified the model or not.

InferShapes fails after FoldTransposeIntoQuantInit

This problem, reported over at FINN (see Xilinx/finn#878 and Xilinx/finn#892), probably belongs here. I will try to contribute a fix soon. I think the FoldTransposeIntoQuantInit transform should not apply in cases where the predecessor Quant node has any input which is not an initializer. However, currently it always deletes the Transpose node irrespective of the inputs or initializers.
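A rough sketch of the suggested guard (illustrative, not the actual fix):

def can_fold_transpose_into_quant(model, quant_node):
    # Only fold the Transpose if the preceding Quant node is fully static,
    # i.e. every one of its inputs is backed by an initializer.
    return all(model.get_initializer(inp) is not None for inp in quant_node.input)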

RemoveIdentityOps does not correctly handle identity operations following fork-nodes

This problem, reported over at FINN, belongs here as well; see Xilinx/finn#878 (comment).

RemoveIdentityOps seems to be broken if the identity operation directly follows a fork-node. It seemingly rewires the fork-node output only into that branch containing the identity op, disconnecting the others. Currently this happens when using packed input projections in multi-head scaled dot-product attention. I will try to fix this soon, probably only requires an additional test added to the remove_node_and_rewire utility function.

Channels last conversion fails when Conv has a bias

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out: b9d231a
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

It seems that the channels last conversion fails when there is a Conv operator with a 1D bias, raising:

AssertionError: Channels last conversion is only available for 3D and 4D tensors.

Details

Using Python 3.9 with the current head of qonnx (b9d231a), running this script:

#!/usr/bin/env python

from qonnx.core.modelwrapper import ModelWrapper
from qonnx.util.cleanup import cleanup_model
from qonnx.transformation.channels_last import ConvertToChannelsLastAndClean
from qonnx.transformation.gemm_to_matmul import GemmToMatMul

model = ModelWrapper("super_resolution.onnx")
model = cleanup_model(model)
model = model.transform(ConvertToChannelsLastAndClean(make_input_channels_last=True))
model = cleanup_model(model)
model = model.transform(GemmToMatMul())
model = cleanup_model(model)

over the file found at https://github.com/julesmuhizi/pytorch_hls4ml/blob/7f4459de1d69c1642dd858ff1a970509a61ff017/model/CNN/super_resolution.onnx raises the exception shown above. Opening it in the debugger shows that the error occurs when processing inp = 'Conv_0_param1', the 1D bias. Is the solution to just pass through 1D tensors?

Exception: Found multiple get_by_name matches, undefined behavior

Hello,

When creating the bitfile for a model that I exported from brevitas, the following error is raised during the streamlining step:

    model = model.transform(GiveReadableTensorNames())
  File "/tools/Xilinx/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 140, in transform
    (transformed_model, model_was_changed) = transformation.apply(transformed_model)
  File "/tools/Xilinx/finn/deps/qonnx/src/qonnx/transformation/general.py", line 135, in apply
    model.rename_tensor(i, "%s_param%d" % (n.name, init_in_num))
  File "/tools/Xilinx/finn/deps/qonnx/src/qonnx/core/modelwrapper.py", line 301, in rename_tensor
    if util.get_by_name(graph.initializer, old_name) is not None:
  File "/tools/Xilinx/finn/deps/qonnx/src/qonnx/util/basic.py", line 82, in get_by_name
    raise Exception("Found multiple get_by_name matches, undefined behavior")
Exception: Found multiple get_by_name matches, undefined behavior

Possible Fix

I observed that two nodes in my model are swapped during the streamlining step (i.e. node 13 --> node 14, node 14 --> node 13), and as their input tensors are renamed during a GiveReadableTensorNames transformation, at some point two input parameters end up with the same name. This is not supposed to happen, because GiveReadableTensorNames calls GiveRandomTensorNames, which is supposed to give random names to all the tensors. The error then probably comes from ModelWrapper.get_all_tensor_names(), which probably doesn't actually return all tensor names in the model.

Missing Shapes after Cleanup transformation

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

The cleanup transformation does not fix shape issues in the ONNX graph of ResNet-18.

Details


Steps to Reproduce

import netron
import os # netron
from IPython.display import IFrame # netron

import urllib.request
from qonnx.util.cleanup import cleanup
from qonnx.core.modelwrapper import ModelWrapper

def showInNetron(model_filename: str, localhost_url: str = None, port: int = None):
    """Shows an ONNX model file in the Jupyter Notebook using Netron.

    :param model_filename: The path to the ONNX model file.
    :type model_filename: str

    :param localhost_url: The IP address used by the Jupyter IFrame to show the model.
     Defaults to localhost.
    :type localhost_url: str, optional

    :param port: The port number used by Netron and the Jupyter IFrame to show
     the ONNX model. Defaults to 8081.
    :type port: int, optional

    :return: The IFrame displaying the ONNX model.
    :rtype: IPython.lib.display.IFrame
    """
    try:
        port = port or int(os.getenv("NETRON_PORT", default="8081"))
    except ValueError:
        port = 8081
    localhost_url = localhost_url or os.getenv("LOCALHOST_URL", default="localhost")
    netron.start(model_filename, address=("0.0.0.0", port), browse=False)
    return IFrame(src=f"http://{localhost_url}:{port}/", width="100%", height=400)

model_url = ("https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.onnx?download=")
dl_file = "/tmp/resnet18-v1-7.onnx"
urllib.request.urlretrieve(model_url, dl_file)
out_file = "/tmp/resnet18-v1-7_clean.onnx"
cleanup(dl_file, out_file=out_file, override_batchsize = 1) # Batchsize for resnet18 is set to 1.
showInNetron(out_file)

Expected behavior

All tensors should have their respective shapes after cleanup.

Actual behavior

Most tensors are missing their shapes.

Optional

Possible fix

Transforming the model using the InferShapes transformation (from qonnx.transformation.infer_shapes import InferShapes).

Additional context

This problem was only observed with the model specified above.

QKeras qconv2d test failure

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary


Using commit hash from latest main (d7afcbd) and a particular random seed for pytest-randomly, one of the QKeras conversion tests (test_qkeras_qconv2d_1[11]) fails with somewhat large deviations from the expected value.

Details


Steps to Reproduce


  1. Clone the qonnx repository
  2. Checkout main branch (tested with hash d7afcbd)
  3. Execute pytest -k test_qkeras_qconv2d_1[11] --randomly-seed=719809827

Expected behavior

Test should pass successfully (converted QKeras->QONNX model should produce the expected value)

Actual behavior

A different output is produced:

quantizers = (<qkeras.quantizers.ternary object at 0x7f0cf9ba7fa0>, <qkeras.quantizers.quantized_bits object at 0x7f0cf9ba7fd0>)
request = <FixtureRequest for <Function test_qkeras_qconv2d_1[11]>>

    @pytest.mark.parametrize("quantizers", kb_quantizers, ids=kb_quantizers_ids)
    def test_qkeras_qconv2d_1(quantizers, request):
        kq, bq = quantizers
        k_ini = tf.keras.initializers.RandomUniform(minval=kq.min(), maxval=kq.max())
        b_ini = tf.keras.initializers.RandomUniform(minval=bq.min(), maxval=bq.max())
        x = x_in = Input((28, 28, 3), name="input")
        x = QConv2D(
            32,
            (2, 2),
            strides=(2, 2),
            kernel_quantizer=kq,
            bias_quantizer=bq,
            activation=quantized_bits(4, 4, 1, alpha=1.0),
            kernel_initializer=k_ini,
            bias_initializer=b_ini,
            name="conv2d_0",
        )(x)
        x = QActivation("quantized_relu(6,2)", name="act1")(x)
        x = QConv2D(
            64,
            (3, 3),
            strides=(2, 2),
            kernel_quantizer=kq,
            bias_quantizer=bq,
            use_bias=False,
            kernel_initializer=k_ini,
            bias_initializer=b_ini,
            name="conv2d_1",
        )(x)
        model = Model(inputs=[x_in], outputs=[x])
    
        x_test = np.random.uniform(low=-1.0, high=1.0, size=(1, 28, 28, 3)).astype(dtype=np.float32)
        y_qkeras = model.predict(x_test)
    
        onnx_model, external_storage = from_keras(model, "test_qkeras_conversion", opset=9)
        assert external_storage is None
        model_path = f"model_test_qkeras_qconv2d1_{request.node.callspec.id}.onnx"
        onnx.save(onnx_model, model_path)
    
        onnx_model = ModelWrapper(model_path)
        onnx_model = onnx_model.transform(InferShapes())
    
        idict = {onnx_model.graph.input[0].name: x_test}
        odict = oxe.execute_onnx(onnx_model, idict, True)
        y_qonnx = odict[onnx_model.graph.output[0].name]
    
>       np.testing.assert_allclose(y_qkeras, y_qonnx, rtol=1e-4, atol=1e-4)

tests/keras/test_keras_convert.py:373: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<function assert_allclose.<locals>.compare at 0x7f0cec725550>, array([[[[15.8125, 31.5625, 31.5   , ..., 13.875 ,  0....875 , -2.    , 27.5   ],
         [15.8125, 27.5625, 27.5   , ..., 11.875 , -6.    , 33.5   ]]]],
      dtype=float32))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=0.0001, atol=0.0001', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=0.0001, atol=0.0001
E           
E           Mismatched elements: 36 / 2304 (1.56%)
E           Max absolute difference: 3.9375
E           Max relative difference: 63.
E            x: array([[[[15.8125, 31.5625, 31.5   , ..., 13.875 ,  0.    , 33.5   ],
E                    [15.75  , 25.625 , 33.5625, ...,  5.9375,  3.875 , 27.5625],
E                    [13.8125, 25.625 , 27.5625, ...,  5.875 ,  1.9375, 27.5625],...
E            y: array([[[[15.8125, 31.5625, 31.5   , ..., 13.875 ,  0.    , 33.5   ],
E                    [15.75  , 25.625 , 33.5625, ...,  5.9375,  3.875 , 27.5625],
E                    [13.8125, 25.625 , 27.5625, ...,  5.875 ,  1.9375, 27.5625],...

/usr/lib/python3.8/contextlib.py:75: AssertionError

@selwyn96 @jmduarte could you take a look at this please? I am flagging it because the max abs/rel differences look rather large, but I also suspect this could be an off-by-1 error during quantization magnified by a scale factor.

Multithreshold operator FINN-ONNX execution error after 1D scalar add operator is absorbed into it.

The notebook attached with this bug report describes an error scenario of the multithreshold operator (after applying the AbsorbAddIntoMultithreshold transformation) during finn-onnx graph execution:

Error operator : Multithreshold.

Error description : A shape mismatch error occurs during execution (see the attached image). It occurs when a 1D scalar Add operator is merged with the following MultiThreshold operator and executed.

Error resolution : The shape of the input vector 'v' needs to be reversed for this operation to work correctly during execution.

Notebook executed with FINN v0.9 along with the intermediate ONNX files are included in the attached zip folder :
Debug.zip

channels_last transformation support for branching models

I am working with a UNet model in ONNX format and I needed to move the channels from first to last. I tried using the following pass https://github.com/fastmachinelearning/qonnx/blob/main/src/qonnx/transformation/channels_last.py but at the moment it seems it does not support branching models. I also noticed that something similar has been implemented in hls4ml, exploiting configuration parameters coming from PyTorch: https://github.com/fastmachinelearning/hls4ml/blob/fcd9c58d977ee7684e21d81cc13f124c9aa4209c/hls4ml/model/optimizer/passes/convert_to_channels_last.py
Is there any ongoing work on this? If not, and anyone has any kind of guidelines/slides on how to support this feature, I would be happy to contribute.

ConvertToChannelsLastAndClean: New transpose nodes don't have names and intermediate tensors are not kept in place

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

This is technically two bugs:

  • New Transpose nodes don't have names after the transformation completes
    • This causes issues for hls4ml as all nodes in a graph must be named for proper ingestion
  • After the transformation completes some tensors are not kept in place, but have moved somewhere else
    • This seems to be a side effect of at least the MoveChanLastUpstream transformation not properly maintaining order.

Steps to Reproduce

  1. Clone the qonnx repository
  2. Checkout the main branch
  3. Run to_channels_last utility on the CNV_W2A2.onnx file from the model zoo
  4. Inspect the resulting CNV_W2A2_channels_last.onnx in Netron
  5. The resulting file is also attached: CNV_2W2A_clean_channels_last.zip

Expected behavior

  • All nodes should have names
  • Tensors should stay in-place

Actual behavior

  • The name of the input Transpose is now missing
  • The first Mul node now has an output tensor called "Sub_0_out0", and an input called "Mul_0_out0". So the tensors are not in place anymore.

Optional

Possible fix

Some ideas on fixes:

  • The issue could be hotfixed by running GiveUniqueNodeNames and GiveReadableTensorNames again after the transformation
    • For the naming this might be fine, but ideally the Transposes should have names upon insertion
    • For the tensors moving around, this would not be a clean solution, since it would only hide the fact that the tensors are not in-place anymore
  • Non-hotfix way
    • Give Transposes proper unique names upon insertion
    • Check what's going wrong during Transpose insertion/deletion while moving Transposes up/down

Add cleanup transformation sorting inputs of commutative operations

Details

Most graph transformations (here as well as in FINN) assume a particular order of node inputs, even if the operation is actually commutative, e.g. the add operation. There seems to be a clear distinction between "dynamic" (i.e., produced upstream) and initializer inputs. As far as I can tell, the assumption is always that dynamic inputs are listed first, followed by the initializers. Concretely, node.input[0] is used to refer to the (single) dynamic input to node. However, this does not always hold true and I have encountered multiple occasions where valid transformations did not apply or were applied incorrectly due to violations of this assumption: Xilinx/finn#878

New behavior

I propose to add a cleanup transformation to sort the node input list of such commutative operations to have the initializer inputs last. For all other types of transformations, input order already has special meaning and seems to be handled correctly.
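A rough sketch of what such a cleanup transformation could look like (illustrative only; the set of commutative ops and the exact structure are assumptions):

from qonnx.transformation.base import Transformation

class SortCommutativeInputsInitializerLast(Transformation):
    """Sort inputs of commutative ops so that dynamic inputs come before initializers."""

    COMMUTATIVE_OPS = {"Add", "Mul", "And", "Or", "Xor"}  # assumed set

    def apply(self, model):
        graph_modified = False
        for node in model.graph.node:
            if node.op_type in self.COMMUTATIVE_OPS and len(node.input) == 2:
                is_init = [model.get_initializer(i) is not None for i in node.input]
                # move the initializer input to the back if it is currently first
                if is_init[0] and not is_init[1]:
                    node.input[0], node.input[1] = node.input[1], node.input[0]
                    graph_modified = True
        return model, graph_modified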

Motivation

Ideally, transformations of commutative operations should not care for the input order. But as order is assumed in a lot of places right now, it seems to be easier to introduce a cleanup transformation to ensure the assumptions hold true.

Parts of QONNX being affected

A new cleanup transformation will be introduced and will be added to the default cleanup transformations of the ModelWrapper.

GiveReadableTensorNames: Exception("Found multiple get_by_name matches, undefined behavior")

If the problem is coming from streamlining, please report this issue in the FINN repository instead. Also, please provide a concrete testcase (ONNX file and minimal Python script to reproduce the problem).

Hi, I am opening the issue again, as the error comes from qonnx and not from FINN.

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out. 7d50273
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug. onxx model generated in the python script

Quick summary

If the model is missing a tensor shape for one of the inputs or outputs of a node, then the model is not able to rename the tensors correctly.

Details and Fix

Here is the execution trace:

  • GiveReadableTensorNames first gives tensors random names (GiveRandomTensorNames), which supposedly avoids the duplicate-name error above.
  • GiveRandomTensorNames retrieves all the tensor names (ModelWrapper.get_all_tensor_names), which doesn't return graph.initializer names. Therefore a first fix is to add the initializer tensors to the names returned by ModelWrapper.get_all_tensor_names:
     def get_all_tensor_names(self):
         """Returns a list of all (input, output and value_info) tensor names
         in the graph."""
         graph = self.graph
         names = [x.name for x in graph.value_info]
         names += [x.name for x in graph.input]
         names += [x.name for x in graph.output]
         # fix
         names += [x.name for x in graph.initializer]  
         return names
  • But when ModelWrapper loads a model, it automatically adds all initializer tensors to graph.value_info:
       class ModelWrapper:
           def __init__(self, [...], fix_missing_initializer_valueinfo=True)
               [...]
               if fix_missing_initializer_valueinfo:
                   self.check_all_tensor_shapes_specified(fix_missing_init_shape=True)
  • The real error then comes from the method ModelWrapper.check_all_tensor_shapes_specified. If one tensor of the model is missing its shape, then all subsequent tensor shapes are ignored and are not added to graph.value_info. The reason is that the short-circuit evaluation of "and" skips the call to self.get_tensor_shape as soon as ret is False:
    for n in graph.node:
        for i in n.input:
            ret = (self.get_tensor_shape(i, fix_missing_init_shape=fix_missing_init_shape) is not None) and ret  # fix
            # ret = ret and (self.get_tensor_shape(i, fix_missing_init_shape=fix_missing_init_shape) is not None)  # short-circuit skips the call
        for o in n.output:
            ret = (self.get_tensor_shape(o, fix_missing_init_shape=fix_missing_init_shape) is not None) and ret  # fix
            # ret = ret and (self.get_tensor_shape(o, fix_missing_init_shape=fix_missing_init_shape) is not None)  # short-circuit skips the call

Steps to Reproduce

Run this snippet of code:

import onnx
import numpy as np
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames

Add1_node = onnx.helper.make_node(
    'Add',
    inputs=['in1', 'Add_1_param0'],
    outputs=['sum1'],
    name='Add_1'
)

Add0_node = onnx.helper.make_node(
    'Add',
    inputs=['sum1', 'Add_0_param0'],
    outputs=['sum2'],
    name='Add_0',
)

Add2_node = onnx.helper.make_node(
    'Add',
    inputs=['abs1', 'abs1'],
    outputs=['sum3'],
    name='Add_2',
)

Abs_node = onnx.helper.make_node(
    'Abs',
    inputs=['sum2'],
    outputs=['abs1'],
    name='Abs'
)

Round_node = onnx.helper.make_node(
    'Round',
    inputs=['sum3'],
    outputs=['out1'],
    name='Round',
)

in1 = onnx.helper.make_tensor_value_info("in1", onnx.TensorProto.FLOAT, [4, 4])
out1 = onnx.helper.make_tensor_value_info("out1", onnx.TensorProto.FLOAT, [4, 4])

graph = onnx.helper.make_graph(
    nodes=[
        Add1_node,
        Add0_node,
        Abs_node,
        Add2_node,
        Round_node,
    ],
    name="simple_graph",
    inputs=[in1],
    outputs=[out1],
    value_info=[
        # onnx.helper.make_tensor_value_info("sum1", onnx.TensorProto.FLOAT, [4, 4]),
        onnx.helper.make_tensor_value_info("sum2", onnx.TensorProto.FLOAT, [4, 4]),
        onnx.helper.make_tensor_value_info("abs1", onnx.TensorProto.FLOAT, [4, 4]),
        onnx.helper.make_tensor_value_info("sum3", onnx.TensorProto.FLOAT, [4, 4]),
    ],
    initializer=[
        onnx.helper.make_tensor('Add_1_param0', onnx.TensorProto.FLOAT, [4, 4], np.zeros(16).tolist()),
        onnx.helper.make_tensor('Add_0_param0', onnx.TensorProto.FLOAT, [4, 4], np.zeros(16).tolist())
    ]
)

onnx_model = onnx.helper.make_model(graph, producer_name="simple-model")
onnx.save(onnx_model, 'simple_model.onnx')

model = ModelWrapper(onnx_model)
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())

Inference cost calculation bug with quantized model

Prerequisites

qonnx was installed via pip install qonnx

example onnx file:
pruning_clean_onnx(1).zip

Quick summary

Using a quantized network trained with Brevitas, I expected the calculation of BOPs to depend on the number of bits used in the quantized layers (QuantLinear and QuantReLU). However, I always got the same BOPs for any chosen quantization. I found that the underlying data type that was picked up here
https://github.com/fastmachinelearning/qonnx/blob/main/src/qonnx/analysis/inference_cost.py#L126
was always Float32.

Details


Steps to Reproduce

Can be reproduced using the inference_cost util
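For instance, using the command-line utility shown earlier in this README (the model filename here is assumed):

qonnx-inference-cost pruning_clean.onnx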

Expected behavior

An output that is sensitive to bit widths, such as:

{
  "discount_sparsity": true,
  "mem_o_FLOAT32": 38011120.0,
  "mem_w_INT2": 367468.0,
  "mem_w_INT6": 12076.0,
  "op_mac_INT10_INT6": 111357820.0,
  "op_mac_UINT6_INT2": 4509484280.0,
  "op_mac_UINT6_INT6": 111468532.0,
  "op_mac_UINT8_INT2": 2271035256.0,
  "total_bops": 101144711808.0,
  "total_mem_o_bits": 1216355840.0,
  "total_mem_w_bits": 807392.0,
  "unsupported": "set()"
}

Actual behavior

What I got was not sensitive to bit widths, only giving op_mac_FLOAT32

Possible fix

@HenniOVP found a fix for it, I'd leave it to him to elaborate on that.

Fails to install on Python 3.10 due to pinned onnxruntime dependency

Prerequisites

Please make sure to check off these prerequisites before submitting a bug report.

  • Test that the bug appears on the current version of the main branch. Make sure to include the commit hash of the commit you checked out.
  • Check that the issue hasn't already been reported, by checking the currently open issues.
  • If there are steps to reproduce the problem, make sure to write them down below.
  • If relevant, please include the ONNX files, which were created directly before and/or after the bug.

Quick summary

Installing qonnx via pip on python 3.10 fails due to the pinned onnxruntime==1.11.1 dependency being no longer available. This happens for the most recent release on PyPI as well as for the current state of the main branch.

Details

Steps to Reproduce

On Python 3.10, running pip install qonnx yields the following error message; the behavior is similar when installing from the current main branch:

...
ERROR: Cannot install qonnx==0.1 and qonnx==0.2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    qonnx 0.2.0 depends on onnxruntime==1.11.1
    qonnx 0.1 depends on onnxruntime==1.11.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Expected behavior

It should be possible to install qonnx on python 3.10 (after solving this dependency issue), especially as this is the default python version on Ubuntu 22.04, which is probably a common choice of operating system.

Actual behavior

Installing via pip fails resolving the dependencies.

Optional

Possible fix

Pinning the onnxruntime dependency to a more recent version, potentially the most recent release (1.15.0), should solve the problem without introducing any major issues. I am already testing this and will be happy to contribute a pull request.

Strange Behavior of FoldConstants Transformation

Occasionally, the FoldConstants transformation produces wrong shape constants when applied to Reshape nodes whose shape input is produced by a chain of Shape-Gather-Unsqueeze-Concat operators (this seems to be common behavior of the PyTorch export, and should be easily constant-foldable when all shapes are known at export time). By "wrong" I mean at least one axis is zero, and the resulting graph is broken beyond repair from that point on (all following shapes make no sense at all). I do not really have an idea what exactly is going on, and giving a minimal example is difficult as it seems to occur only for more complex operator patterns deeper inside the model (e.g., the same pattern is folded fine in the first layer but then breaks in the second). A fix, however, seems to be trivial: insert a break here, leaving the loop to remove the node and re-do the shape annotations after each folded constant instead of just once at the end.

I am not sure whether this is the proper way to solve it, and I will try to follow up with a reproducible example later. But before I forget, I wanted to document this issue; maybe someone else has already encountered this or something similar and knows what is going on.

Inference Cost calculation bug with pruned quantized model

Prerequisites

qonnx was installed via pip install qonnx in my environment.
ONNX files before and after pruning are attached in the .zip file.
pruning_0_clean.onnx is an unpruned model
pruning_1_clean.onnx is a pruned model

Quick summary

I wanted to calculate BOPs using inference_cost for a model that is pruned iteratively. I expected the number of BOPs and total_mem_w_bits to decrease after each pruning step, but I observed the same values before and after pruning.

Here is a short part of the code showing the relevant calls to the qonnx tools, together with the preceding steps for pruning removal and export:

model_copy = copy.deepcopy(model)  # necessary for the export to work properly, so pruning can continue on the model itself
parameters_to_prune_copy = [(model_copy.network[1], "weight"), (model_copy.network[4], "weight"), (model_copy.network[7], "weight"), (model_copy.network[10], "weight"), (model_copy.network[13], "weight")]  # list of pruned layers
for paras in parameters_to_prune_copy:
    prune.remove(paras[0], name='weight')  # necessary for the export to work properly with a pruned model
BrevitasONNXManager.export(model_copy, export_path=export_path, input_t=input_quant_tensor, export_params="True")
# relevant qonnx calls
cleanup(export_path, out_file=export_path_cleanup)
inference_cost(export_path_cleanup, output_json=export_json, discount_sparsity=True)

Expected behavior

When investigating

inp1_is_const = model.get_initializer(node.input[1]) is not None

I found that the quantized weight matrix was not available as a constant initializer: model.get_initializer(node.input[1]) returned None, so inp1_is_const was False.

Possible fix

Replacing this line

model = model.transform(FoldConstants())

with this

model = model.transform(FoldConstants(exclude_op_types=[]))

seems to have done the job.

Additional context

pruning_clean_onnx.zip
