
Comments (22)

Colm-in-Arm commented on July 20, 2024

Hello Federico,

I've resolved the problems in CpuAcc and GpuAcc now. This model contains kTfLiteBool which is a data type we don't often encounter.

I've updated the review, https://review.mlplatform.org/c/ml/armnn/+/11379, to include all the necessary changes.

Colm.

federicoparra commented on July 20, 2024

another model with problems #758

Colm-in-Arm commented on July 20, 2024

Hello Federico,

Unfortunately I'm not permitted to test using your model without knowing the source and license under which it is shared. However, from your description of the problem it appears that whichever backend you are targeting says the layer is valid but then subsequently prevents a workload being created to execute the layer.

Can you tell me which backend you are trying to use? You could also try using the CpuRef backend. It will not be performant but if it runs the layer it will give a strong indication that it is the backend that's at fault.
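
For example, a minimal sketch of selecting CpuRef through the delegate options (assuming the Python tflite API and the Arm NN external delegate; the library path and model name below are placeholders):

from tensorflow import lite as tflite

# Load the Arm NN delegate with CpuRef as the only backend.
armnn_delegate = tflite.experimental.load_delegate(
    library="libarmnnDelegate.so",          # placeholder: path to your Arm NN delegate build
    options={"backends": "CpuRef", "logging-severity": "info"})
interpreter = tflite.Interpreter(model_path="model.tflite",  # placeholder model
                                 experimental_delegates=[armnn_delegate])
interpreter.allocate_tensors()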

Colm.

federicoparra commented on July 20, 2024

First of all, please be more precise about exactly what you need in order to be able to execute the models I'm sharing here and in #758. In both cases these are open-source models: the one this bug report concerns is located here https://github.com/aselsan-research-imaging-team/flight-net and the one for #758 is located here https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
Of course those links take you to the PyTorch versions of the models, because as you well know almost no one works with TFLITE (or TensorFlow, for that matter), so the models I'm sharing are my conversions. If you "are not permitted to test using" a user's conversion of a PyTorch model and you require officially released TFLITE models, then ARMNN is basically useless, since no one is releasing such models directly for AI development.

Regarding the backend, I tried both CpuAcc and GpuAcc, separately and together (in both orders).

Neither the ARM64 release nor my own compilation of ARMNN supports CpuRef (at least not when run on an Orange Pi 5B), so I can't test that unless you give me a different way.

Please check the links I just shared to confirm that both models (the one from this bug report and the one from my other bug report) are openly licensed, and please try them yourselves.

ARMNN is really performant when it works, but if it can't be made to work with more models it will fall into oblivion.

Thank you,
Federico

Colm-in-Arm commented on July 20, 2024

Hello Federico,

Thank you for pointing me to the source of the models. I can see they are Creative Commons, which is good. However, your conversion of them to tflite is considered a derivative work. You have two options:

  • You can include a copy of the CC BY-NC-SA 4.0 license in the zip file.
  • State in this thread the license you want this work to be associated with. (It cannot be any more restrictive than the CC BY-NC-SA 4.0 license.)

If you want to try CpuRef, you can find a binary release of Arm NN for aarch64 platforms that includes CpuRef here.

Colm.

federicoparra commented on July 20, 2024

Hi, thank you for the detail.

I want to assert here that the converted tflite model I shared above, which is based on https://github.com/aselsan-research-imaging-team/flight-net, was shared here under license CC BY-NC-SA 4.0

I'll go now to the other bug report (#758) and do the same so that you can test that model as well.

I'll try the CpuRef ASAP and report back as well.

All my best,
Federico

Colm-in-Arm commented on July 20, 2024

Hello Federico,

I tried feather_32bit.tflite and it shows the same ComparisonQueueDescriptor error in CpuRef as you saw with CpuAcc. I'll start investigating it now.

Colm.

Colm-in-Arm commented on July 20, 2024

Hello Federico,

I have a review up to resolve the first problem I encountered: https://review.mlplatform.org/c/ml/armnn/+/11379

You can cherry pick this patch on top of Arm NN main branch if you want to experiment with it. The fault came down to inconsistent handling of broadcast in the Greater_Equal layer.

I have verified byte level accuracy of results between TfLite runtime and CpuRef backends.

However, there appears to be a further problem with the CpuAcc and GpuAcc backends that I'm investigating now.

Colm.

federicoparra commented on July 20, 2024

Hello Federico,

I've resolved the problems in CpuAcc and GpuAcc now. This model contains kTfLiteBool which is a data type we don't often encounter.

I've updated the review, https://review.mlplatform.org/c/ml/armnn/+/11379, to include all the necessary changes.

Colm.

Amazing! I'll try this today! Please look at the other case, #758, whenever you have the time; I added the license for that model in that thread as well.

federicoparra commented on July 20, 2024

Hello Federico,

I've resolved the problems in CpuAcc and GpuAcc now. This model contains kTfLiteBool which is a data type we don't often encounter.

I've updated the review, https://review.mlplatform.org/c/ml/armnn/+/11379, to include all the necessary changes.

Colm.

So weird! I applied the patch to the latest pull of the armnn main branch using git apply 48eefee.diff (after downloading your patch), and checked the files locally to confirm the patch was indeed applied - it was. I then went to the build-tool script and built armnn, and... I somehow get exactly the same error!

RuntimeError: TfLiteArmnnDelegate: Exception (TfLiteArmnnDelegate: Network could not be loaded: An error occurred when preparing the network workloads: ComparisonQueueDescriptor: Tensors input_0 & input_1 must have the same number of dimensions in order to be broadcasted) caught from LoadNetwork.

What could it possibly be? I'm sure the build tool is somehow not using the patched files...

Colm-in-Arm commented on July 20, 2024

You do need to check your build. The change to AddBroadcastReshapeLayer.hpp will resolve that specific error.

Colm.

federicoparra commented on July 20, 2024

You do need to check your build. The change to AddBroadcastReshapeLayer.hpp will resolve that specific error.

Colm.

Oh, as I said, I applied the patch and then built, so all the files, including the one you just mentioned, have all the changes you made.

Could it be that the build tool uses a specific branch or another source? Otherwise I'm lost :(

Colm-in-Arm commented on July 20, 2024

Running build-armnn.sh should reuse the version of Arm NN previously cloned by setup-armnn.sh. Any changes you've made to the cloned repository should be built.

I've no idea why you're not seeing the changes. If you deliberately break the code and rebuild does the build fail?

You could also try the --clean option to force a clean build?

Colm.

federicoparra commented on July 20, 2024

OK, I see the issue now - I was cloning the main branch, applying the patch, then going to its build-tool/script directory and running setup-armnn and build-armnn with the understanding that it would use the version of the source code that was already cloned. I hadn't noticed that a new source folder was created by setup-armnn and that it was THAT folder that needed patching. I applied the patch to that folder after doing a pull, then built, and... I still have errors :(

Using this Python code to call it, with the feather_16bit.tflite model:

from tensorflow import lite as tflite

armnn_delegate = tflite.experimental.load_delegate(
    library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
    options={"backends": "GpuAcc", "logging-severity": "trace"})

# Delegates/Executes all operations supported by Arm NN to/with Arm NN
interpreter = tflite.Interpreter(model_path="../models/feather_16bit.tflite",
                                 experimental_delegates=[armnn_delegate])

These are the outputs (log level trace) with each backend used independently:


GpuAcc:

Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
(this error repeats hundreds of times)

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.61 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: constantInfo datatype:Float16inputDequantizeInfo datatype:Float16outputDequantizeInfo datatype:Float32
Info: ConvertConstDequantisationLayersToConstLayersImpl:: Converting FP16 -> FP32
this message happens many times, then:
Info: C
INFO: TfLiteArmnnDelegate: Added backend GpuAcc

and then:

RuntimeError Traceback (most recent call last)
Cell In[21], line 11
7 armnn_delegate = tflite.experimental.load_delegate( library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
8 options={"backends": "GpuAcc", "logging-severity":"trace"})
10 # Delegates/Executes all operations supported by Arm NN to/with Arm NN
---> 11 interpreter = tflite.Interpreter(model_path="../models/feather_16bit.tflite", experimental_delegates=[armnn_delegate])
13 interpreter.allocate_tensors()
15 # Get input and output tensors.

File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.init(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
511 self._delegates = experimental_delegates
512 for delegate in self._delegates:
--> 513 self._interpreter.ModifyGraphWithDelegate(
514 delegate._get_native_delegate_pointer()) # pylint: disable=protected-access
515 self._signature_defs = self.get_signature_list()
517 self._metrics = metrics.TFLiteMetrics()

RuntimeError: TfLiteArmnnDelegate: Exception (Failed to assign a backend to each layer) caught from optimize.

onvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: constantInfo datatype:Float16inputDequantizeInfo datatype:Float16outputDequantizeInfo datatype:Float32
Info: ConvertConstDequantisationLayersToConstLayersImpl:: Converting FP16 -> FP32
Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
this last error repeats a lot, and then:

Info: Optimize ArmnnSubgraph time: 3.50 ms
Info: Load ArmnnSubgraph time: 348.06 ms
Info: Overall ArmnnSubgraph creation time: 352.47 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.07 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Warning: WARNING: Layer of type Convolution2d is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: ArmNN ClDepthwiseConv2dWorkload does not support non constant bias.), falling back to the next backend.
Warning: ERROR: Layer of type Convolution2d is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Convolution2d is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: ArmNN ClDepthwiseConv2dWorkload does not support non constant bias.), falling back to the next backend.
Warning: ERROR: Layer of type Convolution2d is not supported on any preferred backend [GpuAcc ]


CpuAcc:

Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
this error repeats hundreds of times, then:

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.71 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: constantInfo datatype:Float16inputDequantizeInfo datatype:Float16outputDequantizeInfo datatype:Float32
Info: ConvertConstDequantisationLayersToConstLayersImpl:: Converting FP16 -> FP32
this last one repeats many many times, then:

INFO: TfLiteArmnnDelegate: Added backend CpuAcc


RuntimeError Traceback (most recent call last)
Cell In[22], line 11
7 armnn_delegate = tflite.experimental.load_delegate( library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
8 options={"backends": "CpuAcc", "logging-severity":"trace"})
10 # Delegates/Executes all operations supported by Arm NN to/with Arm NN
---> 11 interpreter = tflite.Interpreter(model_path="../models/feather_16bit.tflite", experimental_delegates=[armnn_delegate])
13 interpreter.allocate_tensors()
15 # Get input and output tensors.

File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.init(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
511 self._delegates = experimental_delegates
512 for delegate in self._delegates:
--> 513 self._interpreter.ModifyGraphWithDelegate(
514 delegate._get_native_delegate_pointer()) # pylint: disable=protected-access
515 self._signature_defs = self.get_signature_list()
517 self._metrics = metrics.TFLiteMetrics()

RuntimeError: TfLiteArmnnDelegate: Exception (Failed to assign a backend to each layer) caught from optimize.

type:Float32
Info: ConvertConstDequantisationLayersToConstLayersImpl:: Converting FP16 -> FP32
Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: constantInfo datatype:Float16inputDequantizeInfo datatype:Float16outputDequantizeInfo datatype:Float32
this last message repeats many many times, then:

Info: Optimize ArmnnSubgraph time: 3.51 ms
Info: Load ArmnnSubgraph time: 1.09 ms
Info: Overall ArmnnSubgraph creation time: 5.57 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.04 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Warning: WARNING: Layer of type Convolution2d is not supported on requested backend CpuAcc for input data type Float32 and output data type Float32 (reason: in validate src/runtime/NEON/functions/NEConvolutionLayer.cpp:134: Dynamic weights are not supported), falling back to the next backend.
Warning: ERROR: Layer of type Convolution2d is not supported on any preferred backend [CpuAcc ]
Warning: WARNING: Layer of type Convolution2d is not supported on requested backend CpuAcc for input data type Float32 and output data type Float32 (reason: in validate src/runtime/NEON/functions/NEConvolutionLayer.cpp:134: Dynamic weights are not supported), falling back to the next backend.
Warning: ERROR: Layer of type Convolution2d is not supported on any preferred backend [CpuAcc ]
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 39
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 40


CpuRef:

This one WORKS. Yet there are warnings:

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.67 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: constantInfo datatype:Float16inputDequantizeInfo datatype:Float16outputDequantizeInfo datatype:Float32
Info: ConvertConstDequantisationLayersToConstLayersImpl:: Converting FP16 -> FP32
this message happens many many times, then:

Info: Optimize ArmnnSubgraph time: 3.32 ms
Info: Load ArmnnSubgraph time: 0.40 ms
Info: Overall ArmnnSubgraph creation time: 4.73 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.05 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.46 ms
Info: Load ArmnnSubgraph time: 0.09 ms
Info: Overall ArmnnSubgraph creation time: 0.70 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.09 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.49 ms
Info: Load ArmnnSubgraph time: 0.11 ms
Info: Overall ArmnnSubgraph creation time: 0.75 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.19 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 1.25 ms
Info: Load ArmnnSubgraph time: 0.21 ms
Info: Overall ArmnnSubgraph creation time: 1.75 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.04 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.29 ms
Info: Load ArmnnSubgraph time: 0.05 ms
Info: Overall ArmnnSubgraph creation time: 0.43 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.09 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.58 ms
Info: Load ArmnnSubgraph time: 0.11 ms
Info: Overall ArmnnSubgraph creation time: 0.84 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.04 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.30 ms
Info: Load ArmnnSubgraph time: 0.06 ms
Info: Overall ArmnnSubgraph creation time: 0.44 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.07 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.44 ms
Info: Load ArmnnSubgraph time: 0.09 ms
Info: Overall ArmnnSubgraph creation time: 0.64 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.04 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.31 ms
Info: Load ArmnnSubgraph time: 0.06 ms
Info: Overall ArmnnSubgraph creation time: 0.45 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.08 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.54 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.78 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.06 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.52 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.74 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.06 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.45 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.67 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.03 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.30 ms
Info: Load ArmnnSubgraph time: 0.06 ms
Info: Overall ArmnnSubgraph creation time: 0.43 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.09 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.50 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.74 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.06 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.53 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.75 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.07 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.47 ms
Info: Load ArmnnSubgraph time: 0.10 ms
Info: Overall ArmnnSubgraph creation time: 0.70 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.03 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.30 ms
Info: Load ArmnnSubgraph time: 0.07 ms
Info: Overall ArmnnSubgraph creation time: 0.44 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.02 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.21 ms
Info: Load ArmnnSubgraph time: 0.04 ms
Info: Overall ArmnnSubgraph creation time: 0.31 ms

Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 19
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 20
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 21
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 22
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 23
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 24
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 25
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 26
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 27
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 28
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 29
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 30
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 31
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 32
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 33
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 34
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 35
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 36
INFO: TfLiteArmnnDelegate: Added backend CpuRef
WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type
(this warning repeats many times)

Again, CpuRef, even with all these messages, does work. CpuAcc and GpuAcc do not work.
I'm building using the main branch after pulling all changes as of today and applying your patch.

federicoparra commented on July 20, 2024

Using instead the feather_8bit_dynamic.tflite, both CpuAcc and CpuRef work (and indeed CpuAcc works pretty fast on my Orange Pi 5B!). But GpuAcc still has the errors:

Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
this error appears many many times, then:
Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.07 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.51 ms
Info: Load ArmnnSubgraph time: 331.17 ms
Info: Overall ArmnnSubgraph creation time: 331.88 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.11 ms
Debug: OptimizerOptions:
ReduceFp32ToFp16: 0
ReduceFp32ToBf16: 0
Debug: 0
Debug to file: 0
ShapeInferenceMethod: ValidateOnly
ImportEnabled: 0
ExportEnabled: 0
ProfilingEnabled: 0
AllowExpandedDims: 0
ModelOptions:

Info: Optimize ArmnnSubgraph time: 0.65 ms
Error: An error occurred when preparing the network workloads: Convolution2dQueueDescriptor: input & weight must have identical data types.
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 106
Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 135
INFO: TfLiteArmnnDelegate: Added backend GpuAcc
WARNING: FULLY_CONNECTED: not supported by armnn: in validate src/gpu/cl/operators/ClFullyConnected.cpp:473: Tensors have different data types


RuntimeError Traceback (most recent call last)
Cell In[55], line 11
7 armnn_delegate = tflite.experimental.load_delegate( library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
8 options={"backends": "GpuAcc", "logging-severity":"trace"})
10 # Delegates/Executes all operations supported by Arm NN to/with Arm NN
---> 11 interpreter = tflite.Interpreter(model_path="../models/feather_8bit_dynamic.tflite", experimental_delegates=[armnn_delegate])
13 interpreter.allocate_tensors()
15 # Get input and output tensors.

File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.init(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
511 self._delegates = experimental_delegates
512 for delegate in self._delegates:
--> 513 self._interpreter.ModifyGraphWithDelegate(
514 delegate._get_native_delegate_pointer()) # pylint: disable=protected-access
515 self._signature_defs = self.get_signature_list()
517 self._metrics = metrics.TFLiteMetrics()

RuntimeError: TfLiteArmnnDelegate: Exception (TfLiteArmnnDelegate: Network could not be loaded: An error occurred when preparing the network workloads: Convolution2dQueueDescriptor: input & weight must have identical data types.) caught from LoadNetwork.

federicoparra commented on July 20, 2024

Ok, the feather_32bit.tflite does work, with GpuAcc, CpuAcc and CpuRef - great work!

However, comparing the inference speed of the 32-, 16- and 8-bit versions of the model using CpuRef (the only backend on which all three run), the 8-bit version is much, much faster - so presumably the 16- and 8-bit versions would also be faster if they ran (instead of crashing) on the CpuAcc and GpuAcc backends.

So if you could check the errors, particularly those related to the GpuAcc backend with the feather_8bit_dynamic.tflite version, that would be so helpful!

thank you!
Federico

Colm-in-Arm commented on July 20, 2024

Hello Federico,

Just a point on feather_16bit.tflite: this is not an FP16 model. The input is FP32. The weights are quantized to FP16, with the first operator on them being a dequantize back to FP32. TfLite has the concept of post-training quantization; it is up to the backend/accelerator to identify this structure and modify the model to actually use FP16 kernels. I believe the TfLite GPU backend does support this but Arm NN does not.

The result is that you will not see any performance increase between FP16 and FP32 with this model on Arm NN.
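
For what it's worth, a rough sketch of how this kind of weights-only FP16 model is typically produced with the TfLite converter (assuming a SavedModel source; "saved_model/" and the output name are placeholders). The result keeps FP32 inputs and outputs and stores FP16 weights that are dequantized back to FP32 at runtime:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")  # placeholder source
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # quantize the weights to FP16
tflite_model = converter.convert()
with open("feather_16bit.tflite", "wb") as f:          # placeholder output name
    f.write(tflite_model)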

Colm.

federicoparra commented on July 20, 2024

Hello Federico,

Just a point on feather_16bit.tflite: this is not an FP16 model. The input is FP32. The weights are quantized to FP16, with the first operator on them being a dequantize back to FP32. TfLite has the concept of post-training quantization; it is up to the backend/accelerator to identify this structure and modify the model to actually use FP16 kernels. I believe the TfLite GPU backend does support this but Arm NN does not.

The result is that you will not see any performance increase between FP16 and FP32 with this model on Arm NN.

Colm.

Thanks @Colm-in-Arm. Three follow-ups:

  1. How about 8-bit quantized tflite models? Does armnn accelerate those with GpuAcc?
  2. Would it make sense to transform the model entirely to fp16 prior to the conversion to tflite? Would the tflite models in that case really use fp16 kernels?
  3. Any idea about the errors I'm having with the fp16 and 8-bit versions of the model?

Thank you!!!

Federico

Colm-in-Arm commented on July 20, 2024

1: Yes Arm NN will accelerate an 8 bit quantized model using the GpuAcc and CpuAcc backends. The usual restrictions on supported layers apply.

2: I've never attempted the kinds of conversions you're doing. I would hope that Tensorflow would honour an FP16 input model and create a native FP16 tflite model. I would be happy to try it if you get the conversion to work.

3:
feather_8bit_dynamic.tflite on GpuAcc

Something is going wrong with the ACL layer validation here. It is returning that this CONV2D layer is supported but in Arm NN we have 3 different reasons why this workload should not be created. When I remove these restrictions the layer fails in ACL. I'll have to check with @morgolock

feather_16bit.tflite on CpuAcc and GpuAcc

I can see the model is executing but the results are garbage. This will require a layer by layer comparison which will take some time.

Colm.

federicoparra commented on July 20, 2024

1: Yes Arm NN will accelerate an 8 bit quantized model using the GpuAcc and CpuAcc backends. The usual restrictions on supported layers apply.

2: I've never attempted the kinds of conversions you're doing. I would hope that Tensorflow would honour an FP16 input model and create a native FP16 tflite model. I would be happy to try it if you get the conversion to work.

3: feather_8bit_dynamic.tflite on GpuAcc

Something is going wrong with the ACL layer validation here. It is returning that this CONV2D layer is supported but in Arm NN we have 3 different reasons why this workload should not be created. When I remove these restrictions the layer fails in ACL. I'll have to check with @morgolock

feather_16bit.tflite on CpuAcc and GpuAcc

I can see the model is executing but the results are garbage. This will require a layer by layer comparison which will take some time.

Colm.

Thank you! Given that the 16-bit model will not improve speed relative to the 32-bit model, the most important model version to make work is the 8-bit version.

As a general rule, what is faster in ARMNN (on Mali GpuAcc): 8-bit models or 16/32-bit models?

thanks!
Federico

federicoparra commented on July 20, 2024

Hey @Colm-in-Arm, the good news is: I created a new version of the converted model, using these guidelines https://www.tensorflow.org/lite/performance/post_training_integer_quant#convert_using_integer-only_quantization (except that I did not quantize the inputs and outputs), leading to this model:
feather_8bit_inside.zip
This form of conversion makes sure that every single operation inside the model is an int8 operator/kernel (with the exception of the input/output casting operations, since the inputs/outputs are floats).
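
For context, this is roughly the recipe the linked guide describes, sketched below with float inputs/outputs left in place ("saved_model/" and representative_data are placeholders):

import tensorflow as tf

def representative_dataset():
    # Placeholder: yield a few hundred typical input samples for calibration.
    for sample in representative_data:
        yield [sample.astype("float32")]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# inference_input_type / inference_output_type are left at the default (float32),
# so only the internal operators run in int8 and cast ops remain at the edges.
tflite_model = converter.convert()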

The good news: ARMNN with your patch shared in this bug report does work with this 8bit model! It reports several errors both in GpuAcc and CpuAcc but it does work anyhow (errors below).

Now the bad news: it's significantly slower than the float version, and it is slower with GpuAcc than with CpuAcc. I have made this observation before: several times I have found that float models run faster than int models in ARMNN, and that int models run faster on CpuAcc than on GpuAcc, which I took to mean that int models were useless in the sense that Mali was not prepared to accelerate them, or at least not as much as float operations. Can you confirm this? I'm trying to focus my effort on creating the fastest possible models for the Mali GPU.

Here are the errors I see with this model when loading with CpuAcc (first) and GpuAcc (second):

CpuAcc errors:
Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
that error repeats hundreds of times

GpuAcc:
Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
WARNING: ELEMENTWISE_UNARY: not supported by armnn: in validate_arguments src/gpu/cl/kernels/ClElementwiseUnaryKernel.cpp:65: ITensor data type QASYMM8_SIGNED not supported by this kernel
WA
error repeats many times, then:
Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
error repeats dozens of times, then:
WARNING: ELEMENTWISE_UNARY: not supported by armnn: in validate_arguments src/gpu/cl/kernels/ClElementwiseUnaryKernel.cpp:65: ITensor data type QASYMM8_SIGNED not supported by this kernel
warning repeats around 6 times

I repeat, both still work; they are just slower than the float versions on either backend.

Colm-in-Arm commented on July 20, 2024

Hello Federico,

It's never as simple as "INT8 is slower on GpuAcc than FP32"; there are a multitude of factors involved. Something to consider: the first time a GPU inference happens, the kernels are compiled, so you should probably disregard the first inference. If you run 10 iterations, ignore the first and compare the execution to CpuAcc, you'll get a better impression of the relative speeds. You can avoid this initial overhead by caching the compiled network and tuning data from a previous run (see the save-cached-network, cached-network-filepath, tuning-level and tuning-path options).
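
As a rough sketch of that kind of measurement (the option names are taken from the list above; the exact value format, the library path and the model name are assumptions to check against the delegate documentation):

import time
import numpy as np
from tensorflow import lite as tflite

delegate = tflite.experimental.load_delegate(
    library="libarmnnDelegate.so",                       # placeholder path
    options={"backends": "GpuAcc",
             "save-cached-network": "1",                 # assumed boolean-as-string format
             "cached-network-filepath": "/tmp/gpu_cache.bin"})
interpreter = tflite.Interpreter(model_path="feather_8bit_inside.tflite",  # placeholder name
                                 experimental_delegates=[delegate])
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

times = []
for _ in range(10):
    start = time.perf_counter()
    interpreter.invoke()
    times.append(time.perf_counter() - start)

print("first run (includes GPU kernel compilation):", times[0])
print("mean of remaining runs:", sum(times[1:]) / len(times[1:]))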

There is a script delivered with ExecuteNetwork, evaluate_network.sh, that will try to find the fastest way to run a model inference. From memory, it requires the model to be supported by the parser so not much use for this model.

The most likely cause of the errors from ./arm_compute/core/utils/DataTypeUtils.h:67 is a Boolean Datatype. Part of the review was to allow the Boolean data type to propagate down to ACL for it to be rejected by the validate method. This is the main reason I've not progressed this review. I need to work on a better way to do this. Once the layer is rejected by ACL it will fall back to TfLite runtime.

Colm.
