Comments (5)
If you plan to run models with large non-depthwise convolutions, like VGG, NNPACK is still faster thanks to its asymptotically fast convolution algorithms. However, recent mobile-optimized computer vision architectures use depthwise-separable convolutions, and XNNPACK performs much better than NNPACK on these networks. Sparse inference with depthwise-separable convolutions eclipses the speedup you could get from Winograd-based convolution on non-separable convolutions.
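To make the trade-off above concrete, here is a rough sketch of why depthwise-separable convolutions need far fewer FLOPs than a standard convolution in the first place. The shapes are illustrative (not from any specific model), counting 2 FLOPs per multiply-accumulate:

```python
# Illustrative FLOP comparison: standard KxK convolution vs. a
# depthwise-separable one (depthwise KxK + pointwise 1x1), 2 FLOPs per MAC.
H = W = 128          # output spatial size (assumed)
K = 3                # kernel size
Cin = Cout = 128     # channel counts (assumed)

standard  = 2 * H * W * K * K * Cin * Cout   # one dense KxK convolution
depthwise = 2 * H * W * K * K * Cin          # per-channel KxK convolution
pointwise = 2 * H * W * Cin * Cout           # 1x1 channel-mixing convolution
separable = depthwise + pointwise

print(standard / separable)  # roughly 8.4x fewer FLOPs at these shapes
```

With so few FLOPs left, there is much less for a Winograd-style FLOP reduction to save, which is why engineering for the direct/sparse case pays off more on these networks.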
from xnnpack.
NNPACK implements fast convolution algorithms (based on Winograd or Fourier transforms) which reduce the number of FLOPs needed for convolutions with large kernel sizes (e.g. 3x3). XNNPACK implements the Indirect Convolution algorithm, which is more versatile (it can handle strides and dilation), but does two FLOPs per input channel x output channel x output pixel x kernel element.
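The FLOP-count gap described above can be sketched with the textbook Winograd example. For a 3x3 kernel, Winograd F(2x2, 3x3) produces a 2x2 output tile from 16 multiplications instead of the 36 a direct convolution needs, a 2.25x reduction per tile (ignoring the input/output transform overhead, which is why real-world gains are smaller):

```python
# Multiplication count for one output tile of a 3x3 convolution:
# direct evaluation vs. Winograd F(2x2, 3x3). Transform costs ignored.
tile_outputs  = 2 * 2                    # outputs produced per Winograd tile
direct_muls   = tile_outputs * 3 * 3     # 36 multiplications for 4 outputs
winograd_muls = 4 * 4                    # 16 elementwise muls on the 4x4 tile

print(direct_muls / winograd_muls)  # 2.25x fewer multiplications per tile
```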
Also, for XNNPACK you need to run the benchmarks with `bazel run -c opt //:convolution_bench`
Yeah, thank you. With `-c opt` it's a lot better, but still slower than NNPACK by some factor:
```
Run on (6 X 4300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 9216 KiB (x1)
Load Average: 3.78, 2.21, 2.19
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                  Time        CPU  Iterations  UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
xnnpack_convolution_f32/some_test/N:1/H:128/W:128/KH:3/KW:3/PH:1/PW:1/S:1/D:1/G:1/GCin:256/GCout:128/real_time  137505341 ns  137498076 ns  5  FLOPS=69.1847G/s Freq=4.15809G
xnnpack_convolution_f32/some_test/N:1/H:256/W:256/KH:3/KW:3/PH:1/PW:1/S:1/D:1/G:1/GCin:192/GCout:96/real_time   312934518 ns  312902879 ns  2  FLOPS=68.9401G/s Freq=4.
```
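As a sanity check, the reported FLOPS counter for the first line of that log can be roughly reproduced by hand, assuming the benchmark counts 2 FLOPs per multiply-accumulate (as described earlier in the thread) and that stride 1 with padding 1 keeps the 128x128 spatial size:

```python
# Back-of-the-envelope check of the first benchmark line's FLOPS counter.
# Assumption: 2 FLOPs per MAC; output stays 128x128 (stride 1, pad 1).
H = W = 128
KH = KW = 3
Cin, Cout = 256, 128

flops = 2 * H * W * KH * KW * Cin * Cout  # ~9.66e9 FLOPs per run
time_s = 137505341e-9                     # real time from the log, 137.5 ms

print(flops / time_s / 1e9)  # ~70.3 G/s, close to the reported 69.1847 G/s
```

The small discrepancy is plausibly measurement overhead or a slightly different internal FLOP count, but the counter is clearly in the right ballpark.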
But I'm confused. I saw it advertised that XNNPACK is a replacement for NNPACK, but it can't be if they implement different algorithms with their own pros and cons. Could you please give some insight into how these libraries should be used?
Thanks for the help, now it makes sense to me!