Comments (5)
If you plan to run models with large non-depthwise convolutions, like VGG, NNPACK is still faster thanks to its asymptotically fast convolution algorithms. However, recent mobile-optimized computer vision architectures use depthwise-separable convolutions, and XNNPACK performs much better than NNPACK on these networks. Sparse inference with depthwise-separable convolutions eclipses the speedup you could get from Winograd-based convolution on non-separable convolutions.
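To make the trade-off above concrete, here is a rough sketch of why depthwise-separable convolutions need far fewer FLOPs than a standard convolution in the first place. The shapes are illustrative (not from any specific model), counting 2 FLOPs per multiply-accumulate:

```python
# Illustrative FLOP comparison: standard KxK convolution vs. a
# depthwise-separable one (depthwise KxK + pointwise 1x1), 2 FLOPs per MAC.
H = W = 128          # output spatial size (assumed)
K = 3                # kernel size
Cin = Cout = 128     # channel counts (assumed)

standard  = 2 * H * W * K * K * Cin * Cout   # one dense KxK convolution
depthwise = 2 * H * W * K * K * Cin          # per-channel KxK convolution
pointwise = 2 * H * W * Cin * Cout           # 1x1 channel-mixing convolution
separable = depthwise + pointwise

print(standard / separable)  # roughly 8.4x fewer FLOPs at these shapes
```

With so few FLOPs left, there is much less for a Winograd-style FLOP reduction to save, which is why engineering for the direct/sparse case pays off more on these networks.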
from xnnpack.
NNPACK implements fast convolution algorithms (based on Winograd or Fourier transforms) which reduce the number of FLOPs needed for convolutions with large kernel sizes (e.g. 3x3). XNNPACK implements the Indirect Convolution algorithm, which is more versatile (it can handle strides and dilation), but does two FLOPs per input channel x output channel x output pixel x kernel element.
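The FLOP-count gap described above can be sketched with the textbook Winograd example. For a 3x3 kernel, Winograd F(2x2, 3x3) produces a 2x2 output tile from 16 multiplications instead of the 36 a direct convolution needs, a 2.25x reduction per tile (ignoring the input/output transform overhead, which is why real-world gains are smaller):

```python
# Multiplication count for one output tile of a 3x3 convolution:
# direct evaluation vs. Winograd F(2x2, 3x3). Transform costs ignored.
tile_outputs  = 2 * 2                    # outputs produced per Winograd tile
direct_muls   = tile_outputs * 3 * 3     # 36 multiplications for 4 outputs
winograd_muls = 4 * 4                    # 16 elementwise muls on the 4x4 tile

print(direct_muls / winograd_muls)  # 2.25x fewer multiplications per tile
```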
Also, for XNNPACK you need to run the benchmarks with `bazel run -c opt //:convolution_bench`
Yeah, thank you. With `-c opt` it's a lot better, but still slower than NNPACK by some factor:
```
Run on (6 X 4300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 9216 KiB (x1)
Load Average: 3.78, 2.21, 2.19
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                  Time        CPU  Iterations  UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
xnnpack_convolution_f32/some_test/N:1/H:128/W:128/KH:3/KW:3/PH:1/PW:1/S:1/D:1/G:1/GCin:256/GCout:128/real_time  137505341 ns  137498076 ns  5  FLOPS=69.1847G/s Freq=4.15809G
xnnpack_convolution_f32/some_test/N:1/H:256/W:256/KH:3/KW:3/PH:1/PW:1/S:1/D:1/G:1/GCin:192/GCout:96/real_time   312934518 ns  312902879 ns  2  FLOPS=68.9401G/s Freq=4.
```
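As a sanity check, the reported FLOPS counter for the first line of that log can be roughly reproduced by hand, assuming the benchmark counts 2 FLOPs per multiply-accumulate (as described earlier in the thread) and that stride 1 with padding 1 keeps the 128x128 spatial size:

```python
# Back-of-the-envelope check of the first benchmark line's FLOPS counter.
# Assumption: 2 FLOPs per MAC; output stays 128x128 (stride 1, pad 1).
H = W = 128
KH = KW = 3
Cin, Cout = 256, 128

flops = 2 * H * W * KH * KW * Cin * Cout  # ~9.66e9 FLOPs per run
time_s = 137505341e-9                     # real time from the log, 137.5 ms

print(flops / time_s / 1e9)  # ~70.3 G/s, close to the reported 69.1847 G/s
```

The small discrepancy is plausibly measurement overhead or a slightly different internal FLOP count, but the counter is clearly in the right ballpark.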
But I'm confused. I saw it advertised that XNNPACK is a replacement for NNPACK, but it can't be if they implement different algorithms with their own pros and cons. Could you please give some insight into how these libraries should be used?
Thanks for the help, now it makes sense to me!