Comments (14)

leleliu008 commented on June 5, 2024

step1:

sed -i 's|-march=armv6 -mfpu=vfp|-march=armv7-a -mfpu=neon|' CMakeLists.txt

step2:
Add the CMake option: -DXNNPACK_ENABLE_ARM_BF16=OFF

It builds successfully now, but I don't know whether this is the right approach.
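To see what the step 1 substitution does before touching a real checkout, here is a minimal sketch that applies the same sed expression to a throwaway file. The sample line and the helper name retarget_flags are made up for illustration and are not taken from XNNPACK's CMakeLists.txt; sed -i as written assumes GNU sed.

```shell
#!/bin/sh
# Applies the step 1 substitution in place.
# retarget_flags is a hypothetical helper name; `sed -i` without a suffix assumes GNU sed.
retarget_flags() {
  # $1: path to the file to patch
  sed -i 's|-march=armv6 -mfpu=vfp|-march=armv7-a -mfpu=neon|' "$1"
}

# Demo on a fabricated file; this line is a stand-in, not real XNNPACK content.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/CMakeLists.txt"
printf '%s\n' 'SET_PROPERTY(SOURCE foo.c APPEND_STRING PROPERTY COMPILE_FLAGS " -march=armv6 -mfpu=vfp ")' > "$demo_file"
retarget_flags "$demo_file"
cat "$demo_file"
rm -rf "$demo_dir"
```

Because the expression has no g flag, only the first match on each line is rewritten, and lines without an exact match are left untouched.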

from xnnpack.

ngzhian commented on June 5, 2024

Managed to get the repro steps down to:

android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --target=armv7a-linux-androideabi21 -Iinclude -Isrc -O1 -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c

Note that -O1, -O2, and -O3 fail, but -O0 is fine.

Taking a closer look.
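The comparison across optimization levels can be scripted. The sketch below is a dry run: it only prints the repro command for each level, so it can be inspected without an NDK installed. The compiler path and source file are copied from the command above; the variable names and the repro_cmd helper are illustrative.

```shell
#!/bin/sh
# Dry run: prints the repro command once per optimization level.
# NDK_CLANG and SRC are copied from the thread; adjust them to your own checkout.
NDK_CLANG=android-ndk-r25c/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
SRC=src/bf16-gemm/gen/bf16-gemm-6x8c2-minmax-neonbf16-bfdot-lane-ld128.c

repro_cmd() {
  # $1: optimization flag such as -O0 or -O2
  printf '%s\n' "$NDK_CLANG --target=armv7a-linux-androideabi21 -Iinclude -Isrc $1 -march=armv8.2-a+bf16 -mfpu=neon-fp-armv8 $SRC"
}

for opt in -O0 -O1 -O2 -O3; do
  repro_cmd "$opt"
done
```

Running each printed line from the XNNPACK source root would perform the actual compile; going by the report above, only the -O0 variant is expected to get through with NDK r25c.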

ngzhian commented on June 5, 2024

Adding the CMake option -DXNNPACK_ENABLE_ARM_BF16=OFF is the easiest way; with it, the problematic C file is not included in the build.
@georgthegreat if you are blocked, this is probably your quickest way forward.

ngzhian commented on June 5, 2024

https://godbolt.org/z/656PqT6o1
I think it is a Clang bug; 16.0.0 fixes it.

ngzhian commented on June 5, 2024

I found that I can work around the bug by changing:

const uint32x4_t va0c01 = vdupq_lane_u32(vreinterpret_u32_bf16(vget_low_bf16(va0)), 0);

to

const uint32x4_t va0c01 = vdupq_lane_u32(vget_low_u32(vreinterpretq_u32_bf16(va0)), 0);

The only difference is the order of the reinterpret and get_low operations.

ngzhian commented on June 5, 2024

vout0x0123 = vreinterpret_bf16_u16(vext_u16(vreinterpret_u16_bf16(vout0x0123), vreinterpret_u16_bf16(vout0x0123), 2)); is still a problem, though.

I think I can change it to vout0x0123 = vdup_lane_bf16(vout0x0123, 2);

ngzhian commented on June 5, 2024

Hm, looks like we don't test building of these microkernels: https://github.com/google/XNNPACK/blob/master/scripts/build-android-armv7.sh#L59

We might also need to update the NDK version we are testing with: https://github.com/google/XNNPACK/blob/master/.github/workflows/build.yml#LL172C28-L172C28

georgthegreat commented on June 5, 2024

Hm, looks like we don't test building of these microkernels:

There is even a comment about this specific issue:

# BF16 instructions cause ICE in Android NDK compiler

ngzhian commented on June 5, 2024

I think the easiest fix is for you to pass -DXNNPACK_ENABLE_ARM_BF16=OFF to CMake when building XNNPACK, until the NDK ships a newer version of Clang.

georgthegreat commented on June 5, 2024

Yep, this is what I did for now.

georgthegreat commented on June 5, 2024

NDK maintainers say that canary builds with Clang 16 can compile this code.
I think there is nothing to be done on the XNNPACK side, so this can be closed if there is no workaround or nobody wants to search for one.

RobertFlatt commented on June 5, 2024

As I read the posts above, there are two possible workarounds, and either could be implemented in XNNPACK.

Just waiting for some future NDK that ships a fixed Clang is, I think, an insufficient response.

Maratyszcza commented on June 5, 2024

The recommended workaround is to disable ARM BF16 extensions using either the -DXNNPACK_ENABLE_ARM_BF16=OFF option (for CMake) or the --define xnn_enable_arm_bf16=false option (for Bazel).

RobertFlatt commented on June 5, 2024

Using TensorFlow 2.13.0-rc1 with -DXNNPACK_ENABLE_ARM_BF16=OFF:

cmake -DCMAKE_TOOLCHAIN_FILE=../android-ndk-r25c/build/cmake/android.toolchain.cmake -DANDROID_ABI=armeabi-v7a -DTFLITE_ENABLE_XNNPACK=ON -DXNNPACK_ENABLE_ARM_BF16=OFF ../tensorflow_src/tensorflow/lite
cmake --build . -j

This does not work around the issue.

And we should not expect it to, because this is already defined as the default behavior (https://github.com/google/XNNPACK/blob/master/scripts/build-android-armv7.sh#L59), and we know the default fails to build.
It is not clear why it was suggested as a workaround.
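One way to narrow this down is to check what the configure step actually recorded for the option. The sketch below reads a CMakeCache.txt, which stores entries as NAME:TYPE=VALUE, and prints the cached value. The helper name cmake_cache_value and the demo cache contents are hypothetical; a recorded OFF only shows that the option reached the cache, not that the BF16 sources were excluded from the build.

```shell
#!/bin/sh
# Prints the cached value of a CMake option, or "<unset>" when the cache has no entry for it.
# Relies on CMakeCache.txt storing entries as NAME:TYPE=VALUE.
# cmake_cache_value is a hypothetical helper name.
cmake_cache_value() {
  # $1: path to CMakeCache.txt, $2: option name
  value=$(sed -n "s|^$2:[A-Za-z]*=||p" "$1" | head -n 1)
  printf '%s\n' "${value:-<unset>}"
}

# Demo against a fabricated cache file (not output from a real build).
demo_cache=$(mktemp)
printf '%s\n' 'CMAKE_BUILD_TYPE:STRING=Release' 'XNNPACK_ENABLE_ARM_BF16:BOOL=OFF' > "$demo_cache"
cmake_cache_value "$demo_cache" XNNPACK_ENABLE_ARM_BF16
cmake_cache_value "$demo_cache" XNNPACK_ENABLE_ARM_FP16
rm -f "$demo_cache"
```

In a real build directory the call would be cmake_cache_value CMakeCache.txt XNNPACK_ENABLE_ARM_BF16, run after the cmake configure command above.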

I have been trying to get attention on this for 3.5 months (#4348).
Please implement a fix.
It seems reasonable to suggest that the relationship between the code and the build tools is always and everywhere the responsibility of the programmer who owns the code.
