
dnnl_aarch64's Introduction

Deep Neural Network Library for AArch64 (DNNL_aarch64)

  • An open-source performance library for deep learning applications running on ARM(R)v8-A architecture CPUs
  • Optimized for the ARMv8-A architecture with the Scalable Vector Extension (SVE)
  • The key components are Xbyak, Xbyak_aarch64, and Xbyak_Translator
    • Xbyak : A JIT-assembler for x86 and x64 architectures developed by Shigeo MITSUNARI (Cybozu Labs Inc.)
    • Xbyak_aarch64 : A port of Xbyak providing a JIT-assembler for the ARMv8-A architecture
    • Xbyak_Translator : A translator which generates JIT functions for ARMv8 with SVE from JIT functions for x86
  • Developed based on version 0.21.2 of Deep Neural Network Library (DNNL) by Intel(R)

Development status

DNNL_aarch64 generates two types of JIT functions for FP32 operations on ARMv8 processors with SVE, using Xbyak, Xbyak_aarch64, and Xbyak_Translator.

  • One is to generate JIT functions for AArch64 directly using Xbyak_aarch64; this is called the Direct method. The following operations are generated this way:
    • Convolution
    • Reorder
  • The other translates JIT functions for x64 into JIT functions for AArch64 using Xbyak, Xbyak_aarch64, and Xbyak_Translator; this is called the Indirect method. The following operations are generated this way:
    • Batch normalization
    • Eltwise
    • Pooling
    • Concat
    • Softmax
    • Sum
    • RNN operations

Operations other than those listed above, as well as unsupported parameter sets, fall back to reference C++ implementations. These produce correct results but run somewhat slowly.

Bfloat16 support: DNNL_aarch64 does not currently support bfloat16.

Validated Configurations

CPU: Fujitsu FX1000 / FX700
OS: Red Hat Enterprise Linux 8.1 / CentOS 8.1
Compiler: Fujitsu compiler / GCC 8.3.1 20190507

Requirements

Currently, DNNL_aarch64 is intended to run on ARMv8-A CPUs with SVE. If you run DNNL_aarch64 on a CPU without SVE, it aborts with an undefined-instruction exception. A sketch of a runtime SVE check is shown below.
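
To fail cleanly instead of crashing, an application can check for SVE support at startup before calling into DNNL_aarch64. The following is a minimal sketch, assuming an AArch64 Linux/glibc environment where getauxval() and HWCAP_SVE are available; it is not part of the DNNL_aarch64 API, and the file name is illustrative.

// sve_check.cpp -- hypothetical pre-flight check, not part of DNNL_aarch64
#include <cstdio>
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_SVE (AArch64 Linux only)

int main() {
    // The kernel reports SVE support through the HWCAP auxiliary vector entry.
    const bool has_sve = (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
    if (!has_sve) {
        std::fprintf(stderr, "SVE not available: DNNL_aarch64 JIT kernels would abort.\n");
        return 1;
    }
    std::printf("SVE available.\n");
    return 0;
}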

Installation

  1. Download DNNL_aarch64 from the repository.
git clone https://github.com/fujitsu/dnnl_aarch64.git
  2. Update the submodules
cd dnnl_aarch64/
git submodule update --init --recursive
  3. Build the XED library
mkdir third_party/build_xed_aarch64
pushd third_party/build_xed_aarch64/
../xbyak_translator_aarch64/translator/third_party/xed/mfile.py --shared examples install
cd kits/
ln -sf xed-install-base-* xed
popd
  4. Build DNNL_aarch64
mkdir build_aarch64
cd build_aarch64/
cmake ..
make -j40
  • Using BLAS (Optional)

    1. Add the path to the BLAS library on your system to LD_LIBRARY_PATH.
    2. Add one of the following options to the cmake command:
    BLAS      Option
    SSL2      -DWITH_BLAS=ssl2 (only with the Fujitsu compiler)
    OpenBLAS  -DWITH_BLAS=openblas
  5. Test DNNL_aarch64 (optional). A minimal standalone smoke test is also sketched after the commands below.
cd tests/gtests
MKLDNN_VERBOSE=1 MKLDNN_JIT_DUMP=1 ./test_reorder
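
Beyond the gtests, a short standalone program can serve as a smoke test. The sketch below creates and executes a reorder primitive (one of the Direct-method operations) through the mkl-dnn v0.x C++ API that DNNL_aarch64 is based on; the file name, tensor sizes, and build line are illustrative assumptions rather than part of the project. Run it with MKLDNN_VERBOSE=1 to see which implementation (JIT or reference) is selected.

// smoke_reorder.cpp -- hypothetical smoke test, not shipped with DNNL_aarch64
// Build line (assumption): g++ -O2 smoke_reorder.cpp -lmkldnn -o smoke_reorder
#include <iostream>
#include <mkldnn.hpp>

int main() {
    using namespace mkldnn;
    engine eng(engine::kind::cpu, 0);

    // Small FP32 tensor; reorder the plain nchw layout into the blocked nChw8c layout.
    memory::dims dims = {2, 16, 4, 4};
    auto src_md = memory::desc(dims, memory::data_type::f32, memory::format::nchw);
    auto dst_md = memory::desc(dims, memory::data_type::f32, memory::format::nChw8c);
    auto src = memory(memory::primitive_desc(src_md, eng));
    auto dst = memory(memory::primitive_desc(dst_md, eng));

    // Fill the source buffer with a simple pattern.
    float *src_data = static_cast<float *>(src.get_data_handle());
    for (int i = 0; i < 2 * 16 * 4 * 4; ++i)
        src_data[i] = static_cast<float>(i);

    // Create the reorder primitive and execute it; with MKLDNN_VERBOSE=1 the
    // selected implementation is printed at execution time.
    auto r = reorder(src, dst);
    stream(stream::kind::eager).submit({r}).wait();

    std::cout << "reorder executed" << std::endl;
    return 0;
}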

License

Copyright FUJITSU LIMITED 2019-2020

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Notice

  • Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
  • Intel is a registered trademark of Intel Corporation (or its subsidiaries) in the US and/or elsewhere.

History

Date Version Remarks
December 11, 2019 0.9.0_base_0.19 First public release version.
May 31, 2020 1.0.0_base_0.21.2 Update

Copyright

Copyright FUJITSU LIMITED 2019-2020

dnnl_aarch64's Issues

Build with recent versions of Python3

With recent versions of Python, for example 3.9.0, an error seems to occur while building the XED incorporated into dnnl_aarch64.
Fortunately, the latest version of the original XED can be built with Python 3.9.0 in an AArch64 environment, so it can be used instead.
In this case, the build instructions are as follows.

git clone https://github.com/fujitsu/dnnl_aarch64.git
cd dnnl_aarch64/
git submodule update --init  --recursive
cd third_party/xbyak_translator_aarch64/translator/third_party/
rm -rf mbuild xed
git clone https://github.com/intelxed/xed.git
git clone https://github.com/intelxed/mbuild.git
cd ../../../
mkdir build_xed_aarch64
cd build_xed_aarch64
../xbyak_translator_aarch64/translator/third_party/xed/mfile.py --shared examples install
cd kits/
ln -sf xed-install-base-* xed
cd ../../../
mkdir build_aarch64
cd build_aarch64/
cmake ..
make -j40

Some warning messages, such as

cc1plus: warning: /home/kawakami/tmp/dnnl_aarch64/build_aarch64/src/CMakeFiles/mkldnn.dir/cmake_pch.hxx.gch: had text segment at different address
cc1plus: warning: unrecognized command line option '-Wno-unused-private-field'

, are output, but they should do no real harm.

build problem

Here's the place for your question, suggestion, a feature request or brief
description of the problem. If you are submitting a defect report please fill
all the sections below. For everything else feel free to remove everything
below the line.


Environment

Intel MKL-DNN includes hardware-specific optimizations and may behave
differently depending on the compiler and build environment. Include
the following information to help reproduce the issue:

  • CPU make and model (try lscpu; if your lscpu does not list CPU flags,
    try running cat /proc/cpuinfo | grep flags | sort -u)
  • OS version (uname -a)
  • Compiler version (gcc --version)
  • MKLROOT value (echo MKLROOT=$MKLROOT)
  • CMake version (cmake --version)
  • CMake output log
  • git hash (git log -1 --format=%H)

On a QEMU Docker Ubuntu environment.

Steps to reproduce

Please check that the issue is reproducible with the latest revision on
master. Include all the steps to reproduce the issue. A short C/C++ program
or modified unit tests demonstrating the issue will greatly help
with the investigation.

Actual behavior

Describe the behavior you see.

Expected behavior

Describe the behavior you expect.

I just followed the instructions and found this problem:

/home/dev/files/binutils-2.38/bin/ld: ../src/libmkldnn.so.0.21: undefined reference to `mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64, (mkldnn_data_type_t)1, (mkldnn_data_type_t)1>::gemm(char, char, int, int, int, float, float const*, int, float const*, int, float, float*, int) const'

JIT assembler function missing for aarch64

Hi, *

MKL-DNN uses jit_* functions to produce assembler code, but it seems this repo does not produce Arm assembler based on Xbyak_aarch64. If I am wrong, which interface can do it?

How long does it take to execute examples/simple-net-cpp on Qemu 4.2?

I tried to execute examples/simple-net-cpp, but it did not finish overnight (8 h).

How long does it take to execute examples?
My server CPU is as follows (16 cores):

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
stepping        : 12
microcode       : 0xae
cpu MHz         : 4699.968
cache size      : 16384 KB
physical id     : 0
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp arch_capabilities
bugs            : spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 7199.74
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

P.S. Executing examples/simple-net-c finished within an hour.

Arm support in oneDNN

Just wanted to let you know that initial support for the Arm 64-bit Architecture (AArch64) landed in oneDNN master, along with changes that will make it easier to maintain optimizations for non-x86 architectures. See details here.

test_sum does not run fully on Qemu 4.2

I tried to run test_sum on Qemu 4.2, but it failed on some tests.
With the following lines commented out, it works.
Are these functions currently supported?

# git diff
diff --git a/mkl-dnn/tests/gtests/test_sum.cpp b/mkl-dnn/tests/gtests/test_sum.cpp
index 77c1cb8..92636f2 100644
--- a/mkl-dnn/tests/gtests/test_sum.cpp
+++ b/mkl-dnn/tests/gtests/test_sum.cpp
@@ -262,14 +262,14 @@ TEST_P(sum_cc_f32, TestSumCornerCases) {}
 #undef CASE_CC

   INST_TEST_CASE(sum_test_float_omit_output, 1)
-  INST_TEST_CASE(sum_test_u8_omit_output, 1)
-  INST_TEST_CASE(sum_test_s8_omit_output, 1)
-  INST_TEST_CASE(sum_test_s32_omit_output, 1)
+//  INST_TEST_CASE(sum_test_u8_omit_output, 1)
+//  INST_TEST_CASE(sum_test_s8_omit_output, 1)
+//  INST_TEST_CASE(sum_test_s32_omit_output, 1)

   INST_TEST_CASE(sum_test_float, 0)
-  INST_TEST_CASE(sum_test_u8, 0)
-  INST_TEST_CASE(sum_test_s8, 0)
-  INST_TEST_CASE(sum_test_s32, 0)
+//  INST_TEST_CASE(sum_test_u8, 0)
+//  INST_TEST_CASE(sum_test_s8, 0)
+//  INST_TEST_CASE(sum_test_s32, 0)

 #undef INST_TEST_CASE
 }

Questions about __ARM_ARCH

I have two questions below:

First, when I used extended_sgemm, I found that it goes into the __ARM_ARCH branch by default, but I cannot find the place where it is defined. Could you help me solve this problem?

Second, I tried to use jit_avx512_common_gemm_f32 but failed because of undefined references in libmkldnn. Should I adjust other parameters to run it?

  • OS version: aarch64 GNU/Linux
  • Compiler version gcc (Ubuntu/Linaro 5.4.0-6kord1~16.04.12) 5.4.0 20160609
  • MKLROOT value (echo MKLROOT=$MKLROOT)

#ifdef __ARM_ARCH
    // return ref_gemm(transa, transb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    // else // #ifdef __ARM_ARCH
    if (mayiuse(avx512_mic)) {
        printf("enter 1\n");
        return jit_avx512_common_gemm_f32(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    } else if (mayiuse(avx)) {
        printf("enter 2\n");
        float *dummy_ao = NULL;
        float *dummy_bo = NULL;

        return gemm_driver(transa, transb, bias ? "C" : NULL, M, N, K, alpha,
                A, lda, dummy_ao, B, ldb, dummy_bo, beta, C, ldc, bias,
                force_jit_nocopy_gemm);
    } else {
        printf("enter 3\n");
        return ref_gemm<float>(transa, transb,
                M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, bias);
    }
#endif // #ifdef __ARM_ARCH

DNNL_aarch64 aborts, when TF+DNNL with a simple imagenet/resnet50 benchmark

Using TF+DNNL with a simple imagenet/resnet50 benchmark fails; see the following error output:

Train on 32 samples
bad err=17 in Xbyak::Error
terminate called after throwing an instance of 'Xbyak::Error'
  what():  illegal immediate parameter (condition error)
-----------------------------------------
index:   mnemonic location of define
   14:        eor (external/mkl_dnn/src/cpu/xbyak_aarch64/xbyak_aarch64/xbyak_aarch64_mnemonic.h:4726)
   15:      ptrue (external/mkl_dnn/src/cpu/xbyak_aarch64/xbyak_aarch64/xbyak_aarch64_mnemonic.h:15821)
   16:        add (external/mkl_dnn/src/cpu/xbyak_aarch64/xbyak_aarch64/xbyak_aarch64_mnemonic.h:40)
   17:       ld4w (external/mkl_dnn/src/cpu/xbyak_aarch64/xbyak_aarch64/xbyak_aarch64_mnemonic.h:18759)
   18:       ld3w (external/mkl_dnn/src/cpu/xbyak_aarch64/xbyak_aarch64/xbyak_aarch64_mnemonic.h:18751) <---- Error[1]    17441 abort (core dumped)  python3.6 -m benchmarker --mode=training --framework=tensorflow

test_concat does not run fully on Qemu 4.2

test_concat does not run fully on Qemu 4.2.
The error message is as follows:

test_concat: malloc.c:4023: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted (core dumped)

To avoid this problem, the following change should be applied:

diff --git a/mkl-dnn/tests/gtests/test_concat.cpp b/mkl-dnn/tests/gtests/test_concat.cpp
index b479779..66ff650 100644
--- a/mkl-dnn/tests/gtests/test_concat.cpp
+++ b/mkl-dnn/tests/gtests/test_concat.cpp
@@ -275,9 +275,9 @@ INSTANTIATE_TEST_CASE_P(TestConcat, concat_test_float, ::testing::Values(
     concat_test_params{engine::kind::cpu, 0,
     {memory::format::nChw8c, memory::format::nChw8c}, memory::format::nChw8c,
     {{2, 16, 1, 1}, {2, 16, 1, 1}}, {4, 16, 1, 1}},
-    concat_test_params{engine::kind::cpu, 0,
-    {memory::format::nchw, memory::format::nchw}, memory::format::nChw8c,
-    {{2, 16, 1, 1}, {2, 16, 1, 1}}, {4, 16, 1, 1}},
+    // concat_test_params{engine::kind::cpu, 0,
+    // {memory::format::nchw, memory::format::nchw}, memory::format::nChw8c,
+    // {{2, 16, 1, 1}, {2, 16, 1, 1}}, {4, 16, 1, 1}},
     concat_test_params{engine::kind::cpu, 0,
     {memory::format::nChw8c, memory::format::nChw8c}, memory::format::nchw,
     {{2, 16, 1, 1}, {2, 16, 1, 1}}, {4, 16, 1, 1}},

ResNet50 performance evaluation

Hello! Excuse me.
DNNL is excellent work.
I want to know the performance of training ResNet-50 on the A64FX processor with DNNL.
Have you done this performance evaluation?
