intel / cldnn

Compute Library for Deep Neural Networks (clDNN)

Home Page: https://01.org/cldnn


cldnn's Introduction

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: [email protected] 

Compute Library for Deep Neural Networks (clDNN)


Discontinued repository

This project is now an integral part of the Intel® Distribution of OpenVINO™ Toolkit. Its content and development have been moved to the DLDT repo.

To get the latest clDNN sources, please refer to the DLDT repo.


Apache License Version 2.0 | v1.0

Compute Library for Deep Neural Networks (clDNN) is an open source performance library for Deep Learning (DL) applications intended for acceleration of DL Inference on Intel® Processor Graphics – including HD Graphics and Iris® Graphics.
clDNN includes highly optimized building blocks for implementation of convolutional neural networks (CNN) with C and C++ interfaces. We created this project to enable the DL community to innovate on Intel® processors.

Usages supported: Image recognition, image detection, and image segmentation.

Validated Topologies: AlexNet*, VGG(16,19)*, GoogleNet(v1,v2,v3)*, ResNet(50,101,152)*, Faster R-CNN*, SqueezeNet*, SSD_googlenet*, SSD_VGG*, PVANET*, PVANET_REID*, age_gender*, FCN*, and YOLO*.

As with any technical preview, APIs may change in future updates.
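A typical use of the C++ interface builds a topology of primitives, compiles it into a network on an engine, binds input data, and executes. The sketch below illustrates that flow; it is not buildable without the clDNN headers and a supported Intel® GPU, and the exact primitive constructors (notably activation) varied between drops, so treat the signatures shown here as assumptions rather than a definitive reference.

```cpp
// Illustrative sketch only: requires clDNN headers and a supported Intel GPU.
#include <api/CPP/engine.hpp>
#include <api/CPP/memory.hpp>
#include <api/CPP/topology.hpp>
#include <api/CPP/input_layout.hpp>
#include <api/CPP/activation.hpp>
#include <api/CPP/network.hpp>

int main() {
    using namespace cldnn;

    engine engine;                               // OpenCL engine on the default device
    layout in_layout(data_types::f32, format::bfyx, spatial(4, 4));
    memory input = memory::allocate(engine, in_layout);

    topology topology(
        input_layout("input", in_layout),        // placeholder filled at run time
        activation("relu", "input", activation_relu));

    network network(engine, topology);           // graph compilation happens here
    network.set_input_data("input", input);
    auto outputs = network.execute();            // map: primitive id -> output
    auto result = outputs.at("relu").get_memory();
    (void)result;
    return 0;
}
```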

License

clDNN is licensed under Apache License Version 2.0.

Attached licenses

clDNN uses 3rd-party components licensed under the following licenses:

Documentation

The latest clDNN documentation is at GitHub pages.

There is also inline documentation available that can be generated with Doxygen.

See also the whitepaper: Accelerate Deep Learning Inference with Intel® Processor Graphics.

Intel® OpenVINO™ Toolkit and clDNN

clDNN is also released as part of the Intel® OpenVINO™ Toolkit, which contains:

  • Model Optimizer: a Python*-based command-line tool that imports trained models from popular deep learning frameworks such as Caffe*, TensorFlow*, and Apache MXNet*.
  • Inference Engine: an execution engine that uses a common API to deliver inference solutions on the platform of your choice (for example, a GPU with the clDNN library).

You can find more information here.

Changelog

Drop 14.1

New features:
- network serialization
- 3D support for: Activation, Reorder, Eltwise, Reshape, Deconvolution
Bug fixes:
- concatenation fix for different input formats
UX:
- added 2019.4 intel ocl icd
- refactored bfyx_f16 format
- added i32 and i64 support for select primitive

Drop 14.0

New features:
- 3 spatial dimensions support in convolution primitive (3D convolution)
- reverse primitive
- arg_max_min support for i8/s8/i32/i64 types
- concatenation support for bfzyx (5D) format
Bug fixes:
- fixes in primitive fusing pass (for i8/s8 types)
- fixes in graph optimizer (reshape primitive)
- overflow/underflow fixes for eltwise (i8/s8)
- fixes for convolution-eltwise primitive
- fixes for convolution primitive (depth-wise case)
- perf fixes for events pool
- fixes for pooling primitive (u8)
- fixes for deconvolution primitive
- fixes for fc primitive
- fixes for batch_norm primitive
UX:
- refactored and cleaned up JIT constants generation mechanism
- refactored kernel selection mechanism
- removed legacy device info mechanism
Performance:
- convolution primitive optimizations (for byxf, for MMAD-based, for byxf fp16, for bfyx fp16)
- fc primitive optimizations (for byxf)
- pooling primitive optimizations (for byxf, bfyx)
- convolution-relu primitive fusing (i8 -> s8 case)
- eltwise primitive optimizations (for byxf)
- fused convolution-eltwise primitive optimizations (IMAD-based)
- block-based optimizations for fp16 primitives

Drop 13.1

New features:
- added max mode for contract primitive
- added one_hot primitive
- optional explicit output data type support for all primitives
Bug fixes:
- fix for graph optimizer (crop primitive)
- fix for processing order (deconvolution primitive)
- fix for convolution-eltwise primitive
UX:
- cache.json is searched for in the library directory
Performance:
- optimizations for lstm_gemm primitive

Drop 13.0

New features:
- events pool
- group support in convolution and deconvolution primitives
- broadcastable inputs support for eltwise primitive
- asymmetric padding for convolution primitive
- fused convolution-eltwise primitive (API extension)
- auto-calculated output shape support for reshape primitive
- crop support for i8/s8/i32/i64 types
- broadcast axis support for broadcast primitive
- logic and comparison operations support for eltwise primitive
Bug fixes:
- added required alignment checks for some fc implementations
- added lstm support for f16 (half) type
- reorders for fc moved to graph compiler
- primitive fusing and reorder fixes
UX:
- added internal core tests project
- refactored optimizations pass manager and passes
Performance:
- optimized concatenation during upsampling (unpool)
- IMAD-based optimizations for convolution, fc, eltwise and pooling primitives (i8/s8)
- convolution-eltwise fusing optimizations
- partial writes optimizations for block-based kernels

Drop 12.1

- gtests code refactor
- buildbreak fix

Drop 12.0

New features:
- pyramidRoiAlign primitive
- multiple axes support for reverse mode in index_select
- eltwise min/max/mod support for i8/i32/i64
- broadcast support for i32/i64
Bug fixes:
- memory leak fixes
- in-place reshape
- no padding for output primitives
UX:
- RapidJSON library for auto-tune cache
- less dependencies in program.cpp
- do not throw error, when device not validated
- global pooling in c API
- optimized padding for convolution

Drop 11.0

New features:
- throttle hints
- extended border and tile
- GPU implementation of Detection Output
- More cases for BatchNorm primitive
Bug fixes:
- GEMM fix (align with ONNX)
- memory leak fix in memory pool
- increase FC precision for fp16 (fp32 accumulation)
Performance:
- cache for new topologies and devices
- conv1x1 with stride >1 into eltwise optimization 

Drop 10.0

New features:
- condition primitive
- fused convolution with bn and scale (backprop)
- scale/shift and mean/var as an output in batch norm
- add LSTM output selection
Bug fixes:
- memory pool fixes 
UX:
- downgrade to cxx11
- add support for u8 data type in custom primitive 
- library size optimizations
Performance:
- in place concatenation optimization 
- conv1x1 with stride >1 into eltwise optimization 

Drop 9.2

New features:
- local convolution
- eltwise with stride

Drop 9.1

New features:
- select index primitive
- gemm primitive
Bug fixes:
- fix for output format in fully connected primitive

Drop 9.0

New features:
- log2 activation function
- support for i32 and i64 types
- select primitive
- border primitive
- tile primitive
Bug fixes:
- dilation > input size fix

Drop 8.0

New features:
- lstm primitive
- average unpooling primitive
- serialization - dump weights, biases and kernels
- scale grad for input and weights primitive
Bug fixes:
- wrong gws in concatenation
- int8 layers
- convolution depthwise bias concatenation
- params in engine_info
- mutable_data filler
- momentum calculation
UX:
- kernel selector renaming
- bfyx_yxfb batched reorder
- code cleanups
- primitives allocation order

Drop 7.0

New features:
- support for img_info=4 in proposal_gpu
- support images format in winograd
- support for 2 or more inputs in eltwise
- priority and throttle hints
- deconvolution_grad_input primitive
- fc_grad_input and fc_grad_weights primitives
Bug fixes:
- tensor fixes (i.e. less operator fix)
- cascade concat fixes
- winograd fixes for bfyx format
- auto-tuning fixes for weights calculation
UX:
- memory pool (reusing memory buffers)
- added chosen kernel name in graph dump
- flush memory functionality
Performance:
- graph optimizations
- depth-concatenation with fused relu optimization
- winograd optimizations
- deconvolution optimizations (i.e. bfyx opt)

Drop 6.0

New features:
- fused winograd
- image support for weights
- yolo_region primitive support
- yolo_reorg primitive support
Bug fixes:
- winograd bias fix
- mean subtract fix
UX:
- update boost to 1.64.0
- extend graph dumps
Performance:
- update offline caches for newer drivers
- conv1x1 byxf optimization
- conv1x1 with images
- cascade depth concatenation fuse optimization

Drop 5.0

New features:
- split primitive
- upsampling primitive
- add preliminary Coffee Lake support
- uint8 weights support
- versioning
- offline autotuner cache
- Winograd phase 1 - not used yet
Bug fixes:
- in-place crop optimization bug fix
- output spatial padding in yxfb kernels fix
- local work sizes fix in softmax
- underflow fix in batch normalization
- average pooling corner case fix
UX:
- graph logger, dumps graphwiz format files
- extended documentation with API diagram and graph compilation steps
Performance:
- softmax optimization
- lrn within channel optimization
- priorbox optimization
- constant propagation

Drop 4.0

New features:
- OOOQ execution model implementation
- depthwise separable convolution implementation
- kernel auto-tuner implementation
Bug fixes:
- dump hidden layer fix
- run single layer fix
- reshape fix
UX:
- enable RTTI
- better error handling/reporting
Performance:
- lrn optimization
- dynamic pruning for sparse fc layers
- reorder optimization
- concatenation optimization
- eltwise optimization
- activation fusing 

Drop 3.0

Added:
- kernel selector
- custom layer
Changed:
- performance improvements
- bug fixes (deconvolution, softmax, reshape)
- apply fixes from community reported issues

Drop 2.0

Added:
- step by step tutorial
Changed:
- performance optimization for: softmax, fully connected, eltwise, reshape
- bug fixes (conformance)

Drop 1.0

- initial drop of clDNN

Support

Please report issues and suggestions via GitHub issues.

How to Contribute

We welcome community contributions to clDNN. If you have an idea how to improve the library:

  • Share your proposal via GitHub issues
  • Ensure you can build the product and run all the examples with your patch
  • In the case of a larger feature, create a test
  • Submit a pull request

We will review your contribution and, if any additional fixes or modifications are necessary, may provide feedback to guide you. When accepted, your pull request will be merged into our internal and GitHub repositories.

System Requirements

clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for

  • Codename Skylake:
    • Intel® HD Graphics 510 (GT1, client market)
    • Intel® HD Graphics 515 (GT2, client market)
    • Intel® HD Graphics 520 (GT2, client market)
    • Intel® HD Graphics 530 (GT2, client market)
    • Intel® Iris® Graphics 540 (GT3e, client market)
    • Intel® Iris® Graphics 550 (GT3e, client market)
    • Intel® Iris® Pro Graphics 580 (GT4e, client market)
    • Intel® HD Graphics P530 (GT2, server market)
    • Intel® Iris® Pro Graphics P555 (GT3e, server market)
    • Intel® Iris® Pro Graphics P580 (GT4e, server market)
  • Codename Apollolake:
    • Intel® HD Graphics 500
    • Intel® HD Graphics 505
  • Codename Kabylake:
    • Intel® HD Graphics 610 (GT1, client market)
    • Intel® HD Graphics 615 (GT2, client market)
    • Intel® HD Graphics 620 (GT2, client market)
    • Intel® HD Graphics 630 (GT2, client market)
    • Intel® Iris® Graphics 640 (GT3e, client market)
    • Intel® Iris® Graphics 650 (GT3e, client market)
    • Intel® HD Graphics P630 (GT2, server market)
    • Intel® Iris® Pro Graphics 630 (GT2, server market)

clDNN currently uses OpenCL™ with multiple Intel® OpenCL™ extensions and requires Intel® Graphics Driver to run.

clDNN requires CPU with Intel® SSE/Intel® AVX support.


The software dependencies are:

  • CMake* 3.9 or later
  • C++ compiler with partial or full C++11 standard support compatible with:
    • GNU* Compiler Collection 4.8.2
    • clang 3.5 or later
    • Intel® C++ Compiler 17.0 or later
    • Visual C++ 2015 (MSVC++ 19.0) or later

Intel® CPU intrinsics header (<immintrin.h>) must be available during compilation.

  • Python™ 2.7 or later (scripts are compatible with both Python™ 2.7.x and Python™ 3.x)
  • (optional) Doxygen* 1.8.13 or later
    Needed for manual generation of documentation from inline comments, or for running the docs custom target, which generates it automatically.

GraphViz* (2.38 or later) is also recommended for generating documentation with all embedded diagrams.
(Make sure that the dot application is visible in the PATH environment variable.)


We recommend using the latest driver for Linux and the 24.20 driver for Windows.

Installation

Building

Download clDNN source code or clone the repository to your system:

    git clone https://github.com/intel/cldnn.git

Satisfy all software dependencies and ensure that the versions are correct before building.

clDNN uses multiple 3rd-party components. They are stored in binary form in the common subdirectory, are currently prepared for MSVC++ and GCC*, and will be cloned with the repository.


clDNN uses a CMake-based build system. You can use the CMake command-line tool or the CMake GUI (cmake-gui) to generate the required solution.
On Windows, you can run in cmd (or PowerShell):

    @REM Generate 32-bit solution (solution contains multiple build configurations)...
    cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015" ..
    @REM Generate 64-bit solution (solution contains multiple build configurations)...
    cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015 Win64" ..

The created solution can be opened in Visual Studio 2015 or built using the appropriate msbuild tool (you can also use cmake --build . to select the build tool automatically).

For Unix and Linux systems:

    # Create GNU makefiles for a release build of clDNN and build it...
    cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
    # Create Ninja build files for a debug build of clDNN and build it...
    cmake -E make_directory build && cd build && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug .. && ninja -k 20

You can also call the scripts in the main directory of the project, which will create solutions/makefiles for clDNN (they generate them in the build subdirectory; binary outputs are written to the build/out subdirectory):

  • create_msvc_mscc.bat (Windows*, Visual Studio* 2015)
  • create_unixmake_gcc.sh [Y|N] [<devtoolset-version>] (Linux*, GNU* or Ninja* makefiles, optional devtoolset support)
    • If you specify the first parameter as Y, the Ninja makefiles will be generated.
    • If you specify the second parameter (a number), CMake will be called via scl with the selected devtoolset version.

CMake solution offers multiple options which you can specify using normal CMake syntax (-D<option-name>=<value>):

CMake options:

  • CMAKE_BUILD_TYPE (STRING): Build configuration that will be used by generated makefiles (it does not affect multi-configuration generators such as Visual Studio solution generators). Currently supported: Debug (default), Release.
  • CMAKE_INSTALL_PREFIX (PATH): Install directory prefix.
  • CLDNN__ARCHITECTURE_TARGET (STRING): Architecture of the target system (where the binary output will be deployed). CMake will try to detect it automatically (based on the selected generator type, host OS, and compiler properties). Specify this option only if CMake has a problem with detection. Currently supported: Windows32, Windows64, Linux64.
  • CLDNN__OUTPUT_DIR, CLDNN__OUTPUT_BIN_DIR, CLDNN__OUTPUT_LIB_DIR (PATH): Location where built artifacts will be written. It is set automatically to roughly the build/out/<arch-target>/<build-type> subdirectory. For more control, use CLDNN__OUTPUT_LIB_DIR (output path for static libraries) or CLDNN__OUTPUT_BIN_DIR (for shared libraries and executables).

Advanced CMake options:

  • PYTHON_EXECUTABLE (FILEPATH): Path to the Python interpreter. CMake will try to detect Python; specify this option only if CMake has a problem locating it.
  • CLDNN__IOCL_ICD_USE_EXTERNAL (BOOL): Enable use of an external Intel® OpenCL™ SDK as the source for ICD binaries and headers (based on the INTELOCLSDKROOT environment variable). Default: OFF.
  • CLDNN__IOCL_ICD_VERSION (STRING): Version of the Intel® OpenCL™ ICD binaries and headers to use (from the common subdirectory). It is automatically selected by CMake (highest version); specify it if you have multiple versions and want one other than the automatically selected version.
  • CLDNN__COMPILE_LINK_ALLOW_UNSAFE_SIZE_OPT (BOOL): Allow unsafe optimizations during linking (such as aggressive dead code elimination). Default: ON.
  • CLDNN__COMPILE_LINK_USE_STATIC_RUNTIME (BOOL): Link with the static C++ runtime. Default: OFF (the shared C++ runtime is used).
  • CLDNN__INCLUDE_CORE (BOOL): Include the core clDNN library project in generated makefiles/solutions. Default: ON.
  • CLDNN__INCLUDE_TESTS (BOOL): Include the tests application project (based on the googletest framework) in generated makefiles/solutions. Default: ON.
  • CLDNN__RUN_TESTS (BOOL): Run tests after building the tests project. Requires CLDNN__INCLUDE_TESTS to be ON. Default: OFF.
  • CLDNN__CMAKE_DEBUG (BOOL): Enable extended debug messages in CMake. Default: OFF.
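For example, a Linux release configuration combining several of these options might look like the following (the chosen values are illustrative, not required):

```shell
cmake -E make_directory build && cd build && \
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCLDNN__ARCHITECTURE_TARGET=Linux64 \
      -DCLDNN__INCLUDE_TESTS=OFF \
      .. && make
```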

clDNN includes unit tests implemented using the googletest framework. To validate your build, run the tests target, e.g.:

    make tests

(Make sure that both CLDNN__INCLUDE_TESTS and CLDNN__RUN_TESTS were set to ON when invoking CMake.)

Generating documentation

Documentation is provided inline and can be generated in HTML format with Doxygen. We recommend using the latest Doxygen* and GraphViz*.

Documentation templates and configuration files are stored in the docs subdirectory. You can simply call:

    cd docs && doxygen

to generate HTML documentation in docs/html subdirectory.

There is also a custom CMake target named docs that generates documentation in the CLDNN__OUTPUT_BIN_DIR/html directory. For example, when using Unix makefiles, you can run:

    make docs

in order to create it.

Deployment

The special install target places the API header files and libraries in /usr/local (C:/Program Files/clDNN or C:/Program Files (x86)/clDNN on Windows). To change the installation path, use the option -DCMAKE_INSTALL_PREFIX=<prefix> when invoking CMake.
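For instance, to install under a custom prefix instead of the default (the path below is illustrative):

```shell
cmake -DCMAKE_INSTALL_PREFIX=/opt/cldnn .. && cmake --build . --target install
```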


* Other names and brands may be claimed as the property of others.

Copyright © 2017, Intel® Corporation

cldnn's People

Contributors

abialas1, ddominia, mdorozyn, mwalkowi, ph0b, rdower, shssf, smarcink, tponieck


cldnn's Issues

ONNXIFI wrapper

Hi, is there any in-progress work or plan for ONNXIFI support?

FPGA compatibility

Is it possible to compile distributed opencl kernels into single bit stream file on FPGA? Any plans for extending this repo for intel FPGAs?

Fail to build with MINGW64 compiler within MSYS2 environment

cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .. && make

DL@2030006696-SOH MINGW64 ~/cldnn/build
$ cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .. && make
-- The C compiler identification is GNU 8.2.1
-- The CXX compiler identification is GNU 8.2.1
-- Check for working C compiler: C:/msys64/mingw64/bin/gcc.exe
-- Check for working C compiler: C:/msys64/mingw64/bin/gcc.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: C:/msys64/mingw64/bin/g++.exe
-- Check for working CXX compiler: C:/msys64/mingw64/bin/g++.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: C:/msys64/usr/bin/python2.7.exe (found suitable version "2.7.15", minimum required is "2.7")
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.4.22.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "C:/msys64/home/DL/cldnn/build/out/Windows32/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "C:/msys64/home/DL/cldnn/build/out/Windows32/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Windows32 (detected: Windows32)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build core internal tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN] - Run core internal tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/include
-- [clDNN] + Static libs: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/lib/x86
-- [clDNN] + Shared libs: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/bin/x86
-- [clDNN] + Libs to link: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/lib/x86
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: C:/msys64/home/DL/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: C:/msys64/home/DL/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: C:/msys64/home/DL/cldnn/common/boost/1.64.0/windows/x86/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- [clDNN] Selected capabilities: public
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- [clDNN] Selected capabilities: public
-- Configuring done
-- Generating done
-- Build files have been written to: C:/msys64/home/DL/cldnn/build
[ 0%] Generating ks_primitive_db.inc ...
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_opt.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_ref.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_tutorial.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/arg_max_min_axis.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/arg_max_min_gpu_ref.cl

..
..
..
..
..

[ 35%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_fb.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_items_class_optimized.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_ref.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_selector.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_base.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_ref.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_selector.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/tile/tile_kernel_ref.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/tile/tile_kernel_selector.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_base.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_ref.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_selector.cpp.obj
[ 37%] Linking CXX static library ../out/Windows32/Release/libcldnn_kernel_selector32.a
Error copying file (if different) from "C:/msys64/home/DL/cldnn/kernel_selector/core/cache/cache.json" to "C:/msys64/home/DL/cldnn/build/out/Windows32/Release/Release/".
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/build.make:3965: out/Windows32/Release/libcldnn_kernel_selector32.a] Error 1
make[2]: *** Deleting file 'out/Windows32/Release/libcldnn_kernel_selector32.a'
make[1]: *** [CMakeFiles/Makefile2:313: kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

clDNNPlugin R5 cannot be built with clDNN Drop 12.1 without graphics driver installed

I'm trying to build OpenVINO 2018 R5 with Drop 12.1 (since Intel's distribution contains an earlier version of clDNN, like something before Drop 11, which features a horrendous memory leak).
Due to the absence of Intel Graphics on my CPU the graphics driver refuses to install, which results in a clDNNPlugin linking error against OpenCL.lib.

I've traced the issue to clDNN's build script:

clDNN/src/CMakeLists.txt

Lines 234 to 239 in f91d7d8

target_link_libraries("${CLDNN_BUILD__PROJ}"
OpenCL
Boost::filesystem
Boost::system
cldnn_kernel_selector
)

while it sets OpenCL.lib as a public link library, it does not propagate corresponding link directory to consumers in the same way (I'm not sure what happens to link_directories from the root script, though it is not respected by OpenVINO's scripts, and I don't think it's a good practice to propagate target's dependencies via include_directories, link_directories and similar global commands).

I'm not sure if it's the best place for a fix (moreover I think it would be better to create an import target for OpenCL), though it definitely resolves the link issue:

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 6313d50..10c5b88 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -237,6 +237,9 @@ target_link_libraries("${CLDNN_BUILD__PROJ}"
     Boost::system
     cldnn_kernel_selector
   )
+target_link_directories("${CLDNN_BUILD__PROJ}"
+    INTERFACE ${CLDNN__IOCL_ICD_LIBDIRS}
+  )

 if(WIN32)
   target_link_libraries("${CLDNN_BUILD__PROJ}" setupapi)

System Requirements

It seems like the "System Requirements" on the main github page is a bit misleading. It says that clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for Skylake and Apollolake, when in fact it does not support anything older than gen 5.

The following exception is thrown when attempting to create an engine on an i7-4712HQ:
Device lookup failed - unsupported device id: 0x416. Note: HD5xx+ devices are supported

Are there any plans to support HD4 and older ?

use CMAKE_BINARY_DIR appropriately

https://github.com/intel/clDNN/blob/master/CMakeLists.txt#L104 causes an issue where systems will compile a binary with avx2 and another binary with avx512. Because CMAKE_BINARY_DIR is not used, whichever binary that was compiled last will override the former. When compiling with different instruction sets or optimizations, distros will create build_avx2 or build_avx512 directories and set CMAKE_BINARY_DIR to one of those. This way different build types do not conflict.

The solution here is NOT to reinvent the wheel by setting binary output to places other than CMAKE_BINARY_DIR.

Support ONNX network model loading..

Hi,
now that some optimized inference DNN libs like: NV TensorRT, Windows WinML and Qualcomm Snapdragon Neural Processing Engine (NPE) SDK support loading ONNX models (or whatever format like tensorflow etc.. for ONNX seems the most commonly/broadly supported) for simplicity would be nice if clDNN supports that also..
seems like a simple MNIST sample should be much shorter than:

#include <api/CPP/memory.hpp>
#include <api/CPP/topology.hpp>
#include <api/CPP/reorder.hpp>
#include <api/CPP/input_layout.hpp>
#include <api/CPP/convolution.hpp>
#include <api/CPP/data.hpp>
#include <api/CPP/pooling.hpp>
#include <api/CPP/fully_connected.hpp>
#include <api/CPP/softmax.hpp>
#include <api/CPP/engine.hpp>
#include <api/CPP/network.hpp>
#include
using namespace cldnn;
using namespace std;
const tensor::value_type
input_channels = 1,
input_size = 28,
conv1_out_channels = 20,
conv2_out_channels = 50,
conv_krnl_size = 5,
fc1_num_outs = 500,
fc2_num_outs = 10;
// Create layout with same sizes but new format.
layout create_reordering_layout(format new_format, const layout& src_layout)
{
return { src_layout.data_type, new_format, src_layout.size };
}
// Create MNIST topology
topology create_topology(const layout& in_layout, const memory& conv1_weights_mem, const memory& conv1_bias_mem )
{
auto data_type = in_layout.data_type;
// Create input_layout description
// "input" - is the primitive id inside topology
input_layout input("input", in_layout);
// Create topology object with 2 primitives
cldnn:: topology topology(
// 1. input layout primitive.
input,
// 2. reorder primitive with id "reorder_input"
reorder("reorder_input",
// input primitive for reorder (implicitly converted to primitive_id)
input,
// output layout for reorder
create_reordering_layout(format::yxfb, in_layout))
);
// Create data primitive - its content should be set already.
cldnn::data conv1_weights( "conv1_weights", conv1_weights_mem );
// Add primitive to topology
topology.add(conv1_weights);
// Emplace new primitive to topology
topology.add<cldnn::data>({ "conv1_bias", conv1_bias_mem });
// Emplace 2 primitives
topology.add(
// Convolution primitive with id "conv1"
convolution("conv1",
"reorder_input", // primitive id of the convolution's input
{ conv1_weights }, // weights primitive id is taken from the object
{ "conv1_bias" } // bias primitive id
),
// Pooling id: "pool1"
pooling("pool1",
"conv1", // Input: "conv1"
pooling_mode::max, // Pooling mode: MAX
spatial(2,2), // stride: 2
spatial(2,2) // kernel_size: 2
)
);
// Conv2 weights data is not available now, so just declare its layout
layout conv2_weights_layout(data_type, format::bfyx,{ conv2_out_channels, conv1_out_channels, conv_krnl_size, conv_krnl_size });
// Define the rest of topology.
topology.add(
// Input layout for conv2 weights. Data will passed by network::set_input_data()
input_layout("conv2_weights", conv2_weights_layout),
// Input layout for conv2 bias.
input_layout("conv2_bias", { data_type, format::bfyx, spatial(conv2_out_channels) }),
// Second convolution id: "conv2"
convolution("conv2",
"pool1", // Input: "pool1"
{ "conv2_weights" }, // Weights: input_layout "conv2_weights"
{ "conv2_bias" } // Bias: input_layout "conv2_bias"
),
// Second pooling id: "pool2"
pooling("pool2",
"conv2", // Input: "conv2"
pooling_mode::max, // Pooling mode: MAX
spatial(2, 2), // stride: 2
spatial(2, 2) // kernel_size: 2
),
// Fully connected (inner product) primitive id "fc1"
fully_connected("fc1",
"pool2", // Input: "pool2"
"fc1_weights", // "fc1_weights" will be added to the topology later
"fc1_bias", // will be defined later
true // Use built-in Relu. Slope is set to 0 by default.
),
// Second FC/IP primitive id: "fc2", input: "fc1".
// Weights ("fc2_weights") and biases ("fc2_bias") will be defined later.
// Built-in Relu is disabled by default.
fully_connected("fc2", "fc1", "fc2_weights", "fc2_bias"),
// The "softmax" primitive is not an input for any other,
// so it will be automatically added to network outputs.
softmax("softmax", "fc2")
);
return topology;
}
// Copy from a vector to cldnn::memory
void copy_to_memory(memory& mem, const vector<float>& src)
{
cldnn::pointer<float> dst(mem);
std::copy(src.begin(), src.end(), dst.begin());
}
// Execute network
int recognize_image(network& network, const memory& input_memory)
{
// Set/update network input
network.set_input_data("input", input_memory);
// Start network execution
auto outputs = network.execute();
// get_memory() blocks until output generation is completed
auto output = outputs.at("softmax").get_memory();
// Get direct access to output memory
cldnn::pointer<float> out_ptr(output);
// Analyze result
auto max_element_pos = max_element(out_ptr.begin(), out_ptr.end());
return static_cast<int>(distance(out_ptr.begin(), max_element_pos));
}
// User-defined helpers which are out of this example scope
// //////////////////////////////////////////////////////////////
// Loads file to a vector of floats.
vector<float> load_data(const string&) { return { 0 }; }
// Allocates memory and loads data from file.
// Memory layout is taken from file.
memory load_mem(const engine& eng, const string&) {
//return a dummy value
return memory::allocate(eng, layout{ data_types::f32, format::bfyx, { 1, 1, 1, 1 } });
}
// Load image, resize to [x,y] and store in a vector of floats
// in the order "bfyx".
vector<float> load_image_bfyx(const string&, int, int) { return { 0 }; }
// //////////////////////////////////////////////////////////////
int main()
{
// Use data type: float
auto data_type = type_to_data_type<float>::value;
// Network input layout
layout in_layout(
data_type, // stored data type
format::bfyx, // data stored in order batch-channel-Y-X, where X coordinate changes first.
{1, input_channels, input_size, input_size} // batch: 1, channels: 1, Y: 28, X: 28
);
// Create memory for conv1 weights
layout conv1_weights_layout(data_type, format::bfyx,{ conv1_out_channels, input_channels, conv_krnl_size, conv_krnl_size });
vector<float> my_own_buffer = load_data("conv1_weights.bin");
// The conv1_weights_mem is attached to my_own_buffer, so my_own_buffer must not be changed or destroyed until network execution completes.
auto conv1_weights_mem = memory::attach(conv1_weights_layout, my_own_buffer.data(), my_own_buffer.size());
// Create default engine
cldnn::engine engine;
// Create memory for conv1 bias
layout conv1_bias_layout(data_type, format::bfyx, spatial(20));
// Memory allocation requires engine
auto conv1_bias_mem = memory::allocate(engine, conv1_bias_layout);
// The memory is allocated by library, so we do not need to care about buffer lifetime.
copy_to_memory(conv1_bias_mem, load_data("conv1_bias.bin"));
// Get new topology
cldnn::topology topology = create_topology(in_layout, conv1_weights_mem, conv1_bias_mem);
// Define network data not defined in create_topology()
topology.add(
cldnn::data("fc1_weights", load_mem(engine, "fc1_weights.data")),
cldnn::data("fc1_bias", load_mem(engine, "fc1_bias.data")),
cldnn::data("fc2_weights", load_mem(engine, "fc2_weights.data")),
cldnn::data("fc2_bias", load_mem(engine, "fc2_bias.data"))
);
// Build the network. Allow implicit data optimizations.
// The "softmax" primitive is not used as an input for other primitives,
// so we do not need to explicitly select it in build_options::outputs()
cldnn::network network(engine, topology, { build_option::optimize_data(true) });
// Set network data which was not known at topology creation.
network.set_input_data("conv2_weights", load_mem(engine, "conv2_weights.data"));
network.set_input_data("conv2_bias", load_mem(engine, "conv2_bias.data"));
// Allocate memory for input image.
auto input_memory = memory::allocate(engine, in_layout);
// Run network 2 times with different images.
for (auto img_name : { "one.jpg", "two.jpg" })
{
// Reuse image memory.
copy_to_memory(input_memory, load_image_bfyx(img_name, in_layout.size.spatial[0], in_layout.size.spatial[1]));
auto result = recognize_image(network, input_memory);
cout << img_name << " recognized as " << result << endl;
}
return 0;
}

System error

I am running OpenVINO on an Intel i5-9600K with UHD Graphics 630,
but I get "[ ERROR ] failed to create engine: Device lookup failed - unsupported device id: 0x3E98. Note: HD5xx+ devices are supported".

Is this error related to the driver version?

[QA] I'm curious about the formula for finding the output blockWidth

The formula for finding the output blockWidth in the ConvolutionKernel_bfyx_os_iyx_osv16.cpp file is shown below.

    if (cp.stride.x == 1 && cp.stride.y == 1)
    {
        if (cp.filterSize.x == 1 && cp.filterSize.y == 1)
        {
            option.blockWidth = 16;
            option.blockHeight = 1;
            option.prefetch = 4;
        }
        //if less than 16 values is required to compute one single row of output
        //then each WI shall compute one single row to maximize reuse within SIMD subgroup (this gives very nice performance results)
        else if (params.output.X().v + (cp.filterSize.x - 1)*cp.dilation.x < sub_group_size)
        {
            option.blockWidth = params.output.X().v;
            option.blockHeight = 1;
            option.prefetch = 4;
        }
        else if (cp.filterSize.x < 5 && cp.filterSize.y < 5)
        {
            option.blockWidth = sub_group_size - cp.filterSize.x + 1;
            option.blockHeight = 2;
            option.prefetch = 4;
        }
        else
        {
            option.blockWidth = 4;
            option.blockHeight = 3;
            option.prefetch = 4;
        }
    }
    else if (cp.stride.x == 2 && cp.stride.y == 2)
    {
        option.blockWidth = 5;
        option.blockHeight = 4;
        option.prefetch = 4;
    }
    else
    {
        option.blockWidth = 4;
        option.blockHeight = 3;
        option.prefetch = 5;
        //run_info.effiency = FORCE_PRIORITY_7; // GEMM is better
    }


I wonder why the output blockWidth is 4 if the stride size is greater than 2. How can I calculate the output width?
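For any stride other than 1x1 or 2x2 the heuristic just falls back to a fixed 4x3 output tile; blockWidth is a tiling parameter, not the output width itself. The output width comes from the standard convolution formula, and blockWidth only determines how many columns each work-item computes. A standalone sketch of the quoted heuristic and of that formula (pick_block and conv_output_x are illustrative names, not clDNN API; sub_group_size is 16 for the osv16 kernel):

```cpp
#include <cstddef>

// Re-implementation of the block-size heuristic quoted above.
struct block_size { std::size_t width, height, prefetch; };

block_size pick_block(std::size_t stride_x, std::size_t stride_y,
                      std::size_t filter_x, std::size_t filter_y,
                      std::size_t dilation_x, std::size_t output_x,
                      std::size_t sub_group_size = 16) {
    if (stride_x == 1 && stride_y == 1) {
        if (filter_x == 1 && filter_y == 1)
            return {16, 1, 4};
        // one work-item per output row when a whole row fits in the subgroup
        if (output_x + (filter_x - 1) * dilation_x < sub_group_size)
            return {output_x, 1, 4};
        if (filter_x < 5 && filter_y < 5)
            return {sub_group_size - filter_x + 1, 2, 4};
        return {4, 3, 4};
    }
    if (stride_x == 2 && stride_y == 2)
        return {5, 4, 4};
    return {4, 3, 5};  // any other stride: fixed fallback tile
}

// The output width is independent of blockWidth; the kernel then launches
// roughly ceil(output_x / blockWidth) work-items horizontally.
std::size_t conv_output_x(std::size_t input_x, std::size_t pad_x,
                          std::size_t dilation_x, std::size_t filter_x,
                          std::size_t stride_x) {
    std::size_t effective_filter = (filter_x - 1) * dilation_x + 1;
    return (input_x + 2 * pad_x - effective_filter) / stride_x + 1;
}
```

For example, a 3x3 filter at stride 1 gives blockWidth = 16 - 3 + 1 = 14, while any stride above 2 gives the fixed 4x3 tile.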

Profiling Event in Tutorial Chapter 5 returns 0 nanosecond

Hello,

I ran Chapter 5 of the tutorial to get each kernel's timing, but the timings do not look correct:

fc:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds
fc_bias:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds
fc_weights:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds
input:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds
relu:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds
softmax:
submission:0nanoseconds
starting:0nanoseconds
executing:0nanoseconds

My environment is Ubuntu 14.04 with Beignet driver.
The device information is as the following.

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 beignet 1.3 (git-8bd8c3a)
Platform Name: Intel Gen OCL Driver
Platform Vendor: Intel
Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short

Platform Name: Intel Gen OCL Driver
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 72
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 8
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 1000Mhz
Address bits: 32
Max memory allocation: 3221225472
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 8192
Max image 3D height: 8192
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 8192
Global memory size: 4294967296
Constant buffer size: 134217728
Max number of constant args: 8
Local memory type: Local
Local memory size: 65536
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fc8d642ebe0
Name: Intel(R) HD Graphics Skylake Server GT4
Vendor: Intel
Device OpenCL C version: OpenCL C 1.2 beignet 1.3 (git-8bd8c3a)
Driver version: 1.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 beignet 1.3 (git-8bd8c3a)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_fp16

Could you identify the issue and give me some suggestions?

Best Regards
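For reference, the intervals in the tutorial output are differences between OpenCL event timestamps (CL_PROFILING_COMMAND_QUEUED/SUBMIT/START/END from clGetEventProfilingInfo); if the driver (Beignet here) returns the same value for all four timestamps, or does not really implement profiling for a query, every interval collapses to 0 ns as seen above. A standalone sketch of that arithmetic (the mapping of timestamp pairs to the "submission/starting/executing" names is an assumption based on the tutorial output; event_times/to_intervals are illustrative names, not clDNN API):

```cpp
#include <cstdint>

// Raw event timestamps as clGetEventProfilingInfo would return them, in ns.
struct event_times { std::uint64_t queued, submit, start, end; };
struct intervals   { std::uint64_t submission, starting, executing; };

intervals to_intervals(const event_times& t) {
    return { t.submit - t.queued,   // submission
             t.start  - t.submit,   // starting
             t.end    - t.start };  // executing
}
```

Querying clGetEventProfilingInfo directly in a tiny OpenCL program is a quick way to check whether the driver or clDNN is at fault.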

CPU support

Hi,
I could not find how to run clDNN on the CPU only. Is this possible?
I only have a high-end CPU and do not have a supported GPU.

Thanks for your support
Best Regards
Mazda

a few lines in convolution.cpp are formatted with tabs, which triggers the error 'misleading-indentation' in gcc

A few lines (38, 47, and a few others) in the source file 'convolution.cpp' are formatted using the tab character (with a tab length of 4). If they are viewed with a tab length of 8 they look incorrectly indented and trigger errors like this one in gcc:

/home/franco/cldnn/src/convolution.cpp:44:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (kernel_xy.size() != 2)

Replacing all the tabs with 4 spaces (with something like s/\t/    /g) fixes the problem and the code compiles successfully.
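To see why gcc treats this as more than a style issue, here is a standalone reproduction of the hazard the warning guards against (validate_kernel_size and the counters are illustrative names, not clDNN code):

```cpp
int guarded_calls = 0;
int total_calls = 0;

// In the original file, a tab-indented statement lined up under the guarded
// one when viewed at a different tab width, so it *looked* conditional.
// Written with consistent spacing, the real control flow is obvious:
void validate_kernel_size(int size) {
    if (size != 2)
        ++guarded_calls;  // actually guarded by the `if`
    ++total_calls;        // always runs, regardless of `size`
}
```

Consistent spaces (or explicit braces around the guarded statement) are exactly what silences -Wmisleading-indentation.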

Missing throw?

Should it be "throw std::invalid_argument(...)" ?

cldnn_memory cldnn_attach_memory(cldnn_layout layout, void* pointer, size_t size, cldnn_status* status)
{
    return exception_handler<cldnn_memory>(CLDNN_ERROR, status, nullptr, [&]()
    {
        cldnn::layout layout_obj(layout);
        if (layout_obj.bytes_count() > size) 
            std::invalid_argument("buffer size does not match layout size");
        return api_cast(new cldnn::simple_attached_memory(layout_obj, pointer));
    });
}
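Yes: without the throw keyword the std::invalid_argument temporary is constructed and immediately discarded, so the size check above has no effect. A standalone demonstration (attach_checked/attach_fixed are illustrative names, not clDNN API):

```cpp
#include <stdexcept>
#include <cstddef>

int attach_checked(std::size_t bytes_needed, std::size_t bytes_given) {
    if (bytes_needed > bytes_given)
        std::invalid_argument("buffer size does not match layout size");  // no-op!
    return 1;  // reached even when the check fails
}

int attach_fixed(std::size_t bytes_needed, std::size_t bytes_given) {
    if (bytes_needed > bytes_given)
        throw std::invalid_argument("buffer size does not match layout size");
    return 1;  // reached only when the buffer is large enough
}
```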

FPGA?

Is this OpenCL implementation supported on Intel FPGAs?

Is there any case to verify depthwise convolution in CLDNN?

Hi, I noticed that you have implemented several depthwise-convolution CL kernels (see below), but I didn't see any test case that uses these kernels.

Also, the validated topologies you list ("AlexNet*, VGG(16,19), GoogleNet(v1,v2,v3), ResNet(50,101,152)* Faster R-CNN*, Squeezenet*, SSD_googlenet*, SSD_VGG*, PVANET*, PVANET_REID*, age_gender*, FCN* and yolo*") do not include MobileNet.

So does it mean the depthwise convolution CL kernels haven't been verified yet? If they have been verified, do you have any samples to share?

depthwise convolution related cl kernels:
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_f16_depthwise.cl
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_depthwise_weights_lwg.cl
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_byxf_af32_depthwise.cl

Why does clDNN conv2d barely use any GPU shared memory (__local)

Hi clDNN team! I recently looked into your convolution code and found that, except for the Winograd algorithm, the conv2d primitive doesn't use any __local memory, which should be the fastest GPU cache. I ran on an Intel Gen9 GPU and the convolution is still pretty fast. I'm still studying the story behind the performance, and it would be great if you could share any insights.

Example of DQN and ALE

Hi guys,

Thank you for the great work bringing Intel GPUs into DNN. I applaud this move to bring OpenCL into more DNN applications.

I'm looking for a Deep Q-Network (DQN and Double DQN) example where the user can connect to the Arcade Learning Environment (ALE) through clDNN, but couldn't find one. If it is not available, would you consider adding such an example? ALE is a critical application in my DNN research and I would like to run DNN on Intel GPUs.

padding

As per https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics, padding can be achieved by setting output_padding on the layer.

I have a net which is like conv1 -> pool1 -> conv2 -> pool -> fc1 -> fc2 -> softmax

When I set output_padding on the pool1 layer and run the net only up to that point, I can see the output being padded correctly for pool1. However, when I connect pool1 (with output_padding) to conv2, it doesn't seem to pad the data.

I also tried putting an explicit reorder with output_padding between pool1 and conv2, but it still doesn't seem to pad the output of pool1.

build_option::optimize_data

Hi,
This unit test worked fine with Drop 3.0 but doesn't work with Drop 5.0.
If I remove the optimize_data build option it works on Drop 5.0 too.
I don't know if this is a problem with my test or with clDNN.

test.zip

I am running this test on Core i3-6100.

Cmake failed to compile

Getting error when I do "make" -
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make

Any help? I'm running on Ubuntu 16.04.

In file included from /home/ae/Documents/clDNN/src/fully_connected.cpp:18:0:
/home/ae/Documents/clDNN/src/include/fully_connected_inst.h:50:28: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
const bool bias_term() const { return !argument.bias.empty(); }
^
cc1plus: all warnings being treated as errors
src/CMakeFiles/clDNN_shlib.dir/build.make:498: recipe for target 'src/CMakeFiles/clDNN_shlib.dir/fully_connected.cpp.o' failed
make[2]: *** [src/CMakeFiles/clDNN_shlib.dir/fully_connected.cpp.o] Error 1
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/clDNN_shlib.dir/all' failed
make[1]: *** [src/CMakeFiles/clDNN_shlib.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
ae@hped800nuc2:~/Documents/clDNN/build$ cmake --version
cmake version 3.8.0
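The warning itself points at the fix: a top-level const on a by-value return type is ignored by the language, so dropping it in fully_connected_inst.h silences -Werror=ignored-qualifiers without changing behavior. A standalone sketch (the struct names here are illustrative, not the actual clDNN types):

```cpp
struct fc_args { bool has_bias; };

struct fully_connected_inst_sketch {
    fc_args argument;
    // const bool bias_term() const { ... }  // warns: qualifier on return ignored
    bool bias_term() const { return argument.has_bias; }  // warning-free, same behavior
};
```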

Disclaimer in README.md

Could you please comment on the following in README.md
"
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel® a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
"
Is this part of the license? Could you please add a license file to the top level directory which covers everything in the repository?

cmake -DCMAKE_BUILD_TYPE=Release .. Error

Hello,
My OS is Ubuntu 16.04.
cmake version is 3.7.2.
intel graphic driver is SRB5
intel opencl sdk is 1.2-7.0
When i run cmake -DCMAKE_BUILD_TYPE=Release .., i get this following error:
**-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.12", minimum required is "2.7")
CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)

CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)

CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)

CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)

-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.3.8.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "/home/user1/cldnn/build/out/Linux64/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "/home/user1/cldnn/build/out/Linux64/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Linux64 (detected: Linux64)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: /home/user1/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/include
-- [clDNN] + Static libs: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/lib/x64
-- [clDNN] + Shared libs: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN] + Libs to link: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: /home/user1/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: /home/user1/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: /home/user1/cldnn/common/boost/1.64.0/linux/x64/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- [clDNN] Selected capabilities: public
-- Configuring done
CMake Error at src/CMakeLists.txt:191 (add_library):
Target "clDNN_shlib" links to target "Boost::filesystem" but the target was
not found. Perhaps a find_package() call is missing for an IMPORTED
target, or an ALIAS target is missing?

CMake Error at src/CMakeLists.txt:191 (add_library):
Target "clDNN_shlib" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

CMake Error at tests/CMakeLists.txt:123 (add_executable):
Target "tests" links to target "Boost::filesystem" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

CMake Error at tests/CMakeLists.txt:123 (add_executable):
Target "tests" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

CMake Error at tutorial/CMakeLists.txt:60 (add_executable):
Target "tutorial" links to target "Boost::filesystem" but the target was
not found. Perhaps a find_package() call is missing for an IMPORTED
target, or an ALIAS target is missing?

CMake Error at tutorial/CMakeLists.txt:60 (add_executable):
Target "tutorial" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

-- Generating done
-- Build files have been written to: /home/user1/cldnn/build**

Please help me. I cannot find any solutions. Thank you very much.

Drop 12.0 does not work with OpenVINO R4

Getting
primitive add failed: basic_string::_S_construct null not valid
while trying to replace the original OpenVINO libclDNN64.so with a Drop 12.0 build (because of a horrendous memory leak at ~1.5 MB/s).

macOS build fail

I am trying to compile on macOS. I installed Boost with the Homebrew package manager, but CMake can't find it. Here is the error message:

  Could not find the following static Boost libraries:

          boost_system
          boost_date_time
          boost_program_options
          boost_filesystem

  No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
  directory containing Boost libraries or BOOST_ROOT to the location of
  Boost.
Call Stack (most recent call first):
  CMakeLists.txt:577 (find_package)
CMake Error at CMakeCompilerLinkerOpts.txt:328 (message):
  [clDNN] Unknown compiler.  Please define support for it or use different
  compiler.
Call Stack (most recent call first):
  CMakeLists.txt:709 (include)

Building on ClearLinux Fails

Hi,
I'm trying to build the project on ClearLinux OS here are my environment details:
CMake version: 3.13.3
GCC version: gcc (Clear Linux OS for Intel Architecture) 8.2.1 20180502

Errors:

[ 53%] Built target api_test_builds
Scanning dependencies of target clDNN_shlib
[ 53%] Building CXX object src/CMakeFiles/clDNN_shlib.dir/graph_optimizer/add_required_reorders.cpp.o
In file included from /home/daniel/cldnn/src/include/layout_optimizer.h:31,
                 from /home/daniel/cldnn/src/include/pass_manager.h:21,
                 from /home/daniel/cldnn/src/graph_optimizer/add_required_reorders.cpp:21:
/home/daniel/cldnn/src/include/generic_layer.hpp: In constructor ‘cldnn::generic_layer::generic_layer(const dto*)’:
/home/daniel/cldnn/src/include/generic_layer.hpp:64:111: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
         , generic_params(*static_cast<const kernel_selector::generic_kernel_params* const>(dto->generic_params))
                                                                                                               ^
cc1plus: all warnings being treated as errors
make[3]: *** [src/CMakeFiles/clDNN_shlib.dir/build.make:63: src/CMakeFiles/clDNN_shlib.dir/graph_optimizer/add_required_reorders.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:92: src/CMakeFiles/clDNN_shlib.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:215: tests/CMakeFiles/tests.dir/rule] Error 2
make: *** [Makefile:190: tests] Error 2

Any help will be deeply appreciated.

many disabled tests

I am using the following variables to enable some tests:
-DCLDNN__RUN_TESTS:BOOL=ON -DCLDNN__INCLUDE_TESTS:BOOL=ON

About 125 tests pass, but there is a warning that more than 1000 tests have been disabled. What is the reason?

clDNN Make errors

After running this command to build:

cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make

the following errors occur. Does anyone have any idea how to solve this? The log is at the bottom.

/home/up2/cldnn/kernel_selector/common/tensor_type.cpp: In member function ‘KernelSelector::Tensor::DataTensor KernelSelector::Tensor::DataTensor::FlattenFeatureAndSpatials() const’:
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:128:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:129:13: note: here
case Tensor::bfyx:
^~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:137:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:138:13: note: here
case Tensor::byxf:
^~~~
cc1plus: all warnings being treated as errors
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o] Error 1
make[1]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [all] Error 2
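gcc 7 enables -Wimplicit-fallthrough, and with -Werror the two case labels that fall through in tensor_type.cpp become hard errors. The fix is either a break; if the fall-through is a bug, or the C++17 [[fallthrough]] attribute (or a /* fall through */ comment) if it is intentional. A standalone sketch with illustrative names (layout_kind/flatten_steps are not the clDNN types):

```cpp
enum class layout_kind { fb, bfyx, byxf };

int flatten_steps(layout_kind l) {
    int steps = 0;
    switch (l) {
    case layout_kind::fb:
        ++steps;
        [[fallthrough]];   // intentional: fb also runs the bfyx step
    case layout_kind::bfyx:
        ++steps;
        break;             // a missing break here would be a real bug
    default:
        break;
    }
    return steps;
}
```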

----------------------------------------------------Log-----------------------------------------------------------

cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
-- The C compiler identification is GNU 7.2.1
-- The CXX compiler identification is GNU 7.2.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.5", minimum required is "2.7")
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.3.8.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "/home/up2/cldnn/build/out/Linux64/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "/home/up2/cldnn/build/out/Linux64/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Linux64 (detected: Linux64)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: /home/up2/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/include
-- [clDNN] + Static libs: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/lib/x64
-- [clDNN] + Shared libs: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN] + Libs to link: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: /home/up2/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: /home/up2/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: /home/up2/cldnn/common/boost/1.64.0/linux/x64/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- [clDNN] Selected capabilities: public
-- Configuring done
-- Generating done
-- Build files have been written to: /home/up2/cldnn/build
[ 0%] Generating ks_primitive_db.inc ...
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/concatenation_gpu_depth_bfyx_no_pitch.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/concatenation_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_1x1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_1x1_hgemm_buf_16x1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_3x3_dw_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_direct_10_12_16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_direct_8_8_16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_gemm_like_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_gemm_like_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_os_iyx_osv16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_winograd_2x3_s1_fused.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b16_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b16_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b1_block_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b1_block_multiple_x_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b8_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/deconvolution_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/eltwise_simple_vload8.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_gemm.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_input_spatial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bfyx_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv16_af8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv16_b1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv8_af8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_b8_f8.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_b8_f8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_block_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_oi_b8_fp32_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_oi_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_image_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_yxfb_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/generic_eltwise_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_multiple_features.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_yxfb_b8_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_within_channel.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_within_channel_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/normalize_gpu_across_spatial_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/normalize_gpu_within_spatial_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/permute_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_average_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_bfyx_block_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/region_yolo_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_data.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_data_fast_b1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_from_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_to_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights_image_2d_c4_fyx_b.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorg_yolo_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reshape_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/roi_pooling_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_bf.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_fb.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_items_class_optimized.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/upsampling_ref.cl
[ 1%] Updating file if the file changed (ks_primitive_db.inc) ...
Scanning dependencies of target cldnn_kernel_selector
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/auto_tuner.cpp.o
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/auto_tuner_offline.cpp.o
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_base.cpp.o
[ 3%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector.cpp.o
[ 3%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector_common.cpp.o
[ 4%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector_params.cpp.o
[ 4%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp: In member function ‘KernelSelector::Tensor::DataTensor KernelSelector::Tensor::DataTensor::FlattenFeatureAndSpatials() const’:
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:128:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:129:13: note: here
case Tensor::bfyx:
^~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:137:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:138:13: note: here
case Tensor::byxf:
^~~~
cc1plus: all warnings being treated as errors
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o] Error 1
make[1]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [all] Error 2

Build fails

Linux build fails against commit 02add7c.

My Linux box runs Ubuntu 16.04.5.

clDNN was built with: mkdir build; cd build; cmake ..; make

The error message is:

[ 51%] Built target cldnn_kernel_selector
make[2]: Circular codegen/test_builds/api_c_test.c <- codegen/test_builds/api_c_test.c dependency dropped.
make[2]: Circular codegen/test_builds/api_cpp_test.cpp <- codegen/test_builds/api_cpp_test.cpp dependency dropped.
make[2]: Circular codegen/test_builds/api_cpp_test.cpp <- codegen/test_builds/api_cpp_test.cpp dependency dropped.
make[2]: Circular codegen/test_builds/api_c_test.c <- codegen/test_builds/api_c_test.c dependency dropped.
[ 51%] Building C object api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o
In file included from /home/nhu/code/clDNN/build/codegen/test_builds/api_c_test.c:16:0:
/home/nhu/code/clDNN/api/C/pooling.h:56:1: error: unknown type name ‘bool’
 bool global_pooling;
 ^
api_test_builds/CMakeFiles/api_test_builds.dir/build.make:213: recipe for target 'api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o' failed
make[2]: *** [api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'api_test_builds/CMakeFiles/api_test_builds.dir/all' failed
make[1]: *** [api_test_builds/CMakeFiles/api_test_builds.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Support for HD Graphics 6000?

Is this planned to be supported? Alternatively, what work would need to be done to support it? We're looking at the Core i5-5250U because its HD Graphics 6000 seems to offer good GFLOPS at a good price.

deconvolution is very slow

I think the prediction speed of clDNN is generally very good; it outperforms MKL on the same processor for many operations I have tested. But the deconvolution operation seems to be very slow.
On a Core i3-6100 and an i5-6500, deconvolution takes approximately 40-50 times longer with clDNN than with MKL in my tests. That is such a big difference that I don't think it is caused simply by a lack of optimization.

See attached test case for details of how I measured it.
speed.zip

Missing include

kernel_selector/core/common/primitive_db.cpp is missing #include <stdexcept> and thus does not compile with VS 2019 due to undeclared std::runtime_error.

size_offset_stride_padding.html

The detailed description of the convolution class refers to an HTML file that seems to be missing.

"Look into docs/size_offset_stride_padding.html for description how size, offsets, stride & padding parameters work."

build fail

The build failure log is below. Solved by adding #include <cmath> in src/gpu/kernel.h.

[patch]
diff --git a/src/gpu/kernel.h b/src/gpu/kernel.h
index 5a89e4e..b6ce0a5 100644
--- a/src/gpu/kernel.h
+++ b/src/gpu/kernel.h
@@ -25,6 +25,7 @@

#include
#include
+#include <cmath>

namespace neural { namespace gpu {

[log]
[ 1%] Building CXX object src/CMakeFiles/clDNN_shlib.dir/network.cpp.o
In file included from /home/cv/cldnn/src/network.cpp:29:0:
/home/cv/cldnn/src/gpu/kernel.h: In function ‘std::string neural::gpu::to_code_string(T) [with T = float; std::string = std::basic_string]’:
/home/cv/cldnn/src/gpu/kernel.h:69:9: error: ‘isinf’ is not a member of ‘std’
if (std::isinf(val))
^
/home/cv/cldnn/src/gpu/kernel.h:70:61: error: ‘signbit’ is not a member of ‘std’
std::snprintf(buffer, sizeof(buffer), "%sINFINITY", std::signbit(val) ? "-" : "");
^
/home/cv/cldnn/src/gpu/kernel.h: In function ‘std::string neural::gpu::to_code_string(T) [with T = double; std::string = std::basic_string]’:
/home/cv/cldnn/src/gpu/kernel.h:80:9: error: ‘isinf’ is not a member of ‘std’
if (std::isinf(val))
^
/home/cv/cldnn/src/gpu/kernel.h:81:61: error: ‘signbit’ is not a member of ‘std’
std::snprintf(buffer, sizeof(buffer), "%sINFINITY", std::signbit(val) ? "-" : "");
^
make[2]: *** [src/CMakeFiles/clDNN_shlib.dir/network.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/clDNN_shlib.dir/all] Error 2
make: *** [all] Error 2

inception_v3 model inference with bad performance

Do you have perf data for classical models on clDNN?
I ran the inception_v3 model with the Intel Inference Engine (clDNN plugin); it takes 600+ ms per iteration, which is no better than TensorFlow inference on CPU. Here is my data:

InferenceEngine:
API version ............ 1.0
Build .................. 5852
[ INFO ] Parsing input parameters
[ INFO ] No extensions provided
[ INFO ] Loading plugin

API version ............ 0.1
Build .................. prod-02709
Description ....... clDNNPlugin

[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Start inference (50 iterations)

Average running time of one iteration: 624.855 ms

Performance counts:

InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 12069 cpu: 598 execType: GPU
InceptionV3/InceptionV3/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 15137 cpu: 526 execType: GPU
InceptionV3/InceptionV3/Conv2d_2a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 29440 cpu: 455 execType: GPU
InceptionV3/InceptionV3/Conv2d_2b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7561 cpu: 300 execType: GPU
InceptionV3/InceptionV3/Conv2d_3b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 33730 cpu: 230 execType: GPU
InceptionV3/InceptionV3/Conv2d_4a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/MaxPool_3a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 2311 cpu: 391 execType: GPU
InceptionV3/InceptionV3/MaxPool_5a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 1671 cpu: 179 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2375 cpu: 219 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 1834 cpu: 392 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5907 cpu: 306 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2413 cpu: 655 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3698 cpu: 568 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4711 cpu: 480 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2182 cpu: 819 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 1211 cpu: 779 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/concat:EXECUTED layerType: Concat realTime: 40 cpu: 570 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2914 cpu: 441 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2186 cpu: 623 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5912 cpu: 531 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2892 cpu: 294 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3685 cpu: 208 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4769 cpu: 697 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2813 cpu: 488 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2868 cpu: 382 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/concat:EXECUTED layerType: Concat realTime: 71 cpu: 167 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3122 cpu: 181 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2345 cpu: 358 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5956 cpu: 275 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3156 cpu: 657 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3694 cpu: 534 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4721 cpu: 453 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 3132 cpu: 812 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3102 cpu: 743 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/concat:EXECUTED layerType: Concat realTime: 71 cpu: 340 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 24400 cpu: 419 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3135 cpu: 220 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3665 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2318 cpu: 106 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_2/MaxPool_1a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 745 cpu: 289 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/concat:EXECUTED layerType: Concat realTime: 116 cpu: 325 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6320 cpu: 174 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4141 cpu: 338 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2883 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7669 cpu: 232 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4142 cpu: 164 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5154 cpu: 105 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3091 cpu: 526 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5169 cpu: 482 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4263 cpu: 429 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2095 cpu: 279 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6312 cpu: 223 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/concat:EXECUTED layerType: Concat realTime: 54 cpu: 457 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6532 cpu: 226 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5200 cpu: 384 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4348 cpu: 329 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 9439 cpu: 277 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5565 cpu: 223 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8005 cpu: 170 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4347 cpu: 115 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8015 cpu: 477 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5126 cpu: 441 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2143 cpu: 395 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6431 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/concat:EXECUTED layerType: Concat realTime: 74 cpu: 428 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6293 cpu: 213 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5170 cpu: 377 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4362 cpu: 326 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 9445 cpu: 266 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5196 cpu: 249 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7879 cpu: 198 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4372 cpu: 197 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7889 cpu: 151 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5217 cpu: 414 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2163 cpu: 365 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6348 cpu: 303 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/concat:EXECUTED layerType: Concat realTime: 72 cpu: 476 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6243 cpu: 249 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6411 cpu: 432 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6050 cpu: 356 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11285 cpu: 301 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6293 cpu: 324 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11011 cpu: 274 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6066 cpu: 210 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11170 cpu: 159 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6037 cpu: 106 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2175 cpu: 440 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6242 cpu: 380 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/concat:EXECUTED layerType: Concat realTime: 49 cpu: 98 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6297 cpu: 204 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3030 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6316 cpu: 474 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6024 cpu: 384 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11242 cpu: 323 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2440 cpu: 245 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_2/MaxPool_1a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 580 cpu: 509 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/concat:EXECUTED layerType: Concat realTime: 48 cpu: 419 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3312 cpu: 125 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4425 cpu: 315 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2448 cpu: 233 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4386 cpu: 272 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/concat:EXECUTED layerType: Concat realTime: 26 cpu: 171 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4623 cpu: 301 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8054 cpu: 247 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2454 cpu: 140 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4342 cpu: 191 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/concat:EXECUTED layerType: Concat realTime: 28 cpu: 379 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 1040 cpu: 369 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2452 cpu: 351 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/concat:EXECUTED layerType: Concat realTime: 18 cpu: 324 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5353 cpu: 302 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6703 cpu: 264 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2454 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4341 cpu: 205 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/concat:EXECUTED layerType: Concat realTime: 27 cpu: 342 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7212 cpu: 158 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7973 cpu: 112 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2475 cpu: 372 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4524 cpu: 399 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/concat:EXECUTED layerType: Concat realTime: 26 cpu: 314 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 1547 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3882 cpu: 224 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/concat:EXECUTED layerType: Concat realTime: 17 cpu: 187 execType: GPU
InceptionV3/Logits/AvgPool_1a_8x8/AvgPool:EXECUTED layerType: Pooling realTime: 565 cpu: 149 execType: GPU
InceptionV3/Logits/Conv2d_1c_1x1/convolution:EXECUTED layerType: Convolution realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Logits/SpatialSqueeze:EXECUTED layerType: Reshape realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Predictions/Reshape:EXECUTED layerType: Reshape realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Predictions/Reshape_1:EXECUTED layerType: Reshape realTime: 32 cpu: 798 execType: GPU
InceptionV3/Predictions/Reshape_1_cldnn_output_postprocess:EXECUTED layerType: Reorder realTime: 6 cpu: 784 execType: GPU
InceptionV3/Predictions/Softmax:EXECUTED layerType: SoftMax realTime: 32 cpu: 798 execType: GPU
input_cldnn_input_preprocess: EXECUTED layerType: Reorder realTime: 1211 cpu: 656 execType: GPU
scale: NOT_RUN layerType: Power realTime: 0 cpu: 0 execType: None
Total time: 645521 microseconds
[ INFO ] Processing output blobs

Top 10 results:

Image ./grace_hopper_299.bmp

715 1.0000000 label #715
111 0.0000000 label #111
711 0.0000000 label #711
917 0.0000000 label #917
949 0.0000000 label #949
503 0.0000000 label #503
983 0.0000000 label #983
853 0.0000000 label #853
35 0.0000000 label #35
615 0.0000000 label #615

[ INFO ] Execution successfull

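Each per-layer line in the output above follows a fixed pattern (`name:STATUS layerType: T realTime: N cpu: M execType: E`), so per-layer-type totals can be aggregated with a short script. A minimal sketch, assuming that exact format; the `aggregate` helper name is made up for illustration, and you would pipe your full log into it instead of the sample:

```shell
# Sum realTime (microseconds) per layerType from perf-count output in the
# format shown above. Format is an assumption based on the pasted log.
aggregate() {
  awk -F'realTime: ' '/layerType:/ {
      split($1, a, "layerType: "); type = a[2]; sub(/ +$/, "", type)
      split($2, b, " ");           sum[type] += b[1]
    }
    END { for (t in sum) printf "%-12s %10d us\n", t, sum[t] }'
}

# Small inline sample; replace with:  aggregate < your_perf_log.txt
aggregate <<'EOF'
foo/conv:EXECUTED layerType: Convolution realTime: 4372 cpu: 197 execType: GPU
foo/relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
bar/conv:EXECUTED layerType: Convolution realTime: 7889 cpu: 151 execType: GPU
EOF
```

This makes it easy to see, for example, how much of the total time goes to Convolution versus Pooling layers, and confirms that OPTIMIZED_OUT layers contribute zero time.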
Linking library

I want to build the example code by linking against libclDNN64.so. How do I build it?
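The usual pattern is to point the compiler at the clDNN headers and link against the built library. A minimal sketch, assuming a checkout under `CLDNN_ROOT` with public headers in `api/` and the library in `build/out/Linux64/Release`; all paths and the `my_example.cpp` file name are assumptions, so adjust them to your tree:

```shell
# Hypothetical paths -- set CLDNN_ROOT to your actual clDNN checkout.
CLDNN_ROOT=${CLDNN_ROOT:-$HOME/clDNN}
CLDNN_LIB_DIR=$CLDNN_ROOT/build/out/Linux64/Release

# Printed rather than executed here, since the paths differ per machine.
echo g++ -std=c++11 my_example.cpp \
  -I"$CLDNN_ROOT/api" \
  -L"$CLDNN_LIB_DIR" -lclDNN64 \
  -Wl,-rpath,"$CLDNN_LIB_DIR" \
  -o my_example
```

The `-Wl,-rpath` entry embeds the library directory into the binary so it can find libclDNN64.so at run time; alternatively, set `LD_LIBRARY_PATH` to the same directory before running.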

"$ make tests" failed.

After running "$ make tests", 8 tests fail, and the build stops with this error:

tests/CMakeFiles/tests.dir/build.make:867: recipe for target 'build/out/Linux64/Debug/tests64' failed

What could be the reason?
Log:

[----------] Global test environment tear-down
[==========] 525 tests from 89 test cases ran. (151136 ms total)
[ PASSED ] 517 tests.
[ FAILED ] 8 tests, listed below:
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz2x2_in2x2x1x2_bfyx_stride2_pad1_fwd_backw
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz1x1_in1x2x5x5_bfyx_stride2_pad1
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz2x2_in32x1x2x2_yxfb_stride1
[ FAILED ] memory_pool.basic_non_padded_relu_pipe
[ FAILED ] memory_pool.basic_non_padded_relu_and_pooling_pipe
[ FAILED ] memory_pool.multi_outputs_network
[ FAILED ] memory_pool.shared_mem_pool_same_topology_twice
[ FAILED ] memory_pool.shared_mem_pool_same_topology_twice_weights

8 FAILED TESTS
YOU HAVE 17245 DISABLED TESTS

tests/CMakeFiles/tests.dir/build.make:867: recipe for target 'build/out/Linux64/Debug/tests64' failed
make[3]: *** [build/out/Linux64/Debug/tests64] Error 1
make[3]: *** Deleting file 'build/out/Linux64/Debug/tests64'
CMakeFiles/Makefile2:202: recipe for target 'tests/CMakeFiles/tests.dir/all' failed
make[2]: *** [tests/CMakeFiles/tests.dir/all] Error 2
CMakeFiles/Makefile2:214: recipe for target 'tests/CMakeFiles/tests.dir/rule' failed
make[1]: *** [tests/CMakeFiles/tests.dir/rule] Error 2
Makefile:190: recipe for target 'tests' failed
make: *** [tests] Error 2

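One way to narrow down the cause is to run only the failing cases directly against the tests binary, instead of going through `make tests` (which deletes the binary on failure). A hedged sketch using standard googletest filter syntax; the binary path is copied from the log above and may differ in your build tree:

```shell
# Re-run just the failing googletest cases in isolation.
TESTS_BIN=build/out/Linux64/Debug/tests64   # path from the log; adjust if needed
FILTER='memory_pool.*:convolution_grad_weights_f32_fw_gpu.*'

# Printed rather than executed, since the binary only exists in your build tree.
echo "$TESTS_BIN --gtest_filter=$FILTER"
```

Running the filtered set directly shows each failure's assertion message, which is more informative than the make-level "Error 1" summary.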
Execute tests error

Hello,
My hardware environment:
[ OK ] Processor name: Intel(R) Xeon(R) CPU E3-1585 v5 @ 3.50GHz
[ INFO ] Intel Processor
[ INFO ] Processor brand: Xeon
[ INFO ] Processor arch: Skylake

OS readiness checks:
[ INFO ] GPU PCI id : 193A
[ INFO ] GPU description: SKL SRV GT4e
[ OK ] GPU visible to OS
[ INFO ] no nomodeset in GRUB cmdline (good)
[ INFO ] Linux distro : Ubuntu 16.04
[ INFO ] Linux kernel : 4.13.0-32-generic
[ INFO ] glibc version : 2.23
[ INFO ] Linux distro suitable for Generic install
[ INFO ] gcc version : 20160609 (>=4.8.2 suggested)

Media Server Studio Install:
[ OK ] user in video group
[ ERROR ] libva.so.1 not found. Check LD_LIBRARY_PATH contains '/usr/lib64;/usr/local/lib'
[ ERROR ] libva not loading Intel iHD
[ ERROR ] vainfo not reporting codec entry points
[ INFO ] i915 driver in use by Intel video adapter
[ ERROR ] no libva include files. Are Intel components installed?

Component Smoke Tests:
[ ERROR ] no Media SDK include files. Are Intel components installed?
[ OK ] OpenCL check:platform:Intel(R) OpenCL GPU OK CPU OK
platform:Experimental OpenCL 2.1 CPU Only Platform GPU OK CPU OK

  When I execute the tests and the tutorial, I get the following error:
  terminate called after throwing an instance of 'cldnn::error'
  what():  failed to create engine: Device lookup failed - unsupported device id: 0x193A. Note: HD5xx+ 
  devices are supported
  Aborted (core dumped)  
  tests/CMakeFiles/tests.dir/build.make:904: recipe for target 'out/Linux64/Release/tests64' failed 
  make[2]: *** [out/Linux64/Release/tests64] Error 134 
  make[2]: *** Deleting file 'out/Linux64/Release/tests64'
  CMakeFiles/Makefile2:197: recipe for target 'tests/CMakeFiles/tests.dir/all' failed  
  make[1]: *** [tests/CMakeFiles/tests.dir/all] Error 2 
  Makefile:129: recipe for target 'all' failed
  make: *** [all] Error 2

Thank you very much
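The message means clDNN's device-id lookup table does not include 0x193A (the SKL server GT4e part reported earlier in the output), so engine creation aborts. It can still help to confirm what OpenCL actually exposes on the machine; a small diagnostic sketch, assuming the common `clinfo` tool (which may need to be installed first):

```shell
# List the OpenCL platforms and devices visible on this machine.
list_ocl_devices() {
  if command -v clinfo >/dev/null 2>&1; then
    clinfo | grep -E 'Platform Name|Device Name' || echo "no OpenCL platforms reported"
  else
    echo "clinfo not installed; try: sudo apt-get install clinfo"
  fi
}

list_ocl_devices
```

If the GPU device shows up here but clDNN still rejects its PCI id, the id is simply missing from the library's supported-device list for that release.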
