intel / clDNN
Compute Library for Deep Neural Networks (clDNN)
Home Page: https://01.org/cldnn
Hi,
This unit test worked fine with Drop 3.0 but doesn't work with Drop 5.0.
If I remove the optimize_data build option it works on Drop 5.0 too.
I don't know if this is a problem with my test or with clDNN.
I am running this test on a Core i3-6100.
I get an error when I run "make":
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
Any help? I'm running on Ubuntu 16.04.
In file included from /home/ae/Documents/clDNN/src/fully_connected.cpp:18:0:
/home/ae/Documents/clDNN/src/include/fully_connected_inst.h:50:28: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
const bool bias_term() const { return !argument.bias.empty(); }
^
cc1plus: all warnings being treated as errors
src/CMakeFiles/clDNN_shlib.dir/build.make:498: recipe for target 'src/CMakeFiles/clDNN_shlib.dir/fully_connected.cpp.o' failed
make[2]: *** [src/CMakeFiles/clDNN_shlib.dir/fully_connected.cpp.o] Error 1
CMakeFiles/Makefile2:85: recipe for target 'src/CMakeFiles/clDNN_shlib.dir/all' failed
make[1]: *** [src/CMakeFiles/clDNN_shlib.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
ae@hped800nuc2:~/Documents/clDNN/build$ cmake --version
cmake version 3.8.0
The detailed description of the convolution class refers to an HTML file that seems to be missing:
"Look into docs/size_offset_stride_padding.html for description how size, offsets, stride & padding parameters work."
hi,
I could not find how to run it on CPU only. Is this possible?
I only have a high-end CPU and do not have a GPU.
Thanks for your support
Best Regards
Mazda
0.14 is out. We should tag it in git.
Hi, I noticed that you have implemented several depthwise-convolution-related CL kernels (see below), but I didn't see any case that uses these kernels.
Also, you mentioned the validated topologies include ": AlexNet*, VGG(16,19), GoogleNet(v1,v2,v3), ResNet(50,101,152)* Faster R-CNN*, Squeezenet*, SSD_googlenet*, SSD_VGG*, PVANET*, PVANET_REID*, age_gender*, FCN* and yolo*.", but they do not include MobileNet.
So does this mean the depthwise convolution CL kernels haven't been verified yet? If they have, do you have any samples to share?
depthwise convolution related cl kernels:
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_f16_depthwise.cl
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_depthwise_weights_lwg.cl
https://github.com/intel/clDNN/blob/master/kernel_selector/core/cl_kernels/convolution_gpu_byxf_af32_depthwise.cl
Is this OpenCL implementation supported on Intel FPGAs?
A few lines (38, 47, and a few others) in the source file 'convolution.cpp' are indented using the tab character (assuming a tab width of 4). Viewed with a tab width of 8 they look incorrectly indented and trigger errors like this one in gcc:
/home/franco/cldnn/src/convolution.cpp:44:5: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation] if (kernel_xy.size() != 2)
Replacing all the tabs with 4 spaces (with something like s/\t/    /g) fixes the problem and the code compiles successfully.
Hi,
now that optimized inference DNN libraries such as NVIDIA TensorRT, Windows WinML, and the Qualcomm Snapdragon Neural Processing Engine (NPE) SDK support loading ONNX models (of the available interchange formats, ONNX seems the most broadly supported), it would be nice for simplicity if clDNN supported that as well.
It seems like a simple MNIST sample should be much shorter than:
#include <api/CPP/memory.hpp>
#include <api/CPP/topology.hpp>
#include <api/CPP/reorder.hpp>
#include <api/CPP/input_layout.hpp>
#include <api/CPP/convolution.hpp>
#include <api/CPP/data.hpp>
#include <api/CPP/pooling.hpp>
#include <api/CPP/fully_connected.hpp>
#include <api/CPP/softmax.hpp>
#include <api/CPP/engine.hpp>
#include <api/CPP/network.hpp>
#include <iostream>
using namespace cldnn;
using namespace std;
const tensor::value_type
input_channels = 1,
input_size = 28,
conv1_out_channels = 20,
conv2_out_channels = 50,
conv_krnl_size = 5,
fc1_num_outs = 500,
fc2_num_outs = 10;
// Create layout with same sizes but new format.
layout create_reordering_layout(format new_format, const layout& src_layout)
{
return { src_layout.data_type, new_format, src_layout.size };
}
// Create MNIST topology
topology create_topology(const layout& in_layout, const memory& conv1_weights_mem, const memory& conv1_bias_mem )
{
auto data_type = in_layout.data_type;
// Create input_layout description
// "input" - is the primitive id inside topology
input_layout input("input", in_layout);
// Create topology object with 2 primitives
cldnn::topology topology(
// 1. input layout primitive.
input,
// 2. reorder primitive with id "reorder_input"
reorder("reorder_input",
// input primitive for reorder (implicitly converted to primitive_id)
input,
// output layout for reorder
create_reordering_layout(format::yxfb, in_layout))
);
// Create data primitive - its content should be set already.
cldnn::data conv1_weights( "conv1_weights", conv1_weights_mem );
// Add primitive to topology
topology.add(conv1_weights);
// Emplace new primitive to topology
topology.add<cldnn::data>({ "conv1_bias", conv1_bias_mem });
// Emplace 2 primitives
topology.add(
// Convolution primitive with id "conv1"
convolution("conv1",
"reorder_input", // primitive id of the convolution's input
{ conv1_weights }, // weights primitive id is taken from the object
{ "conv1_bias" } // bias primitive id
),
// Pooling id: "pool1"
pooling("pool1",
"conv1", // Input: "conv1"
pooling_mode::max, // Pooling mode: MAX
spatial(2,2), // stride: 2
spatial(2,2) // kernel_size: 2
)
);
// Conv2 weights data is not available now, so just declare its layout
layout conv2_weights_layout(data_type, format::bfyx,{ conv2_out_channels, conv1_out_channels, conv_krnl_size, conv_krnl_size });
// Define the rest of topology.
topology.add(
// Input layout for conv2 weights. Data will passed by network::set_input_data()
input_layout("conv2_weights", conv2_weights_layout),
// Input layout for conv2 bias.
input_layout("conv2_bias", { data_type, format::bfyx, spatial(conv2_out_channels) }),
// Second convolution id: "conv2"
convolution("conv2",
"pool1", // Input: "pool1"
{ "conv2_weights" }, // Weights: input_layout "conv2_weights"
{ "conv2_bias" } // Bias: input_layout "conv2_bias"
),
// Second pooling id: "pool2"
pooling("pool2",
"conv2", // Input: "conv2"
pooling_mode::max, // Pooling mode: MAX
spatial(2, 2), // stride: 2
spatial(2, 2) // kernel_size: 2
),
// Fully connected (inner product) primitive id "fc1"
fully_connected("fc1",
"pool2", // Input: "pool2"
"fc1_weights", // "fc1_weights" will be added to the topology later
"fc1_bias", // will be defined later
true // Use built-in Relu. Slope is set to 0 by default.
),
// Second FC/IP primitive id: "fc2", input: "fc1".
// Weights ("fc2_weights") and biases ("fc2_bias") will be defined later.
// Built-in Relu is disabled by default.
fully_connected("fc2", "fc1", "fc2_weights", "fc2_bias"),
// The "softmax" primitive is not an input for any other,
// so it will be automatically added to network outputs.
softmax("softmax", "fc2")
);
return topology;
}
// Copy from a vector to cldnn::memory
void copy_to_memory(memory& mem, const vector<float>& src)
{
cldnn::pointer<float> dst(mem);
std::copy(src.begin(), src.end(), dst.begin());
}
// Execute network
int recognize_image(network& network, const memory& input_memory)
{
// Set/update network input
network.set_input_data("input", input_memory);
// Start network execution
auto outputs = network.execute();
// get_memory() blocks until output generation is completed
auto output = outputs.at("softmax").get_memory();
// Get direct access to output memory
cldnn::pointer<float> out_ptr(output);
// Analyze result
auto max_element_pos = max_element(out_ptr.begin(), out_ptr.end());
return static_cast<int>(distance(out_ptr.begin(), max_element_pos));
}
// User-defined helpers which are out of this example scope
// //////////////////////////////////////////////////////////////
// Loads file to a vector of floats.
vector<float> load_data(const string&) { return{ 0 }; }
// Allocates memory and loads data from file.
// Memory layout is taken from file.
memory load_mem(const engine& eng, const string&) {
//return a dummy value
return memory::allocate(eng, layout{ data_types::f32, format::bfyx, { 1, 1, 1, 1 } });
}
// Load image, resize to [x,y] and store in a vector of floats
// in the order "bfyx".
vector<float> load_image_bfyx(const string&, int, int) { return{ 0 }; }
// //////////////////////////////////////////////////////////////
int main()
{
// Use data type: float
auto data_type = type_to_data_type<float>::value;
// Network input layout
layout in_layout(
data_type, // stored data type
format::bfyx, // data stored in order batch-channel-Y-X, where X coordinate changes first.
{1, input_channels, input_size, input_size} // batch: 1, channels: 1, Y: 28, X: 28
);
// Create memory for conv1 weights
layout conv1_weights_layout(data_type, format::bfyx,{ conv1_out_channels, input_channels, conv_krnl_size, conv_krnl_size });
vector<float> my_own_buffer = load_data("conv1_weights.bin");
// The conv1_weights_mem is attached to my_own_buffer, so my_own_buffer must not be changed or destroyed until network execution completes.
auto conv1_weights_mem = memory::attach(conv1_weights_layout, my_own_buffer.data(), my_own_buffer.size());
// Create default engine
cldnn::engine engine;
// Create memory for conv1 bias
layout conv1_bias_layout(data_type, format::bfyx, spatial(20));
// Memory allocation requires engine
auto conv1_bias_mem = memory::allocate(engine, conv1_bias_layout);
// The memory is allocated by library, so we do not need to care about buffer lifetime.
copy_to_memory(conv1_bias_mem, load_data("conv1_bias.bin"));
// Get new topology
cldnn::topology topology = create_topology(in_layout, conv1_weights_mem, conv1_bias_mem);
// Define network data not defined in create_topology()
topology.add(
cldnn::data("fc1_weights", load_mem(engine, "fc1_weights.data")),
cldnn::data("fc1_bias", load_mem(engine, "fc1_bias.data")),
cldnn::data("fc2_weights", load_mem(engine, "fc2_weights.data")),
cldnn::data("fc2_bias", load_mem(engine, "fc2_bias.data"))
);
// Build the network. Allow implicit data optimizations.
// The "softmax" primitive is not used as an input for other primitives,
// so we do not need to explicitly select it in build_options::outputs()
cldnn::network network(engine, topology, { build_option::optimize_data(true) });
// Set network data which was not known at topology creation.
network.set_input_data("conv2_weights", load_mem(engine, "conv2_weights.data"));
network.set_input_data("conv2_bias", load_mem(engine, "conv2_bias.data"));
// Allocate memory for input image.
auto input_memory = memory::allocate(engine, in_layout);
// Run network 2 times with different images.
for (auto img_name : { "one.jpg", "two.jpg" })
{
// Reuse image memory.
copy_to_memory(input_memory, load_image_bfyx(img_name, in_layout.size.spatial[0], in_layout.size.spatial[1]));
auto result = recognize_image(network, input_memory);
cout << img_name << " recognized as " << result << endl;
}
return 0;
}
My GPU is an HD 630, but the list in
https://github.com/intel/clDNN/blob/master/src/caps/public/gpu_devices.inc
shows it as an HD 620:
GEN_DEVICE(HD620, 0x3E92, HD6XX, GEN9, GT2)
Also add 0x3E9B (as also reported in comment #47).
I think it would be better to review the complete list.
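For reference, the change suggested above would follow the existing entry pattern in gpu_devices.inc; a sketch (the HD630 label and the HD6XX/GEN9/GT2 field values are assumptions to verify against the actual file):

```cpp
// src/caps/public/gpu_devices.inc (sketch, not a verified entry)
GEN_DEVICE(HD630, 0x3E92, HD6XX, GEN9, GT2)  // relabel 0x3E92 as HD630
GEN_DEVICE(HD630, 0x3E9B, HD6XX, GEN9, GT2)  // add the missing 0x3E9B id
```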
Cheers,
Nikos
Hi guys,
Thank you for the great work bringing DNNs to Intel GPUs. I applaud this move to bring OpenCL into more DNN applications.
I'm looking for a Deep Q-Network (DQN and Double DQN) example where the user could connect to the Atari Learning Environment (ALE) in clDNN, but couldn't find anything similar. If one is not available, would you consider adding such an example? ALE is a critical application in my DNN research, and I would like to run DNNs on Intel GPUs.
Greetings Developers,
Are there plans to optimize for Xeon Phi coprocessors?
Is it possible to execute and test using MIC hardware?
Thanks,
Coast
Hi, clDNN developers, is there any in-progress work or a plan for ONNXIFI support?
I want to build the example code by linking against libclDNN64.so. How do I build it?
Is this planned to be supported? Alternatively, what work would need to be done to support it? We're looking at the i5-5250u because the HD Graphics 6000 seems to have good GFLOPS at a good price.
The variable img_name is not used correctly; "one.jpg" is always passed instead. The same problem exists in https://github.com/01org/clDNN/blob/6f5c9f231ae720c106670a153ab60469d5a6ff2f/tutorial/example_cldnn.cpp#L213.
The following link on this project's home page is broken:
https://01org.github.io/clDNN/index.html
Hi,
I'm trying to build the project on ClearLinux OS here are my environment details:
CMake version: 3.13.3
GCC version: gcc (Clear Linux OS for Intel Architecture) 8.2.1 20180502
Errors:
[ 53%] Built target api_test_builds
Scanning dependencies of target clDNN_shlib
[ 53%] Building CXX object src/CMakeFiles/clDNN_shlib.dir/graph_optimizer/add_required_reorders.cpp.o
In file included from /home/daniel/cldnn/src/include/layout_optimizer.h:31,
from /home/daniel/cldnn/src/include/pass_manager.h:21,
from /home/daniel/cldnn/src/graph_optimizer/add_required_reorders.cpp:21:
/home/daniel/cldnn/src/include/generic_layer.hpp: In constructor ‘cldnn::generic_layer::generic_layer(const dto*)’:
/home/daniel/cldnn/src/include/generic_layer.hpp:64:111: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
, generic_params(*static_cast<const kernel_selector::generic_kernel_params* const>(dto->generic_params))
^
cc1plus: all warnings being treated as errors
make[3]: *** [src/CMakeFiles/clDNN_shlib.dir/build.make:63: src/CMakeFiles/clDNN_shlib.dir/graph_optimizer/add_required_reorders.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:92: src/CMakeFiles/clDNN_shlib.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:215: tests/CMakeFiles/tests.dir/rule] Error 2
make: *** [Makefile:190: tests] Error 2
Any help would be deeply appreciated.
Is it possible to compile the distributed OpenCL kernels into a single bitstream file for an FPGA? Are there any plans to extend this repo to Intel FPGAs?
The formula for finding the output blockWidth in ConvolutionKernel_bfyx_os_iyx_osv16.cpp is shown below.
if (cp.stride.x == 1 && cp.stride.y == 1)
{
    if (cp.filterSize.x == 1 && cp.filterSize.y == 1)
    {
        option.blockWidth = 16;
        option.blockHeight = 1;
        option.prefetch = 4;
    }
    //if less than 16 values is required to compute one single row of output
    //then each WI shall compute one single row to maximize reuse within SIMD subgroup (this gives very nice performance results)
    else if (params.output.X().v + (cp.filterSize.x - 1)*cp.dilation.x < sub_group_size)
    {
        option.blockWidth = params.output.X().v;
        option.blockHeight = 1;
        option.prefetch = 4;
    }
    else if (cp.filterSize.x < 5 && cp.filterSize.y < 5)
    {
        option.blockWidth = sub_group_size - cp.filterSize.x + 1;
        option.blockHeight = 2;
        option.prefetch = 4;
    }
    else
    {
        option.blockWidth = 4;
        option.blockHeight = 3;
        option.prefetch = 4;
    }
}
else if (cp.stride.x == 2 && cp.stride.y == 2)
{
    option.blockWidth = 5;
    option.blockHeight = 4;
    option.prefetch = 4;
}
else
{
    option.blockWidth = 4;
    option.blockHeight = 3;
    option.prefetch = 5;
    //run_info.effiency = FORCE_PRIORITY_7; // GEMM is better
}
I wonder why the output blockWidth is 4 when the stride is greater than 2. How can I calculate the output width?
Hello,
my environment of hardware:
[ OK ] Processor name: Intel(R) Xeon(R) CPU E3-1585 v5 @ 3.50GHz
[ INFO ] Intel Processor
[ INFO ] Processor brand: Xeon
[ INFO ] Processor arch: Skylake
OS readiness checks:
[ INFO ] GPU PCI id : 193A
[ INFO ] GPU description: SKL SRV GT4e
[ OK ] GPU visible to OS
[ INFO ] no nomodeset in GRUB cmdline (good)
[ INFO ] Linux distro : Ubuntu 16.04
[ INFO ] Linux kernel : 4.13.0-32-generic
[ INFO ] glibc version : 2.23
[ INFO ] Linux distro suitable for Generic install
[ INFO ] gcc version : 20160609 (>=4.8.2 suggested)
Media Server Studio Install:
[ OK ] user in video group
[ ERROR ] libva.so.1 not found. Check LD_LIBRARY_PATH contains '/usr/lib64;/usr/local/lib'
[ ERROR ] libva not loading Intel iHD
[ ERROR ] vainfo not reporting codec entry points
[ INFO ] i915 driver in use by Intel video adapter
[ ERROR ] no libva include files. Are Intel components installed?
Component Smoke Tests:
[ ERROR ] no Media SDK include files. Are Intel components installed?
[ OK ] OpenCL check:platform:Intel(R) OpenCL GPU OK CPU OK
platform:Experimental OpenCL 2.1 CPU Only Platform GPU OK CPU OK
When I execute the tests and tutorial, I get the following error:
terminate called after throwing an instance of 'cldnn::error'
what(): failed to create engine: Device lookup failed - unsupported device id: 0x193A. Note: HD5xx+
devices are supported
Aborted (core dumped)
tests/CMakeFiles/tests.dir/build.make:904: recipe for target 'out/Linux64/Release/tests64' failed
make[2]: *** [out/Linux64/Release/tests64] Error 134
make[2]: *** Deleting file 'out/Linux64/Release/tests64'
CMakeFiles/Makefile2:197: recipe for target 'tests/CMakeFiles/tests.dir/all' failed
make[1]: *** [tests/CMakeFiles/tests.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
Thank you very much
If I have googlenet.prototxt, how can I convert to clDNN?
Thanks a lot.
I am running OpenVINO on an Intel i5-9600K with UHD Graphics 630,
but I get "[ ERROR ] failed to create engine: Device lookup failed - unsupported device id: 0x3E98. Note: HD5xx+ devices are supported".
Is this error related to the driver version?
I would like to use OpenVINO on a Core i7-8750H.
I added the device ID (0x3E9B) to gpu_devices.inc.
clDNN's TESTS passed.
However, even after replacing OpenVINO's cldnn64.dll, the program cannot be executed.
(Reference URL)
https://ark.intel.com/ja/products/134906/Intel-Core-i7-8750H-Processor-9M-Cache-up-to-4-10-GHz-
Hello,
I ran Chapter 5 of the tutorial and got each kernel's timing, but the timings do not seem correct:
fc:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
fc_bias:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
fc_weights:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
input:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
relu:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
softmax:
submission: 0 nanoseconds
starting: 0 nanoseconds
executing: 0 nanoseconds
My environment is Ubuntu 14.04 with Beignet driver.
The device information is as the following.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 beignet 1.3 (git-8bd8c3a)
Platform Name: Intel Gen OCL Driver
Platform Vendor: Intel
Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 72
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 8
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 1000Mhz
Address bits: 32
Max memory allocation: 3221225472
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 8192
Max image 3D height: 8192
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 8192
Global memory size: 4294967296
Constant buffer size: 134217728
Max number of constant args: 8
Local memory type: Local
Local memory size: 65536
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fc8d642ebe0
Name: Intel(R) HD Graphics Skylake Server GT4
Vendor: Intel
Device OpenCL C version: OpenCL C 1.2 beignet 1.3 (git-8bd8c3a)
Driver version: 1.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 beignet 1.3 (git-8bd8c3a)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_fp16
Could you identify the issue and give me some suggestions?
Best Regards
I am using the following variables to enable some tests:
-DCLDNN__RUN_TESTS:BOOL=ON -DCLDNN__INCLUDE_TESTS:BOOL=ON
About 125 tests pass, but there is a warning that more than 1000 tests have been disabled. What is the reason?
The build failure log is below. Solved by adding #include <cmath> in src/gpu/kernel.h.
[patch]
diff --git a/src/gpu/kernel.h b/src/gpu/kernel.h
index 5a89e4e..b6ce0a5 100644
--- a/src/gpu/kernel.h
+++ b/src/gpu/kernel.h
@@ -25,6 +25,7 @@
#include
#include
+#include <cmath>
namespace neural { namespace gpu {
[log]
[ 1%] Building CXX object src/CMakeFiles/clDNN_shlib.dir/network.cpp.o
In file included from /home/cv/cldnn/src/network.cpp:29:0:
/home/cv/cldnn/src/gpu/kernel.h: In function ‘std::string neural::gpu::to_code_string(T) [with T = float; std::string = std::basic_string]’:
/home/cv/cldnn/src/gpu/kernel.h:69:9: error: ‘isinf’ is not a member of ‘std’
if (std::isinf(val))
^
/home/cv/cldnn/src/gpu/kernel.h:70:61: error: ‘signbit’ is not a member of ‘std’
std::snprintf(buffer, sizeof(buffer), "%sINFINITY", std::signbit(val) ? "-" : "");
^
/home/cv/cldnn/src/gpu/kernel.h: In function ‘std::string neural::gpu::to_code_string(T) [with T = double; std::string = std::basic_string]’:
/home/cv/cldnn/src/gpu/kernel.h:80:9: error: ‘isinf’ is not a member of ‘std’
if (std::isinf(val))
^
/home/cv/cldnn/src/gpu/kernel.h:81:61: error: ‘signbit’ is not a member of ‘std’
std::snprintf(buffer, sizeof(buffer), "%sINFINITY", std::signbit(val) ? "-" : "");
^
make[2]: *** [src/CMakeFiles/clDNN_shlib.dir/network.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/clDNN_shlib.dir/all] Error 2
make: *** [all] Error 2
It seems like the "System Requirements" section on the main GitHub page is a bit misleading. It says that clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for Skylake and Apollo Lake, when in fact it does not support anything older than HD 5xx.
The following exception is thrown when attempting to create an engine on an i7-4712HQ:
Device lookup failed - unsupported device id: 0x416. Note: HD5xx+ devices are supported
Are there any plans to support HD4 and older ?
After running this command to build:
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
the following errors occur. Does anyone have any idea how to solve this? The full log is at the bottom.
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp: In member function ‘KernelSelector::Tensor::DataTensor KernelSelector::Tensor::DataTensor::FlattenFeatureAndSpatials() const’:
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:128:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:129:13: note: here
case Tensor::bfyx:
^~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:137:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:138:13: note: here
case Tensor::byxf:
^~~~
cc1plus: all warnings being treated as errors
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o] Error 1
make[1]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [all] Error 2
----------------------------------------------------Log-----------------------------------------------------------
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
-- The C compiler identification is GNU 7.2.1
-- The CXX compiler identification is GNU 7.2.1
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc
-- Check for working C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++
-- Check for working CXX compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.5", minimum required is "2.7")
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.3.8.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "/home/up2/cldnn/build/out/Linux64/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "/home/up2/cldnn/build/out/Linux64/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Linux64 (detected: Linux64)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: /home/up2/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/include
-- [clDNN] + Static libs: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/lib/x64
-- [clDNN] + Shared libs: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN] + Libs to link: /home/up2/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: /home/up2/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: /home/up2/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: /home/up2/cldnn/common/boost/1.64.0/linux/x64/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- [clDNN] Selected capabilities: public
-- Configuring done
-- Generating done
-- Build files have been written to: /home/up2/cldnn/build
[ 0%] Generating ks_primitive_db.inc ...
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/activation_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/concatenation_gpu_depth_bfyx_no_pitch.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/concatenation_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_1x1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_1x1_hgemm_buf_16x1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_3x3_dw_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_direct_10_12_16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_direct_8_8_16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_gemm_like_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_gemm_like_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_os_iyx_osv16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_bfyx_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_winograd_2x3_s1_fused.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b16_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b16_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b1_block_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b1_block_multiple_x_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_gpu_yxfb_yxio_b8_fp32.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/convolution_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/deconvolution_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/eltwise_simple_vload8.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_gemm.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_input_spatial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bf_io_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bfyx_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv16_af8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv16_b1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_bs_f_bsv8_af8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_b8_f8.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_b8_f8_vload.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_block_fp16.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_oi_b8_fp32_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_oi_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_image_tutorial.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/fully_connected_gpu_yxfb_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/generic_eltwise_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_multiple_features.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_across_channel_yxfb_b8_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_within_channel.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_gpu_within_channel_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/lrn_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/normalize_gpu_across_spatial_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/normalize_gpu_within_spatial_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/permute_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_average_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_bfyx_block_opt.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/pooling_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/region_yolo_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_data.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_data_fast_b1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_from_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_to_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights_image_2d_c4_fyx_b.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorder_weights_winograd_2x3_s1.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reorg_yolo_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/reshape_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/roi_pooling_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_bf.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_fb.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_items_class_optimized.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/softmax_gpu_ref.cl
processing /home/up2/cldnn/kernel_selector/core/cl_kernels/upsampling_ref.cl
[ 1%] Updating file if the file changed (ks_primitive_db.inc) ...
Scanning dependencies of target cldnn_kernel_selector
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/auto_tuner.cpp.o
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/auto_tuner_offline.cpp.o
[ 2%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_base.cpp.o
[ 3%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector.cpp.o
[ 3%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector_common.cpp.o
[ 4%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/kernel_selector_params.cpp.o
[ 4%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp: In member function ‘KernelSelector::Tensor::DataTensor KernelSelector::Tensor::DataTensor::FlattenFeatureAndSpatials() const’:
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:128:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:129:13: note: here
case Tensor::bfyx:
^~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:137:30: error: this statement may fall through [-Werror=implicit-fallthrough=]
targetLayout = Tensor::fb;
~~~~~~~~~~~~~^~~~~~~~
/home/up2/cldnn/kernel_selector/common/tensor_type.cpp:138:13: note: here
case Tensor::byxf:
^~~~
cc1plus: all warnings being treated as errors
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/common/tensor_type.cpp.o] Error 1
make[1]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [all] Error 2
Do you have perf data for classical models on clDNN?
I ran the inception_v3 model with the Intel Inference Engine (clDNN plugin); it takes 600+ ms per inference, which is no better than TensorFlow inference on CPU. Here is my data:
InferenceEngine:
API version ............ 1.0
Build .................. 5852
[ INFO ] Parsing input parameters
[ INFO ] No extensions provided
[ INFO ] Loading plugin
API version ............ 0.1
Build .................. prod-02709
Description ....... clDNNPlugin
[ INFO ] Loading network files
[ INFO ] Preparing input blobs
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Start inference (50 iterations)
Average running time of one iteration: 624.855 ms
Perfomance counts:
InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 12069 cpu: 598 execType: GPU
InceptionV3/InceptionV3/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_2a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 15137 cpu: 526 execType: GPU
InceptionV3/InceptionV3/Conv2d_2a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_2b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 29440 cpu: 455 execType: GPU
InceptionV3/InceptionV3/Conv2d_2b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_3b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7561 cpu: 300 execType: GPU
InceptionV3/InceptionV3/Conv2d_3b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Conv2d_4a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 33730 cpu: 230 execType: GPU
InceptionV3/InceptionV3/Conv2d_4a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/MaxPool_3a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 2311 cpu: 391 execType: GPU
InceptionV3/InceptionV3/MaxPool_5a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 1671 cpu: 179 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2375 cpu: 219 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 1834 cpu: 392 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5907 cpu: 306 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_1/Conv2d_0b_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2413 cpu: 655 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3698 cpu: 568 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4711 cpu: 480 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2182 cpu: 819 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 1211 cpu: 779 execType: GPU
InceptionV3/InceptionV3/Mixed_5b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5b/concat:EXECUTED layerType: Concat realTime: 40 cpu: 570 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2914 cpu: 441 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2186 cpu: 623 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5912 cpu: 531 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_1/Conv_1_0c_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2892 cpu: 294 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3685 cpu: 208 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4769 cpu: 697 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2813 cpu: 488 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2868 cpu: 382 execType: GPU
InceptionV3/InceptionV3/Mixed_5c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5c/concat:EXECUTED layerType: Concat realTime: 71 cpu: 167 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3122 cpu: 181 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2345 cpu: 358 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5956 cpu: 275 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_1/Conv2d_0b_5x5/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3156 cpu: 657 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3694 cpu: 534 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4721 cpu: 453 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_2/Conv2d_0c_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 3132 cpu: 812 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3102 cpu: 743 execType: GPU
InceptionV3/InceptionV3/Mixed_5d/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_5d/concat:EXECUTED layerType: Concat realTime: 71 cpu: 340 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 24400 cpu: 419 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_0/Conv2d_1a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3135 cpu: 220 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3665 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2318 cpu: 106 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/Branch_1/Conv2d_1a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6a/Branch_2/MaxPool_1a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 745 cpu: 289 execType: GPU
InceptionV3/InceptionV3/Mixed_6a/concat:EXECUTED layerType: Concat realTime: 116 cpu: 325 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6320 cpu: 174 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4141 cpu: 338 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2883 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7669 cpu: 232 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4142 cpu: 164 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5154 cpu: 105 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3091 cpu: 526 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5169 cpu: 482 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4263 cpu: 429 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2095 cpu: 279 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6312 cpu: 223 execType: GPU
InceptionV3/InceptionV3/Mixed_6b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6b/concat:EXECUTED layerType: Concat realTime: 54 cpu: 457 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6532 cpu: 226 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5200 cpu: 384 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4348 cpu: 329 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 9439 cpu: 277 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5565 cpu: 223 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8005 cpu: 170 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4347 cpu: 115 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8015 cpu: 477 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5126 cpu: 441 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2143 cpu: 395 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6431 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_6c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6c/concat:EXECUTED layerType: Concat realTime: 74 cpu: 428 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6293 cpu: 213 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5170 cpu: 377 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4362 cpu: 326 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 9445 cpu: 266 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5196 cpu: 249 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7879 cpu: 198 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4372 cpu: 197 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7889 cpu: 151 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5217 cpu: 414 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2163 cpu: 365 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6348 cpu: 303 execType: GPU
InceptionV3/InceptionV3/Mixed_6d/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6d/concat:EXECUTED layerType: Concat realTime: 72 cpu: 476 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6243 cpu: 249 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6411 cpu: 432 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6050 cpu: 356 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11285 cpu: 301 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6293 cpu: 324 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11011 cpu: 274 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0b_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6066 cpu: 210 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0c_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11170 cpu: 159 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0d_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6037 cpu: 106 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_2/Conv2d_0e_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 2175 cpu: 440 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6242 cpu: 380 execType: GPU
InceptionV3/InceptionV3/Mixed_6e/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_6e/concat:EXECUTED layerType: Concat realTime: 49 cpu: 98 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6297 cpu: 204 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3030 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_0/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6316 cpu: 474 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6024 cpu: 384 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0b_1x7/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 11242 cpu: 323 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_0c_7x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2440 cpu: 245 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/Branch_1/Conv2d_1a_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7a/Branch_2/MaxPool_1a_3x3/MaxPool:EXECUTED layerType: Pooling realTime: 580 cpu: 509 execType: GPU
InceptionV3/InceptionV3/Mixed_7a/concat:EXECUTED layerType: Concat realTime: 48 cpu: 419 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3312 cpu: 125 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4425 cpu: 315 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2448 cpu: 233 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4386 cpu: 272 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_1/Conv2d_0b_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_1/concat:EXECUTED layerType: Concat realTime: 26 cpu: 171 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4623 cpu: 301 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 8054 cpu: 247 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2454 cpu: 140 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0c_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4342 cpu: 191 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_2/Conv2d_0d_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/Branch_2/concat:EXECUTED layerType: Concat realTime: 28 cpu: 379 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 1040 cpu: 369 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2452 cpu: 351 execType: GPU
InceptionV3/InceptionV3/Mixed_7b/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7b/concat:EXECUTED layerType: Concat realTime: 18 cpu: 324 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 5353 cpu: 302 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 6703 cpu: 264 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2454 cpu: 163 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0b_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4341 cpu: 205 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_1/Conv2d_0c_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_1/concat:EXECUTED layerType: Concat realTime: 27 cpu: 342 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7212 cpu: 158 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0a_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 7973 cpu: 112 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0b_3x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 2475 cpu: 372 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0c_1x3/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 4524 cpu: 399 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_2/Conv2d_0d_3x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/Branch_2/concat:EXECUTED layerType: Concat realTime: 26 cpu: 314 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/AvgPool_0a_3x3/AvgPool:EXECUTED layerType: Pooling realTime: 1547 cpu: 281 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/batchnorm/mul:EXECUTED layerType: Convolution realTime: 3882 cpu: 224 execType: GPU
InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/Relu:OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: None
InceptionV3/InceptionV3/Mixed_7c/concat:EXECUTED layerType: Concat realTime: 17 cpu: 187 execType: GPU
InceptionV3/Logits/AvgPool_1a_8x8/AvgPool:EXECUTED layerType: Pooling realTime: 565 cpu: 149 execType: GPU
InceptionV3/Logits/Conv2d_1c_1x1/convolution:EXECUTED layerType: Convolution realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Logits/SpatialSqueeze:EXECUTED layerType: Reshape realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Predictions/Reshape:EXECUTED layerType: Reshape realTime: 15174 cpu: 98 execType: GPU
InceptionV3/Predictions/Reshape_1:EXECUTED layerType: Reshape realTime: 32 cpu: 798 execType: GPU
InceptionV3/Predictions/Reshape_1_cldnn_output_postprocess:EXECUTED layerType: Reorder realTime: 6 cpu: 784 execType: GPU
InceptionV3/Predictions/Softmax:EXECUTED layerType: SoftMax realTime: 32 cpu: 798 execType: GPU
input_cldnn_input_preprocess: EXECUTED layerType: Reorder realTime: 1211 cpu: 656 execType: GPU
scale: NOT_RUN layerType: Power realTime: 0 cpu: 0 execType: None
Total time: 645521 microseconds
[ INFO ] Processing output blobs
Top 10 results:
Image ./grace_hopper_299.bmp
715 1.0000000 label #715
111 0.0000000 label #111
711 0.0000000 label #711
917 0.0000000 label #917
949 0.0000000 label #949
503 0.0000000 label #503
983 0.0000000 label #983
853 0.0000000 label #853
35 0.0000000 label #35
615 0.0000000 label #615
[ INFO ] Execution successfull
cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .. && make
DL@2030006696-SOH MINGW64 ~/cldnn/build
$ cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .. && make
-- The C compiler identification is GNU 8.2.1
-- The CXX compiler identification is GNU 8.2.1
-- Check for working C compiler: C:/msys64/mingw64/bin/gcc.exe
-- Check for working C compiler: C:/msys64/mingw64/bin/gcc.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: C:/msys64/mingw64/bin/g++.exe
-- Check for working CXX compiler: C:/msys64/mingw64/bin/g++.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: C:/msys64/usr/bin/python2.7.exe (found suitable version "2.7.15", minimum required is "2.7")
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.4.22.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "C:/msys64/home/DL/cldnn/build/out/Windows32/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "C:/msys64/home/DL/cldnn/build/out/Windows32/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Windows32 (detected: Windows32)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build core internal tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN] - Run core internal tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/include
-- [clDNN] + Static libs: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/lib/x86
-- [clDNN] + Shared libs: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/bin/x86
-- [clDNN] + Libs to link: C:/msys64/home/DL/cldnn/common/intel_ocl_icd/6.3/windows/Release/lib/x86
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: C:/msys64/home/DL/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: C:/msys64/home/DL/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: C:/msys64/home/DL/cldnn/common/boost/1.64.0/windows/x86/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- [clDNN] Selected capabilities: public
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- [clDNN] Selected capabilities: public
-- Configuring done
-- Generating done
-- Build files have been written to: C:/msys64/home/DL/cldnn/build
[ 0%] Generating ks_primitive_db.inc ...
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_opt.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_ref.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/activation_tutorial.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/arg_max_min_axis.cl
processing C:/msys64/home/DL/cldnn/kernel_selector/core/cl_kernels/arg_max_min_gpu_ref.cl
..
..
..
..
..
[ 35%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_fb.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_items_class_optimized.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_ref.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax/softmax_kernel_selector.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_base.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_ref.cpp.obj
[ 36%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/softmax_loss_grad/softmax_loss_grad_kernel_selector.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/tile/tile_kernel_ref.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/tile/tile_kernel_selector.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_base.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_ref.cpp.obj
[ 37%] Building CXX object kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/core/actual_kernels/upsampling/upsampling_kernel_selector.cpp.obj
[ 37%] Linking CXX static library ../out/Windows32/Release/libcldnn_kernel_selector32.a
Error copying file (if different) from "C:/msys64/home/DL/cldnn/kernel_selector/core/cache/cache.json" to "C:/msys64/home/DL/cldnn/build/out/Windows32/Release/Release/".
make[2]: *** [kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/build.make:3965: out/Windows32/Release/libcldnn_kernel_selector32.a] Error 1
make[2]: *** Deleting file 'out/Windows32/Release/libcldnn_kernel_selector32.a'
make[1]: *** [CMakeFiles/Makefile2:313: kernel_selector/CMakeFiles/cldnn_kernel_selector.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
This check tests for clang and, when it matches, tries to link against libc++ and related libraries without verifying that this is actually the correct runtime to use.
https://github.com/intel/clDNN/blob/master/CMakeLists.txt#L1055
Hello,
My OS is Ubuntu 16.04.
cmake version is 3.7.2.
Intel graphics driver is SRB5.
Intel OpenCL SDK is 1.2-7.0.
When I run cmake -DCMAKE_BUILD_TYPE=Release .., I get the following error:
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
[clDNN] CLDNN__ARCHITECTURE_TARGET: Target architecture is not specified. Trying to deduce it from context.
-- Found PythonInterp: /usr/bin/python2.7 (found suitable version "2.7.12", minimum required is "2.7")
CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)
CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)
CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)
CMake Warning at /usr/local/share/cmake-3.7/Modules/FindBoost.cmake:761 (message):
Imported targets not available for Boost version 106400
Call Stack (most recent call first):
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:865 (_Boost_COMPONENT_DEPENDENCIES)
/usr/local/share/cmake-3.7/Modules/FindBoost.cmake:1454 (_Boost_MISSING_DEPENDENCIES)
CMakeLists.txt:596 (find_package)
-- Boost version: 1.64.0
-- Found the following Boost libraries:
-- system
-- date_time
-- program_options
-- filesystem
-- [clDNN] ======================== clDNN Project =======================
-- [clDNN] Version: 1.3.8.0
-- [clDNN]
-- [clDNN] Build type: Release (for single-configuration generators)
-- [clDNN] Av. build types: Debug;Release (for multi-configuration generators)
-- [clDNN]
-- [clDNN] Output bin directory:
-- [clDNN] - "/home/user1/cldnn/build/out/Linux64/Release"
-- [clDNN] Output lib directory:
-- [clDNN] - "/home/user1/cldnn/build/out/Linux64/Release"
-- [clDNN] Architecture:
-- [clDNN] - target: Linux64 (detected: Linux64)
-- [clDNN]
-- [clDNN]
-- [clDNN] Advanced:
-- [clDNN] - ICD version used to build: 6.3
-- [clDNN] - boost ver. used to build: 1.64.0
-- [clDNN]
-- [clDNN] - Include/Build cldnn core: ON
-- [clDNN] - Include/Build kernel selector: ON
-- [clDNN] - Include/Build tests: ON
-- [clDNN] - Include/Build tutorial: ON
-- [clDNN]
-- [clDNN] - Run tests: OFF
-- [clDNN]
-- [clDNN] - Use static C++ Runtime: OFF
-- [clDNN] - Allow unsafe size opts: ON
-- [clDNN] - CMake debug trace: OFF
-- [clDNN]
-- [clDNN]
-- [clDNN] ICD:
-- [clDNN] - Root: /home/user1/cldnn/common/intel_ocl_icd/6.3
-- [clDNN] + Headers: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/include
-- [clDNN] + Static libs: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/lib/x64
-- [clDNN] + Shared libs: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN] + Libs to link: /home/user1/cldnn/common/intel_ocl_icd/6.3/linux/Release/bin/x64
-- [clDNN]
-- [clDNN] boost libraries:
-- [clDNN] - Root: /home/user1/cldnn/common/boost/1.64.0
-- [clDNN] + Headers: /home/user1/cldnn/common/boost/1.64.0/include/boost-1_64
-- [clDNN] + Libs to link: /home/user1/cldnn/common/boost/1.64.0/linux/x64/lib
-- [clDNN] =============================================================================
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14
-- Performing Test CLDNN__COMPILER_SUPPORTS_CXX14 - Success
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- [clDNN] Selected capabilities: public
-- Configuring done
CMake Error at src/CMakeLists.txt:191 (add_library):
Target "clDNN_shlib" links to target "Boost::filesystem" but the target was
not found. Perhaps a find_package() call is missing for an IMPORTED
target, or an ALIAS target is missing?
CMake Error at src/CMakeLists.txt:191 (add_library):
Target "clDNN_shlib" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at tests/CMakeLists.txt:123 (add_executable):
Target "tests" links to target "Boost::filesystem" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at tests/CMakeLists.txt:123 (add_executable):
Target "tests" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
CMake Error at tutorial/CMakeLists.txt:60 (add_executable):
Target "tutorial" links to target "Boost::filesystem" but the target was
not found. Perhaps a find_package() call is missing for an IMPORTED
target, or an ALIAS target is missing?
CMake Error at tutorial/CMakeLists.txt:60 (add_executable):
Target "tutorial" links to target "Boost::system" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?
-- Generating done
-- Build files have been written to: /home/user1/cldnn/build
Please help me. I cannot find any solutions. Thank you very much.
Hi,
As described in the example MNIST network, the size of the convolution's weights memory is set to { out_channels, in_channels, kernel_size, kernel_size } when groups is 1. My question is: when groups is not 1, is the size still the same?
Thanks.
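For reference, the common grouped-convolution convention in most frameworks is that each weights tensor shrinks along the input-channel axis to in_channels/groups; in clDNN's API grouping was historically expressed via the split/depthwise path with one weights memory per group, in which case the output-channel axis shrinks as well. The helper below is hypothetical (not part of the clDNN API) and sketches the per-group shape under that assumption:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the clDNN API): computes the dimensions of
// ONE per-group weights tensor under the common grouped-convolution convention
// { out_channels/groups, in_channels/groups, kernel_size, kernel_size }.
std::array<std::size_t, 4> group_weights_dims(std::size_t out_channels,
                                              std::size_t in_channels,
                                              std::size_t kernel_size,
                                              std::size_t groups) {
    // Channel counts must divide evenly among the groups.
    assert(out_channels % groups == 0 && in_channels % groups == 0);
    return { out_channels / groups, in_channels / groups,
             kernel_size, kernel_size };
}
```

With groups == 1 this reduces to the MNIST example's { out_channels, in_channels, kernel_size, kernel_size }; e.g. a 32-to-64-channel 3x3 convolution with groups == 2 would give { 32, 16, 3, 3 } per group.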
Could you please comment on the following in README.md
"
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel® a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
"
Is this part of the license? Could you please add a license file to the top level directory which covers everything in the repository?
I try to compile on macOS. I have installed boost by package manager Homebrew, but Cmake can't find boost. Here is error message:
Could not find the following static Boost libraries:
boost_system
boost_date_time
boost_program_options
boost_filesystem
No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the
directory containing Boost libraries or BOOST_ROOT to the location of
Boost.
Call Stack (most recent call first):
CMakeLists.txt:577 (find_package)
CMake Error at CMakeCompilerLinkerOpts.txt:328 (message):
[clDNN] Unknown compiler. Please define support for it or use different
compiler.
Call Stack (most recent call first):
CMakeLists.txt:709 (include)
https://github.com/intel/clDNN/blob/master/CMakeLists.txt#L104 causes an issue when a system compiles one binary with AVX2 and another with AVX512. Because CMAKE_BINARY_DIR is not used, whichever binary was compiled last overrides the former. When compiling with different instruction sets or optimizations, distros create build_avx2 or build_avx512 directories and point CMAKE_BINARY_DIR at one of them, so that different build types do not conflict.
The solution here is NOT to reinvent the wheel by directing binary output to places other than CMAKE_BINARY_DIR.
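The per-instruction-set layout described above can be sketched as follows (assuming a CMake 3.13+ `-S`/`-B` invocation; the ISA compiler flags are illustrative, not taken from clDNN's scripts):

```shell
# Configure two independent build trees; each gets its own CMAKE_BINARY_DIR,
# so the avx2 and avx512 binaries never overwrite each other.
cmake -S . -B build_avx2   -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-mavx2"
cmake -S . -B build_avx512 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-mavx512f"
cmake --build build_avx2
cmake --build build_avx512
```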
I'm trying to build OpenVINO 2018 R5 with Drop 12.1 (since Intel's distribution contains an earlier version of clDNN, something before Drop 11, which features a horrendous memory leak). Due to the absence of Intel Graphics on my CPU, the graphics driver refuses to install, which results in a clDNNPlugin linking error against OpenCL.lib.
I've traced the issue to clDNN's build script:
Lines 234 to 239 in f91d7d8
Although the script adds OpenCL.lib as a public link library, it does not propagate the corresponding link directory to consumers in the same way (I'm not sure what happens to link_directories from the root script, though it is not respected by OpenVINO's scripts, and I don't think it is good practice to propagate a target's dependencies via include_directories, link_directories and similar global commands).
I'm not sure this is the best place for a fix (moreover, I think it would be better to create an imported target for OpenCL), but it definitely resolves the link issue:
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 6313d50..10c5b88 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -237,6 +237,9 @@ target_link_libraries("${CLDNN_BUILD__PROJ}"
Boost::system
cldnn_kernel_selector
)
+target_link_directories("${CLDNN_BUILD__PROJ}"
+ INTERFACE ${CLDNN__IOCL_ICD_LIBDIRS}
+ )
if(WIN32)
target_link_libraries("${CLDNN_BUILD__PROJ}" setupapi)
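The imported-target alternative mentioned above could look roughly like this. This is a sketch, not taken from clDNN's scripts; `CLDNN__IOCL_ICD_LIBDIRS` is the variable used in the diff, while `CLDNN__IOCL_ICD_INCDIRS` is an assumed name for the matching header path:

```cmake
# Sketch: wrap the OpenCL ICD in an IMPORTED target so include and link
# requirements propagate to consumers automatically as usage requirements,
# instead of via global include_directories/link_directories commands.
add_library(OpenCL::ICD UNKNOWN IMPORTED)
set_target_properties(OpenCL::ICD PROPERTIES
    IMPORTED_LOCATION "${CLDNN__IOCL_ICD_LIBDIRS}/OpenCL.lib"
    INTERFACE_INCLUDE_DIRECTORIES "${CLDNN__IOCL_ICD_INCDIRS}"
)
target_link_libraries("${CLDNN_BUILD__PROJ}" PUBLIC OpenCL::ICD)
```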
I'm new to clDNN. Does clDNN need the OpenCL SDK in order to run?
As per https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics, padding can be achieved by having a output_padding set to the layer.
I have a net which is like conv1 -> pool1 -> conv2 -> pool -> fc1 -> fc2 -> softmax
When I put output_padding on the pool1 layer and run the net only up to that point, I can see the output being padded correctly for pool1. However, when I connect pool1 (with output_padding) to conv2, it doesn't seem to pad the data.
I also tried putting an explicit reorder with output_padding between pool1 and conv2; it still doesn't seem to pad the output of pool1.
I think the prediction speed of clDNN is generally very good, and it outperforms MKL on the same processor for many operations I have tested. But the deconvolution operation seems to be very slow.
On Core i3-6100 and i5-6500, deconvolution takes approximately 40-50 times longer with clDNN than with MKL in my tests. That is such a big difference that I don't think it is caused simply by a lack of optimization.
See attached test case for details of how I measured it.
speed.zip
kernel_selector/core/common/primitive_db.cpp is missing #include <stdexcept> and thus does not compile with VS 2019 due to undeclared std::runtime_error.
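A minimal illustration of the fix: std::runtime_error is declared in <stdexcept>, which some standard-library implementations pull in transitively through other headers while stricter ones (such as the VS 2019 library) do not. The function and message below are illustrative, not from primitive_db.cpp:

```cpp
#include <stdexcept>  // declares std::runtime_error; without this explicit
                      // include, strict library implementations reject the code
#include <string>

// Illustrative function: throws on failure, mirroring the pattern
// that fails to compile when <stdexcept> is missing.
std::string describe(bool ok) {
    if (!ok)
        throw std::runtime_error("lookup failed");
    return "ok";
}
```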
How can we compile tutorial/main.cpp? Please show me the command.
After running the command "$ make tests", 8 tests are shown as failed, along with this error:
tests/CMakeFiles/tests.dir/build.make:867: recipe for target 'build/out/Linux64/Debug/tests64' failed
What could be the reason?
Log:
[----------] Global test environment tear-down
[==========] 525 tests from 89 test cases ran. (151136 ms total)
[ PASSED ] 517 tests.
[ FAILED ] 8 tests, listed below:
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz2x2_in2x2x1x2_bfyx_stride2_pad1_fwd_backw
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz1x1_in1x2x5x5_bfyx_stride2_pad1
[ FAILED ] convolution_grad_weights_f32_fw_gpu.basic_wsiz2x2_in32x1x2x2_yxfb_stride1
[ FAILED ] memory_pool.basic_non_padded_relu_pipe
[ FAILED ] memory_pool.basic_non_padded_relu_and_pooling_pipe
[ FAILED ] memory_pool.multi_outputs_network
[ FAILED ] memory_pool.shared_mem_pool_same_topology_twice
[ FAILED ] memory_pool.shared_mem_pool_same_topology_twice_weights
8 FAILED TESTS
YOU HAVE 17245 DISABLED TESTS
tests/CMakeFiles/tests.dir/build.make:867: recipe for target 'build/out/Linux64/Debug/tests64' failed
make[3]: *** [build/out/Linux64/Debug/tests64] Error 1
make[3]: *** Deleting file 'build/out/Linux64/Debug/tests64'
CMakeFiles/Makefile2:202: recipe for target 'tests/CMakeFiles/tests.dir/all' failed
make[2]: *** [tests/CMakeFiles/tests.dir/all] Error 2
CMakeFiles/Makefile2:214: recipe for target 'tests/CMakeFiles/tests.dir/rule' failed
make[1]: *** [tests/CMakeFiles/tests.dir/rule] Error 2
Makefile:190: recipe for target 'tests' failed
make: *** [tests] Error 2
Hi clDNN team! I recently looked into your convolution code and found that, except in the Winograd algorithm, the conv2d primitive doesn't use any __local memory, which should be the fastest GPU cache. I ran on an Intel Gen9 GPU and the convolution is still pretty fast. I'm still studying the story behind the performance, and it would be great if you could share any insights.
Hi! The clDNN documentation says that users can use the primitive set to build and execute the most common image recognition, semantic segmentation, and object detection network topologies. But I could not find the relevant files in the project. Can you tell me where to find them? Thanks!
Getting
primitive add failed: basic_string::_S_construct null not valid
while trying to replace the original OpenVINO libclDNN64.so with a Drop 12.0 build (due to a horrendous memory leak at ~1.5 MB/s).
Linux build fails against commit 02add7c.
My Linux box is Ubuntu 16.04.5.
I build clDNN with mkdir build; cd build; cmake ..; make
The error message is:
[ 51%] Built target cldnn_kernel_selector
make[2]: Circular codegen/test_builds/api_c_test.c <- codegen/test_builds/api_c_test.c dependency dropped.
make[2]: Circular codegen/test_builds/api_cpp_test.cpp <- codegen/test_builds/api_cpp_test.cpp dependency dropped.
make[2]: Circular codegen/test_builds/api_cpp_test.cpp <- codegen/test_builds/api_cpp_test.cpp dependency dropped.
make[2]: Circular codegen/test_builds/api_c_test.c <- codegen/test_builds/api_c_test.c dependency dropped.
[ 51%] Building C object api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o
In file included from /home/nhu/code/clDNN/build/codegen/test_builds/api_c_test.c:16:0:
/home/nhu/code/clDNN/api/C/pooling.h:56:1: error: unknown type name ‘bool’
bool global_pooling;
^
api_test_builds/CMakeFiles/api_test_builds.dir/build.make:213: recipe for target 'api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o' failed
make[2]: *** [api_test_builds/CMakeFiles/api_test_builds.dir/__/codegen/test_builds/api_c_test.c.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'api_test_builds/CMakeFiles/api_test_builds.dir/all' failed
make[1]: *** [api_test_builds/CMakeFiles/api_test_builds.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Should it be "throw std::invalid_argument(...)" ?
cldnn_memory cldnn_attach_memory(cldnn_layout layout, void* pointer, size_t size, cldnn_status* status)
{
return exception_handler<cldnn_memory>(CLDNN_ERROR, status, nullptr, [&]()
{
cldnn::layout layout_obj(layout);
if (layout_obj.bytes_count() > size)
std::invalid_argument("buffer size does not match layout size");
return api_cast(new cldnn::simple_attached_memory(layout_obj, pointer));
});
}
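Yes — as written, the statement merely constructs a temporary std::invalid_argument and immediately destroys it, so the size check has no effect. A small self-contained demonstration of the difference (the function and values are illustrative, not clDNN code):

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative: returns true if a size-mismatch exception propagates out of
// the check, mimicking the pattern in cldnn_attach_memory.
bool check_throws(std::size_t bytes_count, std::size_t size, bool use_throw) {
    try {
        if (bytes_count > size) {
            if (use_throw)
                throw std::invalid_argument("buffer size does not match layout size");
            else
                std::invalid_argument("buffer size does not match layout size");
                // ^ constructs and discards a temporary; nothing is thrown
        }
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```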