
nvcaffe's People

Contributors

borisfom, borisgin, cypof, dgolden1, drnikolaev, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, kkhoot, kloudkl, longjon, lukeyeager, mavenlin, mohomran, nv-slayton, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, slayton58, thatguymike, tnarihi, yangqing, yosinski


nvcaffe's Issues

Documentation on major differences from BVLC/caffe

Hi,

I recently migrated from BVLC/caffe to nvcaffe, and I see a significant speedup during inference.
The layer-specific mixed-precision options are awesome.
I would also like to use nvcaffe for training, and I have a few questions:

  1. There is an enum Packing in caffe.proto which can be set to NCHW (default) or NHWC. If we set Packing to NHWC in the data layer, will the packing information be propagated automatically to all layers in the network? Intuitively, NHWC should be more efficient for both convolution and batch-norm layers, particularly with cuDNN (see the layout sketch right after this list).
  2. The data augmentations are a bit confusing. Are the transformations in data_transformer.cu carried out synchronously on the GPU, or are they done asynchronously in the data prefetch threads?
  3. There is also another layer, detectnet_transform_layer.[hpp/cpp/cu], which has its own set of transformations. When and how exactly is this layer used? In particular, I am interested in image-to-image translation problems where both the input and the label are images of the same size and the transformations (random crops, flips, scale, mean subtraction, etc.) are mirrored on both the input and the label image. I was wondering whether this layer could be used for that.
  4. Is multi-GPU inference with a single large image possible?
  5. Is there some way to further reduce memory during inference on large images? For fully convolutional architectures, a lot of memory could be saved, for instance, by not storing most of the intermediate activations and reusing a single buffer across most convolution layers (a rough sketch of this buffer-reuse idea follows further below).
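
To make question 1 concrete, the two packings differ only in how a 4-D activation tensor is laid out in memory. The sketch below is purely illustrative (generic index and dimension names, not NVCaffe's own types or fields):

    #include <cstddef>

    // Flat-offset computation for the two layouts named by the Packing enum.
    // N = batch, C = channels, H = height, W = width (illustrative names only).
    inline size_t offset_nchw(size_t n, size_t c, size_t h, size_t w,
                              size_t C, size_t H, size_t W) {
      return ((n * C + c) * H + h) * W + w;  // each channel plane is contiguous
    }

    inline size_t offset_nhwc(size_t n, size_t c, size_t h, size_t w,
                              size_t C, size_t H, size_t W) {
      return ((n * H + h) * W + w) * C + c;  // the channels of one pixel are contiguous
    }

In NHWC the C values of a single pixel sit next to each other, which is the layout cuDNN's mixed-precision convolution paths generally prefer; whether NVCaffe propagates the data layer's packing choice to every downstream layer is exactly what question 1 asks.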

Finally, if there is some documentation specific to nvcaffe, a pointer to it would also be awesome.
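
Regarding question 5, the buffer reuse mentioned there can be sketched as a ping-pong between two scratch buffers for a purely feed-forward network in inference mode. This is only an illustration of the idea, not NVCaffe code (Layer, forward, and run_inference are placeholder names):

    #include <utility>
    #include <vector>

    // Placeholder layer interface: reads one activation buffer, writes another.
    struct Layer {
      virtual void forward(const float* in, float* out) const = 0;
      virtual ~Layer() = default;
    };

    // Runs a feed-forward chain with only two buffers, each sized for the largest
    // intermediate activation; buf_a initially holds the input image.
    const float* run_inference(const std::vector<const Layer*>& layers,
                               std::vector<float>& buf_a,
                               std::vector<float>& buf_b) {
      float* src = buf_a.data();
      float* dst = buf_b.data();
      for (const Layer* layer : layers) {
        layer->forward(src, dst);
        std::swap(src, dst);  // the old input becomes free scratch for the next layer
      }
      return src;  // after the final swap, src points at the last layer's output
    }

Skip connections, in-place layers, and anything that needs stored activations for a backward pass complicate this scheme, which is why it only applies cleanly to inference.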

A mismatch found between your code and your paper

In your paper, I find:

(1) we get the local learning rate for each learnable parameter by α = l × ||w||_2 / (||∇w||_2 + β · ||∇w||_2);

But in your code,

 rate = gw_ratio * w_norm / (wgrad_norm + weight_decay * w_norm);

The code and the equation don't match. Is it a typo in your paper?
I think it should be α = l × ||w||_2 / (||∇w||_2 + β · ||w||_2).
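
For reference, here is a minimal sketch of what the quoted line computes; only the four variable names in that line come from the code, while the wrapping function and the way the norms are obtained are illustrative:

    // Layer-wise local learning rate, matching the quoted line and the corrected
    // formula above: rate = l * ||w||_2 / (||∇w||_2 + β * ||w||_2), where gw_ratio
    // plays the role of l, weight_decay is β, and the norms are per-layer L2 norms.
    float local_rate(float gw_ratio, float weight_decay,
                     float w_norm, float wgrad_norm) {
      return gw_ratio * w_norm / (wgrad_norm + weight_decay * w_norm);
    }

Note that weight_decay (β) multiplies w_norm here, which matches the corrected formula rather than the one printed in the paper.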

make: *** [.build_release/lib/libcaffe-nv.so.0.16.4] Error 1

LD -o .build_release/lib/libcaffe-nv.so.0.16.4
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libturbojpeg.a(libturbojpeg_la-turbojpeg.o): relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libturbojpeg.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:605: recipe for target '.build_release/lib/libcaffe-nv.so.0.16.4' failed
make: *** [.build_release/lib/libcaffe-nv.so.0.16.4] Error 1

Our workstation:
Ubuntu: 16.04
CUDA: 8.0
cuDNN: 7.0.5
Hi, did I do something wrong? I am looking for an answer to this. Thank you.

[dead silence] NVCaffe DIGITS object detection with TensorRT GoogLeNet

Conda 2.7.9 + DIGITS 6.0.0 + NVCaffe 0.16.4 + 4 × P40
Everything goes well; the last lines in caffe_output.log are:
#####################################################################
I1117 11:02:59.804045 15357 caffe.cpp:226] Starting Optimization
I1117 11:02:59.804064 15357 solver.cpp:386] Solving
I1117 11:02:59.804067 15357 solver.cpp:387] Learning Rate Policy: exp
I1117 11:02:59.822211 15357 net.cpp:1358] [3] Reserving 23918336 bytes of shared learnable space
I1117 11:02:59.823530 15357 solver.cpp:457] Iteration 0, Testing net (#0)
I1117 11:02:59.823545 15357 net.cpp:1004] Ignoring source layer train_data
I1117 11:02:59.823549 15357 net.cpp:1004] Ignoring source layer train_label
I1117 11:02:59.823552 15357 net.cpp:1004] Ignoring source layer train_transform
I1117 11:02:59.824621 15370 device_alternate.hpp:116] NVML initialized on thread 139987149186816
I1117 11:02:59.953352 15370 common.cpp:585] NVML succeeded to set CPU affinity on device 3
#####################################################################
But Caffe and DIGITS freeze at
#####################################################################
Train Caffe Model Running
0%
#####################################################################
and stay there for several hours.

I switched to nvcaffe 0.15.13, and it is the same.
Meanwhile, all classification jobs go well.

I followed this demo provided by NVIDIA:
https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet
with our own dataset preprocessed by DIGITS.

Can anyone help me out?
