drnikolaev / caffe

This project forked from nvidia/caffe

Caffe: a fast open framework for deep learning.

Home Page: http://caffe.berkeleyvision.org/

License: Other

CMake 2.30% Makefile 0.59% Shell 0.63% C++ 76.92% MATLAB 0.66% Python 10.00% Cuda 8.82% Dockerfile 0.07%

caffe's People

Contributors

borisfom, borisgin, cypof, dgolden1, drnikolaev, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, kkhoot, kloudkl, longjon, lukeyeager, mavenlin, mohomran, nv-slayton, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, slayton58, thatguymike, tnarihi, yangqing, yosinski

caffe's Issues

Convolution error

Caffe compiled with:

cmake .. -DPROTOBUF_INCLUDE_DIR="/beegfs/120x/home/ilia/protobuf/include/" -DUSE_NCCL=True -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="30 35 50 52 60 61 62 70" -DCUDA_ARCH_PTX="30 35 50 52 60 61 62 70" -DCUDA_NVCC_FLAGS=--Wno-deprecated-gpu-targets -Wno-dev

-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   system
--   thread
--   filesystem
-- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Found PROTOBUF Compiler: /beegfs/120x/home/ilia/protobuf/bin/protoc
-- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy  (include: /usr/include, library: /usr/lib/libsnappy.so)
-- Found JPEGTurbo: /usr/include
-- CUDA detected: 9.0
-- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so (found version "7.0")
-- Added CUDA NVCC flags for: sm_30 sm_35 sm_50 sm_52 sm_60 sm_61 sm_62 sm_70 compute_30 compute_35 compute_50 compute_52 compute_60 compute_61 compute_62 compute_70
-- Found OpenCV 2.x: /usr/share/OpenCV
-- Found Atlas: /usr/include
-- Found Atlas (include: /usr/include, library: /usr/lib/libatlas.so)
-- Found PythonInterp: /beegfs/120x/home/ilia/nvcaffe_comp/bin/python2.7 (found suitable version "2.7.6", minimum required is "2.7")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython2.7.so (found suitable version "2.7.6", minimum required is "2.7")
-- Found NumPy: /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include (found suitable version "1.13.1", minimum required is "1.7.1")
-- NumPy ver. 1.13.1 found (include: /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include)
-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   python
-- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE)
-- Found NCCL: /usr/include
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl.so)
-- Found NVML: /usr/include
-- Found NVML (include: /usr/include, library: /usr/lib/nvidia-384/libnvidia-ml.so)
-- Found Git: /usr/bin/git (found version "1.9.1")
--
-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   0.16.4
--   Git               :   v0.16.1-404-g860701c
--   System            :   Linux
--   C++ compiler      :   /usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -DDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
--
--   BUILD_SHARED_LIBS :   ON
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
--   USE_LEVELDB       :   ON
--   USE_LMDB          :   ON
--   ALLOW_LMDB_NOLOCK :   OFF
--   TEST_FP16         :   OFF
--
-- Dependencies:
--   BLAS              :   Yes (Atlas)
--   Boost             :   Yes (ver. 1.54)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 3.4.0)
--   lmdb              :   Yes (ver. 0.9.10)
--   LevelDB           :   Yes (ver. 1.15)
--   Snappy            :   Yes (ver. 1.1.0)
--   OpenCV            :   Yes (ver. 2.4.8)
--   JPEGTurbo         :   No
--   CUDA              :   Yes (ver. 9.0)
--
-- NVIDIA CUDA:
--   Target GPU(s)     :   Manual
--   GPU arch(s)       :   sm_30 sm_35 sm_50 sm_52 sm_60 sm_61 sm_62 sm_70 compute_30 compute_35 compute_50 compute_52 compute_60 compute_61 compute_62 compute_70
--   cuDNN             :   Yes (ver. 7.0)
--   NCCL              :   Yes (ver. 2.0.5)
--   NVML              :   /usr/lib/nvidia-384/libnvidia-ml.so
--
-- Python:
--   Interpreter       :   /beegfs/120x/home/ilia/nvcaffe_comp/bin/python2.7 (ver. 2.7.6)
--   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.6)
--   NumPy             :   /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include (ver 1.13.1)
--
-- Documentaion:
--   Doxygen           :   No
--   config_file       :
--
-- Install:
--   Install path      :   /beegfs/120x/home/ilia/caffe_builds/nvc/build/install
--
-- Configuring done
-- Generating done
-- Build files have been written to: /beegfs/120x/home/ilia/caffe_builds/nvc/build
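The `-DCUDA_ARCH_BIN` and `-DCUDA_ARCH_PTX` lists in the configure command end up as the `sm_XX` and `compute_XX` targets in the "Added CUDA NVCC flags" line of the log. A minimal C++ sketch of that expansion, assuming the usual nvcc `-gencode arch=compute_XX,code=...` convention: Caffe does this in its CMake scripts, not in C++, and those scripts may accept more input forms than this sketch. `SplitArchList` and `GencodeFlags` are hypothetical helpers.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a space-separated arch list such as "30 35 50" into tokens.
// Assumes plain numeric entries; other forms (e.g. "20(21)") are not handled.
std::vector<std::string> SplitArchList(const std::string& list) {
  std::istringstream in(list);
  std::vector<std::string> out;
  std::string tok;
  while (in >> tok) {
    assert(tok.find_first_not_of("0123456789") == std::string::npos);
    out.push_back(tok);
  }
  return out;
}

// Builds nvcc flags: each BIN entry yields SASS (code=sm_XX),
// each PTX entry yields embedded PTX (code=compute_XX).
std::string GencodeFlags(const std::string& arch_bin, const std::string& arch_ptx) {
  std::string flags;
  for (const std::string& a : SplitArchList(arch_bin)) {
    flags += " -gencode arch=compute_" + a + ",code=sm_" + a;
  }
  for (const std::string& a : SplitArchList(arch_ptx)) {
    flags += " -gencode arch=compute_" + a + ",code=compute_" + a;
  }
  // Drop the leading space if any flags were produced.
  return flags.empty() ? flags : flags.substr(1);
}
```

For example, `GencodeFlags("30 35", "35")` produces SASS for sm_30 and sm_35 plus PTX for compute_35, mirroring how the eight-entry lists above became eight `sm_` and eight `compute_` targets.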

When I tried to run the training process with:
./build/tools/caffe train -solver='solver.prototxt'

I got the following error:

I1019 15:17:12.441572 108568 solver.cpp:315] Iteration 0 (0.371277 s), loss = 1383.36
I1019 15:17:12.441620 108568 solver.cpp:332]     Train net output #0: loss_bbox = 8.39254e-06 (* 100 = 0.000839254 loss)
I1019 15:17:12.441634 108568 solver.cpp:332]     Train net output #1: loss_cls = 2.47564 (* 500 = 1237.82 loss)
I1019 15:17:12.441706 108568 solver.cpp:332]     Train net output #2: rpn_cls_loss = 0.693479 (* 100 = 69.3479 loss)
I1019 15:17:12.441738 108568 solver.cpp:332]     Train net output #3: rpn_loss_bbox = 0.93409 (* 100 = 93.409 loss)
I1019 15:17:12.441750 108568 sgd_solver.cpp:136] Iteration 0, lr = 5e-05, m = 0.5

*** Aborted at 1508415432 (unix time) try "date -d @1508415432" if you are using GNU date ***
PC: @     0x7f8922466b8d caffe::CuDNNConvolutionLayer<>::FindExConvAlgo()
*** SIGSEGV (@0x0) received by PID 108568 (TID 0x7f89242d4900) from PID 0; stack trace: ***
    @     0x7f8920263cb0 (unknown)
    @     0x7f8922466b8d caffe::CuDNNConvolutionLayer<>::FindExConvAlgo()
    @     0x7f892248b9f0 caffe::CuDNNConvolutionLayer<>::Reshape()
    @     0x7f89223a7d0b caffe::Layer<>::Forward()
    @     0x7f892261da3b caffe::Net::ForwardFromTo()
    @     0x7f892261db97 caffe::Net::Forward()
    @     0x7f8922620325 caffe::Net::ForwardBackward()
    @     0x7f8922630652 caffe::Solver::Step()
    @     0x7f8922631395 caffe::Solver::Solve()
    @           0x40d9e8 train()
    @           0x40ae18 main
    @     0x7f892024ef45 (unknown)
    @           0x40b6fb (unknown)
    @                0x0 (unknown)
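glog's failure signal handler prints the abort time as a Unix timestamp and suggests `date -d` to decode it. The same conversion can be sketched in C++ with `std::gmtime` and `std::strftime` (`UnixTimeToUtc` is a hypothetical helper). For this crash, 1508415432 is 2017-10-19 12:17:12 UTC; the 15:17:12 in the log lines suggests, but does not prove, that the machine's local time zone was UTC+3.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Converts a Unix timestamp (as printed in "*** Aborted at ... (unix time)")
// to a UTC string, the same information `date -u -d @<ts>` would print.
// Note: std::gmtime returns a pointer to shared static storage, so this
// helper is not thread-safe.
std::string UnixTimeToUtc(std::time_t ts) {
  const std::tm* utc = std::gmtime(&ts);
  assert(utc != nullptr);
  char buf[32];
  const std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", utc);
  return std::string(buf, n);
}
```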

Bug in elementwise/sum starting from 0.16

The diff error in my topology was caused by the committed ShareDiff. Changing back to the old style (copying the top diff) works around the issue.
Just FYI.
case EltwiseParameter_EltwiseOp_SUM:
  if (coeffs_[i] == 1.F) {
    Btype* bottom_diff = bottom[i]->mutable_gpu_diff();
    // bottom[i]->ShareDiff(top[0]);  // committed behavior, disabled as a workaround
    caffe_copy(count, top_diff, bottom_diff);  // old style: copy the top diff
  } else {
    Btype* bottom_diff = bottom[i]->mutable_gpu_diff();
    caffe_gpu_scale(count, Btype(coeffs_[i]), top_diff, bottom_diff);
  }
  break;
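One way a shared diff can go wrong is aliasing: once a bottom's gradient buffer is the top's buffer, any layer that later rewrites that gradient in place also changes every other blob sharing it. The report does not say this is the exact failure mode, so treat the following as an illustration under that assumption. It is a CPU-only C++ sketch with hypothetical types and helpers (`Diff`, `ScaledCopy`, `EltwiseSumBackward`, `ScaleInPlace`), not Caffe's Blob API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a blob's gradient buffer. Two Diff handles that
// point to the same vector model what ShareDiff does: both blobs see one buffer.
using Diff = std::shared_ptr<std::vector<float>>;

// Returns coeff * top_diff in a freshly allocated buffer (the copy/scale path).
Diff ScaledCopy(const Diff& top_diff, float coeff) {
  assert(top_diff != nullptr);
  auto out = std::make_shared<std::vector<float>>(top_diff->size());
  for (std::size_t j = 0; j < top_diff->size(); ++j) {
    (*out)[j] = coeff * (*top_diff)[j];
  }
  return out;
}

// Backward of y = sum_i coeffs[i] * x_i, i.e. dx_i = coeffs[i] * dy.
// With share_when_unit == true, bottoms with coeff 1 alias the top buffer
// (mimicking ShareDiff); otherwise every bottom gets its own copy.
std::vector<Diff> EltwiseSumBackward(const Diff& top_diff,
                                     const std::vector<float>& coeffs,
                                     bool share_when_unit) {
  std::vector<Diff> bottom_diffs;
  for (float c : coeffs) {
    if (share_when_unit && c == 1.0f) {
      bottom_diffs.push_back(top_diff);  // aliasing, no copy
    } else {
      bottom_diffs.push_back(ScaledCopy(top_diff, c));
    }
  }
  return bottom_diffs;
}

// Models a downstream layer that rewrites its incoming gradient in place.
void ScaleInPlace(const Diff& d, float s) {
  for (float& v : *d) v *= s;
}
```

With copying, zeroing one bottom's gradient leaves the other intact; with sharing, the same write also shows up in the second bottom and in the top.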

Bugs with caffe-0.14-cnmem branch

I'm working with this code:

git checkout NVIDIA/v0.14.0-alpha
git merge lukeyeager/nvidia/versioning
git merge drnikolaev/caffe-0.14-cnmem

When I build and run the tests, I get two failures:

[  FAILED  ] CuDNNConvolutionLayerTest/0.TestGradientGroupCuDNN, where TypeParam = float (561 ms)

...

[ RUN      ] CuDNNConvolutionLayerTest/1.TestGradientGroupCuDNN
F1009 18:25:32.508326 17488 cudnn_conv_layer.cu:37] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)  CUDNN_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
    @     0x2aafa63a8daa  (unknown)
    @     0x2aafa63a8ce4  (unknown)
    @     0x2aafa63a86e6  (unknown)
    @     0x2aafa63ab687  (unknown)
    @     0x2aafa5cfd17a  caffe::CuDNNConvolutionLayer<>::Forward_gpu()
    @           0x7bc756  caffe::Layer<>::Forward()
    @           0x8c8890  caffe::GradientChecker<>::CheckGradientSingle()
    @           0x8d56d3  caffe::GradientChecker<>::CheckGradientExhaustive()
    @           0xa62bb2  caffe::CuDNNConvolutionLayerTest_TestGradientGroupCuDNN_Test<>::TestBody()
    @           0xbd6b23  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0xbcd5a7  testing::Test::Run()
    @           0xbcd64e  testing::TestInfo::Run()
    @           0xbcd755  testing::TestCase::Run()
    @           0xbd04a8  testing::internal::UnitTestImpl::RunAllTests()
    @           0xbd0747  testing::UnitTest::Run()
    @           0x7af817  main
    @     0x2aafabb81ec5  (unknown)
    @           0x7b62a2  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
make: *** [runtest] Error 2

The second one kills the test suite, so I can't say if any more would have failed or not.
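The `(7 vs. 0)` in the failed check is the raw `cudnnStatus_t` code compared against `CUDNN_STATUS_SUCCESS`; 7 is `CUDNN_STATUS_MAPPING_ERROR`, which cuDNN's documentation describes as a failed access to GPU memory space. A minimal C++ sketch of decoding such codes, using a local table that mirrors the first eleven `cudnnStatus_t` names so it compiles without cuDNN (`kCudnnStatusNames`, `StatusName`, and `DescribeStatusCheck` are hypothetical names):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Names of cudnnStatus_t values 0..10 as declared in cudnn.h,
// mirrored here so the sketch builds without cuDNN installed.
const char* const kCudnnStatusNames[] = {
    "CUDNN_STATUS_SUCCESS",          "CUDNN_STATUS_NOT_INITIALIZED",
    "CUDNN_STATUS_ALLOC_FAILED",     "CUDNN_STATUS_BAD_PARAM",
    "CUDNN_STATUS_INTERNAL_ERROR",   "CUDNN_STATUS_INVALID_VALUE",
    "CUDNN_STATUS_ARCH_MISMATCH",    "CUDNN_STATUS_MAPPING_ERROR",
    "CUDNN_STATUS_EXECUTION_FAILED", "CUDNN_STATUS_NOT_SUPPORTED",
    "CUDNN_STATUS_LICENSE_ERROR"};

// Maps a status code to its name, or "UNKNOWN_STATUS" if out of range.
const char* StatusName(int status) {
  const int count =
      static_cast<int>(sizeof(kCudnnStatusNames) / sizeof(kCudnnStatusNames[0]));
  return (status >= 0 && status < count) ? kCudnnStatusNames[status]
                                         : "UNKNOWN_STATUS";
}

// Formats a failed check in the style of the log line; empty on success.
std::string DescribeStatusCheck(int status) {
  if (status == 0) return std::string();
  std::ostringstream os;
  os << "Check failed: status == CUDNN_STATUS_SUCCESS (" << status
     << " vs. 0)  " << StatusName(status);
  return os.str();
}
```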

Error on DIGITS server when using bvlc_cub_v4_v5

Hi,
I have successfully compiled the nvcaffe bvlc_cub_v4_v5 version, but when I try to run DIGITS I get the following error:

ERROR: Library at "libcaffe.so.1.0.0-rc3" does not have expected suffix "-nv". Are you using the NVIDIA/caffe fork? Invalid input
In the nvcaffe/build/lib directory I see libcaffe.a, libcaffe.so, and libcaffe.so.1.0.0-rc3.

I can successfully run nvcaffe from the master branch and also drnikolaev's nvcaffe 0.15, but I cannot run drnikolaev's bvlc_cub_v4_v5 version. I have just updated DIGITS to the latest version, but the error is still there.
Any hints?
Thanks
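The error text suggests DIGITS looks at the version part of the `libcaffe.so.*` file name and expects it to end in `-nv`, while this build's file ends in `-rc3`. A rough C++ sketch of that kind of check, assuming this reading of the message: DIGITS' real check is not shown in the issue, and the helper names (`SoVersion`, `EndsWith`, `HasNvSuffix`) and the `libcaffe.so.0.15.0-nv` example name are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns whatever follows ".so." in a shared-library file name, or "" if absent.
std::string SoVersion(const std::string& filename) {
  const std::string marker = ".so.";
  const std::string::size_type pos = filename.find(marker);
  return pos == std::string::npos ? std::string()
                                  : filename.substr(pos + marker.size());
}

// True if s ends with suffix.
bool EndsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// The check the error message describes: the library version must end in "-nv".
bool HasNvSuffix(const std::string& filename) {
  return EndsWith(SoVersion(filename), "-nv");
}
```

Of the files listed in the issue, only `libcaffe.so.1.0.0-rc3` carries a version, and under this reading it fails the check.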

Learning the nvcaffe code

Hi Sergei Nikolaev, Ph.D. I am an AI engineer and I want to understand AI frameworks at a lower level. I have been studying the nvcaffe-017.3 code for more than a month, but I am still confused about many of its details. Could you share any notes or documents that would help me understand the nvcaffe code? Thanks for your response.

Batch normalisation in fp16?

I am porting some Caffe code to fp16 and have run into a problem with batch normalisation (BN). It seems that cuDNN doesn't support BN for fp16, and the Caffe engine seems prone to overflow when I use fp16. Do you have a solution?

Thanks.
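A common mixed-precision approach, not necessarily what NVCaffe or cuDNN do internally, is to keep activations in fp16 but accumulate BN statistics in fp32. The minimal CPU-only sketch below (hypothetical names `BnStats` and `ComputeBnStats`; fp16 inputs are assumed to be already widened to float) shows why: 1024 values of 10 are each representable in fp16, but their sum of squares, 102400, is beyond the largest finite fp16 value, 65504. It uses Welford's running update for the mean and variance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest finite IEEE 754 binary16 (fp16) value.
constexpr float kFp16Max = 65504.0f;

struct BnStats {
  float mean;
  float var;                // biased (population) variance
  bool sumsq_exceeds_fp16;  // true if the sum of squares is outside fp16 range
};

// Per-channel batch statistics with fp32 accumulators (Welford's update).
BnStats ComputeBnStats(const std::vector<float>& x) {
  assert(!x.empty());
  float mean = 0.0f, m2 = 0.0f, sumsq = 0.0f;
  std::size_t n = 0;
  for (float v : x) {
    ++n;
    const float delta = v - mean;
    mean += delta / static_cast<float>(n);
    m2 += delta * (v - mean);  // uses the updated mean
    sumsq += v * v;            // tracked only to show the fp16 range problem
  }
  return {mean, m2 / static_cast<float>(n), sumsq > kFp16Max};
}
```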