nvidia / digits

Deep Learning GPU Training System

Home Page: https://developer.nvidia.com/digits

License: BSD 3-Clause "New" or "Revised" License

Python 31.33% JavaScript 1.31% HTML 64.21% CSS 0.11% Shell 0.32% Lua 2.68% Makefile 0.01% Dockerfile 0.02%

deep-learning machine-learning gpu caffe torch

digits's Introduction

DIGITS

DIGITS (the Deep Learning GPU Training System) is a web application for training deep learning models. The currently supported frameworks are Caffe, Torch, and TensorFlow.

Feedback

In addition to submitting pull requests, feel free to submit and vote on feature requests via our ideas portal.

Documentation

The most up-to-date documentation is available at NVIDIA Accelerated Computing, Deep Learning Documentation, NVIDIA DIGITS.

Installation

Installation method	Supported platform[s]	Available versions	Instructions
Source	Ubuntu 14.04, 16.04	GitHub tags	docs/BuildDigits.md

An official DIGITS container is available from nvcr.io via the docker pull command.

Usage

Once you have installed DIGITS, visit docs/GettingStarted.md for an introductory walkthrough.

Then, take a look at some of the other documentation in docs/ and examples/.

Get help

Installation issues

First, check out the instructions above
Then, ask questions on our user group

Usage questions

First, check out the Getting Started page
Then, ask questions on our user group

Bugs and feature requests

Please let us know by filing a new issue
Bonus points if you want to contribute by opening a pull request!
- You will need to send a signed copy of the Contributor License Agreement to [email protected] before your change can be accepted.

Notice on security

Users shall understand that DIGITS is not designed to be run as an exposed external web service.

digits's Issues

Is it possible to have a standalone solution

Is it possible to have a standalone solution?

That way I would not need to install the dependencies myself. Last time I spent a full day installing Caffe and only got it working on 2/3 of my computers because of OS (Ubuntu 12.04/12.10/14.04) and dependency (boost, opencv, gcc, cuda, hdf5) issues.

I have disk space, and I have hardware, so I don't mind even if a standalone package is 10GB or more.

Dataset browser

It would feel more complete if there were a way to look at the images in a created dataset. Right now it is difficult to peek into the LMDB and see what the images look like (especially if we choose not to store the original images to save space).

MNIST training error

I installed DIGITS from GitHub and I am trying to run the MNIST example. I have OpenCV 3.0 on my system, but I still get the following error while training the LeNet model.

Can you please help me resolve the issue?

train_db
OpenCV Error: Assertion failed (k == STD_VECTOR_MAT) in getMat, file /build/buildd/opencv-2.3.1/modules/core/src/matrix.cpp, line 918
terminate called after throwing an instance of 'cv::Exception'
what(): /build/buildd/opencv-2.3.1/modules/core/src/matrix.cpp:918: error: (-215) k == STD_VECTOR_MAT in function getMat
*** Aborted at 1427329465 (unix time) try "date -d @1427329465" if you are using GNU date ***
PC: @ 0x7fb97abe00d5 (unknown)
*** SIGABRT (@0x3e800003eab) received by PID 16043 (TID 0x7fb97c798940) from PID 16043; stack trace: ***
@ 0x7fb97af78cb0 (unknown)
@ 0x7fb97abe00d5 (unknown)
@ 0x7fb97abe383b (unknown)
@ 0x7fb97b3fe16d (unknown)
@ 0x7fb97b3fc1d6 (unknown)
@ 0x7fb97b3fc221 (unknown)
@ 0x7fb97b3fc438 (unknown)
@ 0x7fb97b9e6d31 (unknown)
@ 0x7fb97ba32fbc (unknown)
@ 0x7fb97b6c2e68 (unknown)
@ 0x7fb97c0fb569 caffe::DecodeDatumToCVMatNative()
@ 0x7fb97c0fc18d caffe::DecodeDatumNative()
@ 0x7fb97c0c5372 caffe::DataLayer<>::DataLayerSetUp()
@ 0x7fb97c0c7c76 caffe::BaseDataLayer<>::LayerSetUp()
@ 0x7fb97c0c7d79 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7fb97c0e90e3 caffe::Net<>::Init()
@ 0x7fb97c0eb671 caffe::Net<>::Net()
@ 0x7fb97c128f22 caffe::Solver<>::InitTrainNet()
@ 0x7fb97c129522 caffe::Solver<>::Init()
@ 0x7fb97c129b45 caffe::Solver<>::Solver()
@ 0x40a8e8 caffe::GetSolver<>()
@ 0x4061f7 train()
@ 0x40479b main
@ 0x7fb97abcb76d (unknown)
@ 0x404c21 (unknown)

How to shuffle data for training?

I am not sure, but all networks that were trained on two classes converge very fast, and for the first several epochs the network recognizes only the negative class. Maybe this is because the data is not shuffled?
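
One way to rule out ordering effects is to shuffle the labeled file list before the database is built. The sketch below is only an illustration: `shuffle_examples` and the (path, label) pairs are hypothetical and are not taken from the DIGITS code base.

```python
import random


def shuffle_examples(examples, seed=0):
    """Return a shuffled copy of a list of (path, label) pairs."""
    rng = random.Random(seed)   # private RNG so the order is reproducible
    shuffled = list(examples)   # leave the caller's list untouched
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical two-class file list: all negatives first, then all positives.
examples = [("neg/%d.png" % i, 0) for i in range(5)] + \
           [("pos/%d.png" % i, 1) for i in range(5)]
mixed = shuffle_examples(examples, seed=42)
```

If training behaves the same on a list shuffled this way, ordering is probably not the cause.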

ERROR: Cuda version must be >= 6.5

Dear all,

I came across "ERROR: Cuda version must be >= 6.5" after following the install guide.

root@milton-Desktop:~/digits# ./digits-devserver 
  ___  _  ___ _ _____ ___
 |   \(_)/ __(_)_   _/ __|
 | |) | | (_ | | | | \__ \
 |___/|_|\___|_| |_| |___/

ERROR: Cuda version must be >= 6.5
Couldn't import dot_parser, loading of dot files will not be possible.
 * Running on http://0.0.0.0:5000/

I had upgraded CUDA to version 7.0 and checked it with the "deviceQuery" tool as follows:

root@milton-Desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN"
  CUDA Driver Version / Runtime Version          7.0 / 7.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 6143 MBytes (6441730048 bytes)
  (14) Multiprocessors, (192) CUDA Cores/MP:     2688 CUDA Cores
  GPU Max Clock rate:                            876 MHz (0.88 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = GeForce GTX TITAN
Result = PASS
root@milton-Desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery#

It seems that DIGITS does not recognize my updated CUDA version. Could someone suggest how to fix it?

(My system is ubuntu 12.04, 64 bit)

Thanks in advance~
Milton
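
To see which runtime version a Python process actually loads, one option is to ask the CUDA runtime directly through ctypes. This is a sketch, not necessarily how DIGITS performs its check: the helper names are hypothetical, and it assumes `libcudart.so` can be found on the loader path. `cudaRuntimeGetVersion` reports the version as 1000 * major + 10 * minor.

```python
import ctypes


def decode_cuda_version(encoded):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def query_runtime_version(library_name="libcudart.so"):
    """Return (major, minor) for the CUDA runtime, or None if it cannot be queried."""
    try:
        cudart = ctypes.cdll.LoadLibrary(library_name)
    except OSError:
        return None  # library not found on the loader path
    version = ctypes.c_int()
    status = cudart.cudaRuntimeGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is cudaSuccess
        return None
    return decode_cuda_version(version.value)


print(query_runtime_version())
```

If this reports an older version than deviceQuery does, or nothing at all, an older or missing runtime on the library path is a likely suspect.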

ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory

I have followed the steps exactly as in your guide using an Amazon AWS GPU instance. When I get to running .\digit-server I get:

Cannot guess value for "caffe_root": caffe binary cannot be found
Cannot guess value for "gpu_list": Cannot query GPUs without a valid caffe_root

I also tried downloading the .tar, but when I run the ./runme file I get:

ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory

Do you have any suggestions about what I am doing wrong?

Add ability to continue training process after aborting

Currently, when the server stops or some other problem occurs, I have only one way to continue training:

Creating a new task based on the previous one

Perhaps it would be better to add this ability to the aborted task itself?

requirements for webinstall ubuntu 14.04 (graphviz, cuDNN)

I have been working through a successful web-install on a clean Ubuntu 14.04.

The requirements only stated to have driver 346, so I haven't downloaded cuDNN or graphviz.

When I attempt to visualize a network (Visualize button) while building a model,

I get this error

Do I need to get graphviz and cuDNN on my own?

Thanks, and my apologies if this is the wrong place for this question. Great work.

Incorrect image sizes

Following the problem found in #29:

When a "New Image Classification Dataset" is created, the image size does not correspond to the "Mona Lisa" example (width and height are reversed).

Maybe it would be better to show a hint for the sizes?

PicklingError in logs

I'm seeing the following error in the status log. I assume this has something to do with saving intermediate network parameters via pickle.

Caught PicklingError while saving job: Can't pickle <class 'caffe_pb2.NetParameter'>: it's not found as caffe_pb2.NetParameter

Flask-WTF>=0.11 in xubuntu 14.04?

Everything goes fine when installing the requirements.txt by pip install, except Flask-WTF>=0.11, where I receive this output:

Downloading/unpacking Flask-WTF>=0.11
  Downloading Flask_WTF-0.11-py2.py3-none-any.whl
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1259, in prepare_files
    )[0]
IndexError: list index out of range

I suspect the reason is that the python-flaskext.wtf package in ubuntu 14.04 repository is version 0.6 not 0.11.
When I start the digits-devserver it gives me this output:

 ___  _  ___ _ _____ ___
 |   \(_)/ __(_)_   _/ __|
 | |) | | (_ | | | | \__ \
 |___/|_|\___|_| |_| |___/

Traceback (most recent call last):
  File "./digits-devserver", line 40, in <module>
    from digits.webapp import app, socketio, scheduler
  File "/home/mmoustafa/digits/digits/webapp.py", line 29, in <module>
    import digits.views
  File "/home/mmoustafa/digits/digits/views.py", line 14, in <module>
    import dataset.views
  File "/home/mmoustafa/digits/digits/dataset/views.py", line 6, in <module>
    import images.views
  File "/home/mmoustafa/digits/digits/dataset/images/views.py", line 9, in <module>
    import classification.views
  File "/home/mmoustafa/digits/digits/dataset/images/classification/views.py", line 12, in <module>
    from forms import ImageClassificationDatasetForm
  File "/home/mmoustafa/digits/digits/dataset/images/classification/forms.py", line 7, in <module>
    from wtforms.validators import ValidationError, StopValidation, Optional, DataRequired, NumberRange, AnyOf
ImportError: cannot import name DataRequired

How to fix this?
Thanks,

Handle GPU memory management

See #18.

DIGITS should handle GPU memory allocation for the user automatically. This could be done in a few ways:

Calculate how much memory will be required before the training starts and adjust the batch size automatically. This may not be possible - I'd have to dig into the caffe code to figure out whether they even know before running. There is a Memory required for data line in caffe's output, but it seems totally unrelated to the amount of memory used on the GPU.
Detect out-of-memory failures and automatically scale down the batch size and re-run the job until it fits in memory. This could take a while - even on a fast machine, the VGG network takes a few minutes before taking up its maximum amount of memory on the GPU. So, if we have to wait for caffe to fail 2 or 3 times before getting it right, this could be a major time suck.
Let cuDNN handle the memory management for us automatically. In section 3.11 of the cuDNN user doc, there are options for specifying how to choose the convolution algorithm. The CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT option could be used to say "use the fastest algorithm that fits within the specified memory budget." This would be a change to caffe, not to DIGITS, and it's not a complete solution since many people will be using caffe without cuDNN and maybe even without CUDA.
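
The second option (retry with a smaller batch size after an out-of-memory failure) can be sketched as follows. Everything here is hypothetical: `OutOfMemoryError`, `train_with_backoff`, and `fake_training` are stand-ins for illustration, and the halving policy is an assumption rather than anything DIGITS or caffe implements.

```python
class OutOfMemoryError(RuntimeError):
    """Hypothetical signal that a training run did not fit in GPU memory."""


def train_with_backoff(run_training, batch_size, min_batch_size=1):
    """Halve the batch size after each out-of-memory failure until training fits."""
    while batch_size >= min_batch_size:
        try:
            return batch_size, run_training(batch_size)
        except OutOfMemoryError:
            batch_size //= 2  # assumed policy: retry with half the batch
    raise OutOfMemoryError("no batch size >= %d fits in memory" % min_batch_size)


def fake_training(batch_size):
    # Stand-in for a real job: pretend anything above 32 runs out of memory.
    if batch_size > 32:
        raise OutOfMemoryError()
    return "ok"


used, result = train_with_backoff(fake_training, batch_size=256)
```

As noted above, each failed attempt can cost minutes on a large network, so the number of retries is the main cost of this approach.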

Problem installing caffe for digits

Hi, I'm trying to install NVIDIA's version of caffe with the steps provided here. Everything works until I run make runtest, which outputs this:

.build_release/tools/caffe
dyld: Library not loaded: libcaffe-nv.so.0
Referenced from: /Users/caffeteria/caffe/.build_release/tools/caffe
Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

Can anyone help me solve this?

OS X 10.10.2; I mostly used brew and pip.

TypeError: google.visualization is undefined

While I'm using it offline, the training graph can't be shown...
Could you bundle the Google package with the source?
Thanks!

Out of memory error

Although my error looks similar to issue 3, I thought I should open this as a separate issue. I choose alexnet as my model and leave all settings default. I have a 970m with 3GB of memory. The output on my terminal says:

2015-03-22 14:01:32 [20150322-140130-3826] [DEBUG] Train Caffe Model task queued.
2015-03-22 14:01:32 [20150322-140130-3826] [INFO ] Train Caffe Model task started.
2015-03-22 14:01:33 [20150322-140130-3826] [DEBUG] memory required: 793 MB
2015-03-22 14:01:34 [20150322-140130-3826] [DEBUG] memory required: 793 MB
2015-03-22 14:01:47 [20150322-140130-3826] [DEBUG] Network accuracy #0: 73.4714
2015-03-22 14:01:47 [20150322-140130-3826] [ERROR] Train Caffe Model: Check failed: error == cudaSuccess (2 vs. 0) out of memory
2015-03-22 14:01:48 [20150322-140130-3826] [ERROR] Train Caffe Model task failed with error code -6

From the caffe_output.log, I see:

I0322 14:01:34.870321 4054 solver.cpp:42] Solver scaffolding done.
I0322 14:01:34.870345 4054 solver.cpp:222] Solving
I0322 14:01:34.870350 4054 solver.cpp:223] Learning Rate Policy: step
I0322 14:01:34.870358 4054 solver.cpp:266] Iteration 0, Testing net (#0)
I0322 14:01:47.719110 4054 solver.cpp:315] Test net output #0: accuracy = 0.734714
I0322 14:01:47.719143 4054 solver.cpp:315] Test net output #1: loss = 1.53392 (* 1 = 1.53392 loss)
F0322 14:01:47.932499 4054 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7fe678bebc6c (unknown)
@ 0x7fe678bebbb8 (unknown)
@ 0x7fe678beb5ba (unknown)
@ 0x7fe678bee551 (unknown)
@ 0x7fe67901bb4b caffe::SyncedMemory::mutable_gpu_data()
@ 0x7fe67901c893 caffe::Blob<>::mutable_gpu_diff()
@ 0x7fe67904605f caffe::CuDNNPoolingLayer<>::Backward_gpu()
@ 0x7fe678f2d158 caffe::Net<>::BackwardFromTo()
@ 0x7fe678f2d211 caffe::Net<>::Backward()
@ 0x7fe678f49ca1 caffe::Solver<>::Step()
@ 0x7fe678f4a72f caffe::Solver<>::Solve()
@ 0x40610f train()
@ 0x40412b main
@ 0x7fe6780f2ec5 (unknown)
@ 0x404775 (unknown)
@ (nil) (unknown)

Support for GoogLeNet

GoogLeNet: bvlc_googlenet's train_val.prototxt says "Cannot specify more than one Accuracy Layer"

I tried to create a custom model and pasted in bvlc_googlenet's train_val.prototxt. The initialization of the model failed with the above error. Looks like a bug to me.

ctypes error loading libcudart library on Mac OS X

Line 84 in DIGITS/digits/device_query.py

cudart = ctypes.cdll.LoadLibrary('libcudart.so')

causes a load error on Mac OS X, where shared libraries use the .dylib extension.

This

cudart = ctypes.cdll.LoadLibrary('libcudart.dylib')

fixes it. Should probably check for the OS it's running on, or try both.
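
The "check the OS, or try both" idea might look like the sketch below. The helper names `cudart_candidates` and `load_cudart` are hypothetical and are not part of device_query.py; only the two library names come from this issue.

```python
import ctypes
import platform


def cudart_candidates(system=None):
    """List plausible CUDA runtime library names for an OS, most likely first."""
    system = system or platform.system()
    if system == "Darwin":  # Mac OS X
        return ["libcudart.dylib", "libcudart.so"]
    return ["libcudart.so", "libcudart.dylib"]


def load_cudart(system=None):
    """Try each candidate name in turn; return the loaded library or None."""
    for name in cudart_candidates(system):
        try:
            return ctypes.cdll.LoadLibrary(name)
        except OSError:
            continue  # not found under this name, try the next one
    return None
```

Trying both names keeps the fallback cheap while still preferring the extension that is native to the platform.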

Adjust learning rate when batch size changes

See discussion in #44.

As Alex Krizhevsky explains in his paper One weird trick for parallelizing convolutional neural networks, the learning rate, momentum and weight decay are all dependent on the batch size (see section 5, page 5). It would be nice if DIGITS handled these calculations for you automatically so that you don't have to worry about it.

The issue is that different networks have different default learning rates and batch sizes. Is there a standard equation that fits all networks?
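
For illustration, two heuristics that are often discussed are scaling the learning rate by k or by sqrt(k) when the batch size is multiplied by k. The sketch below is not a claim about which rule fits a given network, and it does not cover momentum or weight decay; `scale_learning_rate` and its parameters are hypothetical names.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="sqrt"):
    """Rescale a learning rate when the batch size changes by a factor k."""
    k = new_batch_size / float(base_batch_size)
    if rule == "sqrt":
        return base_lr * math.sqrt(k)   # square-root scaling
    if rule == "linear":
        return base_lr * k              # linear scaling
    raise ValueError("unknown rule: %r" % rule)


# Going from batch size 128 to 512 gives k = 4.
lr_sqrt = scale_learning_rate(0.01, 128, 512, rule="sqrt")
lr_linear = scale_learning_rate(0.01, 128, 512, rule="linear")
```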

Use a different js graphing library

See #9 and #31.

I can't download Google Charts and include it with DIGITS because of their ToS. This means you have to be online to view the graphs.

Add travis CI integration

Contributors should be running the test suite themselves before submitting changes, but we should use Travis CI on GitHub anyway. Since Travis doesn't have GPUs on their build servers, we will just have to build caffe (and torch, theano, etc. in the future) in CPU-only mode. That won't give us full test coverage, but DIGITS should be as independent as possible from the backends anyway.

The main obstacle to overcome here is deciding how to clone and build caffe on the travis server. I'm considering adding caffe as a git submodule linking to the currently supported version. But this decision is related to the bigger issue of caffe integration mentioned in all of these issues.

Add ability to select which GPU to run on

When creating a model, users should be able to select which GPU they want to use. This enhancement should be designed with multi-GPU in mind, so that when caffe or torch releases multi-GPU support, DIGITS will be ready to assign multiple GPUs to a job.

Don't resize twice during inference

Thanks to @flx42 for reporting this.

I resize to the image_dims (e.g. 256x256) before passing the image to caffe.io.Transformer, which then resizes it to the crop_dims (e.g. 227x227). That's silly and inefficient.

Parse arbitrary network outputs

Currently, DIGITS is hard-coded to look for three outputs (see here):

Train 'loss'
Test 'loss'
Test 'accuracy'

Instead, it should parse and save all train outputs and all test outputs, then decide how to graph them later. This is required for network architectures like GoogLeNet (see #11), which has multiple loss layers AND multiple accuracy layers.
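
A minimal sketch of such a parser is shown below. It relies on the "Test net output #N: name = value" layout that appears in Caffe logs quoted elsewhere in these issues, and it assumes train lines use the same layout with "Train" in place of "Test". `OUTPUT_RE` and `parse_outputs` are hypothetical names, not DIGITS code.

```python
import re

# Matches e.g. "Test net output #1: loss = 1.53392 (* 1 = 1.53392 loss)".
OUTPUT_RE = re.compile(
    r"(?P<phase>Train|Test) net output #(?P<index>\d+): "
    r"(?P<name>\S+) = (?P<value>[-+0-9.eE]+)"
)


def parse_outputs(lines):
    """Collect every named train/test output found in Caffe log lines."""
    outputs = {}
    for line in lines:
        match = OUTPUT_RE.search(line)
        if match:
            key = (match.group("phase").lower(), match.group("name"))
            outputs.setdefault(key, []).append(float(match.group("value")))
    return outputs


log = [
    "I0322 14:01:47.719110 4054 solver.cpp:315] Test net output #0: accuracy = 0.734714",
    "I0322 14:01:47.719143 4054 solver.cpp:315] Test net output #1: loss = 1.53392 (* 1 = 1.53392 loss)",
]
parsed = parse_outputs(log)
```

Keying on (phase, name) means a network with several loss or accuracy layers simply produces more keys, and the graphing decision can be made later.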

Loss graph did not show

I built DIGITS from source on CentOS 6. Everything works fine except that the loss graph and the histogram of images per category are missing from the web interface. I double-checked that all required Python packages listed in requirements.txt have been installed. I guess I am missing some JavaScript functions or packages.

Paginate previous networks

Like the homepage. These can get out of hand if you're using DIGITS a lot.

runme.sh error "ImportError: No module named datetime"

./runme.sh gives an error that the datetime module could not be found. But when I checked with "python -c "from datetime import datetime, date"", no error was given.

What could be the reason?

Thanks!

./runme.sh

  ___  _  ___ _ _____ ___
 |   \(_)/ __(_)_   _/ __|
 | |) | | (_ | | | | \__ \
 |___/|_|\___|_| |_| |___/

Traceback (most recent call last):
  File "digits/digits-devserver", line 40, in <module>
    from digits.webapp import app, socketio, scheduler
  File "/home/iranyu/DeepLearning/digits-1.0/digits/digits/webapp.py", line 6, in <module>
    from flask import Flask
  File "/home/iranyu/DeepLearning/digits-1.0/python-env/lib/python2.7/site-packages/flask/__init__.py", line 17, in <module>
    from werkzeug.exceptions import abort
  File "/home/iranyu/DeepLearning/digits-1.0/python-env/lib/python2.7/site-packages/werkzeug/__init__.py", line 154, in <module>
    __import__('werkzeug.exceptions')
  File "/home/iranyu/DeepLearning/digits-1.0/python-env/lib/python2.7/site-packages/werkzeug/exceptions.py", line 67, in <module>
    from werkzeug._internal import _get_environ
  File "/home/iranyu/DeepLearning/digits-1.0/python-env/lib/python2.7/site-packages/werkzeug/_internal.py", line 15, in <module>
    from datetime import datetime, date
ImportError: No module named datetime

[Suggestion]Add change the snapshot period item

I'd like to add a snapshot_epoch text box for changing the snapshot period.
Otherwise my disk is going to explode...
Could you tell me which file I should change?
(I've found digits/model/forms.py, where the box can be added.)

The NVIDIA branch of caffe does not pass make runtest under openSUSE

I have been successful in building the master branch of caffe and running "make runtest" on it.

I was unsuccessful in getting DIGITS to work properly with it.

Then I became hopeful when I found issue #12, "500: Internal server error , expect blob.data.ndim == 4", and followed the instructions to the letter in getting and building the NVIDIA branch of caffe from GitHub. It compiles, but "make runtest" consistently fails on a number of different types of tests.
Examples follow:

Cuda number of devices: 2
Current device id: 0
Note: Randomizing tests' orders with a seed of 79470 .
[==========] Running 1068 tests from 198 test cases.
[----------] Global test environment set-up.
[----------] 1 test from HDF5DataLayerTest/3, where TypeParam = caffe::DoubleGPU
[ RUN ] HDF5DataLayerTest/3.TestRead
F0404 11:38:19.559420 18678 hdf5_data_layer.cpp:89] Failed to open source file:
*** Check failure stack trace: ***
@ 0x7f7c006a322d (unknown)
@ 0x7f7c006a4ffc (unknown)
@ 0x7f7c006a2e1c (unknown)
@ 0x7f7c006a590e (unknown)
@ 0x7f7c01057d01 caffe::HDF5DataLayer<>::LayerSetUp()
@ 0x92482a caffe::Layer<>::SetUp()
@ 0x92706f caffe::HDF5DataLayerTest_TestRead_Test<>::TestBody()
@ 0x989a33 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x9800a7 testing::Test::Run()
@ 0x98014e testing::TestInfo::Run()
@ 0x980255 testing::TestCase::Run()
@ 0x982f88 testing::internal::UnitTestImpl::RunAllTests()
@ 0x983227 testing::UnitTest::Run()
@ 0x686e6a main
@ 0x7f7bf6b07b05 __libc_start_main
@ 0x689b32 (unknown)
/bin/sh: line 1: 18678 Aborted /home/XMan/Documents/digits-1.0/caffe/build/test/test.testbin --gtest_shuffle
src/caffe/test/CMakeFiles/runtest.dir/build.make:49: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:273: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:281: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:216: recipe for target 'runtest' failed
make: *** [runtest] Error 2

[----------] 5 tests from MemoryDataLayerTest/0, where TypeParam = caffe::FloatCPU
[ RUN ] MemoryDataLayerTest/0.TestForward
*** Aborted at 1428163234 (unix time) try "date -d @1428163234" if you are using GNU date ***
PC: @ 0x7f533fb267d0 caffe::caffe_rng_gaussian<>()
*** SIGSEGV (@0x5850000) received by PID 21890 (TID 0x7f53402af7c0) from PID 92602368; stack trace: ***
@ 0x7f533555a200 (unknown)
@ 0x7f533fb267d0 caffe::caffe_rng_gaussian<>()
@ 0x699b89 caffe::GaussianFiller<>::Fill()
@ 0x6e04d5 caffe::MemoryDataLayerTest<>::SetUp()
@ 0x989a33 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x980040 testing::Test::Run()
@ 0x98014e testing::TestInfo::Run()
@ 0x980255 testing::TestCase::Run()
@ 0x982f88 testing::internal::UnitTestImpl::RunAllTests()
@ 0x983227 testing::UnitTest::Run()
@ 0x686e6a main
@ 0x7f5335546b05 __libc_start_main
@ 0x689b32 (unknown)
/bin/sh: line 1: 21890 Segmentation fault /home/XMan/Documents/digits-1.0/caffe/build/test/test.testbin --gtest_shuffle
src/caffe/test/CMakeFiles/runtest.dir/build.make:49: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 139
CMakeFiles/Makefile2:273: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:281: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
M

Cuda number of devices: 2
Current device id: 0
Note: Randomizing tests' orders with a seed of 42443 .
[==========] Running 1068 tests from 198 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from NetUpgradeTest
[ RUN ] NetUpgradeTest.TestImageNet
*** Aborted at 1428163262 (unix time) try "date -d @1428163262" if you are using GNU date ***
PC: @ 0x7f258eca0f18 (unknown)
*** SIGSEGV (@0x7fff00000000) received by PID 21933 (TID 0x7f25990ac7c0) from PID 0; stack trace: ***
@ 0x7f258e357200 (unknown)
@ 0x7f258eca0f18 (unknown)
@ 0x68a25c caffe::NetUpgradeTest_TestImageNet_Test::TestBody()
@ 0x989a33 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x9800a7 testing::Test::Run()
@ 0x98014e testing::TestInfo::Run()
@ 0x980255 testing::TestCase::Run()
@ 0x982f88 testing::internal::UnitTestImpl::RunAllTests()
@ 0x983227 testing::UnitTest::Run()
@ 0x686e6a main
@ 0x7f258e343b05 __libc_start_main
@ 0x689b32 (unknown)
/bin/sh: line 1: 21933 Segmentation fault /home/XMan/Documents/digits-1.0/caffe/build/test/test.testbin --gtest_shuffle
src/caffe/test/CMakeFiles/runtest.dir/build.make:49: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 139
CMakeFiles/Makefile2:273: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CM

No modifications allowed

The included LICENSE.pdf file states "If the SOFTWARE is provided in source form, Licensee may not modify or create derivative works of the SOFTWARE." How can developers work with this git repo?

Stock caffe integration

I don't know where to put this (I haven't found a mailing list for DIGITS yet).
It would be good if DIGITS were made compatible with stock Caffe.
Is there anything specific to NVIDIA's caffe?

./zahoor

math domain error

I created my dataset and used the default AlexNet definitions. I get the error below and the caffe log says
"Memory required for data: 0"

Not sure why? The LMDB's seem to exist.

ERROR: math domain error

Traceback (most recent call last):
File "/home/valada/digits/digits/scheduler.py", line 386, in task_thread
task.run(**options)
File "/home/valada/digits/digits/task.py", line 196, in run
if not self.process_output(line):
File "/home/valada/digits/digits/model/tasks/caffe_train.py", line 475, in process_output
self.logger.debug('memory required: %s' % utils.sizeof_fmt(bytes_required))
File "/home/valada/digits/digits/utils/__init__.py", line 100, in sizeof_fmt
i = int(math.floor(math.log(size,1024)))
ValueError: math domain error
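
The traceback points at `math.log(size, 1024)`, which is undefined for a size of 0 and would be consistent with the "Memory required for data: 0" line in the caffe log. A defensive formatter might look like the sketch below. It is a hypothetical replacement, not the DIGITS implementation, and it sidesteps the logarithm by dividing in a loop.

```python
def sizeof_fmt(size, suffix="B"):
    """Format a byte count for display, tolerating zero and negative input."""
    size = float(size)
    if size <= 0:
        return "0 " + suffix  # log(0) is undefined, so handle it up front
    for unit in ["", "K", "M", "G", "T"]:
        if size < 1024.0:
            return "%.1f %s%s" % (size, unit, suffix)
        size /= 1024.0
    return "%.1f P%s" % (size, suffix)


print(sizeof_fmt(0))
print(sizeof_fmt(793 * 1024 * 1024))
```

This removes the crash, but a reported size of 0 may still indicate a problem with the dataset that is worth investigating separately.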

[ERROR] AssertionError: must specify a SoftmaxWithLoss layer; SOFTMAX_LOSS cant

When I try to run GoogLeNet, its definition uses

layers {
bottom: "loss2/classifier"
bottom: "label"
top: "loss2/loss1"
name: "loss2/loss"
type: SOFTMAX_LOSS
loss_weight: 0.3
}

rather than a SoftmaxWithLoss layer.

Here is the error:
2015-03-28 12:29:48 [20150328-122946-19fd] [DEBUG] Train Caffe Model task queued.
2015-03-28 12:29:48 [20150328-122946-19fd] [ERROR] AssertionError: must specify a SoftmaxWithLoss layer ;

500: Internal server error , expect blob.data.ndim == 4

When testing a trained model (stock AlexNet) using a single image, I get "500: internal server error", Assertion failed: expect blob.data.ndim == 4.

can not create database

Traceback (most recent call last):
  File "/home/blazeli/digits/tools/create_db.py", line 21, in <module>
    import leveldb
ImportError: No module named leveldb

P.S. I've installed leveldb and included the library, and caffe is running properly.

Running on CPU

Is there any way to set it to use the CPU instead of the GPU? I'm trying to explore this software from my home machine, which doesn't have an NVIDIA card. I'm getting this error:

Check failed: error == cudaSuccess (35 vs. 0)  CUDA driver version is insufficient for CUDA runtime version

cant create a database : Create DB (train): DbCreator.read_thread caught AttributeError: 'Datum' object has no attribute 'encoded'

After downloading DIGITS from http://developer.nvidia.com/digits, I create a database from mnist_10k. When it reaches 100%, I get this error:

2015-03-28 12:26:50 [20150328-122603-3a5f] [WARNING] Create DB (train): DbCreator.read_thread caught AttributeError: 'Datum' object has no attribute 'encoded'
2015-03-28 12:26:50 [20150328-122603-3a5f] [WARNING] Create DB (train): DbCreator.read_thread caught AttributeError: 'Datum' object has no attribute 'encoded'
2015-03-28 12:26:50 [20150328-122603-3a5f] [WARNING] Create DB (train): DbCreator.read_thread caught AttributeError: 'Datum' object has no attribute 'encoded'
2015-03-28 12:26:50 [20150328-122603-3a5f] [WARNING] Create DB (train): DbCreator.read_thread caught AttributeError: 'Datum' object has no attribute 'encoded'
2015-03-28 12:26:51 [20150328-122603-3a5f] [ERROR] Create DB (train): no images added
2015-03-28 12:26:51 [20150328-122603-3a5f] [ERROR] Create DB (train) task failed with error code 1

This happens because I chose "Save encoded JPEGs".

When I don't choose "Save encoded JPEGs", it completes without problems.

What is the requirements for the data folder

When I try to import data, I have this problem. My folder contains a subfolder for each class. I am not sure whether there are other requirements for the data folder. If there are, could you please let me know?

ERROR: no images added

Selecting a pretrained model from the "Previous networks" tab doesn't work

For now, you can just click the "Customize" button and submit the job from that tab.

When will you release DIGITS for Windows?

There's one caffe-windows at http://pan.baidu.com/s/1eQ6bWK6

Add support for more image formats

Is it possible to create dataset from .bmp images?

See @sersajur's comment in #7.

Currently, the only supported image types are JPEG and PNG (see here and here). I'll add support for bmp and anything else that Pillow 2.3.0 can read.

Train Caffe Model task failed with error code -6

I've successfully imported a dataset by following the Getting Started page. However, after I created a new model (i.e. LeNet), there is an error when training the caffe model. The error message is: [ERROR] Train Caffe Model task failed with error code -6. May I know how to solve that?

Thank you.

Output accuracy on a test set

Currently we have the ability to see the top images for each category for a given test set, or the predicted classes and visualisations for one image, but it would be good to get an overall classification accuracy for a test set as a simple numeric output.
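
As a sketch of what such an output could compute, the function below derives overall accuracy, per-class accuracy, and confusion counts from two parallel lists of labels. `accuracy_report` is a hypothetical helper and the labels are made up; how predictions are obtained from a trained model is outside this example.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall accuracy, per-class accuracy, confusion counts)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set is empty")
    # confusion[(truth, prediction)] = number of test images in that cell
    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (truth, pred), n in confusion.items() if truth == pred)
    totals = Counter(true_labels)
    per_class = {c: confusion[(c, c)] / float(totals[c]) for c in totals}
    return correct / float(len(true_labels)), per_class, confusion


truth = ["cat", "cat", "dog", "dog"]
predicted = ["cat", "cat", "cat", "dog"]
overall, per_class, confusion = accuracy_report(truth, predicted)
```

Reporting per-class accuracy next to the overall number makes it easier to spot a model that does well on one class and poorly on another.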

Report errors in load_image() rather than suppressing them

Originally mentioned in #7.

Images fail to load fairly often, and it would be very helpful to propagate those exceptions out of digits.utils.image.load_image() into the functions which call it, to be handled more appropriately in each case.

is storage format different in lmdb created by digits?

I wanted to create a confusion matrix by reading the test data from an LMDB created by DIGITS. I had an old script which reads the data from the LMDB as datums and builds the matrix, but it is not working now.

Are the data formats in LMDBs created by caffe and by DIGITS different?
Caffe stores the images as datums, and it looks like DIGITS stores them as JPEGs.

Is it possible to maintain compatibility between the storage formats?

Status of dataset and trained model "Aborted"

I followed the getting started guide, created a dataset for the mnist_10k images, and successfully trained the default LeNet. After stopping the server and starting it up again, the status of my dataset and trained model changes to "Aborted".

[ERROR] ValueError: math domain error

[DEBUG] Train Caffe Model task queued.
2015-04-07 19:32:44 [20150407-193242-7b4c] [INFO ] Train Caffe Model task started.
2015-04-07 19:32:45 [20150407-193242-7b4c] [ERROR] ValueError: math domain error

I don't know why Firefox fetches data from www.google.com.

I followed the example in readme.md.

Can anyone explain? Thank you.

caffe BGR issue

hey,

After I trained the model successfully, I tried to test one image, and I found it does not do channel swap. So I am a little confused.

Here is my understanding:

Inside caffe, if you let caffe decode the image, the result will always be in BGR format (via OpenCV). I did not see any code inside DIGITS that converts loaded images into BGR mode. Therefore,

digits create db -----> with encoded images
caffe train the db ------> with BGR format
digits test one image -------> with RGB format

So the final result won't work as expected.

Did I miss anything?

load a caffe pre-trained model?

It is not an issue per se.
Is it possible to load a pre-trained model from caffe (still nvidia branch) command line? For instance, if I want to start from the imagenet model (and weights) and modify/train the last layer, etc... or load something from the zoo.
Thanks,

Display images in dataset

In the web UI, load and display some of the images in the dataset to help users remember which image set this is.

50% Prediction Errors at "100%" accuracy

I trained an AlexNet with two classes of 5,000 images per class, in separate training and validation directories (20,000 images in total). The training and validation loss values drop to near zero and the accuracy goes to an unbelievable 100%. Yet single-image tests from the validation set for the second class almost always give an incorrect prediction, while single-image tests for the first class are always correct. Am I misunderstanding what is being reported as "Accuracy"? The images are PNG files of approximately 800x600.

I have gotten the same result with a much smaller subset of the data (1,000 images per class).