uoguelph-mlrg / theano_alexnet

Theano-based Alexnet

License: BSD 3-Clause "New" or "Revised" License

Python 99.76% Shell 0.24%

theano_alexnet's Introduction

AlexNet Implementation with Theano

A demonstration of training AlexNet in Python with Theano. Please see this technical report for a high-level description. theano_multi_gpu provides a toy example of how to use 2 GPUs to train an MLP on the MNIST data.

If you use this in your research, we kindly ask that you cite the above report:

@article{ding2014theano,
  title={Theano-based Large-Scale Visual Recognition with Multiple GPUs},
  author={Ding, Weiguang and Wang, Ruoyan and Mao, Fei and Taylor, Graham},
  journal={arXiv preprint arXiv:1412.2302},
  year={2014}
}

Dependencies

How to run

Prepare raw ImageNet data

Download ImageNet dataset and unzip image files.

Preprocess the data

This involves shuffling training images, generating data batches, computing the mean image and generating label files.

Steps

Set paths in the preprocessing/paths.yaml. Each path is described in this file.
Run preprocessing/generate_data.sh, which calls 3 Python scripts and performs all of the steps mentioned above. It runs for about 1 to 2 days. For a quick trial of the code, run preprocessing/generate_toy_data.sh instead, which takes about 10 minutes, and then proceed.

preprocessing/lists.txt is a static file that lists what files should be created by running generate_data.sh.
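The shuffling and batching step described above can be sketched as follows. This is only an illustration, not the code in the preprocessing scripts: the function name make_batches, the fixed seed, the batch size of 256 and the choice to drop the last incomplete batch are all assumptions.

```python
import random


def make_batches(filenames, batch_size=256, seed=0):
    """Shuffle a list of image paths and group them into fixed-size batches.

    Hypothetical helper: the real scripts may shuffle and batch differently.
    """
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)  # seeded so the order is reproducible
    # Assumption: drop the trailing remainder so every batch has the same size.
    n_full = len(shuffled) // batch_size
    return [shuffled[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]


# Made-up file names, used only to exercise the helper.
paths = ["img_%04d.JPEG" % i for i in range(1000)]
batches = make_batches(paths, batch_size=256)
print(len(batches), len(batches[0]))
```

Equal-sized batches keep every stored array the same shape, which is convenient when each batch is written to its own file.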

Train AlexNet

Set configurations

config.yaml contains common configurations for both the 1-GPU and 2-GPU version.

spec_1gpu.yaml and spec_2gpu.yaml contain the configurations specific to the 1-GPU and 2-GPU versions respectively.

If you changed preprocessing/paths.yaml, make sure you change corresponding paths in config.yaml, spec_1gpu.yaml and spec_2gpu.yaml accordingly.

Start training

1-GPU version, run:

THEANO_FLAGS=mode=FAST_RUN,floatX=float32 python train.py

2-GPU version, run:

THEANO_FLAGS=mode=FAST_RUN,floatX=float32 python train_2gpu.py

Validation error and loss values are stored in weights_dir/val_record.npy.

Here we do not set device to gpu in THEANO_FLAGS. Instead, users should control which GPU(s) to use in spec_1gpu.yaml and spec_2gpu.yaml.

Pretrained AlexNet

Pretrained AlexNet weights and configurations can be found at pretrained/alexnet

Acknowledgement

Frédéric Bastien, for providing the example of Using Multiple GPUs

Lev Givon, for help with inter-process communication between 2 GPUs with PyCUDA. Lev's original script: https://gist.github.com/lebedov/6408165

Guangyu Sun, for help with debugging the code

theano_alexnet's Issues

Will train.py end itself?

Hi,

I'm playing with your code and so far it's working nicely in training.

However, after I get the message "Optimization Complete", the program just hangs.
Should I quit it directly, or is it running some other operations?
I noticed that my CPU usage and GPU temperature are back to normal.

Regards,
Hu Yuhuang.

Training Cost NAN

Hi, I would like to train AlexNet on ImageNet. However, after 20 iterations the training cost becomes NaN.
Here are the details:

Should I set a smaller learning rate? Could you give me some suggestions?

Thank you~

crop size and difference between cudnn and cuda-convnet

I have a weird issue. When I use a crop size of 227 for the images, as done in the script, I can train the network using both the cudnn and cuda-convnet libraries. However, when I change the crop size to 224 (and the corresponding input size to 224 in the alex_net.py file), I can only run the training using the cuda-convnet library; the cudnn implementation does not work. Can any of the contributors help me with this?
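One thing worth checking here is the output-size arithmetic of the first convolution. The sketch below assumes the first layer uses an 11x11 kernel with stride 4 and no padding, as in the original AlexNet paper; whether this repository's first layer uses exactly these values, and whether this is the cause of the cuDNN failure, are assumptions rather than confirmed facts.

```python
def conv_output_size(input_size, kernel_size, stride, pad=0):
    """Return (output_size, fits_exactly) for one spatial dimension.

    fits_exactly is False when the stride does not divide the span evenly,
    i.e. the kernel cannot tile the input without leftover pixels.
    """
    span = input_size + 2 * pad - kernel_size
    return span // stride + 1, span % stride == 0


for crop in (227, 224):
    size, exact = conv_output_size(crop, kernel_size=11, stride=4)
    print(crop, size, exact)
```

With a 227 crop the span is 216, which is divisible by 4; with a 224 crop the span is 213, which is not. Libraries can differ in how they handle the leftover pixel, which may explain why only one backend accepts the smaller crop.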

Forward and Backward Propagation

I am interested in seeing where forward and backward propagation happen in the code. Can you point me to that specific portion of the code?

meta data files

I see that we need to have several meta data files listed in paths.yaml like

meta_clsloc_mat: '/mnt/data/datasets/lsvrc_2014/ILSVRC2014_devkit/data/meta_clsloc.mat'
val_label_file: '/mnt/data/datasets/lsvrc_2014/ILSVRC2014_devkit/data/ILSVRC2014_clsloc_validation_ground_truth.txt'

valtxt_filename: '/scratch/ilsvrc12/misc/val.txt'
traintxt_filename: '/scratch/ilsvrc12/misc/train.txt'

Can you please provide these meta data files or point me to a location where we can download them? I have downloaded the ImageNet images from the ImageNet site, but I do not know where to get these files.

For caffe/torch, these are generated by some scripts.

Prakash

RuntimeError "Could not compile cuda_convnet"

I followed instructions, but couldn't run train.py. Here is my error message:

In file included from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0,
from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
from /usr/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from /usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/cuda_ndarray.cuh:35,
from /home/ivan/Programs/Library/pylearn2/pylearn2/sandbox/cuda_convnet/nvmatrix.cuh:49,
from mod.cu:130:
/usr/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
/usr/local/cuda-6.5/bin/..//include/cublas.h(90): error: more than one instance of overloaded function "cublasGetVersion_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(102): error: more than one instance of overloaded function "cublasSnrm2_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(103): error: more than one instance of overloaded function "cublasDnrm2_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(104): error: more than one instance of overloaded function "cublasScnrm2_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(105): error: more than one instance of overloaded function "cublasDznrm2_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(108): error: more than one instance of overloaded function "cublasSdot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(110): error: more than one instance of overloaded function "cublasDdot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(112): error: more than one instance of overloaded function "cublasCdotu_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(114): error: more than one instance of overloaded function "cublasCdotc_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(116): error: more than one instance of overloaded function "cublasZdotu_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(118): error: more than one instance of overloaded function "cublasZdotc_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(122): error: more than one instance of overloaded function "cublasSscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(123): error: more than one instance of overloaded function "cublasDscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(124): error: more than one instance of overloaded function "cublasCscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(125): error: more than one instance of overloaded function "cublasZscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(127): error: more than one instance of overloaded function "cublasCsscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(128): error: more than one instance of overloaded function "cublasZdscal_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(131): error: more than one instance of overloaded function "cublasSaxpy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(133): error: more than one instance of overloaded function "cublasDaxpy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(135): error: more than one instance of overloaded function "cublasCaxpy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(137): error: more than one instance of overloaded function "cublasZaxpy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(141): error: more than one instance of overloaded function "cublasScopy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(143): error: more than one instance of overloaded function "cublasDcopy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(145): error: more than one instance of overloaded function "cublasCcopy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(147): error: more than one instance of overloaded function "cublasZcopy_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(151): error: more than one instance of overloaded function "cublasSswap_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(152): error: more than one instance of overloaded function "cublasDswap_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(153): error: more than one instance of overloaded function "cublasCswap_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(154): error: more than one instance of overloaded function "cublasZswap_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(157): error: more than one instance of overloaded function "cublasIsamax_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(158): error: more than one instance of overloaded function "cublasIdamax_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(159): error: more than one instance of overloaded function "cublasIcamax_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(160): error: more than one instance of overloaded function "cublasIzamax_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(163): error: more than one instance of overloaded function "cublasIsamin_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(164): error: more than one instance of overloaded function "cublasIdamin_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(166): error: more than one instance of overloaded function "cublasIcamin_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(167): error: more than one instance of overloaded function "cublasIzamin_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(170): error: more than one instance of overloaded function "cublasSasum_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(171): error: more than one instance of overloaded function "cublasDasum_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(172): error: more than one instance of overloaded function "cublasScasum_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(173): error: more than one instance of overloaded function "cublasDzasum_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(176): error: more than one instance of overloaded function "cublasSrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(178): error: more than one instance of overloaded function "cublasDrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(180): error: more than one instance of overloaded function "cublasCrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(182): error: more than one instance of overloaded function "cublasZrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(185): error: more than one instance of overloaded function "cublasCsrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(187): error: more than one instance of overloaded function "cublasZdrot_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(191): error: more than one instance of overloaded function "cublasSrotg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(192): error: more than one instance of overloaded function "cublasDrotg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(193): error: more than one instance of overloaded function "cublasCrotg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(195): error: more than one instance of overloaded function "cublasZrotg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(199): error: more than one instance of overloaded function "cublasSrotm_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(201): error: more than one instance of overloaded function "cublasDrotm_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(205): error: more than one instance of overloaded function "cublasSrotmg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(207): error: more than one instance of overloaded function "cublasDrotmg_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(212): error: more than one instance of overloaded function "cublasSgemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(215): error: more than one instance of overloaded function "cublasDgemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(218): error: more than one instance of overloaded function "cublasCgemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(221): error: more than one instance of overloaded function "cublasZgemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(226): error: more than one instance of overloaded function "cublasSgbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(230): error: more than one instance of overloaded function "cublasDgbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(234): error: more than one instance of overloaded function "cublasCgbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(238): error: more than one instance of overloaded function "cublasZgbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(244): error: more than one instance of overloaded function "cublasStrmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(246): error: more than one instance of overloaded function "cublasDtrmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(248): error: more than one instance of overloaded function "cublasCtrmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(250): error: more than one instance of overloaded function "cublasZtrmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(254): error: more than one instance of overloaded function "cublasStbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(256): error: more than one instance of overloaded function "cublasDtbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(258): error: more than one instance of overloaded function "cublasCtbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(260): error: more than one instance of overloaded function "cublasZtbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(264): error: more than one instance of overloaded function "cublasStpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(266): error: more than one instance of overloaded function "cublasDtpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(268): error: more than one instance of overloaded function "cublasCtpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(270): error: more than one instance of overloaded function "cublasZtpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(273): error: more than one instance of overloaded function "cublasStrsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(275): error: more than one instance of overloaded function "cublasDtrsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(277): error: more than one instance of overloaded function "cublasCtrsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(279): error: more than one instance of overloaded function "cublasZtrsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(283): error: more than one instance of overloaded function "cublasStpsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(286): error: more than one instance of overloaded function "cublasDtpsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(288): error: more than one instance of overloaded function "cublasCtpsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(290): error: more than one instance of overloaded function "cublasZtpsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(294): error: more than one instance of overloaded function "cublasStbsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(298): error: more than one instance of overloaded function "cublasDtbsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(301): error: more than one instance of overloaded function "cublasCtbsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(305): error: more than one instance of overloaded function "cublasZtbsv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(310): error: more than one instance of overloaded function "cublasSsymv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(313): error: more than one instance of overloaded function "cublasDsymv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(316): error: more than one instance of overloaded function "cublasChemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(319): error: more than one instance of overloaded function "cublasZhemv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(324): error: more than one instance of overloaded function "cublasSsbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(327): error: more than one instance of overloaded function "cublasDsbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(330): error: more than one instance of overloaded function "cublasChbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(333): error: more than one instance of overloaded function "cublasZhbmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(338): error: more than one instance of overloaded function "cublasSspmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(341): error: more than one instance of overloaded function "cublasDspmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(344): error: more than one instance of overloaded function "cublasChpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(347): error: more than one instance of overloaded function "cublasZhpmv_v2" has "C" linkage
/usr/local/cuda-6.5/bin/..//include/cublas.h(353): error: more than one instance of overloaded function "cublasSger_v2" has "C" linkage
Error limit reached.
100 errors detected in the compilation of "/tmp/tmpxft_000009c9_00000000-8_mod.cpp1.ii".
Compilation terminated.

['nvcc', '-shared', '-g', '-O3', '-use_fast_math', '-arch=sm_30', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a70a88a837011,-fPIC', '-Xlinker', '-rpath,/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray', '-Xlinker', '-rpath,/usr/local/cuda-6.5/lib', '-Xlinker', '-rpath,/usr/local/cuda-6.5/lib64', '-I/home/ivan/Programs/Library/pylearn2/pylearn2/sandbox/cuda_convnet/', '-I/usr/lib/python2.7/dist-packages/numpy/core/include', '-I/usr/include/python2.7', '-I/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda', '-o', '/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_convnet/cuda_convnet.so', 'mod.cu', '-L/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray', '-L/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_convnet', '-L/usr/local/cuda-6.5/lib', '-L/usr/local/cuda-6.5/lib64', '-L/usr/lib', '-lpython2.7', '-lcublas', '-lcudart']
ERROR (pylearn2.sandbox.cuda_convnet.convnet_compile): Failed to compile /home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_convnet/mod.cu ('nvmatrix_kernels.cu', 'nvmatrix.cu', 'conv_util.cu', 'filter_acts.cu', 'img_acts.cu', 'weight_acts.cu'): ('nvcc return status', 4, 'for cmd', 'nvcc -shared -g -O3 -use_fast_math -arch=sm_30 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=d67f7c8a21306c67152a70a88a837011,-fPIC -Xlinker -rpath,/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray -Xlinker -rpath,/usr/local/cuda-6.5/lib -Xlinker -rpath,/usr/local/cuda-6.5/lib64 -I/home/ivan/Programs/Library/pylearn2/pylearn2/sandbox/cuda_convnet/ -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda -o /home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_convnet/cuda_convnet.so mod.cu -L/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_ndarray -L/home/ivan/.theano/compiledir_Linux-3.13.0-44-generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/cuda_convnet -L/usr/local/cuda-6.5/lib -L/usr/local/cuda-6.5/lib64 -L/usr/lib -lpython2.7 -lcublas -lcudart')
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/ivan/Code/saliency/python/deep_object/theano_alexnet-master/train.py", line 56, in train_net
shared_x, shared_y, rand_arr, vels) = compile_models(model, config)
File "/home/ivan/Code/saliency/python/deep_object/theano_alexnet-master/alex_net.py", line 191, in compile_models
(rand, rand_arr)])
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 223, in function
profile=profile)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 512, in pfunc
on_unused_input=on_unused_input)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1312, in orig_function
defaults)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1181, in create
_fn, _i, _o = self.linker.make_thunk(input_storage=input_storage_lists)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 434, in make_thunk
output_storage=output_storage)[:3]
File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 847, in make_all
no_recycling))
File "/home/ivan/Programs/Library/pylearn2/pylearn2/sandbox/cuda_convnet/base_acts.py", line 152, in make_thunk
raise RuntimeError('Could not compile cuda_convnet')
RuntimeError: ('The following error happened while compiling the node', <pylearn2.sandbox.cuda_convnet.filter_acts.FilterActs object at 0x7f009a2df350>(GpuContiguous.0, GpuContiguous.0), '\n', 'Could not compile cuda_convnet')

I am not sure where to go from here. Any ideas?

Trained model file

Hello
Are you going to make the trained model available for use?

ValueError: GpuElemwise. Input dimension mis-match

What is the dynamic range of the input pixels that the network accepts?

Does it take images in the range 0-1, 0-255, or in some preprocessed form?

Execute alexnet using libgpuarray backend

I am trying to run AlexNet on 1 GPU using the new libgpuarray backend. The modifications I made to the 1-GPU sample are in 1gpu_libgpuarray_patch.txt

However, with these changes I get the following error:

ValueError: ('The following error happened while compiling the node', DnnVersion(), '\n', 'context name None is already defined')
Complete error log - 1gpu_libgpuarray_error.txt

If I further update train.py to use theano.gpuarray.use("cuda") instead of theano.gpuarray.use(config['gpu']), it starts training. But I don't think that this is correct. Please advise.

error on Windows 10

Hi,
Thank you for the repository.
I have installed the requirements and started the process as described.
I was able to prepare the preprocessed data. However, when I execute train.py, I get this error:
"TypeError: Cannot convert Type TensorType(int32, vector) (of Variable <TensorType(int32, vector)>) into Type TensorType(int64, vector). You can try to manually convert <TensorType(int32, vector)> into a TensorType(int64, vector)."

Results from intermediate stages from ConvPoolLayer

Hi, any suggestions on how I can obtain the outputs of the pre-nonlinearity convolution, the nonlinearity, and the normalization steps individually?

pycuda mem_get_ipc_handle() error on Windows 10

I would like to run the theano_alexnet training from this useful GitHub project.
My computer is a native Windows 10 machine with a 64-bit Intel Core i7. I use WinPython-64bit-3.4.4.4QT5 from WinPython 3.4.4.3, Visual Studio 2015 Community Edition Update 3, CUDA 8.0.44 (64-bit), cuDNN v5.1 (August 10, 2016) for CUDA 8.0, Git source control based on the MinGW compiler, and OpenBLAS 0.2.14. The main Python libraries are Theano 0.9.0beta1, Scipy 0.19.0, Keras 1.2.2, Lasagne 0.2.dev1, Numpy 1.11.1, hickle 2.0.4, h5py 2.6.0, pycuda, pylearn2 and zeromq. I received help from theano_group on Google. I have successfully preprocessed a subset of the ImageNet data using the script generate_data.sh, which generated all of the expected folders and files. The subset of data used is compressed into 195 .hkl (hickle) files for validation (each file is about 50 MB) in the folder Validation_Alexnet_b256_b_256.0, and into the files 0000_0.hkl, 0000_1.hkl, ..., 0194_0.hkl, 0194_1.hkl (each file is about 25 MB) in the folder Validation_Alexnet_b256_b_128.0. There are no files in the training folder. When I try to run train.py, it gives me these errors:

C:\deep_learning\alexnet>python train.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 740M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

... building the model

conv (cudnn) layer with shape_in: (3, 227, 227, 256)
Process Process-1:
Traceback (most recent call last):
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\deep_learning\alexnet\train.py", line 52, in train_net
model = AlexNet(config)
File "C:\deep_learning\alexnet\alex_net.py", line 62, in __init__
lib_conv=lib_conv,
File "./lib\layers.py", line 168, in __init__
dnn.dnn_conv(img=input_shuffled[:, :self.channel / 2,
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\var.py", line 540, in __getitem__
return theano.tensor.subtensor.advanced_subtensor(self, *args)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
node = self.make_node(*inputs, **kwargs)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2140, in make_node
index = tuple(map(as_index_variable, index))
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2081, in as_index_variable
return make_slice(idx)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
node = self.make_node(*inputs, **kwargs)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 39, in make_node
list(map(as_int_none_variable, inp)),
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 20, in as_int_none_variable
raise TypeError('index must be integers')
TypeError: index must be integers

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Could someone help me figure out what is wrong?
Thanks in advance for expert help and your time.
Greetings,
Goffredo
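The traceback above ends in a slice whose bound is written as self.channel / 2. Under Python 3 the / operator always returns a float, which cannot be used as a slice bound. The sketch below shows the difference using a plain list; the value 96 for channel is made up for illustration, and whether switching to // is the only change needed to run the code under Python 3 is an assumption.

```python
channel = 96  # hypothetical channel count, for illustration only

half_float = channel / 2   # true division: a float in Python 3
half_int = channel // 2    # floor division: an int

data = list(range(channel))

try:
    data[:half_float]
    rejected = False
except TypeError as err:
    # Python 3 refuses float slice bounds.
    rejected = True
    print("float slice bound rejected:", err)

print(len(data[:half_int]))
```

Under Python 2, / on two integers already performs floor division, which is why the same expression can work there and fail under Python 3.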

Testing network on CPU

Hi, I would like to use the pretrained model to classify some images, and I have a CPU-only machine.
Can you give me some instructions on how to do this?

Thank you

Step_idx bug when resuming training

step_idx should be updated when training is resumed from pre-trained weights. For example, add
step_idx = epoch / 20 after the following line:
23440c6#diff-e44f4a60e89e820dde9bb27afa634965R98

The same part in train_2gpu.py should be changed as well.
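A slightly more general version of this fix counts how many learning-rate steps have already passed at the resume epoch. This is a sketch only: the function name resume_step_idx and the step epochs [20, 40, 60] are hypothetical, chosen so that the result agrees with the epoch / 20 expression suggested above.

```python
def resume_step_idx(resume_epoch, lr_step_epochs):
    """Count the learning-rate steps already passed when resuming at resume_epoch."""
    return sum(1 for step_epoch in lr_step_epochs if resume_epoch >= step_epoch)


lr_steps = [20, 40, 60]  # hypothetical schedule: one step every 20 epochs

for epoch in (0, 20, 45):
    print(epoch, resume_step_idx(epoch, lr_steps))
```

For an evenly spaced schedule this gives the same value as integer division of the epoch by the step interval, but it also works when the step epochs are not evenly spaced.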

Preprocessing with respect to aspect ratio

As mentioned in #19, I think we need to do the preprocessing with respect to the aspect ratio of the image, as PyTorch does, instead of simply doing

img = scipy.misc.imresize(img, target_shape)

as done here.

This will, however, raise a problem when storing the resulting numpy ndarrays into hickle files. The shapes of the numpy ndarrays will be irregular because the images are of different widths and heights.

This issue is opened for implementing this enhancement feature. If the aspect ratio is respected, the training quality is expected to be improved. Any suggestions are welcome.
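One way to respect the aspect ratio and still get arrays of a single shape is to scale the shorter side to a fixed length and then take a centred square crop. The sketch below only computes the sizes and the crop box. The function name and the target of 256 pixels are assumptions, and the actual resizing would still be done by an image library.

```python
def resize_and_crop_box(width, height, target=256):
    """Return the aspect-preserving resized size and a centred target x target crop box.

    The box is (left, top, right, bottom) in the resized image's coordinates.
    """
    scale = target / min(width, height)
    # max() guards against rounding the shorter side below the target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


size, box = resize_and_crop_box(500, 375)
print(size, box)
```

Because every crop box has the same width and height, the resulting arrays are regular again and can be stacked into batches, at the cost of discarding the image borders along the longer side.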

Other datasets.

Does it work with ILSVRC 2010 dataset?

For new dataset

Dear MR：

Recently I have been trying to do some work on my own dataset, but I have some questions about your data preprocessing. What does the .mat file (meta_clsloc_mat) mean? And the val_label_file? Is it any different from the Caffe-style validation labels?

Thanks for your work and help!

Merge split layer weights

Do you know how I should merge the layer weights that are split across the two processing channels (i.e. layers 2, 4, 5) in order to load them into an AlexNet implementation?
I downloaded the weights, loaded them, and merged them along the number-of-filters dimension, but the network output is junk.
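
Concatenating along the filter dimension is only equivalent when the target layer is itself a grouped convolution. For a layer without groups, each half must be placed so that it only sees its own half of the input channels, with zeros elsewhere. The sketch below assumes the (channels, rows, columns, filters) layout suggested by the 48x5x5x128 shape reported for W0_1_65 elsewhere on this page; the function name is hypothetical. Other causes of junk output, such as a missing mean image or a filter-flip convention, are not addressed here.

```python
import numpy as np


def merge_grouped_weights(w0, w1):
    """Build one dense filter bank that reproduces a two-group convolution.

    w0, w1: arrays laid out as (in_channels_per_group, rows, cols, filters_per_group).
    """
    if w0.shape != w1.shape:
        raise ValueError("both groups must have the same shape")
    c, r, s, f = w0.shape
    merged = np.zeros((2 * c, r, s, 2 * f), dtype=w0.dtype)
    merged[:c, :, :, :f] = w0  # group 0: first half of inputs to first half of filters
    merged[c:, :, :, f:] = w1  # group 1: second half of inputs to second half of filters
    return merged


# Placeholder arrays with the shape reported for the second conv layer's halves.
w0 = np.ones((48, 5, 5, 128), dtype=np.float32)
w1 = 2 * np.ones((48, 5, 5, 128), dtype=np.float32)
merged = merge_grouped_weights(w0, w1)
```

The off-diagonal blocks stay zero, which is what keeps the two groups from mixing in the dense layer.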

version of dependencies

Hi,

Can you tell me the versions of the dependencies that you have listed for running your code?

Thank you.

Kind regards..

Mean image for the pre-trained net?

You conveniently link to a pre-trained net, but it omits the mean image used when training it. Could you make it available?

Can you also use theano_alexnet and load weights from the caffe trained network?

I was trying to load Caffe weights instead of the weights that are used here. The main problem is how to cope with the grouping feature. As a small test I tried to match W0_1_65 (shape: 48x5x5x128) with the upper (and lower) half of the Caffe conv2 weights (shape: 96x5x5x128 after dimshuffle), which is not at all correct. The key thing I am wondering is how to split up the Caffe weights W for these grouping layers into W0 and W1. Does anyone have any ideas?

Many thanks in advance!
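
A possible starting point, assuming the Caffe blob for a grouped layer is stored as (num_output, in_channels_per_group, kernel_h, kernel_w) with the groups laid out contiguously along num_output. Under that assumption the split belongs on the output-filter axis, not on the channel axis. The function name is hypothetical and the zero array only stands in for the real conv2 blob. The two libraries may also differ in whether filters are flipped (convolution versus cross-correlation), which this sketch does not handle.

```python
import numpy as np


def split_caffe_grouped_weights(blob, groups=2):
    """Split a (num_output, in_channels_per_group, kh, kw) blob into per-group parts.

    Each part is returned as (in_channels_per_group, kh, kw, filters_per_group).
    """
    per_group = np.split(blob, groups, axis=0)  # groups are contiguous along num_output
    return [part.transpose(1, 2, 3, 0) for part in per_group]


# Placeholder with the assumed shape of a two-group conv2 blob.
conv2 = np.zeros((256, 48, 5, 5), dtype=np.float32)
w0, w1 = split_caffe_grouped_weights(conv2)
```

With this layout each half has shape 48x5x5x128, which matches the shape quoted for W0_1_65.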

An update must have the same type as the original shared variable

Can you tell me the exact version of Theano that you use? The new version of Theano uses float64 for the gradient while the weights are still float32, which causes the update error.

After the latest conda update I started getting this issue: TypeError: slice indices must be integers or None or have an __index__ method

I was able to run the code with Theano 0.9.0 on Windows, but after updating to the latest conda packages (conda update --all) I get this error:

... training
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\arjun\Anaconda2\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Users\arjun\Anaconda2\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Rough\random\xxx\xxxxxyyyyyy\xx\yy\xxxxxyyyy\train.py", line 127, in train_net
    recv_queue=load_recv_queue)
  File "D:\Rough\random\xxx\xxxxxyyyyyy\xx\yy\xxxxxyyyy\train_funcs.py", line 166, in train_model_wrap
    batch_img = crop_and_mirror(batch_img, param_rand, flag_batch=flag_batch)
  File "D:\Rough\random\xxx\xxxxxyyyyyy\xx\yy\xxxxxyyyy\proc_load.py", line 75, in crop_and_mirror
    crop_ys:crop_ys + cropsize]
TypeError: slice indices must be integers or None or have an __index__ method
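
This error typically appears when a slice bound is a float. A plausible cause here, not confirmed against the repository, is that the crop offsets are computed as floats (for example, round() returns a float on Python 2) and that newer NumPy releases no longer accept floats as indices. The sketch below reproduces the error with a plain list and shows the cast that avoids it; crop_rows is a hypothetical helper, not a function from proc_load.py.

```python
def crop_rows(rows, start, size):
    """Return rows[start:start + size], accepting a float-valued start offset."""
    start = int(round(start))  # slice bounds must be integers
    return rows[start:start + size]


rows = list(range(10))
offset = 3.0  # stands in for a float-valued crop offset

# Slicing directly with the float offset fails.
try:
    rows[offset:offset + 4]
    raised = False
except TypeError:
    raised = True

# Casting the offset first gives the expected window.
cropped = crop_rows(rows, offset, 4)
```

Applying the same int(...) cast to the crop offsets before they are used as slice bounds should have the same effect on array data.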

Error with theano v0.9.0

I observed the following error while executing alexnet with theano v0.9.0:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 54, in train_net
    model = AlexNet(config)
  File "<pathto>/alexnet-theano/alex_net.py", line 130, in __init__
    self.errors_top_5 = softmax_layer8.errors_top_x(y, 5)
  File "./lib/layers.py", line 312, in errors_top_x
    return T.mean(T.min(T.neq(y_pred_top_x, y_top_x), axis=1))
  File "<pathto>/site-packages/theano/tensor/basic.py", line 1742, in min
    raise NotImplementedError()
NotImplementedError
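
One possible reading of this traceback is that the result of T.neq has a dtype that T.min does not handle in this Theano version; casting the comparison to an integer type before the reduction is a workaround worth trying, though it is not verified here. The NumPy sketch below mirrors the same top-x error computation with the cast made explicit. It is an illustration of the logic, not the code from lib/layers.py.

```python
import numpy as np


def errors_top_x(probs, y, x=5):
    """Fraction of rows whose true label is not among the x highest-scoring classes."""
    top_x = np.argsort(probs, axis=1)[:, -x:]          # indices of the x largest scores
    mismatch = (top_x != y[:, None]).astype(np.int8)   # explicit integer cast before min
    return float(np.mean(np.min(mismatch, axis=1)))    # row minimum is 0 if any entry matched


probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
y = np.array([2, 2])
top2_error = errors_top_x(probs, y, x=2)
```

In the example, the first row has class 2 among its top two scores and the second row does not, so half of the rows count as errors.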

There may be something wrong with your code:

I checked make_train_val_txt.py, and the value of the variable dict_orig_id_to_sorted_id may be wrong.

can't reproduce results with pre-trained models

Hi - thanks a lot for releasing your code.
I downloaded the img_mean + parameters of your model.
As a sanity check, I just ran validate_performance with your model but I can't reproduce the accuracy.

Here is the output of the script:

... building the model
conv (cudnn) layer with shape_in: (3, 227, 227, 256)
conv (cudnn) layer with shape_in: (96, 27, 27, 256)
conv (cudnn) layer with shape_in: (256, 13, 13, 256)
conv (cudnn) layer with shape_in: (384, 13, 13, 256)
conv (cudnn) layer with shape_in: (384, 13, 13, 256)
fc layer with num_in: 9216 num_out: 4096
dropout layer with P_drop: 0.5
fc layer with num_in: 4096 num_out: 4096
dropout layer with P_drop: 0.5
softmax layer with num_in: 4096 num_out: 1000
... training
weight loaded: W_0_65
weight loaded: b_0_65
weight loaded: W0_1_65
weight loaded: W1_1_65
weight loaded: b0_1_65
weight loaded: b1_1_65
weight loaded: W_2_65
weight loaded: b_2_65
weight loaded: W0_3_65
weight loaded: W1_3_65
weight loaded: b0_3_65
weight loaded: b1_3_65
weight loaded: W0_4_65
weight loaded: W1_4_65
weight loaded: b0_4_65
weight loaded: b1_4_65
weight loaded: W_5_65
weight loaded: b_5_65
weight loaded: W_6_65
weight loaded: b_6_65
weight loaded: W_7_65
weight loaded: b_7_65

# printing the current filename for validation
/mnt/imagenet/val_hkl_b256_b_256/0000.hkl loaded

# prediction of the network to check with the labels if it was just an index offset problem

prediction [147 778 278 778 490 671 986 976  85 121 678 218 429 628 582 973 741 404
 910 247 439 384 602 865 459 528 781 681 985 467 998 803 309 870 870 654
 311 945 642 989 685 114 422 909 643 996 980  55  77 597 610 401 110 989
 952 649 989 985 541 469 582 871 697 985 812 302 860 586 425 416 518 732
  64  27 984 832 124 948 946 989 669 946 286 485 844 324 855 481 359 738
 949 932 786 324 251 769 122 554 399  58 985 853 510 946 388 325 979  88
 687 986 723 806 199 874 945 640  59 613 317  17 936 678   1 689 780 418
 636 738 109 923 986 723 855 408 310 509 865 695 979 389 946  78  47 118
 690 630 507 631 949 897 122   0 766 943  82 553 995 945 615 813 842 777
 374 298 986 932 536 759 379 458 665 990 611 952 419 678 996 336 951 446
 985 114 383 897 278 805 288 749 589 328 723 648 677 322 409 996 569 357
 672 951 301  77 556 313 644 324 498 291 942 461 677 956 742 954 702 101
 424 836 986 796 956 177 798 879 292  57 667 902 760 825 707 996 678 427
 532 942 945 611 949 937 303 308 307 375 324 388 492 279 945 856 160 997
 990 852 553 990]

label ground truth [ 65 970 230 809 516  57 334 415 674 332 109 286 370 757 595 147 108  23
 478 517 334 173 948 727  23 846 270 167  55 858 324 573 150 981 586 887
  32 398 777  74 516 756 129 198 256 725 565 167 717 394  92  29 844 591
 358 468 259 994 872 588 474 183 107  46 842 390 101 887 870 841 467 149
  21 476  80 424 159 275 175 461 970 160 788  58 479 498 369  28 487  50
 270 383 366 780 373 705 330 142 949 349 473 159 872 878 201 906  70 486
 632 608 122 720 227 686 173 959 638 646 664 645 718 483 852 392 311 457
 352  22 934 283 802 553 276 236 751 343 528 328 969 558 163 328 771 726
 977 875 265 686 590 975 620 637  39 115 937 272 277 763 789 646 213 493
 647 504 937 687 781 666 583 158 825 212 659 257 436 196 140 248 339 230
 361 544 935 638 627 289 867 272 103 584 180 703 449 771 118 396 934  16
 548 993 704 457 233 401 827 376 146 606 922 516 284 889 475 978 475 984
  16  77 610 254 636 662 473 213  25 463 215 173  35 741 125 787 289 425
 973   1 167 121 445 702 532 366 678 764 125 349  13 179 522 493 989 720
 438 660 983 533]

# error on the first valid batch
validation error 100.000000 %
top 5 validation error 100.000000 %
validation loss 16.122633

Any idea what went wrong here? The results are off for the rest of the dataset too.
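
Two quick checks can narrow this down; both are sketches rather than a diagnosis. The first tests whether the mismatch could be a pure relabeling, meaning each predicted index always pairs with the same ground-truth label. The second compares the reported loss with the loss of a uniform guess over 1000 classes. The helper name is hypothetical, and the lists are the first eight entries of the arrays printed above.

```python
import math


def is_consistent_relabeling(predictions, labels):
    """True if each predicted index always pairs with the same ground-truth label."""
    mapping = {}
    for pred, label in zip(predictions, labels):
        if mapping.setdefault(pred, label) != label:
            return False
    return True


# First eight entries of the prediction and label arrays printed above.
pred = [147, 778, 278, 778, 490, 671, 986, 976]
label = [65, 970, 230, 809, 516, 57, 334, 415]

remap_possible = is_consistent_relabeling(pred, label)

# Cross-entropy of a uniform guess over 1000 classes, for comparison with 16.12.
chance_loss = math.log(1000)
```

Prediction 778 is paired with two different labels (970 and 809), so a simple index permutation does not explain the mismatch. The reported loss is also well above the chance level of about 6.9, which suggests the network is confidently wrong. That pattern would be consistent with a mismatch in the inputs (for example the mean image or the preprocessing), but this is a guess.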