Trial on kaggle imagenet object localization by yolov3 in google cloud

project:https://www.kaggle.com/c/imagenet-object-localization-challenge
yolo:https://pjreddie.com/darknet/yolo/
nice yolo explanation:https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-28b1b93e2088
My operation system is Mac OS, although it doesn't matter much, because we do most steps on google cloud.
Let's do it step by step:

1.Basic environment preparation:

I.Apply a google cloud account.

ps:Google provide $300 credit trial time for first time sign up account.

II.Create a ubuntu 16.04 compute engine instance on google cloud, with 400G SSD disk, 4 cores' cpu and, 15G memories.

we will change it a little bit later coz GPU/CPU extention or other reasons, but now it's enough.

III.Apply quotas increasing on Nvidia tesla K80/P100/V100, coz we don't have permission to use gpu default.

GPUs cost credit so fast, so we can choose it by needed. For me, I just increase 1 K80s for test, 4 P100s for training our model, haven't tried on V100 yet.

IV.SSH connection by RSA keys.

#:ssh-keygen -t rsa -f ~/.ssh/gc_rsa -C anynamehere
No pass word is easy for login.
#:cd ~/.ssh
#:vi gc_rsa.pub
then go to google cloud, copy everything in gc_rsa.pub to ubuntu instance SSH key part.
#:chmod 400 gc_rsa
#:ssh -i gc_rsa anynamehere@your google cloud external ip
we can also connect by 'FileZilla', no more words here.

V.pip installation

#:sudo apt update
#:sudo apt upgrade
#:sudo apt-get -y install python-pip
#:sudo apt-get -y install python3-pip

VI.kaggle-cli installation

#:pip install kaggle-cli

2.Dataset download

#:kg download -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge
// dataset is about 160G, so it will cost about 1 hour if your instance download speed is around 42.9 MiB/s.
// let's open another ssh connection to do next step when it's doing the download process.

3.Opencv-3.4.0 installation(we will turn on opencv option in yolo project later for better image processing)

execute all the steps in the following url.
http://www.python36.com/how-to-install-opencv340-on-ubuntu1604/

4.Cuda 9.0 with cudnn 7.0 installation

we can use the fallowing bash script, download it and execute it in instance.
https://gist.github.com/ashokpant/5c4e9481615f54af4025ab2085f85869#file-cuda_9-0_cudnn_7-0-sh

5.Cudnn library configuration

go to https://developer.nvidia.com/rdp/cudnn-download to download cuDNN v7.0.5 Library for Linux CUDA 9.0
it's name should be cudnn-9.0-linux-x64-v7.tgz, we use scp command or filezilla to move this package from local machine to remote instance.
#:scp -i ~/.ssh/gc_rsa Downloads/cudnn-9.0-linux-x64-v7.tgz anynamehere@your google cloud external ip:~/
// come to instance window
#:tar zxvf cudnn-9.0-linux-x64-v7.tgz
#:cd cuda
#:sudo cp include/* /usr/local/cuda-9.0/include/
#:sudo cp lib64/* /usr/local/cuda-9.0/lib64/
#:echo 'export PATH=/usr/local/cuda-9.0/bin:$PATH' >> ~/.bashrc
#:echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
#:source ~/.bashrc

6.Shutdown instance, add 1 piece of K80 GPU, then boot instance again.

ps:For frugality, we can revise number of cpu cores from 4 to 2
// view GPUs detailed info
#: nvidia-smi
// view the number of CPUs
#:nproc

7.X11 installatoin both of instance and our local machine, so that we can see our predicted image remotely.

#:sudo apt-get install xorg openbox
// what I need on my mac is XQuartz.
// install feh, so that we can see any picture remotely.
#:sudo apt install feh
// logout from instance, connect it with additional parameter, then test it
#:ssh -Y -i ~/.ssh/gc_rsa anynamehere@your google cloud external ip
#:feh darknet/data/dog.jpg

8.Yolo installation

#:git clone https://github.com/pjreddie/darknet
#:cd darknet
#:make

9.Test yolov3

// Actually we've done a good job until now, but we still can't see expected result if we won't change Makefile a little bit,
// I haven't figured out the reason, although let's just change it now.
#:cd darknet
#:sed -i 's/GPU=./GPU=1/' Makefile
#:sed -i 's/CUDNN=./CUDNN=0/' Makefile
#:sed -i 's/OPENCV=./OPENCV=1/' Makefile
#:sed -i 's/OPENMP=./OPENMP=1/' Makefile
#:sed -i 's/DEBUG=./DEBUG=0/' Makefile
#:make
#:wget https://pjreddie.com/media/files/yolov3.weights
#:./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

10.Now let's train it.

I.training data preprocessing.

#:cd ~
#:tar zxvf imagenet_object_localization.tar.gz
// delete package so that we'll have enough disk space.
#:rm imagenet_object_localization.tar.gz
// view disk space info.
#: df -h
// Data preparation
#:unzip LOC_synset_mapping.txt.zip
#:mkdir ILSVRC/Data/CLS-LOC/train/images
#:mv ILSVRC/Data/CLS-LOC/train/n* ILSVRC/Data/CLS-LOC/train/images/
#:mv ILSVRC/Data/CLS-LOC/val/ ILSVRC/Data/CLS-LOC/images
#:mkdir ILSVRC/Data/CLS-LOC/val/
#:mv ILSVRC/Data/CLS-LOC/images ILSVRC/Data/CLS-LOC/val/images
#:git clone https://github.com/mingweihe/ImageNet
#:pip3 install pandas
#:pip3 install pathlib
#:cd ImageNet
// generating all training formatted label files costs about 20 minutes
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/train ../ILSVRC/Data/CLS-LOC/train/labels 1
// generating all validation formatted label files
#:python3 generate_labels.py ../LOC_synset_mapping.txt ../ILSVRC/Annotations/CLS-LOC/val ../ILSVRC/Data/CLS-LOC/val/labels 0
#:cd ~
#:find `pwd`/ILSVRC/Data/CLS-LOC/train/labels/ -name \*.txt > darknet/data/inet.train.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.train.list
#:sed -i 's/labels/images/g' darknet/data/inet.train.list
#:find `pwd`/ILSVRC/Data/CLS-LOC/val/labels/ -name \*.txt > darknet/data/inet.val.list
#:sed -i 's/\.txt/\.JPEG/g' darknet/data/inet.val.list
#:sed -i 's/labels/images/g' darknet/data/inet.val.list

II.pretrained weights preparation.

#:cd darknet
#:wget https://pjreddie.com/media/files/darknet53.conv.74

III.Traning

#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg darknet53.conv.74
// we can also restart training from a checkpoint:
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup

IV.Training with multiple GPUs

// shutdown instance, increase number of GPUs from 1 piece's K80 to 4 pieces' P100, with 6 CPUs.
// boot instance, start training using following command
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// continue from checkpoints we can replace darknet53.conv.74 with backup file.

V.Training without ssh connection

#:screen
#:./darknet detector train ~/ImageNet/ILSVRC.data ~/ImageNet/yolov3-ILSVRC.cfg backup/yolov3-ILSVRC.backup -gpus 0,1,2,3
// press Keys of <Ctrl+a>
// then press <d> key
// now we have our task done detached from our local machine
// If we wanna put task back, we can connect ssh, then:
#:screen -r
// For more detailed instruction, just google "linux screen detach".

11.Prediction and transfer predictions into CSV file.

#:unzip LOC_sample_submission.csv.zip
#:mkdir ~/submissions
#:python3 ~/ImageNet/predict.py

12.Submit our predictions.

#kg submit <submission-file> -u <your kaggle username> -p <your kaggle password> -c imagenet-object-localization-challenge -m "my submission"
(optional way is submitting it on kaggle website by using any web browser.)

fengrk / imagenet Goto Github PK

imagenet's Introduction