imagenet-fast's Introduction

End to end ImageNet training

  1. Spin up an AWS instance
  2. Train ImageNet
  3. Save weights and loss

AWS

Check README in AWS directory for instructions on...

  1. Creating Spot instances
  2. Downloading and formatting ImageNet
  3. Running CIFAR10 on an instance

FP16

  1. For testing fast.ai model compatibility with half precision floating point
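
To make the FP16 point concrete, here is a minimal sketch that uses only Python's standard library (the struct format code 'e' is IEEE 754 half precision, 'f' is single precision). It is not code from this repo: the helper names and the sample value are assumptions picked for illustration. It shows one reason a loss scale such as the --loss-scale 512 seen in the issues below can matter: a small gradient-like value rounds to zero in FP16 unless it is scaled up first.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round-trip x through IEEE 754 single precision (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Illustrative values only (assumptions, not taken from any training run).
tiny_grad = 1e-8
loss_scale = 512.0

print(to_fp32(tiny_grad))  # still nonzero in single precision
print(to_fp16(tiny_grad))  # 0.0, below the smallest FP16 subnormal

# Scale before casting to FP16, then unscale in higher precision.
scaled = to_fp16(tiny_grad * loss_scale)
recovered = scaled / loss_scale
print(recovered)  # close to the original value, no longer zero
```

The unscaling step happens in Python's float (double precision) here; in a real mixed-precision setup it would typically happen on an FP32 copy of the gradients.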

ImageNet

Stay tuned!

cifar10

imagenet-fast's People

Contributors

bearpelican, brettkoonce, jph00, shoaibahmed, sjdlloyd

imagenet-fast's Issues

How to use Alexnet with this library?

One of the examples in the readme file uses Alexnet, but it looks like it is not supported in fast_imagenet.py. Am I missing something, or has support for this network been discontinued?

DAWN 3 hr training version

Hi all,

I was trying to reproduce the 2:57:28 training time mentioned here https://dawn.cs.stanford.edu/benchmark/#imagenet-train-time.

I was wondering which .py file I should use: imagenet_nv/main.py or imagenet_nv/fastai_imagenet.py?

Also, when I tried to run imagenet_nv/fastai_imagenet.py, I got the following error:

What I executed:

python fastai_imagenet.py $imagenetDir -a resnet18 --save-dir logs/ --prof

What I got:

Running script with args: Namespace(arch='resnet18', batch_size=256, cycle_len=1, data='/media/data001/jingyao/imagenet_object_localization/ILSVRC/Data/CLS-LOC/', dist_backend='nccl', dist_url='file://sync.file', epochs=90, fp16=False, loss_scale=1, lr=0.1, momentum=0.9, pretrained=False, print_freq=50, prof=True, rank=0, resume='', save_dir='logs/', sz=224, train_128=False, use_clr=None, use_tta=False, weight_decay=0.0001, workers=4, world_size=1)
Traceback (most recent call last):
  File "fastai_imagenet.py", line 354, in <module>
    main()
  File "fastai_imagenet.py", line 337, in main
    loss_scale=args.loss_scale, **sargs)
  File "/home/jingyao/Research/fastai/fastai/learner.py", line 302, in fit
    return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
  File "/home/jingyao/Research/fastai/fastai/learner.py", line 249, in fit_gen
    swa_eval_freq=swa_eval_freq, **kwargs)
  File "/home/jingyao/Research/fastai/fastai/model.py", line 123, in fit
    model_stepper = stepper(model, opt.opt if hasattr(opt,'opt') else opt, crit, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'sampler'

I also noticed that #9 reports a similar issue. @bearpelican, could you look into it please? Any help would be much appreciated!

What is the blacklist?

I see that the script blacklist.sh removes about 1.7k images from the ImageNet val folders. Why do you do that?
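
For anyone who wants to see what such a step amounts to before running it, here is a rough Python sketch. It is not a port of blacklist.sh, and it does not explain why those images were chosen. The function name, the dry_run flag, and the idea that the blacklist is a list of file names are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def remove_blacklisted(val_dir: Path, blacklist: Iterable[str],
                       dry_run: bool = True) -> List[Path]:
    """Find files under val_dir whose names are blacklisted.

    With dry_run=True (the default) nothing is deleted; the matches are
    only returned so they can be inspected first.
    """
    names = set(blacklist)
    hits = sorted(p for p in val_dir.rglob("*")
                  if p.is_file() and p.name in names)
    if not dry_run:
        for path in hits:
            path.unlink()
    return hits
```

Calling it once with dry_run=True and checking the length of the returned list is a cheap way to confirm that roughly the expected number of validation images would be affected before anything is deleted.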

Not able to run dawn_mod.py

Hi Folks, thank you for doing this great work at fastai.

I am trying to reproduce the following

python dawn_mod.py ~/data/cifar10/ --save-dir ~/data/cf_train_save/wrn_v5 -a wrn_22 --fp16 --loss-scale 512 --epochs 1 --cycle-len 30 --lr 1.5 --wd 1e-4 --use-clr 20,20,0.95,0.85

and I am getting this error:

Traceback (most recent call last):
  File "dawn_mod.py", line 343, in <module>
    main()
  File "dawn_mod.py", line 329, in main
    **sargs)
  File "/home/ammar/.local/lib/python3.6/site-packages/fastai/learner.py", line 287, in fit
    return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
  File "/home/ammar/.local/lib/python3.6/site-packages/fastai/learner.py", line 234, in fit_gen
    swa_eval_freq=swa_eval_freq, **kwargs)
  File "/home/ammar/.local/lib/python3.6/site-packages/fastai/model.py", line 112, in fit
    model_stepper = stepper(model, opt.opt if hasattr(opt,'opt') else opt, crit, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'sampler'

Any ideas? Has the API for fastai changed? I used pip install for all the dependencies and fastai itself.
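
One way to check whether an installed class accepts a given keyword is to inspect its constructor signature. The sketch below is only an illustration: OldStepper is a hypothetical stand-in, not the real fastai Stepper, and unsupported_kwargs is a helper name invented here. It reproduces the same kind of TypeError and shows how to list the offending keywords up front.

```python
import inspect


class OldStepper:
    """Hypothetical stand-in for a stepper class from an older release."""

    def __init__(self, model, opt, crit, clip=0):
        self.model, self.opt, self.crit, self.clip = model, opt, crit, clip


def unsupported_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in kwargs if k not in params)


kwargs = {"clip": 0.1, "sampler": None}
print(unsupported_kwargs(OldStepper, kwargs))  # ['sampler']

try:
    OldStepper("model", "opt", "crit", **kwargs)
except TypeError as err:
    # Same failure mode as in the traceback above.
    print(err)
```

Running the same helper against the stepper class that the script actually passes in would show whether the installed fastai release predates the sampler argument, assuming that is the cause.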

How to use multi-GPU to train models

When I run python -m multiproc main.py ..., I get the following error:

Traceback (most recent call last):
  File "/home/ssm/imagenet-fast-master/imagenet_nv/main.py", line 418, in <module>
    if __name__ == '__main__': main()
  File "/home/ssm/imagenet-fast-master/imagenet_nv/main.py", line 117, in main
    dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size)
  File "/home/ssm/miniconda3/envs/fastai/lib/python3.6/site-packages/torch/distributed/__init__.py", line 49, in init_process_group
    group_name, rank)
RuntimeError: _Map_base::at

My environment is as follows:
Ubuntu 16.04

Part of my conda environment:
torch==0.3.1
torchtext==0.2.3
torchvision==0.2.0
fastai==0.7.0
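
A version list like the one above can be collected programmatically, which makes it easier to compare environments across reports. This is a small sketch using importlib.metadata, so it assumes Python 3.8 or newer (the environment in this report is Python 3.6, where pip freeze would be the closer equivalent). The helper name installed_versions is made up for this example.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
        ["torch", "torchtext", "torchvision", "fastai"]).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```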

CUDA half precision check not getting 884 calls

I'm using nvidia-docker with CUDA 9 and GTX 1080s, which I think support half precision?

I ran the steps to check whether CUDA is utilizing fp16, and cat nvprof_output.txt | grep fp16_s884 returned nothing. What should I do next?

profile_fp16.py output:

cuda version= 9.0.176
cudnn version= 7102
vgg16 FP 32 avg over 100 runs: 136.0033893585205 ms
vgg16 FP 16 avg over 100 runs: 112.796950340271 ms
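
As a quick sanity check on those two numbers, the ratio can be worked out directly. The sketch below parses the two timing lines quoted above and computes the FP16 speedup; the regular expression and the function name are assumptions written for this exact output format, not part of profile_fp16.py.

```python
import re

# The two timing lines from the profile_fp16.py output quoted above.
profile_output = """\
vgg16 FP 32 avg over 100 runs: 136.0033893585205 ms
vgg16 FP 16 avg over 100 runs: 112.796950340271 ms
"""


def parse_timings(text):
    """Return {bit_width: average_ms} parsed from the profile output."""
    pattern = re.compile(r"FP (\d+) avg over \d+ runs: ([\d.]+) ms")
    return {int(bits): float(ms) for bits, ms in pattern.findall(text)}


timings = parse_timings(profile_output)
speedup = timings[32] / timings[16]
print(f"FP16 speedup over FP32: {speedup:.2f}x")  # prints "FP16 speedup over FP32: 1.21x"
```

So FP16 is running about 1.2x faster than FP32 here, even though no fp16_s884 kernels show up in the nvprof output.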
