stanford-futuredata / dawn-bench-entries
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Home Page: http://dawn.cs.stanford.edu/benchmark/
I assume the hours field reported in the DAWNBench results is the elapsed time between the start of the program and the moment each checkpoint is saved. Can we exclude initialization time that occurs before training? For example, we could load the entire CIFAR dataset into memory first. Also, saving checkpoints to disk is expensive, especially when training is very fast. Can we exclude the checkpointing time as well?
Besides that, can we report training progress saved every x epochs?
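To make the proposal concrete, here is a minimal sketch of the timing split I have in mind (train_one_epoch and save_checkpoint are placeholders for a submitter's own functions, not part of any DAWNBench tooling):

```python
import time

def timed_training(num_epochs, train_one_epoch, save_checkpoint):
    """Sketch: accumulate optimization time only, leaving checkpoint
    I/O outside the clock. All function arguments are placeholders."""
    train_seconds = 0.0
    for epoch in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch()                       # timed: training work only
        train_seconds += time.perf_counter() - start
        save_checkpoint(epoch)                  # untimed: checkpoint I/O
        print(f"epoch {epoch}: {train_seconds / 3600:.4f} training hours")
```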
I had a quick question: in measuring the training cost and time, how should the job be run? What are the guidelines on how the job should be configured here?
I have reproduction instructions for the ResNet-50 entry as a Google Doc; where should I reference it?
https://docs.google.com/document/d/1I6sjUpU1myzQGqcX3NyezSkChVZIui8Gxo5vove9ZV0/edit#heading=h.gmrlbtx6xbvi
The ImageNet validation set consists of 50,000 images. In the 2014 devkit, there is a list of 1,762 "blacklisted" files. When we report top-5 accuracy, should we use the blacklisted or the non-blacklisted version? In Google's submission, results are obtained using the full 50,000 images, including the blacklisted ones, but some submissions used the blacklisted version. I just want to make sure we're comparing the same thing.
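For concreteness, here is a sketch of the two evaluation variants (the prediction and label formats are assumptions for illustration, not any submission's actual code):

```python
# Sketch: top-5 accuracy with and without the devkit blacklist.
# predictions: dict image_id -> list of 5 predicted class ids
# labels:      dict image_id -> ground-truth class id
def top5_accuracy(predictions, labels, blacklist=frozenset()):
    kept = {k: v for k, v in predictions.items() if k not in blacklist}
    correct = sum(labels[img] in top5 for img, top5 in kept.items())
    return correct / len(kept)

# Full version scores all 50,000 images; the blacklisted version
# scores 50,000 - 1,762 = 48,238 images.
```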
Hi, @yaroslavvb
What script and flags are you using to get the training results for ResNet-50 on ImageNet? I am running tf_cnn_benchmarks.py with ImageNet, and I need to print the epochs and training time during training. I can't find any flags that activate printing of this information; do I need to modify the script to do it?
Here is the display I am getting:
Step Img/sec total_loss top_1_accuracy top_5_accuracy
1 images/sec: 451.0 +/- 0.0 (jitter = 0.0) 8.168 0.003 0.005
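For what it's worth, epochs and per-epoch time can be derived from that output outside the script; a rough sketch (the batch size below is an assumption, substitute whatever --batch_size you actually pass):

```python
IMAGENET_TRAIN_IMAGES = 1281167
batch_size = 256  # assumption: the global batch size used for training

def epochs_completed(step):
    # Each step processes one global batch.
    return step * batch_size / IMAGENET_TRAIN_IMAGES

def hours_per_epoch(images_per_sec):
    return IMAGENET_TRAIN_IMAGES / images_per_sec / 3600

print(f"{hours_per_epoch(451.0):.2f} hours/epoch at 451 images/sec")
```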
@bignamehyp as per the other thread, setting "eval_batch_size=1024" is incorrect, as it will skip parts of the validation set. Could you rerun the validation checkpoints using a correct batch size and update the TSVs and json files?
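For reference, the arithmetic behind the concern, assuming the eval loop runs only full batches:

```python
val_images = 50_000
batch = 1024
full_batches = val_images // batch       # 48
evaluated = full_batches * batch         # 49,152
print(val_images - evaluated)            # 848 validation images never scored
```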
@jph00 Sorry I didn't notice this while reviewing the pull request, but from the command line in your commit below, it looks like you still call blacklist.sh (train2.sh#L71-L76). Is that correct? Am I missing something?
Will the DAWNBench ImageNet inference competition still be run? So far, only three companies have submitted: Huawei, Alibaba, and Intel. Thanks.
Regarding the DAWNBench latency rule,
I have a question and need your confirmation:
when we calculate the latency, can we ignore image-processing time?
For example, can we handle image processing (including decoding, resizing, and cropping) offline?
Thanks
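To make the question concrete, this is the split I mean (preprocess and run_inference are placeholders for our own pipeline):

```python
import time

def per_image_latency(image_paths, preprocess, run_inference):
    # Offline phase: decoding, resizing, and cropping done ahead of time;
    # the question is whether this phase may be excluded from the clock.
    tensors = [preprocess(p) for p in image_paths]

    # Timed phase: model execution only.
    start = time.perf_counter()
    for t in tensors:
        run_inference(t)
    return (time.perf_counter() - start) / len(tensors)
```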
Hello,
I'm trying to reproduce the work of the PingAn GammaLab & PingAn Cloud team, which is No. 1 in the inference latency benchmark. This work uses this model to evaluate the inference time.
I noticed that this model is not the original ResNet-50, and its network architecture is quite different from ResNet-50.
Could you help me confirm their actual network architecture?
I'm also wondering whether it is allowed to use lightweight networks like MobileNet here.
Hi,
To calculate the inference cost, is it permitted to use dual instances in one VM?
To state it explicitly:
For inference latency, we would still use [sum(process1_time) + sum(process2_time)] / total_images to measure per-image latency.
Thanks very much.
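Restating the proposed formula as code (just a restatement of the expression above, assuming each process records one time per image):

```python
def pooled_per_image_latency(process1_times, process2_times):
    # Pool the per-image times recorded by both processes on the VM
    # and divide by the total number of images served.
    total_time = sum(process1_times) + sum(process2_times)
    total_images = len(process1_times) + len(process2_times)
    return total_time / total_images
```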
We're working on a DAWNBench entry that uses a slice of a TPU pod. Each epoch is processed so quickly on the pod that a significant amount of time is now being spent on saving checkpoints. Would it be possible for us to provide a submission where we only checkpoint once at the end of the training run and then run eval to validate accuracy?
As another possibility, we could provide data on two runs, one with checkpointing enabled for every epoch and the other with checkpointing disabled until the end. You could use the timing of the run without checkpointing but inspect the accuracy values along the way via the auxiliary run with checkpointing.
Please let us know if either of these paths would be acceptable.
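A sketch of the two run configurations we are proposing (the function names and flag are placeholders, not options in any existing script):

```python
def train(num_epochs, train_one_epoch, save_checkpoint, checkpoint_every_epoch):
    for epoch in range(num_epochs):
        train_one_epoch()
        if checkpoint_every_epoch:
            save_checkpoint(epoch)       # auxiliary run: per-epoch accuracy curve
    if not checkpoint_every_epoch:
        save_checkpoint(num_epochs - 1)  # timing run: single checkpoint at the end
```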
Since the "number of epochs" is the primary factor determining the actual computation required for training, is it possible to list it in the "Training Time" ranking result as well? Thanks.
Hello,
I am trying to understand the latency rule in DAWNBench:
• Latency: Use a model that has a top-5 validation accuracy of 93% or greater. Measure the total time needed to classify all 50,000 images in the ImageNet validation set one-at-a-time, and then divide by 50,000.
I am not sure how best to interpret "one-at-a-time" here, so I have raised some questions and need your confirmation:
Thanks.
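My literal reading of the rule, which I would like to confirm, is the following (a sketch; load_image and model are placeholders):

```python
import time

def one_at_a_time_latency(image_paths, load_image, model):
    start = time.perf_counter()
    for path in image_paths:           # all 50,000 validation images
        model(load_image(path))        # a batch of exactly one image
    total = time.perf_counter() - start
    return total / len(image_paths)    # then divide by 50,000
```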
@deepak Narayanan
I can run the code at https://github.com/stanford-futuredata/dawn-bench-models/tree/master/tensorflow/SQuAD with the command: CUDA_VISIBLE_DEVICES=3,4 python -m basic.cli --mode train --noload --len_opt --cluster
But I don't know how to get the time results, such as those in https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/SQuAD/train/dawn_bidaf_1p100-dawn_tensorflow.tsv
How do I get the time for running one epoch?
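In case it helps, here is the kind of wrapper I have been considering for producing such a file (the column names are my guess at the TSV schema; train_one_epoch and evaluate are placeholders):

```python
import csv
import time

def train_with_timing(num_epochs, train_one_epoch, evaluate, out_path):
    start = time.perf_counter()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["epoch", "hours", "f1Score"])  # assumed columns
        for epoch in range(1, num_epochs + 1):
            train_one_epoch()
            hours = (time.perf_counter() - start) / 3600
            writer.writerow([epoch, f"{hours:.4f}", evaluate()])
```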
I noticed that there were resubmissions from fast.ai for the ImageNet training track after the competition deadline:
https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/ImageNet/train/fastai_pytorch.json
https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/ImageNet/train/fastai_pytorch.tsv
Their new result was produced by many code changes, including hyperparameter tuning and model changes. I don't think this is fair to the other participants; resubmission should not be allowed after the competition deadline.
As a fair alternative, I would suggest that the organizers create a new ranking list for ImageNet without those blacklisted images. Any submission made before the deadline could be ranked in the respective list. This would avoid confusion and also honor the rules of the game.
In the DAWNBench inference latency ranking, I saw that Huawei took first place (0.4945 ms), which is great. I want to try using their code to rerun inference on ImageNet on my T4. However, I can't find the model file (resnet_imagenet.uff). @codyaustun
@codeforfun9, could you please provide this file in your repo? If so, I would be grateful.
Thanks
Hi everyone, thanks for this amazing work.
I was wondering if you could shed some light on the following questions.
I see benchmarks on different datasets and different hardware, but I am having some trouble inferring the following information. For instance, we see training time differ by half on TPUs, but that is between different models, i.e. ResNet and AmoebaNet.
I would like to know what the speed gain would be if:
1. We run the same model on the same hardware but with a different version of the library, for instance TF 1.7 vs. TF 1.8. In other words, how much of the speed gain comes solely from the new software release?
2. We test the same model on different hardware but with the same version of the library, so that we can understand how much of the percentage gain comes solely from the hardware.
Thanks in advance!