stanford-futuredata / dawn-bench-entries
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Home Page: http://dawn.cs.stanford.edu/benchmark/
I assume the hours field reported in the DAWNBench results is the elapsed time between the start of the program and the moment each checkpoint is saved. Can we exclude initialization time that occurs before training? For example, we could load the entire CIFAR dataset into memory first. Also, saving checkpoints to disk is expensive, especially when training is very fast. Can we exclude the checkpointing time as well?
Besides that, can we report training progress saved every x epochs?
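To make the proposal concrete, here is a minimal sketch of the timing split I have in mind (train_one_epoch and save_checkpoint are placeholders for a submitter's own functions, not part of any DAWNBench tooling):

```python
import time

def timed_training(num_epochs, train_one_epoch, save_checkpoint):
    """Sketch: accumulate optimization time only, leaving checkpoint
    I/O outside the clock. All function arguments are placeholders."""
    train_seconds = 0.0
    for epoch in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch()                       # timed: training work only
        train_seconds += time.perf_counter() - start
        save_checkpoint(epoch)                  # untimed: checkpoint I/O
        print(f"epoch {epoch}: {train_seconds / 3600:.4f} training hours")
```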
I had a quick question: in measuring the training cost and time, how should the job be run? What are the guidelines on how the job should be configured here?
I have reproduction instructions for the ResNet-50 entry as a Google Doc; where should I reference it?
https://docs.google.com/document/d/1I6sjUpU1myzQGqcX3NyezSkChVZIui8Gxo5vove9ZV0/edit#heading=h.gmrlbtx6xbvi
The ImageNet validation set consists of 50,000 images. In the 2014 devkit, there is a list of 1,762 "blacklisted" files. When we report top-5 accuracy, should we use the blacklisted or the non-blacklisted version? In Google's submission, results are obtained using the full 50,000 images, including the blacklisted ones, but some submissions used the blacklisted version. I just want to make sure we're comparing the same thing.
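For concreteness, here is a sketch of the two evaluation variants (the prediction and label formats are assumptions for illustration, not any submission's actual code):

```python
# Sketch: top-5 accuracy with and without the devkit blacklist.
# predictions: dict image_id -> list of 5 predicted class ids
# labels:      dict image_id -> ground-truth class id
def top5_accuracy(predictions, labels, blacklist=frozenset()):
    kept = {k: v for k, v in predictions.items() if k not in blacklist}
    correct = sum(labels[img] in top5 for img, top5 in kept.items())
    return correct / len(kept)

# Full version scores all 50,000 images; the blacklisted version
# scores 50,000 - 1,762 = 48,238 images.
```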
Hi, @yaroslavvb
What script and flags are you using to get the training results for ResNet-50 on ImageNet? I am running tf_cnn_benchmarks.py with ImageNet, and I need to print the epochs and training time during training. I can't find any flags that activate printing of this information; do I need to modify the script to do it?
Here is the display I am getting:
Step Img/sec total_loss top_1_accuracy top_5_accuracy
1 images/sec: 451.0 +/- 0.0 (jitter = 0.0) 8.168 0.003 0.005
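For what it's worth, epochs and per-epoch time can be derived from that output outside the script; a rough sketch (the batch size below is an assumption, substitute whatever --batch_size you actually pass):

```python
IMAGENET_TRAIN_IMAGES = 1281167
batch_size = 256  # assumption: the global batch size used for training

def epochs_completed(step):
    # Each step processes one global batch.
    return step * batch_size / IMAGENET_TRAIN_IMAGES

def hours_per_epoch(images_per_sec):
    return IMAGENET_TRAIN_IMAGES / images_per_sec / 3600

print(f"{hours_per_epoch(451.0):.2f} hours/epoch at 451 images/sec")
```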
@bignamehyp as per the other thread, setting "eval_batch_size=1024" is incorrect, as it will skip parts of the validation set. Could you rerun the validation checkpoints using a correct batch size and update the TSVs and json files?
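For reference, the arithmetic behind the concern, assuming the eval loop runs only full batches:

```python
val_images = 50_000
batch = 1024
full_batches = val_images // batch       # 48
evaluated = full_batches * batch         # 49,152
print(val_images - evaluated)            # 848 validation images never scored
```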
@jph00 Sorry I didn't notice this while reviewing the pull request, but from the command line in your commit below, it looks like you still call blacklist.sh (train2.sh#L71-L76). Is that correct? Am I missing something?
Will the DAWNBench ImageNet inference competition still be run? So far, only three companies have submitted: Huawei, Alibaba, and Intel. Thanks.
Regarding the DAWNBench latency rule,
I have a question and need your confirmation:
when we calculate the latency, can we ignore image-processing time?
For example, can we handle image processing (including decoding, resizing, and cropping) offline?
Thanks
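To make the question concrete, this is the split I mean (preprocess and run_inference are placeholders for our own pipeline):

```python
import time

def per_image_latency(image_paths, preprocess, run_inference):
    # Offline phase: decoding, resizing, and cropping done ahead of time;
    # the question is whether this phase may be excluded from the clock.
    tensors = [preprocess(p) for p in image_paths]

    # Timed phase: model execution only.
    start = time.perf_counter()
    for t in tensors:
        run_inference(t)
    return (time.perf_counter() - start) / len(tensors)
```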
Hello,
I'm trying to reproduce the work of the PingAn GammaLab & PingAn Cloud team, which is No. 1 in the inference latency benchmark. This work uses this model to evaluate the inference time.
I noticed that this model is not the original ResNet-50, and its network architecture is quite different from ResNet-50.
Could you help me confirm their actual network architecture?
I'm also wondering whether it is allowed to use lightweight networks like MobileNet here.
Hi,
To calculate the inference cost, is it permitted to use dual instances in one VM?
To state it explicitly:
For inference latency, we would still use [sum(process1_time) + sum(process2_time)] / total_images to measure per-image latency.
Thanks very much.
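Restating the proposed formula as code (just a restatement of the expression above, assuming each process records one time per image):

```python
def pooled_per_image_latency(process1_times, process2_times):
    # Pool the per-image times recorded by both processes on the VM
    # and divide by the total number of images served.
    total_time = sum(process1_times) + sum(process2_times)
    total_images = len(process1_times) + len(process2_times)
    return total_time / total_images
```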
We're working on a DAWNBench entry that uses a slice of a TPU pod. Each epoch is processed so quickly on the pod that a significant amount of time is now being spent on saving checkpoints. Would it be possible for us to provide a submission where we only checkpoint once at the end of the training run and then run eval to validate accuracy?
As another possibility, we could provide data on two runs, one with checkpointing enabled for every epoch and the other with checkpointing disabled until the end. You could use the timing of the run without checkpointing but inspect the accuracy values along the way via the auxiliary run with checkpointing.
Please let us know if either of these paths would be acceptable.
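A sketch of the two run configurations we are proposing (the function names and flag are placeholders, not options in any existing script):

```python
def train(num_epochs, train_one_epoch, save_checkpoint, checkpoint_every_epoch):
    for epoch in range(num_epochs):
        train_one_epoch()
        if checkpoint_every_epoch:
            save_checkpoint(epoch)       # auxiliary run: per-epoch accuracy curve
    if not checkpoint_every_epoch:
        save_checkpoint(num_epochs - 1)  # timing run: single checkpoint at the end
```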
Since the "number of epochs" is the primary factor determining the actual computation required for training, is it possible to list it in the "Training Time" ranking result as well? Thanks.
Hello,
I am trying to understand the latency rule in DAWNBench:
• Latency: Use a model that has a top-5 validation accuracy of 93% or greater. Measure the total time needed to classify all 50,000 images in the ImageNet validation set one-at-a-time, and then divide by 50,000.
I am not sure how best to interpret "one-at-a-time" here, so I have raised some questions and need your confirmation:
Thanks.
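My literal reading of the rule, which I would like to confirm, is the following (a sketch; load_image and model are placeholders):

```python
import time

def one_at_a_time_latency(image_paths, load_image, model):
    start = time.perf_counter()
    for path in image_paths:           # all 50,000 validation images
        model(load_image(path))        # a batch of exactly one image
    total = time.perf_counter() - start
    return total / len(image_paths)    # then divide by 50,000
```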
@deepak Narayanan
I can run the code at https://github.com/stanford-futuredata/dawn-bench-models/tree/master/tensorflow/SQuAD with the command: CUDA_VISIBLE_DEVICES=3,4 python -m basic.cli --mode train --noload --len_opt --cluster
But I don't know how to get the time results, such as those in https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/SQuAD/train/dawn_bidaf_1p100-dawn_tensorflow.tsv
How do I get the time for running one epoch?
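In case it helps, here is the kind of wrapper I have been considering for producing such a file (the column names are my guess at the TSV schema; train_one_epoch and evaluate are placeholders):

```python
import csv
import time

def train_with_timing(num_epochs, train_one_epoch, evaluate, out_path):
    start = time.perf_counter()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["epoch", "hours", "f1Score"])  # assumed columns
        for epoch in range(1, num_epochs + 1):
            train_one_epoch()
            hours = (time.perf_counter() - start) / 3600
            writer.writerow([epoch, f"{hours:.4f}", evaluate()])
```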
I noticed that there were resubmissions from fast.ai for the ImageNet training track after the competition deadline:
https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/ImageNet/train/fastai_pytorch.json
https://github.com/stanford-futuredata/dawn-bench-entries/blob/master/ImageNet/train/fastai_pytorch.tsv
Their new result was produced by many code changes, including hyperparameter tuning and model changes. I don't think this is fair to the other participants; resubmission should not be allowed after the competition deadline.
As a fair alternative, I would suggest that the organizers create a new ranking list for ImageNet without those blacklisted images. Any submission made before the deadline could be ranked in the respective list. This would avoid confusion and also honor the rules of the game.
In the DAWNBench inference latency ranking, I saw that Huawei took first place (0.4945 ms), which is great. I want to try using their code to rerun inference on ImageNet on my T4. However, I can't find the model file (resnet_imagenet.uff). @codyaustun
@codeforfun9, could you please provide this file in your repo? If so, I would be grateful.
Thanks
Hi everyone, thanks for this amazing work.
I was wondering if you could shed some light on the following questions.
I see benchmarks on different datasets and different hardware, but I am having some trouble inferring the following information. For instance, we see training time differ by half on TPUs, but that is between different models, i.e. ResNet and AmoebaNet.
I would like to know what the speed gain would be if:
1. We run the same model on the same hardware but with a different version of the library, for instance TF 1.7 vs. TF 1.8. In other words, how much of the speed gain comes solely from the new software release?
2. We test the same model on different hardware but with the same version of the library, so that we can understand how much of the percentage gain comes solely from the hardware.
Thanks in advance!