Stanford DAWNBench v1 Results: Intel’s Xeon Takes Inference Crown, NVIDIA V100 And Google TPU Achieve New Performance Milestones

May 9, 2018

Researchers at Stanford have posted the results of the DAWNBench benchmark and competition, and they contain some interesting numbers showing just how much of a difference optimization can make to training times and cost. Interestingly, the results suggest there is no single all-round winner when it comes to AI workloads; instead, the achievements are spread across Intel’s Xeon processors, Google’s TPU v2 and NVIDIA’s graphics processors.


Intel Xeon-only configuration takes the inference latency and cost efficiency throne

While I would urge anyone seriously interested in the results to head over to the results page and see them in their entirety, we have taken the liberty of picking out some of the juicier bits and posting them below. The community was able to achieve some truly impressive feats of performance optimization and cost efficiency. Where it previously took more than 10 days to train a model on ImageNet, it can now be done in just under 31 minutes using half a Google TPU v2 pod, a speedup of 477x.

The inference latency and cost champion, on the other hand, turned out to be Intel Xeon Scalable processors (no GPUs), which were able to process 10,000 images for a mere $0.02 at a latency of 9.96 milliseconds. The researchers used Intel-optimized Caffe, and the closest competitor used an NVIDIA K80 GPU along with 4 CPUs at a cost of $0.07 and a latency of 29.4 ms. Needless to say, this is quite an impressive achievement considering you can get a multi-factor performance and cost improvement using only CPUs.


Team from fast.AI achieves results faster than advertised by NVIDIA using 8x V100s and sets new CIFAR10 record

Another highlight of the event was the team from fast.AI, which used an innovative method to drastically reduce training times and, using 8x V100 GPUs, set a new land speed record for CIFAR10 training. The approach initially feeds the network low-resolution images to reduce processing time early on, then gradually increases the resolution as training progresses. This cuts down on training time without compromising the final accuracy of the model.
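The progressive-resizing idea can be illustrated with a minimal sketch. This is not the fast.AI team's actual training code; the schedule function and resolution values below are hypothetical, chosen only to show how early epochs would run on small, cheap images before switching to full resolution.

```python
def resize_schedule(total_epochs, sizes):
    """Map each epoch to an image resolution: start small, finish large.

    `sizes` is an ordered list of square resolutions, e.g. [128, 224, 288].
    Epochs are split evenly across the stages; any remainder runs at the
    final (largest) resolution.
    """
    per_stage = total_epochs // len(sizes)
    schedule = []
    for epoch in range(total_epochs):
        stage = min(epoch // per_stage, len(sizes) - 1)
        schedule.append((epoch, sizes[stage]))
    return schedule

# Example: a 30-epoch run split across three resolutions (hypothetical values).
for epoch, res in resize_schedule(30, [128, 224, 288]):
    pass  # train_one_epoch(model, loader_at(res))  # hypothetical training step
```

Because convolution cost scales roughly with the number of pixels, the early low-resolution stages finish far faster than full-size epochs, which is where most of the wall-clock savings come from.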

In fact, the fast.AI team was able to achieve a 52x speedup using the NVIDIA V100s, dropping the training time from 2 hours 31 minutes all the way down to 2 minutes and 54 seconds. In doing so, they also managed to reduce the cost from $8.35 to $0.26. They even demonstrated that you can train a model on CIFAR10 in a reasonable amount of time for free using nothing but Google Colaboratory.

Other curated highlights from the first iteration of DAWNBench v1:

via DAWNBench v1, Stanford