Stanford DAWNBench v1 Results: Intel’s Xeon Takes Inference Crown, NVIDIA V100 And Google TPU Achieve New Performance Milestones

Usman Pirzada • May 9, 2018 at 12:42pm EDT

Researchers at Stanford have posted the results of the DAWNBench benchmark and competition, and they contain some interesting numbers that show how much of a difference optimization can make to training time and cost. Interestingly, the results appear to show that there is no single all-round winner when it comes to AI workloads; instead, the honors are split between Intel's Xeon, Google's TPU v2 and NVIDIA's graphics processors.

Intel Xeon-only configuration takes the inference latency and cost efficiency throne

While I would urge anyone seriously interested in the results to head over to the results page and see them in their entirety, we have taken the liberty of picking out some of the juicier bits and posting them below. The community was able to achieve some truly impressive feats of performance optimization and cost efficiency. Where it previously took more than 10 days to train ResNet50 on ImageNet, it can now be done in just under 31 minutes using half a Google TPU v2 pod, a speed-up of 477x.
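As a quick sanity check on that headline figure, the baseline implied by a 477x speed-up and a 31-minute run can be backed out with simple arithmetic. This is only a sketch: it treats "just under 31 minutes" as a flat 31 minutes, and the function names are made up for illustration.

```python
# Back out the baseline training time implied by the quoted speed-up.
# Assumption: "just under 31 minutes" is rounded up to exactly 31 minutes.
SPEEDUP = 477
FAST_MINUTES = 31


def implied_baseline_minutes(speedup, fast_minutes):
    """Baseline duration in minutes, given the speed-up and the fast run."""
    return speedup * fast_minutes


def split_minutes(total_minutes):
    """Break a whole number of minutes into (days, hours, minutes)."""
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


baseline = implied_baseline_minutes(SPEEDUP, FAST_MINUTES)
days, hours, minutes = split_minutes(baseline)
print(f"Implied baseline: about {days} days {hours} hours {minutes} minutes")
```

The implied baseline lands a little over 10 days, which is consistent with the "more than 10 days" figure quoted above.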

The inference and cost champion, on the other hand, turned out to be Intel Xeon Scalable processors (no GPUs), which were able to process 10,000 images for a mere $0.02 at a latency of 9.96 milliseconds per image. The Intel team used Intel Optimized Caffe, and the closest competitor used an NVIDIA K80 GPU along with 4 CPUs for a cost of $0.07 and a latency of 29.4 ms. That is quite an impressive achievement: a CPU-only setup came in roughly 3.5x cheaper and about 3x faster per image than the nearest GPU-based entry.
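The gap between the two inference entries is easy to quantify from the numbers quoted above. The snippet below is a minimal sketch that only divides the published figures; the dictionary layout and the `advantage` helper are illustrative, not anything from DAWNBench itself.

```python
# Published figures: cost to process 10,000 images and per-image latency.
intel_xeon = {"cost_usd": 0.02, "latency_ms": 9.96}
nvidia_k80 = {"cost_usd": 0.07, "latency_ms": 29.4}


def advantage(winner, runner_up):
    """Ratio of runner-up to winner for each metric (higher means a bigger lead)."""
    return {metric: runner_up[metric] / winner[metric] for metric in winner}


ratios = advantage(intel_xeon, nvidia_k80)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x in favor of the Xeon entry")
```

Both metrics here are lower-is-better, so a ratio above 1 means the Xeon submission leads on that metric.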

Team from fast.AI achieves results faster than advertised by NVIDIA using 8x V100s and sets new CIFAR10 record

Another highlight of the event was the team from fast.AI, which used an innovative method to drastically reduce training times and, using 8x V100 GPUs, set a new land speed record for CIFAR10 training. The approach initially feeds the network low-resolution images to reduce processing time at the start and gradually increases the resolution as training proceeds. This cuts down on training time without compromising the final accuracy of the model.
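To make the idea of ramping up image resolution concrete, here is a minimal sketch of what such a schedule could look like. It is not fast.AI's actual schedule: the start and end sizes, the step, the epoch count and the function name are all hypothetical values chosen for illustration.

```python
def resolution_for_epoch(epoch, total_epochs, start_size=128, end_size=224, step=32):
    """Return the image side length to use at a given epoch.

    The size ramps linearly from start_size to end_size and is snapped to a
    multiple of step. All defaults are hypothetical, not fast.AI's settings.
    """
    if total_epochs <= 1:
        return end_size
    fraction = epoch / (total_epochs - 1)
    raw_size = start_size + fraction * (end_size - start_size)
    snapped = int(round(raw_size / step)) * step
    # Keep the result inside the configured range.
    return max(start_size, min(end_size, snapped))


# Example: a 10-epoch run that starts small and finishes at full resolution.
schedule = [resolution_for_epoch(epoch, 10) for epoch in range(10)]
print(schedule)
```

In a real training loop, the data loader would resize each batch to the scheduled size, so the early epochs push far fewer pixels through the network than the final ones.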

The fast.AI team was able to achieve a 52x speedup using the NVIDIA V100s, dropping the training time from 2 hours 31 minutes all the way down to 2 minutes and 54 seconds. In doing so, they also managed to reduce the cost from $8.35 to $0.26. They even demonstrated that you can train a model on CIFAR10 in a reasonable amount of time for free using nothing but Google Colaboratory.
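The 52x figure follows directly from the two training times, and the same arithmetic gives the size of the cost drop. A short sketch using only the numbers quoted above (the helper name is illustrative):

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


baseline_seconds = to_seconds(hours=2, minutes=31)  # earlier best: 2 h 31 min
record_seconds = to_seconds(minutes=2, seconds=54)  # fast.AI run: 2 min 54 s

time_speedup = baseline_seconds / record_seconds
cost_reduction = 8.35 / 0.26

print(f"Training time speed-up: {time_speedup:.1f}x")
print(f"Cost reduction: {cost_reduction:.1f}x")
```

The time ratio works out to just over 52x, matching the quoted figure, while the cost falls by a factor of roughly 32.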

Other curated highlights from DAWNBench v1, the first iteration of the benchmark:

For ImageNet inference, Intel submitted the best result in both cost and latency. Using an Intel-optimized version of Caffe on high-performance AWS instances, they reduced per-image latency to 9.96 milliseconds and processed 10,000 images for $0.02.

ResNet50 can now be trained on ImageNet in as little as 30 minutes with checkpointing and 24 minutes without checkpointing using half of a Google TPUv2 Pod, representing a 477x speed-up!

The cheapest submission for ResNet50 on ImageNet ran in 8 hours 53 minutes for a total of $58.53 on a Google TPUv2 machine using TensorFlow 1.8.0-rc1, which is a 19x cost improvement over our best seed entry that used 8 Nvidia K80 GPUs on AWS.

Other hardware and cloud providers weren’t far behind! Using PyTorch with 8 Nvidia V100 GPUs on AWS, fast.ai was able to train ResNet50 in 2 hours 58 minutes for a total of $72.50 with a progressive resizing technique from “Progressive Growing of GANs for Improved Quality, Stability, and Variation” and “Enhanced Deep Residual Networks for Single Image Super-Resolution” that increased the resolution of images over training to get higher throughput (images per second) at the beginning without loss in final accuracy.

With only CPUs, Intel used 128 AWS instances with 36 cores each to train ImageNet in 3 hours and 26 minutes.

On CIFAR10, starting from the best seed entry, a ResNet164 from “Identity Mappings in Deep Residual Networks” that trained in 2 hours and 31 minutes on an Nvidia P100, training time fell to 2 minutes and 54 seconds thanks to fast.ai and their student team. Using a custom Wide ResNet architecture and 8 Nvidia V100s, they achieved a 52x speed-up.

The team from fast.ai also dropped training cost from $8.35 to $0.26. Going even further, they showed you can train a model on CIFAR10 in a reasonable amount of time for free using Google Colaboratory.
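The ImageNet cost figures in the highlights above also imply an effective hourly rate for each setup, which helps explain why the slower TPU run is the cheaper one. The sketch below simply divides total cost by run time; it assumes the quoted totals cover the whole run, and the function name is illustrative.

```python
def hourly_rate(total_usd, hours, minutes):
    """Effective cost per hour implied by a total cost and a run time."""
    return total_usd / (hours + minutes / 60)


# Figures from the highlights above (ResNet50 on ImageNet).
tpu_rate = hourly_rate(58.53, hours=8, minutes=53)   # Google TPUv2 machine
v100_rate = hourly_rate(72.50, hours=2, minutes=58)  # 8 Nvidia V100s on AWS

print(f"TPUv2 machine: about ${tpu_rate:.2f} per hour")
print(f"8x V100 on AWS: about ${v100_rate:.2f} per hour")
```

The V100 setup costs several times more per hour but finishes in a third of the time, so the two totals end up within about $14 of each other.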

via DAWNBench v1, Stanford

About the author: PC Hardware and Technology Enthusiast, Blood of Silicon (1 nm),
