Deep Learning Breakthrough Results In A 44-Core Intel Xeon Destroying NVIDIA Tesla V100 GPU

Mar 5, 2020

[Edited 1:26 PM GMT+5] It would appear that the press release was a bit misleading. The actual comparison here is between a 2P system housing two 22-core Xeon CPUs with hyperthreading disabled and a single Tesla V100. It is still a substantial speedup and the ramifications are more or less the same, but I apologise for the error. Changes have been made in the original text wherever needed.

Something that investors and professionals alike will almost certainly follow very closely just came out of a collaboration between Rice University and Intel Corporation. In what appears to be an absolutely insane speedup, researchers were able to use a 44-core Intel Xeon setup to train a network 3.5 times faster than an NVIDIA Tesla V100! CPUs usually perform far worse than GPUs when it comes to training deep neural networks (because GPUs have a highly parallel architecture that suits the workload), and this would be the first time a CPU has been leveraged this effectively for deep learning.

SLIDE algorithm makes a 44-core Intel Xeon CPU setup 3.5 times faster than an NVIDIA Tesla V100 GPU in AI deep learning

It has become almost common sense that GPUs will always be far superior to CPUs when it comes to training deep learning (DL) networks, but these researchers from Rice University have succeeded in questioning this very basic tenet of DL. For what seems to be the very first time, a CPU setup has not only matched but absolutely destroyed a GPU-based implementation, with a confoundingly large speedup.

SLIDE lead inventor Anshumali Shrivastava is an assistant professor of computer science in Rice University’s Brown School of Engineering. (Photo by Jeff Fitlow/Rice University)

Before we go any further, here is an extract from their press release:

Rice University computer scientists have overcome a major obstacle in the burgeoning artificial intelligence industry by showing it is possible to speed up deep learning technology without specialized acceleration hardware like graphics processing units (GPUs).

SLIDE doesn’t need GPUs because it takes a fundamentally different approach to deep learning. The standard “back-propagation” training technique for deep neural networks requires matrix multiplication, an ideal workload for GPUs. With SLIDE, Shrivastava, Chen and Medini turned neural network training into a search problem that could instead be solved with hash tables.

This radically reduces the computational overhead for SLIDE compared to back-propagation training. For example, a top-of-the-line GPU platform like the ones Amazon, Google and others offer for cloud-based deep learning services has eight Tesla V100s and costs about $100,000, Shrivastava said.

“We have one in the lab, and in our test case we took a workload that’s perfect for V100, one with more than 100 million parameters in large, fully connected networks that fit in GPU memory,” he said. “We trained it with the best (software) package out there, Google’s TensorFlow, and it took 3 1/2 hours to train.

“We then showed that our new algorithm can do the training in one hour, not on GPUs but on a 44-core Xeon-class CPU,” Shrivastava said. A copy of the research paper is available here.
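To make the "search problem solved with hash tables" idea from the extract a little more concrete, here is a minimal sketch in Python. It is not the researchers' code. It assumes one particular hashing scheme (signed random projections, a common locality-sensitive hash), and the class name `HashedLayer` and all of its parameters are made up for illustration. The point is only that a layer can look up a small set of likely-relevant neurons in hash tables and compute those, instead of multiplying the input against every neuron's weights.

```python
from collections import defaultdict

import numpy as np


class HashedLayer:
    """Toy layer that evaluates only the neurons retrieved from hash tables.

    Illustrative sketch only: the hashing scheme and every name here are
    assumptions, not details taken from the press release.
    """

    def __init__(self, n_in, n_out, n_tables=4, n_bits=6, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.standard_normal((n_out, n_in))
        # Each table gets its own set of random hyperplanes.
        self.planes = rng.standard_normal((n_tables, n_bits, n_in))
        self.bit_values = 1 << np.arange(n_bits)
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        # Index every neuron by the hash of its weight vector.
        for neuron in range(n_out):
            keys = self._keys(self.weights[neuron])
            for table, key in zip(self.tables, keys):
                table[key].append(neuron)

    def _keys(self, vector):
        # One bit per hyperplane: which side of the plane the vector lies on.
        bits = (self.planes @ vector) > 0
        # Pack each table's bits into a single integer bucket key.
        return (bits.astype(np.int64) @ self.bit_values).tolist()

    def active_neurons(self, x):
        # Neurons sharing a bucket with the input in at least one table.
        ids = set()
        for table, key in zip(self.tables, self._keys(x)):
            ids.update(table.get(key, []))
        return np.array(sorted(ids), dtype=int)

    def forward(self, x):
        ids = self.active_neurons(x)
        out = np.zeros(self.weights.shape[0])
        # ReLU activations for the retrieved neurons only; the rest stay 0.
        out[ids] = np.maximum(self.weights[ids] @ x, 0.0)
        return out, ids


layer = HashedLayer(n_in=32, n_out=1000)
x = layer.weights[7].copy()  # an input that points the same way as neuron 7
out, ids = layer.forward(x)
print(f"evaluated {len(ids)} of {layer.weights.shape[0]} neurons")
```

In this toy setup only a small fraction of the 1,000 neurons is touched per input. A real training system would also have to keep the tables in sync as the weights change, which this sketch leaves out.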

Interestingly, however, Intel doesn't have a publicly available 44-core Xeon out right now. So, one of three possible things has happened here: 1) this is an unreleased and upcoming Intel Xeon, 2) the test was conducted using a single 22-core processor (which has 44 threads, and the researchers erroneously referred to it as 44 cores) or 3) the test was conducted using two 22-core processors in a 2P system. As the edit note at the top explains, it turned out to be the third.

The algorithm, dubbed SLIDE (Sub-LInear Deep learning Engine), currently runs only on Intel processors. If an implementation of this algorithm goes mainstream, it would almost instantly disrupt the dynamics of the deep learning ecosystem. Valuations of companies could change overnight (assuming what the researchers are claiming has no caveat attached). It also raises the interesting question of whether the approach can be replicated on an AMD processor.

In any event, pending validation of this technique, we should see a significant amount of demand added to Intel's already lopsided supply equation. It would seem that as long as Intel can produce its processors, it has pent-up demand as far as the eye can see.

News Source: Deep learning rethink overcomes major obstacle in AI industry