Nvidia Takes Cloud Computing To The Next Level With The GeForce GRID

Sabeeh Qureshi
Jun 4, 2012

Nvidia has taken another quantum leap into the world of virtual computing with its new Kepler GK110 based VGX board graphics cards, which put 4 GPUs on a single board with a whopping 16 GB frame buffer.

So one can expect the power of cloud computing from four GK104 GPUs running in parallel, with support for up to 256 virtual machines at any given time. Nvidia aims to make its ‘GeForce GRID’ cloud even more affordable by letting 3D games run on local machines without the need for a separate graphics card.

Check out the press release below for more details.


According to the NVIDIA GPU virtualization initiative

With hardware support for virtual machines in the Kepler architecture, NVIDIA is trying to bring the GPU into the data center. In this system, graphics processing runs on the server side and the result is transferred to the client as H.264 video. Just as the cloud lets clients freely use CPU resources on the server side, NVIDIA's vision is a future in which clients can also use GPU resources on the server side.

The first application of this concept is cloud gaming: “GeForce GRID” is a gaming platform that supports multiple gamers on virtual machines in order to make cloud gaming more economical. With the cloud, NVIDIA is trying to overturn the conventional wisdom that advanced 3D games run on the local machine.

For more general adoption in the data center, NVIDIA provides “NVIDIA VGX”, a GPU platform for servers that use virtualization. NVIDIA's larger aim appears to be to make the GPU a normal presence in the server room eventually. Its overall strategy appears to be to spread GPUs into servers so that clients can take advantage of them.

From now on, data centers will enter a phase of evaluating whether this idea of NVIDIA's is sufficiently practical. The key is the overhead of switching between virtual machines, in other words the latency that users experience. Although NVIDIA's Kepler GPU can support a maximum of 256 virtual machines, this is a limit on the spec sheet; the practical limit will have to take the overhead into account.
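The gap between the specification limit and a practical limit can be illustrated with a minimal round-robin model. This is only a sketch: the slice length, the switch cost, the latency budget, and both function names are illustrative assumptions, since the source gives no timing figures.

```python
def worst_case_wait_ms(num_vms: int, slice_ms: float, switch_ms: float) -> float:
    """Worst-case gap between two turns of the same VM under round-robin sharing.

    Assumes every other VM uses its full slice and every hand-over costs one
    context switch. All values are hypothetical.
    """
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    if num_vms == 1:
        return 0.0  # a single VM never has to be switched out
    # (num_vms - 1) other slices, plus one switch per hand-over in the cycle
    return (num_vms - 1) * slice_ms + num_vms * switch_ms


def max_vms_within(budget_ms: float, slice_ms: float, switch_ms: float,
                   spec_limit: int = 256) -> int:
    """Largest VM count, capped at the spec limit, whose worst-case wait fits the budget."""
    best = 0
    for n in range(1, spec_limit + 1):
        if worst_case_wait_ms(n, slice_ms, switch_ms) <= budget_ms:
            best = n
    return best


# Hypothetical numbers: 2 ms slices, 0.5 ms per switch, 16 ms latency budget
print(worst_case_wait_ms(4, 2.0, 0.5))
print(max_vms_within(16.0, 2.0, 0.5))
```

In this toy model the usable VM count is set by the latency budget and the switch cost, not by the 256-VM ceiling.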

● Why GPU virtualization was difficult

NVIDIA explained that, even in cloud gaming, latency can be reduced enough for games, where it matters. The slide below shows this. It indicates that because the GPU executes the game pipeline itself faster than a game console does, latency can be held to a level that does not impair game play even after network delay is taken into account. The image generated on the server is encoded in H.264, transferred, and decoded on the client. Because H.264 is hardware accelerated on both the server GPU and the client GPU, this delay is shortened. As a result, even action games, for which short latency is vital, become feasible.

However, this figure is limited to the game pipeline, which makes it a little misleading, because the overhead of switching virtual machines is not shown. If the foundation of NVIDIA's cloud gaming strategy is to support more than one gamer per GPU through GPU virtualization, then the latency of switching between virtual machines must also be addressed, and the figure does not show it. This time, NVIDIA did not give a detailed explanation of this essential part.

Why is the switching overhead of virtual machines a problem? Because on a GPU the switch itself is a very heavy task in the first place: the amount of architectural processor state that must be preserved is enormous. NVIDIA also takes this problem seriously, and at this GTC Mr. Jen-Hsun Huang described the difficulty of switching virtual machines as follows.

“We have been working on (GPU virtualization) for more than five years. It has been difficult, however, because the GPU pipeline holds an enormous amount of state internally in order to hide latency (because it is massively parallel). Whereas (each CPU core of) a CPU has two threads, each with 8 registers, (the GPU as a whole) has thousands of threads, each with kilobytes (KB) of state, reaching up to a few MB. The compute state is enormous, and for each virtual user sharing the GPU we have to keep track of the entire (state of the) running task.”
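The comparison in the quote can be turned into rough arithmetic. The sketch below simply multiplies thread count by per-thread state. The CPU side uses the two threads and 8 registers from the quote with an assumed 8-byte register width; the GPU side uses placeholder values (2048 threads, 1 KB each) chosen only to match the orders of magnitude mentioned, not figures for any actual chip.

```python
def context_bytes(threads: int, bytes_per_thread: int) -> int:
    """Total register-style state that a context switch would have to preserve."""
    return threads * bytes_per_thread


# CPU core from the quote: 2 threads x 8 registers, assuming 8 bytes per register
cpu_state = context_bytes(threads=2, bytes_per_thread=8 * 8)

# GPU, hypothetical: "thousands of threads" with "kilobytes" of state each
gpu_state = context_bytes(threads=2048, bytes_per_thread=1024)

print(cpu_state)               # bytes of state for the CPU core
print(gpu_state)               # bytes of state for the whole GPU
print(gpu_state // cpu_state)  # how many times larger the GPU state is
```

Even with these modest placeholder values the GPU state lands in the megabyte range, several orders of magnitude above the CPU core's, which is why saving and restoring it on every switch is costly.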

Given this architectural background, supporting GPU virtualization requires some sort of special technique. The hurdle is much higher than for CPU virtualization. This has long been one of the biggest challenges for GPUs and one of the focal points of development. By actually demonstrating cloud gaming at GTC, NVIDIA showed that it had been able to solve this problem.

● Two types of GPU virtualization

Somewhat confusingly, the PC industry presented a roadmap for “GPU virtualization” six years ago. That GPU virtualization began with the introduction of the “Windows Display Driver Model” (WDDM) in Windows Vista. At the time, the whole industry referred to GPU multitasking as GPU virtualization. Because on the CPU side virtualization was already used to mean hardware-assisted virtual machines, this name was somewhat confusing. Even NVIDIA admitted at the time that the name was confusing, explaining that “what we are aiming for with WDDM is, to be exact, GPU multitasking.” The WDDM roadmap has changed since 2006, and preemptive multitasking now arrives with WDDM v1.2.

What NVIDIA showed this time is GPU virtualization at a different layer from the earlier one: not device virtualization at the OS level, but hardware-assisted virtual machines at the hypervisor level. Rather than the OS virtualizing the GPU as a resource shared by multiple applications, the hypervisor virtualizes the GPU as a resource shared by multiple virtual machines. To make the distinction, NVIDIA calls a GPU with virtual machine support a “Virtualized GPU”.

However, the functions that must be implemented in the GPU for this Virtualized GPU mode build on the multitasking functionality, with additions. The problems the GPU faces in the two virtualization layers are common to some extent.

Incidentally, for GPU multitasking controlled by WDDM, a roadmap was shown in which the GPU becomes able to switch tasks at progressively finer granularity in stages. Even then, the biggest challenge described was how to switch the processor state. The primitive approach is to switch only after a batch has finished executing. However, supporting full preemptive multitasking requires switching in the middle of a program's execution, and retaining the state becomes a problem.

● The problem of how to switch GPU state

The solution is either to switch state at high speed in some way, to let the GPU hold more than one state at a time, or a combination of the two. Five years ago, AMD described this issue as follows: “Indeed, the question is whether to process multiple contexts in parallel on a partitioned GPU, or to time-slice the GPU faster. It is much more difficult than it looks. Functionally, the issue will be how to make multiple contexts run efficiently on the GPU” (Dave Orton, then Executive Vice President, Visual and Media Businesses, AMD).

GPU vendors need to support preemptive multitasking at the granularity required by WDDM v1.2 for Windows 8, and have been focusing on this part. It can be inferred that the functional changes made for Windows 8 were substantial, and that virtual machine support was made possible as an extension of them. As AMD pointed out, the choice is between partitioning the GPU so that state belonging to different processes coexists on it, as on a multi-core CPU, switching by time division, as on a single CPU, or a combination of the two.
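The two options AMD describes can be compared with a toy cost model. This sketch is an assumption-laden illustration, not a description of how any GPU schedules work: it treats each context as a fixed amount of perfectly parallel work, charges a flat cost per switch under time division, and splits the GPU into equal shares under partitioning. All names and numbers are hypothetical.

```python
def time_division_finish_ms(contexts: int, units: int,
                            work_unit_ms: float, switch_ms: float) -> list[float]:
    """Finish time of each context when they run one after another on the whole GPU."""
    finish = []
    clock = 0.0
    for i in range(contexts):
        if i > 0:
            clock += switch_ms          # save the old state, restore the next one
        clock += work_unit_ms / units   # the context gets every compute unit
        finish.append(clock)
    return finish


def partitioned_finish_ms(contexts: int, units: int,
                          work_unit_ms: float) -> list[float]:
    """Finish time of each context when the GPU is split evenly and never switched."""
    if units % contexts != 0:
        raise ValueError("units must divide evenly among contexts in this sketch")
    share = units // contexts           # compute units owned by each context
    return [work_unit_ms / share] * contexts


# Hypothetical: 4 contexts, 8 compute units, 16 unit-ms of work each, 0.5 ms per switch
print(time_division_finish_ms(4, 8, 16.0, 0.5))
print(partitioned_finish_ms(4, 8, 16.0))
```

With these placeholder numbers, time division finishes the first context sooner but the last one later because of the three switches, while partitioning finishes all four together with no state to save. Which side wins in practice depends on the switch cost, which is the point of the discussion above.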

Current GPUs are heading toward something close to a multi-core architecture, so it is also possible to partition the GPU into regions that serve different tasks or virtual machines. In NVIDIA GPUs, for example, the GPC (Graphics Processing Cluster) is a cluster that includes the fixed-function graphics pipeline. In the Kepler architecture, GK110 appears to use three SMX per GPC and GK104 two SMX per GPC, and the graphics processing hardware is also divided per GPC. This is conjecture, but that division could be used to support multitasking and virtual machines.

However, the GPUs on the NVIDIA VGX board, which NVIDIA launched as its platform for using the Virtualized GPU mode in the data center, have the minimum configuration of one SMX. The board carries four physical GPU chips, and each chip has one GPC with one SMX rather than multiple SMX and GPCs. This suggests, on the contrary, that the approach is not to partition a single GPU.

From this, it can be inferred that a board with many physical GPU chips, each holding less internal state, is best suited to general virtual machine use. It also implies that in a large GPU configuration, the overhead of switching virtual machines cannot be ignored.

Because this also depends on usage, nothing is clear yet. Incidentally, unlike the generic VGX board for the data center, the board for cloud gaming carries two chips, each with multiple SMX and GPCs. This is probably because games are assumed to switch at a coarser granularity.

● NVIDIA's patents reveal 15 years of work

In describing its virtual machine support, NVIDIA also showed that it holds several related patents. The slide below, projected during Mr. Jen-Hsun Huang's keynote, refers to a series of patents that NVIDIA filed around 1997-1998 (US5638535, US5696990, US5740406, US5805930, US5924126, US6081854). These patents concern extensions to memory and I/O design for running multiple applications on a processor such as a GPU. In the figure, elements such as the “First-In First-Out (FIFO)” (31) and the “Physical Address Table” (36) can be seen to be multiplexed.

As the filing dates show, NVIDIA has been working on these issues for 15 years. By showing these multitasking patents in its Virtualized GPU presentation, NVIDIA indicated that the two layers of virtualization share many of the same GPU extensions. In addition, NVIDIA has acquired or applied for various other patents related to multitasking, multi-user operation, and virtualization. Many of them relate to memory virtualization. There are also patents on a dedicated control unit for supporting multiple users on the GPU.

Taken together, the picture that emerges is that NVIDIA has worked for many years on extending task switching in the GPU, has moved on to virtual machine support, and is now trying to leverage it to penetrate the data center. In this initiative, placing GPUs on the server side makes the GPU's parallel computing resources freely available to clients. End users will be able to use the computing power of GPUs easily.

Mr. Bill Dally, NVIDIA's Chief Scientist, explained the advantages as follows.

“Besides convenience, one major benefit of the cloud is power. Mobile devices are limited to a few W (the power limit of the application processor). In the cloud, however, a 2W mobile device will be able to experience games that take hundreds or thousands of W.”

Of course, this argument must also take into account the power consumed by communication. However, within a fixed communication power budget, it is certain that clients gain access to scalable server-side GPU computing performance.