NVIDIA Pascal and Volta GPUs Now Supported By Latest GeForce 358.66 Drivers – Also Adds Preliminary Support For Vulkan API
Some surprising information has been found in NVIDIA’s latest GeForce drivers regarding their upcoming Pascal and Volta GPUs. As we know, NVIDIA keeps adding preliminary support for new graphics cards to its GeForce drivers, and the recently launched GeForce 358.66 driver has exposed some new information about the next generation NVIDIA GPUs.
NVIDIA GeForce 358.66 Driver Adds Pascal GPU, Volta GPU and Vulkan API Support
Well, to start off this news, we already know that NVIDIA is hard at work preparing two new GPUs for launch in 2016 and 2017. In 2016, NVIDIA plans to launch their Pascal GPU architecture, which will make use of the latest FinFET process node and HBM2 memory. The Pascal GPU will be aimed at both consumer and HPC markets, with availability reported for the first half of 2016. The second GPU in talks is Volta, which is meant to enhance the Pascal architecture across the board. While it is slated for a 2018 time frame for both the consumer and server markets, shipments will start in 2017 to two advanced supercomputers: Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory. Both are going to feature next generation Volta GPUs and IBM Power9 CPUs to deliver up to 300 PetaFlops of compute performance.
Now the surprising thing about the new GeForce 358.66 driver is that, when checking through the OpenCL runtime, the driver exposes brand new compute capabilities for upcoming CUDA-enabled GPUs. There are two new major IDs, “-D__CUDA_ARCH__=600” for Pascal GPUs and “-D__CUDA_ARCH__=700” for Volta GPUs. Previously, NVIDIA used “-D__CUDA_ARCH__=500” for their Maxwell GPUs, “-D__CUDA_ARCH__=300” for Kepler, “-D__CUDA_ARCH__=200/210” for Fermi and “-D__CUDA_ARCH__=100” for their first generation Tesla GPUs.
- -D__CUDA_ARCH__=700 (GV100)
- -D__CUDA_ARCH__=600 (GP100)
- -D__CUDA_ARCH__=610 (GP102)
- -D__CUDA_ARCH__=620 (GP104)
- -D__CUDA_ARCH__=630 (GP106)
- -D__CUDA_ARCH__=640 (GP108)
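To make the numbering above easier to read, here is a minimal Python sketch that splits one of these defines into a compute capability and an architecture family. It assumes the usual CUDA convention that the value is the major version times 100 plus the minor version times 10; the function name `decode_cuda_arch` and the `FAMILIES` table are hypothetical helpers written for this illustration, not anything found in the driver.

```python
from typing import Tuple

# Family names keyed by major compute capability, as used in this article.
# (Illustrative lookup table, not taken from the driver itself.)
FAMILIES = {1: "Tesla", 2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal", 7: "Volta"}

PREFIX = "-D__CUDA_ARCH__="


def decode_cuda_arch(define: str) -> Tuple[int, int, str]:
    """Return (major, minor, family) for a '-D__CUDA_ARCH__=NNN' define."""
    if not define.startswith(PREFIX):
        raise ValueError("not a __CUDA_ARCH__ define: " + define)
    value = int(define[len(PREFIX):])
    major, remainder = divmod(value, 100)  # hundreds digit is the major version
    minor = remainder // 10                # tens digit is the minor version
    return major, minor, FAMILIES.get(major, "unknown")


for d in ["-D__CUDA_ARCH__=700", "-D__CUDA_ARCH__=600", "-D__CUDA_ARCH__=620"]:
    major, minor, family = decode_cuda_arch(d)
    print(f"{d}: compute capability {major}.{minor} ({family})")
```

Read this way, the 600 to 640 entries would all be Pascal parts that differ only in their minor revision, while 700 stands alone as the first Volta entry.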
Now the added support is still in a preliminary phase and we shouldn’t get too excited just yet. In February 2014, we saw a similar leak through GeForce drivers which not only leaked the CUDA compute capabilities of the then upcoming Maxwell cards but also revealed their codenames. A few days after the leak, NVIDIA launched their first Maxwell GM107 based cards and 8 months later introduced the high-end, GM204 powered GeForce 900 series cards. Now we know that there are not one but three specific GPUs mentioned in the Pascal series with newly added compute capabilities, which would indicate that the Pascal GPU is pretty much ready as far as the testing and qualification phase is concerned. If NVIDIA plans to introduce the GPUs in 1H 2016, then we are looking at mass production in early Q1 2016. The other surprising bit is that Volta, which is going to be featured in two supercomputers in 2017, is also listed in the driver. This suggests that Volta is currently in the engineering phase but is being tested at an even faster rate than Pascal so that the top-end chips make their way into Summit and Sierra before a public introduction in 2018.
Last month, at GTC Taiwan 2015, NVIDIA presented brief technical seminars on their GPUs and the applications built around them. During the main keynote, Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA, talked about several new technologies that NVIDIA will be announcing in the coming years. Of course, Pascal was a part of the keynote, and not only did he talk about the Pascal GPU, but one of the slides showcased the updated Pascal GPU board with the actual chip mounted on the new form factor, which will be aimed at HPC servers.
What we know so far about the GP100 chip:
- Pascal microarchitecture.
- DirectX 12 feature level 12_1 or higher.
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
- Built on the 16FF+ manufacturing process from TSMC.
- Allegedly has a total of 17 billion transistors, more than twice that of GM200.
- Taped out in June 2015.
- Will feature four 4-Hi HBM2 stacks, for a total of 16 GB of VRAM for the consumer variant and 32 GB for the professional variant.
- Features a 4096-bit memory interface.
- Features NVLink, mixed precision FP16 compute at twice the rate of FP32, and full FP64 support.
- 2016 release.
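The memory figures in this list can be sanity-checked with some simple arithmetic. The sketch below is only a rough illustration: the 8 Gb die density and the 2.0 Gbps per-pin data rate are assumptions based on the upper end of the HBM2 standard, not numbers confirmed for GP100, and both function names are made up for this example.

```python
def hbm_capacity_gb(stacks: int, dies_per_stack: int, gbit_per_die: int) -> float:
    """Total capacity in GB: stacks x dies per stack x die density (Gb), divided by 8 bits per byte."""
    return stacks * dies_per_stack * gbit_per_die / 8


def hbm_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps), divided by 8."""
    return bus_width_bits * gbps_per_pin / 8


# Four 4-Hi stacks, ASSUMING 8 Gb per die (not stated in the article).
print(hbm_capacity_gb(stacks=4, dies_per_stack=4, gbit_per_die=8))   # 16.0

# 4096-bit interface, ASSUMING 2.0 Gbps per pin (the HBM2 upper limit).
print(hbm_bandwidth_gbs(bus_width_bits=4096, gbps_per_pin=2.0))      # 1024.0
```

Under these assumptions, four 4-Hi stacks land exactly on the 16 GB figure. Reaching 32 GB with the same four stacks would need either taller stacks or denser dies, and shipping parts may well run the memory below the 2.0 Gbps ceiling, which would lower the bandwidth figure accordingly.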
The Pascal board features the actual Pascal GPU core with four HBM2 stacks, which will offer up to 16 GB of VRAM on consumer and 32 GB on professional HPC solutions. The Pascal GPU package looks very similar in design to the Fiji GPU. The die seems slightly larger than Fiji’s and could be anywhere around 500-600 mm². We cannot say for sure whether the Pascal chip shown on the board is the full GP100 solution or a lower tier chip that will succeed the GM204, but since NVIDIA aims its high-performance chips at the HPC market and such board designs will act as a new form factor for workstations and servers, it is likely to be the full Pascal GPU. On the sides of the chip, we can see the metallic heatspreader, while the VRMs/MOSFETs sit on both sides of the chip.
Now we know that NVIDIA has taped out Pascal chips, and we recently spotted a shipment of Pascal GPUs on their way to NVIDIA’s testing facility straight from TSMC’s fabs. There has been some questioning about whether the board we were shown back in 2014 will be an actual form factor, and NVIDIA has officially stated that alongside PCI-Express form factors, Pascal GPUs will be available on a mezzanine board which is smaller than PCI-Express 3.0 PCBs. This specific PCB will come with a mezzanine connector that offers speeds from 15 GB/s up to 40 GB/s and will be available on select HPC servers and workstations that feature NVLink support. Several of these boards can be stacked on top of each other to conserve space inside servers, while consumer PCs will stick with the PCI-Express form factor and full-length cards, as they are the best solution for high-end gaming rigs and professional usage.
The last bit is that the drivers also add preliminary support for the next generation Vulkan API, which is meant to succeed OpenGL. NVIDIA is a key partner of the Khronos Group, along with many others who are making the Vulkan API a reality, with performance optimizations across the board and far broader OS and hardware support than DirectX 12. Following is the transcript from LaptopVideo2Go, which shows the newly added functions and extensions in the driver:
OpenGL runtime contains following extensions and functions:
VK_EXT_KHR_device_swapchain VK_EXT_KHR_swapchain vkCreateInstance vkEnumerateInstanceExtensionProperties vkGetDeviceProcAddr vkGetInstanceProcAddr vkGetProcAddressNV
Extensions are not recognized by GPUCapsViewer yet.
GLEW based apps fail to launch.
This driver comes with a new runtime “nv-vk32.dll”, which exposes following functions
vkAcquireNextImageKHR vkCreateDevice vkCreateSwapchainKHR vkDestroySwapchainKHR vkEnumerateDeviceExtensionProperties vkGetDeviceProcAddr vkGetPhysicalDeviceSurfaceSupportKHR vkGetSurfaceFormatsKHR vkGetSurfacePresentModesKHR vkGetSurfacePropertiesKHR vkGetSwapchainImagesKHR vkQueuePresentKHR vkCreateInstance vkEnumerateInstanceExtensionProperties vkGetPhysicalDeviceMemoryProperties vkGetInstanceProcAddr vkEnumeratePhysicalDevices vkCreateImage vkDestroyImage vkAllocMemory vkFreeMemory vkBindImageMemory vkGetImageMemoryRequirements vkQueuePresentNV
The Khronos Group announced the Vulkan API a few months ago, and it has been regarded as the successor to OpenGL. Vulkan aims to be bigger and better than OpenGL ever was. It is pitched as the only low level API that targets every major platform. A big advantage of Vulkan over OpenGL is its multi-core friendly architecture. Where OpenGL did not allow graphics commands to be generated in parallel with command execution, Vulkan allows multiple command buffers to be built in parallel. AMD, who put a lot of emphasis on the Mantle API in the past, may well gain performance when the Vulkan API hits the market, since both share the same foundation. Vulkan has cross platform support (Windows 7/8/10, Linux, Android) along with cross-vendor support (NVIDIA, AMD, Intel, Qualcomm, Imagination Technologies, ARM, Samsung, Broadcom, Vivante).
Even Valve has said that Vulkan is the right way forward, as stated by Valve’s Dan Ginsburg in his speech at SIGGRAPH 2015. Ginsburg, who has taken care of porting the Source 2 engine to Vulkan, didn’t tiptoe around the elephant in the room, Microsoft’s DirectX 12. In fact, he openly said that there is not much reason to create a DX12 backend when developers can use Khronos Group’s API right away; here’s a transcription of the most relevant parts:
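The multi-core point is easier to picture with a toy model. The Python sketch below does not use the Vulkan API at all; it only mimics the idea that each thread records into its own private command list and that submission then happens once, in a fixed order. The names `record_commands` and `build_frame` are invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def record_commands(chunk_id: int, objects: List[str]) -> List[str]:
    """Record one chunk of the scene into its own list (a stand-in for a command buffer)."""
    return [f"draw(chunk={chunk_id}, object={name})" for name in objects]


def build_frame(scene_chunks: List[List[str]]) -> List[str]:
    """Record every chunk on a worker thread, then flatten the results in chunk order."""
    workers = max(1, len(scene_chunks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker only touches its own list, so recording needs no locking.
        buffers = list(pool.map(record_commands, range(len(scene_chunks)), scene_chunks))
    # "Submission" is a single ordered step once all recording has finished.
    return [command for buffer in buffers for command in buffer]


frame = build_frame([["terrain", "trees"], ["player"], ["hud", "text"]])
for command in frame:
    print(command)
```

The design choice being illustrated is that recording is split up per thread while ordering is decided only at submission time, which is roughly what the older single-context OpenGL model made difficult.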
Since hosting the first Vulkan face-to-face meeting last year, we’ve been really pleased with the progress of the API and we think it’s the right way forward for powering the next generation of high performance games.
Here’s why we think Vulkan is the future. Unless you are aggressive enough to be shipping a DX12 game this year, I would argue that there is really not much reason to ever create a DX12 back end for your game. And the reason for that is that Vulkan will cover you on Windows 10 on the same class of hardware and so much more from all these other platforms and IHVs that we’ve heard from. Metal is single platform, single vendor, and Vulkan; we are gonna have support for not only Windows 10 but Windows 7, Windows 8, we’re gonna have it on Android and all of the IHVs are making great progress on drivers, I think we’re going to see super rapid adoption. If you’re developing a game for next generation APIs, I think it’s clear that Vulkan is the best choice and we’re very pleased with the progress and the state of the API. We think it’s gonna power the next generation of games for years to come.
Moreover, we all know that Valve as a company has been trying to push OpenGL and Linux support in the last few years, in an effort to oppose Microsoft’s near monopoly with Windows; however, they haven’t had any real success so far and presently there is no reason to believe Vulkan will suddenly turn the tide. Of course, the battle for the leading next generation API between DirectX 12, Metal and Vulkan has just begun, but we can see who’s already in pole position and it’s not Vulkan right now. Still, what gamers really care about is getting the promised performance boost, and that can be achieved through constant driver optimization and robust use of the next generation APIs that are now available in the market. AMD will definitely try to enhance their graphics performance with Vulkan on the market, and NVIDIA is already focusing on extending their established lead with the new APIs.
| GPU Family | AMD Vega | AMD Navi | NVIDIA Pascal | NVIDIA Volta |
| --- | --- | --- | --- | --- |
| Flagship GPU | Vega 10 | Navi 10? | NVIDIA GP100 | NVIDIA GV100 |
| GPU Process | 14nm FinFET | 7nm FinFET? | TSMC 16nm FinFET | TSMC 12nm FinFET |
| GPU Transistors | 15-18 Billion | TBC | 15.3 Billion | 21.1 Billion |
| GPU Cores (Max) | 4096 SPs | TBC | 3840 CUDA Cores | 5376 CUDA Cores |
| Peak FP32 Compute | 12.5 TFLOPs | TBC | 12.0 TFLOPs | 15.0 TFLOPs |
| Peak FP16 Compute | 25.0 TFLOPs | TBC | 24.0 TFLOPs | 120 Tensor TFLOPs |
| VRAM | 16 GB HBM2 | TBC | 16 GB HBM2 | 16 GB HBM2 |
| Memory (Consumer Cards) | HBM2 | HBM3 | GDDR5X | GDDR6 |
| Memory (Dual-Chip Professional/HPC) | HBM2 | HBM3 | HBM2 | HBM2 |
| HBM2 Bandwidth | 480 GB/s (Instinct MI25) | >1 TB/s? | 732 GB/s (Peak) | 900 GB/s |
| Graphics Architecture | Next Compute Unit (Vega) | Next Compute Unit (Navi) | 5th Gen Pascal CUDA | 6th Gen Volta CUDA |
| Successor of (GPU) | Radeon RX 500 Series? | Radeon RX 600 Series? | GM200 (Maxwell) | GP100 (Pascal) |
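The compute rows in the table follow from a simple rule of thumb, sketched below in Python. It assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock, which is the usual way peak FP32 figures are quoted; the clock speeds it prints are merely implied by the table's numbers and are not stated anywhere in this article. Both function names are made up for this example.

```python
def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs, ASSUMING 2 FLOPs (one fused multiply-add) per core per clock."""
    return 2 * cores * clock_ghz / 1000


def implied_clock_ghz(cores: int, tflops: float) -> float:
    """Clock in GHz that would be needed to hit a quoted peak FP32 figure."""
    return tflops * 1000 / (2 * cores)


# Core counts and TFLOPs taken from the table; clocks are derived, not quoted.
for name, cores, tflops in [("GP100", 3840, 12.0), ("GV100", 5376, 15.0)]:
    clock = implied_clock_ghz(cores, tflops)
    print(f"{name}: {cores} cores at ~{clock:.3f} GHz -> {peak_fp32_tflops(cores, clock):.1f} TFLOPs FP32")

# FP16 at twice the FP32 rate, as listed for Pascal above.
print(2 * peak_fp32_tflops(3840, implied_clock_ghz(3840, 12.0)))
```

The doubling in the last line matches the 24.0 TFLOPs FP16 entry for Pascal; the Volta FP16 entry is a Tensor figure and does not follow this simple rule.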