NVIDIA To Devs: Compute/Graphics Toggle Is A Heavyweight Switch

Alessio Palumbo

A couple of weeks ago, NVIDIA published a vademecum of DirectX 12 Do’s and Don’ts that went largely unnoticed. It contains some interesting tips for developers on how to best use Microsoft's new lower-level API with NVIDIA's existing architecture.

A couple of these tips, for instance, seem to confirm two stories we reported last month about Maxwell's problems with Asynchronous Compute. In case you don't recall, AMD's Robert Hallock said that Maxwell can't perform Async Compute without heavy reliance on slow context switching. A few days later, Tech Report's David Kanter mentioned that, according to Oculus employees, preemption context switching was potentially catastrophic for Maxwell GPUs.

Now, under the Pipeline State Objects (PSOs) section, NVIDIA is very clear:

  • Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
  • This is still a heavyweight switch to make
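
To make the advice concrete, one can count how often a stream of work flips between graphics and compute, then group independent work to lower that count. The following is a minimal sketch and not Direct3D 12 code: `WorkKind`, `WorkItem`, `countToggles` and `groupByKind` are hypothetical names introduced here for illustration, and the grouping step assumes the items have no ordering dependencies on each other, which a real renderer would have to verify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags for work recorded on a single command queue.
enum class WorkKind { Graphics, Compute };

struct WorkItem {
    WorkKind kind;
    std::string name;  // label for debugging only
};

// Count how many times consecutive items change between graphics and compute.
std::size_t countToggles(const std::vector<WorkItem>& items) {
    std::size_t toggles = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].kind != items[i - 1].kind) {
            ++toggles;
        }
    }
    return toggles;
}

// Move all graphics items ahead of all compute items, keeping the relative
// order inside each group. ASSUMPTION: the items are independent of each
// other; work with dependencies cannot be reordered this freely.
std::vector<WorkItem> groupByKind(std::vector<WorkItem> items) {
    std::stable_partition(items.begin(), items.end(), [](const WorkItem& w) {
        return w.kind == WorkKind::Graphics;
    });
    // After grouping, at most one graphics-to-compute switch remains.
    assert(countToggles(items) <= 1);
    return items;
}
```

For an interleaved frame such as graphics, compute, graphics, compute, graphics, `countToggles` reports four switches; after `groupByKind` only one remains.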

That's not all NVIDIA had to say about compute and graphics tasks. Under the Work Submission – Command Lists & Bundles section, the company warns developers as follows:

  • Check carefully if the use of separate compute command queues really is advantageous
  • Even for compute tasks that can in theory run in parallel with graphics tasks, the actual scheduling details of the parallel work on the GPU may not generate the results you hope for
  • Be conscious of which asynchronous compute and graphics workloads can be scheduled together
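
In practice, "check carefully" means measuring. The sketch below assumes you have already captured frame times in milliseconds for the same scene in two configurations, one submitting everything on a single queue and one using a separate compute queue. The names `medianMs` and `separateQueuePaysOff`, and the `minGainMs` threshold, are hypothetical and not part of NVIDIA's guidance or any API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of frame times in milliseconds.
// Takes the vector by value so it can sort its own copy.
double medianMs(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 == 1
               ? samples[mid]
               : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Returns true only if the separate-queue configuration lowers the median
// frame time by at least minGainMs. ASSUMPTION: both sample sets come from
// the same scene, settings and hardware, so they are directly comparable.
bool separateQueuePaysOff(const std::vector<double>& singleQueueMs,
                          const std::vector<double>& separateQueueMs,
                          double minGainMs) {
    return medianMs(singleQueueMs) - medianMs(separateQueueMs) >= minGainMs;
}
```

The median is used here rather than the mean so that a single stuttering frame does not decide the outcome; that is a design choice of this sketch, and other statistics may suit a given workload better.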

Finally, NVIDIA also gave some advice on how to best use Maxwell and DirectX 12 hardware features. The company recommends using Conservative Rasterization, which right now is only available on Maxwell cards, while it is a bit more cautious about Raster Order Views, the other feature of DirectX feature level 12_1.

  • Use hardware conservative raster for full-speed conservative rasterization
  • No need to use a GS to implement a ‘slow’ software-based conservative rasterization - See https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization
  • Make use of NvAPI (when available) to access other Maxwell features
  • Advanced Rasterization features:
    Bounding box rasterization mode for quad based geometry
    New MSAA features like post depth coverage mask and overriding the coverage mask for routing of data to sub-samples
    Programmable MSAA sample locations
  • Fast Geometry Shader features:
    Render to cube maps in one geometry pass without geometry amplifications
    Render to multiple viewports without geometry amplifications
    Use the fast pass-through geometry shader for techniques that need per-triangle data in the pixel shader
  • New interlocked operations
  • Enhanced blending ops
  • New texture filtering ops
  • Don’t use Raster Order View (ROV) techniques pervasively
  • Guaranteeing order doesn’t come for free
  • Always compare with alternative approaches like advanced blending ops and atomics

For more about DirectX 12, you can check our Fable Legends benchmark results, Lionhead's statement on the DX12 features used in Fable Legends and our own analysis of Async Compute in the game.
