Multi GPU Technology Analysis – Nvidia SLI and AMD CrossFire Scaling, Frame-Time and Value Comparison

Oct 1, 2015

Performance Per Dollar (Per GPU) Comparison and XDMA Analysis


Let’s begin with the testing methodology. We will be looking at the absolute performance of each graphics card from both IHVs. To quantify real-world scaling, we can choose between 1080p, 1440p and 4K results. Since most gamers using more than one GPU will have either a multi-monitor setup or a 4K screen, we will use only the 4K round of tests. This also removes most of the CPU-bound effect and lets us assess purely graphics processing numbers.

Note: The performance numbers used in this analysis (as well as a detailed benchmark run-down) can be found over at IYD.KR, courtesy of DG Lee.

Nvidia SLI and AMD Crossfire – Marginal Value Comparison (Performance Per Dollar)

To get the ‘value’ offered by a setup, we divide the performance percentage by the dollar figure of the card (or the total dollar figure of all cards in the setup). Since demand makes retail prices fluctuate, and it would be nearly impossible to build a table from them that stays reliable over time, we will use the MSRP at the time of release.

Performance divided by the MSRP (multiplied by the number of cards) equals the value offered by the SLI or Crossfire setup
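The value metric described above can be written down in a few lines. This is only a minimal sketch: the function name `value_per_dollar` and the sample inputs are hypothetical and chosen for illustration, they are not the benchmark figures used in this analysis.

```python
def value_per_dollar(performance_pct: float, msrp_usd: float, num_cards: int = 1) -> float:
    """Performance percentage divided by the total launch MSRP of the setup."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    if msrp_usd <= 0:
        raise ValueError("msrp_usd must be positive")
    return performance_pct / (msrp_usd * num_cards)


# Hypothetical inputs for illustration only, not benchmark results:
# a 500-dollar card scoring 100% alone and 180% as a pair.
single = value_per_dollar(100.0, 500.0)
dual = value_per_dollar(180.0, 500.0, num_cards=2)
print(f"single: {single:.4f}, dual: {dual:.4f}")
```

Because the price doubles while the performance does not quite double, the dual setup in this made-up example ends up with a lower value than the single card, which is the pattern the table shows for every real configuration.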

Given below is the table of how the value of high-end graphics card configurations changes with every added GPU. I have also included a graph to make visualization easier. Since the single AMD R9 Fury X has the highest value among the trio, we will use it as the reference point (100%) for the graph and for the percentages in parentheses.

Graphics card      | Single GPU      | Dual GPU        | Triple GPU      | Quad GPU
Nvidia GTX TITAN X | 0.1001 (66.29%) | 0.0896 (59.34%) | 0.0794 (52.58%) | 0.0708 (46.89%)
Nvidia GTX 980 Ti  | 0.1495 (99%)    | 0.1333 (88.28%) | 0.1176 (77.88%) | 0.1052 (69.67%)
AMD R9 Fury X      | 0.1510 (100%)   | 0.1410 (93.38%) | 0.1305 (86.42%) | 0.1179 (78.08%)

[Figure: AMD and Nvidia multi-GPU performance per dollar. Higher is better. Credit: Wccftech]
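The percentages in parentheses follow directly from the raw values: each one is the configuration's value divided by that of a single R9 Fury X. A small sketch that re-derives them from the table (the dictionary layout and function name are my own, only the numbers come from the table):

```python
# Performance-per-dollar values copied from the table above
# (columns: single, dual, triple, quad).
VALUES = {
    "GTX TITAN X": [0.1001, 0.0896, 0.0794, 0.0708],
    "GTX 980 Ti": [0.1495, 0.1333, 0.1176, 0.1052],
    "R9 Fury X": [0.1510, 0.1410, 0.1305, 0.1179],
}

# The single R9 Fury X is the 100% reference point.
REFERENCE = VALUES["R9 Fury X"][0]


def relative_to_reference(value: float, reference: float = REFERENCE) -> float:
    """Express a value as a percentage of the reference, rounded to 2 places."""
    return round(100.0 * value / reference, 2)


relative = {
    card: [relative_to_reference(v) for v in row]
    for card, row in VALUES.items()
}

for card, row in relative.items():
    print(card, row)
```

Running this gives the same percentages as the table, apart from the single GTX 980 Ti, which comes out as 99.01 where the table rounds to 99%.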

A single GeForce GTX TITAN X is worse value for gaming than four R9 Fury Xs or four GTX 980 Tis

Those are some pretty interesting values, if I may say so. The AMD R9 Fury X wins the value rounds as well, closely followed by the Nvidia GTX 980 Ti, and both graphics cards completely dominate the TITAN X in terms of performance per dollar. I don’t think there can be a clearer depiction of the premium attached to the GTX TITAN X than this metric.

Interestingly, there is only a very slight difference between a single R9 Fury X and a single GTX 980 Ti, but once we cross the dual SLI/Crossfire threshold, the gap starts to widen. The GTX 980 Ti, it appears, offers fair value up to a dual configuration. Beyond that, even a quad AMD R9 Fury X setup offers better value than any Nvidia setup with three or more cards. In fact, the Nvidia triple SLI configuration (GeForce GTX 980 Ti, 0.1176) is slightly worse value than a Fury X in quad Crossfire (0.1179).
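The table also lets us back out how well each card scales. Because the total price grows linearly with the number of cards (MSRP times card count, as defined earlier), dividing an n-card value by the single-card value of the same GPU cancels the price and leaves the scaling efficiency; multiplying that by n gives the implied speed-up over one card. The sketch below does this arithmetic on the table values. The helper names are hypothetical, and the results are only as precise as the four-decimal figures in the table.

```python
# Performance-per-dollar values from the table
# (columns: single, dual, triple, quad).
VALUES = {
    "GTX TITAN X": [0.1001, 0.0896, 0.0794, 0.0708],
    "GTX 980 Ti": [0.1495, 0.1333, 0.1176, 0.1052],
    "R9 Fury X": [0.1510, 0.1410, 0.1305, 0.1179],
}


def scaling_efficiency(row):
    """Fraction of ideal linear scaling retained at 1, 2, 3 and 4 GPUs."""
    return [value / row[0] for value in row]


def implied_speedup(row):
    """Performance relative to one card: efficiency times number of cards."""
    return [(i + 1) * eff for i, eff in enumerate(scaling_efficiency(row))]


efficiency = {card: scaling_efficiency(row) for card, row in VALUES.items()}
speedup = {card: implied_speedup(row) for card, row in VALUES.items()}

for card in VALUES:
    effs = ", ".join(f"{100 * e:.1f}%" for e in efficiency[card])
    ups = ", ".join(f"{s:.2f}x" for s in speedup[card])
    print(f"{card}: efficiency {effs}; speed-up {ups}")

# The two comparisons made in the text, checked against the table.
print(VALUES["GTX TITAN X"][0] < VALUES["GTX 980 Ti"][3])  # single TITAN X vs quad 980 Ti
print(VALUES["GTX 980 Ti"][2] < VALUES["R9 Fury X"][3])    # triple 980 Ti vs quad Fury X
```

On these numbers a quad Fury X setup retains roughly 78% of ideal scaling, against roughly 70% for either Nvidia card.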

Investigating AMD’s edge in multi GPU scaling and XDMA Analysis

So what exactly is the root cause behind the R9 Fury X’s ability to scale efficiently? The answer is three-pronged. Firstly, the GPU architecture and drivers themselves account for a significant portion of AMD’s edge, but that is not something we can accurately investigate, so we will have to skip over it. The second, obvious reason is that the R9 Fury X uses an HBM memory setup with far more bandwidth than the Nvidia counterparts can push. Since we have already covered HBM and its effects in excruciating detail before, I won’t go into it here. Here’s the thing, however: the testing of AMD GPUs was done using XDMA technology to establish Crossfire between multiple graphics cards, whereas Nvidia’s offerings used an SLI bridge. And this, in our opinion, is the third major cause of the apparent advantage AMD has.
[Figure: XDMA Crossfire diagram]

The Crossfire bridge has a peak bandwidth of 900 MB/s according to conservative estimates, while Nvidia’s SLI FAQ puts the SLI bridge at approximately the same figure: 1 GB/s. The two key techniques for using multiple graphics cards in conjunction are AFR and SFR. AFR stands for alternate frame rendering and assigns each GPU its own frames in the queue (odd frames to one GPU and even frames to the other, for example), while SFR stands for split frame rendering and divides a single frame between the GPUs. AFR generally performs better than SFR but can result in the phenomenon known as micro stutter.
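To make the difference between the two modes concrete, here is a toy sketch of how work could be divided under each. It is a simplification under my own assumptions: the function names are hypothetical, AFR is modelled as plain round-robin, and SFR as an equal split into horizontal bands, whereas real drivers may balance the split according to load.

```python
def afr_assignment(num_frames: int, num_gpus: int) -> list:
    """AFR: whole frames are handed out in turn, frame i goes to GPU i mod n."""
    return [frame % num_gpus for frame in range(num_frames)]


def sfr_assignment(frame_height: int, num_gpus: int) -> list:
    """SFR: one frame's rows are divided into contiguous bands, one per GPU.

    Returns a (first_row, end_row) pair per GPU, end_row exclusive.
    """
    bands = []
    start = 0
    for gpu in range(num_gpus):
        # Spread any leftover rows over the first few GPUs.
        rows = frame_height // num_gpus + (1 if gpu < frame_height % num_gpus else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


# Six consecutive frames on two GPUs: they alternate between GPU 0 and GPU 1.
print(afr_assignment(6, 2))

# One 2160-row (4K) frame on two GPUs: top half and bottom half.
print(sfr_assignment(2160, 2))
```

In the AFR case each GPU works on a different point in time, which is where uneven frame pacing (micro stutter) can creep in; in the SFR case both GPUs work on the same frame, so the result has to be stitched together before display.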

In every multi-GPU configuration, however, one graphics card acts as the master (Radeon 1) and is responsible for the actual display output, while the slaves (Radeon 2+) process their share of the data and hand it over to the master GPU for output. Before XDMA, the low-bandwidth 900 MB/s connector was used (in conjunction with offloading some of the work to PCI-E) for the GPUs to talk. With XDMA and PCI-E 3.0, there is ample bandwidth to forgo the low-throughput connector and push everything through PCI-E. Given below is the approximate bandwidth available for the GPUs to talk in each common PCI-E mode:

  • PCI Express 2.0 running in x16 mode will offer 8 GB/s of throughput one way (or 16 GB/s total)
  • PCI Express 3.0 running in x8 mode will also offer 8 GB/s of throughput one way (or 16 GB/s total)
  • PCI Express 3.0 running in x16 mode will offer 16 GB/s of throughput one way (or 32 GB/s total)
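These round figures can be derived from the per-lane signalling rate and the line encoding of each PCI Express generation: 5 GT/s with 8b/10b encoding for 2.0, and 8 GT/s with 128b/130b encoding for 3.0. The following is a rough sketch of that arithmetic (the names are my own); it ignores packet and protocol overhead, so real-world throughput will be somewhat lower.

```python
# Per-lane signalling rate in GT/s and line-code efficiency per generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Theoretical one-way bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # divide by 8: bits to bytes


for gen, lanes in [("2.0", 16), ("3.0", 8), ("3.0", 16)]:
    one_way = pcie_bandwidth_gb_s(gen, lanes)
    print(f"PCIe {gen} x{lanes}: {one_way:.2f} GB/s one way, "
          f"{2 * one_way:.2f} GB/s total")
```

PCI Express 3.0 comes out slightly under the rounded 8 GB/s and 16 GB/s figures (about 7.88 and 15.75 GB/s one way), which is why the list above should be read as approximate.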

If you look at the diagram (which I hastily made), you will be able to visualize how much difference this can make. Radeon 2 can talk to Radeon 1 either over the Crossfire bridge or, with the advent of XDMA technology, over the much faster PCI-E 3.0 interface. Nvidia, on the other hand, does not (currently) have the option to forgo the SLI interface.
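A back-of-the-envelope calculation gives a feel for the gap. The sketch below estimates how long it would take to move one finished 4K frame from a slave GPU to the master over each link. It rests on assumptions of mine that the source does not confirm: the frame is sent whole and uncompressed at 4 bytes per pixel, and each link sustains its quoted peak rate. As noted above, the bridge was in practice used alongside PCI-E, so treat the bridge figure as a worst case.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed frame (assumes 32-bit colour by default)."""
    return width * height * bytes_per_pixel


def transfer_ms(num_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time in milliseconds to move num_bytes at the given sustained rate."""
    return 1000.0 * num_bytes / bandwidth_bytes_per_s


FRAME_4K = frame_bytes(3840, 2160)

# Peak one-way rates quoted in the article, in bytes per second.
LINKS = {
    "Crossfire bridge (900 MB/s)": 900e6,
    "PCI-E 3.0 x8 (8 GB/s)": 8e9,
    "PCI-E 3.0 x16 (16 GB/s)": 16e9,
}

for name, bandwidth in LINKS.items():
    print(f"{name}: {transfer_ms(FRAME_4K, bandwidth):.2f} ms per 4K frame")
```

Under these assumptions a single 4K frame needs roughly 37 ms over the bridge alone, more than two full frame intervals at 60 fps (about 16.7 ms each), versus about 2 ms over PCI-E 3.0 x16.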

Unfortunately, there is not a lot of documentation on SLI (or Crossfire, for that matter). What we do know is that in the past, Nvidia ForceWare drivers (80.XX onwards) allowed GPUs in SLI to talk via PCI-E alone, but performance was worse than with an SLI connector. The general consensus seems to be that Nvidia’s multi-GPU tech (like AMD’s) does use PCI-E to transmit a significant chunk of the data but also employs the SLI bridge for added bandwidth and timing synchronisation. The SLI bridge, once a superior interface used to complement PCI-E, has now become more or less obsolete and a bottleneck.