Multi GPU Technology Analysis – Nvidia SLI and AMD CrossFire Scaling, Frame-Time and Value Comparison

Oct 1, 2015 at 11:19pm EDT

Conclusion: Parting words on Multi-GPU

Multi-GPU configurations are part and parcel of nearly every high-end build. Unless the builder has decided to forgo the performance benefit of multiple graphics cards out of personal preference, you will find at least a dual SLI or Crossfire setup in many enthusiast builds. However, as we are all aware, there are certain drawbacks associated with using more than one graphics card. These range from micro-stuttering, to bad multi-GPU profiles, to plain and simple diminishing marginal performance. Today, sourcing a post from IYD.KR by the well-known DG Lee, we will take a look at how SLI and Crossfire compare to themselves and to each other.

An in-depth look at GPU scaling technologies of the current generation: Nvidia's TITAN-X and 980 Ti in SLI versus AMD's R9 Fury X in CrossFire

If we take the performance offered by a single graphics card to be 100%, then a gamer employing two or more cards should logically expect gains of 100% for every card added (for a total of 200% with two cards, and so on). Of course, graphics cards (currently) do not stack that way - not even close. The real gains are much more diminished in nature, and a plethora of complications is added to the mix (frame pacing being one of the recent examples). Stuttering used to be a problem in the initial era of multi-GPU, something which has now trickled down to problems associated with micro-stuttering. The primary problem with any multi-GPU technology (be it Crossfire or SLI) is that the GPUs do not logically act as a single, bigger GPU; rather, they split the work between them. This translates to real-world (micro) delays if the GPUs are not 100% in sync.

Before we go any further, given above are the reference benchmarks for this test. They consist of 9 games and one synthetic benchmark, all tested at three different resolutions: 1920x1080, 2560x1440 and 3840x2160. Before we begin our dive, please note that this particular graph is relative to the single-GPU configuration of each IHV, and does not compare the vendors to each other in absolute terms. The complete list is given below:

Right off the bat, we can see that red outperforms green at every step. There are a total of 9 comparisons here (three resolutions across three multi-GPU configurations) and the AMD R9 Fury X wins every single one of them. I think this makes it crystal clear: AMD's multi-GPU scaling has the upper hand over its Nvidia counterpart. The Nvidia GTX 980 Ti and the Nvidia GTX TITAN-X scale almost identically (probably because it's the same chip inside both products: the GM200 Maxwell die). The diminishing marginal returns of every added GPU are also obvious. Scaling is pretty horrible at lower resolutions (such as 1080p and 1440p), but that is more or less expected (due to the performance being CPU bound); and let's face it, if you are running an SLI or a Crossfire setup you probably aren't rocking an ageing 1080p monitor.

Ideally, the frame times of all GPUs should add up to 1. So with one GPU, the frame time is exactly 1; with two GPUs, it should be 0.5 per GPU, and so on. The problem, however, is that as with actual performance, frame time deteriorates with every added GPU as well, and that is something that cannot be avoided. It can, however, be minimized, and that is where each technology comes in. Given below are the comparison charts (including the Average Relative Performance and Frametimes) between AMD's Crossfire technology and Nvidia's SLI tech:

Tables courtesy of DG Lee. @IYD.KR

| WCCFTech | Nvidia Geforce GTX Titan X (SLI): Deviation from Ideal Scaling | Nvidia Geforce GTX Titan X (SLI): Deviation from Ideal Frametime | AMD Radeon R9 Fury X (CF/xDMA): Deviation from Ideal Scaling | AMD Radeon R9 Fury X (CF/xDMA): Deviation from Ideal Frametime |
|---|---|---|---|---|
| Single GPU | 0% | 0% | 0% | 0% |
| Dual GPU | -11% | +12% | -7% | +7% |
| Triple GPU | -21% | +26% | -13% | +15% |
| Quad GPU | -29% | +41% | -22% | +28% |
A comparison of deterioration due to GPU scaling in Nvidia SLI and AMD Crossfire (over xDMA).
*Lower is better (in absolute terms).

 

To make the impact of the information clearer, I have taken the liberty of calculating the deviations from the ideal score. Once again, AMD's multi-GPU technology shines here. AMD's deviation from the ideal frame time and the ideal performance is minimal in a dual Crossfire configuration (both approximately 7% from the ideal). As more GPUs are added, the score deteriorates, until the quad-GPU configuration is lagging approximately 22% behind the ideal scaling and 28% behind the ideal frame time. This means that instead of 1, the total frame time (for all GPUs combined) is actually 1.28 - which will be the root cause behind any complications that result from multi-GPU.

Nvidia's variants fare worse. In dual SLI, we see that the cards are lagging 11% behind the ideal scaling, and 29% at the top end (four GPUs). Frame time is a similar story, with the deviation ranging from 12% all the way up to a massive 41% (in quad SLI). This means that if you are rocking four of green's cards, they will add up to a total frame time of 1.41 (instead of the ideal 1), which is a pretty massive drop, and roughly one and a half times AMD's deviation of 28%.
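As a rough check on the arithmetic above, the Python sketch below converts the table's deviations into effective performance and total frame time. It assumes the deviations are multiplicative (an 11% scaling deficit on two GPUs is read as 200% x 0.89), and the function names are illustrative rather than anything from DG Lee's post. It also shows an observation about the numbers, not a claim made by the source: the frame-time column sits within about one percentage point of the reciprocal of the scaling column.

```python
# Deviations from the table above, as fractions, keyed by GPU count.
# Format: (scaling deficit, frame-time excess)
TITAN_X_SLI = {2: (0.11, 0.12), 3: (0.21, 0.26), 4: (0.29, 0.41)}
FURY_X_CF = {2: (0.07, 0.07), 3: (0.13, 0.15), 4: (0.22, 0.28)}


def effective_performance(n_gpus, scaling_deficit):
    """Performance in percent of one card, assuming a multiplicative deficit."""
    return n_gpus * 100.0 * (1.0 - scaling_deficit)


def total_frame_time(frame_time_excess):
    """Combined frame time of all GPUs, where the ideal total is 1."""
    return 1.0 + frame_time_excess


def implied_frame_time_excess(scaling_deficit):
    """Frame-time excess if frame time were exactly 1 / scaling efficiency."""
    return 1.0 / (1.0 - scaling_deficit) - 1.0


for name, table in (("TITAN X SLI", TITAN_X_SLI), ("Fury X CF", FURY_X_CF)):
    for n, (deficit, excess) in sorted(table.items()):
        print(
            f"{name} x{n}: {effective_performance(n, deficit):.0f}% of one card, "
            f"total frame time {total_frame_time(excess):.2f}, "
            f"reciprocal estimate +{implied_frame_time_excess(deficit) * 100:.1f}%"
        )
```

Read this way, the two columns of the table are largely two views of the same loss: a setup that delivers 78% of its ideal throughput needs about 1/0.78 = 1.28 times the ideal frame time.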

The obvious next question when dealing with multiple GPUs in SLI or Crossfire is, of course, value. In the second half of this article, we will look at the value offered by the setups, as well as investigate why AMD tends to have an advantage in multi-GPU.

Let's begin with the testing methodology. We will now be looking at the absolute performance of each graphics card from both IHVs. To get a quantified value of real-world scaling, we have a choice between 1080p, 1440p and 4K results. Since most gamers using more than one GPU will have either a multi-monitor setup or a 4K screen, we will be using only the 4K round of tests. This will also allow us to remove most of the CPU-bound effect and get at purely graphics processing numbers.

Note: The performance numbers used in this analysis (as well as a detailed benchmark run-down) can be found over at IYD.KR, courtesy of DG Lee.

Nvidia SLI and AMD Crossfire - Marginal Value Comparison (Performance Per Dollar)

To get the 'value' offered by a setup, we divide the performance percentage by the dollar figure of the card (or the total dollar figure of all cards in the setup). Since demand makes retail prices fluctuate, and since it would be nearly impossible to create a table from those prices that remains reliable over time, we will be using the MSRP at the time of release.

Performance divided by the MSRP (multiplied by the number of cards) will equal the value offered by the SLI or Crossfire setup.
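The formula can be written out as a minimal Python sketch. The function name is hypothetical, and two inputs are assumptions not stated in this article: a launch MSRP of $999 for the GTX TITAN X, and a performance index in which a single TITAN X at 4K equals 100. The dual-card performance figure is purely illustrative and is not taken from DG Lee's results.

```python
def value_per_dollar(performance_pct, msrp_usd, n_cards=1):
    """Performance percentage divided by the total MSRP of all cards."""
    if n_cards < 1 or msrp_usd <= 0:
        raise ValueError("need at least one card and a positive MSRP")
    return performance_pct / (msrp_usd * n_cards)


# Assumption: launch MSRP in US dollars (not listed in the article).
TITAN_X_MSRP = 999

# Single card, taking its own performance as the 100% baseline.
print(f"{value_per_dollar(100.0, TITAN_X_MSRP):.4f}")  # prints 0.1001

# Two cards with an illustrative (hypothetical) combined performance of 180%.
print(f"{value_per_dollar(180.0, TITAN_X_MSRP, n_cards=2):.4f}")
```

Because the denominator grows linearly with every added card while performance grows by less than 100% each time, the value figure can only fall as GPUs are added.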

Given below is the table showing how the value of high-end graphics card configurations changes with every added GPU. I have also included a graph to make visualization easier. Since the AMD R9 Fury X has the highest value amongst the trio, we will be using it as the reference point for our graph.

| WCCFTech | Single GPU | Dual GPU | Triple GPU | Quad GPU |
|---|---|---|---|---|
| Nvidia GTX TITAN X | 0.1001 (66.29%) | 0.0896 (59.34%) | 0.0794 (52.58%) | 0.0708 (46.89%) |
| Nvidia GTX 980 Ti | 0.1495 (99%) | 0.1333 (88.28%) | 0.1176 (77.88%) | 0.1052 (69.67%) |
| AMD R9 Fury X | 0.1510 (100%) | 0.1410 (93.38%) | 0.1305 (86.42%) | 0.1179 (78.08%) |

*Higher is better. @Wccftech

A single Geforce GTX TITAN-X is worse value for gaming than four R9 Fury Xs or four GTX 980 Tis

Those are some pretty interesting values, if I may say so. We see that the AMD R9 Fury X wins the value rounds as well, closely followed by the Nvidia GTX 980 Ti, and both graphics cards completely dominate the TITAN-X in terms of performance per dollar. I don't think there can be a clearer depiction of the premium present in the GTX TITAN-X than this metric.

Interestingly, there is only a very slight difference between a single R9 Fury X and a single GTX 980 Ti, but once we cross the dual SLI/Crossfire threshold, things start to spread out a little more. The GTX 980 Ti, it appears, offers fair value up to a dual configuration; beyond that, even a quad AMD R9 Fury X setup has better value than any larger Nvidia setup. In fact, the Nvidia triple SLI configuration (Geforce GTX 980 Ti) is slightly worse value than a Fury X in quad Crossfire.
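The comparisons in this section can be re-derived from the value table with a short Python sketch. The value figures are copied from the table; the dictionary layout and function names are illustrative choices of mine.

```python
# Performance-per-dollar values copied from the table above,
# listed from one GPU to four GPUs.
VALUES = {
    "GTX TITAN X": [0.1001, 0.0896, 0.0794, 0.0708],
    "GTX 980 Ti": [0.1495, 0.1333, 0.1176, 0.1052],
    "R9 Fury X": [0.1510, 0.1410, 0.1305, 0.1179],
}
REFERENCE = VALUES["R9 Fury X"][0]  # a single Fury X is the 100% mark


def relative_value(card, n_gpus):
    """Value of a configuration as a percentage of a single R9 Fury X."""
    return VALUES[card][n_gpus - 1] / REFERENCE * 100.0


def ranked_configurations():
    """All twelve configurations, best value first."""
    configs = [(card, n) for card in VALUES for n in range(1, 5)]
    return sorted(configs, key=lambda c: relative_value(*c), reverse=True)


for card, n in ranked_configurations():
    print(f"{n}x {card}: {relative_value(card, n):.2f}%")
```

In the resulting ranking, a quad Fury X (0.1179) lands just ahead of a triple 980 Ti (0.1176), and a single TITAN X (0.1001) falls below every Fury X and 980 Ti configuration in the table.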

Investigating AMD's edge in multi GPU scaling and XDMA Analysis

So what exactly is the root cause behind the R9 Fury X's ability to scale efficiently? The answer is three-pronged. Firstly, the GPU architecture and drivers themselves account for a significant portion of AMD's edge, but that is not something we can accurately investigate, so we will have to skip over it. The second, obvious reason is that the R9 Fury X uses an HBM (memory) setup with far more bandwidth and throughput than its Nvidia counterparts can push. Since we have already covered HBM and its effects in excruciating detail before, I won't go into much detail on it right now. Here's the thing, however: the testing (of AMD GPUs) was done using XDMA technology to establish Crossfire between multiple graphics cards. Nvidia's offerings, on the other hand, used an SLI bridge. And this, in our opinion, is the third major cause of the apparent advantage AMD has.

The Crossfire bridge has a peak bandwidth of 900 MB/s according to conservative estimates, while Nvidia's SLI FAQ puts the SLI bridge at approximately the same figure: 1 GB/s. The key techniques behind using two graphics cards in conjunction are AFR and SFR. AFR stands for alternate frame rendering and assigns each GPU a specific share of the frame queue (odd frames for one, even frames for the other, for example), while SFR stands for split frame rendering and splits a single frame between the two GPUs. AFR is superior to SFR in terms of performance but results in the phenomenon known as micro-stutter.
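To make the difference between the two modes concrete, here is a minimal Python sketch of how work might be divided under each. Both functions are hypothetical illustrations rather than driver code, and the even split in the SFR case is an assumption: an actual driver may rebalance the bands according to workload.

```python
def afr_assign(frame_index, n_gpus):
    """AFR: whole frames are handed out round-robin, one GPU per frame."""
    return frame_index % n_gpus


def sfr_split(frame_height, n_gpus):
    """SFR: one frame is cut into horizontal bands, one band per GPU.

    Returns a (first_row, last_row_exclusive) pair per GPU. The even split
    is an assumption; a real driver may size bands by workload instead.
    """
    base, extra = divmod(frame_height, n_gpus)
    bands, start = [], 0
    for gpu in range(n_gpus):
        rows = base + (1 if gpu < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


# Two GPUs in AFR: frames 0, 2, 4 go to GPU 0 and frames 1, 3, 5 to GPU 1.
print([afr_assign(i, 2) for i in range(6)])  # [0, 1, 0, 1, 0, 1]

# Two GPUs in SFR on a 2160-row (4K) frame.
print(sfr_split(2160, 2))  # [(0, 1080), (1080, 2160)]
```

Under AFR each GPU finishes its own frames on its own schedule, which is where uneven frame delivery (micro-stutter) can creep in. Under SFR both GPUs contribute to every frame, so the frame is only done when the slower band is done.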

In every case, however, when using a multi-GPU configuration, one graphics card will act as the master (Radeon 1) and will be responsible for the actual display output, while the slaves (Radeon 2+) will be responsible for processing the data and handing it over to the master GPU for output. Before XDMA, the low-bandwidth 900 MB/s connector was used (in conjunction with offloading some of the work to PCI-E) for the GPUs to talk. With XDMA, however, PCI-E 3.0 entered the scene and there was ample bandwidth to forgo the low-throughput connector and push everything through PCI-E. Given below is the bandwidth available for the GPUs to talk in each common PCI-E mode:

If you look at the diagram (which I hastily made), you will be able to visualize how much difference this can make. Radeon 2 can either use the Crossfire bridge to talk to Radeon 1 or, with the advent of XDMA technology, the much faster PCI-E 3.0 interface. Nvidia, on the other hand, does not (currently) have the option to forgo the SLI interface.
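For a sense of scale, the theoretical PCI-E figures can be worked out from the published per-lane signalling rates and line encodings. The Python sketch below is my own back-of-the-envelope calculation and not the data behind the author's diagram; it gives one-direction bandwidth before protocol overhead, so real-world throughput will be lower. The 900 MB/s bridge figure is the estimate quoted earlier in the article.

```python
# Per-lane signalling rate (megatransfers/s) and line-encoding efficiency
# for each PCI Express generation.
PCIE_GENERATIONS = {
    1: (2500, 8 / 10),     # 8b/10b encoding
    2: (5000, 8 / 10),     # 8b/10b encoding
    3: (8000, 128 / 130),  # 128b/130b encoding
}

BRIDGE_MB_S = 900  # Crossfire bridge estimate quoted in the article


def pcie_bandwidth_mb_s(generation, lanes):
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    rate_mt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_mt_s * efficiency / 8 * lanes  # 8 bits per byte


for gen, lanes in ((2, 8), (2, 16), (3, 8), (3, 16)):
    bw = pcie_bandwidth_mb_s(gen, lanes)
    print(f"PCI-E {gen}.0 x{lanes}: {bw:.0f} MB/s, {bw / BRIDGE_MB_S:.1f}x the bridge")
```

Even a PCI-E 3.0 x8 slot works out to several times the bridge's bandwidth on paper, which is the headroom XDMA is able to draw on.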

Unfortunately, there is not a lot of documentation on SLI (or Crossfire for that matter). What we do know is that in the past, Nvidia ForceWare drivers (80.XX onwards) allowed GPUs in SLI to talk via PCI-E alone, but the performance was worse than if an SLI connector was employed. The general consensus seems to be that Nvidia's multi-GPU tech (same as AMD's) does use PCI-E to transmit a significant chunk of the data, but also employs the SLI bridge for added bandwidth and for synchronising timing. Unfortunately, the SLI bridge that was once a superior interface to complement PCI-E has now become more or less obsolete and a bottleneck.

In what was (hopefully) an interesting piece, we saw that AMD's high-end offerings in CrossFire over XDMA have slightly better scaling, value and timings than their Nvidia counterparts in SLI. We have also seen that the logical step forward for Nvidia's multi-GPU technology is to reduce or eliminate reliance on the aged SLI connector, or to introduce a new version of it - provided it wants to keep up with the superior scaling offered by AMD. Of course, XDMA is just part of the equation; HBM (memory) is just as important and results in the elimination of one of the biggest bottlenecks in the entire work chain.

Pascal's test vehicles have already been spotted on Zauba, and the first flagship is expected next year. With Nvidia shifting to HBM as well, the gap between the scaling of both IHVs will change, with Nvidia either catching up to or even overtaking its AMD counterparts.

If you are planning to go for a setup with multiple graphics cards, a dual Crossfire (XDMA) configuration of the R9 Fury X is a good buy, followed by a dual SLI configuration of the GTX 980 Ti.

What is not good value (for gamers), however, is the GTX TITAN-X, whose excessive price tag gives it the worst value for money we have seen (probably exceeded only by former TITAN-branded cards).

The multi-GPU scaling and deterioration was quantified to a reasonable extent, and we were able to put a good cap on what a gamer (from either camp) should expect with each added graphics card. It is also worth stating that with the advent of the DirectX 12 API, multi-GPU performance and capability could take a huge leap forward. The included low-level access would allow developers to get creative with how work is managed across multiple graphics cards, improving a standard that has been historically riddled with problems.

Finally, I would like to add that the dual Multi-GPU configurations from both camps are a decent approximation (and an educated prediction) of the performance we can expect from dual-GPU flagships of both Nvidia and AMD. This means that fans of both hardware vendors can expect decent performance gains, which only leaves the problem of pricing - and that is something only time will tell.


About the author: PC Hardware and Technology Enthusiast, Blood of Silicon (1 nm),
