Introduction to Kepler
On 22nd March 2012, NVIDIA introduced its latest 28nm ‘Kepler’ architecture with the launch of its flagship GeForce GTX 680 graphics card. Compared to the 40nm Fermi, the refined 28nm Kepler process not only provides better performance but is also much more power efficient than its predecessor.
NVIDIA’s Kepler is built on the same foundation first laid by the 40nm Fermi in 2010. Fermi at the time of its launch introduced an entirely new parallel geometry pipeline that was optimized for tessellation and displacement mapping. Kepler retains these features and delivers even better performance when rendering tessellation in the latest DirectX 11 enabled titles, all of this in a highly efficient package.
One of the reasons many gamers didn’t warm to Fermi was its high power consumption and heat output, even though it delivered rich gaming performance compared to its competitors at the time. With the Kepler architecture, the GeForce GTX 680 not only becomes the fastest performing GPU of the GeForce 600 series but also the most power efficient GPU NVIDIA has ever built.
A Brief Look at the Kepler Architecture
The Kepler architecture looks quite similar to the Fermi architecture, being composed of a number of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs) and memory controllers. The GeForce GTX 680 makes use of the GK104 (Kepler) core.
The GK104 core consists of four Graphics Processing Clusters (GPCs), where each GPC is comprised of two SMX units and a single memory controller. This puts the total number of SMX units on the GK104 Kepler at eight, alongside four memory controllers. Each memory controller has 128 KB of L2 cache and eight ROPs (Raster Operation Units), which totals 512 KB of L2 cache and 32 ROPs on the GK104.
On the Kepler GK104 die you can see a PCI-e Gen 3.0 host interface, a 256-bit wide GDDR5 memory interface and NVIDIA’s latest GigaThread Engine situated beneath the two. On the memory side, it’s quite impressive that NVIDIA achieved a 6 Gbps effective data rate with the first revision of the Kepler architecture. As you may recall, NVIDIA had quite a problem with its Fermi memory controllers, since those chips ended up with lower memory speeds than originally intended. With Kepler, the GDDR5 hits 6 Gbps along a 256-bit interface.
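As a quick sanity check on those numbers, the peak bandwidth implied by a 6 Gbps data rate over a 256-bit bus can be worked out in a few lines (a hedged sketch; the helper name is ours):

```python
# Sketch: peak memory bandwidth from per-pin data rate and bus width.
# Peak GB/s = data rate (Gbps per pin) * bus width (bits) / 8 bits-per-byte.

def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s for a given data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8

# GeForce GTX 680: 6 Gbps effective GDDR5 on a 256-bit bus
print(peak_bandwidth_gbs(6.0, 256))  # 192.0 GB/s
```

This 192 GB/s figure is why the 6 Gbps memory matters: it keeps a 256-bit bus competitive with wider, slower-clocked interfaces.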
The NVIDIA Fermi SM block featured 32 cores plus control logic, which resulted in a total of 512 cores on the GF110 core. NVIDIA’s Kepler uses next-generation streaming multiprocessors known as ‘SMX’, which pack 192 cores each and deliver up to 2x the performance per watt of Fermi. Since there are eight SMX units on the GK104 core, this leads to a total of 1536 cores on the GK104 Kepler die, three times more than on Fermi.
Each SMX has its own dedicated and shared resources, with the new PolyMorph 2.0 engine handling geometry operations such as vertex fetch, tessellation, viewport transform, attribute setup and stream output, pumping out twice the primitive and tessellation performance of Fermi’s SM units. There is one Raster Engine per GPC, four in total, handling edge setup, rasterization and Z-cull, with the results written out through the 32 Raster Operation units (ROPs).
Another improvement over Fermi is the implementation of ‘Kepler Bindless Textures’, which increases the number of textures a shader can reference to over a million, whereas this was restricted to 128 on Fermi. The new feature allows faster rendering of textures and provides richer texture detail in a scene. In total there are 128 texture mapping units (TMUs) onboard the Kepler GK104 die.
All in all, the GK104 Kepler die onboard the GeForce GTX 680 features 1536 CUDA cores (192 per SMX, 384 per GPC), 128 TMUs, 32 ROPs and a 256-bit GDDR5 memory interface.
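These headline figures follow directly from the per-unit counts discussed above; a small sketch makes the arithmetic explicit (the per-SMX TMU count is our assumption, inferred from 128 TMUs spread across 8 SMX units):

```python
# Sketch: deriving the GK104 totals from its building blocks.
GPCS = 4            # Graphics Processing Clusters
SMX_PER_GPC = 2     # SMX units per GPC
CORES_PER_SMX = 192 # CUDA cores per SMX
TMUS_PER_SMX = 16   # assumption: 128 TMUs / 8 SMX units
ROPS_PER_MC = 8     # ROPs per memory controller
MEM_CONTROLLERS = 4

smx_total = GPCS * SMX_PER_GPC            # 8 SMX units
cuda_cores = smx_total * CORES_PER_SMX    # 1536 CUDA cores
tmus = smx_total * TMUS_PER_SMX           # 128 TMUs
rops = ROPS_PER_MC * MEM_CONTROLLERS      # 32 ROPs

print(smx_total, cuda_cores, tmus, rops)  # 8 1536 128 32
```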
Kepler – Power Efficiency and GPU Boost
NVIDIA’s Kepler architecture is not only faster but also much more power efficient than any of NVIDIA’s previous GPU architectures. The GK104 chip is the fastest performer of the GeForce 600 series, yet its rated power consumption is roughly half that of Fermi.
Past GPUs from NVIDIA and AMD relied on 8-pin and 6-pin connectors to feed TDPs of over 200 Watts. The GeForce GTX 680 uses two 6-pin connectors for power, resulting in a rated TDP of just 195W, compared to 250W on the Radeon HD 7970, its direct competitor.
Automatic clock Boost for GPUs
In addition to bringing performance and efficiency to its GeForce products, NVIDIA has also brought the latest GPU boost dynamic overclocking technology to its GeForce Kepler family.
GPU Boost is similar to the Turbo Boost/Turbo Core technologies we see on Intel and AMD processors. The feature is dynamically controlled in the background while applications are running. The GPU Boost algorithm takes several factors into account (power consumption, GPU temperature, etc.) before applying a boost to the GPU frequency, GPU voltage and the memory. Once again, no software is required by the user to enable GPU Boost; it is a dynamic feature which runs in the background without user intervention.
When the GPU is boosted, it pushes frequencies up to a level determined by the available TDP headroom. For instance, when a user is running an application and the card has not yet reached its TDP limit, the GPU will boost to give added performance by converting the available power headroom into clock speed. The lower the current power draw, the more boost you get out of your GPU; a GPU drawing 180W will reach lower boost speeds than one drawing 160W. At its 195W limit, the GeForce GTX 680 runs at its predetermined base clock of 1006 MHz, since no more headroom is available for GPU Boost.
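The headroom-into-clocks behaviour described above can be sketched in a few lines. This is a hedged illustration only: the scaling factor is made up for the example and is not NVIDIA’s actual boost algorithm.

```python
# Sketch of GPU Boost behaviour: the further the card sits below its
# 195W TDP, the more residual headroom can be converted into extra clocks.
BASE_CLOCK_MHZ = 1006   # GTX 680 base clock
TDP_W = 195             # GTX 680 rated TDP
MHZ_PER_WATT = 2.0      # hypothetical headroom-to-clock conversion factor

def boosted_clock(current_draw_w: float) -> float:
    """Illustrative boost clock given the card's current power draw."""
    headroom_w = max(TDP_W - current_draw_w, 0)
    return BASE_CLOCK_MHZ + headroom_w * MHZ_PER_WATT

print(boosted_clock(195))                       # 1006.0 -> at the limit, base clock only
print(boosted_clock(160) > boosted_clock(180))  # True -> lower draw, higher boost
```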
The new GPU Boost gives overclockers an advantage, since it also works with overclocked settings applied to the GPU. When users overclock the GPU’s core, the overclock is also applied to the maximum GPU Boost frequency, raising its boost limit. However, the GPU will still apply boost based on the available TDP headroom, as mentioned above. We have also overclocked our review samples, which you can see later in the article.
Introduction to FXAA, TXAA, Adaptive V-Sync
NVIDIA’s Kepler architecture on the GeForce 600 series also brings new technologies for games such as new anti-aliasing algorithms, Adaptive V-Sync, 3D Vision Surround (Supports 4 Displays), NVENC, and Richer PhysX processing.
FXAA and TXAA
NVIDIA has developed two new anti-aliasing algorithms for its GeForce Kepler family. FXAA uses the GPU’s CUDA cores to reduce visible aliasing in gaming titles whilst applying other post-processing effects such as motion blur and bloom. FXAA was first introduced in Age of Conan; since then it can be applied in various titles using NVIDIA’s R300 drivers.
The FXAA algorithm reduces visible aliasing without compromising performance. As a demonstration, NVIDIA showed that running Epic’s next-generation Samaritan demo at GDC 2011 required three GTX 580s in SLI; a year later, the same demo at the same image quality ran on a single GTX 680 utilizing FXAA.
TXAA, similarly, is another anti-aliasing algorithm developed by NVIDIA which harnesses the GTX 680’s texture performance and is available in two modes. TXAA 1 offers better image quality than 8x MSAA at the performance cost of 2x MSAA, while TXAA 2 offers even better image quality than TXAA 1 at the performance cost of 4x MSAA. However, it is up to developers to decide which games adopt the new and much improved TXAA algorithm. Expect the feature to appear in NVIDIA-optimized games such as Crysis 3, Borderlands 2, The Secret World and MechWarrior Online.
NVIDIA has also developed a new V-Sync mode known as ‘Adaptive V-Sync’, which can dynamically turn V-Sync off when the frame rate falls below the monitor’s refresh rate. Typically, V-Sync is enabled when users want to cap a game at 60 FPS.
Gamers with high-end video cards capable of producing more than 60 FPS enable it to get rid of the screen tearing that occurs when the frame rate exceeds the monitor’s refresh rate. However, problems also arise when frame rates drop below the V-Sync cap. For example, with a game capped at 60 FPS under V-Sync, when frames dip below that limit, V-Sync falls back to the next cap, 30 FPS and then 20 FPS. This abrupt transition to a lower FPS cap causes noticeable stuttering.
What Adaptive V-Sync does is dynamically manage that transition. The new feature turns off V-Sync when frames dip below 60 FPS so that the game keeps running at a playable frame rate, then re-applies the 60 FPS V-Sync cap once frame rates recover. Adaptive V-Sync adjusts dynamically for displays with both 60 Hz and 120 Hz refresh rates.
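The decision Adaptive V-Sync makes each moment boils down to a single comparison; a minimal sketch of the logic described above (our simplification, not NVIDIA’s driver code):

```python
# Sketch of the Adaptive V-Sync decision: keep V-Sync on while the
# renderer can sustain the refresh rate, switch it off below that to
# avoid the 60 -> 30 -> 20 FPS step-down stutter.

def adaptive_vsync(fps: float, refresh_hz: int = 60) -> bool:
    """Return True if V-Sync should stay enabled at this frame rate."""
    return fps >= refresh_hz

print(adaptive_vsync(75))  # True  -> cap frames at the 60 Hz refresh rate
print(adaptive_vsync(48))  # False -> run uncapped rather than drop to 30 FPS
```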
Single GPU 3D Vision Surround
With the GeForce GTX 680, gamers can now run up to three displays simultaneously in 3D Vision Surround, plus a fourth display for email and other applications, all from a single GPU. The GTX 680 comes with native support for NVIDIA’s 3D Vision Surround and supports HDMI 1.4a, 4K monitors (3840 x 2160) and multi-stream audio.
Improved PhysX and NVENC:
With the GeForce GTX 680 and its latest SMX units, gamers can now run titles that incorporate PhysX effects at much higher frame rates than on the GeForce GTX 580. Games such as the recently released Borderlands 2 make use of new PhysX effects on cloth and particles, enhancing the visual experience.
All GeForce Kepler GPUs come with NVIDIA’s latest hardware-based H.264 video encoder, known as NVENC, a dedicated encoding block that provides a massive performance improvement compared to CPU encoding. The GeForce Kepler architecture does this job at much lower power consumption than GeForce Fermi. NVENC provides the following:
- Can encode full HD (1080p) video up to 8x faster than real time. For example, in high performance mode, encoding a 16-minute 1080p 30 fps video takes approximately 2 minutes.
- Support for H.264 Base, Main, and High Profile Level 4.1 (same as Blu-ray standard)
- Supports MVC (Multiview Video Coding) for stereoscopic video—an extension of H.264 which is used for Blu-ray 3D.
- Up to 4096×4096 encode
The Three Slot Beast – ASUS GeForce GTX 680 DirectCU II
Finally, we get to the card itself. The ASUS GeForce GTX 680 DirectCU II video card is quite a mammoth, both in terms of size and performance.
The ASUS GeForce GTX 680 is built on the same GK104 chip, NVIDIA’s first 28nm chip, which we have been detailing over the last two pages in terms of technology and features. Now it’s time to learn what specifications the ASUS GeForce GTX 680 holds.
The first thing to notice about the card is that the ASUS GeForce GTX 680 is built on a non-reference PCB. Compared to the reference PCB, which features 4+2 VRM phases (GPU+memory), the ASUS GeForce GTX 680 gets a DIGI+ VRM with 10-phase Super Alloy Power technology, allowing greater stability and overclocking headroom. However, due to the higher number of VRM phases, the stacked 6-pin power connectors of the reference model are replaced with side-by-side 8-pin and 6-pin PCI-e connectors on the ASUS GeForce GTX 680. The looks and features of the card are detailed below, after the unboxing section.
The ASUS GeForce GTX 680 features 1536 cores, 32 ROPs, 128 TMUs and 3500 million transistors onboard the GK104. The core runs at a base clock of 1006 MHz with a GPU Boost clock of 1058 MHz at stock. These are the reference specs of the GTX 680, which ASUS’s model retains. In addition, there is 2 GB of GDDR5 memory running at a 6008 MHz (6 Gbps) effective frequency along a 256-bit wide interface.
| |Radeon HD 6970|Radeon HD 7870|Radeon HD 7970|GeForce GTX 560 Ti|GeForce GTX 660|GeForce GTX 580|GeForce GTX 680|
|---|---|---|---|---|---|---|---|
|Core Clock (MHz)|880|1000|925|950|980|772|1006|
|Boost Clock (MHz)|–|–|–|–|1033|–|1058|
|Memory Clock (MHz)|5500|4800|5500|6008|6008|4008|6008|
Let’s have a look at the GPU shall we?
Unboxing the Package
The ASUS GeForce GTX 680 DirectCU II 2 GB is shipped within a large box which at its front shows the ‘GeForce GTX 680’ and ‘DirectCU II’ label. Three claws in red color are embedded on the box illustrating the design theme of the DirectCU II cooler. The bottom part lists features of GeForce GTX 680 such as Digi+ VRM, VGA Hotwire, ASUS’s GPU Tweak and 2 GB GDDR5 memory.
The backside of the box provides detailed information about the DirectCU II cooler, which is said to run 20% cooler than the reference design, the DIGI+ VRM with Super Alloy Power, and real-time overvolting with VGA Hotwire. The backside also points out the input/output ports on the GeForce GTX 680 DirectCU II.
There’s a second cardboard box inside the packaging which holds the GeForce GTX 680 DirectCU II. This packaging only has a single ASUS logo etched at its center.
Opening the box reveals a foam cover which holds the GPU manual and driver disk. The GeForce GTX 680 DirectCU II is supplied with a manual, setup disk, flexible SLI cable and a PCI-e power connector as accessories.
Beneath the foam cover is the card itself, ASUS’s GeForce GTX 680 DirectCU II, held in another foam tray to protect it from structural damage and kept inside an anti-static bag.
A Look at the GTX 680 DirectCU II
A first look at the card is enough to determine the kind of power the ASUS GeForce GTX 680 DirectCU II holds.
From the front, we can see that the card uses ASUS’s famous DirectCU II cooler, which the company has been using on its cards for a while now. The DirectCU II label is etched in the left corner of the card, and the claw line (previously noted on the box) runs through the center. Two 100mm noise-dampening fans push air over the central parts of the heatsink.
The back of the GPU comes with a back-plate, a rather good move by ASUS. The back-plate is covered by a protective film which can easily be peeled off by users. It has several holes to dissipate heat from the back and keeps selective key points bare for GPU Hotwiring. Also note that the ‘ASUS GeForce GTX 680’ logo is imprinted on the plate, a nice addition to its looks.
From the side, we can note that the GeForce GTX 680 DirectCU II covers three slots, which could be a hassle for users with small cases or for those looking forward to SLI. In our setup, the DirectCU II fit easily without any issues. The PCI-Express connector is protected by a slot cover.
The other side comes with yet another ‘ASUS’ logo and has a total of seven cutouts to vent hot air out of the heatsink. A large metallic cover keeps the GPU and PCB in place, preventing the card from bending. We can also see the power connectors from this position, an 8-pin and a 6-pin PCI-e to be exact. The reference model comes with two 6-pin connectors stacked on top of each other to save space.
The ASUS GeForce GTX 680 DirectCU II features four display outputs on the back-panel – two dual-link DVI ports, an HDMI port and a full-size DisplayPort – allowing single-GPU 3D Vision Surround support. The back-panel also comes with half-length and full-length exhaust vents to push heat out of the GPU shroud.
At the front side of the PCB, we can see two SLI gold fingers protected by slot covers. These allow for up to 4-way SLI functionality using the GeForce GTX 680 GPU.
A Much Closer Look at the GTX 680 DirectCU II
We take a close look at the ASUS GeForce GTX 680 DirectCU II to see what’s kept under the GPU’s hood.
The DirectCU II cooler makes use of a large aluminum fin array through which five copper heatpipes run. These heatpipes make direct contact with the GPU core, carrying heat into the heatsink block where it is blown away by the PWM-controlled fans.
Once again we see the power connectors, an 8-pin and a 6-pin, which feed the card. A green LED on the backside of the PCB indicates whether the power cables are properly inserted.
ASUS has added voltage measurement points on the GTX 680 DirectCU II, enabling overclockers to easily read the GPU’s voltages in real time while overclocking.
In addition to the voltage measurement points, the backside of the PCB has an additional cut-out for VGA Hotwire, which can be enabled with ASUS ROG motherboards. This allows voltage limitations to be removed and is useful if you’re up for some serious overclocking sessions.
|Component|Details|
|---|---|
|Processor|Intel Core i5-3570K @ 4.5 GHz|
|Motherboard|ASRock Z77 Extreme6|
|Power Supply|Xigmatek NRP-MC1002 1000 Watt|
|Hard Disks|Seagate Barracuda 500GB 7200.12, Kingston HyperX 3K 90GB|
|Memory|2 x 4096 MB G.Skill ARES 2133 MHz DDR3|
|Case|Cooler Master HAF 932|
|Video Cards|ASUS GTX 680, ASUS GTX 660, ASUS GTX 580, MSI GTX 560 Ti, MSI HD 7970, MSI HD 7870, MSI HD 6970|
|Video Drivers|NVIDIA ForceWare 310.90, AMD Catalyst 12.11|
|OS|Windows 7 Ultimate 64-bit|
- All games were tested on 1920×1080 and 2560×1600 resolutions.
- Games with PhysX were benchmarked with the setting either kept on Low or Off for fair comparison.
Benchmark – Aliens vs. Predator
Rebellion brings the action back to the Aliens vs. Predator franchise with the 2010 release of Aliens vs. Predator. The PC version was one of the first games to feature DirectX 11 and tessellation.
Benchmark – Batman: Arkham City
The second title in the Batman: Arkham series was also developed by Rocksteady Studios. Batman: Arkham City takes place in (isn’t it obvious from the name?) Arkham City, which is infested with all the super-villains and their minions that Batman has encountered on his previous journeys.
The game was released on PC in November 2011 and runs on the latest Unreal Engine 3 which features rich DirectX 11 detail, tessellation and PhysX support for NVIDIA cards.
Benchmark – Battlefield 3
The Battlefield name is familiar to any PC gamer. Developed by DICE and published by EA, Battlefield 3 brings back the action as one of the largest multiplayer launches of 2011. The game features both infantry and vehicular combat on some of the largest landscapes ever built in a game, with up to 64 players pitted against each other.
Powering the game is DICE’s own Frostbite 2.0 engine, the successor to the original Frostbite engine that powered Battlefield: Bad Company 2. Battlefield 3 makes use of a highly detailed DirectX 11 renderer, hardware-accelerated tessellation and new lighting effects which deliver some of the most amazing visuals ever seen in a game.
Benchmark – Borderlands 2
Borderlands 2, developed by Gearbox Software, is one of the hottest titles of 2012. The game runs on a highly modified version of Unreal Engine, making use of PhysX and rich DirectX 9 detail.
During our test, we set PhysX to Low for a fair comparison between the video cards.
Benchmark – Crysis
The first thing to pop up on forums after Crysis’s launch was ‘Can my system run Crysis?’. Almost every forum in the world, gaming or tech related, was filled with the same question – not because of any bug, but because of the technical and graphical achievement Crytek had pulled off with Crysis.
In 2007, Crytek released Crysis, a sci-fi FPS set in a jungle. The first few scenes were enough to show the graphical leap the game took over everything else available at the time, and it still remains one of the most gorgeous-looking titles to date. The game quickly became a benchmark for testing modern PCs. Crysis is powered by CryEngine 2, which makes use of a highly modified DirectX 10 feature set with technologies such as ambient occlusion and parallax mapping detailing the game’s rich jungle.
Benchmark – Crysis 2
Crysis 2 is the second title released by Crytek under the Crysis franchise. The game is set in New York and follows Alcatraz, who has to take out the Ceph and CELL along his path.
The game uses CryEngine 3 but shipped with DirectX 9 only at launch. DirectX 11 support and high-resolution textures were later added through patches. We ran Crysis 2 with the latest DirectX 11 and high-res patches installed.
Benchmark – Deus Ex: Human Revolution
Deus Ex: Human Revolution, developed by Eidos Montreal, puts us back in Adam Jensen’s footsteps and is set 25 years before the events of the original Deus Ex. The game makes use of a modified version of the Crystal Engine which features DirectX 11 capabilities.
Benchmark – F1 2012
F1 2012 brings back formula racing with an authentic representation of the teams, drivers and cars. The game is built on Codemasters’ EGO 2.0 engine, which makes use of the DirectX 11 feature set.
Benchmark – Far Cry 3
Developed by Ubisoft Montreal, Far Cry 3 is one of 2012’s hit titles. It puts us in the role of Jason Brody, a tourist stranded with his friends on a tropical island filled with pirates and a madman known as ‘Vaas’.
The game runs on the Dunia Engine 2 and features DirectX 11 effects along with Havok physics. It is one of the most graphically intensive titles released to date.
Benchmark – Hitman Absolution
Hitman Absolution is the fifth entry in Agent 47’s Hitman franchise. Developed by IO Interactive and published by Square Enix, the game once again revolves around 47, betrayed by his former handler Diana, as he tries to protect Victoria, a teenage girl. The mystery surrounding the girl unravels as the game progresses.
The game makes use of a heavily improved Glacier 2 engine, featuring DirectX 11 effects, tessellation, global illumination and depth of field. Hitman Absolution is also one of the most demanding and visually impressive titles released in 2012.
Benchmark – Metro 2033
Metro 2033 is a post-apocalyptic FPS set beneath the streets of Moscow, Russia – within the Metro system, to be exact, which has become the last refuge for humans since the world above is now infested with various creatures and rogue human factions.
The game uses rich DirectX 11 tessellation and lighting effects along with high-quality textures, and sits alongside Crysis as one of the most hardware-demanding titles ever released.
Benchmark – Stalker: Call of Pripyat
Stalker: Call of Pripyat is developed by the Ukrainian studio GSC Game World. The game takes place after the events of Stalker: Shadow of Chernobyl.
The game uses an updated X-Ray Engine 1.6 which features DirectX 11 effects such as Tessellation and dynamic shadows.
Benchmark – Sleeping Dogs
The last game on our list is Sleeping Dogs. The game puts us in the role of Wei Shen, a Chinese-American undercover cop who has to infiltrate the Sun On Yee Triad organization. The game uses a powerful DX11 engine developed and tweaked by Square Enix that makes use of high-resolution textures.
3DMark 11 Performance Test
Futuremark’s 3DMark 11 has been around for a while as a comprehensive benchmark application to evaluate overall GPU and PC performance. As the name suggests, 3DMark 11 uses the DirectX 11 API and exercises every DX11 feature at hand, such as tessellation, depth of field, dynamic lighting and parallax occlusion mapping.
- For testing we ran 3DMark 11 in Extreme and Performance presets.
Unigine Heaven 3.0 Performance Test
Based on the Unigine Engine, Unigine Heaven was one of the first demos to feature DirectX 11 effects. We use the latest Unigine Heaven 3.0 to evaluate DirectX 11 performance of GPUs with intensive features such as Tessellation. The demo also supports DirectX 9, DirectX 10 and OpenGL.
Temperature and Thermal Test
It should be noted that the ASUS GeForce DirectCU II makes use of a three-slot non-reference cooler which provides much better cooling than the reference design.
According to ASUS, the DirectCU II delivers 20% better cooling and 14 dB less noise compared to the reference GeForce GTX 680.
We tested the card under different conditions – idle, load, and load while overclocked.
The temperatures are great considering the power those extra phases add. The ASUS GTX 680 runs at around the same temperatures as a reference 680, and even cooler while gaming.
Note – We tested load with Kombustor, a stress test known as a ‘power virus’ which can permanently damage hardware. Use the software at your own risk!
The overclocked settings we used were 1204 MHz on Core, 1257 MHz Boost and 1630 MHz memory clock. You can check out the overclocked results of the ASUS GeForce GTX 680 below.
Overclocking the GTX 680 DirectCU II
The maximum stable overclock we could achieve with the ASUS GeForce GTX 680 is shown in the GPU-Z screenshot below:
The overclock was achieved at default GPU voltage and fan settings. Please note that the ASUS GTX 680 DirectCU II uses a non-reference PCB design featuring a beefier VRM and more power phases, which allows better overclocking than reference GTX 680 models.
ASUS claims that users can reach up to 1.3 GHz with the proper configuration. We believe this to be possible, since the card offers various overclocking features such as VGA Hotwire and voltage measurement points.
After overclocking, we evaluated performance again in 3DMark 11; we also pushed our MSI Radeon HD 7970 to its limit for a better comparison between the two flagship cards.
We said it before and we will say it again: the ASUS GeForce GTX 680 DirectCU II is a beast, both in terms of performance and power.
The ASUS GeForce GTX 680 DirectCU II currently costs $499.99 US ($479.99 after rebate), which is $20 more than the reference GeForce GTX 680 and $70 more than the Radeon HD 7970. However, the card is worth its price: its three-slot cooler delivers tremendous cooling performance and the non-reference PCB designed by ASUS allows for better overclocking stability. The cooler may become a burden for users with smaller cases or those looking to SLI multiple DCIIs; ASUS recently launched a dual-slot GeForce GTX 680 to address this, but it comes at added cost due to its 4 GB of memory.
Taking all of this into account, the ASUS GTX 680 DirectCU II is a perfect choice for enthusiasts, gamers and overclockers. Performance-wise, the card trumps the Radeon HD 7970 in almost every benchmark (with a few exceptions), and with continued driver support, the performance gains in newer titles keep on increasing.
The only area where Kepler disappointed us is manual overclocking, which has become quite a hassle for the average user. That aside, the Kepler architecture has impressed us technologically and feature-wise. The ASUS GeForce GTX 680 delivers exceptional performance per watt thanks to the 28nm Kepler architecture, and features like GPU Boost, TXAA/FXAA and Adaptive V-Sync are much-needed additions that help bring PC gaming back into action.