Exploring AMD’s And Intel’s Architectural Philosophies – What Does The Future Hold? (Part I)


[Editorial] Today we’re going to take you on a journey that will bring you one step closer to seeing the world through the eyes of AMD and Intel, and to reflecting on how different, yet how similar, those two views really are. To understand the philosophies of both companies we also have to understand the architectures inside the products they make, and get to know intimately the reasoning behind the decisions that shaped those architectures.

Intel and AMD Microarchitectures – Exploring the Past, Present and the Future (Part I)

Let’s start from the beginning. In 2006 AMD made the decision to buy ATi with a vision of fusing GPU cores and CPU cores onto a single piece of silicon, called the Accelerated Processing Unit, or APU for short. AMD called this effort “Fusion”, and it was the principal motivation behind the ATi acquisition. In 2010 Intel released its first processor to feature integrated graphics, but just like the company’s first dual and quad core efforts, the resulting product was a multi-chip module, or MCM for short. Intel called this design Clarkdale, and it was not a fully integrated monolithic solution: the graphics portion had its own separate die sitting alongside the CPU die on the same package.

Clarkdale
A truly integrated solution did not come from Intel until Sandy Bridge was released a year later, in January 2011.
In that same month AMD released its first processors with integrated graphics, code-named Brazos. Unlike Intel’s first effort with Clarkdale, however, the graphics portion of the chip was integrated into a single die of silicon, and with Brazos and Sandy Bridge the era of Fusion began.

Brazos

Fast forward to today and you’ll find that almost every processor has some sort of integrated graphics solution. Intel’s entire range of consumer processors has integrated graphics, with the exception of the niche that socket LGA 2011 addresses. All of the processors AMD has launched in the past two years include integrated graphics, and the company has stated multiple times that its future is all about APUs. The only AMD products without any integrated graphics solution are those on the AM3+ socket, and all of those are based on the same Piledriver server design from 2012.

All mobile processors, spanning notebooks to handhelds, have integrated graphics as well. It’s becoming strikingly clear that integrated graphics is here to stay.

Evolution Of Integrated Graphics

Today integrated graphics processors take up a considerable percentage of the chip’s real estate.
The GPU in Kaveri, AMD’s latest APU, occupies 47% of the die, almost half the chip.
GPU real estate has been growing consistently for several years, with AMD roughly doubling the GPU portion from Llano, its first-generation high-performance APU, to the latest Kaveri parts.
We see a very similar trend with Intel, which has increased the GPU portion of the die with each generation.
Kaveri

Going from Sandy Bridge, Intel’s first processor with graphics integrated into a single monolithic die, to mainstream GT2 Haswell parts, the GPU real estate nearly doubled. And with Intel’s “Iris” GT3 graphics parts, it roughly quadrupled compared to Sandy Bridge.

Haswell

Integrated graphics processors are no longer responsible for graphics alone; they can accelerate a plethora of workloads through OpenCL. Intel’s QuickSync, which offloads video transcoding to the GPU block, is a great example of the iGPU being used for something other than 3D rendering. The same goes for AMD: its iGPUs can accelerate photo and video editing, stock analytics and media conversion.
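
A hypothetical, minimal sketch of that kind of non-graphics offload is shown below: OpenCL host code that picks a GPU device (on an APU, or on a CPU with integrated graphics, this is the iGPU), copies an array of samples to it and runs a trivial kernel over every element. The kernel, buffer names and scaling workload are illustrative choices, not taken from any AMD or Intel sample code.

```c
/* Hypothetical sketch: offloading a simple non-graphics task (scaling an
 * array of samples) to the integrated GPU through OpenCL.
 * Assumed build line: gcc scale_igpu.c -lOpenCL -o scale_igpu */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *kSource =
    "__kernel void scale(__global float *data, float factor) {"
    "    size_t i = get_global_id(0);"
    "    data[i] = data[i] * factor;"
    "}";

int main(void) {
    enum { N = 1 << 20 };
    static float data[N];
    for (int i = 0; i < N; ++i) data[i] = (float)i;

    /* Pick the first platform and ask it for a GPU device; on an APU or an
     * iGPU-equipped CPU this is the integrated graphics processor. */
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no GPU device found\n");
        return 1;
    }

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);

    /* Copy the data over, run one work-item per element, read the result back. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);
    float factor = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(buf), &buf);
    clSetKernelArg(kernel, 1, sizeof(factor), &factor);

    size_t global = N;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("data[10] = %.1f (expected %.1f)\n", data[10], 10.0f * factor);

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```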

With both AMD and Intel dedicating such a significant share of each chip’s transistors to graphics, and with both parties pouring ever more research and development resources into graphics solutions, it is evident that the two companies see great value and business opportunity in the pursuit of graphics.


So why the relatively sudden, heavily funded interest in graphics from both companies?
For the past several decades engineers relied primarily on Moore’s Law to get better performance out of their designs. Each generation handed engineers more transistors to play with, transistors that were faster and consumed less power. In recent years, however, the progress promised by Moore’s Law has slowed considerably.

Power consumption continued to decline with each new process node, but frequency stopped scaling the way it used to, and it became exponentially more difficult to squeeze more performance out of a single serial processing unit (the CPU core). So engineers resorted to adding more CPU cores, but the more cores they added, the harder it became to write code that distributes the workload evenly across all of them; the more cores you pile up, the more severely the problem is compounded.
You end up with a multi-core CPU in which only a fraction of the resources see any use while running the majority of code. As a designer, that means wasted transistors, inflating the cost of manufacturing the processor without any tangible gain.
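
To put rough numbers on that wall, here is a small illustrative C snippet (my own example, not from the article) based on Amdahl’s law: if only a fraction p of a program’s work can be spread across cores, piling on cores quickly runs into a hard ceiling of 1/(1−p).

```c
/* Illustrative sketch with assumed numbers: Amdahl's law says that if only a
 * fraction p of a program's work parallelizes, n cores give a speedup of
 * 1 / ((1 - p) + p / n). */
#include <stdio.h>

int main(void) {
    const double p = 0.80;                      /* assume 80% of the work parallelizes */
    const int cores[] = { 1, 2, 4, 8, 16, 64 };
    const int count = sizeof(cores) / sizeof(cores[0]);

    for (int i = 0; i < count; ++i) {
        double speedup = 1.0 / ((1.0 - p) + p / cores[i]);
        printf("%2d cores -> %.2fx speedup\n", cores[i], speedup);
    }
    /* The speedup crawls toward the 1 / (1 - p) = 5x ceiling no matter how
     * many cores are added; the extra silicon mostly sits idle. */
    return 0;
}
```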

Engineers hit a brick wall trying to improve the performance of CPU designs without blowing their power budgets or resorting to extremely complex and difficult design methods. If we were going to get faster processors, the processor design game had to change, and we had to turn to a technology that did not rely primarily on frequency or complexity to scale.

The Solution To The Problem

The answer was GPUs: parallel processors that can keep scaling easily with Moore’s Law.
We can continue to spend transistors on adding more parallel processing elements to each design instead of trying to push the frequency or complexity of a handful of CPU cores. This ensures that we can continue to scale the performance of our designs for the foreseeable future.

HSA

Parallel processors have existed for years, but they were limited to a few applications; they have long been used in High Performance Computing (HPC) and graphics processing, among others. Graphics was an obvious target for parallel processors: if you need to compute colors for millions of pixels tens of times every second, a CPU is simply not going to cut it, while thousands of smaller, slower and more efficient processing elements are perfect for the job.
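
As a hypothetical illustration of why per-pixel work maps so naturally onto thousands of small processors, the OpenCL C kernel below converts an RGBA image to greyscale with one work-item per pixel; every pixel is computed independently of every other, so the hardware is free to run as many of them in parallel as it has lanes. The kernel name and luma weights are my own choices rather than anything from the article.

```c
/* Hypothetical OpenCL C kernel: one work-item per pixel. */
__kernel void to_grey(__global const uchar4 *in,
                      __global uchar4 *out,
                      int width, int height)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    int i = y * width + x;
    uchar4 px = in[i];
    /* Standard luma weights for the R, G and B channels. */
    uchar g = (uchar)(0.299f * px.x + 0.587f * px.y + 0.114f * px.z);
    out[i] = (uchar4)(g, g, g, px.w);
}
```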

But there was a catch: not all code maps well to GPUs, and the grand majority of programmers in the world were either used to programming for a single fast serial processor or had learned programming from someone who was. After all, the entire industry relied on CPUs for several decades. If we were going to turn to parallel computing in the mainstream to continue scaling performance, we had to figure out how to make writing code for these types of processors less challenging and more accessible for programmers.

AMD made it very clear from the beginning that its goal was to build the ultimate heterogeneous processor, and all the evidence from Intel’s past actions and future roadmap suggests that Intel is pursuing the same thing.
AMD, being the smaller and more agile company, was faster to respond to the changes in the industry.
AMD was able to more quickly put its vision into practice and mold it into a strategy that began to bear fruit.
The result was the development of HSA (Heterogeneous System Architecture), the product was Kaveri, and the goal was to chase the untapped potential of heterogeneous designs and begin a new era of computing in which performance would once again scale at the rate of golden-age Silicon Valley.
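
To give a feel for the programming model HSA is chasing (and which, on the API side, OpenCL 2.0 exposes through shared virtual memory), here is a hypothetical sketch in which the CPU and the integrated GPU work on the very same pointer, with no explicit buffer copies, in contrast to the buffer-based sketch earlier in the piece. Coarse-grained SVM is used because it is the baseline every OpenCL 2.0 device must support; the kernel and all the names are illustrative only.

```c
/* Hypothetical sketch of zero-copy sharing between CPU and iGPU using
 * OpenCL 2.0 shared virtual memory (SVM).
 * Assumed build line: gcc svm_demo.c -lOpenCL -o svm_demo */
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <stdio.h>

static const char *kSource =
    "__kernel void increment(__global int *data) {"
    "    data[get_global_id(0)] += 1;"
    "}";

int main(void) {
    enum { N = 4096 };

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no GPU device found\n");
        return 1;
    }
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueueWithProperties(ctx, device, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "increment", NULL);

    /* One allocation visible to both the CPU and the GPU. */
    int *data = clSVMAlloc(ctx, CL_MEM_READ_WRITE, N * sizeof(int), 0);
    if (!data) {
        fprintf(stderr, "SVM allocation failed (device may not support OpenCL 2.0)\n");
        return 1;
    }

    /* The CPU writes the data through the shared pointer... */
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, data, N * sizeof(int), 0, NULL, NULL);
    for (int i = 0; i < N; ++i) data[i] = i;
    clEnqueueSVMUnmap(queue, data, 0, NULL, NULL);

    /* ...the GPU updates it in place, no clEnqueueWriteBuffer/ReadBuffer... */
    clSetKernelArgSVMPointer(kernel, 0, data);
    size_t global = N;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* ...and the CPU reads the result back through the same pointer. */
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_READ, data, N * sizeof(int), 0, NULL, NULL);
    printf("data[100] = %d (expected 101)\n", data[100]);
    clEnqueueSVMUnmap(queue, data, 0, NULL, NULL);

    clSVMFree(ctx, data);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```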


This post has 53 Responses
  • R6ex

    Isn’t Denver about increasing parallelism as well (like HSA)? No?

    • https://twitter.com/TecFanatic Khalid Moammer

      It is, will be talking about that in Part II :)

    • http://FOuRtune.org/b renzo

Hope it comes soon, I would like to hear your take on the differences, both the advantages and disadvantages. Btw nice introduction, hope you keep the rest as detailed or more.

    • https://twitter.com/TecFanatic Khalid Moammer

Glad you’ve enjoyed it! Will definitely try to add more color & discuss the intriguing details.

    • Michael F

Yep, but NVIDIA is at about the same stage with HSA on Denver as Intel is with Broadwell; or more bluntly, AMD has a large technology lead, especially with them putting ARM, x86, and GCN cores all on the same chip now. However, AMD lacks the performance and efficiency of each of those components compared to NVIDIA and Intel. As for the future of the competition in the industry, I’m eager to see whether AMD can get the performance of its components up to that of its competition, or if Intel and NVIDIA can get to AMD’s level of HSA proficiency first. And while we’re at it, let’s not count out Qualcomm, they still make the best ARM CPUs and some darn good mobile GPUs.

    • Zlob Ster

      ‘AMD lacks the performance and efficiency’? Hello!?!? Stop swallowing everything the marketing propaganda machine is spoon-feeding you. AMD has Performance and Efficiency.
      AMD only lacks in phone SoC dept. for now.

    • Michael F

      I was trying to condense my words somewhat. Their GPUs perform roughly the same as NVIDIA’s. However, just look at their TDPs on spec sheets, NVIDIA is pretty far ahead there, even if it is for cutting 64-bit floating point capabilities. Then if you compare their CPUs to Intel, well I don’t think that I need to tell you how much faster and more efficient Intel is. Also, I know that TDP isn’t a great measure of total power usage, but it is good enough when the numbers are as far apart as they are. I don’t have any personal bias for any of the companies; I own hardware from all 4 of the ones I mentioned.

    • Zlob Ster

      Let’s drill down a bit:

‘Their GPUs perform roughly the same as NVIDIA’s.’ – Give or take, but always a better bang for the buck from AMD. If only AMD spent more time optimizing drivers for every single game, those benches you are looking at would be entirely dominated by Team Red;

      ‘ However, just look at their TDPs on spec sheets, NVIDIA is pretty far
      ahead there, even if it is for cutting 64-bit floating point
capabilities.’ – Again? I’m really glad you are reading the brochures and stuff, but the real world says ‘Hi!’;

‘Then if you compare their CPUs to Intel, well I don’t think that I need to tell you how much faster and more efficient Intel is.’ – The 7850K, the top of AMD’s consumer stack, has a 35W CPU part. The rest of the TDP is for the iGPU. You tell me how that’s inefficient. As for the performance perspective, I’ll leave this here to sink in:

      http://cdn2.wccftech.com/wp-content/uploads/2014/02/Kaveri-HSA-Benchmarks-6_5_lo.png

      http://cdn2.wccftech.com/wp-content/uploads/2014/02/Kaveri-HSA-Benchmarks-6_4_work.png

      http://cdn3.wccftech.com/wp-content/uploads/2014/02/Kaveri-HSA-Benchmarks-6_3_creative.png

      Yes, those are cherry-picked. They are cherry-picked on purpose. The purpose is to show it’s not about the HW, it’s about how the lazy coders code. The purpose is to show the true potential of HSA (from a $150 chip). And we know who pays to keep the current coding dogmas, don’t we all?

    • RandomCruiser

Come on, don’t try to defend the Bulldozer arch, it is a losing strategy :). It is well proven that AMD CPU perf/watt is shittish, and this is the reason AMD is virtually out of the server market.
Your 35/45W parts are mobile-labelled desktop parts; Intel too has similar SKUs for desktop, all manufactured with an MP process.

    • Zlob Ster

Yeah, their server products suck so bad that I heard rumors AMD’s server dept. is being disbanded… AMD will make only console parts from now on.

    • Michael F

The module design could have been ok, they just screwed up big time with their cache and scheduler (and a few other minor things). However, no matter how well it did, it would not have been as good as an SMP+SMT (simultaneous multi-processing = multi-core; simultaneous multi-threading = superscalar, i.e. HyperThreading) implementation like Intel is doing for most users. For server tasks though, it appears that it could be a little better than SMP+SMT, but the gap is much smaller than it is for general use, so the general use design will be used for both applications to consolidate the R&D budget.

    • Michael F

      I like to see you selectively pulling from my comments. If you read carefully you will also see: “Also, I know that TDP isn’t a great measure of total power usage, but it is good enough when the numbers are as far apart as they are.” Yes, AMD regularly over-estimates their TDP (for GPUs, they’re fairly accurate for CPUs) and NVIDIA under-estimates, but there is still about a 20-25% difference in actual power usage. And your graphs basically prove the point I was making, AMD’s individual components (CPU, GPU, cache, memory controller, scheduler, etc) aren’t quite as good, but their progress with HSA is well ahead of the competition.

    • http://FOuRtune.org/b renzo

C’mon guys, we all love AMD but they have a lot to catch up on. They were stubborn with CMT and now they pay the price of every Intel CPU mopping the floor with everything AMD has. Soon we will get the fully perfected CMT in the form of Excavator, and after that the first AMD SMT arch and the long awaited die shrink or even FinFET, and they will be competitive in this mostly single-threaded world, and maybe they win on both ST and MT tasks and workloads. Don’t expose yourself like that; I would bleed for AMD and I bleed RED, but keep your dignity in a defeat, it’s the only way not to get defeated again! Ronald and Zlob you know I love you guys :)

    • Ronald

      AMD lacks efficiency and performance lol..wake up from your dream

    • Michael F

      I was trying to condense my words somewhat. Their GPUs perform roughly the same as NVIDIA’s. However, just look at their TDPs on spec sheets, NVIDIA is pretty far ahead there, even if it is for cutting 64-bit floating point capabilities. Then if you compare their CPUs to Intel, well I don’t think that I need to tell you how much faster and more efficient Intel is. Also, I know that TDP isn’t a great measure of total power usage, but it is good enough when the numbers are as far apart as they are. I don’t have any personal bias for any of the companies, I own hardware from all 4 of the ones I mentioned.

    • Zlob Ster

      Nowhere close TBH but at least keep the shareholders and fanboys alike happy.

    • Ronald

      The difference is nobody gives a $hit about Denver

    • Zlob Ster

      And nobody will ever do. Just like with Tegra. Lots of smoke and marketing, 0 real-life value.

    • Michael F

      I am currently using a Tegra K1 dev board for some robotics work, and I can tell you that it does matter, a lot. The work I’m doing with it would not have been possible this time last year for less than $10k, now it is just $200. So Tegra does matter, many more people will be able to do a lot more now. Also, Denver is just a core architecture, the chip will still be Tegra.

  • Gaja

So, we’re moving away from the multi-core era while most software still isn’t properly coded to use multi-core CPUs, including DirectX 11..

    • https://twitter.com/TecFanatic Khalid Moammer

      If you look at professional workloads that actually need a lot of horsepower like video rendering you’ll find that these workloads are very well optimized for multiple cores.

      Games are more complex and there is huge variation from game to game and from engine to engine.
      Some game engines have exceptional multi-core support and would be able to keep six or eight cores very busy, other engines are still very single thread/core reliant.

      On the API side of things we have Mantle which has thankfully solved the issue by introducing multi-core rendering. DX12 and future OpenGL releases will also be significantly less single-thread reliant.

    • Gaja

Video rendering is optimized because it’s an easily parallelizable job.. Of course a lot of algorithms simply can’t be parallelized, but the problem is that there’s still too much software that could be parallel but isn’t..

    • https://twitter.com/TecFanatic Khalid Moammer

Completely agree!

    • http://FOuRtune.org/b renzo

Exactly! Like World of Tanks and its Big World engine, which uses only one core and very ineffectively. And then there is CryEngine, which is very effective and uses up to 6 cores! Mantle pushed MS to stop sending water to Intel’s mill and release DX12, which is highly similar to Mantle.

    • Michael F

      Like Khalid said, professional applications are very well optimized for our current parallelism models (Adobe software may be the exception). But it may also be to our advantage to move on now, while most developers haven’t settled into the multi-core level parallelism; if they were fully taking advantage of multi-core capabilities, they may settle there for a while, but if the whole HSA thing takes off before they settle again, then they may be more likely to jump straight to adopting programming for HSA.

    • Gaja

      Yeah but HSA programming includes multi-core programming.. So they have to learn multi-core programming anyway..

    • Michael F

      Yes, and no. Most multi-core oriented programming is intended to use 2, or 4 cores; most GPU oriented programming and the very best of multi-core oriented programming is intended to use N cores, it requires a somewhat different mindset. Through proper use of APIs like OpenCL 2.0 (which is only experimental right now, but as a hobbyist I’ve played with it a little), using good AMD kernels, you will effectively jump over multi-core programming straight to HSA because multiple CPU cores and GPUs are treated as almost the same at write time, a good compiler and hardware task scheduler differentiate automatically.

    • mdriftmeyer

      The OpenCL 2.0 API Specification is stable [ratified Mar 18, 2014]. The implementations are experimental.

    • Zlob Ster

      I personally find it a pain in the @$$ to code for HSA right now. Maybe it’s just me being n00b, or HSA is just not that coder-friendly?

    • Michael F

      Most people have a lot of trouble with it, myself included. That and the vast majority of projects are well suited for it. I took a handful of coding classes in college (I was not a computer science major) and I don’t remember ever doing or learning anything that was adept to multi-threading, and this was only 2 years ago.

    • Zlob Ster

      Exit nVidia, enter Mantle.

    • Robert Severson

      Squat and gobble moar….. AMD hasn’t finished with you yet.

  • FrankVVV

    So does this mean the iGPU will finally be used even with a separate GPU card in your system? Right now my Haswell CPU has 31% of silicon doing nothing. Seems pretty useless to me.

    • jdwii

Yeah, there are people who bought APUs and 47% of the chip is sitting there where more cores would be useful.

    • Zlob Ster

      A fine example! Just like with your brain. You have so much of it but it’s still doing nothing!

Now, imagine when you actually utilize your entire brain. Fascinating, isn’t it? The only difference here is that AMD has already utilized the APU’s potential almost entirely. Too bad the same can’t be said for your brain…

    • http://FOuRtune.org/b renzo

Hahaha Zlob uses Burn. It’s super effective! You made me lol hard, sry jdwii, but he got you good. :)

    • albert89

Unfortunately the code in Win8 is only utilising 50% of AMD’s APU. As for HSA, it’s closer to zero except for a few third-party programs. Windows is predominantly written for Intel! But I don’t expect things to change much for these morons any time soon.

    • Zlob Ster

Spot on, dear boy! Were it a real APU, then we’d be talking.

  • jdwii

This is probably the worst article I’ve ever read when it comes to comparing the two designs

    • http://FOuRtune.org/b renzo

It’s part one bro, it’s the introduction; some people don’t know this and those who do will surely like to read it again to solidify their knowledge.

    • jdwii

      “to solidify their knowledge” That was funny thanks ha ha

  • OMFG

There’s a lot of R&D in heterogeneity, and it started with IBM’s and Sony’s Cell processor with the PPE and all the SPUs. An interesting programming model, which started evolving from an attempt to harness the Cell’s potential, is OmpSs, which is like OpenMP but with different semantics, expressing parallelism with tasks and their data access dependencies. The new task constructs in OpenMP 4.0 were first introduced in OmpSs. Heterogeneity is managed as if you had normal code, but the tasks, instead of being written in C++, Python or Java, are written in the accelerator language, for example OpenCL or CUDA. You could add all the new languages you wanted in order to support more accelerators, as it’s open source and free (GPL) software. The only con is that it currently only works on GNU/Linux.
And instead of managing transfers using software algorithms you could automatically handle them with hardware DMA mechanisms. That should come with newer hardware capabilities like truly unified memory.

    • OMFG

Well, the transfers are already managed automatically, by the OmpSs runtime, but they could be accelerated if the hardware does the transfers.

    • OMFG

      Ah… and you can have multiple versions of the same task: multiple non-accelerator and accelerator code versions of the “taskified” code/functions, the runtime will execute the quickest version or if there are threads waiting, it will use slower versions too to fill up resources.

  • http://WCCFTech.com/ Ali Naqi

    Great article

    • Guest

Thank you, I’m quite flattered!

    • https://twitter.com/TecFanatic Khalid Moammer

Thank you, glad you’ve enjoyed it!

  • http://FOuRtune.org/b renzo

Soon the APU will show its true power, maybe as soon as Excavator. It will reassure devs to optimize for and use the big power behind HSA; it will win them many points with AMD users and a few with Intel users. It’s a win-win. Samsung knows what they’re doing, HSA and in-memory processing are the future. Everything is going AMD’s way, DRAM stacking will make APUs invincible, Intel pushed the node and now AMD can simply slide down to the 10nm wall that Intel is going to hit hard.

    • Zlob Ster

      Even though this is true, I’m afraid that LLVM and the entire HSAIL stack is a hassle for newb coders like me. With HSA you basically have compiler front ends -> LLVM IR -> compiler back end -> HSAIL. Then at runtime you get language runtime -> HSA runtime -> possibly finalizer -> system space.

      Even though I’m a big supporter of HSA, it’s a bit of a hassle to create and maintain fast and robust code for it. For me at least. :`(

    • http://FOuRtune.org/b renzo

No shame in that! Look at Adobe Flash Player, worst shite ever made. A million updates, 0% gain. And instead of optimizing for GPGPU or HSA, they invest money in sabotaging HTML5. Scum of the earth!

  • albert89

Let’s face it, most programmers are still writing for the single-threaded CPU. And that’s where AMD loses, and regrettably where Intel wins (not to mention its anticompetitive trade practices worldwide). But if AMD had pulled its finger out and finally fixed the single-threaded CPU portion of the APU, then it would have one hell of a SoC. But that’s half the problem right there, because developers either don’t find it economical or are too stupid to write for the GPU or for HSA (which is the more practical alternative). How about other alternatives like OpenCL or OpenGL? How long did it take developers to jump from a single-core to a multi-core environment? Years and counting. Adoption of new hardware highways seems to be a slow process for developers. I have no doubt that if AMD was the same size as Intel then things would have been much different. Phenomenal performance gains by LibreOffice from HSA technology make a strong case for adoption across the market.

  • AA

Single-threaded applications should just stop, seriously, they don’t benefit anyone other than Intel and the coders. All applications should be at least dual-threaded, as CPUs nowadays have more than one core. HSA will be good, but I don’t think it will be in widespread use until 2016 at the earliest. Maybe Microsoft Office can take the initiative, and all the web browsers can be multi-threaded.

  • Nvidiots

    http://www.tomshardware.com/reviews/fusion-hsa-opencl-history,3262.html
    This is the article that convinced me that the APU was going to take AMD places.


