I still remember a newspaper clipping my father used to show me when I was seven years old, back in 1990. The clipping showed a NASA egghead standing before a massive thing, and the text read: "... at NASA, we have extreme super computers that run at massive 2GHz!" At that time, 2GHz was unbelievable and made us wonder just what we were supposed to do with that much computing power. How time flies, and how it has changed our very ways of thinking. Right now, our PCs and consoles, when idle, are helping in DNA research!
In the ever-revolutionizing world of processors, one name always comes to mind: Intel. The other key player is AMD. Currently, the latest offerings from AMD are competing with Intel’s Core 2 architecture based processors. Intel, after the massive success of Core 2, went ahead and introduced the Core i7 architecture. Currently king of the ring, the Core i7 line has the most sophisticated and powerful processors for home and business users.
Within the Core i7 series are five variants to choose from: Core i7-920, Core i7-940, Core i7-950, Core i7-965 Extreme Edition, and last but far from least, Core i7-975 Extreme Edition. Clock speeds start at 2.66GHz and go all the way up to 3.33GHz. The Core i7-975, which we are looking at today, has its stock clock set at 3.33GHz and, with Intel Turbo Boost, reaches 3.45GHz (all cores) and 3.6GHz (single core).
Below are the five different Core i7 processors currently available. Please note that the entry-level Core i7-920 alone is very capable of outperforming the mammoth Core 2 QX9770!
| Specification | Core i7 975 XE | Core i7 965 XE | Core i7 950 | Core i7 940 | Core i7 920 |
| --- | --- | --- | --- | --- | --- |
| Clock frequency | 3.33 GHz | 3.20 GHz | 3.06 GHz | 2.93 GHz | 2.67 GHz |
| QuickPath (QPI) speed | 6400 MT/s | 6400 MT/s | 4800 MT/s | 4800 MT/s | 4800 MT/s |
| Memory controller | DDR3-1333 | DDR3-1333 | DDR3-1066 | DDR3-1066 | DDR3-1066 |
| Price | $999 | $899 | $599 | $562 | $284 |
| Parallelism (all models) | 4 Physical Cores, 8 Logical Processors (Hyper Threaded) | | | | |
| Memory standard (all models) | Triple Channel DDR3 | | | | |
| L2 Cache size (all models) | 256 KB per core | | | | |
| L3 Cache size (all models) | 8 MB | | | | |
| Transistor count (all models) | 731 Million | | | | |
| TDP (all models) | 130 W | | | | |
| Fabrication process (all models) | 45 nm | | | | |
The Core i7 Extreme 975 runs at 3.33GHz, which is 133MHz faster than the 965 Extreme. Apart from the 133MHz increment, the Core i7-975 uses the D0 stepping, whereas the Core i7-965 uses C0. The Core i7-975 offers lower power consumption and provides greater overclocking headroom. The quad-core Bloomfield processor, Core i7-975, uses 1 MB of L2 cache in total (256 KB per core) and 8 MB of shared L3 cache. The TDP for this processor is 130W (TDP = thermal design power, the heat output the cooling solution is expected to handle).
Architecture of Core i7: Nehalem
The new Intel Core i7 (Bloomfield) processors have the following key features:
- Four processing cores
- Support for SMT (simultaneous multi-threading), allowing up to 8 threads to be processed simultaneously
- 32 KB instruction + 32 KB data L1 cache per core
- 256 KB L2 cache per core
- Large 8 MB L3 cache shared by all 4 cores
- An integrated memory controller (IMC) supporting three channels of DDR3 memory
- Memory clock speeds of up to 1333 MHz
- Memory bandwidth of up to 32 GB/s
- Up to six memory sockets
- The new Intel Quick Path Interconnect (QPI) replaces the front side bus (FSB)
- Addition of seven new SSE4 instructions
- Monolithic processor design (all four cores on a single die)
- Fabricated using Intel's 45nm high-k process technology
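The 32 GB/s memory bandwidth figure in the list above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not an Intel formula: it assumes each DDR3 channel is 64 bits wide and that "1333" is the transfer rate in millions of transfers per second. The function name is made up for illustration.

```python
def peak_memory_bandwidth_gbs(channels: int,
                              mega_transfers_per_sec: float,
                              bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumption: every channel moves bus_width_bits per transfer.
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Triple-channel DDR3-1333 (the Extreme Edition parts in the table)
triple_1333 = peak_memory_bandwidth_gbs(3, 1333)
# Triple-channel DDR3-1066 (the 950, 940 and 920)
triple_1066 = peak_memory_bandwidth_gbs(3, 1066)

print(f"DDR3-1333 x3: {triple_1333:.1f} GB/s")
print(f"DDR3-1066 x3: {triple_1066:.1f} GB/s")
```

Under those assumptions, three channels of DDR3-1333 land right at the quoted 32 GB/s. Real-world throughput will be lower than this theoretical peak.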
Like the Core 2, the Core i7 uses 45nm fabrication. The Core i7 has a transistor count of 731 million. Nehalem processors come by default with 64KB of L1 cache and 256KB of L2 cache per core. The 8MB L3 cache on quad-core parts is shared among the four cores.
Here are some of the key microarchitecture design features.
One of the most beneficial features for Intel is the modularity of the architecture: they can change the design of the processor by adding more cores, removing cores, or even adding an integrated GPU in the future.
In the Core 2 quad-core processors, there were higher latencies in inter-core communication. This was because there were two dies, each with two cores, linked together. The Core i7 series has all of its cores on a single die! This is called a monolithic die. It speeds up operations with multiple program threads, as the time required for moving data between cores is greatly reduced.
Another innovation that eliminates latencies found in the previous series is the on-die, triple-channel DDR3 memory controller, which supports three channels of DDR3 memory per socket with up to three DIMMs per channel. The Core i7 is capable of pushing more bandwidth with much reduced latencies.
Moving the memory controller on-die also allowed Intel to design a new serial interconnect that sits between the CPU and chipset, dubbed QPI (Quick Path Interconnect). Moreover, with the memory controller on-die, there is no more traditional front side bus. QPI is a serial point-to-point interconnect that offers up to 25.6GB/s of bandwidth per port over 40 data lanes, 20 in each direction.
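To see where the 25.6GB/s figure comes from, here is a rough calculation. It assumes that of the 20 lanes in each direction, 16 carry payload data (the rest being protocol and error-checking overhead), and that the link runs at the 6400 MT/s listed for the Extreme Edition parts. Treat the function and parameter names as illustrative only.

```python
def qpi_bandwidth_gbs(giga_transfers_per_sec: float,
                      payload_bits_per_direction: int = 16,
                      directions: int = 2) -> float:
    """Approximate QPI bandwidth in GB/s.

    Assumption: 16 of the 20 lanes per direction carry payload,
    i.e. 2 bytes per transfer in each direction.
    """
    bytes_per_transfer = payload_bits_per_direction / 8
    return giga_transfers_per_sec * bytes_per_transfer * directions


fast_link = qpi_bandwidth_gbs(6.4)  # 975 XE / 965 XE
slow_link = qpi_bandwidth_gbs(4.8)  # 950 / 940 / 920

print(f"6400 MT/s link: {fast_link:.1f} GB/s")
print(f"4800 MT/s link: {slow_link:.1f} GB/s")
```

With these assumptions the faster link works out to the quoted 25.6GB/s counting both directions, and the 4800 MT/s link on the cheaper chips to roughly three quarters of that.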
Hyper Threading is back with the new Core i7 processors. Hyper Threading was first introduced in the Pentium 4 days, when end users would see two processors because of it. Because each core is Hyper Threaded, the system's OS recognizes a Core i7 quad core with four physical cores as eight virtual cores. Unlike the old HT on the Pentium 4, the new HT is far more efficient and produces clear performance gains on individual cores.
The processor has a new memory hierarchy that puts the emphasis on a big L3 cache compared to previous generations.
Above we have a die shot of Nehalem with each of its major sections labeled. As you can see, the memory controller resides along the top edge of the die, with miscellaneous I/O and QPI links along either edge. The four execution cores are lined up through the middle, with an instruction queue in between, and the shared L3 cache below.
Core 2 CPUs had L1 and L2 caches only. Core i7 CPUs feature L1, L2, and shared L3 caches. These caches are distributed as follows:
- 64K L1 cache (32K Instruction, 32K Data) per core
- 1MB of total L2 cache (256K per core)
- Shared 8MB of L3 cache
With the Core i7, Intel is also introducing new "Power Gates". Power Gates help reduce leakage power and, more importantly, allow idle cores to enter the C6 state (deep sleep) while other cores are under load. Core i7 processors also feature integrated power sensors and an integrated Power Control Unit that lets the processor monitor each core's current, power, and voltage states in real time. Integrating these sensors and the control unit enables the CPU to divert power from idle cores to active cores. Intel calls this "Turbo Mode".
So, if Turbo Mode detects heavy usage of the cores, it can allocate more power to raise the default clock from 3.33GHz to 3.45GHz. In addition, if a single core is being hammered, Turbo Mode will put the other cores into C6 and redirect all the power saved to the core in use, effectively overclocking it to the maximum clock of 3.6GHz!
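The behaviour described above can be sketched as a tiny decision function. This is a simplified model and not Intel's actual algorithm: it assumes a stock multiplier of 25 on a roughly 133.33 MHz base clock, one extra multiplier step when several cores are busy, and two extra steps when only one core is busy. Those step counts are inferred from the 3.45GHz and 3.6GHz figures quoted here. The `has_headroom` flag is a hypothetical stand-in for the power and thermal checks the Power Control Unit performs.

```python
BASE_CLOCK_MHZ = 133.33   # assumed base clock
STOCK_MULTIPLIER = 25     # stock multiplier of the Core i7-975


def turbo_multiplier(active_cores: int, has_headroom: bool = True) -> int:
    """Pick a multiplier from the number of busy cores (simplified model)."""
    if not 1 <= active_cores <= 4:
        raise ValueError("a quad-core has between 1 and 4 active cores")
    if not has_headroom:
        return STOCK_MULTIPLIER            # no spare power: stay at stock
    if active_cores == 1:
        return STOCK_MULTIPLIER + 2        # other cores parked in C6
    return STOCK_MULTIPLIER + 1            # several cores under load


def core_clock_ghz(multiplier: int) -> float:
    """Core clock in GHz for a given multiplier."""
    return BASE_CLOCK_MHZ * multiplier / 1000


for cores in (4, 1):
    m = turbo_multiplier(cores)
    print(f"{cores} active core(s): x{m} -> {core_clock_ghz(m):.2f} GHz")
```

The real chip decides this continuously from its sensors, so take the sketch as a way to picture the two Turbo clocks rather than a description of the hardware.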
The X58 Chipset and Socket LGA 1366
The Intel Core i7 processors work only with the Tylersburg chipset, more commonly known as the Intel X58 Express chipset. The X58 and the P45 share the same ICH10 South Bridge, but the X58 differs otherwise in just about every aspect:
- The X58 Express will use the new LGA1366 socket (also known as Socket B)
- No more memory controller
- Intel QuickPath Interconnect (QPI) as the interconnect between the Core i7 processor and the X58 Express.
The new Core i7 architecture differs from anything Intel built before by leaps and bounds. This means the processor, even though it is a 45nm part, is now bigger in size. Socket 775 cannot hold it anymore, and hence a new, bigger socket is placed on X58 motherboards: Socket 1366.
Overclocking
If you know overclocking, you will know it is all about the Front Side Bus speed, or simply FSB. However, the FSB has been officially annihilated in the Core i7 architecture. This changes things when overclocking, but the concept remains the same.
Visit the BIOS and you will easily find a 133 MHz base clock setting. Imagine this setting as your FSB setting. Play around with CPU voltages and the multiplier, and even on the stock air cooler you will be amazed at what the Core i7 975 can do!
Jot down the defaults before beginning. The default multiplier is 25 and it is dynamic. 133 MHz times a multiplier of 25 is ~3.3GHz. Please note that without any changes, Turbo Mode raises the multiplier automatically: 26 gives the ~3.45GHz all-core Turbo clock and 27 gives 3.6GHz on a single core. Therefore, the processor is already overclocking itself even before you begin.
There are heavy recommendations NOT TO TOUCH voltages on a Core i7 975, so I did not bother with voltages, especially because I was playing with an Engineering Sample anyway. As with all overclocking, I simply increased the multiplier until either the system crashed or temperatures got out of hand. I jumped the multiplier to 31 to get something over 4GHz, but the temperatures went way overboard: 76°C. I’m using an air cooling solution from Thermalright. I had to fall back a step and was able to get a stable, somewhat hot 4GHz overclock. Remember, no voltages touched. The overclock was limited by rising temperatures only. Give me a bigger and better air cooling solution or a liquid cooling solution and I can only imagine the total overclocking potential of this beast!
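The trial-and-error procedure above boils down to a simple loop: raise the multiplier one step at a time and keep the last setting that still behaves. The sketch below only models that loop. `is_acceptable` is a hypothetical callback standing in for "boots, stays stable and stays cool enough", which in real life means rebooting and stress testing by hand, and the 133.33 MHz base clock is an assumed value.

```python
from typing import Callable

BASE_CLOCK_MHZ = 133.33   # assumed base clock
STOCK_MULTIPLIER = 25


def find_highest_multiplier(is_acceptable: Callable[[int], bool],
                            start: int = STOCK_MULTIPLIER,
                            ceiling: int = 40) -> int:
    """Step the multiplier up until a setting fails, return the last good one."""
    best = start
    for multiplier in range(start + 1, ceiling + 1):
        if not is_acceptable(multiplier):
            break                      # crashed or too hot: fall back a step
        best = multiplier
    return best


# Stand-in for this test bench, where x31 ran too hot on air cooling.
best = find_highest_multiplier(lambda m: m <= 30)
print(f"x{best} -> about {round(best * BASE_CLOCK_MHZ)} MHz")
```

With the stand-in check, the loop settles on x30, which matches the roughly 4GHz result on this sample. A different cooler would simply change where the callback starts returning False.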
CPU-Z
The numbers are just so beautiful! Please do not panic about the Core speed. My multiplier was way down when taking the screen shot.
As mentioned earlier, there are three levels of cache: 64K of L1 dedicated to each core, 256K of L2 for each of the four cores, and 8MB of L3 shared between them.
Straight from Intel: "The Intel® Desktop Board DX58SO is designed to unleash the power of the all new Intel® Core i7 processors with support for up to eight threads of raw CPU processing power, triple channel DDR3 memory and full support for ATI CrossfireX and NVIDIA SLI technology. Today’s PC games need a computing platform that delivers maximum multi-threaded CPU support and eye-popping graphics support."
The two memory modules of 1GB each running completely untouched.
More detailed information for both RAM modules.
Hardware and Software
- Mainboard: Intel Desktop Board DX58SO
- Processor: Intel Core i7 975 Extreme Edition (Engineering Sample)
- Graphics Cards: Asus EAH3650 SILENT/HTDI/512M
- Memory: Qimonda (2x1024MB) DDR3 @ 533 MHz
- Power Supply: Asus Z-45FP
- OS: Windows 7 64-bit
- DirectX: 9/10
- GFX Driver: ATi Catalyst v9.9 x64
Shots of build
Benchmarks
Processor Arithmetic: Benchmarks the ALU and FPU processor units. Shows how your processors handle arithmetic and floating point instructions in comparison to other typical processors. Such operations are used by software in typical tasks.
Dhrystone (MIPS) - higher results are better, i.e. better integer performance.
Whetstone (MFLOPS) - higher results are better, i.e. better floating-point performance.
Processor Multi-Media: Benchmark the (W)MMX(2), SSE(2/3/4), AVX processor units. Shows how your processors handle multi-media instructions and data in comparison to other typical processors. Such operations are used by more specialised software, e.g. image manipulation, video decoders/encoders, games.
Multi-Media Integer (Pixels/s) - higher results are better, i.e. better integer performance.
Multi-Media Single/Double Float (Pixels/s) - higher results are better, i.e. better floating-point performance.
Multi-Core Efficiency: Benchmark the multi-core efficiency of the processors. Shows how efficient the processor cores and their inter-connects are in comparison to other typical processors. The ability of the cores to process data blocks and pass them to another core for processing (producer-consumer paradigm) of different sizes and different chain sizes is measured. The efficiency of the inter-connect between cores is thus benchmarked; however, the number of cores (and processors) also counts as more data buffers can be processed simultaneously (aka "in flight"). True multi-core processors that have shared L2/L3 caches will thus perform much better than cores that have separate caches and are connected by the traditional FSB.
Inter-Core Bandwidth (GB/s) - higher results are better
Inter-Core Latency (ns) - lower values are better
Power Management Efficiency: Benchmark the power management efficiency of the processors. Shows how efficient the power management of your processors is in comparison to other typical processors. The ability of the processors to step down in frequency and voltage at different workloads is measured. The more a processor steps down in both frequency and voltage, the better the score at the specific workload. The test stops when the workload is too great for the processor even at 100% efficiency. The ALU/FPU score is a geometric mean based on the whole range of workloads; thus the power of the processors does matter in obtaining a higher score. The Power Efficiency score is a geometric mean based on the supported workloads only. Thus the power of the processors does not matter.
ALU Power Performance (MIPS) - higher is better
Power Efficiency - higher is better
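Since both scores above are geometric means, here is a small illustration of how a geometric mean differs from a plain average. The per-workload scores are made-up numbers for demonstration, not results from this test bench, and the helper is a generic implementation rather than the benchmark's own code.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers: the n-th root of their product."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    # Summing logarithms avoids overflow when multiplying many large scores.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-workload scores, for illustration only.
scores = [1.0, 4.0, 16.0]

print(f"arithmetic mean: {sum(scores) / len(scores):.2f}")
print(f"geometric mean:  {geometric_mean(scores):.2f}")
```

The geometric mean is pulled up less by one very high value than a plain average is, which is why it is a common choice for combining benchmark results that span a wide range.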
Cryptography: Measures the cryptography efficiency of the processor units: encryption, decryption and hashing. Shows how your processors handle cryptographic operations in comparison to other typical processors. Such operations are used by software in typical tasks.
Both bandwidths - higher values are better
Physical Disks: Benchmark hard disks (i.e. the disk itself, not the file system). Shows how your physical disks connected to the storage adapters or hosts compare to other disks in a typical computer. As the test measures raw performance, it is independent of the file system the disk uses and any volumes mounted off the disk.
- Read Test: Sequential across disk
- Write Test: Sequential across disk
- Seek Test: random, full stroke
Drive Index: is a composite figure representing an overall performance rating based on the highest read or write speed across the whole disk. Thus the higher the better.
Access Time: is the average time to read a random sector on the disk, analogous to latency response time. Thus the lower the better.
Memory Bandwidth: Benchmark the memory bandwidth of your computer. Shows how your memory sub-systems compare to other computers in terms of bandwidth. The benchmark is based on the well-known STREAM memory benchmark.
Memory Latency: Benchmark the latency (response time) of processors' caches and memory. Shows how your processors' caches and memory sub-systems compare to other computers in terms of latency. The latency of caches is measured in processor clocks (i.e. how many clocks it takes for the data to be ready) as it is dependent on the processor clock speed. The latency of memory is measured in nanoseconds as it is typically independent of processor clock speed.
Integer Memory Bandwidth (MB/s) - higher results are better, i.e. faster memory bandwidth.
Float Memory Bandwidth (MB/s) - higher results are better, i.e. faster memory bandwidth.
Memory Latency and Speed Factor - lower is better
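To make the clocks-versus-nanoseconds distinction concrete, the sketch below converts a cache latency given in processor clocks into nanoseconds at two core speeds. The 40-clock figure is a hypothetical example value, not a measurement from this review, and the function names are illustrative.

```python
def clocks_to_ns(clocks: float, core_clock_ghz: float) -> float:
    """Convert a latency in processor clocks to nanoseconds."""
    if core_clock_ghz <= 0:
        raise ValueError("clock speed must be positive")
    return clocks / core_clock_ghz     # 1 GHz = 1 clock per nanosecond


def ns_to_clocks(nanoseconds: float, core_clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to processor clocks."""
    return nanoseconds * core_clock_ghz


HYPOTHETICAL_CACHE_LATENCY_CLOCKS = 40  # example value only

stock_ns = clocks_to_ns(HYPOTHETICAL_CACHE_LATENCY_CLOCKS, 3.33)
overclocked_ns = clocks_to_ns(HYPOTHETICAL_CACHE_LATENCY_CLOCKS, 4.0)

print(f"at 3.33 GHz: {stock_ns:.2f} ns")
print(f"at 4.00 GHz: {overclocked_ns:.2f} ns")
```

The same number of clocks takes less wall-clock time on a faster core, which is why cache latency is reported in clocks while main memory latency is reported in nanoseconds.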
Cache and Memory: Benchmark the processors' caches and memory access (transfer speed). Shows how your processors' caches and memory sub-systems compare to other computers in terms of access.
Cache/Memory Bandwidth (MB/s) - higher results are better, i.e. faster memory bandwidth.
Speed Factor (MB/s) - lower results are better, i.e. less difference between processor cache speed and memory speed.
Conclusions
Enthusiasts only, please! I mean come on, how many of you are going to pay ~85,000 PKR for just a processor? Not to mention a motherboard, where the smallest price tag is 18,000 PKR. This is expensive business right here. However, if you are an enthusiast and budgeting is the least of your concerns, this is the processor for you. It is not just another processor… it is a revolution. Intel went far ahead with the number of innovations in the Core i7 architecture. This truly is the fastest processor available on planet Earth.
Although the 965 with its C0 stepping can reach 4.0GHz with ease as well, it is going to be phased out because the 975 is available for the same price. They could have included more multipliers, but the D0 stepping still makes overclocking child’s play. If you are a multimedia editor who spends most of your life encoding and decoding, this is the processor for you. If you are a serious gamer… no questions asked, just buy one! If you are an average PC user, even if you have the buying power, stay away from the 975! Simple. You have to be an extremist to take advantage of the sheer power of the 975. Heck, as I mentioned earlier, the entry-level Core i7 processor, the 920, is capable of beating the past monster, the Core 2 QX9770! This is a serious statement. If you render very dense and complex scenes in Poser, Maya, or 3DS Max, this is your processor of choice, as seeing eight buckets working it out will be uber-cool.
However, there are a few things you might want to consider before buying one! The modular architecture allows Intel to literally 'add' cores to the monolithic die. There is a Xeon Nehalem, currently available, with six cores, all of them hyper threaded. Therefore, the OS sees 12 cores. The L3 cache is going to be upgraded, and that is for sure. Next year, the Core i5 series will be launched, and unfortunately those processors will have a different size and will therefore need a different motherboard. If you want to go beyond Core 2, I would suggest going for Core i7 and skipping Core i5.
I would give this processor a WCCF eXtreme Award… but it is already extreme.
Thank you to Intel for providing the killer sample.
