Brief Guide to Intel’s HyperThreading Technology

The first version of this guide was deleted, and more has come out since I wrote it on the old forums, so this one is an updated version with the latest information available.

First, you might be asking: what is HyperThreading? HyperThreading is Intel’s name for something called SMT, or Simultaneous MultiThreading. Basically, you take one processor core, add some extra registers and miscellaneous other bits and pieces, and you get two processors that share the same core and resources. When you open Task Manager in Windows NT/2000/XP on a system with HyperThreading, you see twice as many CPU utilization graphs as you have physical processors in the system. These extra processors are known as logical CPUs.
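If you would rather check the logical CPU count from a script than from Task Manager, here is a rough Python sketch. `os.cpu_count()` is a real standard library call that reports the logical CPUs the OS exposes. The `physical_cpu_estimate` helper is a hypothetical name of my own, and it simply assumes every physical processor shows up as two logical ones, which will be wrong on a machine without HT or with HT disabled.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical CPUs the OS reports (at least 1)."""
    count = os.cpu_count()  # can return None if the count is undetermined
    return count if count is not None else 1


def physical_cpu_estimate(logical: int, logical_per_physical: int = 2) -> int:
    """Hypothetical helper: guess the physical processor count.

    Assumption: every physical processor exposes `logical_per_physical`
    logical CPUs (two with HT enabled). This is only a guess.
    """
    return max(1, logical // logical_per_physical)


if __name__ == "__main__":
    logical = logical_cpu_count()
    print("Logical CPUs reported by the OS:", logical)
    print("Physical processors (assuming HT is on):", physical_cpu_estimate(logical))
```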

This is all well and good, but why does it matter to you, you may ask. Obviously Intel did it for a reason, and their reasoning is deceptively simple: only about 30% of the Pentium 4’s processor resources are being used when the processor is at full load, so it makes sense to come up with some way to put the remaining resources to work. Now, you might say that with only 30% of the CPU being used at once, making it look like two processors would make 60% usable and double your performance, but this is simply not true. At best a 20-30% improvement in desktop and workstation applications can be achieved, and at worst a processor performs 20-30% worse with HT enabled (HT can be disabled through the BIOS). There is a limit to how much you can push through a particular unit on the Netburst core before you run out of resources and start getting contention, which reduces performance. Numbers showing how HT can decrease performance are available at Anandtech and many other sites. Optimal performance is only achieved if one logical processor gets your floating point work and the other gets the related integer work, so that both threads can be crunched simultaneously with little interaction with each other. An example of this would be an FPS game where the 3D and physics are highly FPU (floating point unit, which handles decimals) intensive but the AI, positioning data and other miscellaneous bits and parts are ALU (arithmetic logic unit, which handles integers) intensive.
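To put those percentages in perspective, here is a bit of back-of-the-envelope arithmetic in Python. It is only a sketch: `runtime_with_ht` is a made-up helper, and I am assuming the percentages mean a change in throughput (work done per second), which is one reasonable reading of the benchmark numbers but not the only one.

```python
def runtime_with_ht(baseline_seconds: float, throughput_change: float) -> float:
    """Hypothetical helper: run time of a job after a throughput change.

    `throughput_change` is a fraction: 0.25 means 25% more work per second,
    -0.25 means 25% less. Assumes run time scales inversely with throughput.
    """
    if throughput_change <= -1.0:
        raise ValueError("throughput cannot drop by 100% or more")
    return baseline_seconds / (1.0 + throughput_change)


if __name__ == "__main__":
    baseline = 100.0  # seconds for some job with HT disabled (made-up figure)
    for label, change in [("naive doubling", 1.0),
                          ("best case, +25%", 0.25),
                          ("worst case, -25%", -0.25)]:
        print(f"{label}: {runtime_with_ht(baseline, change):.1f} s")
```

With a 100 second job, the naive "double your performance" guess would give 50 seconds, a 25% gain gives 80 seconds, and a 25% loss gives roughly 133 seconds. Worth having, but nowhere near a second processor.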

Another area that sees a large boost from HT is the server arena, where increases can be on the order of 30-40% per processor. In a server, a processor will often be waiting for data from main memory (which takes several hundred clock cycles) or from a disk (which can take millions of clock cycles). During this time the second logical processor can take over and use the core while the first logical processor is still waiting for data, and by the time the second is finished processing, the first will likely have its data and the two can trade places. This is very, very good for improving performance, and as more and more server applications become optimized for HT, performance should climb toward the 30-40% we expect to see, within a year or so.
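The waiting-for-data idea can be sketched with a toy model in Python. Everything here is an assumption for illustration: `run_bursts` is a hypothetical function, each thread alternates a fixed compute burst with a fixed wait, and the core runs only one thread's compute at a time. Real HT hardware can issue from both threads in the same cycle and suffers from contention, so treat this as an idealized picture of hiding stalls and not as a prediction of the 30-40% figure.

```python
def run_bursts(threads: int, compute: int, wait: int, bursts: int) -> int:
    """Toy model: cycles until every thread has finished all its bursts.

    Each burst is `compute` cycles on the shared core followed by `wait`
    cycles stalled on memory or disk. Only one thread computes at a time.
    """
    ready_at = [0] * threads         # cycle at which each thread can compute again
    remaining = [bursts] * threads   # bursts each thread still has to run
    core_free_at = 0                 # cycle at which the core is next idle
    while any(remaining):
        # Pick the unfinished thread whose data arrives (or arrived) earliest.
        t = min((i for i in range(threads) if remaining[i]),
                key=lambda i: ready_at[i])
        start = max(core_free_at, ready_at[t])
        core_free_at = start + compute
        ready_at[t] = core_free_at + wait
        remaining[t] -= 1
    return max(ready_at)


if __name__ == "__main__":
    # Made-up numbers: 100 cycles of work, then a 300 cycle stall, ten times over.
    back_to_back = 2 * run_bursts(1, 100, 300, 10)  # two jobs, one after the other
    overlapped = run_bursts(2, 100, 300, 10)        # two jobs sharing the core
    print("Two jobs back to back:", back_to_back, "cycles")
    print("Two jobs overlapped:  ", overlapped, "cycles")
```

Set `wait` to 0 and the two totals come out the same: with no stalls to hide, sharing the core buys nothing in this model.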

Now, there is one little issue with HT. When the Pentium 4 3.0 GHz comes out (which will be HT enabled … Xeons, the version of the Pentium 4 that will allow you to place multiples in a system, already are), don’t plan on loading Windows 98 or ME on it. These can only address the first logical processor. No, you’ll need Windows 2000 Professional (which can use up to two logical processors) or Windows XP, which will let you use one physical processor in Home Edition, with as many logicals as you like, or two physical processors in Pro.
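Those OS limits can be written down as a small lookup in Python. This is just a sketch that encodes the limits as stated in this post. `usable_logical_cpus` and the OS name strings are hypothetical names of my own, and the default of two logical CPUs per physical processor is an assumption.

```python
def usable_logical_cpus(os_name: str, physical: int,
                        logical_per_physical: int = 2) -> int:
    """Hypothetical helper: logical CPUs an OS can use, per the limits above.

    Assumes `physical` >= 1 and that every physical processor exposes
    `logical_per_physical` logical CPUs (two with HT enabled).
    """
    total = physical * logical_per_physical
    if os_name in ("Windows 98", "Windows ME"):
        return 1                                   # first logical processor only
    if os_name == "Windows 2000 Professional":
        return min(total, 2)                       # at most two logical processors
    if os_name == "Windows XP Home":
        return min(physical, 1) * logical_per_physical  # one physical processor
    if os_name == "Windows XP Professional":
        return min(physical, 2) * logical_per_physical  # two physical processors
    raise ValueError(f"no limit recorded for {os_name!r}")


if __name__ == "__main__":
    for name in ("Windows 98", "Windows 2000 Professional",
                 "Windows XP Home", "Windows XP Professional"):
        # Example machine: two physical HT processors (e.g. a dual Xeon box).
        print(name, "->", usable_logical_cpus(name, physical=2))
```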

HyperThreading in Pentium 4s should be out before the end of the year, and you can bet money on John Carmack making sure that Doom 3 runs well on HT Pentium 4s. So just wait, it’s a good thing. If you have any more questions about it, feel free to post.