Intel HyperThreading Bug in Kaby Lake and Skylake Processors Was Addressed By BIOS Fix In April 2017

Jul 1

A few days ago, there was a lot of noise regarding a critical bug in Intel’s microcode that could cause HyperThreading to produce data corruption and unstable system behavior. The bug was spotted by Debian developers and would have been quite the blow to Intel had it been something you could easily run into. The reason we have been late in writing this story is that we were waiting for the official response, which we now have.

Motherboard vendors are responsible for pushing out the BIOS fix for the Intel Skylake and Kaby Lake HT bug

The story started when the advisory below was posted by Henrique de Moraes Holschuh, a Debian Linux developer. Needless to say, it caused a fair amount of panic and speculation as to how dangerous the bug might be to an average user. In times like this, however, common sense can usually show the way. There are a very large number of Skylake processors out there right now, and if this were something that truly impacted the everyday user, it would have been noticed publicly much earlier. Since this didn’t add up, I did not publish a news story on the bug at the time but reached out to Intel for comment.

This warning advisory is relevant for users of systems with the Intel processors code-named “Skylake” and “Kaby Lake”. These are: the 6th and 7th generation Intel Core processors (desktop, embedded, mobile and HEDT), their related server processors (such as Xeon v5 and Xeon v6), as well as select Intel Pentium processor models.

This advisory is about a processor/microcode defect recently identified on Intel Skylake and Intel Kaby Lake processors with hyper-threading enabled. This defect can, when triggered, cause unpredictable system behavior: it could cause spurious errors, such as application and system misbehavior, data corruption, and data loss.

People also quickly managed to find the bug in Intel’s errata documentation (its list of admissions of guilt), where the following was stated:

Under complex micro-architectural conditions, short loops of less than 64 instructions that use AH, BH, CH or DH registers as well as their corresponding wider register (e.g. RAX, EAX or AX for AH) may cause unpredictable system behaviour. This can only happen when both logical processors on the same physical processor are active (when HyperThreading is active).

According to the Debian developers, this HyperThreading bug was much more common than the errata led us to believe and could easily impact everyday workloads. They encouraged people to turn off HT immediately (assuming you had an affected CPU) to avoid data loss and corruption. The real story, however, appears to be closer to Intel’s version of events. The following is what Intel had to say about the affair (via Tom's Hardware):

The issue has been addressed with a fix that started rolling out in April 2017. As always, we recommend checking to make sure your BIOS is up to date, but the chance of encountering this issue is low, as it requires a complex number of concurrent micro-architectural conditions to reproduce.

So it would appear that the official errata was not exaggerating when it mentioned “complex micro-architectural conditions”, although to be fair, Intel says that about all its bugs, so I can understand the lack of trust. Furthermore, it would appear that Intel developed a workaround a couple of months ago (before the issue came into the spotlight, but well into Skylake's lifecycle). Though the fix has now been released by Intel to motherboard vendors, even if it is not applied, the probability of an everyday user running into the bug is slim to none – so don’t bother disabling HyperThreading.

Since the microcode update was given to motherboard vendors in April 2017, they should have rolled out the corresponding BIOS updates by now. If you are a Skylake or Kaby Lake owner, it would be a good idea to keep track of BIOS updates from your motherboard vendor, just in case. There is a reason this technology is called bleeding edge, and this is one more example of why, if you work in a data-sensitive industry, you should stick to tried-and-tested technologies until the ‘early adopters’ have had a run. That said, the fears on this matter were a bit exaggerated.