Intel Hardware-Level Speculative Execution To Blame For Kernel Bug – KPTI Workaround Introduces Performance Hits Of Up To 23%
It has been a little over 48 hours since the Intel kernel bug was first reported, and while we don’t have an official comment from Intel yet, some additional details about the actual problem have emerged. Those in the know are reportedly under embargo, but courtesy of AMD's statement on what ‘isn’t’ wrong with its x86 processors, we have a fairly good idea of what the bug entails. This could have severe consequences for Intel’s standing as a company and as a supplier of x86 microprocessors, particularly in the enterprise ecosystem.
AMD spills the beans: hardware-level speculative execution to blame for the Intel kernel bug; cannot be patched with a microcode update and will require an OS-level KPTI patch
Before we get into any other details, some background on the problem. The bug exists at the hardware level and can be exploited to give malicious parties read access to kernel memory. Since it exists in hardware, a patch via microcode is apparently not possible. The only known workaround is at the OS level, and it requires a redesign of how the kernel is mapped into memory, which Microsoft is working on for Windows and Linux has already rolled out.
Word on the street is that Microsoft is scrambling to get this patched come Tuesday, and the changes have already been seeded to beta testers running Insider builds. Here’s the catch, though: the patch adds overhead to every transition into the kernel, which means that in some workloads the CPU could slow down drastically. We have seen numbers of up to 30% quoted, but conservative estimates point to a roughly 17% slowdown. So what exactly is the problem?
Well, before we get into that, here is the statement from AMD, which basically spilled the beans on what the issue is:
AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
Since Intel has been very tight-lipped about this issue, we can deduce from this statement that the problem has to do with speculative memory references in Intel’s processors, specifically speculative references to higher-privileged data that would fault, which AMD says its CPUs do not allow. Speculative execution is a performance technique in which the processor predicts which instructions will run next (for example, which way a branch will go), then fetches and executes them before it knows for certain that they are needed, keeping its pipelines full. If the prediction was right, the results are kept; if not, they are discarded. The point is to keep the processor busy instead of letting it sit idle while earlier instructions resolve.
The problem, as AMD’s comments suggest, is that this feature can be exploited to speculatively execute a load that would normally be blocked, because the privilege check is only enforced when the instruction retires, after its result may already have been used. In practice, this means a ring 3 (user mode) program can read ring 0 (kernel) data: the speculative result is thrown away once the check fails, but it leaves traces in the CPU cache that can be measured.
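To make that mechanism more concrete, here is a deliberately simplified Python model. It is a toy under stated assumptions, not how any real CPU works internally and not an exploit: the class name `ToyCPU`, the `check_before_use` flag and the address layout are all hypothetical. The flag switches between a load whose privilege check is deferred until retirement (the behavior described above) and one that is checked before its data can be used (the behavior AMD describes). The cache is modeled as a set of probe-array indices, and "timing" a cache line is modeled as a membership test on that set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class ToyCPU:
    # Toy memory: address -> (byte value, kernel_only flag). Hypothetical layout.
    memory: Dict[int, Tuple[int, bool]]
    # True: privilege is checked before a speculative load's data can be used.
    # False: the check is deferred until the instruction retires.
    check_before_use: bool
    # Indices of the 256-entry probe array that are currently cached.
    cache: Set[int] = field(default_factory=set)

    def user_load_and_probe(self, addr: int) -> None:
        """User-mode code loads a byte and uses it to index a probe array."""
        value, kernel_only = self.memory[addr]
        if kernel_only and self.check_before_use:
            return  # fault raised before the value can be used: no trace left
        # Transient work: the loaded byte selects one probe-array line, which
        # is pulled into the cache. For a kernel-only address the deferred
        # check then faults and discards the result, but not the cache line.
        self.cache.add(value)


def probe(cpu: ToyCPU) -> Optional[int]:
    """Stand-in for timing each probe line: a cached ("fast") line reveals the byte."""
    for i in range(256):
        if i in cpu.cache:
            return i
    return None


memory = {0x1000: (7, False), 0xFFFF0000: (42, True)}  # hypothetical addresses

deferred = ToyCPU(memory, check_before_use=False)
deferred.user_load_and_probe(0xFFFF0000)
print(probe(deferred))  # 42: the "secret" kernel byte leaked via the cache

early = ToyCPU(memory, check_before_use=True)
early.user_load_and_probe(0xFFFF0000)
print(probe(early))  # None: nothing was cached
```

With the check deferred, the squashed load still leaves one warm cache line whose index is the secret byte; with the check done first, there is nothing left to find.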
The kernel is currently mapped into every process’s virtual address space, hidden from user programs by page permissions, so that switching into it is fast. When a program makes a system call, the kernel is already in place and primed for the handover, with no need to switch address spaces. This significantly reduces the cost of system calls, but as is now clear, it also represents a very troublesome security flaw, because on affected processors the permission check does not stop speculative reads of that kernel memory.
The only way around this hardware behavior is a technique called Kernel Page Table Isolation (KPTI), which removes the kernel from user processes’ virtual address space, leaving only a minimal stub needed to enter it, and maps it back in when a system call or interrupt occurs. Where the kernel used to be an invisible stagehand hidden just behind the curtain, now it won’t be on the stage at all until it’s called. Needless to say, this can introduce severe performance penalties in workloads that make a lot of system calls and context switches. The Linux developers also mulled over the name FUCKWIT (Forcefully Unmap Complete Kernel With Interrupt Trampolines), which should give you an idea of how frustrating the bug is for them.
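The KPTI idea can likewise be sketched as a toy model in which each process has two page tables, one used in user mode and one used in kernel mode. The page names, the `Process` class and its `syscall` method are hypothetical illustrations; on real x86 systems the switch is a write to the CR3 register on kernel entry and exit, which this sketch only counts.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

# Hypothetical page names standing in for real mappings.
USER_PAGES = frozenset({"user_code", "user_data"})
KERNEL_PAGES = frozenset({"kernel_code", "kernel_data"})
TRAMPOLINE = frozenset({"entry_trampoline"})  # minimal stub needed to enter the kernel


@dataclass
class Process:
    kpti: bool
    cr3_writes: int = 0  # number of page-table switches performed
    active: FrozenSet[str] = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.user_table()

    def user_table(self) -> FrozenSet[str]:
        if self.kpti:
            # KPTI: user mode sees its own pages plus the entry stub only.
            return USER_PAGES | TRAMPOLINE
        # No KPTI: the kernel stays mapped (supervisor-only) in every process.
        return USER_PAGES | KERNEL_PAGES

    def kernel_table(self) -> FrozenSet[str]:
        return USER_PAGES | KERNEL_PAGES | TRAMPOLINE

    def syscall(self) -> None:
        if self.kpti:
            self.active = self.kernel_table()  # switch tables on entry
            self.cr3_writes += 1
        # ... kernel work would happen here ...
        if self.kpti:
            self.active = self.user_table()  # switch back on exit
            self.cr3_writes += 1


for kpti in (False, True):
    p = Process(kpti=kpti)
    for _ in range(1000):
        p.syscall()
    print(f"kpti={kpti}: kernel mapped in user mode={'kernel_data' in p.active}, "
          f"table switches={p.cr3_writes}")
```

The trade-off shows up in the counters: with KPTI the kernel’s pages drop out of the user-mode table, but every system call now pays for two page-table switches.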
According to some sources, the size of the hit can range anywhere from 5% to 30% depending on the processor, since more recent CPUs support a feature called PCID (Process-Context Identifiers) that reduces it. According to testing of the KPTI workaround shared by PostgreSQL developers, you should expect a 17% slowdown in the best case and a 23% slowdown in the worst case. In any case, all sources agree that a slowdown will almost certainly occur and that this is not something Intel can simply patch with a microcode update. AMD processors are unaffected at this time: they do use speculative execution, but according to AMD they do not allow speculative references to higher-privileged data that would fault.
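A rough way to see why the hit depends so heavily on workload is to treat KPTI as a fixed extra cost per system call. The sketch below assumes the workload is otherwise unchanged and that the added time simply accumulates; the function name `kpti_runtime_increase`, the syscall rates and the per-call costs are made-up illustrative values, not measurements, with a lower hypothetical cost standing in for a CPU with PCID.

```python
def kpti_runtime_increase(syscalls_per_sec: float, extra_ns_per_syscall: float) -> float:
    """Fractional increase in run time if every system call gets a fixed extra cost.

    syscalls_per_sec: system calls issued per second of the unpatched run.
    extra_ns_per_syscall: hypothetical added cost of one kernel entry/exit, in ns.
    """
    # Extra seconds of overhead accumulated per second of original run time.
    return syscalls_per_sec * extra_ns_per_syscall / 1e9


# Hypothetical workloads and costs, for illustration only.
examples = [
    ("syscall-heavy database, no PCID", 1_000_000, 200),
    ("syscall-heavy database, with PCID", 1_000_000, 100),
    ("game", 10_000, 200),
]
for name, rate, cost in examples:
    print(f"{name}: {kpti_runtime_increase(rate, cost):.1%} longer")
```

Under these assumptions the same per-call cost produces a large slowdown for a workload that lives in system calls and a negligible one for a game, which is the pattern described in this article.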
So the obvious next question is who this will affect and how. The good news is that if you are reading this article, you are probably a gamer or PC tech enthusiast, and you will see almost no difference once the patch is applied (gaming and basic rendering do not make heavy use of system calls and context switches). Cloud providers such as Amazon EC2 and Google Compute Engine, however, could be drastically affected, since they run many customers’ virtual machines on shared hardware, which this flaw can severely compromise. Secondly, as a general user, your passwords and other sensitive information may be stored in kernel memory, and this bug could be used to access that information (Update: a working proof of concept showing a password being pulled from kernel memory is available over here).
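For readers who want a feel for how expensive a single kernel round trip is on their own machine, here is a rough Python micro-benchmark. It assumes Linux, where `os.getppid()` is a genuine system call, and subtracts the cost of an ordinary Python function call; interpreter overhead and noise make the result only a ballpark figure. The helper names `ns_per_call` and `noop` are hypothetical, and comparing results before and after applying the patch is the intended use, not the absolute number.

```python
import os
import time


def ns_per_call(fn, n=200_000):
    """Average wall-clock nanoseconds per call of fn over n calls."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n


def noop():
    """Plain Python call, used to subtract interpreter overhead."""
    return 0


baseline = ns_per_call(noop)
with_syscall = ns_per_call(os.getppid)  # getppid() enters the kernel on Linux
print(f"plain call: {baseline:.0f} ns, os.getppid(): {with_syscall:.0f} ns, "
      f"approx. kernel round trip: {with_syscall - baseline:.0f} ns")
```

A workload that spends most of its time in loops like the second one is the kind KPTI slows down; one that rarely calls into the kernel will barely notice.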
Microsoft is expected to roll out a Windows patch come Tuesday, and Apple should follow soon. All that remains is an official response from Intel itself.
Update: First benchmarks of the KPTI workaround from Phoronix show performance degradation (from under 1% to 53% depending on use case); gaming not affected
The folks over at Phoronix did some preliminary synthetic testing and observed performance degradation ranging from 19% to a massive 53%, depending on the exact scenario and benchmark. Scenarios that should show no effect deviate by less than 1% from the baseline. We expect other publications to run their own benchmarks as well once Intel responds.
Update: Intel's official response
Intel has released its official response, stating that the performance impact should not be significant for the average user (as we have already stated) and that it is actively working with other companies to resolve the issue:
Intel and other technology companies have been made aware of new security research describing software analysis methods that, when used for malicious purposes, have the potential to improperly gather sensitive data from computing devices that are operating as designed. Intel believes these exploits do not have the potential to corrupt, modify or delete data.
Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect. Based on the analysis to date, many types of computing devices — with many different vendors’ processors and operating systems — are susceptible to these exploits.
Intel is committed to product and customer security and is working closely with many other technology companies, including AMD, ARM Holdings and several operating system vendors, to develop an industry-wide approach to resolve this issue promptly and constructively. Intel has begun providing software and firmware updates to mitigate these exploits. Contrary to some reports, any performance impacts are workload-dependent, and, for the average computer user, should not be significant and will be mitigated over time.
Intel is committed to the industry best practice of responsible disclosure of potential security issues, which is why Intel and other vendors had planned to disclose this issue next week when more software and firmware updates will be available. However, Intel is making this statement today because of the current inaccurate media reports.
Check with your operating system vendor or system manufacturer and apply any available updates as soon as they are available. Following good security practices that protect against malware in general will also help protect against possible exploitation until updates can be applied.
Intel believes its products are the most secure in the world and that, with the support of its partners, the current solutions to this issue provide the best possible security for its customers.