Among the bugs that Apple patched in OS X 10.11.5 is CVE-2016-1828, a use-after-free I discovered late last year while looking through the kernel source. Combined with CVE-2016-1758, an information leak patched in 10.11.4, this vulnerability can be used to execute arbitrary code in the kernel. In this post I’ll document how I created rootsh, a local privilege escalation for OS X 10.10.5 (14F27).
CVE-2016-1828 is a use-after-free in the function OSUnserializeBinary. By
passing a crafted binary blob to this function, it is possible to invoke a
virtual method on an object with a controlled vtable pointer. I leveraged the
use-after-free to create a NULL pointer dereference, allowing the vtable and
the ROP stack to live in user space.
CVE-2016-1758 is a kernel stack disclosure in the function if_clone_list: a few
bytes of uninitialized kernel stack are copied to user space. Those bytes can
be initialized to a known location within the kernel text segment by invoking a
system call prior to triggering the disclosure. After leaking the text segment
pointer, the kernel slide can be computed by subtracting the base address of
that particular text segment location from the leaked address.
I made several simplifying assumptions while developing rootsh. First, rootsh relies on SMAP being disabled, which means the exploit would have to be redesigned to work on newer (Broadwell and later) Macs. Second, I targeted the ROP gadgets at 10.10.5, since this was my initial development platform. Between 10.10.5 and 10.11 these ROP gadgets disappeared, so as written rootsh will fail on all versions of El Capitan. The exploit could be rewritten to work on El Capitan up through 10.11.3, but I chose not to. If you want to try rootsh on your own, you can set up a virtual machine running 10.10.5.[1]
Table of Contents
- Overview of the OS X Kernel
- Use After Free
- SMEP and SMAP
- Kernel ASLR
- Elevating Privileges
- Building the ROP Stack
- Running the Payload
Overview of the OS X Kernel
Before describing the exploit process, we’ll briefly look at the various pieces of the OS X kernel. The kernel, known as XNU, is composed of three major subsystems:
BSD: The BSD portion of the kernel implements most of the system calls, networking, and filesystem functionality. Much of this code is taken directly from FreeBSD 5.
Mach: The Mach part of the kernel is derived from the Mach 3.0 microkernel developed at Carnegie Mellon University. It implements fundamental services like memory maps and IPC primitives. User space programs access Mach services via Mach traps.
IOKit: IOKit is Apple’s framework for writing drivers for XNU. It is written in C++. Many C++ features (notably exceptions, multiple inheritance, and RTTI) are too complicated or inefficient to include in the kernel, so Apple provides its own runtime system called libkern.
When a user application communicates with a kernel driver, it often wants to
pass structured data objects like strings, arrays, and dictionaries. Libkern
makes this easy by defining container and collection classes
that correspond to the CoreFoundation objects users pass to the user space
APIs. These classes, which all inherit from OSObject, are outlined in the
table below.
| XML tag | CoreFoundation class | Libkern class | Contents |
|  |  |  | Array of bytes |
|  |  |  | Array of characters |
|  |  |  | Reference to unique string |
|  |  |  | Array of objects |
|  |  |  | Map of strings to objects |
|  |  |  | Set of unique objects |
When a CoreFoundation object is to be sent to the kernel, it is first converted
into a binary or XML representation by IOCFSerialize. The serialized data is
then copied into the kernel and deserialized using OSUnserializeXML.
OSUnserializeXML will call OSUnserializeBinary if the supplied data is
actually a binary encoding.
The OSUnserializeBinary function attempts to decode the supplied data and
reconstruct the original object. Often the deserialized object is a container
such as an OSDictionary containing several entries. In order to minimize the
size of the encoding when the same object is included in a collection several
times, the binary encoding format supports referencing previously serialized
objects by index. Thus, the decoding logic stores each reconstructed object in
an array so that it may be referenced by index later.
Presumably for efficiency reasons, this array is not an automatically managed
container; instead, OSUnserializeBinary manually manages a dynamically
allocated array of OSObject pointers. After each new object is deserialized, it
is appended to the end of the objsArray array without incrementing its
reference count. This should be safe since each generated object is stored in
its parent: the parent increments the entry’s reference count to keep it alive.
When an entry is referenced by index during deserialization, the object pointer
is looked up in objsArray, stored in the local variable o, and retained. o is
then released once it has been inserted into its parent.
Use After Free
Unfortunately, this strategy does not ensure safety, since it is possible for
an object in objsArray to be freed during deserialization, leaving a dangling
pointer.
In a serialized dictionary, it is possible for the same key to be assigned a
value multiple times. Consider passing OSUnserializeBinary a dictionary that
assigns the key a the string foo, then assigns a the string bar, and finally
references the object at index 2 (the string foo).
When the second assignment to a is deserialized, the string bar will be
inserted into the dictionary via setObject. Since the old value of a, foo, is
being replaced by bar, the dictionary will release a reference on it. However,
foo wasn’t retained when inserted into objsArray, so the only reference on foo
is from the dictionary itself; since no one else has a reference to foo, it is
freed. This leaves a dangling pointer to foo in objsArray. When the object at
index 2 is later referenced, retain will be called on the freed foo.
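The reference counting described above can be modeled in a few lines. This is a toy sketch, not the kernel code: the class names, the `deserialize` helper, and the way indices are assigned (dictionary at 0, each key and each new value taking the next slot) are assumptions chosen so that foo lands at index 2 as in the text.

```python
class Obj:
    """Toy stand-in for a reference-counted kernel object."""

    def __init__(self, name):
        self.name = name
        self.refs = 1          # reference held by the code that created it
        self.freed = False

    def retain(self):
        if self.freed:
            raise RuntimeError("retain() called on freed object " + self.name)
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Dict:
    """Toy dictionary: retains inserted values, releases replaced ones."""

    def __init__(self):
        self.items = {}

    def set_object(self, key, value):
        value.retain()
        old = self.items.get(key)
        self.items[key] = value
        if old is not None:
            old.release()


def deserialize(entries):
    """entries is a list of (key, value); an int value is a back-reference."""
    d = Dict()
    objs = [d]                      # hypothetical index table, like objsArray
    for key, value in entries:
        objs.append(key)            # the key occupies an index too
        if isinstance(value, int):
            o = objs[value]         # look up by index ...
            o.retain()              # ... and retain (fails if already freed)
        else:
            o = Obj(value)
            objs.append(o)          # stored WITHOUT taking a reference
        d.set_object(key, o)
        o.release()                 # drop the local reference
    return d


def triggers_use_after_free(entries):
    try:
        deserialize(entries)
    except RuntimeError:
        return True
    return False
```

In this model, `[("a", "foo"), ("a", "bar"), ("b", 2)]` hits the dangling entry, while `[("a", "foo"), ("b", 2)]` deserializes cleanly because foo is still owned by the dictionary when it is referenced.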
In order to exploit this bug we need to control the contents of the freed
memory, causing the call to
retain to use a vtable pointer we control.
Fortunately, we can easily control the allocation and freeing of objects by
specifying elements in the dictionary being deserialized. We can also cause a
block of memory to be allocated and filled with data we control by including
OSData objects in the dictionary.
If we are going to control the vtable pointer of an object, we need to ensure
that the freed object’s memory is used to allocate the
OSData object’s data
buffer, and not the
OSData object itself. However, the
OSData object is
allocated before its data buffer, so we will create two freed objects: the
first will be reallocated for the
OSData container and the second will
contain our fake vtable pointer.
More specifically, we will use a dictionary with two keys, a and b, that we
shall repeatedly assign. First we will associate each key with an OSNumber,
since on 64-bit OS X they are close enough in size to OSData to share a free
list. We then assign both keys the value true, which causes the dictionary to
release the OSNumbers, freeing them.[2] At this point the heap free list has
the OSNumber that was freed last at the head and the OSNumber that was freed
first right behind it. By then including an OSData object in the dictionary we
can cause the OSData container to reuse the memory of the OSNumber at the head
and the OSData’s data buffer to reuse the memory of the OSNumber behind it.
Referencing that second OSNumber by index will cause retain to be called on the
dead OSNumber object whose vtable we overwrote, giving us control of rip.
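The reuse order can be illustrated with a toy allocator. This is a sketch under two assumptions taken from the description above: freed chunks of this size class share one free list, and the list is last-in, first-out. The `Zone` class and the addresses are made up for illustration.

```python
class Zone:
    """Toy allocator with a LIFO free list for one size class."""

    def __init__(self):
        self.free_list = []
        self.next_addr = 0x1000   # arbitrary illustrative address

    def alloc(self):
        if self.free_list:
            return self.free_list.pop()   # reuse the most recently freed chunk
        addr = self.next_addr
        self.next_addr += 0x20
        return addr

    def free(self, addr):
        self.free_list.append(addr)


def reuse_order():
    zone = Zone()
    first_number = zone.alloc()
    second_number = zone.alloc()
    zone.free(first_number)       # freed first: ends up behind the head
    zone.free(second_number)      # freed last: sits at the head
    container = zone.alloc()      # the OSData object itself
    data_buffer = zone.alloc()    # the OSData's data buffer
    return {
        "container_reuses": "second" if container == second_number else "first",
        "buffer_reuses": "first" if data_buffer == first_number else "second",
    }
```

With these assumptions the container lands on the chunk freed last and the data buffer on the chunk freed first, which is why two freed objects are needed.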
We can disassemble the area around the exploitable call to retain using lldb
to find the layout of the vtable. At the instruction whose address ends in 194,
the first 8 bytes of o (pointed to by r14) are read into the rax register. This
is the vtable pointer, which we control because we overwrote the old OSNumber
object with the contents of the data buffer. Later, at the instruction whose
address ends in 19a, the function pointer at address rax + 0x20 is called. This
means rip will be set to the vtable entry at index 4 while rax points to the
start of the vtable.
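The step from the offset 0x20 to vtable index 4 is just a division by the pointer size. A minimal sketch, assuming 8-byte function pointers as on 64-bit OS X (the helper name is mine):

```python
POINTER_SIZE = 8  # bytes per vtable entry on a 64-bit system


def vtable_slot(byte_offset):
    """Return the vtable index addressed by rax + byte_offset."""
    if byte_offset % POINTER_SIZE != 0:
        raise ValueError("offset is not aligned to a vtable entry")
    return byte_offset // POINTER_SIZE
```

Here `vtable_slot(0x20)` gives 4, matching the slot used by the call to retain.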
Before we move on, it’s important to realize that this exploit will never be
fully reliable. Due to the nature of use-after-free errors, there’s a window
between when the memory is freed and when it is reallocated and filled with the
fake vtable pointer during which another kernel thread could allocate or free
memory. Losing the race means calling
retain on a random vtable, which will
probably panic the kernel. The best we can do is develop the exploit and hope
that reliability is not too bad.
SMEP and SMAP
In order to make exploiting this type of bug more difficult, recent versions of OS X ship with two protection mechanisms, known as Supervisor Mode Execution Prevention (SMEP) and Supervisor Mode Access Prevention (SMAP).
SMEP causes the CPU to generate a page fault whenever the kernel tries to execute code in user space memory. A SMEP fault will trigger a kernel panic, bringing down the system. To avoid this, the exploit code cannot reside in user space; once we control the kernel instruction pointer, it must point to valid kernel memory.
We can get around this restriction by executing a ROP payload rather than
shellcode. ROP, which stands for return-oriented programming, is a technique
for chaining together segments of code that already exist in the target program
to construct an exploit payload. Since
rip will only ever point into the
kernel, no SMEP fault will be generated. We will return to ROP later.
The other mechanism, SMAP, extends the protection offered by SMEP beyond just execution. When SMAP is enabled, any attempt by the kernel to access user space memory will trigger a page fault. (There are legitimate cases where the kernel needs to read user space memory, for example during system calls. Thus there are ways for the kernel to temporarily disable SMAP. However, we would already need to be executing arbitrary kernel code in order to do so, which makes this strategy useless for us.)
Bypassing SMAP is more difficult, since we would need to put both our fake vtable and our ROP stack at a known location in kernel memory. However, SMAP support was only added to Intel processors in Broadwell. In order to simplify the exploit, we will assume that the target is an older Mac without SMAP support. This allows us to place the fake vtable and ROP stack in user space. While the exploit could be made to work on SMAP-enabled CPUs, I didn’t have the patience while developing rootsh to do so.
In my testing, rootsh works on Broadwell processors when running under VirtualBox. Thus, even on newer systems with SMAP support, it should still be possible to run the exploit in a virtual machine.
Kernel ASLR
At this point we know how to control the kernel instruction pointer and we have
a strategy for bypassing SMEP. However, we still need to find the locations of
kernel functions we can use to elevate privileges. The OS X kernel binary lives
at /System/Library/Kernels/kernel on the filesystem. Fortunately this is an
unstripped Mach-O file, which means we can parse the symbol
information embedded in the kernel image to find the base address of any kernel
function by name.
However, the functions don’t actually reside at those addresses in the running
kernel. For instance, the current_proc function, which returns the proc
structure of the currently running process, is at address 0xffffff8000857180
in the kernel image, but on a live system it might actually be at address
0xffffff8018c57180. The difference of 0x0000000018400000 between these
addresses is the kernel slide.
OS X uses kernel address space layout randomization (kASLR) to hide the exact location of the kernel at runtime. During boot, the kernel is loaded at one of 384 possible locations[3] that are 2MB apart. To figure out where the kernel is we need an information leak.
Our goal is to find a way to sneak a pointer to some kernel memory location out to user space. If we can get a pointer to a known piece of kernel code, we can subtract its static address from its runtime address to recover the kernel slide.
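As a worked example of that subtraction, here is a small sketch using the current_proc addresses quoted earlier. The static address is derived from the runtime address and slide given in the text; the function names are my own.

```python
SLIDE_GRANULARITY = 2 * 1024 * 1024  # the kernel is placed in 2MB steps


def kernel_slide(runtime_addr, static_addr):
    """Slide = where the code actually is minus where the image says it is."""
    slide = runtime_addr - static_addr
    if slide < 0 or slide % SLIDE_GRANULARITY != 0:
        raise ValueError("addresses do not differ by a valid slide")
    return slide


def slid_address(static_addr, slide):
    """Runtime address of any symbol once the slide is known."""
    return static_addr + slide


# Values for current_proc from the example above.
STATIC_CURRENT_PROC = 0xFFFFFF8000857180
RUNTIME_CURRENT_PROC = 0xFFFFFF8018C57180
```

`kernel_slide(RUNTIME_CURRENT_PROC, STATIC_CURRENT_PROC)` evaluates to 0x18400000, and adding that value to any other static address yields its runtime location.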
One promising way to look for information leaks is to search the XNU source for
calls to copyout. copyout is a kernel function that copies bytes
from kernel space to user space. Often the data is copied out from the kernel
stack, which presents an opportunity for an information leak if not all of the
copied bytes have been initialized.
One such call is in the if_clone_list function.
This function attempts to copy the names of network interface cloners to user
space. For each interface, the outbuf buffer is filled with the interface
name and then copied out to user space. When ifc_name is smaller than
IFNAMSIZ, strlcpy leaves the last few bytes of outbuf uninitialized. The call
to copyout makes it copy the full IFNAMSIZ bytes to user space, including the
uninitialized bytes at the end.
IFNAMSIZ is #define’d to 16, which doesn’t leave much room for an
8-byte kernel pointer if the interface name is long. Fortunately, the first
interface cloner is called “bridge”, leaving 9 uninitialized bytes in outbuf.
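The count of 9 bytes follows from the 16-byte buffer, the 6 characters of the name, and the terminating NUL. The sketch below models that with a byte buffer standing in for stale stack contents; the helper names and the 0xAA filler are illustrative assumptions, not kernel code.

```python
BUF_SIZE = 16  # size of the on-stack name buffer described above


def strlcpy_into(buf, name):
    """Copy name plus a NUL terminator, truncating to fit; return bytes written."""
    data = name.encode("ascii")[: len(buf) - 1] + b"\x00"
    buf[: len(data)] = data
    return len(data)


def leaked_tail(stale_stack, name):
    """Bytes of old stack contents that survive the copy and get copied out."""
    buf = bytearray(stale_stack)          # buffer starts with whatever was there
    written = strlcpy_into(buf, name)
    return bytes(buf[written:])
```

For example, `leaked_tail(b"\xaa" * BUF_SIZE, "bridge")` returns nine 0xAA bytes, enough room for an 8-byte pointer, while a 15-character name would leave nothing.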
Since this function can leak a full kernel pointer, we can probably recover the
kernel slide.
By inspecting the code, we discover the following call graph for if_clone_list:
soo_ioctl → soioctl → ifioctllocked → ifioctl → ifioctl_ifclone → if_clone_list
soo_ioctl itself is used in the declaration of the fileops structure for
sockets. This structure associates socket objects in the kernel with the
implementations of common file operations like ioctl. The call graph
suggests it should be possible to reach if_clone_list by calling the
ioctl system call on a socket. To determine which ioctl command to
pass, we can look at the source of the functions along this path.
Here we see that the SIOCIFGCLONERS command should be used with an
appropriate request argument.
Given the above, it should be possible to leak parts of the kernel stack into user space by opening a socket and calling ioctl on it with the SIOCIFGCLONERS command.
If you’re lucky, the leaked bytes contain a pointer value like
0xffffff801873487f. The kernel slide is a multiple of 2 megabytes, so we know
that the lower 21 bits of the pointer are correct. Examining the OS X 10.10.5
kernel with otool, we find that there is only one instruction in the entire
kernel whose address has matching low bits, at 0xffffff800033487f.
Thus, we can subtract this reference address from the
leaked pointer to recover the kernel slide.[4]
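The low-bits argument can be checked numerically. In this sketch the two addresses are the ones quoted above; the function names are mine, and the 21-bit mask follows from the slide being a multiple of 2MB.

```python
SLIDE_ALIGNMENT = 1 << 21  # 2MB: the slide never changes the low 21 bits


def low_bits(addr):
    return addr & (SLIDE_ALIGNMENT - 1)


def slide_from_leak(leaked, reference):
    """Recover the slide if the leaked pointer matches the reference address."""
    if low_bits(leaked) != low_bits(reference):
        raise ValueError("leaked value does not match the reference instruction")
    return leaked - reference


LEAKED_POINTER = 0xFFFFFF801873487F
REFERENCE_ADDRESS = 0xFFFFFF800033487F
```

Both addresses share the low bits 0x13487f, and `slide_from_leak(LEAKED_POINTER, REFERENCE_ADDRESS)` gives 0x18400000, the same slide that appears in the panic log later in this post.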
Just like with the use-after-free, this information leak is not fully reliable:
we’re counting on a pointer written to the stack during a previous system call
to still be there when we call
ioctl. At any point in between, any kernel
code that executes in the current process’s context could overwrite that
pointer. In practice this information leak is reliable enough, and it can be
improved by repeatedly leaking pointers and taking the majority value.
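Taking the majority value over several attempts might look like the following sketch. It only shows the voting step: the list of samples is assumed to come from repeated runs of the leak, and requiring a strict majority is my own choice rather than something the exploit is known to do.

```python
from collections import Counter


def majority_value(samples):
    """Return the value seen in more than half of the samples, else None."""
    if not samples:
        raise ValueError("no samples collected")
    value, count = Counter(samples).most_common(1)[0]
    return value if count * 2 > len(samples) else None
```

For instance, `majority_value([0x18400000, 0x18400000, 0x0, 0x18400000])` returns 0x18400000, while a set of samples with no clear winner yields None and the leak can be retried.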
Elevating Privileges
Now that we have the kernel slide, we can calculate the address of any function in the kernel by adding the kernel slide to the base address of the function, which we can find in the kernel image. The next step is determining how to elevate privileges. To do this, we first look at how a process’s privilege information is stored in the kernel.
Each process on OS X has a corresponding proc structure, which stores
the information the kernel needs to manage the process. A kernel thread can get
a pointer to its proc struct by calling current_proc. The proc struct
contains a number of pointers to substructures describing various attributes of
the process. One such substructure is the ucred struct.
The relevant fields are cr_uid, cr_ruid, and cr_svuid of the contained
posix_cred struct. These values control the user ID, real user ID, and saved
user ID of the process.
Although it’s tempting to directly set cr_uid and cr_ruid to 0 to become
root, the ucred structure might be shared between multiple processes. If the
ucred of the attacking process is shared, setting cr_uid and cr_ruid to 0
will magically elevate a whole bunch of processes to root, which can have
unintended consequences. (I discovered this fact while running the exploit
under tmux; the exploit succeeded but each new tmux window I opened would
present a root shell. Less than ideal.)
The proper solution is to create a new ucred structure with elevated
privileges for the current process. However, I just set cr_svuid to 0
instead. This sets the saved UID of the current process and any processes
sharing its ucred to root. From user space, our process could then elevate
privileges by calling seteuid(0) to set the effective UID to root as well. We
haven’t eliminated the problem since any other process sharing the ucred could
also seteuid to root. Nonetheless, this is much better than before:
other processes aren’t automatically granted root powers, and it’s unlikely in
practice that a normal process will suddenly try to seteuid to root.
Thus, once we control rip, we will get our proc struct by calling
current_proc. We can then get a pointer to the ucred struct by calling
proc_ucred and a pointer to the inner posix_cred struct by calling
posix_cred_get on the ucred. Once we have a pointer to the posix_cred
struct, we will need an instruction sequence to set cr_svuid to 0. Finally,
we will need a way to gracefully return from kernel space.
Building the ROP Stack
At this point we’ll examine how to leverage control of rip to execute our
payload. When we get control of rip we know that rax points to the start of
the fake vtable. There is no single point in the kernel to which we can jump to
execute the desired attack, so we will need to use our control of rip to
guide control flow across multiple jumps.
A good general strategy at this point is to try to pivot the stack pointer so
that it points to a fake stack that we control. If we jump to a short
instruction sequence in the kernel that makes rsp point to our fake stack and
then executes a ret instruction, rip will be set to the address at the top
of our fake stack and the stack will be popped. If the new rip points to
another short sequence of instructions followed by ret, then we can execute a
few useful instructions and then jump to the new address at the top of the
stack. Continuing in this way, we can chain a series of short instruction
sequences together to build a full exploit. This technique is called
return-oriented programming (ROP).
The first order of business in building a ROP payload is to find a useful stack pivot. There are several tools capable of finding ROP gadgets or even automatically building ROP payloads. I prefer building ROP payloads myself, so I used ROPgadget to find useful gadgets in the kernel.
Running ROPgadget on the 10.10.5 kernel image produces over 45,000 gadgets.
There’s a very interesting gadget among them, built from an xchg, a pop rsp,
and a ret.
This instruction sequence swaps the esp and eax registers, pops the top
element of the new stack into rsp, and then jumps execution to the address at
the top of the new new stack. rax points to the fake vtable, so the xchg
instruction will set the low bits of rsp to the low bits of the address of
the vtable. One nuance of the x86-64 instruction set is that the high bits of
rsp will be cleared by the xchg because it’s operating on the 32-bit
sub-registers. If the vtable resides below address 0x100000000, the xchg will
set rsp to the vtable.
The pop rsp will move the very first element in our vtable into
rsp. We can use this to make rsp point to the true ROP stack. The final
ret will start executing the ROP payload.
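The effect of exchanging the 32-bit sub-registers can be modeled with plain integers. This sketch relies on the x86-64 rule that writing a 32-bit register clears the upper 32 bits of the full register; the function names and sample addresses are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def xchg_eax_esp(rax, rsp):
    """Model a 32-bit exchange: each destination is zero-extended to 64 bits."""
    new_rax = rsp & MASK32
    new_rsp = rax & MASK32
    return new_rax, new_rsp


def pivot_lands_on_vtable(vtable_addr, old_rsp=0xFFFFFF8094723CD0):
    """True if rsp equals the vtable address after the exchange."""
    _, new_rsp = xchg_eax_esp(vtable_addr, old_rsp)
    return new_rsp == vtable_addr
```

A vtable at a low address such as 0x12345000 survives the truncation, so `pivot_lands_on_vtable(0x12345000)` is true, whereas a kernel-range address like 0xffffff8012345000 would be cut down to its low 32 bits and the pivot would miss.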
The first element of the ROP stack can be the address of current_proc, which
will set rax to the address of this process’s proc struct. In order to feed
this result into proc_ucred we need the next gadget to move rax into rdi,
since rdi is used to store the first argument of a function call. Looking
through the list of ROP gadgets we find an instruction sequence that does this.
This gadget exchanges the contents of rax and rdi, effectively moving the
returned value into rdi. We can then use the same approach to call
proc_ucred and move its return value into rdi, and once again to call
posix_cred_get, leaving the address of the posix_cred struct in rdi.
Now we need to set the cr_svuid field in the posix_cred struct to 0.
Conveniently, at address 0xffffff800041ab81 we find a sequence that does
exactly that.
Jumping to this gadget will zero out the third 32-bit integer in the
posix_cred struct, setting our saved UID to root.
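To see why the third 32-bit integer is the saved UID, the start of the structure can be mirrored in user space. This is a sketch that assumes the struct begins with three 32-bit IDs in the order user ID, real user ID, saved user ID, as the description above implies; the field names follow XNU’s posix_cred, while the class name and the sample UID 501 are made up.

```python
import ctypes
import struct


class PosixCredPrefix(ctypes.Structure):
    """Assumed first three fields of posix_cred, each a 32-bit ID."""
    _fields_ = [
        ("cr_uid", ctypes.c_uint32),    # user ID
        ("cr_ruid", ctypes.c_uint32),   # real user ID
        ("cr_svuid", ctypes.c_uint32),  # saved user ID
    ]


def zero_saved_uid(raw):
    """Return a copy of the raw bytes with the saved UID field set to 0."""
    buf = bytearray(raw)
    struct.pack_into("<I", buf, PosixCredPrefix.cr_svuid.offset, 0)
    return bytes(buf)
```

`PosixCredPrefix.cr_svuid.offset` is 8 under these assumptions, so a 4-byte store of zero at offset 8 changes only the saved UID: `zero_saved_uid(struct.pack("<III", 501, 501, 501))` unpacks to `(501, 501, 0)`.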
Finally, we must safely stop executing the ROP payload. The simplest way to do
this is to call
thread_exception_return, which will immediately return
execution to user space.
This leaves us with a vtable with slots 0 and 4 filled and an 8 element ROP stack.
Running the Payload
The last piece of the puzzle is figuring out how to trigger the use-after-free
and get our payload to run. In fact it’s quite easy to call
OSUnserializeBinary from user space: any function that passes a
CoreFoundation object into the kernel must serialize and subsequently
deserialize the object. We’ll use the IOServiceGetMatchingServices function
from IOKit.
However, we can’t just give the attack dictionary to
IOServiceGetMatchingServices because the matching dictionary is passed as a
CFDictionary. No valid CFDictionary will ever serialize to our attack
dictionary, so we need to call a lower-level function.
Looking at the source, we can see that
IOServiceGetMatchingServices internally calls the
io_service_get_matching_services_bin Mach trap to pass a binary-serialized
dictionary to the kernel.
The kernel entrypoint is the function is_io_service_get_matching_services_bin,
which eventually calls OSUnserializeXML to deserialize the dictionary.
Thus, we can exploit the vulnerability from user space by allocating a page
below 0x100000000 to store the fake vtable and ROP stack and then calling
io_service_get_matching_services_bin with the malicious dictionary.
When I tested the exploit with this setup, I found that the system would occasionally panic trying to dereference a NULL pointer:
*** Panic Report ***
panic(cpu 0 caller 0xffffff8018816df2): Kernel trap at 0xffffff8018c8019a, type 14=page fault, registers:
CR0: 0x00000000c0010033, CR2: 0x0000000000000020, CR3: 0x0000000093386000, CR4: 0x0000000000040660
RAX: 0x0000000000000000, RBX: 0xffffff802042f8c0, RCX: 0xffffff80209c7200, RDX: 0x000000008c000002
RSP: 0xffffff8094723cd0, RBP: 0xffffff8094723db0, RSI: 0x0000000000000078, RDI: 0xffffff802042f7c0
R8: 0x0000000000000001, R9: 0xffffff802042fac0, R10: 0x0000000000000000, R11: 0x0000000000000040
R12: 0x0000000000000002, R13: 0xffffff802042fac0, R14: 0xffffff802042f7c0, R15: 0x0000000000000078
RFL: 0x0000000000010297, RIP: 0xffffff8018c8019a, CS: 0x0000000000000008, SS: 0x0000000000000010
Fault CR2: 0x0000000000000020, Error code: 0x0000000000000000, Fault CPU: 0x0 VMM
Backtrace (CPU 0), Frame : Return Address
0xffffff8094723980 : 0xffffff801872ad21 mach_kernel : _panic + 0xd1
0xffffff8094723a00 : 0xffffff8018816df2 mach_kernel : _kernel_trap + 0x8d2
0xffffff8094723bc0 : 0xffffff8018833ca3 mach_kernel : _return_from_trap + 0xe3
0xffffff8094723be0 : 0xffffff8018c8019a mach_kernel : __Z19OSUnserializeBinaryPKcmPP8OSString + 0x2ca
0xffffff8094723db0 : 0xffffff8018cfbd3e mach_kernel : _is_io_service_get_matching_services_bin + 0x2e
0xffffff8094723de0 : 0xffffff80187df5c8 mach_kernel : _iokit_server + 0x738
0xffffff8094723e10 : 0xffffff801872ef8c mach_kernel : _ipc_kobject_server + 0xfc
0xffffff8094723e40 : 0xffffff80187139f3 mach_kernel : _ipc_kmsg_send + 0x123
0xffffff8094723e90 : 0xffffff801872429d mach_kernel : _mach_msg_overwrite_trap + 0xcd
0xffffff8094723f10 : 0xffffff8018802115 mach_kernel : _mach_call_munger + 0x175
0xffffff8094723fb0 : 0xffffff8018834278 mach_kernel : _hndl_mach_scall + 0xd8
BSD process name corresponding to current thread: rootsh
Boot args: usb=0x800 keepsyms=1 -v -serial=0x1
Mac OS version: 14F27
Kernel version: Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64
Kernel UUID: 58F06365-45C7-3CA7-B80D-173AFD1A03C4
Kernel slide: 0x0000000018400000
Kernel text base: 0xffffff8018600000
The page fault occurred on (unslid) address 0xffffff800088019a, which is the
instruction that invokes retain on the freed object. The vtable pointer is
stored in rax just before this instruction. Looking at the panic log, it’s
clear that the rax register somehow got set to 0 rather than the address of
the fake vtable.
What’s likely going on is we’re occasionally losing the race to reallocate the
freed memory. In between when we free the two OSNumbers and when we allocate
the OSData object, there’s a window in which another kernel thread can either
allocate or free memory and mess everything up. In practice it seems that the
most common value of rax when we lose the race is 0. This indicates that a
simple way to make the exploit more reliable is to allocate our fake vtable at
address 0. To implement this hack we need to compile the exploit as 32-bit to
enable legacy support for mapping the NULL page and we need to pass special
linker flags so that the final Mach-O doesn’t have a __PAGEZERO segment.
However, placing the payload on the NULL page gives the exploit a chance to
succeed even when we lose the race.
The final exploit is reasonably reliable, triggering a panic twice in 3000 executions on an idle machine. Panics are significantly more likely when the system is under even slight load because the frequent allocations and frees are more likely to beat us in the race to reallocate the freed memory.
This wraps up our discussion. We’ve walked through the process of developing a full local privilege escalation exploit from two vulnerabilities, CVE-2016-1828 and CVE-2016-1758. The complete exploit code is available in my rootsh repository on GitHub.
I chose to target OS X 10.10.5 rather than 10.11.3 (the last release with both vulnerabilities) for a few reasons. First and foremost is that I was running 10.10.5 while I developed rootsh. However, even after updating I decided not to rewrite the exploit so that you can test it in a virtual machine. The App Store doesn’t keep the installers for old point releases: once 10.11.4 comes out, the 10.11.3 installer goes away, making it much more difficult to create a 10.11.3 virtual machine. By contrast, the Yosemite installer in the App Store shall forever remain at version 10.10.5, meaning anyone can come along at a later time and create a virtual installation. That being said, it shouldn’t be too difficult to rework this exploit for 10.11.3.
The actual path I took in developing this exploit wasn’t nearly as clean or guided as it’s presented here. There was a lot of trial and error and many, many hours debugging random kernel panics.
The rootsh code is released into the public domain. As a courtesy I ask that
if you use any of my code in another project you attribute it to me.
[1] Apple has since released Security Update 2016-002 for Yosemite, which bumped the build number up to 14F1713. Just like El Capitan, this new build is missing the ROP gadgets used by rootsh. At the time of this writing, however, the App Store is still distributing version 14F27, which is vulnerable to rootsh without modification.
[2] The OSNumber freed from a is not used to fulfill the allocation of true: OSBooleans are never allocated, since there are only two distinct values and all references to them are shared.
[3] Numerous sources online suggest that the kernel is loaded into one of 256 possible locations. However, empirical testing on a 2011 MacBook Pro running OS X Yosemite 10.10.5 suggests that, at least on some systems and in some configurations, there may be closer to 384 possible locations.
[4] This hardcoded reference address only works on 10.10.5 build 14F27; in order to find the kernel slide on another version of OS X we would need a new reference pointer, which might not even be in the same function.