Most of you reading this blog probably remember CVE-2010-3081. The bug got an awful lot of publicity when it was discovered and announced, because it allowed local privilege escalation against virtually all 64-bit Linux kernels in common use at the time.
While investigating CVE-2010-3081, I discovered that several commonly believed facts about the CVE were wrong, and that the bug was even more broadly exploitable than had been publicly documented. I'd like to share those observations here.
A brief review of the bug
The bug arose from the compat_alloc_user_space function in Linux's 32-bit compatibility support on 64-bit systems. compat_alloc_user_space allocates and returns space on the userspace stack for the kernel to use:
```c
static inline void __user *compat_alloc_user_space(long len)
{
    struct pt_regs *regs = task_pt_regs(current);
    return (void __user *)regs->sp - len;
}
```
This function is only called by compat-mode syscalls, so current is assumed to be a 32-bit process. In that case regs->sp, the user stack pointer, will be a 32-bit quantity. Thus, if we subtract a small len, the result should still fit in 32 bits, which on a 64-bit system means it is guaranteed to fall within the user address space.
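The assumption is just integer arithmetic, so it can be checked outside the kernel. What follows is a hypothetical userspace model, not kernel code. sim_compat_alloc stands in for compat_alloc_user_space, and 0xffffd000 is only an illustrative 32-bit stack pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of compat_alloc_user_space(): the kernel computes
 * regs->sp - len; here both are plain integers. */
static uint64_t sim_compat_alloc(uint64_t sp, int64_t len)
{
    return sp - (uint64_t)len; /* unsigned arithmetic, modulo 2^64 */
}

/* True if the address fits in 32 bits, i.e. lies in the low 4 GiB
 * that a 32-bit process can address. */
static bool fits_in_32_bits(uint64_t addr)
{
    return addr <= UINT32_MAX;
}
```

For example, sim_compat_alloc(0xffffd000, 64) yields 0xffffcfc0, which is still below 4 GiB.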
Because of this, some callers of compat_alloc_user_space were lazy and did not call access_ok (or a function which called access_ok) to check that the result of compat_alloc_user_space fell within the user address space.
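A rough model of the check those callers skipped might look like this. It is a sketch with hypothetical names. SIM_USER_LIMIT assumes the usual x86-64 end of the user address space with 4-level paging (0x00007ffffffff000), and the real access_ok macro has varied between kernel versions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed end of the user address space (x86-64, 4-level paging). */
#define SIM_USER_LIMIT 0x00007ffffffff000ull

/* Hypothetical approximation of access_ok(): the whole range
 * [addr, addr + size) must lie below SIM_USER_LIMIT. */
static bool sim_access_ok(uint64_t addr, uint64_t size)
{
    return size <= SIM_USER_LIMIT && addr <= SIM_USER_LIMIT - size;
}

/* A careful caller: compute the allocation, then validate it before
 * using it. Returns 0 on success, -1 if the pointer is not a user
 * address (where kernel code would return an error). */
static int checked_alloc(uint64_t sp, uint64_t len, uint64_t *out)
{
    uint64_t ptr = sp - len;
    if (!sim_access_ok(ptr, len))
        return -1;
    *out = ptr;
    return 0;
}
```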
However, it turned out that some call sites in the kernel called compat_alloc_user_space with a user-controlled len value, allowing the subtraction to wrap around. Subtracting a len larger than the 32-bit stack pointer produces a value just below 2^64. On a 64-bit system, the kernel lives in the top four gigabytes of memory, so this wraparound is enough for a user to cause compat_alloc_user_space to return a pointer into the kernel's address space.
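The wraparound can be reproduced with the same kind of hypothetical userspace model. The names are illustrative, and the "top 4 GiB" boundary reflects the claim in the text about where the kernel lives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the compat_alloc_user_space() arithmetic. */
static uint64_t sim_compat_alloc(uint64_t sp, int64_t len)
{
    return sp - (uint64_t)len; /* wraps modulo 2^64 when len > sp */
}

/* True if addr lies in the top 4 GiB of the 64-bit address space. */
static bool in_top_4gib(uint64_t addr)
{
    return addr >= 0xffffffff00000000ull;
}
```

With a 32-bit sp of 0xffffd000 and a len of 0xffffe000, the result is 0xfffffffffffff000. More generally, any len between sp and 2^32 lands the result in the top 4 GiB.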
Moreover, it turned out that the functions that used a user-controlled len also did not check access_ok on the result of the allocation. In particular, Linux 2.6.26 introduced the compat_mc_getsockopt function, which called compat_alloc_user_space with a user-controlled length and then copied user-controlled data to this pointer. It is this function that the public exploit targeted.
Disabling 32-bit binaries doesn’t help
When an exploit was released for this bug, many sources circulated a mitigation: Disable 32-bit binaries on a system. Prevent compat-mode processes from running, the logic goes, and you prevent anyone from making a compat-mode syscall that triggers the vulnerable path.
This mitigation indeed prevented the public exploit from working (it included 32-bit inline assembly, and so couldn’t even easily be recompiled as a 64-bit binary), and many observers seemed to believe it closed the bug entirely.
However, this was not the case! It turns out that on an amd64 system, a 64-bit process can still make a compat-mode system call using the int $0x80 instruction, the traditional 32-bit syscall mechanism. Even though the process is running in 64-bit mode, int $0x80 dispatches through the compat-mode syscall table.
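The redirect is easy to observe with a harmless call. The sketch below runs the 32-bit getpid syscall (number 20 in the i386 table, 39 in the 64-bit one) via int $0x80 from a 64-bit process. It assumes x86-64 Linux built with 32-bit emulation. probe_int80_getpid and its result enum are hypothetical names, and the probe runs in a forked child because kernels without 32-bit emulation kill the caller with SIGSEGV.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum int80_result { INT80_MATCHED, INT80_MISMATCH, INT80_UNSUPPORTED };

/* Issue the 32-bit getpid syscall via int $0x80 from 64-bit code and
 * report whether it returned this process's pid. */
static enum int80_result probe_int80_getpid(void)
{
#if defined(__x86_64__) && defined(__linux__)
    pid_t child = fork();
    if (child < 0)
        return INT80_UNSUPPORTED;
    if (child == 0) {
        long ret = 20; /* getpid in the 32-bit (compat) syscall table */
        /* rdi = -1: if this were ever routed to the 64-bit table, where
         * 20 is writev, it would fail harmlessly with EBADF. Older
         * kernels zeroed r8-r11 on int $0x80, so list them as clobbered. */
        __asm__ volatile("int $0x80"
                         : "+a"(ret)
                         : "D"(-1L)
                         : "r8", "r9", "r10", "r11", "memory", "cc");
        if (ret < 0)
            _exit(2); /* error return: treat as unsupported */
        _exit(ret == (long)getpid() ? 0 : 1);
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return INT80_UNSUPPORTED; /* e.g. child killed by SIGSEGV */
    switch (WEXITSTATUS(status)) {
    case 0:  return INT80_MATCHED;
    case 1:  return INT80_MISMATCH;
    default: return INT80_UNSUPPORTED;
    }
#else
    return INT80_UNSUPPORTED;
#endif
}
```

On a kernel with compat support, this should report INT80_MATCHED. That shows the 64-bit process reached the compat table, which is the entry point the exploit needs.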
After realizing this, modifying the public exploit to work when compiled in 64-bit mode was a simple matter of porting the inline assembly and changing a small handful of types. I’ve posted the modified exploit and the diff against the original for the curious.
The integer overflow is totally irrelevant
Once you’ve realized that you can make compat-mode system calls from a 64-bit process, a little bit of thought reveals something else interesting. compat_alloc_user_space subtracts the len value from the userspace stack pointer. Previously, we relied on subtracting a large value from a 32-bit stack pointer in order to end up with a kernel pointer. However, while a 32-bit process is limited to a 32-bit stack pointer, a 64-bit process can write a full 64-bit value into %rsp, and thus, on syscall entry, into regs->sp! There’s no need for wraparound at all. You can just write a 64-bit value into %rsp, execute int $0x80, and make compat_alloc_user_space return any value you please!
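Here the arithmetic runs backwards: pick the pointer you want returned and solve for the stack pointer. This is again a hypothetical model of the computation only. sp_for_target is an illustrative name, and 0xffffffff81000000 is just an example kernel-half address.

```c
#include <stdint.h>

/* Hypothetical model of compat_alloc_user_space(): regs->sp - len. */
static uint64_t sim_compat_alloc(uint64_t sp, int64_t len)
{
    return sp - (uint64_t)len;
}

/* Given the pointer we want back and the fixed len a call site uses,
 * compute the 64-bit stack pointer that produces it. len can stay
 * small and honest; the stack pointer does all the work. */
static uint64_t sp_for_target(uint64_t target, int64_t len)
{
    return target + (uint64_t)len;
}
```

This is why a call site that passes a constant, trusted size (such as sizeof of a struct) is still exploitable. Its safety depended entirely on the stack pointer being 32 bits.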
The condition for exploitability thus drops from “user-controlled len and no access_ok” to simply “no access_ok”.
This matters because some very old kernels (before 2.6.11), including the one in RHEL 4, contain the following function:
```c
int siocdevprivate_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
{
    struct ifreq __user *u_ifreq64;
    ...
    u_ifreq64 = compat_alloc_user_space(sizeof(*u_ifreq64));

    /* Don't check these user accesses, just let that get trapped
     * in the ioctl handler instead.
     */
    copy_to_user(&u_ifreq64->ifr_ifrn.ifrn_name[0], &tmp_buf[0], IFNAMSIZ);
    __put_user(data64, &u_ifreq64->ifr_ifru.ifru_data);
    return sys_ioctl(fd, cmd, (unsigned long) u_ifreq64);
}
```
Remember, we can make compat_alloc_user_space return an arbitrary value. The copy_to_user call will check access_ok and fail, but its return value is discarded. The __put_user, which does no range check of its own, will then scribble 32 bits of user-controlled data at a user-controlled address. Bingo, local root.
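The following simulation shows why discarding that return value is fatal. It is a userspace sketch, not kernel code. All sim_* names and both handlers are hypothetical, and SIM_USER_LIMIT assumes the x86-64 user address space end of 0x00007ffffffff000. The 16-byte offset is IFNAMSIZ, and it assumes ifru_data sits directly after the name in struct ifreq on x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_USER_LIMIT 0x00007ffffffff000ull /* assumed user/kernel split */
#define SIM_NAME_LEN 16                       /* IFNAMSIZ on Linux */

/* Records where the unchecked store landed, if it happened. */
struct sim_store { bool happened; uint64_t addr; uint64_t value; };

/* Stand-in for copy_to_user(): range-checks first and returns the
 * number of bytes NOT copied, as the kernel function does. */
static unsigned long sim_copy_to_user(uint64_t dst, unsigned long n)
{
    bool ok = n <= SIM_USER_LIMIT && dst <= SIM_USER_LIMIT - n;
    return ok ? 0 : n;
}

/* Stand-in for __put_user(): no range check, so the store happens at
 * whatever address it is handed. */
static void sim_unchecked_put(struct sim_store *s, uint64_t addr, uint64_t v)
{
    s->happened = true;
    s->addr = addr;
    s->value = v;
}

/* Same shape as siocdevprivate_ioctl: the copy's failure is ignored,
 * so a kernel pointer still reaches the unchecked store. */
static void buggy_handler(struct sim_store *s, uint64_t p, uint64_t data)
{
    (void)sim_copy_to_user(p, SIM_NAME_LEN);
    sim_unchecked_put(s, p + SIM_NAME_LEN, data);
}

/* Same flow, but bail out as soon as the checked copy fails. */
static int fixed_handler(struct sim_store *s, uint64_t p, uint64_t data)
{
    if (sim_copy_to_user(p, SIM_NAME_LEN) != 0)
        return -1;
    sim_unchecked_put(s, p + SIM_NAME_LEN, data);
    return 0;
}
```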
It turns out siocdevprivate_ioctl was present in Linux 2.4.x too, meaning that this attack even worked against RHEL 3 and anyone else still running a 2.4-based system!
Using this technique, I’ve produced a working proof-of-concept exploit for RHEL 4, adapted from the released exploit for RHEL 5. Contact me if you’re interested, but it’s pretty straightforward.
Closing notes
As far as I know, neither of these facts had been publicly documented prior to this post. I shared this information with Red Hat, and they asked me to keep it private until they released fixes for RHEL 3, which happened last week. I would not be at all surprised to learn that someone else has private exploits incorporating either or both of these observations, though.
One important moral here is that you must be very careful when declaring a system unaffected by a vulnerability, or declaring a mitigation to be complete. Software systems have become tremendously complex, and it’s often impossible to be totally confident that you understand every last way an attacker could tickle a vulnerability.