Fun with the preprocessor: CONFIG_IA32_EMULATION hacks in Linux

About two months ago, Linux saw CVE-2010-0307, a trivial denial-of-service attack that could crash essentially any 64-bit Linux machine with 32-bit compatibility enabled. LWN has an excellent writeup of the bug, which turns out to be a subtle error in how the execve system call interacts with 32-bit compatibility mode.

While dealing with this patch for Ksplice, I ended up reading an awful lot of the code in Linux that deals with handling 32-bit processes on 64-bit machines. In the process, I discovered a number of alternately terrifying and clever hacks, the highlights of which I wanted to share here.

Compatibility mode

On x86_64 Linux kernels, the config option CONFIG_IA32_EMULATION controls whether the kernel supports i386 compatibility mode and the execution of "compatibility-mode" 32-bit processes.

For the most part, this support is very simple. As long as the OS sets up a few bits in appropriate places, the hardware will switch to 32-bit mode and happily execute these compatibility processes. The kernel needs to contain a compatibility entry point to handle system calls and such from 32-bit processes. That entry point does a small bit of marshalling to convert 32-bit arguments to 64-bit arguments, and handles the fact that i386 has different syscall numbers than amd64, but that's about it. Most kernel interfaces are fairly word-width agnostic: if you map the compatibility process into the first 4G of the kernel's 64-bit address space, you can mostly just zero-extend all arguments, and almost everything works fine.
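To make the marshalling step concrete, here is a minimal user-space sketch of widening 32-bit syscall arguments. The names `compat_regs`, `native_args`, and `widen_args` are hypothetical and do not come from the kernel; the only real detail assumed is that i386 passes its first syscall arguments in ebx, ecx, and edx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the first three argument registers at a
 * 32-bit syscall entry (i386 passes arguments in ebx, ecx, edx, ...). */
struct compat_regs {
    uint32_t ebx, ecx, edx;
};

/* Hypothetical 64-bit view handed on to a word-width-agnostic handler. */
struct native_args {
    uint64_t a0, a1, a2;
};

/* Converting uint32_t to uint64_t zero-extends: the upper 32 bits become
 * zero, so a 32-bit pointer remains a valid address in the low 4G. */
static struct native_args widen_args(const struct compat_regs *r)
{
    assert(r != NULL);
    struct native_args n = { r->ebx, r->ecx, r->edx };
    return n;
}
```

With `ebx = 0xfffff000`, the widened `a0` is `0x00000000fffff000`, not a sign-extended `0xfffffffffffff000`. An argument that is genuinely signed would need different treatment, which is presumably part of why the text says "mostly".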

But there are always details…

There are, however, a few devilish details that remain: specifically, places where the kernel cares about either the pointer size inside a process or the memory layout of a process. The primary culprit is the ELF loader, the code responsible for loading an executable out of a filesystem and into a process's address space to start executing. This process is very much architecture-specific. While 64- and 32-bit ELF files are structured almost identically, many of their fields are different sizes, as they need to hold a pointer or offset of the appropriate size for the architecture.
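The size difference is easy to see by laying the two file headers side by side. The sketch below declares its own structs (the `sketch_` names are made up for this example) with the field order of the ELF file header, rather than pulling in the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit class: addresses and file offsets are 32 bits wide. */
struct sketch_elf32_hdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint32_t e_entry;   /* entry point address */
    uint32_t e_phoff;   /* program header table offset */
    uint32_t e_shoff;   /* section header table offset */
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

/* 64-bit class: same fields, same order, but the three address- and
 * offset-sized fields grow to 64 bits. */
struct sketch_elf64_hdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

/* Print the total size and the offset of e_phoff for each class. */
static void print_header_layouts(void)
{
    printf("32-bit: size %zu, e_phoff at %zu\n",
           sizeof(struct sketch_elf32_hdr),
           offsetof(struct sketch_elf32_hdr, e_phoff));
    printf("64-bit: size %zu, e_phoff at %zu\n",
           sizeof(struct sketch_elf64_hdr),
           offsetof(struct sketch_elf64_hdr, e_phoff));
}
```

Everything up to `e_entry` sits at the same offset in both classes. After that the layouts diverge, so code that parses one cannot simply be pointed at the other.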

Similarly, a running process on Linux stores most relevant information about its address space layout inside a struct mm_struct referenced from the struct task_struct, or inside a struct thread_info. When first constructing a new process, however, the ELF loader and exec system call need to figure out how to set up an initial memory layout, which is very dependent on the bitness of the new process.

The ELF loader

Linux's ELF loader lives mostly in fs/binfmt_elf.c, which takes the definition of an ELF header from include/linux/elf.h. The latter file defines the structs for both 32- and 64-bit ELF files (e.g. Elf32_Ehdr and Elf64_Ehdr), and then uses an #ifdef near the bottom to select the appropriate definition.

In order to support loading both 32- and 64-bit ELF files in the same kernel, Linux uses a cute hack in the fs/compat_binfmt_elf.c file. This file uses #define to set the ELF class to ELFCLASS32, indicating that elf.h should use the 32-bit definitions, #defines a few more things, and then just #includes binfmt_elf.c, causing the ELF loader to get compiled a second time:

/*
 * Rename the basic ELF layout types to refer to the 32-bit class of files.
 */
#undef  ELF_CLASS
#define ELF_CLASS       ELFCLASS32

#undef  elfhdr
#undef  elf_phdr
#undef  elf_shdr
#undef  elf_note
#undef  elf_addr_t
#define elfhdr          elf32_hdr
#define elf_phdr        elf32_phdr
#define elf_shdr        elf32_shdr
#define elf_note        elf32_note
#define elf_addr_t      Elf32_Addr

/* Some more #defines elided */

/*
 * We share all the actual code with the native (64-bit) version.
 */
#include "binfmt_elf.c"
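The include-it-twice trick needs two files, so it cannot be shown directly in a single snippet. As a rough single-file analogue, the sketch below swaps in a different technique: a function-generating macro that stamps out the same body once per ELF class. All `toy_` names are invented for illustration, and the headers are trimmed to two fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed headers: just an entry point and a
 * program-header offset, in each class's native width. */
struct toy_hdr32 { uint32_t entry; uint32_t phoff; };
struct toy_hdr64 { uint64_t entry; uint64_t phoff; };

/* The shared "loader" body, written once against a placeholder type.
 * Each expansion compiles it again with a different header struct,
 * much as including binfmt_elf.c a second time recompiles the loader. */
#define DEFINE_TOY_LOADER(fn, hdr_t)                  \
    static uint64_t fn(const void *image)             \
    {                                                 \
        hdr_t hdr;                                    \
        memcpy(&hdr, image, sizeof(hdr));             \
        return hdr.entry;                             \
    }

DEFINE_TOY_LOADER(toy_load_entry64, struct toy_hdr64)
DEFINE_TOY_LOADER(toy_load_entry32, struct toy_hdr32)
```

The kernel's version has one property this analogue lacks: binfmt_elf.c stays an ordinary, readable C file, with no line-continuation backslashes, because the renaming happens entirely outside it.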

The ELF structs themselves, however, aren't the only thing that depends on the architecture. The details of initializing a new process depend on the architecture as well. So, throughout binfmt_elf.c, there are a number of calls to macros that handle various platform-specific elements of ELF loading.

compat_binfmt_elf.c then just goes through and uses #define to replace all of these with appropriate COMPAT_ versions, defined by the architecture:

#undef  ELF_ARCH
#undef  elf_check_arch
#define elf_check_arch  compat_elf_check_arch

#ifdef  COMPAT_ELF_PLATFORM
#undef  ELF_PLATFORM
#define ELF_PLATFORM            COMPAT_ELF_PLATFORM
#endif

/* ... */

The Linux developers do love their preprocessor.
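The hook-rebinding pattern can be sketched in a few lines. Here the same function body is written twice to stand in for the two compilations of binfmt_elf.c, and every `toy_` identifier is made up for the example. The two `e_machine` values are the standard ELF constants for i386 and x86-64.

```c
#include <stdint.h>

/* e_machine values from the ELF specification. */
enum { TOY_EM_386 = 3, TOY_EM_X86_64 = 62 };

static int toy_native_check_arch(uint16_t machine)
{
    return machine == TOY_EM_X86_64;
}

static int toy_compat_check_arch(uint16_t machine)
{
    return machine == TOY_EM_386;
}

/* First "compilation": the hook name is bound to the native check. */
#define toy_check_arch toy_native_check_arch
static int toy_accepts_native(uint16_t machine)
{
    return toy_check_arch(machine);
}

/* Second "compilation": identical body text, hook rebound to the
 * compat check with #undef and #define. */
#undef  toy_check_arch
#define toy_check_arch toy_compat_check_arch
static int toy_accepts_compat(uint16_t machine)
{
    return toy_check_arch(machine);
}
```

The two `toy_accepts_*` bodies are textually the same. Only the macro binding in force when each one is compiled decides which architecture it accepts.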

`TASK_SIZE`

In the Linux kernel, the TASK_SIZE macro defines the highest address available to a user process. Once a process is running, this information (along with a whole host of other information about the memory layout) is stored in the process's own memory-layout structures. However, in various places, including the ELF loader, the TASK_SIZE macro (along with a few others, like STACK_TOP) is needed.

TASK_SIZE obviously must be different for 32- and 64-bit processes, though. Conveniently, almost all code that uses TASK_SIZE cares about the current process (such as the ELF loader), and so the introduction of compatibility mode just changed the macro as follows (arch/x86/include/asm/processor.h):

#define TASK_SIZE               (test_thread_flag(TIF_IA32) ? \
                                        IA32_PAGE_OFFSET : TASK_SIZE_MAX)

test_thread_flag reads a bit out of the flags field on the current process's thread_info struct. And so the TASK_SIZE macro pseudo-magically changes value depending on whether the process calling it is running in 32-bit compatibility mode or not!
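A user-space sketch of the same idea: a macro that looks like a constant but consults per-thread state each time it is evaluated. Everything prefixed `toy_` or `TOY_` is hypothetical, and the flag bit number and the two limits are illustrative stand-ins, not values to rely on for any particular kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not authoritative kernel values. */
#define TOY_TIF_IA32          17
#define TOY_IA32_PAGE_OFFSET  UINT64_C(0xFFFFe000)
#define TOY_TASK_SIZE_MAX     ((UINT64_C(1) << 47) - 4096)

/* Hypothetical per-thread record holding a flags word. */
struct toy_thread_info {
    uint64_t flags;
};

/* Stand-in for "the thread currently executing this code". */
static struct toy_thread_info *toy_current;

/* Read one bit out of the current thread's flags word. */
static int toy_test_thread_flag(int bit)
{
    assert(toy_current != NULL);
    return (int)((toy_current->flags >> bit) & 1);
}

/* Reads like a constant, but is re-evaluated against toy_current at
 * every use, so its value depends on who is asking. */
#define TOY_TASK_SIZE \
    (toy_test_thread_flag(TOY_TIF_IA32) ? TOY_IA32_PAGE_OFFSET \
                                        : TOY_TASK_SIZE_MAX)
```

Pointing `toy_current` at a record with the flag clear makes `TOY_TASK_SIZE` evaluate to the large limit. Pointing it at a record with the flag set makes the very same expression evaluate to the 32-bit limit.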

Made of Bugs

It's software. It's made of bugs.
