<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of Bugs &#187; pipes</title>
	<atom:link href="http://blog.nelhage.com/tag/pipes/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nelhage.com</link>
	<description>It's software. It's made of bugs.</description>
	<lastBuildDate>Thu, 18 Aug 2011 21:57:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A Very Subtle Bug</title>
		<link>http://blog.nelhage.com/2010/02/a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/02/a-very-subtle-bug/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 03:48:47 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=150</guid>
		<description><![CDATA[6.033, MIT&#8217;s class on computer systems, has as one of its catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class about designing and building complex systems, it&#8217;s a reminder that failure modes are subtle and often involve strange interactions between multiple parts of a system. In my own experience, I&#8217;ve concluded that they&#8217;re often [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://web.mit.edu/6.033/www/">6.033</a>, MIT&#8217;s class on computer systems, has as one of its
catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class
about designing and building complex systems, it&#8217;s a reminder that
failure modes are subtle and often involve strange interactions
between multiple parts of a system. In my own experience, I&#8217;ve
concluded that the catchphrase is often wrong. I like to say that complex systems
don&#8217;t usually fail for complex reasons, but for the <a href="http://ebroder.net/2010/01/25/complex-systems-and-simple-failures/">simplest, dumbest
possible reasons</a> &#8212; there are just more available dumb reasons. But sometimes,
complex systems do fail for complex reasons, and tracking them down
does require understanding across many of the different layers of
abstraction we&#8217;ve built up. This is a story of such a bug.</p>

<p>The following code snippet in Python is intended to extract and return
a single file from a tarball. It probably should be using
<a href="http://docs.python.org/library/tarfile.html"><code>tarfile</code></a>, but let&#8217;s ignore that for the moment.</p>

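<pre><code>import subprocess

class SomethingWentWrong(Exception):
  """Raised when tar exits with a nonzero status."""

def extractFile(tarball, path):
  # -x extract, -z filter through gzip, -O write to stdout, -f archive file
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
  contents, err = p.communicate()
  if p.returncode:
    raise SomethingWentWrong(err)
  return contents
</code></pre>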

<p>This code has a bug. It will often work just fine, but occasionally it
will fail, with <code>err</code> containing a message including <code>gzip: stdout:
Broken pipe</code>. If, however, you were to write the equivalent code in a
shell script:</p>

<pre><code> contents="$(tar -xzOf "$tarball" "$path")"
</code></pre>

<p>you would find that it never fails in this way. So what&#8217;s going on?</p>

<p>When we launch <code>tar</code> with the <code>-z</code> option, it creates a pipe (see
<code>pipe(7)</code>) and forks off a <code>gzip</code> process that writes into one end of
it. <code>tar</code> then reads the uncompressed tarball from the other end of the
pipe, parsing the <code>tar</code> headers, until eventually it&#8217;s done and closes
the read end of the pipe.</p>
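
<p>Here&#8217;s a rough sketch of that plumbing in Python, not <code>tar</code>&#8217;s actual implementation. The function name <code>read_prefix_via_gzip</code> is made up for illustration, and it assumes a <code>gzip</code> binary on the <code>PATH</code>. The parent keeps only the read end of the pipe, reads a prefix, and then closes it without waiting for <code>gzip</code> to finish writing.</p>

<pre><code>import os
import subprocess

def read_prefix_via_gzip(gz_path, nbytes):
    """Decompress gz_path in a child gzip and return only the first nbytes."""
    read_fd, write_fd = os.pipe()
    # The child gzip writes decompressed data into the write end of the pipe.
    child = subprocess.Popen(['gzip', '-dc', gz_path], stdout=write_fd)
    # The parent keeps only the read end, as tar does.
    os.close(write_fd)
    with os.fdopen(read_fd, 'rb') as stream:
        prefix = stream.read(nbytes)
    # Leaving the with-block closed the read end, possibly while gzip
    # still had data left to write.
    child.wait()
    return prefix
</code></pre>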

<p>Since we&#8217;ve given <code>tar</code> a specific path to extract, it doesn&#8217;t need to
read the entire tarball &#8212; only up until it can find that file. So it
may close the pipe before reading the entire archive, and,
correspondingly, before <code>gzip</code> is done writing to it. As explained in
<code>pipe(7)</code>:</p>

<blockquote>
If all file descriptors referring to the read end of a pipe have been
closed, then a write(2) will cause a SIGPIPE signal to be generated
for the calling process.  If the calling process is ignoring this
signal, then write(2) fails with the error EPIPE.
</blockquote>

<p>Under normal circumstances, <code>gzip</code> expects that whoever is downstream
of it may only care about a prefix of the uncompressed stream, and so
it registers a <code>SIGPIPE</code> handler which exits cleanly.</p>

<p>Python, however, doesn&#8217;t want to get <code>SIGPIPE</code>s. Instead, Python would
rather just check the return value of every <code>write</code> call it makes, and
raise an <code>IOError</code> if necessary, so that Python code gets the error in
an appropriately Pythonic way, instead of through an asynchronous
signal. And so, at startup, Python uses <code>signal(2)</code> or <code>sigaction(2)</code>
to ignore <code>SIGPIPE</code> by setting it to <code>SIG_IGN</code>.</p>
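
<p>Both halves of this are easy to check from inside Python. The sketch below uses two helper names of my own choosing (<code>sigpipe_is_ignored</code> and <code>write_to_closed_pipe</code>). It assumes a stock interpreter that has not had its <code>SIGPIPE</code> disposition changed. If <code>SIGPIPE</code> were not ignored, the write would kill the process instead of raising. On Python 3 the exception is a <code>BrokenPipeError</code>, a subclass of <code>OSError</code>.</p>

<pre><code>import errno
import os
import signal

def sigpipe_is_ignored():
    """Report whether this interpreter currently has SIGPIPE set to SIG_IGN."""
    return signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN

def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b'x')
    except OSError as exc:  # BrokenPipeError on Python 3
        return exc.errno
    finally:
        os.close(write_fd)
    return None
</code></pre>

<p>With <code>SIGPIPE</code> ignored, <code>write_to_closed_pipe()</code> should return <code>errno.EPIPE</code>, matching the <code>pipe(7)</code> text quoted above.</p>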

<p>As explained in <code>sigaction(2)</code>:</p>

<blockquote>
A child created via fork(2) inherits a copy of its parent&#8217;s signal
dispositions. During an execve(2), the dispositions of handled signals
are reset to the default; the dispositions of ignored signals are left
unchanged.
</blockquote>
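
<p>On Linux you can watch this inheritance happen. The sketch below is Linux-only and makes a few assumptions: a <code>cat</code> binary, the <code>SigIgn</code> line in <code>/proc/self/status</code>, and a hypothetical helper name, <code>child_ignores_sigpipe</code>. It also leans on the <code>restore_signals</code> parameter that Python 3.2 and later accept in <code>subprocess</code>, which defaults to <code>True</code> and resets <code>SIGPIPE</code> to its default in the child before the exec.</p>

<pre><code>import signal
import subprocess

def child_ignores_sigpipe(restore_signals):
    """Return True if a freshly exec'd child starts with SIGPIPE ignored.

    Linux-only: reads the SigIgn bitmask from the child's /proc/self/status.
    """
    status = subprocess.run(['cat', '/proc/self/status'],
                            stdout=subprocess.PIPE,
                            restore_signals=restore_signals,
                            check=True).stdout.decode()
    for line in status.splitlines():
        if line.startswith('SigIgn:'):
            mask = int(line.split()[1], 16)
            # Bit (signum - 1) is set when that signal is ignored.
            return (mask // 2 ** (signal.SIGPIPE - 1)) % 2 == 1
    raise RuntimeError('SigIgn not found in /proc/self/status')
</code></pre>

<p>Passing <code>restore_signals=False</code> reproduces the behaviour described in this post, where the child inherits Python&#8217;s ignored <code>SIGPIPE</code>. Passing <code>True</code> shows the child starting with the default disposition.</p>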

<p>And so, when started from Python, <code>gzip</code> starts up with
<code>SIGPIPE</code> ignored. And, for reasons I don&#8217;t understand, rather than
unconditionally handling <code>SIGPIPE</code>, <code>gzip</code> first checks whether or
not it&#8217;s ignored, and only installs a handler if the signal is not
being ignored.</p>
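
<p>The check-before-install pattern looks roughly like the following when transliterated into Python. This is a sketch of the idiom only, not <code>gzip</code>&#8217;s C source, and <code>install_unless_ignored</code> is a name I made up.</p>

<pre><code>import signal

def install_unless_ignored(signum, handler):
    """Install handler for signum only if the signal is not currently ignored.

    Returns True if the handler was installed, False if the inherited
    SIG_IGN disposition was left alone.
    """
    if signal.getsignal(signum) == signal.SIG_IGN:
        return False
    signal.signal(signum, handler)
    return True
</code></pre>

<p>A program that starts with <code>SIGPIPE</code> already ignored takes the first branch, so its cleanup handler never gets registered.</p>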

<p>And so, <code>SIGPIPE</code> continues to be ignored, which means that <code>gzip</code>&#8217;s
<code>write(2)</code> fails with <code>EPIPE</code>. <code>gzip</code> sees the failed write, calls
<code>perror</code>, and exits with a nonzero status. <code>tar</code>&#8217;s <code>wait</code> then sees <code>gzip</code>
exit uncleanly, which causes <code>tar</code> itself to exit uncleanly, and our Python
code turns that nonzero return code into an exception.</p>
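
<p>The <code>tar</code> failure depends on timing, so it is awkward to reproduce on demand. The two possible fates of the writer are easier to see with a stand-in that never stops writing. The sketch below uses <code>yes</code> in place of <code>gzip</code>, which is my substitution. The helper name <code>run_writer_then_hang_up</code> is hypothetical, and it assumes Python 3.2 or later for <code>restore_signals</code>.</p>

<pre><code>import signal
import subprocess

def run_writer_then_hang_up(ignore_sigpipe):
    """Start `yes`, read a little of its output, close the pipe, and wait.

    Returns (returncode, stderr_bytes).
    """
    child = subprocess.Popen(['yes'],
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             # False: the child inherits Python's ignored SIGPIPE.
                             # True: SIGPIPE is reset to the default in the child.
                             restore_signals=not ignore_sigpipe)
    child.stdout.read(1)     # consume a prefix of the stream
    child.stdout.close()     # hang up while the writer is still writing
    returncode = child.wait(timeout=30)
    err = child.stderr.read()
    child.stderr.close()
    return returncode, err
</code></pre>

<p>With the default disposition the writer is killed by the signal, which <code>subprocess</code> reports as a return code of <code>-signal.SIGPIPE</code>. With <code>SIGPIPE</code> ignored, the writer instead sees the failed <code>write</code> and should exit with a nonzero status and a complaint on stderr, which is the path <code>gzip</code> takes in this story.</p>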

<p>There&#8217;s an easy workaround, which is to re-enable SIGPIPE in the
<code>subprocess</code> child:</p>

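<pre><code>  # Needs "import signal" next to "import subprocess".
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,
                       # Runs in the child after fork() and before exec().
                       preexec_fn=lambda:
                        signal.signal(signal.SIGPIPE, signal.SIG_DFL))
</code></pre>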

<p>But who would think of doing that, without first having seen this
horribly subtle chain of bugs, and having had to track down what went
wrong?</p>
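
<p>For completeness, the <code>tarfile</code> route mentioned at the top avoids child processes, pipes, and signals altogether. Here&#8217;s a minimal sketch, assuming a gzip-compressed tarball. The name <code>extract_file</code> and the choice of <code>ValueError</code> are mine.</p>

<pre><code>import tarfile

def extract_file(tarball, path):
    """Return the contents of member `path` from a gzip-compressed tarball."""
    with tarfile.open(tarball, 'r:gz') as archive:
        # extractfile() raises KeyError if the member does not exist and
        # returns None if it is not a regular file or link.
        member = archive.extractfile(path)
        if member is None:
            raise ValueError('%s is not a regular file in %s' % (path, tarball))
        with member:
            return member.read()
</code></pre>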

<p>(Here&#8217;s the <a href="http://bugs.python.org/issue1652">Python bug report</a>, and many thanks to cjwatson for
posting his discovery of this class of bug on his <a href="http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html">blog</a>, which
greatly reduced the amount of time I would have had to spend tracking
this down.)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/02/a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

