<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of Bugs &#187; tar</title>
	<atom:link href="http://blog.nelhage.com/tag/tar/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nelhage.com</link>
	<description>It's software. It's made of bugs.</description>
	<lastBuildDate>Thu, 18 Aug 2011 21:57:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Followup to &#8220;A Very Subtle Bug&#8221;</title>
		<link>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 17:45:11 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[followup]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=160</guid>
		<description><![CDATA[After my previous post got posted to reddit, there was a bunch of interesting discussion there about some details I&#8217;d handwaved over. This is a quick followup on some of the investigation that various people carried out, and the conclusions they reached. In the reddit thread, lacos/lbzip2 objected that in his experiments, he didn&#8217;t see tar [...]]]></description>
			<content:encoded><![CDATA[<p>After my <a href="http://blog.nelhage.com/archives/150">previous post</a> got posted to <a href="http://www.reddit.com/r/programming/comments/b7djd/stuff_like_this_makes_me_hate_python_subtle_bugs/">reddit</a>, there was a bunch of
interesting discussion there about some details I&#8217;d handwaved
over. This is a quick followup on some of the investigation that various
people carried out, and the conclusions they reached.</p>

<p>In the reddit thread, <a href="http://lacos.hu/">lacos/lbzip2</a> <a href="http://www.reddit.com/r/programming/comments/b7djd/stuff_like_this_makes_me_hate_python_subtle_bugs/c0lc0dy">objected</a> that in his
experiments, he didn&#8217;t see <code>tar</code> closing the input pipe before it was
done reading the file, and so questioned where the <code>SIGPIPE</code>/<code>EPIPE</code>
was coming from in the first place. I had actually done similar
experiments with similar results, but I was still seeing the <code>EPIPE</code>,
so I knew it could happen, but I couldn&#8217;t totally explain why.</p>

<p>A friend of mine, David Benjamin, was curious enough to source-dive
<code>tar</code>, and <a href="http://davidben.scripts.mit.edu/blog/2010/02/28/tar-filled-pipes/">posted his results</a> on his own blog. He discovered that
by default, <code>tar</code> does not close the pipe after finding all the files
it needs, because the <code>tar</code> archive format allows for later copies of
the same file, which would supersede the previous ones. This explains
why <code>lacos</code> and I saw <code>tar</code> reading to the end of a <code>linux-2.6</code>
tarball, even if we only asked for the first file.</p>

<p>He also discovered, however, that a typical tar file ends with a
number of <code>NUL</code> blocks, which <code>tar</code> treats as end-of-file. And so
<code>tar</code> will close the pipe after reading the first of these, which
opens a narrow race condition whereby tar can potentially do so before
<code>gzip</code> has written the remaining <code>NUL</code> blocks, resulting in a
<code>SIGPIPE</code>.</p>
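
<p>To make the trailing <code>NUL</code> blocks concrete, here is a minimal sketch that uses the <code>tarfile</code> module from Python 3 rather than GNU <code>tar</code> itself. The helper names and the tiny one-file archive are made up for illustration. It builds an archive in memory and counts how many all-zero 512-byte blocks sit at the very end.</p>

```python
import io
import tarfile

BLOCK = tarfile.BLOCKSIZE  # 512 bytes, the tar block size


def build_tiny_tar():
    """Return the raw bytes of an uncompressed tar holding one small file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def count_trailing_nul_blocks(archive):
    """Count the all-zero blocks at the very end of the archive."""
    count = 0
    for end in range(len(archive), 0, -BLOCK):
        if archive[end - BLOCK:end] != bytes(BLOCK):
            break
        count += 1
    return count


if __name__ == "__main__":
    print(count_trailing_nul_blocks(build_tiny_tar()))
```

<p>With the default record padding, the count should come out well above the two blocks that mark end-of-archive. Those extra blocks are the data <code>gzip</code> may still be writing when <code>tar</code> stops reading.</p>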

<p>Finally, the discussion inspired <code>lacos</code> to post a <a href="http://lists.gnu.org/archive/html/help-tar/2010-03/msg00000.html">query</a> to the <code>help-tar</code>
mailing list asking about <code>tar</code>&#8217;s behavior with respect to <code>SIGPIPE</code> and closing the pipe early, which resulted in a brief thread that, among other
things, revealed that the bug I posted about has been <a href="http://lists.gnu.org/archive/html/bug-tar/2009-06/msg00009.html">fixed</a> in
GNU tar as of last summer, by having <code>tar</code> reset the disposition of
<code>SIGPIPE</code> to <code>SIG_DFL</code> before spawning a child. It was also pointed out that tar checks whether a filter subprocess is killed by <code>SIGPIPE</code>, and treats that as a success &#8212; so it&#8217;s not actually necessary for a <code>tar</code> filter to handle <code>SIGPIPE</code> and exit cleanly, like <code>gzip</code> does.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/03/followup-to-a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Very Subtle Bug</title>
		<link>http://blog.nelhage.com/2010/02/a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/02/a-very-subtle-bug/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 03:48:47 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=150</guid>
		<description><![CDATA[6.033, MIT&#8217;s class on computer systems, has as one of its catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class about designing and building complex systems, it&#8217;s a reminder that failure modes are subtle and often involve strange interactions between multiple parts of a system. In my own experience, I&#8217;ve concluded that they&#8217;re often [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://web.mit.edu/6.033/www/">6.033</a>, MIT&#8217;s class on computer systems, has as one of its
catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class
about designing and building complex systems, it&#8217;s a reminder that
failure modes are subtle and often involve strange interactions
between multiple parts of a system. In my own experience, I&#8217;ve
concluded that the catchphrase is often wrong. I like to say that complex systems
don&#8217;t usually fail for complex reasons, but for the <a href="http://ebroder.net/2010/01/25/complex-systems-and-simple-failures/">simplest, dumbest
possible reasons</a> &#8212; there are just more available dumb reasons. But sometimes,
complex systems do fail for complex reasons, and tracking them down
does require understanding across many of the different layers of
abstraction we&#8217;ve built up. This is a story of such a bug.</p>

<p>The following code snippet in Python is intended to extract and return
a single file from a tarball. It probably should be using
<a href="http://docs.python.org/library/tarfile.html"><code>tarfile</code></a>, but let&#8217;s ignore that for the moment.</p>

<pre><code>import subprocess
def extractFile(tarball, path):
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
  contents, err = p.communicate()
  if p.returncode:
    raise SomethingWentWrong(err)
  return contents
</code></pre>

<p>This code has a bug. It will often work just fine, but occasionally it
will fail, with &#8216;err&#8217; containing a message including <code>gzip: stdout:
Broken pipe</code>. If, however, you were to write the equivalent code in a
shell script:</p>

<pre><code> contents="$(tar -xzOf "$tarball" "$path")"
</code></pre>

<p>you would find that it never fails in this way. So what&#8217;s going on?</p>

<p>When we launch <code>tar</code> with the <code>-z</code> option, it creates a <code>pipe(7)</code> and
forks off a <code>gzip</code> process writing into one end of the pipe. It then
reads the uncompressed tarball from the other end of the pipe, parsing
our <code>tar</code> headers, until eventually it&#8217;s done, and it closes the read
end of the pipe.</p>

<p>Since we&#8217;ve given <code>tar</code> a specific path to extract, it doesn&#8217;t need to
read the entire tarball &#8212; only up until it can find that file. So it
may close the pipe before reading the entire file, and,
correspondingly, before <code>gzip</code> is done writing to it. As explained in
<code>pipe(7)</code>:</p>

<blockquote>
If all file descriptors referring to the read end of a pipe have been
closed, then a write(2) will cause a SIGPIPE signal to be generated
for the calling process.  If the calling process is ignoring this
signal, then write(2) fails with the error EPIPE.
</blockquote>

<p>Under normal circumstances, <code>gzip</code> expects that whoever is downstream
of it may only care about a prefix of the uncompressed stream, and so
it registers a <code>SIGPIPE</code> handler which exits cleanly.</p>

<p>Python, however, doesn&#8217;t want to get <code>SIGPIPE</code>s. Instead, Python would
rather just check the return value of every <code>write</code> call it makes, and
raise an <code>IOError</code> if necessary, so that Python code gets the error in
an appropriately Pythonic way, instead of through an asynchronous
signal. And so, at startup, Python uses <code>signal(2)</code> or <code>sigaction(2)</code>
to ignore <code>SIGPIPE</code> by setting it to <code>SIG_IGN</code>.</p>
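
<p>This is easy to observe from Python itself. The sketch below assumes Python 3 on a Unix-like system, where the error surfaces as <code>OSError</code> (specifically <code>BrokenPipeError</code>) rather than the <code>IOError</code> of Python 2; the function name is made up for illustration. It checks the interpreter&#8217;s <code>SIGPIPE</code> disposition and then writes to a pipe whose read end has been closed.</p>

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write to a pipe with no readers and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no file descriptor refers to the read end any more
    try:
        os.write(write_fd, b"x")
    except OSError as exc:  # BrokenPipeError on Python 3
        return exc.errno
    finally:
        os.close(write_fd)
    return None


# Only attempt the write if SIGPIPE really is ignored; with the default
# disposition the signal would kill this process instead.
sigpipe_ignored = signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN
result = write_to_closed_pipe() if sigpipe_ignored else None

if __name__ == "__main__":
    print(sigpipe_ignored, result == errno.EPIPE)
```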

<p>As explained in <code>sigaction(2)</code>:</p>

<blockquote>
A child created via fork(2) inherits a copy of its parent&#8217;s signal dispositions. During an execve(2), the dispositions of handled signals are reset to the default; the dispositions of ignored signals are left unchanged.
</blockquote>

<p>And so, when started from Python, <code>gzip</code> starts up with
<code>SIGPIPE</code> ignored. And, for reasons I don&#8217;t understand, rather than
unconditionally handling <code>SIGPIPE</code>, <code>gzip</code> first checks whether or
not it&#8217;s ignored, and only installs a handler if the signal is not
being ignored.</p>

<p>And so, <code>SIGPIPE</code> continues to be ignored, which means that <code>gzip</code>&#8217;s
<code>write(2)</code> fails with <code>EPIPE</code>; <code>gzip</code> sees the failure, calls <code>perror</code>
on it, and then exits with a nonzero status. <code>tar</code>&#8217;s <code>wait</code> then sees <code>gzip</code> exit uncleanly,
which causes <code>tar</code> itself to exit uncleanly, which our Python code then sees as a nonzero <code>returncode</code> and raises as
an exception.</p>

<p>There&#8217;s an easy workaround, which is to reset <code>SIGPIPE</code> to its default disposition in the
<code>subprocess</code> child:</p>

<pre><code>  # Requires "import signal" at the top of the module.
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,
                       preexec_fn=lambda:
                        signal.signal(signal.SIGPIPE, signal.SIG_DFL))
</code></pre>

<p>But who would think of doing that without first having seen this
horribly subtle chain of bugs and having had to track down what went
wrong?</p>
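
<p>The effect of the workaround can be seen without <code>tar</code> or <code>gzip</code> at all. The sketch below is an illustration under a few assumptions: it targets Python 3 on a Unix-like system, it uses <code>yes</code> as a stand-in for any child that keeps writing after its reader goes away, and the helper names are made up. It passes <code>restore_signals=False</code> to reproduce the inheritance described above, because since Python 3.2 <code>subprocess.Popen</code> defaults to <code>restore_signals=True</code>, which resets <code>SIGPIPE</code> in the child on its own.</p>

```python
import signal
import subprocess


def reset_sigpipe():
    """Runs in the child between fork and exec, as in the workaround."""
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def run_yes(apply_workaround):
    """Start yes, read one line, close the pipe, and return its exit status."""
    p = subprocess.Popen(
        ["yes"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        restore_signals=False,  # keep the inherited SIG_IGN, as Python 2 did
        preexec_fn=reset_sigpipe if apply_workaround else None,
    )
    p.stdout.readline()
    p.stdout.close()  # yes now writes to a pipe with no reader
    return p.wait()


if __name__ == "__main__":
    # A negative status -N means the child was killed by signal N.
    print("with workaround:", run_yes(True))
    print("without workaround:", run_yes(False))
```

<p>With the workaround, the status should be <code>-signal.SIGPIPE</code>, meaning the child died from the signal. Without it, the child sees <code>EPIPE</code> and reports the error on its own, typically with a nonzero exit status.</p>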

<p>(Here&#8217;s the <a href="http://bugs.python.org/issue1652">Python bug report</a>, and many thanks to cjwatson for
posting his discovery of this class of bug on his <a href="http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html">blog</a>, which
greatly reduced the amount of time I would have had to spend tracking
this down.)</p>
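
<p>For completeness, the <code>tarfile</code> route mentioned at the start avoids the child processes, and with them the signal question, entirely. This is a minimal sketch assuming Python 3 and a gzip-compressed tarball; <code>extract_file</code> is an illustrative name, not code from the original project.</p>

```python
import tarfile


def extract_file(tarball, path):
    """Return the contents of one regular file inside a .tar.gz archive."""
    with tarfile.open(tarball, mode="r:gz") as tar:
        member = tar.extractfile(path)  # raises KeyError if path is absent
        if member is None:  # directories and other non-file members
            raise ValueError("%s is not a regular file" % path)
        return member.read()
```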
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/02/a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

