<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Made of Bugs &#187; signals</title>
	<atom:link href="http://blog.nelhage.com/tag/signals/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.nelhage.com</link>
	<description>It's software. It's made of bugs.</description>
	<lastBuildDate>Thu, 18 Aug 2011 21:57:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A Very Subtle Bug</title>
		<link>http://blog.nelhage.com/2010/02/a-very-subtle-bug/</link>
		<comments>http://blog.nelhage.com/2010/02/a-very-subtle-bug/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 03:48:47 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=150</guid>
		<description><![CDATA[6.033, MIT&#8217;s class on computer systems, has as one of its catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class about designing and building complex systems, it&#8217;s a reminder that failure modes are subtle and often involve strange interactions between multiple parts of a system. In my own experience, I&#8217;ve concluded that they&#8217;re often [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://web.mit.edu/6.033/www/">6.033</a>, MIT&#8217;s class on computer systems, has as one of its
catchphrases, &#8220;Complex systems fail for complex reasons&#8221;. As a class
about designing and building complex systems, it&#8217;s a reminder that
failure modes are subtle and often involve strange interactions
between multiple parts of a system. In my own experience, I&#8217;ve
concluded that they&#8217;re often wrong. I like to say that complex systems
don&#8217;t usually fail for complex reasons, but for the <a href="http://ebroder.net/2010/01/25/complex-systems-and-simple-failures/">simplest, dumbest
possible reasons</a> &#8212; there are just more available dumb reasons. But sometimes,
complex systems do fail for complex reasons, and tracking them down
does require understanding across many of the different layers of
abstraction we&#8217;ve built up. This is a story of such a bug.</p>

<p>The following code snippet in Python is intended to extract and return
a single file from a tarball. It probably should be using
<a href="http://docs.python.org/library/tarfile.html"><code>tarfile</code></a>, but let&#8217;s ignore that for the moment.</p>

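<pre><code>import subprocess

class SomethingWentWrong(Exception):
  """Placeholder error type; any exception class would do here."""

def extractFile(tarball, path):
  # -x: extract, -z: filter through gzip, -O: write to stdout, -f: archive name
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
  contents, err = p.communicate()
  if p.returncode:
    raise SomethingWentWrong(err)
  return contents
</code></pre>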

<p>This code has a bug. It will often work just fine, but occasionally it
will fail, with &#8216;err&#8217; containing a message including <code>gzip: stdout:
Broken pipe</code>. If, however, you were to write the equivalent code in a
shell script:</p>

<pre><code> contents="$(tar -xzOf "$tarball" "$path")"
</code></pre>

<p>you would find that it never fails in this way. So what&#8217;s going on?</p>

<p>When we launch <code>tar</code> with the <code>-z</code> option, it creates a <code>pipe(7)</code> and
forks off a <code>gzip</code> process writing into one end of the pipe. It then
reads the uncompressed tarball from the other end of the pipe, parsing
our <code>tar</code> headers, until eventually it&#8217;s done, and it closes the read
end of the pipe.</p>

<p>Since we&#8217;ve given <code>tar</code> a specific path to extract, it doesn&#8217;t need to
read the entire tarball &#8212; only up until it can find that file. So it
may close the pipe before reading the entire file, and,
correspondingly, before <code>gzip</code> is done writing to it. As explained in
<code>pipe(7)</code>:</p>

<blockquote>
If all file descriptors referring to the read end of a pipe have been
closed, then a write(2) will cause a SIGPIPE signal to be generated
for the calling process.  If the calling process is ignoring this
signal, then write(2) fails with the error EPIPE.
</blockquote>

<p>Under normal circumstances, <code>gzip</code> expects that whoever is downstream
of it may only care about a prefix of the uncompressed stream, and so
it registers a <code>SIGPIPE</code> handler which exits cleanly.</p>

<p>Python, however, doesn&#8217;t want to get <code>SIGPIPE</code>s. Instead, Python would
rather just check the return value of every <code>write</code> call it makes, and
raise an <code>IOError</code> if necessary, so that Python code gets the error in
an appropriately Pythonic way, instead of through an asynchronous
signal. And so, at startup, Python uses <code>signal(2)</code> or <code>sigaction(2)</code>
to ignore <code>SIGPIPE</code> by setting it to <code>SIG_IGN</code>.</p>
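
<p>A minimal sketch of that behavior, assuming Python 3 on a POSIX system (where the error surfaces as an <code>OSError</code> subclass rather than the <code>IOError</code> mentioned above). The helper name <code>write_to_closed_pipe</code> is made up for illustration:</p>

```python
import errno
import os
import signal

def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no readers remain, so the next write cannot succeed
    try:
        os.write(write_fd, b"hello")
    except OSError as exc:  # BrokenPipeError in Python 3
        return exc.errno
    finally:
        os.close(write_fd)
    return None

# Disposition the interpreter installed at startup (assumed to be untouched).
startup_disposition = signal.getsignal(signal.SIGPIPE)
result = write_to_closed_pipe()
```

<p>Because the signal is ignored, the process survives the write and simply sees <code>EPIPE</code>. With the default disposition it would have been killed by <code>SIGPIPE</code> instead.</p>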

<p>As explained in <code>sigaction(2)</code>:</p>

<blockquote>
A child created via fork(2) inherits a copy of its parent&#8217;s signal
dispositions. During an execve(2), the dispositions of handled signals
are reset to the default; the dispositions of ignored signals are left
unchanged.
</blockquote>

<p>And so, when started from Python, <code>gzip</code> starts up with
<code>SIGPIPE</code> ignored. And, for reasons I don&#8217;t understand, rather than
unconditionally handling <code>SIGPIPE</code>, <code>gzip</code> first checks whether or
not it&#8217;s ignored, and only installs a handler if the signal is not
being ignored.</p>

<p>And so, <code>SIGPIPE</code> continues to be ignored, which means that <code>gzip</code>&#8217;s
<code>write(2)</code> fails with <code>EPIPE</code>; <code>gzip</code> sees the failure, calls <code>perror</code>
on it, and then exits. <code>tar</code>&#8217;s <code>wait</code> then sees <code>gzip</code> exit uncleanly,
which causes <code>tar</code> itself to exit uncleanly, which our Python code then
raises as an exception.</p>

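<p>There&#8217;s an easy workaround, which is to restore the default <code>SIGPIPE</code>
disposition in the <code>subprocess</code> child:</p>

<pre><code>  # Requires "import signal" at the top of the module.
  # preexec_fn runs in the child after fork() and before exec().
  p = subprocess.Popen(['tar', '-xzOf', tarball, path],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,
                       preexec_fn=lambda:
                        signal.signal(signal.SIGPIPE, signal.SIG_DFL))
</code></pre>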

<p>But who would think of doing that without first having seen this
horribly subtle chain of bugs, and having had to track down what went
wrong?</p>
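
<p>For what it&#8217;s worth, Python 3.2 and later give <code>subprocess.Popen</code> a <code>restore_signals</code> argument, on by default, which resets <code>SIGPIPE</code> to its default disposition in the child before the <code>exec</code>. The sketch below checks what a child actually inherits. It assumes Linux (it reads the <code>SigIgn</code> mask from <code>/proc/self/status</code>), a <code>cat</code> binary on the <code>PATH</code>, and a parent interpreter that still ignores <code>SIGPIPE</code>; the helper name is hypothetical:</p>

```python
import signal
import subprocess

def child_ignores_sigpipe(restore_signals):
    """Report whether a freshly exec'd child starts with SIGPIPE ignored."""
    status = subprocess.run(
        ["cat", "/proc/self/status"],  # after exec, "self" is the cat process
        stdout=subprocess.PIPE,
        restore_signals=restore_signals,
        check=True,
    ).stdout.decode()
    for line in status.splitlines():
        if line.startswith("SigIgn:"):
            mask = int(line.split()[1], 16)
            # Signal N is represented by bit N - 1 of the hexadecimal mask.
            return (mask >> (signal.SIGPIPE - 1)) % 2 == 1
    raise RuntimeError("no SigIgn line in /proc/self/status")

inherited = child_ignores_sigpipe(restore_signals=False)
restored = child_ignores_sigpipe(restore_signals=True)
```

<p>With <code>restore_signals=False</code> the child should start with <code>SIGPIPE</code> ignored, which is the situation described in this post; with the default it should not.</p>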

<p>(Here&#8217;s the <a href="http://bugs.python.org/issue1652">Python bug report</a>, and many thanks to cjwatson for
posting his discovery of this class of bug on his <a href="http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2009-07-02-python-sigpipe.html">blog</a>, which
greatly reduced the amount of time I would otherwise have had to spend tracking
this down.)</p>
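
<p>For completeness, here is roughly what the <code>tarfile</code> version suggested at the top might look like. This is only a sketch for Python 3, and <code>extract_file</code> is an illustrative name. Since it decompresses in-process, no <code>gzip</code> child is spawned and there is no signal disposition to inherit:</p>

```python
import tarfile

def extract_file(tarball, path):
    """Return the contents of one regular file from a gzip-compressed tarball."""
    with tarfile.open(tarball, "r:gz") as archive:
        member = archive.extractfile(path)  # raises KeyError if path is absent
        if member is None:  # directories and other non-file members
            raise ValueError(f"{path} is not a regular file")
        with member:
            return member.read()
```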
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/02/a-very-subtle-bug/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>A Brief Introduction to termios: Signaling and Job Control</title>
		<link>http://blog.nelhage.com/2010/01/a-brief-introduction-to-termios-signaling-and-job-control/</link>
		<comments>http://blog.nelhage.com/2010/01/a-brief-introduction-to-termios-signaling-and-job-control/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 05:42:52 +0000</pubDate>
		<dc:creator>nelhage</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[termios]]></category>
		<category><![CDATA[tty]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blog.nelhage.com/?p=60</guid>
		<description><![CDATA[(This is part three of a multi-part introduction to termios and terminal emulation on UNIX. Read part 1 or part 2 if you&#8217;re new here) For my final entry on termios, I will be looking at job control in the shell (i.e. backgrounding and foreground jobs) and the very closely related topic of signal generation [...]]]></description>
			<content:encoded><![CDATA[<p>(This is part three of a multi-part introduction to termios and
terminal emulation on UNIX. Read <a href="http://blog.nelhage.com/archives/14">part 1</a> or <a href="http://blog.nelhage.com/archives/27">part 2</a>
if you&#8217;re new here)</p>

<p>For my final entry on termios, I will be looking at job control in the
shell (i.e. backgrounding and foreground jobs) and the very closely
related topic of signal generation by termios, in response to <code>INTR</code>
and friends.</p>

<h2>Sessions and Process Groups</h2>

<p>For the purposes of termios, processes are organized into two
hierarchical groups, <strong>process groups</strong> and <strong>sessions</strong>. Every
process belongs to exactly one process group and one session, and each
process group is contained entirely within a session.</p>

<p>Process groups and sessions are both named by the process ID of the
process that created the group. This process is known as the <strong>process
group leader</strong> or <strong>session leader</strong>. A process creates a new session
using <code>setsid(2)</code>, or a new process group using <code>setpgid(2)</code>.</p>
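
<p>Python exposes both calls in the <code>os</code> module, so the naming rule is easy to check. The sketch below forks a child, has it call one of them, and reports the child&#8217;s IDs back over a pipe. <code>ids_after</code> is a made-up helper name, and a POSIX system is assumed:</p>

```python
import os

def ids_after(action):
    """Fork a child, run action() in it, and return the child's (pid, pgid, sid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: apply the change, report our IDs to the parent, then exit.
        try:
            os.close(read_fd)
            action()
            report = f"{os.getpid()} {os.getpgid(0)} {os.getsid(0)}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, pgid, sid = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return child_pid, pgid, sid

new_session = ids_after(os.setsid)
new_group = ids_after(lambda: os.setpgid(0, 0))
```

<p>After <code>setsid</code> the child&#8217;s process group ID and session ID both equal its own PID. After <code>setpgid(0, 0)</code> only the process group ID does, and the child stays in its parent&#8217;s session.</p>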

<p>On Linux, you can inspect the process group and session of a process
using the <code>stat</code> file in <code>/proc/$PID</code>. The first several fields in
that file are:</p>

<pre><code>    pid (name) state ppid pgid sid …
</code></pre>

<p>or, in more words:</p>

<pre><code>    [process id] ([name]) [state] [parent process id] [process group id] [session id] …
</code></pre>
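
<p>A small sketch of reading those fields from Python, assuming Linux with <code>/proc</code> mounted. The function name <code>ids_from_stat</code> is just for illustration:</p>

```python
import os

def ids_from_stat(pid):
    """Return the state, ppid, pgid and sid recorded in /proc/PID/stat."""
    with open(f"/proc/{pid}/stat") as stat_file:
        data = stat_file.read()
    # The name field is parenthesised and may itself contain spaces or
    # parentheses, so split on the last ")" rather than on whitespace alone.
    after_name = data[data.rindex(")") + 1:].split()
    state, ppid, pgid, sid = after_name[:4]
    return {"state": state, "ppid": int(ppid), "pgid": int(pgid), "sid": int(sid)}

own_ids = ids_from_stat(os.getpid())
```

<p>The values should agree with what <code>os.getppid()</code>, <code>os.getpgid(0)</code> and <code>os.getsid(0)</code> report for the same process.</p>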

<h3>Sessions</h3>

<p>Sessions are the fundamental group of terminal management. Every
session may have an associated <strong>controlling terminal</strong>, which is
treated specially. A process may open and talk to any number of
terminals, but the special behaviors related to access control and job
control only apply to a process&#8217;s controlling terminal. Each terminal
may be the controlling terminal of at most one session. It follows
that calling <code>setsid(2)</code> to create a new session causes a process to
lose its previous controlling terminal. Acquiring a controlling
terminal is OS-specific, but can usually be accomplished by opening a
terminal device without the <code>O_NOCTTY</code> flag, while not already having
a controlling terminal.</p>

<p>Generally, all your processes within a single login session, or within
a single instance of your terminal emulator, are within the same
session, with the <code>pty</code> allocated by your terminal emulator or <code>ssh</code>
or whatever as their controlling terminal.</p>

<h3>Process Groups</h3>

<p>Process groups are the unit of control for signal generation by a
terminal. A terminal never sends a signal to a specific process, but
always to all processes within a process group.</p>

<p>Access to a terminal is also mediated in terms of process groups. In
addition to having an associated session, every terminal has exactly
one <strong>foreground process group</strong>. Every other process group in that
terminal&#8217;s session is a <strong>background process group</strong>.</p>

<p>The foreground process group is awarded special access to its
controlling terminal. It is allowed unrestricted access to read from
and write to the terminal, as well as to call various control
functions, such as <code>tcsetattr</code>, on it.</p>

<p>In addition, if any signal-generating character is read by a terminal,
it generates the appropriate signal to the foreground process group.</p>

<p>Background process groups are restricted in their access to their
controlling terminal. If any process in a background process group
attempts to read from its controlling terminal, it will result in
<code>SIGTTIN</code> being sent to its process group. Background processes may
write to their controlling terminal, unless <code>TOSTOP</code> is set in
<code>c_lflag</code>, in which case doing so will generate <code>SIGTTOU</code> to its
process group. Calling terminal control functions such as <code>tcsetattr</code>
is treated like a write operation with <code>TOSTOP</code> set (i.e. <code>SIGTTOU</code> is
sent unless the process is blocking or ignoring it).</p>

<p>The foreground process group for a terminal may be set by the
<code>tcsetpgrp(3)</code> function, which may be called by any process in the
terminal&#8217;s session, but is treated in the same way as <code>tcsetattr</code> in
the previous paragraph.</p>
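
<p>To see these pieces together, the sketch below allocates a <code>pty</code>, forks a child that starts a new session, and has the child open the slave side without <code>O_NOCTTY</code>. It assumes Linux, where that open is enough to make the terminal the session&#8217;s controlling terminal (other systems may need an explicit <code>TIOCSCTTY</code> ioctl), and the function name is hypothetical:</p>

```python
import os

def foreground_group_of_new_session():
    """Return (pid, pgid, foreground pgid) for a child that adopts a fresh pty."""
    master_fd, slave_fd = os.openpty()
    slave_name = os.ttyname(slave_fd)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.setsid()  # new session, which has no controlling terminal yet
            # Opened without O_NOCTTY, so on Linux it becomes our controlling tty.
            tty_fd = os.open(slave_name, os.O_RDWR)
            report = f"{os.getpid()} {os.getpgrp()} {os.tcgetpgrp(tty_fd)}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, pgid, foreground = map(int, reader.read().split())
    os.waitpid(pid, 0)
    os.close(slave_fd)
    os.close(master_fd)
    return child_pid, pgid, foreground

child_pid, child_pgid, foreground_pgid = foreground_group_of_new_session()
```

<p>The child is the leader of both its session and its process group, and <code>tcgetpgrp</code> should report that group as the terminal&#8217;s foreground process group.</p>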

<h2>Job control</h2>

<p>We&#8217;ve now got most of what we need to understand job control in your
shell.</p>

<p>When processing each command line, the shell uses <code>setpgid</code> to place
all of the programs executed by the line into the same process group,
and then calls <code>tcsetpgrp</code> to make that job the foreground job, and
does a <code>waitpid</code> to wait on that process.</p>

<p>Thus, when you run a shell pipeline (<code>foo | bar | grep baz</code>), all the
programs in the pipeline are in the same process group, and in the
foreground, which is why a <code>^C</code> kills all of them.</p>

<p>When you <code>^Z</code> a job, all the processes in the process group are
stopped, and the shell&#8217;s <code>waitpid</code> returns, informing it of the status
change. The shell restores itself to the foreground process group, and
marks the job as stopped in the background.</p>

<p>When you use <code>bg</code> to background a stopped job, the shell just uses
<code>killpg(2)</code> to <code>SIGCONT</code> the group. If a job in the background tries
to read from the terminal, the <code>SIGTTIN</code> stops it and the shell&#8217;s
<code>wait</code> detects the state change and adjusts the job&#8217;s state
appropriately.</p>
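
<p>The stop and continue half of that dance can be sketched without a terminal at all. The code below assumes Linux and a made-up function name. It uses <code>SIGSTOP</code> where <code>^Z</code> would generate <code>SIGTSTP</code> (both stop the job by default, but <code>SIGSTOP</code> cannot be ignored, which keeps the sketch predictable), and it skips the <code>tcsetpgrp</code> calls a real shell would make:</p>

```python
import os
import signal

def stop_and_resume_job():
    """Put a child in its own process group, then stop, continue and kill it."""
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)  # become leader of a new process group (the "job")
            while True:
                signal.pause()  # idle until a signal ends the process
        finally:
            os._exit(1)
    # Like a shell, the parent sets the group as well, so that the group
    # exists no matter which of the two processes runs first.
    os.setpgid(pid, pid)
    events = []
    os.killpg(pid, signal.SIGSTOP)  # stand-in for the SIGTSTP sent by ^Z
    _, status = os.waitpid(pid, os.WUNTRACED)
    if os.WIFSTOPPED(status):
        events.append("stopped")
    os.killpg(pid, signal.SIGCONT)  # what "bg" does
    _, status = os.waitpid(pid, os.WCONTINUED)
    if os.WIFCONTINUED(status):
        events.append("continued")
    os.killpg(pid, signal.SIGKILL)  # clean up the job
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        events.append("terminated")
    return events

job_events = stop_and_resume_job()
```

<p>The <code>WUNTRACED</code> and <code>WCONTINUED</code> flags are what let <code>waitpid</code> report state changes other than exit, which is how the parent learns that the job stopped or resumed.</p>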

<p>If you launch a job in the background, the shell simply doesn&#8217;t
<code>tcsetpgrp</code> it into the foreground, nor <code>wait</code> on it.</p>

<h2>In conclusion</h2>

<p>That&#8217;s probably all I want to say about termios. I could talk more about terminal emulation, ncurses, <code>$TERM</code> and friends, but it&#8217;s less interesting to me &#8212; I think I&#8217;m a kernel hacker at heart, and that stuff is just userspace programs talking to each other at this point. I hope you found this series interesting and/or informative, and I&#8217;m always happy to answer questions.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.nelhage.com/2010/01/a-brief-introduction-to-termios-signaling-and-job-control/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

