Some musings on ORMs

I’m pretty sure every developer who has ever worked with a modern database-backed application, particularly a web-app, has a love/hate relationship with their ORM, or object-relational mapper.

On the one hand, ORMs are vastly more pleasant to work with than code that constructs raw SQL, even, generally, from a tool that gives you an object model to construct SQL, instead of requiring (Cthulhu help us all) string concatenation or interpolation.

On the other hand, there tends to be an enormous mismatch between the objects the ORM is modelling on one end, and the relational database on the back end. Basically, the object model encourages dealing with one object at a time, whereas the relational database wants you to push your queries, JOINs, and even your entire transactions back into it, so that it can better optimize them for you. And so, you tend to have a choice between writing shockingly inefficient code, and jumping through weird hoops and strange syntax to give your ORM the tools to generate efficient queries.

The most fundamental problem, perhaps, is the question of how much data the ORM should request from the database. When I ask for a collection of objects (using some syntax to generate the SQL WHERE clause automatically), the ORM has to decide:

  • Which fields in the object should the ORM SELECT from the database, and
  • Which associated models (with foreign keys to or from the object in question) the ORM should JOIN in and pull out at the same time.

(You can think of the second as a more involved version of the first, but involves more complexity, and is interesting for being potentially recursive).

SELECTing too much data in either case makes the database do too much work, touch too many blocks on disk, read too many indexes, and so on, and makes your queries slow. SELECTing not enough data means your ORM needs to go back and make additional queries when you ask for fields or models that were not pulled in initially, killing your performance even more.

I’d like to muse about two possible solutions, neither of which I’m aware of implementations of, although to be honest I haven’t looked too hard. I’d love it if anyone could point me at work on either sort.

The static solution

Given a language with a sufficiently awesome type system (Haskell comes to mind :)), you could imagine encoding information about which fields and models are or are not SELECTed into the type of a record object, making it statically an error to write a query that under-selects. Given type inference, we can probably even go on to statically infer (probably with a bit of programmer help in the form of type annotations) which fields and models we need to SELECT, based precisely on which fields of the result get accessed.

This is the kind of solution that occurs immediately to a certain type of person (of which I know a lot) when they think about this type of problem enough. It’s fun to think about and there are probably some papers in it (if no one’s beaten you to it), but I suspect it’s the kind of thing that’s unlikely to gain much traction among mainstream developers, if only because it would require you write in Haskell or equivalent.

The dynamic solution

Another solution occurred to me recently, which I don’t think I’ve heard people talking about. I like it because I’m pretty sure it could be grafted onto an existing ORM with little or no API change, which means there’s potential for incrementel adoption and improvement.

The key insight here is that, while the decision of what to SELECT is critical for performance, it’s (mostly)1 irrelevant for correctness whether you have to make additional queries later, or pull in too much data. A good ORM, if it notices you ask for a field it doesn’t have cached, will just go ahead and make the SELECT for you.

This opens the door for heuristic solutions. A good ORM will already have hooks for the programmer to specify which fields or models to include or exclude — Django, to pick a popular example, has the select_related and defer methods on its QuerySets.

So, my proposal is this: Let’s develop tools to profile an application, and automatically generate those annotations in a profile-driven manner. Since all accesses to your data are through the ORM, we can instrument the ORM to determine which fields are selected after a query, and remember that information for next time it sees the same query.

In a webapp environment, the possibilities for optimization get potentially even more exciting. A webapp framework like Django knows an awful lot of information at any given moment, like what user is logged into your app, and what URL is currently being generated. We could imagine associating that data with the profile results, so that the ORM could automatically discover relations between the application structure and the ORM queries. It could “learn”, for example, that certain users are administrators, and so will access more fields on certain models than other users. It could learn that even though /places/list/ and /places/details/ pull up the exact same set of models (maybe even through the same helper function!), the former needs only their names and IDs, while the latter needs to pull in full details, as well as rows from the associated reviews table.

All without a single programmer annotation. Of course, in cases where the behavior is too complex for the tool’s heuristics to pick up, or where the programmer absolutely needs predictable behavior or needs to micro-optimize, you could turn off the heuristics and fall back the old behavior (with the same old annotations available to the programmer).

Is anyone aware of any work in this space? It sounds like a potentially extremely exciting project to me, but I probably don’t have the time to attempt to throw together an implementation.

1.Ignoring, for example, race conditions where you need to select a model and related models atomically, and don’t want or can’t afford a transaction around the entire lifetime of the objects in the app code.

Implementing a declarative mini-language in the C preprocessor

Last time, I announced Check Plus, a declarative language for defining Check tests in C. This time, I want to talk about the tricks I used to implement a declarative minilanguage using the C preprocessor (and some GCC extensions).

The Problem

We want to write some toplevel declarations that look like:

#define SUITE_NAME example
BEGIN_SUITE("Example test suite");

#define TEST_CASE core
BEGIN_TEST_CASE("Core tests");
…

and so on, and somehow translate them into code that does the equivalent of:

Suite *s = suite_create ("Example test suite");
TCase *tc_core = tcase_create ("Core tests");
…

First steps

Instead of having our macros somehow lay down code directly, we’ll use them to define some global data structures, which we will then loop across at runtime, and make the appropriate Check library calls. We’ll define some structs that we’re going to use for this purpose:

typedef void (*test_func)(int);
typedef void (*test_hook)(void);

struct test_case {
    test_func  *funcs, *end_funcs;
    const char *test_name;
    test_hook *setup_hooks, *end_setup_hooks;
    test_hook *teardown_hooks, *end_teardown_hooks;
};

struct test_suite {
    struct test_case *cases, *end_cases;
    const char *suite_name;
};

Since tests and test fixtures are code, we’re going to need to keep function pointers to them. We use typedefs to keep thos readable, and then we define structs to represent a test suite, and a test case. We’re going to represent arrays (of test cases, fixture hooks, or test functions) using a pair of pointers, to the start and end of the array. There’s a specific reason for this representation, which I’ll get to shortly.

ELF Sections

We can easily define global instances of these structs, but the problem is how to get the pointers to the child arrays correctly. These macros are going to be interspersed with other code, so we can’t just use a literal array, like

struct test_suite s = {
 .suite_name = "Example test suite",
 .cases = {
   …
 }
};

because that “…” is going to have arbitrary code shoved in it.

The solution is to take advantage of a feature of the ELF object file format, and some GCC extensions that let us take advantage of it.

ELF object files can contain an arbitary set of named sections, each of which can contain code or data (or both, but that’s not typical). GCC allows us to specify that a given variable should live in a given section using __attribute__((section("section_name")). So, by placing all our struct test_cases for a given test_suite into a separate section, we can cause them to get placed contiguously, which means we just need to get pointers to the start and end. That we can accomplish by declaring a zero-length array in the appropriate section, which (being of size zero) won’t generate any data, but will have the appropriate address.

So, the code we’re going to generate for the above will look something like:

struct test_case _begin_example_cases[];
struct test_case _end_example_cases[];
struct test_suite _suite_example = {
  .cases = _begin_example_cases,
  .end_cases = _end_example_cases,
  .suite_name = "Example test suite"
};
struct test_case _begin_example_cases[0]
  __attribute__((section(".data.example.cases")));

struct test_case _test_example_core
  __attribute__((section(".data.example.cases"))) = {
  .test_name = "Core tests",
  …
};

…

struct test_case _end_example_cases[0]
  __attribute__((section(".data.example.cases")));

Preprocessor tricks

So far, I haven’t actually talked about the preprocessor at all. I’ve just shown the code we want our macros to generate. So now let’s get to the business of actually generating this code. I’ll walk through the definition of `BEGIN_SUITE’, which expands to the code above. The other definitions are all essentially analagous.

My first version of this library didn’t use the SUITE_NAME and TEST_CASE macros, instead requiring those arguments to be passed to every macro. I like the decreased repetition that this method gives, but we’ll start with the explicit version, since it requires less trickery to understand.

From the above example, we can see that BEGIN_SUITE(example, "Example")' is given theexample’ token, and needs to construct both symbols (_suite_example) and strings (".data.example.cases") containing that symbol. To do the first, we need to the processor operator ##, which concatenates two tokens into one. For example, given

#define PASTE(x,y) x##_##y

`PASTE(foo, bar)’ is equivalent to just having written “foo_bar” literally in your code.

Secondly, to construct strings, we need the #x' preprocessor operator. This operator, when applied to an argument to a macro, returns a string literal containing the tokens passed to the macro. A common usage of this is for anassert’ macro, that can print the failed expression:

#define assert(x) if (!(x)) { die("Assertion failed: %s\n", #x); }

And finally, we’re going to need the implicit concatenation feature of the C language, which lets you type "foo" "bar" with the same meaning as “foobar”.

Putting it all together, and we get:

#define BEGIN_SUITE(sym, name)                                          \
    struct test_case _begin_ ## sym ## _cases[];                        \
    struct test_case _end_ ## sym ## _cases[];                          \
    struct test_suite _suite_ ## sym = {                                \
        .cases       = (struct test_case*)_begin_ ## sym ## _cases,     \
        .end_cases   = (struct test_case*)_end_ ## sym ## _cases,       \
        .suite_name  = name,                                            \
    };                                                                  \
    struct test_case _begin_ ## sym ## _cases[0]                        \
    __attribute__((section(".data." #sym ".cases")))

Two last things to note: we deliberately leave off a trailing semicolon, requiring the user to provide one, and we need to end every line with a \, to include the entire body in the macro definition.

Another layer of indirection

The one final bit is to get rid of that explicit sym argument everywhere, in favor of the SUITE_NAME define. Our first try might look something like:

#define BEGIN_SUITE(name)                                               \
    struct test_case _begin_ ## SUITE_NAME ## _cases[];                 \
    …

You’ll quickly realize, however, that since ## is applied at the same time as function-like macro expansion, this will always give you the symbol _begin_SUITE_NAME_cases. The solution is to force the preprocessor to perform multiple phases of macro expansion, to cause the ## to be applied only after SUITE_NAME is expanded. In this case, we require three layers of macros to get the order right:

#define BEGIN_SUITE(name) _BEGIN_SUITE(SUITE_NAME, name)
#define _BEGIN_SUITE(sym, name) __BEGIN_SUITE(sym, name)
#define __BEGIN_SUITE(sym, name)                                        \
    struct test_case _begin_ ## sym ## _cases[];                        \

This causes the following sequence of expansions for BEGIN_SUITE("Example");:

BEGIN_SUITE("Example");                     // (a)
_BEGIN_SUITE(SUITE_NAME, "Example");        // (b)
__BEGIN_SUITE(sym, "Example");              // (c)

The key here is that when expanding a function-like macro, the preprocessor performs the following steps, in order:

  1. Apply any # or ## operators.
  2. Macro-expand any arguments to the call
  3. Perform the substitution
  4. Macro-expand the result of the substitution

In going from (a) to (b), only step (3) applies. When going from (b) to (c), rule (2) means that SUITE_NAME gets expanded first, leaving (c). Rule (1) means that the _BEGIN_SUITE macro is necessary, since a call to __BEGIN_SUITE(SUITE_NAME, "Example") will get the # and ## applied to SUITE_NAME before it can be expanded.

Conclusions

So, that’s basically all there is to the Check-Plus implementation. If you’re comfortable with that, you can try your hand at unravelling the implementation of the Linux kernel’s tracing macros, next (sample usage)!

Jul 4th, 2010

Check Plus: An EDSL for writing unit tests in C

Check is an excellent unit-testing framework for C code, used by a number of relatively well-known projects. It includes features such as running all tests in separate address spaces (using fork(2)), which means that the test suite can properly report segfaults or similar crashes without the test runner crashes.

My main complaint about Check is that (unsurprisingly for a framework written in C), it’s not very declarative. After you define all your tests as separate functions, you need to write code to manually collect them into “test cases”, which you then collect into “test suites”, which you can then run. While this is an understandable interface for a C library to provide, as a user I found that it rapidly annoyed me, and so I wrote Check Plus.

Check Plus is a simple library I wrote that uses some preprocessor tricks and GNU C extensions to provide a declarative EDSL within C for declaring and collecting your test cases. Check-Plus is a single self-contained header file distributed under the LGPL (like Check), so it should be easy to pull into your project.

Example

Check-Plus lets you declare your Check suites and test cases in a declarative manner. A simple example, shipped with Check-Plus, would look like:

#include <string.h>
#include <stdlib.h>

#include "check-plus.h"

#define SUITE_NAME example
BEGIN_SUITE("Example test suite");

#define TEST_CASE core
BEGIN_TEST_CASE("Core tests");

TEST(strcmp)
{
    fail_unless(strcmp("hello", "HELLO") != 0);
    fail_unless(strcasecmp("hello", "HELLO") == 0);
}
END_TEST

TEST(strcat)
{
    char buf[1024] = {0};
    strcat(buf, "Hello, ");
    strcat(buf, "World.");
    fail_unless(strcmp(buf, "Hello, World.") == 0);
}
END_TEST

END_TEST_CASE;
#undef TEST_CASE
END_SUITE;
#undef SUITE_NAME

Having done that, you can extract the declared test suite using construct_test_suite(example), and run it using the standard Check test runner tools.

You can, of course, define multiple test cases within a single suite, and you can even define test fixtures using

DEFINE_FIXTURE(setup, teardown);

within the BEGIN_TEST_CASEEND_TEST_CASE block.

The equivalent code, without Check-Plus, would look similar, but instead of the BEGIN_ and END_ blocks to define the suite and test case, you’d have to write something like:

Suite *
example_suite (void)
{
  Suite *s = suite_create ("Example test suite");

  /* Core test case */
  TCase *tc_core = tcase_create ("Core tests");
  tcase_add_test (tc_core, test_strcmp);
  tcase_add_test (tc_core, test_strcat);
  suite_add_tcase (s, tc_core);

  return s;
}

It’s not a huge difference, but it’s big enough that I really appreciate it once the test suite gets large. For more documentation on Check, check out its manual.

Let me know if you find this useful for anything. Next week, I’ll do a writeup on the tricks behind the implementation.

Jun 26th, 2010

Lab Notebooking for the Software Engineer

A few weeks ago, I wrote that software engineers should keep lab notebooks as they work, in addition to just documenting things after the fact. Today, I’m going to share the techniques that I’ve found useful to try to get in the habit of lab-notebooking my work, even though I still feel like I could be better at writing things down.

Here’s my advice for keeping a lab notebook as a computer scientist:

Use a text file

There’s no need to use anything fancier. And if you’re writing code all day, you can just keep your log file open in your editor of choice, so that you don’t need to leave your hacking environment to make a note of something. I’m an emacs junkie and use org-mode to keep all my notes, but use whatever you’re comfortable with.

You could use a physical notebook, but unless you expect to want documentation of what you were working on for some legal reason, I don’t see the point. And most programmers I know hate writing things out by hand.

Treat your file as append-only
One of the downsides of just using a text file is that you need to enforce some discipline on yourself. It’s tempting to go back and fix up things that have changed, or to keep editing a desgin description as it evolved. But creating a perfect finished piece of documentation is a different task. The point here is to have a log of everything you do, not a single finished document. You can work on one side-by-side with the (dated or even timestampped) append-only log.
Keep automatic backups.
Do this any way you want. I keep my log files in git, and have a cron job that commits and pushes to a server every five minutes. This provides at least two benefits. First, it applies an automatic timestamp (to within five minutes) on anything I type, and second, it serves to make sure that a recent version is available on my server, in case I switch machines and want to keep working on a project.
Train yourself to make regular entries.

Keeping this nice text file, regularly backed up, is useless if you don’t record relevant things into it. Getting yourself into the habit of writing enough down is tricky. One strategy I’ve found helpful is to write down a list of “triggers” for situations that you promise to always record. With an explict list of these that you’ve decided to always record, you can get good at making a note whenever you perform one of these tasks, which gets you into the habit of writing things down. Here’s a partial list of some of the sorts of things I make a point of writing down:

  • Whenever I take a measurement. That is, any time I run time, or look at top to figure out how much memory a program under development is using, or similar, I’ll write down the number and the circumstances under which it was recorded. I’ve found that I almost always would like to have a better record of such things later.
  • Whenever I do a non-trivial search and find a result. This can be anything from googling for a solution to some problem where it takes me several tries to find the right keywords, or grepping a large codebase searching for the function that performs some task, or tracking down some API function in documentation. If I wanted it once, I’ll often want it later, and if it was hard to find, I’ll feel stupid if I didn’t write it down.
  • Whenever I make an explicit design decision. If I’m thinking about how to design some function, system, process, or whatever, I’ll stop and take a moment to write down the options I was considering, and why I made the choice I did. This ensures I can go back and remember why I built a system the way it is.
  • Whenever I wish I had written something down last time. This should be obvious, but if you find yourself thinking “I know I did this before; Why didn’t I take notes?”, you should always make a point of writing it down this time, no matter what excuse you come up with for why you really won’t need that information a third time.

These categories are deliberately broad, but at the same time well-defined. It’s fairly easy to train yourself to notice whenever any of these points triggers, and remember to stop and write something down. And that’s the first step to getting in the habit of writing down everything that you’ll later wish you had written down.

Jun 20th, 2010

WordPress tricks: Disabling editing shortcuts

One of the major reasons I can’t stand webapps is because I’m a serious emacs junkie, and I can’t edit text in anything that doesn’t have decent emacs keybindings.

Fortunately, on Linux, at least, GTK provides basic emacs keybindings if you add

gtk-key-theme-name = "Emacs"

to your .gtkrc-2.0. However, some webapps think that they deserve total control over your keys, and grab key combinations for a WYSIWYG editor of some sort. And so whenever I try to edit a post in WordPress (most of them are written in emacs and then copied over), I find myself trying to go backwards a word, and inserting random <strong> tags all over my post (Because M-b is bound to make text bold, by WordPress’s editor). I finally got annoyed enough to do some source-diving, and discovered that WordPress’s editor constructs keyboard shortcuts using the HTML accesskey attribute. This is easy enough to manipulate from Javascript, so I went and wrote up a quick Greasemonkey user script. The bulk of it is a simple XPath:

    var buttons = document.evaluate('//input[@type="button"][@accesskey]', poststuff);
    var button;

You can install the script off of nelhage.com.

Let me know if you find this useful, or if anyone figures out a general way to disable (sets of) keyboard shortcuts for websites, without relying on knowing the specific tricks that a website uses.

Confessions of a programmer: I hate code review

Most of the projects I’ve been working on today have fairly strict code review policies. My work requires code review on most of our code, and as we bring on an army of interns for the summer, I’ve been responsible for reviewing lots of code. Additionally, about five months ago BarnOwl, the console-based IM client I develop, adopted an official pre-commit review policy. And I have a confession to make: I hate mandatory code review.

I should be clear that I think I still believe in code review as a useful tool for developers working together on a project. A reviewer will flag as bad style or inefficient or ugly things that you let slide working for yourself. A reviewer comes at code without the assumptions of how it’s supposed to work that you made, and can often spot errors you made because mixed up a mental model of your intent with the code you actually wrote.

In addition, code review is a great way to ensure that at least two people are familiar with each piece of your infrastructure. There is often a natural tendency for different developers to take ownership of specific pieces of code or infrastructure, and be the only ones who touch or maintain it. But then it breaks while they’re on vacation, and everyone else is left trying to figure out how this code was ever supposed to work. A mandatory code review policy is often a great way to mitigate that class of problem.

But, theoretical and observed benefits of code review aside, speaking personally, as both a developer and as a reviewer, I’ve been finding it a really frustrating process.

As an author

As a developer, I hate that code review adds unpredictably long latencies into my development workflow. Once I’ve finished and tested a feature to my satisfaction, and sent it off for code review, I have to wait for a potentially long time before I can actually mark it as done. This means that, when deciding what to do next, I have essentially three choices:

  1. Busy-wait. Get coffee, read reddit, and check my mail until the review request comes back.
  2. Continue development on top of the code I just sent for code review.
  3. Work on something completely different.

All three of these options suck. (1) is convenient if I can expect the review to come back shortly, since doing something idle like reading reddit or checking mail lets me keep the code in the back of my mind, so I don’t have to page it back in when the review response comes back. But obviously it’s inefficient, wasted time, and unacceptable if I don’t expect a reply within an hour at most.

(2) is often what I want to be doing. I’ll often be working on a project with several logically distinct but related stages. It often makes sense to send out a review request for each, since they can be deployed and reviewed separately.

However, if a reviewer comes back with significant comments on the code I sent out, I now not only need to update that code, but I also need to rebase the work I’ve done since then on top of the result, which may be a real pain if I’m working on something closely related (e.g. if my work used APIs I built previously, and the reviewer asks for changes in those APIs in some way).

Option (3) avoids both problems, but means that I’m continually swapping different projects in and out of focus. This slows me down, since I have to constantly re-remember where I was in each project, which APIs I was using, and so on. Any developer can tell you that they hate switching between different projects too frequently.

Option (3) might be more tolerable with large projects, where development/review cycles are on the order of weeks. But those aren’t the projects I’m working on.

As a reviewer

Reviewing code is one of those things that I would enjoy if I had infinite time, but that I find a nuisance when I don’t. I do enjoy reading other peoples’ code and figuring out how it could be better, and giving feedback to help get it there.

But doing so well takes a lot of time, and a lot of time is something I rarely have these days. In addition, because of my above complaints about dealing with code review as a developer, I’m always acutely aware that someone is probably waiting on my reply in order to get work done. So I always feel rushed to reply to code review requests, as well.

So, even though I always feel like code review should be something I’m enjoying, I tend to find it a frustrating experience where it just feels like a task I have to do before I can get back to real work.

This is, of course, in part just a symptom of being too busy, but code review makes it a task that bugs me more than other time drains. If a customer reports a critical bug that I have to drop everything to investigate, I’ll be annoyed, but I at least feel like I’m doing something important that fixes a real problem and hopefully ends with a happy customer. If I spend a day doing nothing but reading code reviews, I’ll end up feeling unsatisfied and unproductive. Because code review feels fundamentally optional — even though I believe it’s beneficial, it’s something we’ve chosen to do, not something that we absolutely have to do in order for the project or business to keep operating — it’s more frustrating to find myself spending a large amount of time on.

In conclusion

I believe in code review as a powerful tool. But in practice, I’ve found it frustrating to work with. I’d like to hear your thoughts: Does code review work for you? Am I doing it wrong, in some ways? Is it just a question of changing my attitude in some way?

I know that a lot has been written about doing code review effectively. I’d appreciate pointers to anything you’ve found particularly compelling.

Using X forwarding with screen by proxying $DISPLAY

If you’re reading this blog, I probably don’t have to explain why I love GNU screen. I can keep a long-running session going on a server somewhere, and log in and resume my session without losing any state.

I also love X-forwarding. I love being able to log into a remote server and work in a shell there, but still pop up graphical windows (for instance, gitk’s) on my local machine when I need to.

Unfortunately, X-forwarding and screen don’t totally play nice together. When I start a screen session, I fix a value of DISPLAY in that session, while I can change it in individual shells, having to remember to do so whenever I open a new ssh session is irritating. Ideally, of course, we’d have something like screen, but for X11, so that X sessions could live in a virtual X server on your machine, which gets forward to a real X server on demand. I hear that NX does something like this, or even just a VNC window.

But in practice, I find I tend to care less about having my X windows long running. If I’m popping up a gitk to look at some commits, I will probably just close it, and don’t care about it being there tomorrow. Really, I just want a way for DISPLAY to magically track the latest X-forwared DISPLAY, so that in any window in my screen session, I can run gitk or display or such, and it will magically pop up in the appropriate X server.

So, last week, I finally wrote a script that lets you do just that. Instead of futzing with DISPLAY, it works by proxying between two values of DISPLAY. You initially open a screen with some dummy “virtual” DISPLAY that nothing is connected to — I tend to use :15:

env DISPLAY=:15 screen

Then, whenever you log in to the machine with X forwarding (or log in locally), you simply from:

proxy-display :15

proxy-display starts listening for connections on display :15, and proxying traffic between there and whatever $DISPLAY was when it was launched. In addition, it makes the appropriate xauth incants so that everything just works. It also looks for any old instances of itself listening on :15, and kills them off, to prevent leaking processes. Any connections to old instances of itself, however, that had accepted connections and were now proxying data, are left alone – so existing X windows stay open on whatever display they’re on.

So, now you can just add to your dotfiles something along the lines of

if [ "$DISPLAY" ]; then
    proxy-display :15
fi

And whenever you ssh into your remote machine and resume your screen sessions, you can run X programs and they’ll magically pop up in your most recent X-forwarded connection.

(The script, available on github, is currently a disgusting mess of shell that uses socat to do the actual proxying, and mucks around in /proc/net to find old instances of itself to kill. But it works wonderfully, so I haven’t bothered to clean it up)

May 30th, 2010
Tags:

Getting carried away with hack value

Recently, I’ve been working on some BarnOwl branches that move more of the core functionality of BarnOwl into perl code, instead of C (BarnOwl is written in an unholy mix of C and perl code that call each other back and forth obsessively).

Moving code into perl has many advantages, but one problem is speed – perl code is obvious a lot slower than C, and BarnOwl has a lot of hot spots related to its tendency to keep tens or hundreds of thousands of messages in memory and loop over all of them in response to various commands.

Unfortunately, one downside of the C/perl mix is that it makes profiling quite difficult. I can run the perl side under NYTProf and get a good picture from the perl side of things, but I’ve been unsatisfied about my visibility into the C side of things. The main problem is that oprofile and gprof, my usual tools are statistical profilers, which means that if they sample while the code is executing inside Perl, they don’t know which C code called it, and so while I can tell if a lot of time is being spent in perl, I can’t tell which C functions made the calls that are being slow. What I really want is a deterministic profiler that tracks every function entry and exit, so that time spent in a perl call can be included in the total time of the C functions earlier in the call chain.

After doing some quick googling and finding nothing compelling, I decided to hack up something of my own — this can’t be that hard, right?

The tricky part here is getting a way to intercept function calls and returns, so that you can timestamp them and then figure out where the time was being spent. My brilliantly hackish plan was:

  • If you compile your code with -pg, for profiling with gprof, GCC generates a call to the mcount function early in the prologue of every function. Normally that calls out into a function in glibc that keeps stats for gprof, but I can override that with LD_PRELOAD.
  • The next step is trapping function returns. I wrote a quick program that loads an ELF using libbfd, disassembles any text sections using udis86, and replaces any RET instructions with the one-byte INT3 instruction, which generates a debug trap.

    I could then make my preloaded library install a SIGTRAP handler, which could inspect the trapped instruction pointer, emulate the RET, and then return.

The two parts work independently, but when I glued them together, I hit trouble. It turns out that gcc -pg dumps some code into the .text section of your binary, that ends up getting called into very early in the shared-library load process — in particular, before my LD_PRELOADed library could install its SIGTRAP handler. Since I had patched those functions to INT3 instead of RET, this crashed the binary with an unhandled SIGTRAP almost immediately.

After some debugging, I was able to resolve that issue by modifying the patch program to overwrite the first byte of the offending programs with a RET instruction (after patching out RETs, of course), so that they effectively didn’t get called. Since I’m using gcc -pg only for the mcount calls, I don’t want them, anyways.

After about four hours of hacking, I had a working system which could take an executable and run it, generating a trace of function calls and returns as the program executed. As I was settling down to write some code to do something useful with the output, my friend Josh Oreman asked a question: “Couldn’t you do this more easily with -finstrument-functions?”

Having never heard of this, I pulled up the GCC info page, and found:

 Generate instrumentation calls for entry and exit to functions.
 Just after function entry and just before function exit, the
 following profiling functions will be called with the address of
 the current function and its call site.

      void __cyg_profile_func_enter (void *this_fn, void *call_site);
      void __cyg_profile_func_exit  (void *this_fn, void *call_site);

Oh. Sometimes it pays to do a little bit of documentation searching before you get carried away with the hack value of your Awesome Solution — someone may have anticipated your problem and written a real solution.

Oh well. It was a fun experiment, and I refreshed my memory about doing evil things with bfd, udis86, signals, and the dynamic linker.

I still haven’t yet written the code to turn traces into userful profiling information, but if I do produce a useful profiler tool, I’ll post something on this blog.

May 23rd, 2010
Tags:

The Window Manager I Want

Since I first discovered ratpoison in 2005 or so, I’ve basically exclusively used tiling window managers, going through, over the years, StumpWM, Ion 3, and finally XMonad. They’ve all had various strengths and weaknesses, but I’ve never been totally happy with any of them. This blog entry is a writeup of what I want to see as a window manager. It’s possible that some day I’ll get annoyed enough to write it, but maybe this post will inspire someone else to (Not likely, but I can hope).

Layout

At any given moment, the screen1 is divided into one or more panes. These panes always tile the screen. Each pane may or may not currently be displaying a window. If it is not, then the desktop will be displayed in that pane.

The primitive operations you can perform on the pane include, but are not necessarily limited to:

Split
Splits the focused pane into two equally-sized panes, either horizontally or vertically. One of the child panes will display whichever window the previous pane displayed, the other is empty.
Resize
Change the relative size of two child windows that were split from the same parent.
Kill Pane
Destroy the current pane. If it is displaying a window, that window is not sent a close message. The pane’s sibling window expands to consume its space.
Next/Previous pane
Focus the next or previous pane. Panes are ordered based on position on the screen, regardless of how they were split to arrive at the current layout.
Goto Pane
Panes are numbered, based on their location on screen. Mod-N focuses the Nth pane.

Selecting windows

There is a global list of all windows. Windows are ordered arbitrarily in this list (maybe based on the order they were opened?). Every window has a name, which is initially set to the WM_NAME property set by the window’s client.

The following operations are available to manipulate windows:

Next/Previous Window
Replace the window in the current panel with the next or previous window in the list.
Rename
Prompts for a new name for the current window. If the user enters an empty string, the window reverts to the default behavior of using the WM_NAME.
Goto Window
Prompts for the name of a window to switch to. This prompt matches on substrings or even sub-sequences of the window name, displaying the result of the selection as you type. After typing some characters, you can scroll through the list to select an entry, instead of completing typing.
Kill
Sends a close message to the window in the focused panel.

Desktops

It seems likely I will want multiple desktops. Each desktop will have its own pane layout. However, the window list is still global, not per-desktop. Goto Window will always operate on the global window list. Selecting a window that is visible through a pane somewhere else causes that pane to become empty.

Alternately, there is no concept of multiple desktops. However, there is the ability to save and restore layouts, meaning both the layout of the panels on the screen, and the list of which window is in each pane. The primary difference here is that a window can be associated with multiple panes in different saved layouts, and gets resized/moved as necessary as you switch panes. I suspect I like this model better.

Postscript

If this sounds familiar to you, it probably should. What I’ve described is essentially identical to how emacs manages buffers and windows (analogous to X windows and panes in the above, respectively), with window-number.el and either iswitchb or ido. I manage hundreds of buffers in emacs this way, and complicated screen layouts, whenever I’m doing any hacking, and I love it.

I would in fact be tempted to write my window manager into emacs itself, except for the annoying fact that emacs is very much single-threaded. It’s already annoying enough when network drops and a hung network filesystem takes down my emacs waiting for a timeout; It would be utterly unacceptable if that took down my entire window manager, too.

I’d alternately be tempted to try to make this an XMonad plugin, but I think that it’s sufficiently different from XMonad’s data model that the impedance mismatch would suck.

Footnotes:

1 I’m only going to consider the single-monitor case for now, but the generalization should be easy.

Software Engineers should keep lab notebooks

Software engineers, as a rule, suck at writing things down. Part of this is training – unlike chemists and biologists who are trailed to obsessively document everything they do in their lab notebooks, computer scientists are taught to document the end results of their work, but aren’t, in general, taught to take notes as they go, and document the steps they take in building a system. 6.005, MIT’s new introductory software engineering class, attempted to require its students to keep lab notebooks for a few semesters, and was met with near-universal complaints and ridicule from the students (“Lab notebooks? For a software engineering class? What the hell?”). (To be fair, I suspect they did a horrible job of it, but I’m not sure that students would have been any less confused at the idea).

Part of the reason is probably also the nature of software that makes it very easy to record certain things as part of our tools, and that makes experiments cheaply reproducible. Version control lets us document the process of developing a piece of code, so it feels superfluous to be taking additional notes on the side. Computers are mostly deterministic, and cycles are cheap, so why bother meticuously recording the results of a test run somewhere when you can just run it again later, any time you want? Computers feel much neater and simpler than messy bio or chem labs, and software is much simpler than complicated biology experiments or chemical syntheses, and so no one feels the need to be nearly as careful.

However, I am increasingly of the opinion that most software engineers’ total inability to work in a lab-notebook style, where you meticulously document your work, is unfortunate and often seriously detrimental to their work. While it’s true that things like commit logs do a good job of documenting certain processes, here are some types of situations where I’ve found working with meticulous notes at every step can be invaluable:

Debugging subtle problems
Debugging is very much a problem of gathering data and making and testing hypotheses. For subtle bugs in large programs, the amount of state you need to keep track of can rapidly get out of hand. And good luck when a bug is tricky enough that your debugging gets spread across multiple days, or even across a lunch break.

If you’ve ever found yourself wondering “Wait – did I see the bug after I made $CHANGE to my code or test environment?”, you should have been writing more things down.

This is especially important for non-deterministic bugs, such as rare race conditions. If it takes you half a hour on average to reproduce a bug, and you are experimenting with a dozen different variables in your test environment that might affect the bug, you can’t afford to forget the results of a single test, or to forget a single detail of what you did to test. This is the point at which you should be writing down every single command you type in any relevant prompts, and every single code change (or, since we have technology, obsessively saving the output of `history`, making commits to test branches, and recording the correlation between them).

Profiling and optimization
This is a process similar in many ways to debugging, but even more data-driven. When you’re done with a session of optimizing a piece of code or a system, if you can’t show documented evidence of exactly how much faster you’ve made it, where that speedup came from, and all the things you tried and how much they helped, perhaps you should be writing more things down. And if you (or ideally, even someone else) can’t go back and reproduce the experiments you did, with approximately the same results, you probably haven’t been documenting your work well enough.

Even if you’re happy with the performance improvements you’ve made, you may need to come back and wring even more performance out of the system, and it’d be nice not to start from scratch. Or maybe future testing will reveal that or more of your optimizations was invalid, and you need to go back and consider alternate options.

This is critically important when you’re optimizing not just a piece of code, but some kind of system with lots of configuration and setup, that you’ll later have to duplicate somewhere else, instead of just checking the result into source control.

Understanding a new project’s code or documentation
Whenever I’m first diving into a large code-base or first playing with a large new API, I find it invaluable to take notes as I go about what I look for and where I find it. I’ll often need to look up half a dozen different API calls or pieces of code to understand something, often too many to keep in my head as I go and dive through more and more pieces of code or docs.

And when the documentation is ambiguous, I’ll often drop into a REPL or build test programs to make various calls and understand what happens. Again, after more than two or three of these, it’s vital that I’ve been writing down my findings.

This is one example where a chronological style documenting exactly in what order I found things is less critical, but that detailed notes as I go are still vital.

Designing things
Whenever you’re designing something – be it an API, a protocol, an interface, some kind of system, or something else – it’s worth taking notes on the process you took to get to your final decisions, and the choices you considered and rejected, and why.

You’ll presumably end up a producing a piece of code or a design document that indicates what you ended up deciding, but understanding why you made the decisions you made is often important to understanding how your system is supposed to work, and how to best use or extend it in the future. Hopefully, when you’re done, you’d do this writeup in brief somewhere anyways, but the best way to make sure you don’t forget is to take good notes as your thought process happens.

And nothing in software is ever complete. If you have to revise the design for some reason, because someone points out problems or new requirements come up, you’ll probably want to remember the other possibilities you came up with – maybe one of them is now more right.

So, if you’re a software engineer, I strongly encourage you to try to get better at writing things down. In a future post, I’ll hopefully write up the techniques I’ve started using to take notes as I code, debug, and design, but in the meanwhile, I encourage you to just grab a text editor or a physical lab notebook, whichever is more comfortable for you, and start taking more notes on what you’re doing.