Software Engineers should keep lab notebooks
Software engineers, as a rule, suck at writing things down. Part of this is training – unlike chemists and biologists, who are trained to obsessively document everything they do in their lab notebooks, computer scientists are taught to document the end results of their work, but aren’t, in general, taught to take notes as they go and document the steps they take in building a system. 6.005, MIT’s new introductory software engineering class, attempted to require its students to keep lab notebooks for a few semesters, and was met with near-universal complaints and ridicule from the students (“Lab notebooks? For a software engineering class? What the hell?”). (To be fair, I suspect they did a horrible job of it, but I’m not sure that students would have been any less confused at the idea.)
Part of the reason is probably also the nature of software, which makes it very easy to record certain things as part of our tools, and makes experiments cheaply reproducible. Version control lets us document the process of developing a piece of code, so it feels superfluous to be taking additional notes on the side. Computers are mostly deterministic, and cycles are cheap, so why bother meticulously recording the results of a test run somewhere when you can just run it again later, any time you want? Computers feel much neater and simpler than messy bio or chem labs, and software is much simpler than complicated biology experiments or chemical syntheses, and so no one feels the need to be nearly as careful.
However, I am increasingly of the opinion that most software engineers’ total inability to work in a lab-notebook style, where you meticulously document your work, is unfortunate and often seriously detrimental to their work. While it’s true that things like commit logs do a good job of documenting certain processes, here are some types of situations where I’ve found working with meticulous notes at every step can be invaluable:
- Debugging subtle problems
Debugging is very much a problem of gathering data and making and testing hypotheses. For subtle bugs in large programs, the amount of state you need to keep track of can rapidly get out of hand. And good luck when a bug is tricky enough that your debugging gets spread across multiple days, or even across a lunch break.
If you’ve ever found yourself wondering “Wait – did I see the bug after I made $CHANGE to my code or test environment?”, you should have been writing more things down.
This is especially important for non-deterministic bugs, such as rare race conditions. If it takes you half an hour on average to reproduce a bug, and you are experimenting with a dozen different variables in your test environment that might affect the bug, you can’t afford to forget the results of a single test, or to forget a single detail of what you did to test. This is the point at which you should be writing down every single command you type in any relevant prompts, and every single code change (or, since we have technology, obsessively saving the output of `history`, making commits to test branches, and recording the correlation between them).
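One way to record the correlation between test runs and code changes is to stamp every note with the commit it was made against. The following is only a minimal sketch, not a prescribed tool: the function names and the `notebook.log` file name are made up for illustration, and it assumes `git` is on the path (it falls back to a placeholder when it is not, or when there is no commit yet).

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_commit() -> str:
    """Return the short hash of HEAD, or 'no-commit' if git or a commit is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "no-commit"


def format_entry(note: str, commit: str, when: datetime) -> str:
    """Build one notebook line: timestamp, commit hash, free-form note."""
    return f"{when.isoformat(timespec='seconds')} [{commit}] {note}\n"


def log_note(note: str, notebook: Path = Path("notebook.log")) -> str:
    """Append a note tagged with the current UTC time and commit; return the line written."""
    entry = format_entry(note, current_commit(), datetime.now(timezone.utc))
    # Open in append mode so earlier entries are never overwritten.
    with notebook.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Calling something like `log_note("stress test x20, cache disabled: bug reproduced 3 times")` after each run leaves a chronological record in which every observation can be traced back to the exact code it was made against.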
- Profiling and optimization
This is a process similar in many ways to debugging, but even more data-driven. When you’re done with a session of optimizing a piece of code or a system, if you can’t show documented evidence of exactly how much faster you’ve made it, where that speedup came from, and all the things you tried and how much they helped, perhaps you should be writing more things down. And if you (or ideally, even someone else) can’t go back and reproduce the experiments you did, with approximately the same results, you probably haven’t been documenting your work well enough.
Even if you’re happy with the performance improvements you’ve made, you may need to come back and wring even more performance out of the system, and it’d be nice not to start from scratch. Or maybe future testing will reveal that one or more of your optimizations was invalid, and you need to go back and consider alternate options.
This is critically important when you’re optimizing not just a piece of code, but some kind of system with lots of configuration and setup, that you’ll later have to duplicate somewhere else, instead of just checking the result into source control.
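For the optimization case, the "documented evidence" can be as simple as an append-only file with one record per experiment. Here is a rough sketch of that idea, with the caveat that the helper names, the record fields, and the `perf-notes.jsonl` file name are all illustrative assumptions rather than an established convention.

```python
import json
import statistics
import time
from pathlib import Path
from typing import Callable


def measure(func: Callable[[], object], repeats: int = 5) -> list[float]:
    """Call `func` `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label: str, durations: list[float], change: str) -> dict:
    """Reduce raw timings to a record of what was tried and how it performed."""
    return {
        "label": label,
        "change": change,
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
    }


def record(entry: dict, log: Path = Path("perf-notes.jsonl")) -> None:
    """Append one experiment record as a JSON line; earlier results are kept."""
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

A session might then log `record(summarize("baseline", measure(workload), "none"))` before touching anything, and one more record per attempted change (where `workload` stands for whatever callable is being optimized). Attempts that did not help are worth recording too, since they are exactly what gets forgotten.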
- Understanding a new project’s code or documentation
Whenever I’m first diving into a large code-base or first playing with a large new API, I find it invaluable to take notes as I go about what I look for and where I find it. I’ll often need to look up half a dozen different API calls or pieces of code to understand something, often too many to keep in my head as I go and dive through more and more pieces of code or docs.
And when the documentation is ambiguous, I’ll often drop into a REPL or build test programs to make various calls and understand what happens. Again, after more than two or three of these, it’s vital that I’ve been writing down my findings.
This is one example where a chronological style, documenting exactly in what order I found things, is less critical, but detailed notes as I go are still vital.
- Designing things
Whenever you’re designing something – be it an API, a protocol, an interface, some kind of system, or something else – it’s worth taking notes on the process you took to get to your final decisions, and the choices you considered and rejected, and why.
You’ll presumably end up producing a piece of code or a design document that indicates what you ended up deciding, but understanding why you made the decisions you made is often important to understanding how your system is supposed to work, and how to best use or extend it in the future. Hopefully, when you’re done, you’d do this writeup in brief somewhere anyways, but the best way to make sure you don’t forget is to take good notes as your thought process happens.
And nothing in software is ever complete. If you have to revise the design for some reason, because someone points out problems or new requirements come up, you’ll probably want to remember the other possibilities you came up with – maybe one of them is now more right.
So, if you’re a software engineer, I strongly encourage you to try to get better at writing things down. In a future post, I’ll hopefully write up the techniques I’ve started using to take notes as I code, debug, and design, but in the meanwhile, I encourage you to just grab a text editor or a physical lab notebook, whichever is more comfortable for you, and start taking more notes on what you’re doing.
From the Boost library requirements and guidelines “rationale rationale”:
“Beman Dawes comments: Failure to supply contemporaneous rationale for design decisions is a major defect in many software projects. Lack of accurate rationale causes issues to be revisited endlessly, causes maintenance bugs when a maintainer changes something without realizing it was done a certain way for some purpose, and shortens the useful lifetime of software.
Rationale is fairly easy to provide at the time decisions are made, but very hard to accurately recover even a short time later.”
(As a side note, when reading this post in Thunderbird, clicking anywhere in the comments area caused Firefox to open a tab at the submit comments php page, with no content. Not sure if that’s Thunderbird’s fault, or WordPress’s.)
I think this is a place where open-source projects conducted largely online have a big advantage — when all discussion takes place over e-mail and IM, and they’re publicly logged, the project gets a pretty clear record of the decisions the group made without the group having to take much extra effort to do so. Obviously it’s not 100%, but I think it’s better there than in most of the proprietary software I’ve worked on, where things are more often conveyed verbally and thus more likely to get lost or misremembered.
Nice. Two practical lessons in this department:
If you call the lab notebook a wiki, it may help at least some people swallow the pill more easily. The right wiki will also provide history and encourage editing which is a plus. This has proven especially valuable for tasks which are not just conventional coding tasks. Adding to your list: setting up a dev environment (I wish on most platforms this was dominated by aptitude install … but it can get much more complicated), release engineering, operational procedures, etc. Of course, automation can help, but sometimes the cost/benefit favors documenting a manual process.
At my robotics startup, we have adopted the code inspection process from “Code Complete”. It is not without flaws, but in general I believe it has improved quality. Apropos of this post, one part of the review process is for a scribe who is not the author to write down all observed defects. This keeps a reasonable record of changes after the initial dev effort (but of course you’ll need some other mechanism to help with that).
This article was submitted to Lobsters. I wrote a comment there describing how I took something like a lab notebook when I was on co-op.
The problem is with the way that programmers and (for example) chemists are perceived. A chemist will get a project worth $100M, lasting 5 years, needing peer reviews, regulatory test, field trials etc. A programmer will get an hour to produce a week’s work and will only get recognition if he / she is a failure. Until programming is really seen (both by practitioners and their management) as a scientific / engineering discipline and not a quick hack job, you will never be able to add in apparent time wasting tasks such as recording results.
I’ve found notekeeping to be very valuable in debugging both software and hardware issues. In particular, when Googling and adding snippets of search results to my notes, including the URL in said notes proves to be a real plus if I have to revisit an issue at a later date.
At one of my former software development jobs we were supposed to keep a lab notebook for a very different reason than those that you give: Intellectual Property. The company wanted to be able to protect itself in cases of intellectual property disputes – e.g. prove prior art in the case of patent litigation.
I’ve been doing software engineering for about 5 years but all of my training is in mechanical and I 100% agree with this article. In my first full time software job the first thing I did was get a hard cover notebook for keeping a log of my work. I admit that these days I don’t record every single detail, but important things like design decisions, the solutions to very hard to fix bugs, and how-to guides for new APIs and compiling procedures always make it in. One thing that has served me well is to keep my table of contents sections up to date, because when you have a stack of logbooks from years of development, it makes finding something much much easier.
Great ideas.
I have a giant OneNote at work that I’ve been using for the past 4 yrs. It has been invaluable. Would be great if OpenOffice/LibreOffice or some other open source project could make a clone. Any recommendations? I’ve tried using Evernote but it’s just not as robust as OneNote and I can’t have it sync to say DropBox or any other cloud based file storage.
sometimes it’s good to do things old school and just write your notes down in a lab notebook the old fashioned way with a No. 2 pencil. it won’t kill you, but it just might save your a&& some day.
I ended up blogging on my site in reaction to this article, http://rtigger.com/blog/2012/12/12/executable-documentation/
tl;dr – Although taking personal notes does have benefits, I think executable documentation in the form of unit/integration/behaviour tests work better than personal documentation in a collaborative software development environment.
Agreed. Keeping notes is very useful which is why I made ForestPad.
I only half agree with most comments, writing things down for me is not so bad but remembering where the hell I put half of the notebooks or pieces of paper is a nightmare. Was forced to create a work log area on a site a few years ago for tracking progress developments over several projects.
Since that time I’ve adopted it as my own personal “wiki” (nice reference whoever that was) and has helped greatly since I began using it.
I don’t think it really matters how you document your work, just as long as you do and personally find it easy to revise it at any-time.
Yup, you are right.
Idunno … I look at my code. If it’s not obvious, it needs refactoring. If it is, the notes about it are useless.
Of course, this is an overly idealistic presentation of the idea. But that’s what I preach and practice: code should really really really be self-documenting. If you need more documentation than the occasional readme file or some inline comments, there’s something wrong enough that it’s worth fixing in your code.
Lately (over the last half a year or so), I’ve been writing mainly Java. I only use javadoc comments in library code, from which documentation should be generated. Whatever is strictly application code does not get comments – it simply has to be readable, adding comments would mean not to eat my own dogfood. Occasionally, drawing an UML diagram and providing some notes on it makes sense, but again, if you need to add many of your classes to the diagram there’s something wrong. Your app should rely on a handful of generic interfaces, and not have each class implement its own specific interface. If this condition is met, maintenance programmers can look at individual, small parts of the application, understand them in isolation, and be able to do their work without understanding much more.
One thing I find highly useful in web development, especially as a PHP coder who helped design and maintain a custom content management system, is using revision control and not allowing commits without notes on the code commit. That said, some coders will reuse the same notes (I’ve done it sometimes myself), and others will do something like put X in the notes box for a SubVersion commit. Needless to say, that’s not very helpful, even with diff tools, because then you don’t know why the commit was done. But if you make detailed comments in your commit notes, revision control is invaluable in terms of documentation of why something was done.
I would suggest that the first two reasons you give for using a notebook are actually good arguments for using version control too.
I completely agree with the basic premise. I have a string of hand-written notebooks, taken in multi-color ink, dating back many years. I give “rookies” a notebook & pen on day one and tell them to use it.
We do daily scrums that almost require you to have at least some written notes :)
But there’s a bigger problem here, which is that most programmers are already making all kinds of notes in lots of different places. You make notes when committing code, you add notes in code, you add notes to JIRA tickets, you add notes in e-mails you send to colleagues and project managers, you put notes in the company wiki or the shared spreadsheet docs, etc.
At the first level, software developers definitely need to get better at documenting as they go (though this is also an organizational issue).
At the second level, we need to improve our tooling around how we organize, correlate and search through the notes that we do have. While some of this tooling exists at a basic level, we are very far from having a holistic view of activity on even simple systems.
We use an electronic lab notebook (of sorts) of our own construction. It is part of our bug/feature/requirements system. Each bug/feature/requirement has an incident associated with it, and hanging off that incident is a time-stamped list of status updates, which you might call “notebook entries”.
As part of the daily stand-up meeting, which usually occurs one-on-one instead of in a group, I record for my developer a terse summary of what he has done, what will be done, and any roadblocks. This has been going on for 2 or 3 months.
Over time I have noticed that most engineers have been happy to make log entries of their own. I keep telling them that there is no need, that it is primarily intended as evidence that I’m not failing to hold the required daily stand-up meeting with them. And I’m actually quite serious about this, that it was originally intended for my own discipline, and the discipline of my sub-team managers, and not necessarily for the discipline of the software developers at the leaves. But the status update idea is catching on, and I have no complaint about that.
The status log is useful for the following reasons:
- the act of constructing the stand-up summary in just a couple-hundred characters is a useful challenge that makes you assess where you stand from a big picture point-of-view. it gets your head out of the details for a minute, which is a useful daily exercise.
- the list of entries shows how long projects really take. which is an eye-opener for all of us who tend to be optimistic. (“wow, you’ve been working on this for 3 weeks. do we really want to invest that much in this?”)
- the list of entries are useful to help higher-level executives see that work is indeed happening. resulting in less heat for me, as the manager, in my reporting.
- the list helps us remember where we stand if we have to take a break from a project for a while. this particularly helps me, the manager, because i’m overseeing work on nearly 50 tasks at any given time, with hundreds that have been put on-hold and will become active again in the years to come.
- to make sure that sub-managers are holding their stand-ups.
First thing I do at a job is start a web page. Everything is documented in child pages. I also have a log page, a contacts page and pages describing SQL setup, TFS setup, Publishing procedure, etc. with screen shots. I hate to learn anything twice. Everything is documented so that I repeat nothing and can be sure of what I have done that worked and sometimes what failed. I am often called in to solve problems left behind by others. Lazy amateurs don’t believe in commenting code. First thing I do in analysis is comment the trash they left behind. Why they did it is often still just a guess even after I know how it works.
@Matt Daemon
I agree with much of this post. Nothing is more embarrassing than not remembering why some design decision was made.
I second use of OneNote. I keep everything in OneNote so it is live (html links) searchable and persistent. General use notes (tips on enabling heap debuggers, etc. etc) I take with me from job to job. To do lists, notes on existing code, design notes, stack traces. All organized and searchable. Plus you can share any of it you want.
As for unit tests etc. Yes, those are fine but not sufficient, they show what, not why.
Software engineers, programmers and the like share many commonalities. The one that diminishes with time is memory. The more complex the project, the more demands upon our memories to keep track of every little detail. After the last 20 years of my 40 year career I find my memory not as sharp so I comment in the code and take notes in my notebook.
ditto!
@Mike Like the idea. Do you have an example of this at work?
This is a great article. I agree with the comments stating the importance of tests and the fact they describe how the code should work but have no value in documenting the reasons behind such an implementation.
I have a more unconventional method for documenting ‘lab notes’. I keep notes in a time tracking service. I like to keep track of time based on a feature or any separate unit of work. This provides excellent granularity for the content. I’ve been looking for a better container to house this content however.