Reinventing the Wheel

When I was getting my doctorate, most of my lab work involved PCR and DNA sequencing. And to give you an idea of just how long ago this was, all of my PCR gels were captured on Polaroid film. Of course, the first thing I would do after getting a good gel image was whip out the Sharpie™ markers to circle bands, label lanes, and so on.

It wasn’t until a year or two later that my advisor (the long-suffering Bill Birky) came up to me and said, “Hey, that gel is great. Let’s put it in that paper we’re writing. Where’s the original?” Of course, the original was that Polaroid with all the marks on it. I had destroyed primary data by drawing on it, and the only solution was to repeat the experiment.

So I went to the fridge, found the Eppendorf tube that held the PCR product from that experiment, and ran the gel again. No big deal; it would only take an hour or so. Except that when I got the results of that gel, it turned out the DNA had degraded. Back to the drawing board, which in this case meant re-amplifying the DNA by PCR and then running the gel yet again. Still no big deal; it would only take half a day or so.

That’s why getting a Ph.D. in molecular biology takes as long as it does, by the way.

We’ve all done this sort of thing. We’re so busy doing the actual work of science that we don’t pay attention to things like proper knowledge (data) management. Even with computer files we have this problem: we overwrite a file we meant to “Save As…”, or we format the wrong drive. Data gets lost, and we spend a lot of time reinventing the wheel by repeating experiments we’ve already done.

That’s just the way we’ve always done things. And yes – there’s a better way.

CERF has a built-in version control system. In practice, this means that when you modify a file, document, or notebook entry and save your changes, you don’t overwrite the previous version. Instead, the system saves your changed file as a new version, which replaces the old one in the display. The full version history stays available to you, so you can literally walk back in time and see what every saved version of every file looked like.

Because CERF is a collaborative system, version control also helps manage documents that multiple authors may edit. For example, when I “check out” a file (as easy as double-clicking on it), the file is locked against every other potential editor until I save my changes back to the system as a new version. Other CERF users can still view the version of the file that’s in the system (provided they are authorized to do so), but since they can’t edit it while someone else is, CERF prevents us from accidentally overwriting each other’s work. The version history also records who made each change and when, so not only can you access every version of a document, you know who created it and when. You can even revert to any previous version if you like.
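For readers curious about the mechanics, here is a minimal sketch of the general idea in Python: saving appends a new version instead of replacing the old one, checking out a document locks it to one editor, and reverting saves an old version’s content as a new version so nothing is lost. This is an illustration under stated assumptions, not CERF’s actual implementation or API; every class and method name below is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Version:
    """One saved state of a document: its content, who saved it, and when."""
    number: int
    content: str
    author: str
    saved_at: datetime


@dataclass
class VersionedDocument:
    """Hypothetical sketch: a document that never overwrites, only appends versions."""
    name: str
    versions: List[Version] = field(default_factory=list)
    locked_by: Optional[str] = None  # user currently holding the check-out lock

    @classmethod
    def create(cls, name: str, content: str, author: str) -> "VersionedDocument":
        doc = cls(name)
        doc.versions.append(Version(1, content, author, datetime.now(timezone.utc)))
        return doc

    def latest(self) -> Version:
        return self.versions[-1]

    def check_out(self, user: str) -> str:
        # Only one editor at a time; anyone else is refused until check-in.
        if self.locked_by is not None and self.locked_by != user:
            raise PermissionError(f"{self.name} is checked out by {self.locked_by}")
        self.locked_by = user
        return self.latest().content

    def check_in(self, user: str, content: str) -> Version:
        # Saving appends a new version and releases the lock; old versions stay intact.
        if self.locked_by != user:
            raise PermissionError(f"{user} does not hold the lock on {self.name}")
        new = Version(len(self.versions) + 1, content, user, datetime.now(timezone.utc))
        self.versions.append(new)
        self.locked_by = None
        return new

    def revert(self, user: str, number: int) -> Version:
        # Reverting re-saves an old version's content as the newest version,
        # so the history (including the version being "undone") is preserved.
        old = self.versions[number - 1]
        self.check_out(user)
        return self.check_in(user, old.content)
```

In use, `doc = VersionedDocument.create("gel.tif", "raw image", "alice")` starts the history; while Alice has the document checked out, a `check_out("bob")` call raises `PermissionError`, and after she checks in an annotated copy, `doc.versions[0]` still holds the raw original. Making a revert a new version, rather than deleting later versions, is what keeps the “you can’t overwrite files” property intact.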

This is huge. It means you’ll never have to repeat an experiment you’ve already done just because you accidentally overwrote the file. You can’t overwrite files! Think about it: you can put your raw, original data into CERF, open it from CERF in your favorite editing, authoring, or analysis package, do whatever you need to do, and then save it back. CERF manages everything else. You can always get back to the previous version(s) if you need to redo the analysis, or if you simply want the original for publication.

We reinvent the wheel for other reasons as well. Here’s another great example: say that you are looking for that one best gel image – it’s time to publish, or time to write up the next grant proposal. So where is it, you say to yourself as you look at the shelf full of notebooks in your office – which notebook is it in, and where? After a moment’s deliberation you may decide that it’s easier to just repeat the experiment than it is to find the data – the data that you know you already have – in those old paper notebooks.

We do that all the time, because we simply cannot search paper notebooks! An electronic lab notebook (ELN) like CERF is clearly a better way: you can search by keyword, full text, metadata, tags, you name it. At least you’ll never have to repeat an experiment for that reason.

Next week we’ll talk about other searching tools CERF has that will make your life better.