How did ancient mariners judge the speed of their sailing ships? They would throw a log of wood overboard attached to a piece of rope, and measure how much rope was pulled along with the log in a fixed period of time measured with an hourglass. Sailors would then write the resulting estimate of the speed in a book … a log book.

So blog, a contraction of web and logbook, comes to us from this rather crude and ancient measurement technique. This serves as a reminder that log books, and by extension blogs, ought to record useful data. In fact, most of the blogs in cyberspace record thoughts, opinions, digressions, indiscretions and just about everything but useful data.

Of course, there are exceptions. In his recent and eminently readable book Reinventing Discovery, physicist and open science advocate Michael Nielsen makes a good case for ‘a new era of networked science’, as the book is subtitled. His first example is of a blogger and mathematician, Tim Gowers, and the progress he made on solving a tough mathematical problem thanks to blogging about it, and getting a lot of other mathematicians – pros and amateurs – involved that way. Gowers called this blog-based experiment the Polymath Project.

But this exception proves the rule. Even promising sites with names like ScienceBlog and ScienceBlogs are just reposting science news or giving a few loquacious scientists a chance to voice their thoughts, opinions, digressions etc. Nothing wrong with that, of course. But I don’t know of a site where scientists are actually blogging hard-core data, straight off the gizmos in their lab.

Why not? Why couldn’t someone create a science blog that was about recording scientific data on the Web? Not as an afterthought, but as a primary data archive? Michael Nielsen has a simple answer to this: publicly releasing data does nothing for a scientist’s career and could even damage it, by helping competitors.

So why did the blog-based Polymath Project work? Michael suggests that despite a radical appearance, the Polymath Project and other citizen cyberscience projects such as Galaxy Zoo are “ultimately projects in service of the conventional goal of writing scientific papers.” Michael goes on to argue that this is a problem, “because some of the best ideas for improving the way scientists work involve a break away from the scientific paper as the ultimate goal of scientific research.”

Michael’s book contains many good ideas about how to make blogs and other online technologies more valuable and thus more widely used by science. But I’d like to focus on the most obvious one, namely using blogs to do what their log-monitoring forebears did – record data. This gives me the excuse to retell a story about an experiment I did together with colleagues at CERN nearly a decade ago, called openlogbook. (This is a blog, after all, so allow me to digress…)

The experiment was a collaboration with the ATHENA project, the predecessor of ALPHA. Both experiments have been pushing the envelope for our understanding of antimatter by making, trapping and – most recently, this month – spectroscopically studying anti-hydrogen.

ATHENA was a small project by CERN standards, where small means “only” 20 or so university partners. There’s a hut above the experiment area where a few of the many scientists involved spend days and nights recording data. Despite several attempts, the ATHENA team had not managed to wean itself off paper-based logbooks. For some reason, people are just much better at noting things down on paper than in some electronic form.

The problem was that the resulting pile of paper logbooks – over a dozen got filled in a year – had to be copied twenty times and sent to each partner, so they had a record of what had happened. This was done once a year by a secretary, who spent a week making and distributing the copies. And by the time the scientists in far-flung labs got the photocopies, the information in them was hard to use: exactly what one scientist meant by “problems with the trigger”, jotted down at 3am seven months ago, was probably no longer clear even to the scientist who jotted it down.

The idea behind openlogbook was to use digital pen and paper for the logbooks, so that everything written down could be immediately shared via the web with the other scientists, while still affording them the convenience of an (almost) standard pen and paper logbook. We used a technology developed by a small Swedish firm, Anoto, whose chief developer had given a talk at CERN that impressed me. The pens were produced by Logitech at the time, and a variant is still produced by Livescribe. An example of a page from the ATHENA openlogbook is shown below.

One of the neat aspects of recording notes in a logbook this way was that the digital pen allowed accurate timestamping. So it was possible to go back quite precisely to find out when the scientist had written “problems with the trigger” and what was actually happening to the trigger in the data recorded from that instrument. We actually envisaged recording audio and even video continuously in the hut, so that it would be possible to go back and review exactly what had happened that night at 3am.
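
A minimal sketch of how such a timestamp lookup might work, in Python. It is purely illustrative and not the openlogbook code: the function name `readings_near`, the data layout, the five-minute window and all the numbers are made up for the example.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def readings_near(note_time, readings, window=timedelta(minutes=5)):
    """Return the instrument readings taken within `window` of a note.

    `readings` is assumed to be a list of (timestamp, value) tuples
    already sorted by timestamp.
    """
    times = [t for t, _ in readings]
    lo = bisect_left(times, note_time - window)
    hi = bisect_right(times, note_time + window)
    return readings[lo:hi]


# Hypothetical logbook note, timestamped by the digital pen.
note = (datetime(2004, 6, 1, 3, 0), "problems with the trigger")

# Hypothetical trigger-rate readings from the instrument (invented values).
trigger_log = [
    (datetime(2004, 6, 1, 2, 50), 101.0),
    (datetime(2004, 6, 1, 2, 57), 98.5),
    (datetime(2004, 6, 1, 3, 2), 12.0),
    (datetime(2004, 6, 1, 3, 10), 99.0),
]

nearby = readings_near(note[0], trigger_log)
for when, value in nearby:
    print(when.isoformat(), value)
```

The binary search assumes the readings are kept in time order, which is the natural state of an instrument log; the width of the window is a judgment call that would depend on how quickly people tend to write things down.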

There were mixed feelings about this, since it meant the scientists would be under video surveillance. No more nose-picking or lewd jokes (you’d be surprised what scientists get up to at 3am). On the other hand, the notion of being able to “replay” the experiment in this way was attractive. But the project was a low-budget effort based on summer student manpower, a valuable but ephemeral resource at CERN, so we never got that far.

We did, however, get as far as using the digital logbooks during a real experimental run, in fact ATHENA’s last run before it shut down to transform into ALPHA. We even wrote up the results, which were very promising, for the CERN Courier. The main point was that for the first time, scientists in remote labs were able to review and interact with those in the ATHENA hut in nearly real time, commenting by email on things they read in the electronic logbook. We got a massive thumbs up from the ATHENA coordinator.

My colleagues and I made a valiant effort to get the technology adopted by the LHC experiment ATLAS. In fact, the team in charge of their digital logbook showed quite some interest. But a committee had ruled that there would be no paper logbooks in the ATLAS control room, so our particular approach would not fly. (We later heard that ATLAS scientists secretly smuggled paper into the control room, so strong was the urge to keep notes using this venerable medium.)

Besides the issue of paper versus electronic recording, though, there is an aspect of openlogbook that still seems to me valuable for science, namely the idea of sharing data via a web-based logbook … a blog. Of course, ATHENA did not publish their data publicly – it was on a password-protected website. But the value of blogging data this way, even within a community, was clear. And the value of releasing this stream of data to the wider scientific community could be very powerful, indeed.

For example, in conventional scientific articles, there is always a methods section, which explains how the experiment was carried out. This is usually a staccato summary that makes it quite hard for anyone to reproduce it. Imagine if instead, you could simply review every moment of the experimental setup via a selection of key passages from such a blog. Overwhelming, perhaps. But if that became standard, it could make it very hard for scientists to perpetrate fraud in scientific papers. The point here is that blogging suddenly becomes useful to the scientist.

Openlogbook was a successful experiment, but it was probably a shade ahead of its time. The technology has moved on since then, and most importantly, the mindset of a younger generation of scientists is more open to such an approach. Maybe one day, we’ll get round to putting the log back in blog.

EPILOGUE: A final digression about Michael Nielsen and his inspiring book. I stumbled on Michael while he was preparing the book, via his blog. The reason was that I was looking for some good solid data on the web about the “long tails” that so famously characterize various types of internet behaviour. Turns out that, despite the success of the book “The Long Tail” by Wired editor Chris Anderson, there’s a paucity of real data to back this concept up (I mean graphs with numbers and fits to power laws and that sort of thing). I never found much data. But what I found on Michael’s blog, thanks to Google, was that he, too, had been looking for such data. One thing led to another, and I ended up chatting with him by Skype, and later recommending him to the Open Society Foundations to interview about Open Science. They actually funded his book and subsequent book tour. So that is a case of a blog producing some very useful benefits for a scientist.