While I am clearly a fan of citizen cyberscience, I would not for an instant pretend that it is some sort of scientific panacea. There are lots of problems where it is irrelevant or inapplicable.
But the point of this post is that the range of cases where citizen cyberscience will work – the scope of what is possible with volunteer support on the Internet – is growing with the same breakneck speed as the Internet itself, pushing back the limits of what used to be possible at a furious rate. This is a powerful trend which scientists often don’t fully appreciate or adequately exploit.
For example, consider volunteer computing – just one variety of citizen cyberscience, mind you. Technically speaking this is a form of distributed computing, a way of sharing a multitude of tasks amongst myriad processors, of dividing a problem in order to conquer it. Seen in this light, volunteer computing is in the same conceptual league as supercomputing, cluster computing, grid computing and cloud computing. Each approach has its merits and its limitations, and so will be better suited for certain types of problems, less appropriate for others.
If you imagine all possible scientific computing problems graphed according to how much processing power they need versus how much data they consume and produce while processing (input/output, or I/O), then volunteer computing works best in the quadrant with very high processing requirements but comparatively small I/O. And it works particularly well if the input needed for processing one bit of data is completely independent of the outputs of all other processing tasks. In other words, the problem can be divided into a multitude of independent sub-problems – or, in the jargon, it is “embarrassingly parallel”.
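To make “embarrassingly parallel” concrete, here is a toy sketch of my own (not code from any real project): a Monte Carlo estimate of pi, where each work unit depends only on its own random seed, so the pieces could in principle be shipped to thousands of independent volunteer machines. A local process pool stands in for the volunteers.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def work_unit(seed: int, samples: int = 100_000) -> int:
    """One compute-heavy, I/O-light task: count pseudo-random points
    that land inside the unit quarter-circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    # In volunteer computing, each seed would go to a different host.
    seeds = list(range(8))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(work_unit, seeds))
    # Outputs are combined only at the very end -- no task ever
    # needed another task's output, which is what makes the problem
    # embarrassingly parallel.
    pi_estimate = 4 * sum(results) / (100_000 * len(results))
    print(f"pi is roughly {pi_estimate:.2f}")
```

The only communication is one small input (a seed) out and one small result (a count) back per task – exactly the high-compute, low-I/O quadrant described above.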
But over the last decade, the size of the volunteer computing quadrant has grown in leaps and bounds. The processing power of home computers and laptops – and of tablet computers and smart phones, for that matter – is constantly improving, doubling every couple of years or so according to the famous law of Moore (although, strictly speaking, Moore’s law was a statement about the number of transistors on a chip, not about the harder-to-measure quantity, computing power, which does not necessarily scale with transistor count). Bandwidth to the home follows a similar exponential growth curve, as does the storage capacity of computers. And then there is the trend to multicore processors that has taken the personal computer market by storm in the last few years. This means that a typical laptop today may have two, four or even more processors that are “tightly coupled” – in other words, they can communicate with each other at tremendous speeds. This multicore architecture mimics, in miniature, a supercomputer, which typically hosts thousands of tightly coupled processors in its bowels.
So the parameters that traditionally limit certain types of computing task to supercomputers – speed, bandwidth, memory and parallelism – are constantly being revised upwards in the consumer electronics market. And it is this market that volunteer computing exploits. Of course, there is a similar trend for the workhorses of other forms of distributed computing. Indeed, supercomputers today are often just glorified clusters of souped up PC processors. This is the basis of the successful design strategy of IBM’s Blue Gene supercomputers. This strategy enables Blue Gene to ride the wave of progress of the consumer processor market, rather than rely on expensive custom-designed chips.
Still, the amount of money invested in developing computers for the consumer market continues to dwarf anything that any company or organization can throw at other types of distributed computing infrastructure. So while supercomputers for science continue to improve, and science grids connecting clusters of computers in universities around the world continue to spread, the expanding horizon of what is possible on all the PCs, laptops and other computing devices that are connected to the Internet grows at a faster rate. This point was rammed home in 2007, when the volunteer computing project Folding@home, which runs protein folding simulations on home computers, became the first distributed computing system to reach what was, at the time, the Holy Grail of supercomputing speed: a petaflop, or a thousand million million (10^15) floating point operations per second.
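A back-of-the-envelope calculation shows how a swarm of consumer devices reaches that figure. The per-device throughput below is my own illustrative assumption, not a measured number for any real machine:

```python
import math

PETAFLOP = 1e15  # one petaflop = 10**15 floating point operations per second

def devices_needed(per_device_gflops: float) -> int:
    """How many volunteer devices, each sustaining the given throughput
    in gigaflops, it takes to aggregate one petaflop."""
    return math.ceil(PETAFLOP / (per_device_gflops * 1e9))

# With a hypothetical sustained rate of 20 gigaflops per device,
# 50,000 hosts are enough to aggregate a petaflop.
print(devices_needed(20.0))
```

The exact sustained rate of any real device varies enormously with the workload; the point is only that tens of thousands of volunteered machines put supercomputer-class aggregate throughput within reach.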
To add insult to injury, the Stanford University researchers behind Folding@home were able to boast that they broke this barrier largely on the back of PlayStation 3 processors, about 50,000 of which were contributing voluntarily to the project at the time the barrier fell. Pimply teenage gamers beat supercomputing pros. You can imagine the shame. When I emailed news of this result to a colleague who was at a high-level meeting on supercomputer-based research, he got back to me later saying that the first reaction of the assembled experts was, effectively, to dissect this breakthrough and redefine the petaflop barrier so that it was impregnable to this sort of upstart challenger.
An anecdote, perhaps. But when trying to reason with scientists who are in thrall to Myth #2, I often resort to such anecdotes to jolt them out of their complacency. Maybe your science can’t run on citizen-based processing power yet. But in three years’ time, things may look different. And three years’ time is just one grant cycle away, baby!
Talk is cheap, of course. And anecdotes are always anecdotal. The ultimate proof that citizen cyberscience is gaining ground on traditional forms of distributed computing – and that it might actually work for your science, too – comes from trying. And that’s exactly what some of my colleagues are going to do in a few weeks’ time at the first “Asia@Home Hackfest” at Academia Sinica in Taipei.
The idea of the first day of this two-day event is to focus on computing code for earthquake simulations, which uses a protocol called MPI (the Message Passing Interface), designed specifically for tightly coupled supercomputer processors. The goal is to show that, with a bit of effort and a good dose of ingenuity, this ought to run quite effectively on ordinary PCs. Or at least, on machines that volunteers are willing to make available to researchers.
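To see why that takes ingenuity, consider the communication pattern such MPI codes rely on. The toy solver below is my own illustration – a generic diffusion stencil, not the hackfest’s code or a real earthquake model. It splits a 1-D domain into chunks; every single time step, each chunk needs fresh boundary (“halo”) values from its neighbours, a tight coupling that is cheap between cores in one machine but expensive between volunteers’ homes:

```python
def step(chunks):
    """Advance all chunks one time step of u += 0.1 * laplacian(u),
    exchanging edge ("halo") values between neighbouring chunks."""
    new_chunks = []
    for i, u in enumerate(chunks):
        # Halo exchange: each chunk reads its neighbours' edge values.
        # On a supercomputer these would be MPI messages every step.
        left = chunks[i - 1][-1] if i > 0 else u[0]
        right = chunks[i + 1][0] if i < len(chunks) - 1 else u[-1]
        padded = [left] + u + [right]
        new_chunks.append(
            [padded[j] + 0.1 * (padded[j - 1] - 2 * padded[j] + padded[j + 1])
             for j in range(1, len(padded) - 1)])
    return new_chunks

# Two "processors", each owning half of a tiny domain:
chunks = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
for _ in range(10):
    chunks = step(chunks)  # one inter-chunk exchange per step, every step
```

Adapting this kind of code to volunteer computing means either restructuring it into much more independent pieces, or restricting it to hosts – such as multicore machines – where the exchanges stay fast.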
This distinction – between what an “ordinary” or “average” PC can do and what the machines volunteers are willing to make available can do – is subtle and important. The Internet connects a huge range of computing devices, and volunteer computing need not be constrained by the average. It is quite easy to specify that a volunteer computing project should only distribute work to machines with certain performance parameters. And nothing stops a scientist from setting those parameters very high. The “long tail” of the Internet, when it comes to volunteer computing, stretches out to amazing levels of performance, achieved by PC enthusiasts who build their own blazingly fast machines and even hack together garage-sized clusters.
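A minimal sketch of what such a performance floor might look like on the server side – the field names and thresholds here are my own invented illustration, not the configuration of any real project’s scheduler:

```python
from dataclasses import dataclass

@dataclass
class Host:
    """A volunteer machine as the project's server might describe it."""
    cores: int
    gflops: float   # measured sustained throughput
    ram_gb: float

def eligible(host: Host, min_cores: int = 8, min_gflops: float = 50.0,
             min_ram_gb: float = 32.0) -> bool:
    """Return True if this volunteer machine meets the project's floor.
    Setting the defaults high targets only the enthusiasts' long tail."""
    return (host.cores >= min_cores
            and host.gflops >= min_gflops
            and host.ram_gb >= min_ram_gb)

hosts = [Host(cores=4, gflops=30.0, ram_gb=8.0),     # typical laptop: filtered out
         Host(cores=16, gflops=80.0, ram_gb=64.0)]   # enthusiast build: accepted
fast_enough = [h for h in hosts if eligible(h)]
```

With a floor like this, work units only ever reach the hand-built, blazingly fast machines at the far end of the long tail.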
We’ll see in a few weeks’ time whether a strategy to deliberately exploit this long tail will pay off. If it does, yet another barrier to volunteer computing may come crashing down. And Myth #2, the myth that “it won’t work for my science”, will have been exploded once again.
Addendum: this post has focused solely on volunteer computing to illustrate Myth #2, because that is an area where citizen cyberscience is up against some entrenched views and values, fueled by powerful interests that keep large scientific data centres and supercomputing initiatives going. When it comes to volunteer thinking – distributing tasks via the Web for people to do on their computers – there are similar dogmas that need to be challenged. For example, the view that computers will be able to do any pattern recognition problem better than humans (wrong!), or that the work of untrained humans can never be trusted for scientific purposes (wrong again – I’ll come back to this when I discuss Myth #4). However, the challenge of overcoming ingrained prejudices is not as severe in this case, and I would venture that this is simply because the competition to volunteer thinking is not as well resourced.