Writer Paul McFedries on the need to create digital libraries for storing the enormous data sets generated by computers today.
In a new article for IEEE Spectrum, writer Paul McFedries discusses the emergence of a new, data-intensive science driven by the huge data sets that computers now generate. Projects such as the Human Genome Project and the mapping of the brain’s neural circuitry, along with experiments run on massively complex machines such as the Large Hadron Collider, now generate petabytes of data (1 PB = 1 million gigabytes) in a single year. Data sets this large, and the equally large computations performed on them, require highly sophisticated database tools to help scientists extract meaningful information.
McFedries recommends building more digital libraries to store these enormous but highly valuable data sets.
As all this eResearch becomes more sophisticated and more valuable, data scientists are realizing that these humongous data sets need to be shared among multiple scientists, labs, and institutions. We’re starting to do a good job of making papers and other research end products more widely available, but what’s needed are more digital data libraries that store not only documents such as research papers but also the data on which those papers were based. Now all we need is for someone to come up with a Digital Dewey Decimal System to catalog all this data. A Dewey Binary System, perhaps?