INFORMATION TECHNOLOGY

US Unveils $200 Million Big Data Initiative

Money will be used to advance means of analyzing growing volume of digital data.

MARIE DAGHLIAN

The Burrill Report

The Obama Administration has launched a Big Data Research and Development Initiative in conjunction with six federal departments and agencies, making a $200 million commitment to improve the tools and techniques needed to access, organize, and glean discoveries from the huge volumes of data being generated around the world.

“In the same way that past federal investments in information technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security,” says John Holdren, assistant to the President and director of the White House Office of Science and Technology Policy.

The National Institutes of Health and the National Science Foundation will be among the top recipients of funding, along with the Departments of Defense and Energy, and the U.S. Geological Survey. Besides advancing the development of technologies needed to collect, store, preserve, manage, and analyze the growing volume of digital data being generated on a daily basis, the initiative aims to harness the knowledge gleaned to advance scientific discovery, strengthen national security, and transform teaching and learning. Money will also be allocated to expand the skilled workforce needed to make sense of it all.

One of the first commitments is a joint solicitation, supported by the NSF and NIH, to advance the core science of extracting information from the large and diverse data sets being generated by advances in sequencing and genomics, work that could accelerate scientific discoveries that improve human health.

In conjunction with the Big Data Initiative, the NIH announced that data from its 1000 Genomes Project, an international public-private attempt to build the most detailed map of human genetic variation available, are now publicly available on Amazon’s cloud services.

“The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation's health and economy,” says NIH Director Francis Collins, who supports the Big Data Initiative.

Among the NIH components participating in the Big Data Initiative are the National Human Genome Research Institute (NHGRI) and the NIH National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine. NHGRI played a lead role in organizing and funding the international 1000 Genomes Project. NCBI, along with the European Bioinformatics Institute, which is located in the United Kingdom, began making 1000 Genomes Project data freely available to researchers in 2008. Besides the NIH, major supporters include the Wellcome Trust and BGI.

The project’s organizers hope eventually to analyze the genomes of 2,600 people from 26 populations around the world. The project began with three pilot studies in 2008 that assessed strategies for producing a catalog of genetic variants present at a frequency of 1 percent or greater in the populations studied. Data from the pilot studies were released on Amazon Web Services in 2010. The data now being released in the cloud include results from sequencing the DNA of some 1,700 people; the remaining 900 samples will be sequenced in 2012, and those data will be released to researchers as soon as possible. The new results identify genetic variation that occurs in less than 1 percent of the study populations and that may make important contributions to common diseases, such as cancer or diabetes.

Since the project's launch, the data set has grown enormously: At 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the current 1000 Genomes Project records are a prime example of big data that have grown so massive that few researchers have the computing power to use them.

To help solve that problem, Amazon Web Services agreed to post the data for free as a public data set, providing a centralized repository on its cloud. The repository lets any researcher access and analyze the data at a fraction of what it would cost their institution to acquire the needed internet bandwidth, data storage, and analytical computing capacity.

“Improving access to data from this important project will accelerate the ability of researchers to understand human genetic variation and its contribution to health and disease,” says NHGRI director Eric Green.

Cloud access also enables users to analyze the data much more quickly, because it eliminates the time-consuming download of data and lets users run their analyses over many servers at once. “Putting the data in the cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so,” says Richard Durbin, co-director of the 1000 Genomes Project and joint head of human genetics at the Wellcome Trust Sanger Institute.


April 06, 2012
http://www.burrillreport.com/article-us_unveils_200_million_big_data_initiative_.html
