During a recent bit of home computer maintenance I moved several gigabytes of family photos from one hard drive to another. I know there are some wonderful snapshots hidden among the seemingly infinite 0’s and 1’s encoding my memories. But given the volume of data, it will undoubtedly take me hours to find them. Even once I’ve identified the good stuff, I’m haunted by knowing that my wife has captured gigabytes of her own photos of many of the same events and people from different angles, named them different things, and filed them away with different organizational schemes on her own drives and social networks.
Perhaps that’s why I was so heartened to hear I’m not alone. The problem of managing the massive amount of data created by the genomics revolution is dogging the biomedical R&D and healthcare delivery world too, says George Poste.
“Information is not knowledge. Information is data. How will we use advanced computing to create usable, actionable knowledge? That is the challenge,” says Poste.
Poste, chief scientist in Arizona State University’s Complex Adaptive Systems Initiative, identified the challenge while speaking at the Burrill Personalized Medicine Meeting October 4 in Burlingame, California.
To overcome the challenge, Poste says, the scientific community will need not only to understand how encoded genome information creates complex biological systems, but also to begin shifting the way it thinks about disease.
That shift, to mapping and recognizing the molecular signatures of disease, is the next step in a long historical evolution, he says. A medical establishment founded on superstition, which blamed disease on imbalances of the bodily humors, was transformed by the recognition of symptoms, and it will be transformed again by the rational diagnosis of disease and personalized approaches to treatment.
A big part of the problem, Poste says, is the incredible rate at which data is being generated. We have arrived in a world of big data, one in which large-scale science projects routinely generate data sets requiring thousands of terabytes of storage.
Genome sequencing platforms are accelerating the deluge of data, and as we add new data from exomes, epigenomes, and microbiomes, the problem just grows bigger.
However, the volume of data is not the only problem, says Poste. Too much of it has become unusable, either because it lacks standardization or because it suffers from statistical flaws.
“Most geneticists would as soon share a toothbrush than adapt to the issue of using a consistent ontology. This is something we really have to come to grips with,” Poste says.
To untangle the complex biological networks behind disease, we will need not only to overcome these storage and complexity problems, but also to shift from a world in which research and clinical activity are largely siloed to a more systems-based one, built on data- and computation-intensive methods, in which structured data supports cross-disciplinary and cross-sector integration of R&D.
Despite worrying about the tremendous pace at which technological and data-driven complexities are growing, Poste does see some projects capable of dealing with certain aspects of the deluge. In July, the Chan Soon-Shiong Institute for Advanced Health said it would provide $100 million to support the development of a “national health intranet” that would let genomic and proteomic researchers, and eventually healthcare providers, share huge data sets with each other at high speed, in an effort to foster a new era of genomics-informed medical care.
While infrastructure and standardization address some of the issues facing biomedical research and development, making computational tools more accessible to clinicians will be important too, noted Carlos Santos, vice president of Biovest International, who was in Poste’s audience at the meeting.
“We also need to raise the computational literacy of our top research scientists,” says Santos. “They almost need to be part-time computer scientists. We’re treating complex disease, but clinicians don’t have the tools that they need.”
Arming young scientists and clinicians with stronger programming, statistical, and analytical skills for using advanced computation will be a challenge, but it should pay enormous dividends in the future. “Ideally, we want computers to do the heavy lifting and assists and people to do the thinking,” says Poste.
That’s my kind of world.
October 07, 2011
http://www.burrillreport.com/article-thriving_in_the_data_deluge.html