What Does The Exponential Growth Of Genomics Data Actually Mean?

Wednesday, March 20, 2013

DNA sequence data is growing exponentially. For most of the genes that we identify, we have no idea of their biological functions, says Miami University computational biologist Iddo Friedberg. Only by a group effort can the field move forward and learn to harness the deluge of genomic data, turning it into useful information.
We now live in the post-genomic era, when DNA sequence data is growing exponentially. Soon, having your entire genome sequenced will be an affordable and possibly routine diagnostic tool for personalized medicine.

According to Iddo Friedberg, a computational biologist at Miami University, “for most of the genes that we identify, we have no idea of their biological functions. They are like words in a foreign language, waiting to be deciphered.”

Friedberg works in the new bioinformatics research field of computational prediction of protein and gene function. He and colleagues Predrag Radivojac, associate professor of computer science and informatics, Indiana University, Bloomington, and Sean Mooney, associate professor, Buck Institute for Research on Aging, are the leaders of CAFA (Critical Assessment of Function Annotation).

CAFA is a new community-wide experiment to assess the performance of the multitude of methodologies developed by research groups worldwide to “help channel the flood of data from genome research to deduce the function of proteins,” Friedberg explained.

Thirty research groups participated in the first CAFA, presenting a total of 54 methods. The results were published this year in an article in Nature Methods co-authored by all the participating groups, with Friedberg and Radivojac as lead authors.


Concurrently, 15 articles edited by Friedberg and Radivojac were published in BMC Bioinformatics. These articles are companions to the Nature Methods article and describe the top-ranking methods in depth.

The accurate annotation of protein function is key to understanding life at the molecular level and has great biochemical and pharmaceutical implications, explain the study authors; however, with its inherent difficulty and expense, experimental characterization of function cannot scale up to accommodate the vast amount of sequence data already available.

Friedberg and Radivojac explain:

The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology. 
Recently, the availability of genomic-level sequence information for thousands of species, coupled with massive high-throughput experimental data, has created new opportunities as well as challenges for function prediction.

Many methodologies have been developed by research groups worldwide. Many are based on comparing unsolved sequences with databases of proteins whose functions are known. Other methods mine the scientific literature associated with some of these proteins, and yet others combine sophisticated machine-learning algorithms with an understanding of biological processes to decipher what these proteins do, said Friedberg.
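To make the first family of methods concrete, here is a minimal Python sketch of annotation transfer by sequence comparison. It is an illustration only, not any CAFA participant's method: the function names, the k-mer similarity measure, the 0.3 threshold, and the two toy sequences with their labels are all assumptions made up for this example. Real tools rely on proper sequence alignment and much larger databases.

```python
# Toy sketch of annotation transfer by sequence similarity.
# Every sequence, label, and threshold here is hypothetical.

def kmers(seq, k=3):
    """Return the set of overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_function(query, annotated, k=3, min_similarity=0.3):
    """Transfer the label of the most similar annotated sequence.

    Returns (label, similarity). The label is "function unknown" when
    no annotated sequence reaches min_similarity.
    """
    query_kmers = kmers(query, k)
    best_label, best_score = "function unknown", 0.0
    for seq, label in annotated.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return "function unknown", best_score
    return best_label, best_score


# Made-up "database" of sequences with known function.
ANNOTATED = {
    "MKTAYIAKQRQISFVKSHFSRQ": "hypothetical kinase",
    "GSHMLEDPVDAFKLNWEETRAA": "hypothetical transporter",
}

if __name__ == "__main__":
    # A query one residue away from the first entry inherits its label.
    print(predict_function("MKTAYIAKQRQISFVKSHFSRA", ANNOTATED))
    # A query unlike anything in the database stays unannotated.
    print(predict_function("WWWWWWWW", ANNOTATED))
```

The second call shows the situation Friedberg describes: a sequence with no close relative of known function ends up labeled "function unknown", which is where literature mining and machine-learning methods try to help.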

“Indeed, we may have already identified a protein that is an ideal drug target for cancer, but it is lost in the myriad of data labeled as ‘function unknown.’”

The growth of biological databases through 2009 (growth has increased even more since). The red line shows the growth of protein sequences deposited in TrEMBL, a comprehensive protein sequence database. The blue line shows the growth of proteins in TrEMBL whose function is known, or can at least be predicted with reasonable accuracy. The green line shows the growth of proteins whose 3D structure has been solved. Note the widening gap between what we know (blue) and what we do not know (red). Image source: Predrag Radivojac.

“This is despite the competitive environment in which research groups want their methods to perform better than their peers' methods. Overall, throughout CAFA there was a highly collegial spirit, and a willingness to share information and science.

“Everyone recognized that this is an important endeavor, and that only by a group effort can we move the field forward and learn to harness the deluge of genomic data, turning it into useful information.”

“For the first time we have broad insight into what works, where improvement is needed, and how we should move the field forward.

“We will continue running CAFA in the future, as we are confident it will only help generate better methods to understand the information locked in our genomes, and those of other organisms,” states Friedberg.

SOURCE  Miami University

By 33rd Square

