High-throughput DNA sequencing technologies are leading to a revolution in how clinicians diagnose and treat cancer. The molecular profiles of individual tumors are beginning to be used in the design of chemotherapeutic programs optimized for the treatment of individual patients. The real revolution, however, is coming with the emerging capability to inexpensively and accurately sequence the entire genome of cancers, allowing for the identification of specific mutations responsible for the disease in individual patients.
There is only one downside. Those sequencing technologies produce massive amounts of data that scientists cannot easily process and interpret. That’s why Georgia Tech has created a new data analysis algorithm that quickly transforms complex RNA sequence data into usable content for biologists and clinicians. The RNA-Seq analysis pipeline (R-SAP) was developed by School of Biology Professor John McDonald and Bioinformatics Ph.D. candidate Vinay Mittal. Details of the pipeline are published in the journal Nucleic Acids Research.
“A major bottleneck in the realization of the dream of personalized medicine is no longer technological. It’s computational,” said McDonald, director of Georgia Tech’s newly created Integrated Cancer Research Center. “R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of gene transcripts in cancer samples.”
There are at least 23,000 pieces of RNA in the human genome that encode the sequence of proteins. Millions of other pieces help regulate the production of proteins. R-SAP is able to quickly determine every gene’s level of RNA expression and provide information about splice variants, biomarkers and chimeric RNAs. Biologists and clinicians will be able to more readily use this data to compare the RNA profiles or “transcriptomes” of normal cells with those of individual cancers and thereby be in a better position to develop optimized personal therapies.
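To illustrate what comparing two transcriptomes can look like in practice, here is a minimal Python sketch that computes a per-gene log2 fold change between a normal profile and a tumor profile. It is not taken from R-SAP: the function name, the gene labels, the expression values and the pseudocount are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def log2_fold_change(normal, tumor, pseudocount=1.0):
    """Return log2(tumor / normal) for every gene found in both profiles.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no measured expression.
    """
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in normal.keys() & tumor.keys()
    }

# Hypothetical expression levels; the gene names are placeholders.
normal_profile = {"GENE_A": 7.0, "GENE_B": 31.0, "GENE_C": 3.0}
tumor_profile = {"GENE_A": 31.0, "GENE_B": 7.0, "GENE_C": 3.0}

changes = log2_fold_change(normal_profile, tumor_profile)
for gene in sorted(changes):
    print(gene, changes[gene])
```

A positive value means the gene is expressed at a higher level in the tumor sample than in the normal sample, a negative value means lower, and zero means no change.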
Personalized approaches to cancer medicine are already in widespread use for a few “cancer biomarkers,” including variants of the BRCA1 gene that can be used to identify women with a high risk of developing breast and ovarian cancer.
“Our goal was to design a pipeline that is easily installable with parallel processing capabilities,” said Mittal. “R-SAP can process 100 million reads in just 90 minutes. Running the program simultaneously on multiple CPUs can further decrease that time.”
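The general idea behind running on several CPUs, splitting the reads into chunks, handling the chunks concurrently and merging the partial results, can be sketched in a few lines of Python. This is only an illustration of the pattern, not R-SAP's implementation: `classify_read` is a hypothetical stand-in for the real per-read work, and the length threshold and sample reads are made up. The sketch uses a thread pool for simplicity; in CPython, CPU-bound work usually needs separate processes to see a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def classify_read(read):
    # Hypothetical stand-in for the real per-read analysis.
    return "long" if len(read) >= 50 else "short"

def count_chunk(chunk):
    # Tally the classes found in one chunk of reads.
    return Counter(classify_read(read) for read in chunk)

def split(reads, n_chunks):
    # Cut the read list into at most n_chunks pieces of near-equal size.
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]

def count_in_parallel(reads, workers=4):
    # Process the chunks concurrently, then merge the partial tallies.
    totals = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, split(reads, workers)):
            totals.update(partial)
    return totals

# Made-up reads: six of length 36 and four of length 75.
reads = ["A" * 36] * 6 + ["C" * 75] * 4
totals = count_in_parallel(reads, workers=3)
print(dict(totals))
```

Because each chunk is tallied independently, the merged totals do not depend on how many workers are used, only the elapsed time does.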
R-SAP is open source software, freely accessible at the McDonald Lab website.
“This is another example of Georgia Tech’s ability to merge computer technology with science to create an essential feature of next-generation bioinformatics tools,” said McDonald. “We hope that R-SAP will be a useful and user-friendly instrument for scientists and clinicians in the field of cancer biology.”
R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data. Vinay K. Mittal and John F. McDonald. Nucleic Acids Research, 2012, 1–12. doi:10.1093/nar/gks047
Georgia Institute of Technology