Taking the gamble out of DNA sequencing

Scientists have developed an algorithm to predict how much can be learned in a large-scale DNA sequencing experiment

28-Feb-2013 - USA

Two USC scientists have developed an algorithm that could help make DNA sequencing affordable enough for clinics – and could be useful to researchers of all stripes.

Andrew Smith, a computational biologist at the USC Dornsife College of Letters, Arts and Sciences, developed the algorithm with USC graduate student Timothy Daley to help predict the value of sequencing more DNA. The work is to be published in Nature Methods.

Extracting information from DNA means deciding how much to sequence: sequence too little and you may not get the answers you are looking for; sequence too much and you waste both time and money. That expensive gamble is a big part of what keeps DNA sequencing out of the hands of clinicians. But not for long, according to Smith.

"It seems likely that some clinical applications of DNA sequencing will become routine in the next five to 10 years," Smith said. "For example, diagnostic sequencing to understand the properties of a tumor will be much more effective if the right mathematical methods are in place."

The beauty of Smith and Daley's algorithm, which predicts the size and composition of an unseen population based on a small sample, lies in its broad applicability.

"This is one of those great instances where a specific challenge in our research led us to uncover a powerful algorithm that has surprisingly broad applications," Smith said.

Think of it: how often do scientists need to predict what they haven't seen based on what they have? Public health officials could use the algorithm to estimate the number of HIV-positive individuals; astronomers could use it to estimate how many exoplanets exist in our galaxy based on the ones already discovered; and biologists could use it to estimate the diversity of antibodies in an individual.

The mathematical underpinnings of the algorithm rely on a model of sampling from ecology known as capture-recapture. In this model, individuals are captured and tagged so that a recapture of the same individual will be known – and the number of times each individual was captured can be used to make inferences about the population as a whole.
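To make the capture-recapture idea concrete, the sketch below tallies how many individuals were caught exactly once, twice, and so on, then feeds those counts into the Chao1 estimator, a standard frequency-count estimate of total population size. Chao1 is an illustrative choice: the article does not say which estimator Smith and Daley built on, and the function names and capture log here are hypothetical.

```python
from collections import Counter

def frequency_counts(captures):
    """Return a Counter mapping k -> number of distinct individuals caught exactly k times."""
    times_caught = Counter(captures)        # tag -> number of captures
    return Counter(times_caught.values())   # k -> how many tags were caught k times

def chao1(captures):
    """Chao1 lower-bound estimate of total population size (illustrative, not the USC method)."""
    f = frequency_counts(captures)
    observed = sum(f.values())               # distinct individuals seen at least once
    f1, f2 = f.get(1, 0), f.get(2, 0)        # singletons and doubletons
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no individual was caught exactly twice
    return observed + f1 * (f1 - 1) / 2

# Hypothetical capture log: each entry is the tag of an individual caught in one capture event
captures = ["a", "a", "b", "c", "c", "c", "d", "e"]
print(frequency_counts(captures))  # Counter({1: 3, 2: 1, 3: 1})
print(chao1(captures))             # 9.5
```

Five distinct individuals were seen, but three of them only once; that surplus of singletons pushes the estimate up to 9.5, a lower bound on the true population. In a sequencing experiment, roughly speaking, the tags would be distinct genomic molecules and each read of a molecule would be a capture.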

In this way scientists can estimate, for example, the number of gorillas remaining in the wild. In DNA sequencing, the individuals are the distinct genomic molecules in a sample. However, the mathematical models used for counting gorillas don't work on the scale of DNA sequencing.

"The basic model has been known for decades, but the way it has been used makes it highly unstable in most applications. We took a different approach that depends on lots of computing power and seems to work best in large-scale applications like modern DNA sequencing," Daley said.
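The kind of instability Daley mentions is easy to see in one long-standing frequency-count estimator, the Good-Toulmin series, which predicts how many new individuals a larger sample would reveal. The article does not name the "basic model" Daley refers to, so treat this as an illustration of the general problem rather than as the authors' method; the function name and the counts are hypothetical. The sketch prints running partial sums so you can watch whether the series settles.

```python
def good_toulmin_partial_sums(freqs, t):
    """Running partial sums of the classical Good-Toulmin series.

    freqs maps j -> number of distinct individuals observed exactly j times.
    t is the size of a hypothetical extra sample relative to the one already
    taken (t = 1 means doubling it). The full sum estimates how many
    previously unseen individuals the extra sampling would turn up.
    """
    total = 0
    sums = []
    for j in sorted(freqs):
        total += (-1) ** (j + 1) * t ** j * freqs[j]  # alternating terms scaled by t**j
        sums.append(total)
    return sums

# Hypothetical frequency counts, e.g. from a small pilot sequencing run
freqs = {1: 100, 2: 40, 3: 20, 4: 10, 5: 5}

print(good_toulmin_partial_sums(freqs, 0.5))  # [50.0, 40.0, 42.5, 41.875, 42.03125]
print(good_toulmin_partial_sums(freqs, 3))    # [300, -60, 480, -330, 885]
```

Extending the sample by half (t = 0.5), the partial sums settle near 42. Extending it to four times the original effort (t = 3), each added term swings the estimate by hundreds and even drives it negative, which is impossible for a count. The article does not describe how the USC team's computation-heavy approach avoids this.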
