New software processes huge amounts of single-cell data

Comprehensive analysis of large gene-expression datasets

13-Feb-2018 - Germany

Scientists from the Helmholtz Zentrum München have developed a program that helps manage enormous datasets. The software, named Scanpy, is a candidate for analyzing the Human Cell Atlas.

Helmholtz Zentrum München

Visualization of gene expression patterns of murine brain cells generated with Scanpy.

“It’s about analyzing gene-expression data of a large number of individual cells,” explains lead author Alex Wolf of the Institute of Computational Biology (ICB) at Helmholtz Zentrum München. He developed Scanpy together with his colleague Philipp Angerer in the Machine Learning Group of Prof. Dr. Dr. Fabian Theis. In addition to his position at Helmholtz Zentrum, Theis is also a professor of mathematical modelling of biological systems at the Technical University of Munich. “New technical advances generate several orders of magnitude more data with a correspondingly greater information content,” Theis says. “However, the historically evolved software infrastructure for gene-expression analysis simply wasn’t designed to cope with the new challenges. New analytic methods are therefore needed.”

The race for the Human Cell Atlas

According to Theis, a major international research project could also benefit from the software. A team of international scientists is compiling a reference database, called the Human Cell Atlas, which holds data on the gene activity of all human cell types. “For this project, and in a growing number of other projects in which databases are combined, it is important to have scalable software,” says Theis. It is therefore no surprise that Scanpy is currently a candidate for helping to analyze the Human Cell Atlas.

“The publication of Scanpy marks the first software that allows comprehensive analysis of large gene-expression datasets with a broad range of machine-learning and statistical methods,” explains Wolf, describing the achievement. “The software is already being used by a number of groups around the world, notably at the Broad Institute of Harvard University and the Massachusetts Institute of Technology, MIT.”

Technologically, the application is a trailblazing development: whereas biostatistics programs are traditionally written in the programming language R, Scanpy is based on Python, the dominant language in the machine learning community. Another new feature is that graph-based algorithms lie at the heart of Scanpy. Unlike the usual approach of regarding cells as points in a coordinate system within gene-expression space, the algorithms use a graph-like coordinate system. Instead of characterizing a single cell by its expression values for thousands of genes, the system characterizes each cell by identifying its closest neighbors, very much like the connections in a social network. In fact, to identify cell types, Scanpy uses the same algorithms that Facebook uses to identify communities.
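To make the neighbor-graph idea concrete, here is a minimal sketch in plain Python. It is not Scanpy's code or API: the function names (`knn_graph`, `symmetrize`, `connected_groups`) and the six toy "cells" with two "genes" each are invented for illustration. It also swaps in the simplest possible grouping step, connected components, as a stand-in for the community-detection algorithms the article refers to.

```python
import math


def knn_graph(cells, k):
    """Map each cell index to the set of its k nearest cells (Euclidean distance)."""
    graph = {}
    for i, a in enumerate(cells):
        # Distance from cell i to every other cell, closest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        graph[i] = {j for _, j in dists[:k]}
    return graph


def symmetrize(graph):
    """Make the graph undirected: if i lists j as a neighbor, j also lists i."""
    undirected = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            undirected[j].add(i)
    return undirected


def connected_groups(graph):
    """Group cells that are linked through the graph.

    Connected components are a deliberately simple stand-in for real
    community detection; they are NOT the method Scanpy uses.
    """
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(sorted(group))
    return groups


# Hypothetical toy data: six cells, two expression values each.
cells = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]

graph = symmetrize(knn_graph(cells, k=2))
groups = connected_groups(graph)
print(groups)
```

Once the graph is built, the grouping step never looks at the expression values again; it works only with who is connected to whom. A real analysis would use a proper community-detection method on a graph of many thousands of cells, where simple connected components would be far too coarse.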
