A revolutionary approach to deciphering evolution

ASU professor describes new method for creating accurate, scalable phylogenomic trees, offering new insights into the history of species

Illustration of a phylogenetic tree. Courtesy Anaissa Ruiz-Tejada

By Anaissa Ruiz-Tejada
August 28, 2023

For decades, researchers have been trying to understand evolution using phylogenomic trees. These branching structures are intricate maps of the ancestral connections among species: each branch stretches back through time, linking species through their shared heritage.

But building phylogenomic trees is no easy task. Traditional methods have relied on analyzing single marker genes, such as the 16S rRNA gene, to construct these evolutionary maps. These methods worked well for smaller groups of organisms but struggled to portray relationships accurately across hundreds of thousands of species. Other methods use genome-wide data, which can be more accurate but requires scalable algorithms and substantial computing power. As a result, neither approach has been ideal for generating large, accurate phylogenomic trees.
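For a sense of why scale is so demanding (this is a standard result in phylogenetics, not a figure from the study), the number of distinct unrooted, fully branching tree shapes that could connect n species is (2n - 5)!!, the product of the odd numbers 3 × 5 × ... × (2n - 5). The minimal Python sketch below, with a hypothetical function name, computes that count.

```python
def unrooted_topologies(n: int) -> int:
    """Count distinct unrooted binary tree topologies for n labeled species: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (an empty product when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 10, 20):
    print(n, unrooted_topologies(n))
```

Four species allow only 3 possible trees and ten species already allow 2,027,025; the count grows faster than exponentially, which is why tree-building methods cannot simply try every option.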

Solving the puzzle of evolutionary research

Qiyun Zhu, an assistant professor in the School of Life Sciences and the Biodesign Center for Fundamental and Applied Microbiomics at Arizona State University, recently published an article in the prestigious journal Nature Biotechnology with collaborators from the University of California San Diego. The article introduces a new approach to creating accurate and scalable phylogenomic trees.

In evolutionary science, researchers compare the DNA sequences of many organisms and measure the differences between them. These differences serve as clues for deciphering the relationships between species. More differences indicate greater genetic divergence, placing species farther apart on the phylogenomic tree. Conversely, fewer differences reflect a closer relationship, revealing the common threads that bind closely related species together.
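The idea of counting differences can be made concrete with the simplest distance measure: the proportion of aligned positions at which two sequences differ, often called the p-distance. This is an illustrative sketch with made-up sequences and a hypothetical function name, not the measure used in the study; real analyses align sequences first and typically correct for repeated changes at the same position using a model of sequence evolution.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Made-up, pre-aligned sequences for illustration only.
sp1 = "ACGTACGTAC"
sp2 = "ACGTACGTTC"
sp3 = "TCGAACCTTG"

print(p_distance(sp1, sp2))  # 0.1 (1 of 10 positions differ)
print(p_distance(sp1, sp3))  # 0.5
print(p_distance(sp2, sp3))  # 0.4
```

Here sp1 and sp2 differ at a single position, so they would sit close together on a tree, while sp3 would sit farther from both.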

To build these trees, researchers rely on genetic information, most often DNA sequences. DNA is made of four nucleotides (A, T, C and G) that pair up to form base pairs. Arranged in sequence, these base pairs make up the genes that determine the characteristics of each organism. An organism's complete set of genetic material is called its genome, and genomes vary widely in size, from thousands to billions of base pairs.

In their recent publication, Zhu and colleagues propose a new method, named uDance, to generate phylogenomic trees. Imagine the tree as a puzzle being assembled in pieces, each refined and polished before being combined to create the final masterpiece.

Zhu and his colleagues recognized the limitations of traditional methods and the scalability issues of genome-wide data approaches. So they combined the best of both worlds.

“Central to the new computational approach is the divide-and-conquer strategy,” Zhu explained. “This strategy involves breaking down the full tree into smaller subtrees, each of which can be generated accurately and efficiently.”

Once the smaller subtrees have been built, the algorithm uses information that relates each one to its neighboring subtrees and then connects them into a complete phylogenomic tree. The same approach can build brand-new trees or improve existing ones.
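The general divide-and-conquer idea can be illustrated with a toy example. The sketch below is not uDance's algorithm: it assumes the species have already been split into groups by hand, builds each subtree independently with simple average-linkage clustering (the approach behind the classic UPGMA method, swapped in here for illustration), and then joins the finished subtrees by treating each as a single unit. Trees are nested Python tuples, and all names and distances are hypothetical.

```python
from itertools import combinations


def agglomerate(clusters, dist):
    """Repeatedly merge the two clusters with the smallest average leaf-to-leaf
    distance (average linkage) until one remains. Each cluster is a (tree, leaves) pair."""
    clusters = list(clusters)

    def avg(c1, c2):
        total = sum(dist(a, b) for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


def divide_and_conquer(groups, dist):
    # Divide: build each subtree independently, using only its own species.
    subtrees = [agglomerate([(leaf, [leaf]) for leaf in group], dist) for group in groups]
    # Conquer: join the finished subtrees, treating each one as a single unit.
    tree, _ = agglomerate(subtrees, dist)
    return tree


# Hypothetical pairwise distances between four species.
D = {frozenset(pair): d for pair, d in {
    ("A", "B"): 0.1, ("A", "C"): 0.6, ("A", "D"): 0.7,
    ("B", "C"): 0.6, ("B", "D"): 0.7, ("C", "D"): 0.2,
}.items()}


def dist(a, b):
    return D[frozenset((a, b))]


print(divide_and_conquer([["A", "B"], ["C", "D"]], dist))  # (('A', 'B'), ('C', 'D'))
```

Because each group is handled on its own, the subtree step could run in parallel or use a slower, more accurate method, which is the appeal of splitting the problem up.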

What makes uDance truly remarkable is its ability to update existing trees. When new genomes become available, researchers do not have to reassemble the puzzle from scratch: uDance lets them add the new pieces to the existing tree and refine different sections independently.
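As a rough illustration of adding new pieces without rebuilding, the hypothetical sketch below attaches a newly sequenced species next to its closest relative in an existing tree and leaves every other branch untouched. This naive nearest-neighbor placement is an assumption chosen for simplicity; it is not uDance's procedure, and dedicated placement tools are far more sophisticated. Trees are nested Python tuples, and the distances are made up.

```python
def leaves(tree):
    """Yield the leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree


def graft(tree, target, new_leaf):
    """Return a copy of `tree` in which leaf `target` becomes the pair (target, new_leaf)."""
    if isinstance(tree, tuple):
        return tuple(graft(child, target, new_leaf) for child in tree)
    return (tree, new_leaf) if tree == target else tree


def add_to_tree(tree, new_leaf, dist):
    """Attach `new_leaf` as the sister of its closest existing leaf."""
    nearest = min(leaves(tree), key=lambda leaf: dist(leaf, new_leaf))
    return graft(tree, nearest, new_leaf)


# Hypothetical distances from a new species "E" to the species already in the tree.
D = {frozenset(pair): d for pair, d in {
    ("A", "E"): 0.6, ("B", "E"): 0.6, ("C", "E"): 0.15, ("D", "E"): 0.25,
}.items()}


def dist(a, b):
    return D[frozenset((a, b))]


existing = (("A", "B"), ("C", "D"))
updated = add_to_tree(existing, "E", dist)
print(updated)  # (('A', 'B'), (('C', 'E'), 'D'))
```

Only the branch containing C changes, so the rest of the tree can be kept as is or refined separately.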

“Researchers can use uDance to determine the evolutionary trajectories of pathogenic strains over decades by incorporating the subtle differences found through massive sequencing efforts,” Zhu said.

This method not only improves the accuracy of the trees but also makes the process scalable. Using uDance, the team inferred a species tree from a dataset of 42.5 billion amino acid residues, an accomplishment that would have been almost unimaginable just a few years ago.

“With this new framework, researchers can now aim to accomplish those long-standing mission impossibles, such as inferring the evolutionary relationships among all genomes ever sequenced. There are about 2 million of them as of 2023,” Zhu said.