Current and Past Research Highlights
Strain-level metagenome deconvolution. Microbial communities in many environments include distinct lineages of closely related organisms, which have proved challenging to separate in metagenomic assembly. Sequencing errors are hard to distinguish from real polymorphisms between bacterial strains, but high-fidelity (HiFi) long reads have the potential to solve this problem. We recovered 428 complete or nearly complete bacterial genomes from a single sheep gut metagenomic sample, the highest resolution achieved with metagenomic deconvolution to date. HiFi assembly resolved many closely related microbial lineages into distinct contigs, proving to be a powerful tool for characterizing complex heterogeneous environments.
Metagenome assembly with metaFlye. Shotgun metagenomic assembly is a powerful method for characterizing complex microbial communities (such as human gut or tumor microenvironments). Until recently, metagenome assemblies based on short reads (such as Illumina) were highly fragmented and incomplete (e.g. missing 16S genes). To enable long-read-based analysis, we developed metaFlye, the first dedicated method for long-read metagenomic assembly. Using metaFlye, we reconstructed many complete bacterial genomes from various metagenomic communities. We also showed that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products (such as colibactin).
Long-read assembly using Flye. Long-read sequencing technologies (such as Pacific Biosciences or Oxford Nanopore) increased read lengths to tens of thousands of nucleotides and substantially improved the quality of many genome assemblies. These technologies, however, come with high error rates. To address this challenge, we created the Flye algorithm for assembly of long, error-prone reads. Flye uses a novel repeat graph framework, which enables fast and accurate assemblies of various organisms. In particular, Flye is well suited to assembling human genomes from ultra-long Oxford Nanopore sequencing data (such as NA12878 or CHM13).
Comparative assembly using multiple references. Since many de novo assemblies of large genomes are still incomplete, one can use information from related reference genomes to order and orient the contigs. We have developed Ragout, which infers structural rearrangements between multiple input references and reconstructs the most probable architecture of a target genome. We used Ragout to produce chromosome-level assemblies of multiple mouse genomes, which gave insights into rodent genome evolution and revealed novel functional loci. The mouse assemblies were generated as part of the Mouse genomes sequencing project, hosted by the Wellcome Sanger Institute.
Tools for assembly graph analysis. The analysis of genome graphs is helpful in studying the repeat structure of genomes (for example, mosaic segmental duplications in humans). To visualize large and complex assembly graphs, we developed AGB, an interactive graph visualization tool. We have also introduced a new Synteny Paths approach for comparing two related genomes in graph form, analogous to synteny blocks for linear genomes. The tools were developed in collaboration with the Center for Algorithmic Biotechnology and the Bioinformatics Institute in St. Petersburg, Russia.