A. Laskowski, S. Swat, J. Badura, P. Wojciechowski, A. Swiercz, M. Kasprzak, J. Blazewicz
The aim of this work is to calculate the similarity of genomic data, which is a crucial task. Similarity measures can be used to decide whether given data come from the same species. If the similarity is high enough, we can then calculate the number of structural variants among a group of individuals of the same species. Such measurements should help us better understand the evolutionary process and improve our understanding of how individuals from across the world mix with each other. Previously, analyses like this could only be performed on assembled data, since the state-of-the-art methods, ANI and Mash, calculate the similarity of genomes. The assembly process is highly time-consuming and, for a human genome, can take more than 12 hours. In this work, we introduce an assembly-free process for comparing genomic data in the form of long-read sequences produced by NGS sequencers such as Illumina. In contrast to state-of-the-art methods, our method can produce results within minutes.
Keywords: de-novo, assembly, algorithms
Scheduled
TB2 Bioinformatics
June 10, 2021 11:15 AM
2 - LV Kantorovich