S. Swat, A. Laskowski, J. Badura, A. Ćwiercz, W. Frohmberg, P. Wojciechowski, M. Kasprzak, J. Blazewicz
De novo reconstruction of a genome sequence is a major challenge, largely because of the computational difficulty of processing millions of reads at once. ALGA is a new method for this task, based on the overlap-layout-consensus approach. The approach consists of three phases: construction of the overlap graph, preparation of the graph for traversal, and computation of the final consensus sequences. It is generally viewed as more accurate than the so-called de Bruijn graph approach, but much more demanding in terms of time and memory. Several new ideas were implemented to increase efficiency in each phase, including a number of heuristics designed to simplify the structure of the overlap graph effectively, both during the second phase and during graph creation. ALGA was tested on a few real data sets, including a whole human genome, and the results were evaluated with the standard tool QUAST. In comparison with other assemblers, ALGA provides very good results according to metrics such as genome coverage fraction, length of the resulting sequences, and number of misassemblies.
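The three phases named above can be illustrated with a toy sketch. This is not ALGA's implementation: the function names (`overlap_length`, `build_overlap_graph`, `greedy_layout`, `merge_path`), the `min_len` threshold, the greedy traversal, and the error-free example reads are all assumptions made for illustration. A real assembler would use indexed overlap detection rather than the quadratic all-pairs comparison shown here, and would have to handle sequencing errors and repeats.

```python
from itertools import permutations


def overlap_length(a, b, min_len):
    """Longest suffix of `a` that equals a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Phase 1 (toy): directed graph read -> {successor read: overlap length}."""
    graph = {r: {} for r in reads}
    for a, b in permutations(reads, 2):  # all ordered pairs; quadratic, toy only
        k = overlap_length(a, b, min_len)
        if k:
            graph[a][b] = k
    return graph


def greedy_layout(graph, start):
    """Phase 2 (toy): walk from `start`, always following the largest overlap."""
    path, visited, current = [start], {start}, start
    while True:
        candidates = {b: k for b, k in graph[current].items() if b not in visited}
        if not candidates:
            return path
        current = max(candidates, key=candidates.get)
        path.append(current)
        visited.add(current)


def merge_path(graph, path):
    """Phase 3 (toy): concatenate reads along the path, dropping overlapped prefixes."""
    seq = path[0]
    for a, b in zip(path, path[1:]):
        seq += b[graph[a][b]:]
    return seq


# Hypothetical error-free reads, invented for this example.
reads = ["ATGGCGT", "GCGTACG", "TACGGAT"]
graph = build_overlap_graph(reads, min_len=3)
contig = merge_path(graph, greedy_layout(graph, "ATGGCGT"))
print(contig)
```

With error-free reads the last step reduces to plain concatenation; with real data the consensus phase has to reconcile disagreeing bases, which this sketch does not attempt.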
Keywords: De novo genome assembly, heuristics
Scheduled
TB2 Bioinformatics
June 10, 2021 11:15 AM
2 - LV Kantorovich