Unsupervised ensemble learning for genome sequencing
Keywords: 
Area: Computer Science and Information Technology
Expectation maximization algorithm
Variant calling
Genome sequencing
Unsupervised multi-class ensemble
Classifier
GATK
Framework
Issue Date: 
2022
ISSN: 
0031-3203
Note: 
This is an open access article under the CC BY-NC-ND license
Citation: 
Pages-Zamora, A.; Ochoa-Álvarez, I. (Idoia); Ruiz-Cavero, G.; et al. "Unsupervised ensemble learning for genome sequencing". Pattern Recognition. 129, 2022, 108721
Abstract
Unsupervised ensemble learning refers to methods devised for a particular task that combine data provided by decision learners taking into account their reliability, which is usually inferred from the data. Here, the variant calling step of next-generation sequencing technologies is formulated as an unsupervised ensemble classification problem. A variant calling algorithm based on the expectation-maximization algorithm is further proposed that estimates the maximum-a-posteriori decision among a number of classes larger than the number of different labels provided by the learners. Experimental results with real human DNA sequencing data show that the proposed algorithm is competitive with state-of-the-art variant callers such as GATK, HTSLIB, and Platypus. (c) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
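
To make the idea of combining unlabeled learner decisions with expectation-maximization more concrete, the following is a minimal sketch of a generic Dawid-Skene-style EM ensemble. It is not the algorithm proposed in the article: it assumes that the number of classes equals the number of labels the learners can emit (the article addresses the harder case with more classes than labels), and the names `em_ensemble` and `map_decisions`, the smoothing constant `eps`, and the vote-fraction initialisation are all illustrative assumptions.

```python
import math


def em_ensemble(labels, n_classes, n_iter=50, eps=1e-6):
    """Dawid-Skene-style EM for combining unlabeled learner decisions.

    labels[i][m] is the class index (0..n_classes-1) that learner m
    assigned to item i. Requires n_iter >= 1.
    Returns (posteriors, priors, confusion), where confusion[m][k][l]
    estimates P(learner m outputs l | true class is k).
    """
    n_items, n_learners = len(labels), len(labels[0])

    # Initialise the posteriors with smoothed vote fractions (soft majority vote).
    post = []
    for row in labels:
        counts = [eps] * n_classes
        for label in row:
            counts[label] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    for _ in range(n_iter):
        # M-step: smoothed class priors and per-learner confusion matrices.
        priors = [
            (sum(p[k] for p in post) + eps) / (n_items + n_classes * eps)
            for k in range(n_classes)
        ]
        confusion = []
        for m in range(n_learners):
            mat = [[eps] * n_classes for _ in range(n_classes)]
            for p, row in zip(post, labels):
                for k in range(n_classes):
                    mat[k][row[m]] += p[k]
            confusion.append([[v / sum(r) for v in r] for r in mat])

        # E-step: posterior over the true class of each item, in log space.
        post = []
        for row in labels:
            logp = [
                math.log(priors[k])
                + sum(math.log(confusion[m][k][row[m]]) for m in range(n_learners))
                for k in range(n_classes)
            ]
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            norm = sum(weights)
            post.append([w / norm for w in weights])

    return post, priors, confusion


def map_decisions(post):
    """Maximum-a-posteriori class index for each item."""
    return [max(range(len(p)), key=p.__getitem__) for p in post]
```

In this sketch the learners' reliabilities are never given: they are inferred from the agreement patterns in the data through the estimated confusion matrices, and the final decision for each item is the class with the highest posterior.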

Files in This Item:
File: pdf.pdf
Size: 1.41 MB
Format: Adobe PDF


