Unsupervised ensemble learning for genome sequencing
Keywords:
Area: Computer Science and Information Technology
Expectation maximization algorithm
Variant calling
Genome sequencing
Unsupervised multi-class ensemble
Classifier
GATK
Framework
Publication date:
2022
ISSN:
0031-3203
Note:
This is an open access article under the CC BY-NC-ND license
Citation:
Pages-Zamora, A.; Ochoa-Álvarez, I. (Idoia); Ruiz-Cavero, G.; et al. "Unsupervised ensemble learning for genome sequencing". Pattern Recognition. 129, 2022, 108721
Abstract
Unsupervised ensemble learning refers to methods devised for a particular task that combine data provided by decision learners, taking into account their reliability, which is usually inferred from the data. Here, the variant calling step of next-generation sequencing technologies is formulated as an unsupervised ensemble classification problem. A variant calling algorithm based on the expectation-maximization algorithm is further proposed that estimates the maximum-a-posteriori decision among a number of classes larger than the number of different labels provided by the learners. Experimental results with real human DNA sequencing data show that the proposed algorithm is competitive with state-of-the-art variant callers such as GATK, HTSLIB, and Platypus. © 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
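
To make the idea of combining learners by inferred reliability concrete, the following is a minimal sketch of a generic expectation-maximization ensemble in the style of Dawid and Skene. It is not the algorithm proposed in the article: it assumes the hidden classes and the learners' labels share the same label set, whereas the abstract describes deciding among more classes than the learners can label. The function names `em_ensemble` and `simulate`, the smoothing constant, and the simulated accuracies are all illustrative assumptions.

```python
import math
import random


def em_ensemble(labels, n_classes, n_iter=50, smoothing=1e-2):
    """Generic EM for unsupervised ensemble classification (illustrative).

    labels[i][j] is the label (0..n_classes-1) that learner j gave to item i.
    Returns (posteriors, confusion), where confusion[j][k][l] estimates
    P(learner j outputs l | true class is k).
    """
    n_items, n_learners = len(labels), len(labels[0])

    # Initialise the posteriors with a smoothed majority vote.
    post = []
    for row in labels:
        counts = [smoothing] * n_classes
        for lab in row:
            counts[lab] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    conf = None
    for _ in range(n_iter):
        # M-step: re-estimate class priors and per-learner confusion matrices.
        prior = [
            (sum(p[k] for p in post) + smoothing)
            / (n_items + n_classes * smoothing)
            for k in range(n_classes)
        ]
        conf = [
            [[smoothing] * n_classes for _ in range(n_classes)]
            for _ in range(n_learners)
        ]
        for p, row in zip(post, labels):
            for j, lab in enumerate(row):
                for k in range(n_classes):
                    conf[j][k][lab] += p[k]
        for j in range(n_learners):
            for k in range(n_classes):
                s = sum(conf[j][k])
                conf[j][k] = [v / s for v in conf[j][k]]

        # E-step: posterior of each item's class, computed in the log domain.
        new_post = []
        for row in labels:
            logp = [
                math.log(prior[k])
                + sum(math.log(conf[j][k][lab]) for j, lab in enumerate(row))
                for k in range(n_classes)
            ]
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            z = sum(w)
            new_post.append([v / z for v in w])
        post = new_post
    return post, conf


def simulate(n_items, accuracies, n_classes, seed=0):
    """Synthetic data: learner j is correct with probability accuracies[j]."""
    rng = random.Random(seed)
    truth = [rng.randrange(n_classes) for _ in range(n_items)]
    labels = []
    for t in truth:
        row = []
        for acc in accuracies:
            if rng.random() < acc:
                row.append(t)
            else:
                row.append(rng.choice([c for c in range(n_classes) if c != t]))
        labels.append(row)
    return truth, labels


# Hypothetical demo: five simulated learners of very different quality.
truth, labels = simulate(600, [0.9, 0.85, 0.8, 0.55, 0.4], n_classes=3)
post, conf = em_ensemble(labels, n_classes=3)

# Maximum-a-posteriori decision per item, and a per-learner reliability score
# (mean of the diagonal of the estimated confusion matrix).
decisions = [max(range(3), key=p.__getitem__) for p in post]
accuracy = sum(d == t for d, t in zip(decisions, truth)) / len(truth)
reliability = [sum(conf[j][k][k] for k in range(3)) / 3 for j in range(5)]
print(f"MAP accuracy: {accuracy:.3f}")
print("estimated reliabilities:", [round(r, 2) for r in reliability])
```

No ground-truth labels enter the estimation; `truth` is used only to score the result of the simulation. The reliabilities are inferred from agreement patterns among the learners, which is the sense in which the ensemble is unsupervised.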


Items in Dadun are protected by copyright, with all rights reserved, unless otherwise indicated.