Robust active learning with binary responses
Keywords:
Logistic regression
Loss function
Mislabelling
Optimal subsampling
Probit regression
Sequential sampling
Publication date:
2022
Publisher:
Elsevier
ISSN:
0378-3758
Note:
This is an open access article under the CC BY-NC-ND license
Citation:
López-Fidalgo, J. (Jesús); Wiens, D.P. (Douglas P.). "Robust active learning with binary responses". Journal of Statistical Planning and Inference, 220, 2022, 1-14.
Abstract
We introduce a method of Robust Learning (‘robl’) for binary data, and propose its use in situations where Active Learning is appropriate, and where sampling the predictors is easy and cheap, but learning the responses is hard and expensive. We seek robustness against both modelling errors and the mislabelling of the binary responses. Thus we aim to sample effectively from the population of predictors, and learn the responses only for an ‘influential’ sub-population. This is carried out by probability weighted sampling, for which we derive optimal ‘unbiased’ sampling weights, and weighted likelihood estimation, for which we also derive optimal estimation weights. The robustness issues can lead to biased estimates and classifiers; it is somewhat remarkable that our weights eliminate the mean of the bias – which is a random variable as a result of the sampling – due to both types of errors mentioned above. These weights are then tailored to minimize the mean squared error of the predicted values. Simulation studies indicate that when bias is of significant concern, robl allows for substantial reductions, relative to Passive Learning, in the prediction errors. The methods are then illustrated in real-data analyses.
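
The two ingredients named in the abstract, probability weighted sampling and weighted likelihood estimation, can be illustrated with a minimal Python sketch. It is not the robl method itself: the sampling probabilities below are a hypothetical choice made for illustration, and the estimation weights are plain inverse sampling probabilities rather than the optimal weights derived in the paper. The function names, the simulated data and the logistic model are likewise assumptions.

```python
import numpy as np


def sample_with_probabilities(probs, rng):
    """Poisson sampling: include unit i independently with probability probs[i]."""
    return rng.random(probs.shape[0]) < probs


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-8):
    """Maximise sum_i w_i * loglik_i for a logistic model by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(0)

# Simulated population of predictors (cheap to observe).
n = 20000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Simulated binary responses. In practice only the sampled units would be
# labelled; all labels are generated here purely for convenience.
true_beta = np.array([-0.5, 1.0])
y_all = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Hypothetical sampling probabilities that favour predictors far from the centre.
probs = np.clip(0.05 + 0.1 * np.abs(x), 0.0, 1.0)
chosen = sample_with_probabilities(probs, rng)

# Weighted likelihood fit on the labelled subsample, using inverse
# sampling probabilities as estimation weights.
beta_hat = weighted_logistic_fit(X[chosen], y_all[chosen], 1.0 / probs[chosen])
```

Weighting each sampled unit by the inverse of its sampling probability keeps the weighted score unbiased for the full-population score under the assumed model. The paper's weights go further: they are tailored to handle model misspecification and mislabelling, which this sketch does not attempt.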

Files in this item:
1-s2.0-S0378375822000040-main.pdf (Adobe PDF, 1.64 MB)



Items in Dadun are protected by copyright, with all rights reserved, unless otherwise indicated.