Robust active learning with binary responses

López-Fidalgo, J. (Jesús); Wiens, D.P. (Douglas P.)

Author(s):

López-Fidalgo, J. (Jesús)

Wiens, D.P. (Douglas P.)

Keywords:

Logistic regression
Loss function
Mislabelling
Optimal subsampling
Probit regression
Sequential sampling

Publication date:

2022

Publisher:

Elsevier

ISSN:

0378-3758

Note:

This is an open access article under the CC BY-NC-ND license

Citation:

López-Fidalgo, J. (Jesús); Wiens, D.P. (Douglas P.). "Robust active learning with binary responses". Journal of Statistical Planning and Inference. (220), 2022, 1 - 14

Abstract

We introduce a method of Robust Learning (‘robl’) for binary data, and propose its use in situations where Active Learning is appropriate, and where sampling the predictors is easy and cheap, but learning the responses is hard and expensive. We seek robustness against both modelling errors and the mislabelling of the binary responses. Thus we aim to sample effectively from the population of predictors, and learn the responses only for an ‘influential’ sub-population. This is carried out by probability weighted sampling, for which we derive optimal ‘unbiased’ sampling weights, and weighted likelihood estimation, for which we also derive optimal estimation weights. The robustness issues can lead to biased estimates and classifiers; it is somewhat remarkable that our weights eliminate the mean of the bias – which is a random variable as a result of the sampling – due to both types of errors mentioned above. These weights are then tailored to minimize the mean squared error of the predicted values. Simulation studies indicate that when bias is of significant concern, robl allows for substantial reductions, relative to Passive Learning, in the prediction errors. The methods are then illustrated in real-data analyses.
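The two ingredients named in the abstract, probability weighted sampling and weighted likelihood estimation, can be illustrated with a minimal sketch. It does not use the optimal weights derived in the paper: the sampling probabilities below come from a hypothetical score (1 + |x|), and the estimation weights are plain inverse sampling probabilities, a standard Horvitz-Thompson style stand-in. The function names, the simulated population, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def sample_with_weights(rng, probs):
    """Poisson sampling: include unit i independently with probability probs[i]."""
    return rng.random(probs.shape[0]) < probs


def weighted_logistic_fit(X, y, w, n_iter=50, ridge=1e-8):
    """Maximise the weighted logistic log-likelihood by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        # Weighted information matrix; the tiny ridge only guards against singularity.
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def demo(seed=0, n_pop=20000, expected_n=2000):
    rng = np.random.default_rng(seed)
    # Simulated population of cheap predictors (intercept plus one covariate).
    x = rng.normal(size=n_pop)
    X = np.column_stack([np.ones(n_pop), x])
    beta_true = np.array([-0.5, 1.0])
    # Labels are simulated for every unit here, but only sampled ones are used,
    # mimicking a setting where responses are expensive to obtain.
    y = (rng.random(n_pop) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

    # Hypothetical sampling design (not the paper's): favour points with large |x|.
    score = 1.0 + np.abs(x)
    probs = np.minimum(1.0, expected_n * score / score.sum())
    chosen = sample_with_weights(rng, probs)

    # Inverse-probability estimation weights offset the unequal sampling.
    beta_hat = weighted_logistic_fit(X[chosen], y[chosen], 1.0 / probs[chosen])
    return beta_true, beta_hat, int(chosen.sum())


if __name__ == "__main__":
    beta_true, beta_hat, n_sampled = demo()
    print("sampled:", n_sampled)
    print("true coefficients:", beta_true)
    print("weighted estimate:", beta_hat)
```

In this sketch the responses of roughly a tenth of the population are used, and the inverse-probability weights keep the coefficient estimates close to those of the full-population model when that model is correctly specified. The paper's contribution is to choose both sets of weights optimally under model misspecification and mislabelling, which this example does not attempt.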

URI:

https://hdl.handle.net/10171/63880

DOI:

10.1016/j.jspi.2022.01.004

Appears in collections:

DA-TECNUN-ECOPYME, DPI2015-70832-R - Journal articles
