Robust active learning with binary responses
Keywords: 
Logistic regression
Loss function
Mislabelling
Optimal subsampling
Probit regression
Sequential sampling
Issue Date: 
2022
Publisher: 
Elsevier
ISSN: 
0378-3758
Note: 
This is an open access article under the CC BY-NC-ND license
Citation: 
López-Fidalgo, J. (Jesús); Wiens, D.P. (Douglas P.). "Robust active learning with binary responses". Journal of Statistical Planning and Inference, 220 (2022), 1-14.
Abstract
We introduce a method of Robust Learning (‘robl’) for binary data and propose its use in situations where Active Learning is appropriate: sampling the predictors is easy and cheap, but learning the responses is hard and expensive. We seek robustness against both modelling errors and the mislabelling of the binary responses. Thus we aim to sample effectively from the population of predictors and to learn the responses only for an ‘influential’ sub-population. This is carried out by probability-weighted sampling, for which we derive optimal ‘unbiased’ sampling weights, and by weighted likelihood estimation, for which we also derive optimal estimation weights. Both modelling errors and mislabelling can lead to biased estimates and classifiers; it is somewhat remarkable that our weights eliminate the mean of the bias (itself a random variable, as a result of the sampling) arising from both types of error. These weights are then tailored to minimize the mean squared error of the predicted values. Simulation studies indicate that when bias is of significant concern, robl allows for substantial reductions in prediction error relative to Passive Learning. The methods are then illustrated in real-data analyses.
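
The two ingredients named in the abstract, probability-weighted sampling of the points whose responses are learned and weighted likelihood estimation, can be illustrated with a minimal Python/NumPy sketch. This is not the robl procedure: the paper derives its own optimal sampling and estimation weights, and those are not reproduced here. As a hypothetical stand-in, the sampling score sqrt(p(1-p))·||x|| is computed from a small pilot fit (it needs no responses, only predictors), and the estimation weights are plain inverse sampling probabilities. The names `influence_scores`, `active_subsample_fit` and `label_oracle` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Maximize the weighted log-likelihood sum_i w_i * l_i(beta) by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                      # weighted score vector
        hess = (X * (w * p * (1 - p))[:, None]).T @ X   # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def influence_scores(X, beta_pilot):
    """Hypothetical label-free score sqrt(p(1-p)) * ||x||, from a pilot estimate."""
    p = sigmoid(X @ beta_pilot)
    return np.sqrt(p * (1 - p)) * np.linalg.norm(X, axis=1)


def active_subsample_fit(X, label_oracle, beta_pilot, n, rng):
    """Sample n rows with probability proportional to the scores (with replacement),
    learn responses only for those rows, and fit with inverse-probability weights."""
    N = X.shape[0]
    s = influence_scores(X, beta_pilot)
    q = s / s.sum()                          # sampling probabilities, summing to 1
    idx = rng.choice(N, size=n, replace=True, p=q)
    y = label_oracle(idx)                    # the expensive step: labels for sampled rows only
    w = 1.0 / (N * q[idx])                   # inverse-probability estimation weights
    return weighted_logistic_fit(X[idx], y, w)


# Usage on simulated data (hypothetical model and parameter values)
rng = np.random.default_rng(0)
N = 20000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([-0.5, 1.0, -1.5])
y_all = (rng.random(N) < sigmoid(X @ beta_true)).astype(float)

pilot_idx = rng.choice(N, size=200, replace=False)
beta_pilot = weighted_logistic_fit(X[pilot_idx], y_all[pilot_idx], np.ones(200))
beta_hat = active_subsample_fit(X, lambda idx: y_all[idx], beta_pilot, n=1000, rng=rng)
print(beta_hat)
```

Because the weighted score is an unbiased estimate of the full-population score, the inverse-probability weights keep the fit aimed at the population-level parameter even when the logistic model is misspecified; with an unweighted fit on the same subsample this holds only if the model is correct. Sampling with replacement keeps the weights simple; a without-replacement (Poisson) design with inclusion probabilities capped at 1 is a common alternative.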

Files in This Item: 
1-s2.0-S0378375822000040-main.pdf (Adobe PDF, 1.64 MB)

Items in Dadun are protected by copyright, with all rights reserved, unless otherwise indicated.