Consider the supervised learning setting where the goal is to learn to predict labels given points from a distribution. An omnipredictor for a class of loss functions and a class of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in the class, for every loss in the family. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where y ∈ {0, 1}, but much less is known about the regression setting, where the label can be continuous. Our main conceptual contribution is the notion of sufficient statistics for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions. Our key technical contribution is a bound on the ε-approximate rank of convex, Lipschitz functions on a bounded interval, which we show is tight up to a lower-order factor. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the hypothesis class. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability, introduced by [GKH+23] for Boolean labels, to the regression setting.
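For context, one common way to formalize the ε-approximate rank of a family of loss functions (the paper's exact definition may differ in its details) is as the smallest number of basis functions that approximate every loss in the family to within ε in sup norm:

```latex
% One standard sup-norm formalization of epsilon-approximate rank:
\operatorname{rank}_{\varepsilon}(\mathcal{L})
  = \min\Bigl\{\, r \;:\; \exists\, f_1,\dots,f_r \ \text{s.t.}\
      \forall\, \ell \in \mathcal{L},\
      \inf_{c \in \mathbb{R}^r} \sup_{t}
      \Bigl|\, \ell(t) - \sum_{i=1}^{r} c_i\, f_i(t) \,\Bigr| \le \varepsilon \Bigr\}
```

Under this reading, the r quantities associated with f_1, …, f_r act as sufficient statistics: knowing them pins down the expected value of every loss in the family up to ε.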
AISTATS 2024
Faster Recalibration of an Online Predictor via Approachability
Predictive models in ML need to be trustworthy and reliable, which often at the very least means outputting calibrated probabilities. This can be particularly difficult to guarantee in the online prediction setting, where the outcome sequence can be generated adversarially. In this paper we introduce a technique, based on Blackwell’s approachability theorem, for taking an online predictive model that might not be calibrated and transforming its predictions into calibrated predictions without a large increase in the loss of the original model. Our proposed algorithm achieves calibration at a faster rate than existing techniques (Kuleshov and Ermon, 2017) and is the first algorithm to offer a flexible tradeoff between calibration error and accuracy in the online setting. We demonstrate this by characterizing the space of jointly achievable calibration and regret using our technique.
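For intuition about post-hoc recalibration of an online predictor, the sketch below shows a much simpler (and weaker) baseline: online histogram binning, which replaces each raw probability with the empirical outcome frequency of its bin. The class name and bin count are invented here for illustration, and this is not the approachability-based algorithm of the paper.

```python
# Minimal online recalibration sketch via histogram binning
# (illustrative baseline only; not the paper's algorithm).
class BinnedRecalibrator:
    def __init__(self, num_bins=10):
        self.num_bins = num_bins
        self.counts = [0] * num_bins  # times a prediction fell in each bin
        self.hits = [0] * num_bins    # outcomes equal to 1 in each bin

    def _bin(self, p):
        # Map a probability in [0, 1] to a bin index.
        return min(int(p * self.num_bins), self.num_bins - 1)

    def predict(self, raw_p):
        """Replace the base model's probability with its bin's empirical frequency."""
        b = self._bin(raw_p)
        if self.counts[b] == 0:
            return raw_p              # no data yet: pass the raw prediction through
        return self.hits[b] / self.counts[b]

    def update(self, raw_p, outcome):
        """Record the realized binary outcome for a raw prediction."""
        b = self._bin(raw_p)
        self.counts[b] += 1
        self.hits[b] += outcome
```

The tradeoff the paper improves on is visible even here: coarser bins calibrate faster but discard more of the base model's accuracy.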
2023
AISTATS 2024
On the Vulnerability of Fairness Constrained Learning to Malicious Noise
Blum, Avrim, Okoroafor, Princewill, Saha, Aadirupa, and Stangl, Kevin
NeurIPS Workshop on Algorithmic Fairness 2023; International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where, for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a Θ(α) loss in accuracy, where α is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an O(√α) loss, and give a matching Ω(√α) lower bound. In contrast, Konstantinov and Lampert (2021) showed that for proper learners the loss in accuracy for both notions is Ω(1). The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify his power. We also consider additional fairness notions, including Equalized Odds and Calibration. For these fairness notions, the excess accuracy cost clusters into three natural regimes: O(α), O(√α), and O(1). These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
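To make the Demographic Parity discussion concrete, the hypothetical snippet below computes a demographic parity gap and the flip probability with which a randomized post-processor can equalize expected positive rates across two groups; both function names are invented here, and this toy calculation only illustrates the extra slack randomized classifiers give the learner.

```python
def dp_gap(preds, groups):
    """Demographic parity gap: the largest difference in
    positive-prediction rate between any two groups."""
    rate = {}
    for g in set(groups):
        in_g = [p for p, gg in zip(preds, groups) if gg == g]
        rate[g] = sum(in_g) / len(in_g)
    vals = list(rate.values())
    return max(vals) - min(vals)

def flip_prob(r_hi, r_lo):
    """Probability with which a randomized post-processor should flip the
    higher-rate group's positive predictions to 0 so that the two groups'
    expected positive rates coincide."""
    return (r_hi - r_lo) / r_hi
```

A deterministic post-processor must flip all or none of a group's positives; randomizing with probability `flip_prob` hits the target rate exactly in expectation.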
SODA 2023
Non-Stochastic CDF Estimation Using Threshold Queries
Okoroafor, Princewill, Gupta, Vaishnavi, Kleinberg, Robert, and Goh, Eleanor
Estimating a univariate distribution is one of the most basic
and fundamental tasks in non-parametric statistics. In this
paper, we tackle the distribution estimation problem in a setting
with two challenging features. First, the algorithm does not
directly observe the data; instead, it only asks
a limited number of threshold queries about each sample.
Second, the data are not assumed to be independent
and identically distributed;
instead, we allow for an arbitrary process generating
the samples, including an adaptive adversary. These
considerations are relevant, for example, when modeling a
seller experimenting with posted prices to estimate the
distribution of consumers’ willingness to pay for a product:
offering a price and observing a consumer’s purchase decision
is equivalent to asking a single threshold query about their value,
and the distribution of consumers’ values may be non-stationary
over time, as early adopters may differ markedly from late adopters.
Our main result quantifies, to within a constant factor, the
sample complexity of estimating the empirical CDF of a sequence
of elements of [n], up to ε additive error,
using one threshold query per sample. The complexity depends only
logarithmically on n, and our result can be interpreted
as extending the existing logarithmic-complexity results for
noisy binary search to the more challenging setting where
noise is non-stochastic. Along the way to designing our
algorithm, we consider a more general model in which the
algorithm is allowed to make a limited number of
simultaneous threshold queries on each sample. We
solve this problem using Blackwell’s Approachability Theorem
and the exponential weights method.
As a side result of independent interest, we
characterize the minimum number of simultaneous
threshold queries required by deterministic
CDF estimation algorithms.
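A minimal non-adaptive baseline for this problem (not the paper's algorithm, which chooses its thresholds adaptively) is to ask each sample a single threshold query at a uniformly random threshold and average the answers per threshold; the function below is a hypothetical sketch of that idea.

```python
import random

def estimate_cdf(samples, n, rng=random.Random(0)):
    """For each sample x, ask one query "is x <= t?" at a uniformly random
    threshold t in [n], then estimate the CDF from the per-threshold answer
    frequencies. Non-adaptive illustrative baseline only."""
    yes = [0] * (n + 1)    # "yes" answers received at threshold t
    asked = [0] * (n + 1)  # number of queries made at threshold t
    for x in samples:
        t = rng.randint(1, n)
        asked[t] += 1
        yes[t] += (x <= t)
    cdf = []
    for t in range(1, n + 1):
        cdf.append(yes[t] / asked[t] if asked[t] else 0.0)
    # Enforce monotonicity with a running max (a simple isotonic repair).
    for i in range(1, n):
        cdf[i] = max(cdf[i], cdf[i - 1])
    return cdf
```

Against an adaptive adversary this naive averaging can be badly fooled, which is exactly the gap the approachability-based algorithm in the paper closes.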