Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier

Abstract

Selecting an optimal set of exemplars is critical to the performance of in-context learning. However, prior exemplar search methods narrowly optimize for predictive accuracy and neglect model calibration, a key determinant of trustworthiness and safe deployment. In this paper, we formulate exemplar selection as a multi-objective optimization problem that explicitly targets both the maximization of predictive accuracy and the minimization of expected calibration error. We solve this problem with a sample-efficient Combinatorial Bayesian Optimization algorithm (COM-BOM) that finds the Pareto front optimally trading off accuracy and calibration. On multiple tasks from the unsaturated MMLU-Pro benchmark, COM-BOM beats or matches the baselines at jointly optimizing the two objectives while requiring a minimal number of LLM API calls.
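To make the two objectives and the Pareto-front notion concrete, below is a minimal, self-contained sketch in Python. The candidate scores are synthetic placeholders, and names such as expected_calibration_error and pareto_front are illustrative; this is not the paper's COM-BOM implementation, only the scoring and non-dominance criterion it optimizes over.

```python
# Sketch: score candidate exemplar sets on (accuracy, ECE) and keep the
# non-dominated ones. Candidate numbers below are synthetic placeholders.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and average the
    |accuracy - confidence| gap, weighted by the fraction of points per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

def pareto_front(points):
    """Return indices of non-dominated (accuracy, ECE) pairs,
    maximizing accuracy and minimizing ECE."""
    front = []
    for i, (acc_i, ece_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and ece_j <= ece_i and (acc_j > acc_i or ece_j < ece_i)
            for j, (acc_j, ece_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Synthetic (accuracy, ECE) scores for a handful of candidate exemplar sets.
candidates = [(0.62, 0.08), (0.65, 0.12), (0.60, 0.05), (0.65, 0.09), (0.58, 0.11)]
print("Pareto-optimal candidates:", pareto_front(candidates))
# -> [0, 2, 3]: sets no other candidate beats on both accuracy and calibration
```

In the actual method, evaluating each candidate set requires LLM API calls, which is why a sample-efficient Bayesian optimization over the combinatorial space of exemplar subsets is used rather than exhaustive scoring as in this toy example.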

Publication
Thirtieth Conference on Empirical Methods in Natural Language Processing (EMNLP)
