Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
We address the contextual bandit problem for high-dimensional combinatorial action spaces involving a class of structured discrete constrained optimization problems. In this setting, key quantities within the optimization formulation, such as performance indicators contributing to the objective or constraints, must be estimated from data during the trial sequence of optimized actions. These problems frequently arise in the many operations management domains including IT resource allocation and retail assortment price optimization. We propose a novel, practical, and transparent approach based on general-purpose regression oracles, leveraging Inverse Gap Weighting (IGW) for seamless integration within an optimization framework. IGW sampling is efficiently managed by: (a) a column generation reformulation of the underlying Mixed Integer Programming (MIP) model which allows for flexible lower-level predictors, causal coherence, and efficient representation of large action spaces; (b) a diverse solution pool generation to balance the exploration-exploitation trade-off in large action spaces. To address non-smoothness in the reward function due to optimization constraints, we incorporate a risk-averse phased learning strategy. We validate our approach on a real-world auto-scaling problem in IT automation, achieving a significant reduction in cumulative regret through skillful exploration, with additional gains from risk-averse methods that effectively manage constraint violations.
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Miao Guo, Yong Tao Pei, et al.
WCITS 2011