Darya Kaviani, Sijun Tan, et al.
RWC 2025
AIOps can provide essential value for data lakehouses, which pose complex operational challenges for Site Reliability Engineers (SREs). This paper proposes that the unified approach of data lakehouses creates a unique opportunity for unified data resiliency management. We focus on AIOps applied to disaster recovery and backup/restore, and in particular on managing data lakehouse hardware resources so that the Recovery Point Objectives (RPOs) of lakehouse data are met with a high degree of accuracy. The goal is to warn an SRE about an impending RPO violation and to suggest how much hardware to add, and by what time, to avoid violating the lakehouse data's RPO. We claim that AIOps can achieve this goal with an ensemble of machine learning and time series analysis.
Haoran Qiu, Weichao Mao, et al.
USENIX ATC 2024
Runyu Jin, Paul Muench, et al.
ICPE 2024
Shiqiang Wang, Mingyue Ji
NeurIPS 2022