Pangea: Monolithic distributed storage for data analytics
Jia Zou, Arun Iyengar, et al.
VLDB 2017
Storage and memory systems for modern data analytics are heavily layered, managing shared persistent data, cached data, and non-shared execution data in separate systems such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs. In this paper, we propose a single system called Pangea that can manage all data—both intermediate and long-lived data, and their buffer/caching, page replacement, data placement optimization, and failure recovery—all in one monolithic distributed storage system, without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.