Diana J. Arroyo, Michael (Mike) Spreitzer

Hannibal - overview

In this project we investigate techniques to improve the performance and reliability of multi-stage data flows running in large-scale data analytics environments. We studied ways to better manage the intermediate data of MapReduce jobs, to dynamically control the replication factor of intermediate data in a data flow, and to perform resource- and SLA-aware scheduling of MapReduce jobs.