Analytics infrastructure


Kubilay Atasu, Thomas Parnell, Haris Pozidis

Analytics infrastructure - overview

Knowledge discovery through data mining and machine learning has recently proliferated in our digital universe. The generation of huge amounts of data and the high computational power now available have fueled an inexorable desire to capture and analyze this data to uncover hidden patterns that promise better insights and improved decision making. However, as data volumes increase faster than our ability to process them, it becomes crucial to revisit traditional methods of computing. This has led to the emergence of new data processing frameworks, such as MapReduce and Spark, that are better suited to the new data-centric computing paradigm.

Our research in data-centric computing for analytics applications is built on two main pillars, namely infrastructure and algorithms. On the infrastructure front, we are designing new user-level I/O architectures that enable high-performance data movement and integrate seamlessly with existing data processing frameworks. On the algorithmic front, we are developing new software infrastructure and algorithms to accelerate large-scale machine-learning workloads with a focus on improving performance and achieving scalability.