R. Govindarajan, N.S.S. Narasimha Rao, et al.
Int. J. Parallel Program
This paper presents a joint study of application and architecture to improve the performance and scalability of an irregular application, computing betweenness centrality, on the IBM Cyclops64 many-core architecture. The unstructured parallelism, dynamic non-contiguous memory access, and low arithmetic intensity of betweenness centrality make it difficult to map parallel algorithms efficiently onto such many-core architectures. By identifying several key architectural features, we propose and evaluate efficient strategies for achieving scalability on a massively multithreaded many-core architecture. We demonstrate several optimization strategies, including multi-grain parallelism, just-in-time locality with an explicit memory hierarchy and non-preemptive thread execution, and fine-grain data synchronization. Compared with a conventional parallel algorithm, our approach achieves a 4X-50X improvement in performance and a 16X improvement in scalability on a 128-core IBM Cyclops64 simulator. © 2009 Springer Science+Business Media, LLC.
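As a point of reference for what computing betweenness centrality involves, the following is a minimal sequential sketch of Brandes' algorithm for unweighted graphs. It is the standard textbook formulation, not the Cyclops64-optimized method described in the abstract; the function name `betweenness_centrality` and the adjacency-dict input format are assumptions made for illustration. The comments mark where the irregularities named in the abstract show up.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (illustrative sketch).

    adj: dict mapping each vertex to a list of neighbours. Edges are
    treated as directed; list both directions for an undirected graph.
    Returns raw (unnormalised) betweenness scores, counting ordered pairs.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:  # sources are independent: coarse-grain parallelism
        # Phase 1: BFS from s, counting shortest paths (sigma) to each vertex.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:  # neighbour lists: irregular, non-contiguous access
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    # A level-parallel BFS would need to synchronise these updates.
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    # Undirected path 0 - 1 - 2: vertex 1 lies on the paths 0->2 and 2->0.
    print(betweenness_centrality({0: [1], 1: [0, 2], 2: [1]}))
```

Each per-source iteration performs only a few arithmetic operations per edge visited, which illustrates the low arithmetic intensity the abstract mentions. The outer loop over sources and the inner loop over neighbours offer two distinct grains of parallelism.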
Erik R. Altman, R. Govindarajan, et al.
Journal of Parallel and Distributed Computing
Jong-Deok Choi, Manish Gupta, et al.
SIGPLAN Notices (ACM Special Interest Group on Programming Languages)
Marco Pistoia, Robert J. Flynn, et al.
ECOOP 2005