Hiroshi Inoue  Hiroshi Inoue photo         

contact information

Ph.D., Research Staff Member
IBM Research - Tokyo
  +81dash3dash3808dash5345

links

Professional Associations

Professional Associations:  ACM SIGPLAN  |  IEEE Computer Society  |  Information Processing Society of Japan (IPSJ)


"A high-performance sorting algorithm for multicore single-instruction multiple-data processors"
Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio Nakatani
Software: Practice and Experience, Vol. 42(6), pp. 753-777, 2012.

pre peer-reviewed version [PDF]: SPE-SIMDsort.pdf
The final version is available at http://onlinelibrary.wiley.com/doi/10.1002/spe.1102/abstract

Abstract

Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both single-instruction multiple-data (SIMD) instructions and thread-level parallelism. In this paper, we propose a new high-performance sorting algorithm, called aligned-access sort (AA-sort), that exploits both the SIMD instructions and thread-level parallelism available on today's multicore processors. Our algorithm consists of two phases, an in-core sorting phase and an out-of-core merging phase. The in-core sorting phase uses our new sorting algorithm that extends combsort to exploit SIMD instructions. The out-of-core algorithm is based on mergesort with our novel vectorized merging algorithm. Both phases can take advantage of SIMD instructions. The key to high performance is eliminating unaligned memory accesses that would reduce the effectiveness of SIMD instructions in both phases. We implemented and evaluated the AA-sort on PowerPC 970MP and Cell Broadband Engine platforms. In summary, a sequential version of the AA-sort using SIMD instructions outperformed IBM's optimized sequential sorting library by 1.8 times and bitonic mergesort using SIMD instructions by 3.3 times on PowerPC 970MP when sorting 32 million random 32-bit integers. Also, a parallel version of AA-sort demonstrated better scalability with increasing numbers of cores than a parallel version of bitonic mergesort on both platforms.

Copyright (c) 2011 John Wiley & Sons, Ltd.