Hiroshi Inoue

Contact Information

Ph.D., Research Staff Member
IBM Research - Tokyo
+81-3-3808-5345

Professional Associations:  ACM SIGPLAN  |  Information Processing Society of Japan (IPSJ)


"Accelerating UTF-8 Decoding Using SIMD Instructions"
Hiroshi Inoue, Hideaki Komatsu, Toshio Nakatani
The 68th Workshop of IPSJ SIG Programming, Mar 17-18, 2008.

Slides [PDF]: IPSJPRO2008_SIMDdecoding.pdf

Journal version (in Japanese): http://id.nii.ac.jp/1001/00016433/

Abstract

UTF-8 has recently become widely used as a standard format for exchanging text data. The Java programming language, however, uses UTF-16 as its internal representation of text. As a result, conversions between UTF-8 and UTF-16 consume a considerable amount of CPU time in workloads that process large amounts of text, such as web application servers. Accelerating these conversions is therefore important for the performance of many applications.
In this paper, we present a new technique that uses SIMD instructions to accelerate the decoding of variable-length formats, such as conversion from UTF-8 to UTF-16. The technique achieves higher performance by reducing the overhead of branch mispredictions in addition to exploiting the data parallelism of SIMD instructions. We implemented it using VMX, the SIMD instruction set of the PowerPC architecture, and evaluated its performance in decoding UTF-8 sequences. The results show that the technique significantly accelerates UTF-8 decoding compared to the existing method.
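The abstract does not spell out the VMX algorithm, so the sketch below is not the paper's method. It is a minimal, portable stand-in for the general idea, assuming well-formed UTF-8 input with no validation. A SWAR ("SIMD within a register") mask tests eight bytes at once. When all eight are ASCII, they are emitted directly with no per-byte length branch. Otherwise the code falls back to a scalar decoder that branches on each lead byte, which is the kind of data-dependent branch that mispredicts on mixed-script text. The names `HIGH_BITS`, `decode_one` and `utf8_to_utf16` are hypothetical.

```python
HIGH_BITS = 0x8080808080808080  # the top bit of each of the 8 bytes in a 64-bit word


def decode_one(data: bytes, i: int) -> tuple[list[int], int]:
    """Scalar fallback: decode the UTF-8 sequence starting at data[i].

    Returns (UTF-16 code units, bytes consumed). Assumes well-formed input;
    nothing is validated (no overlong, surrogate, or truncation checks).
    """
    b0 = data[i]
    # Branch on the lead byte to find the sequence length: this is the
    # data-dependent branch that tends to mispredict on mixed-script text.
    if b0 < 0x80:
        return [b0], 1
    if b0 >= 0xF0:
        length, cp = 4, b0 & 0x07
    elif b0 >= 0xE0:
        length, cp = 3, b0 & 0x0F
    else:
        length, cp = 2, b0 & 0x1F
    for b in data[i + 1 : i + length]:
        cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits per continuation byte
    if cp >= 0x10000:  # outside the BMP: emit a surrogate pair
        cp -= 0x10000
        return [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)], length
    return [cp], length


def utf8_to_utf16(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of UTF-16 code units."""
    out: list[int] = []
    i, n = 0, len(data)
    while i < n:
        # Fast path: load 8 bytes as one integer and test all their top bits
        # with a single mask. If none is set, all 8 bytes are ASCII and each
        # byte is already its own UTF-16 code unit.
        if i + 8 <= n and (int.from_bytes(data[i : i + 8], "little") & HIGH_BITS) == 0:
            out.extend(data[i : i + 8])
            i += 8
            continue
        units, length = decode_one(data, i)
        out.extend(units)
        i += length
    return out
```

On a real SIMD unit such as VMX, whose registers are 128 bits wide, a single vector operation would cover 16 bytes. The Python integer mask only mirrors the shape of that computation and says nothing about speed. This fast path helps only on runs of ASCII. Handling multibyte sequences in parallel, as the abstract describes, requires more than this sketch shows.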