"Accelerating UTF-8 Decoding Using SIMD Instructions"
Hiroshi Inoue, Hideaki Komatsu, Toshio Nakatani
The 68th Workshop of IPSJ SIG Programming, Mar 17-18, 2008.
Slides [PDF]: IPSJPRO2008_SIMDdecoding.pdf
Journal version (in Japanese): http://id.nii.ac.jp/1001/00016433/
UTF-8 has become widely used as a standard format for text data exchange. The Java programming language, however, uses UTF-16 as its internal representation for text data. As a result, conversions between UTF-8 and UTF-16 consume a considerable amount of CPU time in workloads that process large amounts of text data, such as web application servers. Accelerating these conversions is therefore important for improving the performance of many applications.
In this paper, we present a new technique that uses SIMD instructions to accelerate the decoding of variable-length formats, such as conversion from UTF-8 to UTF-16. The technique achieves higher performance by reducing the overhead of branch mispredictions, in addition to exploiting the data parallelism of SIMD instructions. We implemented the technique using VMX instructions, the SIMD instruction set of the PowerPC architecture, and evaluated its performance in decoding UTF-8 sequences. The results show that the technique significantly accelerates UTF-8 decoding compared with the existing method.
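
For readers unfamiliar with why variable-length decoding is branch-heavy, the following is a minimal scalar sketch of UTF-8 to UTF-16 conversion. It is not the SIMD technique of the paper, nor necessarily the existing method the paper compares against; it is only an illustrative baseline. The function name `utf8_to_utf16` and the simplified error handling are assumptions made for this example.

```python
def utf8_to_utf16(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of UTF-16 code units (scalar sketch).

    Simplification (assumption of this example): overlong encodings and
    encoded surrogates are not rejected.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The sequence length depends on the lead byte, so every character
        # costs at least one data-dependent branch. On mixed-language text
        # these branches are hard for the CPU to predict.
        if b0 < 0x80:                # 0xxxxxxx: 1 byte
            cp, length = b0, 1
        elif b0 >> 5 == 0b110:       # 110xxxxx: 2 bytes
            cp, length = b0 & 0x1F, 2
        elif b0 >> 4 == 0b1110:      # 1110xxxx: 3 bytes
            cp, length = b0 & 0x0F, 3
        elif b0 >> 3 == 0b11110:     # 11110xxx: 4 bytes
            cp, length = b0 & 0x07, 4
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + length > n:
            raise ValueError("truncated sequence at offset %d" % i)
        # Each continuation byte (10xxxxxx) contributes 6 payload bits.
        for k in range(1, length):
            c = data[i + k]
            if c >> 6 != 0b10:
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        i += length
        if cp < 0x10000:
            out.append(cp)           # one UTF-16 code unit
        elif cp <= 0x10FFFF:
            cp -= 0x10000            # supplementary plane: surrogate pair
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))
        else:
            raise ValueError("code point out of range")
    return out
```

The chain of tests on the lead byte is the kind of data-dependent control flow whose misprediction cost the abstract refers to; a SIMD approach processes several bytes per instruction and aims to avoid such per-character branches.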