Mauricio Araya-Polo, Félix Rubio, et al.
Scientific Programming
In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords in a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput. The technique benefits indefinitely from any future scaling in the number of threads or SIMD width. I show the approach's viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver a performance seen, so far, only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. Also, they achieve almost-ideal resource utilization (99.2%). The approach is applicable to any SIMD enabled processor and matches well the trend toward wider SIMD units in contemporary architecture design. © 2010 Springer Science+Business Media, LLC.
Mauricio Araya-Polo, Félix Rubio, et al.
Scientific Programming
Jamin Naghmouchi, Daniele Paolo Scarpazza, et al.
ICS 2010
Daniele Paolo Scarpazza, Oreste Villa, et al.
IEEE TPDS
Daniele Paolo Scarpazza, Gordon W. Braudaway
IISWC 2009