Hiroshi Inoue

Contact information

Ph.D., Research Staff Member
IBM Research - Tokyo
+81-3-3808-5345

Professional Associations

ACM SIGPLAN  |  IEEE Computer Society  |  Information Processing Society of Japan (IPSJ)


"Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters"
Hiroshi Inoue and Toshio Nakatani
2012 International Symposium on Memory Management (ISMM 2012). Beijing, China. pp. 133-142. June 15-16, 2012.
Full text [PDF]: ISMM2012_inoue.pdf
Slides [PDF]: ISMM2012_inoue_slides.pdf

Abstract

Cache-miss stalls are a major source of performance bottlenecks on multicore processors. A Hardware Performance Monitor (HPM) in the processor can locate cache misses, but HPMs are rarely used in practice for various reasons. A simple approach that locates the sources of cache misses and drives runtime optimizations without relying on an HPM would therefore be valuable. This paper shows that pointer dereferencing in hot loops is a major source of cache misses in Java programs. Based on this observation, we devised a new approach to identify the instructions and objects that cause frequent cache misses. Our heuristic technique identifies the majority of the cache misses in typical Java programs by matching hot loops against simple idiomatic code patterns. On average, our technique selected only 2.8% of the load and store instructions generated by the JIT compiler, yet these instructions accounted for 47% of the L1D cache misses and 49% of the L2 cache misses caused by the JIT-compiled code. To demonstrate the effectiveness of our technique for compiler optimizations, we prototyped object placement optimizations that align objects to cache lines or collocate paired objects in the same cache line to reduce cache misses. For comparison, we also implemented the same optimizations using the accurate information obtained from the HPM. Our results showed that our heuristic approach was as effective as the HPM-based approach, achieving comparable performance improvements in the SPECjbb2005 and SPECpower_ssj2008 benchmark programs.
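The object placement optimizations described in the abstract rest on simple address arithmetic: an object that straddles a cache-line boundary needs two line fills, and two objects that are always accessed together need two fills when placed far apart but only one when collocated. The Java sketch below is a hypothetical toy model of that arithmetic only, not the paper's JIT or garbage-collector implementation. The class name `CacheLineModel`, the 64-byte line size, and the object addresses and sizes are all assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model (not from the paper): counts the cache lines
 * touched by object accesses, assuming a fixed 64-byte line size.
 */
public class CacheLineModel {
    // Assumed line size; real processors may differ.
    static final int LINE_SIZE = 64;

    // Number of cache lines spanned by an object of `size` bytes starting at `addr`.
    static int linesSpanned(long addr, int size) {
        long first = addr / LINE_SIZE;
        long last = (addr + size - 1) / LINE_SIZE;
        return (int) (last - first + 1);
    }

    // Round `addr` up to the next cache-line boundary (the "align" placement).
    static long alignUp(long addr) {
        return (addr + LINE_SIZE - 1) / LINE_SIZE * LINE_SIZE;
    }

    // Distinct lines touched when every byte of each {addr, size} object is read.
    static int distinctLines(long[][] objs) {
        Set<Long> lines = new HashSet<>();
        for (long[] o : objs) {
            long addr = o[0];
            long size = o[1];
            long lastLine = (addr + size - 1) / LINE_SIZE;
            for (long l = addr / LINE_SIZE; l <= lastLine; l++) {
                lines.add(l);
            }
        }
        return lines.size();
    }

    public static void main(String[] args) {
        // A 48-byte object at offset 32 straddles two lines; aligned, it fits in one.
        System.out.println(linesSpanned(32, 48));          // 2
        System.out.println(linesSpanned(alignUp(32), 48)); // 1

        // A hypothetical 24-byte object and its 32-byte paired object,
        // placed far apart versus back to back (the "collocate" placement).
        long[][] scattered = { {0, 24}, {4096, 32} };
        long[][] collocated = { {0, 24}, {24, 32} };
        System.out.println(distinctLines(scattered));  // 2
        System.out.println(distinctLines(collocated)); // 1
    }
}
```

Running `main` prints 2, 1, 2, 1. The model deliberately ignores the padding that alignment wastes and the cost of moving objects; a real runtime would presumably have to weigh those against the misses saved, which is why targeting only the few objects identified as miss-heavy matters.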

Copyright (c) 2012 by Association for Computing Machinery, Inc. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.