Professional AssociationsProfessional Associations: ACM | Information Processing Society of Japan (IPSJ)
- Code Patch to the STAMP Benchmarks for Hardware Transdactional Memory (HTM)
- IISWC 2014 Paper
- PPoPP 2014 Paper
- IISWC 2013 Paper
- Research Report: Eliminating GIL in Ruby through HTM
- ASPLOS 2012 Paper
- CGO 2010 Paper
- VEE 2010 Paper
- ISCA 2015 Paper
- Code Patch to Eliminate Global Interpreter Lock (GIL) in Ruby through Hardware Transactional Memory
ASPLOS 2012 Paper
Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat.
Rei Odaira and Toshio Nakatani.
In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 147--158, 2012
Full text [PDF]: ASPLOS2012_ContinuousObjectAccessProfiling.pdf
Slides [PDF]: ASPLOS2012_ContinuousObjectAccessProfiling_Slides.pdf
Future microprocessors will have more serious memory wall problems since they will include more cores and threads in each chip. Similarly, future applications will have more serious memory bloat problems since they are more often written using object-oriented languages and reusable frameworks. To overcome such problems, the language runtime environments must accurately and efficiently profile how programs access objects. We propose Barrier Profiler, a low-overhead object access profiler using a memory-protection-based approach called pointer barrierization and adaptive overhead reduction techniques. Unlike previous memory-protection-based techniques, pointer barrierization offers per-object protection by converting all of the pointers to a given object to corresponding barrier pointers that point to protected pages. Barrier Profiler achieves low overhead by not causing signals at object accesses that are unrelated to the needed profiles, based on profile feedback and a compiler analysis. Our experimental results showed Barrier Profiler provided sufficiently accurate profiles with 1.3% on average and at most 3.4% performance overhead for allocation-intensive benchmarks, while previous code-instrumentation-based techniques suffered from 9.2% on average and at most 12.6% overhead. The low overhead allows Barrier Profiler to be run continuously on production systems. Using Barrier Profiler, we implemented two new online optimizations to compress write-only character arrays and to adjust the initial sizes of mostly non-accessed arrays. They resulted in speed-ups of up to 8.6% and 36%, respectively.
Copyright (C) 2012 by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.