Professional Associations: ACM | Information Processing Society of Japan (IPSJ)
- Code Patch to the STAMP Benchmarks for Hardware Transactional Memory (HTM)
- IISWC 2014 Paper
- PPoPP 2014 Paper
- IISWC 2013 Paper
- Research Report: Eliminating GIL in Ruby through HTM
- ASPLOS 2012 Paper
- CGO 2010 Paper
- VEE 2010 Paper
- ISCA 2015 Paper
- Code Patch to Eliminate Global Interpreter Lock (GIL) in Ruby through Hardware Transactional Memory
July 29, 2015: email address fixed.
April 14, 2015: Web page moved from IBM Research - Tokyo to IBM Research - Austin
December 18, 2014: code patch and building instructions uploaded.
Patching and Building HTM-enabled STAMP
We are distributing a code patch to stamp-0.9.10 (mirror). Download our patch from the link above. Supported platforms are Intel TSX/Linux, POWER8/Linux, POWER8/AIX, and z/OS/zEC12. The source code supports Blue Gene/Q, too, but the build scripts do not, so you need to modify common/Defines.common.mk for your Blue Gene/Q build environment.
> cd stamp-0.9.10
> patch -p1 < $PATH_TO_YOUR_PATCH/htm_support_for_STAMP_by_IBM_Research_Tokyo.v1.patch
We support 3 configurations: sequential, locking, and HTM executions. You need to build a separate binary of each benchmark for each configuration. The locking execution uses a global spin-lock, not a Pthreads mutex. The fall-back path of the HTM execution also uses a global spin-lock.
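To make the idea of a global spin-lock concrete, here is a minimal sketch using C11 atomics. It is not taken from the patch: the type name global_spinlock and the spin_* function names are hypothetical, and the actual implementation may use platform-specific instructions instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global spin-lock type; not the patch's actual data structure. */
typedef struct {
    atomic_bool locked;   /* true while some thread holds the lock */
} global_spinlock;

/* Try to take the lock once. Returns true if the caller now owns it. */
static inline bool spin_try_lock(global_spinlock *l)
{
    assert(l != NULL);
    /* atomic_exchange returns the previous value: false means it was free. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Busy-wait until the lock is acquired (test-and-test-and-set). */
static inline void spin_lock(global_spinlock *l)
{
    for (;;) {
        if (spin_try_lock(l))
            return;
        /* Spin on a plain load so waiters do not keep writing the line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            /* busy-wait */
        }
    }
}

/* Release the lock so that another thread can acquire it. */
static inline void spin_unlock(global_spinlock *l)
{
    assert(l != NULL);
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Unlike a Pthreads mutex, a waiter in this sketch never sleeps in the kernel; it keeps polling the lock word until the holder releases it.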
To build the benchmarks,
> cd (benchmark directory)
> make -f Makefile.seq clean; make -f Makefile.seq default
> make -f Makefile.lock clean; make -f Makefile.lock default
> make -f Makefile.htm_ibm clean; make -f Makefile.htm_ibm default
You should use GNU make. These commands will create three binaries, (benchmark name).seq, (benchmark name).lock, and (benchmark name).htm_ibm, corresponding to the 3 configurations, respectively. The Makefiles are quite incomplete, so your contributions are welcome. To tweak the build options, modify common/Defines.common.mk.
This code patch enables not only HTM but also a couple of optimizations we found beneficial on HTM. In genome, the transaction size is tuned through CHUNK_STEP1. In intruder, to reduce transaction footprints, concurrent hash maps and red-black trees are used instead of red-black trees and linked lists, respectively, without changing any semantics. In kmeans, to avoid false sharing, each cluster occupies dedicated cache lines. In vacation, concurrent hash maps are used instead of red-black trees, as in intruder. To disable our optimizations, comment out "enable_IBM_optimizations := yes" in common/Defines.common.mk.
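The kmeans optimization can be illustrated with a small sketch of cache-line padding. This is not the layout used in the patch: the struct padded_cluster, the constant NUM_FEATURES, and the helper share_cache_line are hypothetical, and the cache-line size is assumed to be 128 bytes, which must be adjusted to the target processor.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 128   /* assumption: adjust to the target processor */
#define NUM_FEATURES     16    /* hypothetical number of features per point */

/* Hypothetical per-cluster accumulator. Aligning the first member to the
 * cache-line size forces the whole struct to start on a line boundary and
 * rounds its size up to a multiple of the line size. */
typedef struct {
    alignas(CACHE_LINE_BYTES) float sums[NUM_FEATURES];
    int count;
} padded_cluster;

/* Each element of an array of padded_cluster owns whole cache lines. */
static_assert(sizeof(padded_cluster) % CACHE_LINE_BYTES == 0,
              "cluster must occupy whole cache lines");

/* Returns nonzero if the two addresses fall into the same cache line. */
static inline int share_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_BYTES == (uintptr_t)b / CACHE_LINE_BYTES;
}
```

With this layout, a transaction updating one cluster does not conflict with a transaction updating a neighboring cluster merely because the two happen to sit in the same cache line.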
Except for bayes, the generated binaries use the thread-local memory allocator bundled with the original STAMP benchmarks. The thread-local memory allocator avoids contention in malloc(), but it does not support free(). To use the system-provided malloc()/free(), comment out "CFLAGS += -DUSE_TLH" in each benchmark directory's Defines.common.mk. Because bayes temporarily allocates several GB of memory, the system-provided malloc() is used to avoid an out-of-memory error.
For ease of testing, we provide makefile targets to run each benchmark with the runtime options specified in README. For example, to run a benchmark with 4 threads using HTM, in the benchmark directory,
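The trade-off of a thread-local allocator without free() can be seen in the following sketch of a bump allocator. It is an illustration only, not the allocator shipped with STAMP: the names bump_heap, bump_alloc, and bump_free, the 1 MB chunk size, and the 16-byte rounding are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES ((size_t)1 << 20)   /* hypothetical chunk size: 1 MB */

/* Hypothetical per-thread heap: a cursor into the current chunk. */
typedef struct {
    unsigned char *cur;   /* next free byte in the chunk */
    unsigned char *end;   /* one past the last byte of the chunk */
} bump_heap;

static _Thread_local bump_heap tl_heap;

/* Allocate from the calling thread's chunk. No lock is needed because
 * each thread only ever touches its own heap. */
static inline void *bump_alloc(size_t size)
{
    size = (size + 15u) & ~(size_t)15u;   /* round up to 16 bytes */
    assert(size <= CHUNK_BYTES);
    if (tl_heap.cur == NULL || (size_t)(tl_heap.end - tl_heap.cur) < size) {
        /* Chunk exhausted: grab a new one; the old remainder is abandoned. */
        unsigned char *chunk = malloc(CHUNK_BYTES);
        if (chunk == NULL)
            return NULL;
        tl_heap.cur = chunk;
        tl_heap.end = chunk + CHUNK_BYTES;
    }
    void *p = tl_heap.cur;
    tl_heap.cur += size;
    return p;
}

/* Freeing is a no-op: memory is never returned or reused. */
static inline void bump_free(void *p)
{
    (void)p;
}
```

Because freed memory is never reused in such a design, a benchmark that repeatedly allocates and releases large buffers keeps growing its footprint, which is consistent with bayes using the system-provided malloc() instead.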
> make -f Makefile.htm_ibm run4
As another example, to run a benchmark with 8 threads using locking,
> make -f Makefile.lock run8
For kmeans and vacation, which have high- and low-contention modes, use the runhigh<n> or runlow<n> targets, where <n> is the number of threads (for example, runhigh4 or runlow8).
To output HTM statistics at the end of each execution, enable the HTM_STATS environment variable. In our experience, collecting the statistics does not add visible overhead.
> export HTM_STATS=yes
When a transaction aborts, our implementation retries the transaction a specified number of times. You can change the number of times a transaction can abort before falling back on the global lock by setting 3 environment variables. HTM_TRETRY is for transient aborts, HTM_PRETRY is for persistent aborts, and HTM_GRETRY is for aborts due to contention at the global lock. For example,
> export HTM_TRETRY=10 # Default 16
> export HTM_PRETRY=3 # Default 1
> export HTM_GRETRY=1 # Default 16
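One way to picture how three retry counts could interact is the sketch below. It is an assumption about the policy, not the patch's code: it keeps one counter per abort kind and gives up on HTM as soon as the counter for the current abort kind is exhausted. The names abort_kind, retry_budget, env_int_or, retry_budget_from_env, and should_retry are hypothetical; only the environment variable names and their defaults come from the text above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of an abort. */
typedef enum { ABORT_TRANSIENT, ABORT_PERSISTENT, ABORT_GLOBAL_LOCK } abort_kind;

/* Remaining retries for one transactional section, one counter per kind. */
typedef struct {
    int transient;
    int persistent;
    int global_lock;
} retry_budget;

/* Read a non-negative integer from the environment, or use the fallback. */
static inline int env_int_or(const char *name, int fallback)
{
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > 1000000)
        return fallback;
    return (int)v;
}

/* Build a fresh budget from HTM_TRETRY, HTM_PRETRY, and HTM_GRETRY. */
static inline retry_budget retry_budget_from_env(void)
{
    retry_budget b;
    b.transient   = env_int_or("HTM_TRETRY", 16);
    b.persistent  = env_int_or("HTM_PRETRY", 1);
    b.global_lock = env_int_or("HTM_GRETRY", 16);
    return b;
}

/* Called after an abort. Returns 1 to retry with HTM, or 0 to fall back
 * on the global lock (assumed policy: the matching counter ran out). */
static inline int should_retry(retry_budget *b, abort_kind kind)
{
    assert(b != NULL);
    int *counter = &b->transient;
    if (kind == ABORT_PERSISTENT)
        counter = &b->persistent;
    else if (kind == ABORT_GLOBAL_LOCK)
        counter = &b->global_lock;
    if (*counter <= 0)
        return 0;
    --*counter;
    return 1;
}
```

Under this assumed policy and the default values, a section that keeps hitting persistent aborts is retried once and then runs under the global lock, while a section suffering only transient aborts gets up to 16 retries.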
Our implementation does not employ runtime adaptation. In other words, our implementation always tries to execute every transactional code section with a transaction (HTM) before falling back on global locking, no matter how high the abort ratio of that transactional code section is.
Bug Reports and Comments
Contact Rei Odaira (rodaira [at] us . ibm . com). Twitter: @ReiOdaira (I tweet in Japanese, but you can talk to me in English.)