Microarchitecture Exploration Toolset (MET) - overview

The MET: Microarchitecture Exploration Toolset for PowerPC Processors

Contemporary processor microarchitectures are complex. Superscalar and out-of-order instruction issue pipelines are only the beginning of a large domain of design options; other important aspects include branch prediction, cache hierarchy, speculative execution, and misprediction penalties, to name just a few. The effects of these features are often counterintuitive, so early, accurate, and timely modeling is needed to ensure proper design trade-offs. The evaluation of such design options is further complicated by the varying behavior of programs. Different classes of applications (scientific, data-intensive, graphics-intensive, and so on) exercise the features of a given processor in different ways. Even a single application may go through phases with very different characteristics during its execution, such as periods with different cache miss rates or branch misprediction rates. As a result, an adequate evaluation of design trade-offs in a microprocessor requires examining a wide variety of inputs that represent the types of applications of interest, each long enough to capture the varying behavior of the corresponding application. All these factors lead to a large input set for the evaluation.

The tools currently used for trade-off analysis of design alternatives in a microprocessor are mostly cycle-accurate processor models. These are detailed representations of the organization and behavior of the corresponding processor, as required to adequately exercise and evaluate the features being considered. Unfortunately, the combination of detailed models and the large input set required for an adequate evaluation leads to very long simulation times, which limit the number of experiments possible in a design trade-off exercise. This problem has been addressed in different ways, including the use of sampled execution traces (instead of entire traces), the development of synthetic (compact) traces that attempt to capture the behavior of the programs of interest, and the combination of simulation with probabilistic modeling methods, among others. However, these alternative strategies have not yet provided the same level of accuracy in the evaluation of trade-offs as that available from long simulations.

Research in this area at the IBM Thomas J. Watson laboratory has led to the development of The MET (Microarchitecture Exploration Toolset), a collection of tools that support fast exploration of processor microarchitecture options for the PowerPC architecture. The entire toolset has been designed with an emphasis on fast execution, so that execution traces consisting of hundreds of millions of instructions can be analyzed, and multiple experiments can be performed within a reasonably short turn-around time, on a normally configured workstation. The MET's features resemble those in other processor simulation environments used in academia, such as SimpleScalar or Microarchitecture Workbench, but differ from them in several key areas:

  • support for the PowerPC instruction set architecture;
  • higher simulation speed, on the order of 100,000 target-processor cycles per second;
  • innovative mechanism for simulating the execution of instructions in a speculative state (mispredicted instructions), capturing those instructions, and inserting them into the execution traces;
  • thorough validation of performance results.
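
To put the quoted simulation speed in perspective, the following back-of-the-envelope sketch estimates the wall-clock time needed to run one trace. The function name and the instructions-per-cycle (IPC) figure are assumptions made for illustration only; the only number taken from the text above is the rate of roughly 100,000 simulated cycles per second.

```python
def estimated_wall_clock_seconds(trace_instructions, assumed_ipc,
                                 sim_cycles_per_second=100_000):
    """Rough wall-clock time, in seconds, to simulate one trace.

    trace_instructions:    number of instructions in the trace
    assumed_ipc:           ASSUMED instructions per cycle of the target processor
    sim_cycles_per_second: simulated target cycles per second of host time
    """
    # Target cycles needed to execute the whole trace at the assumed IPC.
    target_cycles = trace_instructions / assumed_ipc
    # Host seconds needed to simulate that many target cycles.
    return target_cycles / sim_cycles_per_second


# Example: a 300-million-instruction trace on a target assumed to sustain 1.5 IPC.
seconds = estimated_wall_clock_seconds(300_000_000, assumed_ipc=1.5)
print(f"about {seconds / 60:.0f} minutes of simulation")
```

Under these assumed numbers the trace needs 200 million target cycles, or about 2,000 seconds of host time, which is what makes several experiments per day feasible on one workstation.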

To achieve the fast execution required, the tool set relies extensively on novel microarchitecture modeling techniques and judicious programming styles. These include:

  • extensive predecoded information, thus avoiding run-time decoding overhead;
  • compile-time parameters for processor options, thus avoiding run-time interpretation overhead for determining the structure of the processor; and
  • programming styles that consider the compiler's optimizing behavior, thus reducing execution penalties in the simulator arising from branches, cache footprint size, and so on.
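
The first technique in the list above, predecoding, can be illustrated with a minimal sketch. This is not The MET's code: the `Predecoded` record, the `PredecodeTable` class, and the choice to key the table by instruction address are all assumptions made for illustration. Only the D-form field layout (6-bit primary opcode, two 5-bit register fields, 16-bit signed immediate) comes from the PowerPC architecture. The compile-time parameterization in the second bullet has no direct Python equivalent and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predecoded:
    opcode: int  # primary opcode: the top 6 bits of the 32-bit instruction word
    rt: int      # D-form target register field
    ra: int      # D-form base/source register field
    imm: int     # D-form 16-bit immediate, sign-extended


def decode_d_form(word):
    """Extract the fields of a PowerPC D-form instruction word."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return Predecoded(opcode=(word >> 26) & 0x3F,
                      rt=(word >> 21) & 0x1F,
                      ra=(word >> 16) & 0x1F,
                      imm=imm)


class PredecodeTable:
    """Hypothetical cache: decode each static instruction once, then reuse it.

    Keying by address assumes the program does not modify its own code.
    """

    def __init__(self):
        self._table = {}
        self.decode_calls = 0  # counts how often real decoding work was done

    def lookup(self, address, word):
        entry = self._table.get(address)
        if entry is None:
            entry = decode_d_form(word)
            self.decode_calls += 1
            self._table[address] = entry
        return entry


# A two-instruction loop body executed 1000 times:
#   0x38640001 = addi r3,r4,1      0x80A10008 = lwz r5,8(r1)
table = PredecodeTable()
loop_body = [(0x1000, 0x38640001), (0x1004, 0x80A10008)]
for _ in range(1000):
    for address, word in loop_body:
        table.lookup(address, word)
```

Although the loop touches 2,000 dynamic instructions, `table.decode_calls` ends at 2, because the decoding cost is paid once per static instruction rather than once per executed instruction.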

The tool set is based on trace-driven modeling, wherein traces of instructions executed by programs can be pregenerated by separate tracing facilities or collected dynamically from a workload's execution. The interface to both types of traces is the same. Although static traces do not allow the analysis of the behavior of mispredicted instructions, they offer these advantages:

  • experiment repeatability, a feature that might not be guaranteed in execution-driven environments if different versions of the operating system and/or shared libraries are used across multiple workstations, or if the workload exhibits dependencies on the environment;
  • ability to evaluate workloads from PowerPC platforms other than AIX; and
  • ability to evaluate programs containing instructions not traceable within our environment.
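
The idea that static and dynamic traces share one interface can be pictured with the sketch below. It is an illustration under stated assumptions, not The MET's actual API: `TraceRecord`, `static_trace`, `dynamic_trace`, the `step` callback, and `opcode_histogram` are all hypothetical names, and a real trace record would carry more than an address and an instruction word.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class TraceRecord:
    address: int  # address of the executed instruction
    word: int     # 32-bit instruction word


def static_trace(saved: Iterable[Tuple[int, int]]) -> Iterator[TraceRecord]:
    """Replay a pregenerated trace (here simply an in-memory sequence)."""
    for address, word in saved:
        yield TraceRecord(address, word)


def dynamic_trace(step: Callable[[], Optional[Tuple[int, int]]],
                  max_instructions: int) -> Iterator[TraceRecord]:
    """Collect records on the fly.

    `step` is a hypothetical callback that advances the running workload by
    one instruction and returns (address, word), or None when it finishes.
    """
    for _ in range(max_instructions):
        item = step()
        if item is None:
            return
        yield TraceRecord(*item)


def opcode_histogram(trace: Iterable[TraceRecord]) -> Counter:
    """A consumer that neither knows nor cares where the records come from."""
    return Counter((record.word >> 26) & 0x3F for record in trace)


saved = [(0x1000, 0x38640001), (0x1004, 0x80A10008), (0x1008, 0x38640001)]
live = iter(saved)  # stands in for a running workload in this example

from_file = opcode_histogram(static_trace(saved))
from_run = opcode_histogram(dynamic_trace(lambda: next(live, None), 100))
```

Because both sources yield the same record type, the consumer produces identical results for the replayed and the live trace, which is the property that lets one processor model serve both modes.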

In the case of traces generated dynamically, The MET is capable of analyzing the behavior and performance of single-threaded user programs, in particular the effects of instructions executed by user code and by shared libraries.

The MET and Traces

Decoupling the trace generation process from the processor model, while retaining the ability to simulate mispredicted instructions, also has other advantages. For example, the trace generation mechanism allows the simulation of new instructions in the architecture; the mechanism can also be used to profile and tune programs.
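
As a rough picture of what it means to insert mispredicted instructions into a trace, consider the simplified sketch below. It is not The MET's mechanism, whose details are not given here. It assumes a hypothetical `predict_taken` function, a fixed number of wrong-path instructions per misprediction, and a straight-line wrong path with no further branches. The 4-byte instruction size is the PowerPC fixed instruction length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    address: int
    speculative: bool  # True for wrong-path (mispredicted) instructions


def trace_with_wrong_path(committed, predict_taken, wrong_path_depth=2):
    """Interleave wrong-path records into a committed instruction stream.

    committed:        list of (address, is_branch, taken, target) tuples
                      describing the correctly executed path
    predict_taken:    hypothetical predictor, address -> bool
    wrong_path_depth: ASSUMED number of wrong-path instructions fetched
                      before the misprediction is resolved
    """
    for address, is_branch, taken, target in committed:
        yield Record(address, speculative=False)
        if is_branch and predict_taken(address) != taken:
            # The predictor chose the other direction, so fetch continues
            # down the path that the program did not actually follow.
            wrong_start = address + 4 if taken else target
            for i in range(wrong_path_depth):
                yield Record(wrong_start + 4 * i, speculative=True)


# A taken branch at 0x104 to 0x200, seen by a predictor that always says
# "not taken", so the fall-through instructions are fetched speculatively.
committed = [
    (0x100, False, False, 0),
    (0x104, True, True, 0x200),
    (0x200, False, False, 0),
]
records = list(trace_with_wrong_path(committed, lambda address: False))
```

In this example the resulting trace contains the three committed instructions plus two speculative records at 0x108 and 0x10C, placed immediately after the mispredicted branch so that a processor model can charge for the resources they consume.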

The tool set is intended to support microarchitecture exploration prior to the complete definition of a processor implementation. Although the tools perform the analysis of the processor's behavior on a cycle-by-cycle basis, the tools do not model all the details of a processor pipeline but only those that are relevant to the exploratory evaluation.
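
A cycle-by-cycle analysis that deliberately omits most pipeline detail can be as small as the sketch below. It is a toy model written for illustration, not one of The MET's models: it assumes in-order issue, a configurable issue width, and a trace reduced to destination register, source registers, and latency. Caches, branch prediction, and functional-unit limits are left out on purpose.

```python
def simulate_cycles(trace, issue_width=2):
    """Return the cycle at which the last result becomes available.

    trace: list of (dest_register, source_registers, latency) tuples,
           in program order. This record layout is an assumption.
    """
    ready_at = {}   # register name -> cycle when its value is available
    cycle = 0
    index = 0       # next instruction to issue
    last_done = 0
    while index < len(trace):
        issued = 0
        # Issue up to `issue_width` instructions this cycle, in order.
        while index < len(trace) and issued < issue_width:
            dest, sources, latency = trace[index]
            if any(ready_at.get(src, 0) > cycle for src in sources):
                break  # an operand is not ready: stall until a later cycle
            ready_at[dest] = cycle + latency
            last_done = max(last_done, cycle + latency)
            index += 1
            issued += 1
        cycle += 1
    return last_done


# r2 depends on r1, so it cannot issue in the same cycle as its producer.
dependent = [("r1", (), 1), ("r2", ("r1",), 1), ("r3", (), 1)]
independent = [("r1", (), 1), ("r2", (), 1)]
```

With an issue width of 2, the independent pair finishes one cycle earlier than it does with a width of 1, while the dependent sequence is limited by the r1 to r2 dependence. Exploratory questions of this kind are the ones such a reduced model is meant to answer.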

Throughout this site, we provide a description of the tool set, including our research publications and presentations on the subject. Enjoy it!

Tools in MET

The MET includes:

  • Aria, an execution-simulation library
  • Turandot, a parameterized processor model
  • Rondo, a branch prediction exploration tool
  • LeProf, a profiling and cache analysis tool
  • eOak, a system-level PowerPC 403GCX simulator
  • PavaRotti, a collection of tools for performance analysis and validation
  • Trace tools for various trace formats
