A Brief Categorization

A list of tools, organized according to various interesting features. See also a listing of tools ordered alphabetically. Interesting things about the tools include:

Purpose Of The Tool

Simulation and tracing tools can perform a wide variety of tasks. Here are some common uses: In addition, some tools are used for

Handles Application Bugs Robustly

Works with Self-Modifying Code

??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Multiple Processors

??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Support for Multiple Protection Domains

??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Signals and Exceptions

??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Support for System-Mode Code

(Detail) ??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Input Representation

??? THIS CATEGORY NOT YET ORGANIZED, SEE THE SHADE PAPER.

Implementation: Decompilation Technology

``Decompilation technology'' here refers to the process of analyzing a (machine code) fragment and creating some higher-level information about the fragment. For simulation and tracing tools, decompilation is typically simpler than static program decompilation, in which the goal is to read a binary program and produce source code for it in some high-level language. Simulation and tracing tools ``have it easy'' in comparison because they can get by with a lower-level representation and can also punt hard problems to the runtime, when more information is available.

Even so, executable machine code is difficult to simulate and trace efficiently (within 2 orders of magnitude of the performance of native execution) when using ``naive'' instruction-by-instruction translation, because lots of relevant information is unavailable statically. For example, every instruction is potentially a branch target; every word of memory is potentially used both as code and as data; every mutable word of memory is potentially executed, modified (at runtime), and then executed again; and so on.

Executable machine code is also inherently (target) machine-dependent, and thus lexing and parsing the machine code is a source of potential portability problems. (Note that some tools use a high-level input, so that relatively little analysis is needed to determine the original programmer's intent, at least at the level needed to simulate the program with modest efficiency.)

The following is a list of tools and papers that show how to reduce the overhead of analyzing each instruction; how to reduce the number of times each instruction is analyzed; how to perform optimistic analysis and recover when it's wrong; and how to improve the abstraction of machine-dependent parts of the tool.

A short list:

A slightly longer list:

Implementation: Simulation Technology

The ``simulation technology'' is how the original machine instructions (or other source representation) get translated into an executable representation that is suitable for simulation and/or tracing. Choices include:

Dynamic Compilation: Displaced Execution

Move an instruction from one place to another, but execute with the same host and target.

Dynamic Compilation: Cross-Compilation

Compile instruction sequences from a target machine to run on a host machine.
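As a rough illustration of dynamic cross-compilation, the sketch below translates straight-line blocks of a toy target instruction set into host code the first time they are reached, then reuses the translation on later visits. Everything here is an assumption made for illustration: the four-field instruction tuples, the opcode names (li, add, bnez, halt), and the functions translate_block and run are hypothetical and are not taken from any tool in this list. Python source stands in for host machine code.

```python
# Toy target program (hypothetical ISA): each instruction is (op, a, b, c).
PROGRAM = [
    ("li", 0, 3, 0),     # 0: r0 = 3 (loop counter)
    ("li", 1, 0, 0),     # 1: r1 = 0 (running sum)
    ("li", 2, -1, 0),    # 2: r2 = -1
    ("add", 1, 1, 0),    # 3: r1 = r1 + r0
    ("add", 0, 0, 2),    # 4: r0 = r0 + r2
    ("bnez", 0, 3, 0),   # 5: if r0 != 0 goto 3
    ("halt", 0, 0, 0),   # 6: stop
]


def translate_block(program, start):
    """Emit and compile host (Python) code for the block starting at `start`.

    A block ends at the first branch or halt. The compiled function updates
    the register list in place and returns the next target address, or None
    when the program halts.
    """
    lines = ["def block(regs):"]
    pc = start
    while True:
        op, a, b, c = program[pc]
        if op == "li":
            lines.append(f"    regs[{a}] = {b}")
        elif op == "add":
            lines.append(f"    regs[{a}] = regs[{b}] + regs[{c}]")
        elif op == "bnez":
            lines.append(f"    return {b} if regs[{a}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # "host code generation" step
    return namespace["block"]


def run(program, regs):
    """Run the program, translating each block at most once."""
    cache = {}         # target start address -> compiled host function
    translations = 0   # how many blocks had to be translated
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)
    return regs, translations


final_regs, block_translations = run(PROGRAM, [0, 0, 0, 0])
```

The loop body starting at address 3 executes three times but is translated only once, which is the main reason this style can amortize the cost of analysis over repeated execution.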

Hardware Emulation

Interpreters

Simulation and tracing tools that execute by interpretation; the original executable code is neither preprocessed (by augmentation or static cross-compilation) nor dynamically compiled to host code.
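A minimal sketch of an interpreter-style simulator with tracing follows. It fetches and decodes each instruction every time it executes, with no preprocessing and no generated host code. The toy instruction format, the opcode names, and the function interpret are hypothetical, chosen only to keep the example small.

```python
# Toy target program (hypothetical ISA): each instruction is (op, a, b, c).
PROGRAM = [
    ("li", 0, 3, 0),     # 0: r0 = 3 (loop counter)
    ("li", 1, 0, 0),     # 1: r1 = 0 (running sum)
    ("li", 2, -1, 0),    # 2: r2 = -1
    ("add", 1, 1, 0),    # 3: r1 = r1 + r0
    ("add", 0, 0, 2),    # 4: r0 = r0 + r2
    ("bnez", 0, 3, 0),   # 5: if r0 != 0 goto 3
    ("halt", 0, 0, 0),   # 6: stop
]


def interpret(program, regs, max_steps=10_000):
    """Execute the program one instruction at a time.

    Returns the final registers and a trace of executed addresses.
    """
    pc = 0
    trace = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, a, b, c = program[pc]   # fetch and decode on every execution
        trace.append(pc)            # tracing hook: record the address
        if op == "li":
            regs[a] = b
        elif op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "bnez":
            if regs[a] != 0:
                pc = b
                continue
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r} at {pc}")
        pc += 1
    return regs, trace


final_regs, address_trace = interpret(PROGRAM, [0, 0, 0, 0])
```

The per-instruction dispatch is simple and portable, but the decode work is repeated on every trip around the loop, which is the overhead that the compilation-based strategies try to avoid.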

Static Cross-Compilation

Statically cross-compile instruction sequences from a target machine to run on some host machine.

Static Augmentation

Augmentation-based tracing tools run most host instructions natively, but simulate some instructions. For example, Proteus executes arithmetic and stack-relative memory reference instructions natively, and simulates load and store instructions that may reference shared memory.

Multiple Strategies

Some tools rely on having multiple strategies in order to achieve their desired functionality. For the purposes here, ``untraced native execution'' counts as a translator.

Other

Some tools/papers not listed under other headings.

Match Between Host and Target

??? THIS CATEGORY NOT YET ORGANIZED.

Generally, the closer the match between the host and the target, the easier it is to write a simulator, and the better the efficiency. Possible mismatches include:

Note that target support for self-modifying code may be treated as a special case of synchronization. For example, target machines with no caches or unified instruction and data caches will typically write instructions using ordinary store instructions. Therefore, all store instructions must be treated as potential code-modifying instructions.
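The point about stores can be sketched as follows: a simulator that caches decoded instructions must treat every store as a possible code modification and discard any cached decoding for the written address. The Machine class, its 16-bit word encoding (opcode in the high byte, operand in the low byte), and the method names are all hypothetical, assumed only for this illustration.

```python
class Machine:
    """Toy target with unified memory: code and data share one address space."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.decoded = {}       # address -> cached decoded instruction
        self.decode_count = 0   # how many times real decoding work was done

    def decode(self, addr):
        """Return the decoded instruction at addr, decoding only on a miss."""
        if addr not in self.decoded:
            word = self.memory[addr]
            # Hypothetical encoding: high byte is opcode, low byte is operand.
            self.decoded[addr] = (word >> 8, word & 0xFF)
            self.decode_count += 1
        return self.decoded[addr]

    def store(self, addr, value):
        """Write memory and invalidate any stale decoding of that word."""
        self.memory[addr] = value
        # Every store is a potential code-modifying instruction.
        self.decoded.pop(addr, None)


m = Machine([0x0105, 0x0207])
first = m.decode(0)     # decoded and cached
again = m.decode(0)     # served from the cache
m.store(0, 0x0309)      # overwrites the instruction at address 0
updated = m.decode(0)   # must be decoded again
```

Checking every store is the conservative choice; a real tool would likely want a cheaper test, for example tracking which pages hold translated code, but that refinement is outside this sketch.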

For timing-accurate simulation (see Talisman and RSIM), some matches between the host and target can improve the efficiency, but many do not.

Timing Simulation

??? THIS CATEGORY NOT YET ORGANIZED.

Some instruction-set simulators also perform timing simulation. Timing is not strictly an element of instruction-set simulation, but is often useful, since one major use for instruction-set simulation is to collect information for predicting or analyzing performance. Important features of timing simulation include both the processor pipeline and the memory system (see Talisman and RSIM).

Performance

There are many ways to measure performance. Some common metrics include:

Metrics that are more abstract have the advantage that they are typically simple to reason about and applicable across a variety of implementations. For example, host instructions may be counted relatively easily for each of a variety of target instructions, and the counts are relatively isolated from the structure of the caches and microarchitecture. Conversely, concrete metrics tend to reflect all related costs more accurately. For example, the effects of caches and microarchitectures are included.

It is worth noting that few reports give enough information about the measurement methodology in order to make a valid comparison. For example, if dilation is ``typically'' 20x, what is ``typical'', and what is the performance for ``non-typical'' workloads?
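To make the concern about ``typical'' figures concrete, here is a small sketch that reports dilation per workload along with its range, rather than a single number. It assumes that dilation means simulated run time divided by native run time. The function names and the workload names are hypothetical, and the timings are invented for illustration only; they are not measurements of any tool.

```python
import math


def dilation(simulated_seconds, native_seconds):
    """Slowdown factor: simulated run time over native run time (assumed definition)."""
    return simulated_seconds / native_seconds


def summarize(measurements):
    """Summarize dilation over {workload: (simulated_seconds, native_seconds)}."""
    ratios = {name: dilation(s, n) for name, (s, n) in measurements.items()}
    values = list(ratios.values())
    geomean = math.exp(sum(math.log(v) for v in values) / len(values))
    return {
        "per_workload": ratios,
        "min": min(values),
        "max": max(values),
        "geomean": geomean,
    }


# Invented timings, for illustration only.
EXAMPLE = {
    "integer_loop": (5.0, 1.0),
    "mixed": (20.0, 1.0),
    "self_modifying": (80.0, 1.0),
}
report = summarize(EXAMPLE)
```

With these invented numbers the geometric mean is about 20x while individual workloads range from 5x to 80x, so a report that gives only the ``typical'' value would hide a sixteen-fold spread.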

Product Status

??? THIS CATEGORY NOT YET ORGANIZED.

The status of tool




From instruction-set simulation and tracing