From owner-simulators Tue Aug 15 13:29:35 1995
Received: from meitner.cs.washington.edu (meitner.cs.washington.edu [128.95.2.104]) by june.cs.washington.edu (8.6.12/7.2ju) with ESMTP id NAA18624; Tue, 15 Aug 1995 13:29:34 -0700
Received: from localhost (localhost [127.0.0.1]) by meitner.cs.washington.edu (8.6.12/7.2ws+) with SMTP id NAA30644; Tue, 15 Aug 1995 13:29:34 -0700
Message-Id: <199508152029.NAA30644@meitner.cs.washington.edu>
To: simulators@cs
Subject: Clarity MCode
Date: Tue, 15 Aug 1995 13:29:34 PDT
From: pardo@cs.washington.edu
X-Message-Id: simulators@cs.washington.edu, message #1995-08-005
X-Unsubscribe: e-mail `majordomo@cs.washington.edu', body `unsubscribe rtcg'

%A Brian T. Lewis
%A L. Peter Deutsch
%A Theodore C. Goldstein
%T Clarity MCode: A Retargetable Intermediate Representation for Compilation
%J Proceedings of IR '95
%D January 1995
%C San Francisco, California, USA
%P 119-128
%W btlewis@eng.sun.com
%X `Clarity' is a C++-related programming language with delegation instead of inheritance and better control over safe and unsafe modules (among other things). For improved code portability, Clarity is compiled to `MCode', a machine-independent IR. The IR may be interpreted or compiled to machine code lazily at runtime. To support systems programming, code can call between Clarity and C. Machine code generation is done by converting Clarity stack operations (~bytecodes) to an AST and then using a tree-walk code generator to produce machine code in one pass. The resulting code is similar in quality to static compilation at -O2 and about 2x slower than the best-optimized C. The raw interpreter is about 100x slower than the best C. (pg 119) Forwards compatibility: code generated specifically for one new SPARC implementation is 25\% faster than ``generic'' code running on the same processor. On-the-fly code generation lets you take advantage of more machine-specific optimizations.
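[Pardo: the stack-bytecodes-to-AST conversion plus one-pass tree-walk code generation can be sketched roughly as below. This is an illustrative toy, not the Clarity code generator: the opcode names, AST tuples, register naming, and pseudo-instructions are all invented for the example.]

```python
# Sketch: convert stack bytecodes to an AST by simulating the operand
# stack, then tree-walk the AST to emit pseudo machine code in one pass.
# All opcode and instruction names here are invented for illustration.

def bytecodes_to_ast(bytecodes):
    """Each push adds a leaf; each binary op pops two subtrees."""
    stack = []
    for op, *args in bytecodes:
        if op == "push_local":
            stack.append(("local", args[0]))
        elif op == "push_const":
            stack.append(("const", args[0]))
        else:                       # binary ops: "add", "mul", ...
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((op, lhs, rhs))
    assert len(stack) == 1, "bytecodes must leave exactly one result"
    return stack[0]

def gen(node, emit, next_reg):
    """One-pass tree-walk code generator; returns the result register."""
    kind = node[0]
    r = f"r{next_reg[0]}"
    next_reg[0] += 1
    if kind == "local":
        emit(f"load {r}, [{node[1]}]")
    elif kind == "const":
        emit(f"mov {r}, #{node[1]}")
    else:
        a = gen(node[1], emit, next_reg)
        b = gen(node[2], emit, next_reg)
        emit(f"{kind} {r}, {a}, {b}")
    return r

# x * 2 + y, expressed as stack code:
ast = bytecodes_to_ast([("push_local", "x"), ("push_const", 2), ("mul",),
                        ("push_local", "y"), ("add",)])
code = []
result_reg = gen(ast, code.append, [0])
```

Note each instruction is emitted the moment its operands' registers are known, which is what makes the single pass possible.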
``A runtime code generator can also take advantage of the specific values used in a program to generate machine code customized for those values.'' (doesn't cite anything). (pg 119, 126) MCode is compiled to linkable .o modules. These thus need to include platform-dependent definitions of global variables and procedures, and descriptions of referenced symbols. Thus, uses ``entry code [that] consists primarily of a three instruction "trampoline" that redirects the call to the appropriate target procedure chosen by the interpret/compile strategy module in the MCode runtime. The SPARC entry code also has three words used when atomically updating the trampoline. ... [T]he contents of a Linkable MCode file are mostly platform-independent.'' Pardo notes that MCode modules could use a platform-dependent ``installer'', like S/38 and AS/400, that would decorate a truly portable MCode module with, e.g., the trampoline code required for static linking (as well as ``default'' static code implementations). You would also want an uninstaller. Pardo notes that dynamically-linked MCode modules could be completely machine-independent if you could convince the dynamic linker to call out to the MCode runtime system whenever it was building a dynamic link. (pg 122) MCode is similar to Oak/Java bytecodes but more abstract, referring to platform-dependent type definitions. MCode's control flow is expressed as high-level instructions (BeginLoop ... EndLoop) while Oak/Java uses jumps. Lower level than Franz's LZ'd AST. (pg 124) MCode includes some optimization information: aliasing, and a leaf/nonleaf flag for each procedure. Later maybe variable lifetime information. (pg 126) Type safety will eventually be checked by a prelinker. (pg ??) Code is generated on a procedure-by-procedure basis. (pg 126-8) The machine code generator is represented by two classes. One keeps track of state during compilation; its machine-dependent part takes care of, e.g., parameter passing conventions.
The other represents machine code at the level of ``call op_add_real'' to generate code when an ``add real'' is seen in the MCode instruction stream. (pg 120) The current design has strong roots in an earlier system by Deutsch to quickly compile C bytecodes. (pg 128) The interpreter implements calls and returns as SPARC ABI calls. It does not require dynamically-generated closures (a la [Breuel, Chase, DEC Alpha calling standards]) since they are provided by the platform-dependent part of an MCode object module. It does require dynamically-generated sequences for calls to routines with aggregate return, since the return aggregate's length is encoded in the I-stream following the call instruction (static code would require one call sequence for each possible aggregate size). Brian says: supporting the SPARC ABI complicated the interpreter, since various struct return policies use an `unimp' instruction after the `call' instruction; thus the interpreter must have the right call/unimp sequences lying around. Clarity code ``doesn't quite'' use the SPARC ABI calling convention: it uses two global registers to hold the `self'-like pointers used to implement delegation.

[Ted Goldstein was good enough to take some time out to explain a few details of dynamic compilation of MCode bytecodes. Here's a brief summary of his e-mail, though I may have introduced some errors in transliterating it:]

The external interface to the compiler works at either the whole-module level (for static .o file generation) or one function at a time (for in-core execution). I'll focus on the latter. The CGEN code generator was designed and built by L. Peter Deutsch and myself after experience with the Smalltalk-80 code generators built for ParcPlace Systems. The entire code generator is small--only 3750 executable statements. It is heavily object-oriented, with about 180 classes and structs in 11543 lines. Our experience is that OO code is efficient when it is small and compact.
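[Pardo: the two-class split above -- one class tracking compilation state (with machine-dependent parts for, e.g., parameter passing) and one providing a per-opcode emit method such as op_add_real -- might look schematically like this. Class, method, register, and instruction names are invented; only the division of labor follows the paper's description.]

```python
# Sketch of the two-class code generator split. CompileState tracks
# per-compilation state (here just a virtual operand stack and a register
# counter); Emitter holds one method per MCode instruction (op_add_real,
# op_push_arg, ...), dispatched by name from the instruction stream.

class CompileState:
    def __init__(self):
        self.stack = []      # virtual operand stack during compilation
        self.next_reg = 0
        self.code = []

    def fresh_reg(self):
        r = f"%f{self.next_reg}"
        self.next_reg += 1
        return r

class Emitter:
    """One method per MCode instruction, named op_<instruction>."""

    def op_push_arg(self, state, n):
        # Where argument n lives is a machine-dependent convention.
        state.stack.append(f"%arg{n}")

    def op_add_real(self, state):
        b, a = state.stack.pop(), state.stack.pop()
        r = state.fresh_reg()
        state.code.append(f"fadd {a}, {b}, {r}")
        state.stack.append(r)

def compile_stream(stream, emitter=Emitter()):
    state = CompileState()
    for op, *args in stream:
        # The ``call op_add_real'' step: dispatch on the opcode name.
        getattr(emitter, "op_" + op)(state, *args)
    return state.code

out = compile_stream([("push_arg", 0), ("push_arg", 1), ("add_real",)])
```

Retargeting under this split means swapping in a different Emitter-like class (and the machine-dependent part of the state) while the driver loop stays put.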
Of the 180 classes, 17 are machine dependent. The first time any function in a module is executed, the entire module has its external symbol tables unpickled. A small type table is set up for the bytecodes; as with a conventional compiler, this is the compilation context for the function. The next step is to symbolically execute the bytecodes. The bytecodes target a stack machine, but CGEN reconstructs SSA form. CGEN does first-best-fit targeting (similar to PQCC's TNBIND) and code is generated as soon as a target is known. If CGEN gets in trouble (it runs out of registers and too large a spill penalty would be created), CGEN goes back and regenerates the function, first doing a graph coloring of the SSA form before targeting. In practice, only large FORTRAN-ish code uses regeneration; Clarity is like C++, so double generation happens rarely. Code generation always uses specific registers and machine instructions. Various local optimizations are done eagerly, including delay slot filling, branch selection, and memory interleaving. ``Eager'' means that the optimization is triggered as soon as enough consecutive instructions are known. The code generator uses a Smalltalk-style InstructionStream object that performs local optimization. Some optimizations would not even be possible in conventional compilers (for example, arithmetic on runtime constants, and address computations more complicated than linkers support). Finally, instructions are copied out of objects and onto the target page. Code referenced consecutively is placed close together in memory, improving locality of reference, which is important for OO code.
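[Pardo: the ``symbolically execute the bytecodes into SSA form'' step can be sketched for straight-line code: every value pushed gets a unique single-assignment name, so the operand stack disappears and each def/use becomes explicit. This is a minimal illustration with invented opcode names; the real CGEN also handles control flow, targeting, and regeneration.]

```python
# Sketch: symbolically execute stack bytecodes, giving every produced
# value a unique single-assignment name (v0, v1, ...). The output is a
# list of SSA-style tuples (name, op, operands...) instead of stack
# operations. Opcode names are invented for illustration.

def to_ssa(bytecodes):
    stack, ssa, n = [], [], 0

    def fresh():
        nonlocal n
        v = f"v{n}"
        n += 1
        return v

    for op, *args in bytecodes:
        if op == "load":               # load a local variable
            v = fresh()
            ssa.append((v, "load", args[0]))
            stack.append(v)
        elif op == "const":
            v = fresh()
            ssa.append((v, "const", args[0]))
            stack.append(v)
        else:                          # binary ops: "add", "mul", ...
            b, a = stack.pop(), stack.pop()
            v = fresh()
            ssa.append((v, op, a, b))
            stack.append(v)
    return ssa

# (x + 1) * x as stack code:
ssa = to_ssa([("load", "x"), ("const", 1), ("add",),
              ("load", "x"), ("mul",)])
```

Once in this form, a targeting pass can bind each vN to a register (first-best-fit, or graph coloring on regeneration) without ever re-materializing the stack.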