deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Registers::               Accessing the whole register file
  34 * Customizations::          Advanced code generation customizations
  35 * Acknowledgements::        Acknowledgements for GNU lightning
  36 @end menu
  37 @end ifnottex
  38
  39 @node Overview
  40 @chapter Introduction to @lightning{}
  41
  42 @iftex
  43 This document describes @value{TOPIC} the @lightning{} library for
  44 dynamic code generation.
  45 @end iftex
  46
  47 Dynamic code generation is the generation of machine code
  48 at runtime. It is typically used to strip a layer of interpretation
  49 by allowing compilation to occur at runtime.  One of the most
  50 well-known applications of dynamic code generation is perhaps that
  51 of interpreters that compile source code to an intermediate bytecode
  52 form, which is then recompiled to machine code at run-time: this
  53 approach effectively combines the portability of bytecode
  54 representations with the speed of machine code.  Another common
  55 application of dynamic code generation is in the field of hardware
  56 simulators and binary emulators, which can use the same techniques
  57 to translate simulated instructions to the instructions of the
  58 underlying machine.
  59
  60 Yet other applications come to mind: for example, windowing
  61 @dfn{bitblt} operations, matrix manipulations, and network packet
  62 filters.  Albeit very powerful and relatively well known within the
  63 compiler community, dynamic code generation techniques are rarely
  64 exploited to their full potential and, with the exception of the
  65 two applications described above, have remained curiosities because
  66 of their portability and functionality barriers: binary instructions
  67 are generated, so programs using dynamic code generation must be
  68 retargeted for each machine; in addition, coding a run-time code
  69 generator is a tedious and error-prone task more than a difficult one.
  70
  71 @lightning{} provides a portable, fast and easily retargetable dynamic
  72 code generation system.
  73
  74 To be portable, @lightning{} abstracts over current architectures'
  75 quirks and unorthogonalities.  The interface that it exposes to is that
  76 of a standardized RISC architecture loosely based on the SPARC and MIPS
  77 chips.  There are a few general-purpose registers (six, not including
  78 those used to receive and pass parameters between subroutines), and
  79 arithmetic operations involve three operands---either three registers
  80 or two registers and an arbitrarily sized immediate value.
  81
  82 On one hand, this architecture is general enough that it is possible to
  83 generate pretty efficient code even on CISC architectures such as the
  84 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  85 real architectures closely enough that, most of the time, the
  86 compiler's constant folding pass ends up generating code which
  87 assembles machine instructions without further tests.
  88
  89 @node Installation
  90 @chapter Configuring and installing @lightning{}
  91
  92 The first thing to do to use @lightning{} is to configure the
  93 program, picking the set of macros to be used on the host
  94 architecture; this configuration is automatically performed by
  95 the @file{configure} shell script; to run it, merely type:
  96 @example
  97      ./configure
  98 @end example
  99
 100 @lightning{} supports the @code{--enable-disassembler} option, that
 101 enables linking to GNU binutils and optionally print human readable
 102 disassembly of the jit code. This option can be disabled by the
 103 @code{--disable-disassembler} option.
 104
 105 Another option that @file{configure} accepts is
 106 @code{--enable-assertions}, which enables several consistency checks in
 107 the run-time assemblers.  These are not usually needed, so you can
 108 decide to simply forget about it; also remember that these consistency
 109 checks tend to slow down your code generator.
 110
 111 After you've configured @lightning{}, run @file{make} as usual.
 112
 113 @lightning{} has an extensive set of tests to validate it is working
 114 correctly in the build host. To test it run:
 115 @example
 116     make check
 117 @end example
 118
 119 The next important step is:
 120 @example
 121     make install
 122 @end example
 123
 124 This ends the process of installing @lightning{}.
 125
 126 @node The instruction set
 127 @chapter @lightning{}'s instruction set
 128
 129 @lightning{}'s instruction set was designed by deriving instructions
 130 that closely match those of most existing RISC architectures, or
 131 that can be easily syntesized if absent.  Each instruction is composed
 132 of:
 133 @itemize @bullet
 134 @item
 135 an operation, like @code{sub} or @code{mul}
 136
 137 @item
 138 most times, a register/immediate flag (@code{r} or @code{i})
 139
 140 @item
 141 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 142 @end itemize
 143
 144 Examples of legal mnemonics are @code{addr} (integer add, with three
 145 register operands) and @code{muli} (integer multiply, with two
 146 register operands and an immediate operand).  Each instruction takes
 147 two or three operands; in most cases, one of them can be an immediate
 148 value instead of a register.
 149
 150 Most @lightning{} integer operations are signed wordsize operations,
 151 with the exception of operations that convert types, or load or store
 152 values to/from memory. When applicable, the types and C types are as
 153 follow:
 154
 155 @example
 156      _c         @r{signed char}
 157      _uc        @r{unsigned char}
 158      _s         @r{short}
 159      _us        @r{unsigned short}
 160      _i         @r{int}
 161      _ui        @r{unsigned int}
 162      _l         @r{long}
 163      _f         @r{float}
 164      _d         @r{double}
 165 @end example
 166
 167 Most integer operations do not need a type modifier, and when loading or
 168 storing values to memory there is an alias to the proper operation
 169 using wordsize operands, that is, if ommited, the type is @r{int} on
 170 32-bit architectures and @r{long} on 64-bit architectures.  Note
 171 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 172
 173 When an unsigned operation result differs from the equivalent signed
 174 operation, there is a the @code{_u} modifier.
 175
 176 There are at least seven integer registers, of which six are
 177 general-purpose, while the last is used to contain the frame pointer
 178 (@code{FP}).  The frame pointer can be used to allocate and access local
 179 variables on the stack, using the @code{allocai} or @code{allocar}
 180 instruction.
 181
 182 Of the general-purpose registers, at least three are guaranteed to be
 183 preserved across function calls (@code{V0}, @code{V1} and
 184 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 185 @code{R2}).  Six registers are not very much, but this
 186 restriction was forced by the need to target CISC architectures
 187 which, like the x86, are poor of registers; anyway, backends can
 188 specify the actual number of available registers with the calls
 189 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 190 (for callee-save registers).
 191
 192 There are at least six floating-point registers, named @code{F0} to
 193 @code{F5}.  These are usually caller-save and are separate from the integer
 194 registers on the supported architectures; on Intel architectures,
 195 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 196 the register stack is mapped to a flat register file.  As for the
 197 integer registers, the macro @code{JIT_F_NUM} yields the number of
 198 floating-point registers.
 199
 200 The complete instruction set follows; as you can see, most non-memory
 201 operations only take integers (either signed or unsigned) as operands;
 202 this was done in order to reduce the instruction set, and because most
 203 architectures only provide word and long word operations on registers.
 204 There are instructions that allow operands to be extended to fit a larger
 205 data type, both in a signed and in an unsigned way.
 206
 207 @table @b
 208 @item Binary ALU operations
 209 These accept three operands; the last one can be an immediate.
 210 @code{addx} operations must directly follow @code{addc}, and
 211 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 212 Most, if not all, architectures do not support @r{float} or @r{double}
 213 immediate operands; lightning emulates those operations by moving the
 214 immediate to a temporary register and emiting the call with only
 215 register operands.
 216 @example
 217 addr         _f  _d  O1 = O2 + O3
 218 addi         _f  _d  O1 = O2 + O3
 219 addxr                O1 = O2 + (O3 + carry)
 220 addxi                O1 = O2 + (O3 + carry)
 221 addcr                O1 = O2 + O3, set carry
 222 addci                O1 = O2 + O3, set carry
 223 subr         _f  _d  O1 = O2 - O3
 224 subi         _f  _d  O1 = O2 - O3
 225 subxr                O1 = O2 - (O3 + carry)
 226 subxi                O1 = O2 - (O3 + carry)
 227 subcr                O1 = O2 - O3, set carry
 228 subci                O1 = O2 - O3, set carry
 229 rsbr         _f  _d  O1 = O3 - O1
 230 rsbi         _f  _d  O1 = O3 - O1
 231 mulr         _f  _d  O1 = O2 * O3
 232 muli         _f  _d  O1 = O2 * O3
 233 divr     _u  _f  _d  O1 = O2 / O3
 234 divi     _u  _f  _d  O1 = O2 / O3
 235 remr     _u          O1 = O2 % O3
 236 remi     _u          O1 = O2 % O3
 237 andr                 O1 = O2 & O3
 238 andi                 O1 = O2 & O3
 239 orr                  O1 = O2 | O3
 240 ori                  O1 = O2 | O3
 241 xorr                 O1 = O2 ^ O3
 242 xori                 O1 = O2 ^ O3
 243 lshr                 O1 = O2 << O3
 244 lshi                 O1 = O2 << O3
 245 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 246 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 247 @end example
 248
 249 @item Four operand binary ALU operations
 250 These accept two result registers, and two operands; the last one can
 251 be an immediate. The first two arguments cannot be the same register.
 252
 253 @code{qmul} stores the low word of the result in @code{O1} and the
 254 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 255 means there was no overflow. For signed multiplication, no overflow
 256 check is based on sign, and can be detected if @code{O2} is zero or
 257 minus one.
 258
 259 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 260 @code{O2}. It can be used as quick way to check if a division is
 261 exact, in which case the remainder is zero.
 262
 263 @example
 264 qmulr    _u       O1 O2 = O3 * O4
 265 qmuli    _u       O1 O2 = O3 * O4
 266 qdivr    _u       O1 O2 = O3 / O4
 267 qdivi    _u       O1 O2 = O3 / O4
 268 @end example
 269
 270 @item Unary ALU operations
 271 These accept two operands, both of which must be registers.
 272 @example
 273 negr         _f  _d  O1 = -O2
 274 comr                 O1 = ~O2
 275 @end example
 276
 277 These unary ALU operations are only defined for float operands.
 278 @example
 279 absr         _f  _d  O1 = fabs(O2)
 280 sqrtr                O1 = sqrt(O2)
 281 @end example
 282
 283 Besides requiring the @code{r} modifier, there are no unary operations
 284 with an immediate operand.
 285
 286 @item Compare instructions
 287 These accept three operands; again, the last can be an immediate.
 288 The last two operands are compared, and the first operand, that must be
 289 an integer register, is set to either 0 or 1, according to whether the
 290 given condition was met or not.
 291
 292 The conditions given below are for the standard behavior of C,
 293 where the ``unordered'' comparison result is mapped to false.
 294
 295 @example
 296 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 297 lti       _u  _f  _d  O1 =  (O2 <  O3)
 298 ler       _u  _f  _d  O1 =  (O2 <= O3)
 299 lei       _u  _f  _d  O1 =  (O2 <= O3)
 300 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 301 gti       _u  _f  _d  O1 =  (O2 >  O3)
 302 ger       _u  _f  _d  O1 =  (O2 >= O3)
 303 gei       _u  _f  _d  O1 =  (O2 >= O3)
 304 eqr           _f  _d  O1 =  (O2 == O3)
 305 eqi           _f  _d  O1 =  (O2 == O3)
 306 ner           _f  _d  O1 =  (O2 != O3)
 307 nei           _f  _d  O1 =  (O2 != O3)
 308 unltr         _f  _d  O1 = !(O2 >= O3)
 309 unler         _f  _d  O1 = !(O2 >  O3)
 310 ungtr         _f  _d  O1 = !(O2 <= O3)
 311 unger         _f  _d  O1 = !(O2 <  O3)
 312 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 313 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 314 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 315 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 316 @end example
 317
 318 @item Transfer operations
 319 These accept two operands; for @code{ext} both of them must be
 320 registers, while @code{mov} accepts an immediate value as the second
 321 operand.
 322
 323 Unlike @code{movr} and @code{movi}, the other instructions are used
 324 to truncate a wordsize operand to a smaller integer data type or to
 325 convert float data types. You can also use @code{extr} to convert an
 326 integer to a floating point value: the usual options are @code{extr_f}
 327 and @code{extr_d}.
 328
 329 @example
 330 movr                                 _f  _d  O1 = O2
 331 movi                                 _f  _d  O1 = O2
 332 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 333 truncr                               _f  _d  O1 = trunc(O2)
 334 @end example
 335
 336 In 64-bit architectures it may be required to use @code{truncr_f_i},
 337 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 338 the equivalent C code.  Only the @code{_i} modifier is available in
 339 32-bit architectures.
 340
 341 @example
 342 truncr_f_i    = <int> O1 = <float> O2
 343 truncr_f_l    = <long>O1 = <float> O2
 344 truncr_d_i    = <int> O1 = <double>O2
 345 truncr_d_l    = <long>O1 = <double>O2
 346 @end example
 347
 348 The float conversion operations are @emph{destination first,
 349 source second}, but the order of the types is reversed.  This happens
 350 for historical reasons.
 351
 352 @example
 353 extr_f_d    = <double>O1 = <float> O2
 354 extr_d_f    = <float> O1 = <double>O2
 355 @end example
 356
 357 @item Network extensions
 358 These accept two operands, both of which must be registers; these
 359 two instructions actually perform the same task, yet they are
 360 assigned to two mnemonics for the sake of convenience and
 361 completeness.  As usual, the first operand is the destination and
 362 the second is the source.
 363 The @code{_ul} variant is only available in 64-bit architectures.
 364 @example
 365 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 366 ntohr    _us _ui _ul @r{Network-to-host order }
 367 @end example
 368
 369 @item Load operations
 370 @code{ld} accepts two operands while @code{ldx} accepts three;
 371 in both cases, the last can be either a register or an immediate
 372 value. Values are extended (with or without sign, according to
 373 the data type specification) to fit a whole register.
 374 The @code{_ui} and @code{_l} types are only available in 64-bit
 375 architectures.  For convenience, there is a version without a
 376 type modifier for integer or pointer operands that uses the
 377 appropriate wordsize call.
 378 @example
 379 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 380 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 381 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 382 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 383 @end example
 384
 385 @item Store operations
 386 @code{st} accepts two operands while @code{stx} accepts three; in
 387 both cases, the first can be either a register or an immediate
 388 value. Values are sign-extended to fit a whole register.
 389 @example
 390 str     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 391 sti     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 392 stxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 393 stxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 394 @end example
 395 As for the load operations, the @code{_ui} and @code{_l} types are
 396 only available in 64-bit architectures, and for convenience, there
 397 is a version without a type modifier for integer or pointer operands
 398 that uses the appropriate wordsize call.
 399
 400 @item Argument management
 401 These are:
 402 @example
 403 prepare     (not specified)
 404 va_start    (not specified)
 405 pushargr                                   _f  _d
 406 pushargi                                   _f  _d
 407 va_push     (not specified)
 408 arg                                        _f  _d
 409 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 410 va_arg                                         _d
 411 putargr                                    _f  _d
 412 putargi                                    _f  _d
 413 ret         (not specified)
 414 retr                                       _f  _d
 415 reti                                       _f  _d
 416 va_end      (not specified)
 417 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 418 epilog      (not specified)
 419 @end example
 420 As with other operations that use a type modifier, the @code{_ui} and
 421 @code{_l} types are only available in 64-bit architectures, but there
 422 are operations without a type modifier that alias to the appropriate
 423 integer operation with wordsize operands.
 424
 425 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 426 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 427 A code snippet that wants to call another procedure and has to pass
 428 arguments must, in order: use the @code{prepare} instruction and use
 429 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 430 left to right order}; and use @code{finish} or @code{call} (explained below)
 431 to perform the actual call.
 432
 433 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 434 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 435 @code{va_push} is required when passing a @code{va_list} to another function,
 436 because not all architectures expect it as a single pointer. Known case
 437 is DEC Alpha, that requires it as a structure passed by value.
 438
 439 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 440 @code{arg} is different from other instruction in that it does not
 441 actually generate any code: instead, it is a function which returns
 442 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 443 a value'' means that @lightning{} code that compile these
 444 instructions return a value when expanded.} You should call
 445 @code{arg} as soon as possible, before any function call or, more
 446 easily, right after the @code{prolog} instructions
 447 (which is treated later).
 448
 449 @code{getarg} accepts a register argument and a value returned by
 450 @code{arg}, and will move that argument to the register, extending
 451 it (with or without sign, according to the data type specification)
 452 to fit a whole register.  These instructions are more intimately
 453 related to the usage of the @lightning{} instruction set in code
 454 that generates other code, so they will be treated more
 455 specifically in @ref{GNU lightning examples, , Generating code at
 456 run-time}.
 457
 458 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 459 it accepts as first argument a register or immediate, and as
 460 second argument a value returned by @code{arg}. It allows changing,
 461 or restoring an argument to the current function, and is a
 462 construct required to implement tail call optimization. Note that
 463 arguments in registers are very cheap, but will be overwritten
 464 at any moment, including on some operations, for example division,
 465 that on several ports is implemented as a function call.
 466
 467 Finally, the @code{retval} instruction fetches the return value of a
 468 called function in a register.  The @code{retval} instruction takes a
 469 register argument and copies the return value of the previously called
 470 function in that register.  A function with a return value should use
 471 @code{retr} or @code{reti} to put the return value in the return register
 472 before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 473
 474 @code{epilog} is an optional call, that marks the end of a function
 475 body. It is automatically generated by @lightning{} if starting a new
 476 function (what should be done after a @code{ret} call) or finishing
 477 generating jit.
 478 It is very important to note that the fact that @code{epilog} being
 479 optional may cause a common mistake. Consider this:
 480 @example
 481 fun1:
 482     prolog
 483     ...
 484     ret
 485 fun2:
 486     prolog
 487 @end example
 488 Because @code{epilog} is added when finding a new @code{prolog},
 489 this will cause the @code{fun2} label to actually be before the
 490 return from @code{fun1}. Because @lightning{} will actually
 491 understand it as:
 492 @example
 493 fun1:
 494     prolog
 495     ...
 496     ret
 497 fun2:
 498     epilog
 499     prolog
 500 @end example
 501
 502 You should observe a few rules when using these macros.  First of
 503 all, if calling a varargs function, you should use the @code{ellipsis}
 504 call to mark the position of the ellipsis in the C prototype.
 505
 506 You should not nest calls to @code{prepare} inside a
 507 @code{prepare/finish} block.  Doing this will result in undefined
 508 behavior. Note that for functions with zero arguments you can use
 509 just @code{call}.
 510
 511 @item Branch instructions
 512 Like @code{arg}, these also return a value which, in this case,
 513 is to be used to compile forward branches as explained in
 514 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 515 compared; of these, the last can be either a register or an immediate.
 516 They are:
 517 @example
 518 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 519 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 520 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 521 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 522 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 523 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 524 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 525 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 526 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 527 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 528 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 529 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 530
 531 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 532 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 533 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 534 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 535 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 536 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 537 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
 538 bunordr       _f  _d  @r{if }!(O2 != O2) ||  (O3 != O3)@r{ goto }O1
 539
 540 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 541 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 542 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 543 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 544 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 545 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 546 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 547 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 548 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 549 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 550 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 551 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 552 @end example
 553
 554 @item Jump and return operations
 555 These accept one argument except @code{ret} and @code{jmpi} which
 556 have none; the difference between @code{finishi} and @code{calli}
 557 is that the latter does not clean the stack from pushed parameters
 558 (if any) and the former must @strong{always} follow a @code{prepare}
 559 instruction.
 560 @example
 561 callr     (not specified)                @r{function call to register O1}
 562 calli     (not specified)                @r{function call to immediate O1}
 563 finishr   (not specified)                @r{function call to register O1}
 564 finishi   (not specified)                @r{function call to immediate O1}
 565 jmpr      (not specified)                @r{unconditional jump to register}
 566 jmpi      (not specified)                @r{unconditional jump}
 567 ret       (not specified)                @r{return from subroutine}
 568 retr      _c _uc _s _us _i _ui _l _f _d
 569 reti      _c _uc _s _us _i _ui _l _f _d
 570 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 571                                          @r{to register}
 572 @end example
 573
 574 Like branch instruction, @code{jmpi} also returns a value which is to
 575 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
 576 numbers}.
 577
 578 @item Labels
 579 There are 3 @lightning{} instructions to create labels:
 580 @example
 581 label     (not specified)                @r{simple label}
 582 forward   (not specified)                @r{forward label}
 583 indirect  (not specified)                @r{special simple label}
 584 @end example
 585
 586 @code{label} is normally used as @code{patch_at} argument for backward
 587 jumps.
 588
 589 @example
 590         jit_node_t *jump, *label;
 591 label = jit_label();
 592         ...
 593         jump = jit_beqr(JIT_R0, JIT_R1);
 594         jit_patch_at(jump, label);
 595 @end example
 596
 597 @code{forward} is used to patch code generation before the actual
 598 position of the label is known.
 599
 600 @example
 601         jit_node_t *jump, *label;
 602 label = jit_forward();
 603         jump = jit_beqr(JIT_R0, JIT_R1);
 604         jit_patch_at(jump, label);
 605         ...
 606         jit_link(label);
 607 @end example
 608
 609 @code{indirect} is useful when creating jump tables, and tells
 610 @lightning{} to not optimize out a label that is not the target of
 611 any jump, because an indirect jump may land where it is defined.
 612
 613 @example
 614         jit_node_t *jump, *label;
 615         ...
 616         jmpr(JIT_R0);                    @rem{/* may jump to label */}
 617         ...
 618 label = jit_indirect();
 619 @end example
 620
 621 @code{indirect} is an special case of @code{note} and @code{name}
 622 because it is a valid argument to @code{address}.
 623
 624 Note that the usual idiom to write the previous example is
 625 @example
 626         jit_node_t *addr, *jump;
 627 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 628         ...
 629         jmpr(JIT_R0);
 630         ...
 631         jit_patch(addr);                 @rem{/* implicit label added */}
 632 @end example
 633
 634 that automatically binds the implicit label added by @code{patch} with
 635 the @code{movi}, but on some special conditions it is required to create
 636 an "unbound" label.
 637
 638 @item Function prolog
 639
 640 These macros are used to set up a function prolog.  The @code{allocai}
 641 call accept a single integer argument and returns an offset value
 642 for stack storage access.  The @code{allocar} accepts two registers
 643 arguments, the first is set to the offset for stack access, and the
 644 second is the size in bytes argument.
 645
 646 @example
 647 prolog    (not specified)                @r{function prolog}
 648 allocai   (not specified)                @r{reserve space on the stack}
 649 allocar   (not specified)                @r{allocate space on the stack}
 650 @end example
 651
 652 @code{allocai} receives the number of bytes to allocate and returns
 653 the offset from the frame pointer register @code{FP} to the base of
 654 the area.
 655
 656 @code{allocar} receives two register arguments.  The first is where
 657 to store the offset from the frame pointer register @code{FP} to the
 658 base of the area.  The second argument is the size in bytes.  Note
 659 that @code{allocar} is dynamic allocation, and special attention
 660 should be taken when using it.  If called in a loop, every iteration
 661 will allocate stack space.  Stack space is aligned from 8 to 64 bytes
 662 depending on backend requirements, even if allocating only one byte.
 663 It is advisable to not use it with @code{frame} and @code{tramp}; it
 664 should work with @code{frame} with special care to call only once,
 665 but is not supported if used in @code{tramp}, even if called only
 666 once.
 667
 668 As a small appetizer, here is a small function that adds 1 to the input
 669 parameter (an @code{int}).  I'm using an assembly-like syntax here which
 670 is a bit different from the one used when writing real subroutines with
 671 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
 672 examples, , Generating code at run-time}.
 673
 674 @example
 675 incr:
 676      prolog
 677 in = arg                     @rem{! We have an integer argument}
 678      getarg    R0, in        @rem{! Move it to R0}
 679      addi      R0, R0, 1     @rem{! Add 1}
 680      retr      R0            @rem{! And return the result}
 681 @end example
 682
 683 And here is another function which uses the @code{printf} function from
 684 the standard C library to write a number in hexadecimal notation:
 685
 686 @example
 687 printhex:
 688      prolog
 689 in = arg                     @rem{! Same as above}
 690      getarg    R0, in
 691      prepare                 @rem{! Begin call sequence for printf}
 692      pushargi  "%x"          @rem{! Push format string}
 693      ellipsis                @rem{! Varargs start here}
 694      pushargr  R0            @rem{! Push second argument}
 695      finishi   printf        @rem{! Call printf}
 696      ret                     @rem{! Return to caller}
 697 @end example
 698
 699 @item Register liveness
 700
 701 During code generation, @lightning{} occasionally needs scratch registers
 702 or needs to use architecture-defined registers.  For that, @lightning{}
 703 internally maintains register liveness information.
 704
 705 In the following example, @code{qdivr} will need special registers like
 706 @code{R0} on some architectures.  As @lightning{} understands that
 707 @code{R0} is used in the subsequent instruction, it will create
 708 save/restore code for @code{R0} in case.
 709
 710 @example
 711 ...
 712 qdivr V0, V1, V2, V3
 713 movr  V3, R0
 714 ...
 715 @end example
 716
 717 The same is not true in the example that follows.  Here, @code{R0} is
 718 not alive after the division operation because @code{R0} is neither an
 719 argument register nor a callee-save register.  Thus, no save/restore
 720 code for @code{R0} will be created in case.
 721
 722 @example
 723 ...
 724 qdivr V0, V1, V2, V3
 725 jmpr  R1
 726 ...
 727 @end example
 728
 729 The @code{live} instruction can be used to mark a register as live after
 730 it as in the following example.  Here, @code{R0} will be preserved
 731 across the division.
 732
 733 @example
 734 ...
 735 qdivr V0, V1, V2, V3
 736 live R0
 737 jmpr R1
 738 ...
 739 @end example
 740
 741 The @code{live} instruction is useful at code entry and exit points,
 742 like after and before a @code{callr} instruction.
 743
 744 @item Trampolines, continuations and tail call optimization
 745
 746 Frequently it is required to generate jit code that must jump to
 747 code generated later, possibly from another @code{jit_context_t}.
 748 These require compatible stack frames.
 749
 750 @lightning{} provides two primitives from where trampolines,
 751 continuations and tail call optimization can be implemented.
 752
 753 @example
 754 frame   (not specified)                  @r{create stack frame}
 755 tramp   (not specified)                  @r{assume stack frame}
 756 @end example
 757
 758 @code{frame} receives an integer argument@footnote{It is not
 759 automatically computed because it does not know about the
 760 requirement of later generated code.} that defines the size in
 761 bytes for the stack frame of the current, @code{C} callable,
 762 jit function. To calculate this value, a good formula is maximum
 763 number of arguments to any called native function times
 764 eight@footnote{Times eight so that it works for double arguments.
 765 And would not need conditionals for ports that pass arguments in
 766 the stack.}, plus the sum of the arguments to any call to
 767 @code{jit_allocai}. @lightning{} automatically adjusts this value
 768 for any backend specific stack memory it may need, or any
 769 alignment constraint.
 770
 771 @code{frame} also instructs @lightning{} to save all callee
 772 save registers in the prolog and reload in the epilog.
 773
 774 @example
 775 main:                        @rem{! jit entry point}
 776      prolog                  @rem{! function prolog}
 777      frame  256              @rem{! save all callee save registers and}
 778                              @rem{! reserve at least 256 bytes in stack}
 779 main_loop:
 780      ...
 781      jmpi   handler          @rem{! jumps to external code}
 782      ...
 783      ret                     @rem{! return to the caller}
 784 @end example
 785
 786 @code{tramp} differs from @code{frame} only that a prolog and epilog
 787 will not be generated. Note that @code{prolog} must still be used.
 788 The code under @code{tramp} must be ready to be entered with a jump
 789 at the prolog position, and instead of a return, it must end with
 790 a non conditional jump. @code{tramp} exists solely for the fact
 791 that it allows optimizing out prolog and epilog code that would
 792 never be executed.
 793
 794 @example
 795 handler:                     @rem{! handler entry point}
 796      prolog                  @rem{! function prolog}
 797      tramp  256              @rem{! assumes all callee save registers}
 798                              @rem{! are saved and there is at least}
 799                              @rem{! 256 bytes in stack}
 800      ...
 801      jmpi   main_loop        @rem{! return to the main loop}
 802 @end example
 803
 804 @lightning{} only supports Tail Call Optimization using the
 805 @code{tramp} construct. Any other way is not guaranteed to
 806 work on all ports.
 807
 808 An example of a simple (recursive) tail call optimization:
 809
 810 @example
 811 factorial:                   @rem{! Entry point of the factorial function}
 812      prolog
 813 in = arg                     @rem{! Receive an integer argument}
 814      getarg R0, in           @rem{! Move argument to RO}
 815      prepare
 816          pushargi 1          @rem{! This is the accumulator}
 817          pushargr R0         @rem{! This is the argument}
 818      finishi fact            @rem{! Call the tail call optimized function}
 819      retval R0               @rem{! Fetch the result}
 820      retr R0                 @rem{! Return it}
 821      epilog                  @rem{! Epilog *before* label before prolog}
 822
 823 fact:                        @rem{! Entry point of the helper function}
 824      prolog
 825      frame 16                @rem{! Reserve 16 bytes in the stack}
 826 fact_entry:                  @rem{! This is the tail call entry point}
 827 ac = arg                     @rem{! The accumulator is the first argument}
 828 in = arg                     @rem{! The factorial argument}
 829      getarg R0, ac           @rem{! Move the accumulator to R0}
 830      getarg R1, in           @rem{! Move the argument to R1}
 831      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
 832      mulr R0, R0, R1         @rem{! accumulator *= argument}
 833      putargr R0, ac          @rem{! Update the accumulator}
 834      subi R1, R1, 1          @rem{! argument -= 1}
 835      putargr R1, in          @rem{! Update the argument}
 836      jmpi fact_entry         @rem{! Tail Call Optimize it!}
 837 fact_out:
 838      retr R0                 @rem{! Return the accumulator}
 839 @end example
 840
 841 @item Predicates
 842 @example
 843 forward_p      (not specified)           @r{forward label predicate}
 844 indirect_p     (not specified)           @r{indirect label predicate}
 845 target_p       (not specified)           @r{used label predicate}
 846 arg_register_p (not specified)           @r{argument kind predicate}
 847 callee_save_p  (not specified)           @r{callee save predicate}
 848 pointer_p      (not specified)           @r{pointer predicate}
 849 @end example
 850
 851 @code{forward_p} expects a @code{jit_node_t*} argument, and
 852 returns non zero if it is a forward label reference, that is,
 853 a label returned by @code{forward}, that still needs a
 854 @code{link} call.
 855
 856 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
 857 non zero if it is an indirect label reference, that is, a label that
 858 was returned by @code{indirect}.
 859
 860 @code{target_p} expects a @code{jit_node_t*} argument, that is any
 861 kind of label, and will return non zero if there is at least one
 862 jump or move referencing it.
 863
 864 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
 865 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
 866 will return non zero if the argument lives in a register. This call
 867 is useful to know the live range of register arguments, as those
 868 are very fast to read and write, but have volatile values.
 869
 870 @code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
 871 @code{JIT_Fn}, and will return non zero if the register is callee
 872 save. This call is useful because on several ports, the @code{JIT_Rn}
 873 and @code{JIT_Fn} registers are actually callee save; no need
 874 to save and load the values when making function calls.
 875
 876 @code{pointer_p} expects a pointer argument, and will return non
 877 zero if the pointer is inside the generated jit code. Must be
 878 called after @code{jit_emit} and before @code{jit_destroy_state}.
 879 @end table
 880
 881 @node GNU lightning examples
 882 @chapter Generating code at run-time
 883
 884 To use @lightning{}, you should include the @file{lightning.h} file that
 885 is put in your include directory by the @samp{make install} command.
 886
 887 Each of the instructions above translates to a macro or function call.
 888 All you have to do is prepend @code{jit_} (lowercase) to opcode names
 889 and @code{JIT_} (uppercase) to register names.  Of course, parameters
 890 are to be put between parentheses.
 891
 892 This small tutorial presents three examples:
 893
 894 @iftex
 895 @itemize @bullet
 896 @item
 897 The @code{incr} function found in @ref{The instruction set, ,
 898 @lightning{}'s instruction set}:
 899
 900 @item
 901 A simple function call to @code{printf}
 902
 903 @item
 904 An RPN calculator.
 905
 906 @item
 907 Fibonacci numbers
 908 @end itemize
 909 @end iftex
 910 @ifnottex
 911 @menu
 912 * incr::             A function which increments a number by one
 913 * printf::           A simple function call to printf
 914 * RPN calculator::   A more complex example, an RPN calculator
 915 * Fibonacci::        Calculating Fibonacci numbers
 916 @end menu
 917 @end ifnottex
 918
 919 @node incr
 920 @section A function which increments a number by one
 921
 922 Let's see how to create and use the sample @code{incr} function created
 923 in @ref{The instruction set, , @lightning{}'s instruction set}:
 924
 925 @example
 926 #include <stdio.h>
 927 #include <lightning.h>
 928
 929 static jit_state_t *_jit;
 930
 931 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
 932
 933 int main(int argc, char *argv[])
 934 @{
 935   jit_node_t  *in;
 936   pifi         incr;
 937
 938   init_jit(argv[0]);
 939   _jit = jit_new_state();
 940
 941   jit_prolog();                    @rem{/* @t{     prolog             } */}
 942   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
 943   jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0          } */}
 944   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
 945   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
 946
 947   incr = jit_emit();
 948   jit_clear_state();
 949
 950   @rem{/* call the generated code@comma{} passing 5 as an argument */}
 951   printf("%d + 1 = %d\n", 5, incr(5));
 952
 953   jit_destroy_state();
 954   finish_jit();
 955   return 0;
 956 @}
 957 @end example
 958
 959 Let's examine the code line by line (well, almost@dots{}):
 960
 961 @table @t
 962 @item #include <lightning.h>
 963 You already know about this.  It defines all of @lightning{}'s macros.
 964
 965 @item static jit_state_t *_jit;
 966 You might wonder about what is @code{jit_state_t}.  It is a structure
 967 that stores jit code generation information.  The name @code{_jit} is
 968 special, because since multiple jit generators can run at the same
 969 time, you must either @r{#define _jit my_jit_state} or name it
 970 @code{_jit}.
 971
 972 @item typedef int (*pifi)(int);
 973 Just a handy typedef for a pointer to a function that takes an
 974 @code{int} and returns another.
 975
 976 @item jit_node_t  *in;
 977 Declares a variable to hold an identifier for a function argument. It
 978 is an opaque pointer, that will hold the return of a call to @code{arg}
 979 and be used as argument to @code{getarg}.
 980
 981 @item pifi         incr;
 982 Declares a function pointer variable to a function that receives an
 983 @code{int} and returns an @code{int}.
 984
 985 @item init_jit(argv[0]);
 986 You must call this function before creating a @code{jit_state_t}
 987 object. This function does global state initialization, and may need
 988 to detect CPU or Operating System features.  It receives a string
 989 argument that is later used to read symbols from a shared object using
 990 GNU binutils if disassembly was enabled at configure time. If no
 991 disassembly will be performed a NULL pointer can be used as argument.
 992
 993 @item _jit = jit_new_state();
 994 This call initializes a @lightning{} jit state.
 995
 996 @item jit_prolog();
 997 Ok, so we start generating code for our beloved function@dots{}
 998
 999 @item in = jit_arg();
1000 @itemx jit_getarg(JIT_R0, in);
1001 We retrieve the first (and only) argument, an integer, and store it
1002 into the general-purpose register @code{R0}.
1003
1004 @item jit_addi(JIT_R0, JIT_R0, 1);
1005 We add one to the content of the register.
1006
1007 @item jit_retr(JIT_R0);
1008 This instruction generates a standard function epilog that returns
1009 the contents of the @code{R0} register.
1010
1011 @item incr = jit_emit();
1012 This instruction is very important.  It actually translates the
1013 @lightning{} macros used before to machine code, flushes the generated
1014 code area out of the processor's instruction cache and return a
1015 pointer to the start of the code.
1016
1017 @item jit_clear_state();
1018 This call cleanups any data not required for jit execution. Note
1019 that it must be called after any call to @code{jit_print} or
1020 @code{jit_address}, as this call destroy the @lightning{}
1021 intermediate representation.
1022
1023 @item printf("%d + 1 = %d", 5, incr(5));
1024 Calling our function is this simple---it is not distinguishable from
1025 a normal C function call, the only difference being that @code{incr}
1026 is a variable.
1027
1028 @item jit_destroy_state();
1029 Releases all memory associated with the jit context. It should be
1030 called after known the jit will no longer be called.
1031
1032 @item finish_jit();
1033 This call cleanups any global state hold by @lightning{}, and is
1034 advisable to call it once jit code will no longer be generated.
1035 @end table
1036
1037 @lightning{} abstracts two phases of dynamic code generation: selecting
1038 instructions that map the standard representation, and emitting binary
1039 code for these instructions.  The client program has the responsibility
1040 of describing the code to be generated using the standard @lightning{}
1041 instruction set.
1042
1043 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1044 architecture (on the right is the code that an assembly-language
1045 programmer would write):
1046
1047 @table @b
1048 @item SPARC
1049 @example
1050       save  %sp, -112, %sp
1051       mov  %i0, %g2                 retl
1052       inc  %g2                      inc %o0
1053       mov  %g2, %i0
1054       restore
1055       retl
1056       nop
1057 @end example
1058 In this case, @lightning{} introduces overhead to create a register
1059 window (not knowing that the procedure is a leaf procedure) and to
1060 move the argument to the general purpose register @code{R0} (which
1061 maps to @code{%g2} on the SPARC).
1062 @end table
1063
1064 @table @b
1065 @item x86_64
1066 @example
1067     sub   $0x30,%rsp
1068     mov   %rbp,(%rsp)
1069     mov   %rsp,%rbp
1070     sub   $0x18,%rsp
1071     mov   %rdi,%rax            mov %rdi, %rax
1072     add   $0x1,%rax            inc %rax
1073     mov   %rbp,%rsp
1074     mov   (%rsp),%rbp
1075     add   $0x30,%rsp
1076     retq                       retq
1077 @end example
1078 In this case, the main overhead is due to the function's prolog and
1079 epilog, and stack alignment after reserving stack space for word
1080 to/from float conversions or moving data from/to x87 to/from SSE.
1081 Note that besides allocating space to save callee saved registers,
1082 no registers are saved/restored because @lightning{} notices those
1083 registers are not modified. There is currently no logic to detect
1084 if it needs to allocate stack space for type conversions neither
1085 proper leaf function detection, but these are subject to change
1086 (FIXME).
1087 @end table
1088
1089 @node printf
1090 @section A simple function call to @code{printf}
1091
1092 Again, here is the code for the example:
1093
1094 @example
1095 #include <stdio.h>
1096 #include <lightning.h>
1097
1098 static jit_state_t *_jit;
1099
1100 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1101
1102 int main(int argc, char *argv[])
1103 @{
1104   pvfi          myFunction;             @rem{/* ptr to generated code */}
1105   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1106   jit_node_t    *in;                    @rem{/* to get the argument */}
1107
1108   init_jit(argv[0]);
1109   _jit = jit_new_state();
1110
1111   start = jit_note(__FILE__, __LINE__);
1112   jit_prolog();
1113   in = jit_arg();
1114   jit_getarg(JIT_R1, in);
1115   jit_prepare();
1116   jit_pushargi((jit_word_t)"generated %d bytes\n");
1117   jit_ellipsis();
1118   jit_pushargr(JIT_R1);
1119   jit_finishi(printf);
1120   jit_ret();
1121   jit_epilog();
1122   end = jit_note(__FILE__, __LINE__);
1123
1124   myFunction = jit_emit();
1125
1126   @rem{/* call the generated code@comma{} passing its size as argument */}
1127   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1128   jit_clear_state();
1129
1130   jit_disassemble();
1131
1132   jit_destroy_state();
1133   finish_jit();
1134   return 0;
1135 @}
1136 @end example
1137
1138 The function shows how many bytes were generated.  Most of the code
1139 is not very interesting, as it resembles very closely the program
1140 presented in @ref{incr, , A function which increments a number by one}.
1141
1142 For this reason, we're going to concentrate on just a few statements.
1143
1144 @table @t
1145 @item start = jit_note(__FILE__, __LINE__);
1146 @itemx @r{@dots{}}
1147 @itemx end = jit_note(__FILE__, __LINE__);
1148 These two instruction call the @code{jit_note} macro, which creates
1149 a note in the jit code; arguments to @code{jit_note} usually are a
1150 filename string and line number integer, but using NULL for the
1151 string argument is perfectly valid if only need to create a simple
1152 marker in the code.
1153
1154 @item jit_ellipsis();
1155 @code{ellipsis} usually is only required if calling varargs functions
1156 with double arguments, but it is a good practice to properly describe
1157 the @r{@dots{}} in the call sequence.
1158
1159 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1160 Note the use of the @code{(jit_word_t)} cast, that is used only
1161 to avoid a compiler warning, due to using a pointer where a
1162 wordsize integer type was expected.
1163
1164 @item jit_prepare();
1165 @itemx @r{@dots{}}
1166 @itemx jit_finishi(printf);
1167 Once the arguments to @code{printf} have been pushed, what means
1168 moving them to stack or register arguments, the @code{printf}
1169 function is called and the stack cleaned.  Note how @lightning{}
1170 abstracts the differences between different architectures and
1171 ABI's -- the client program does not know how parameter passing
1172 works on the host architecture.
1173
1174 @item jit_epilog();
1175 Usually it is not required to call @code{epilog}, but because it
1176 is implicitly called when noticing the end of a function, if the
1177 @code{end} variable was set with a @code{note} call after the
1178 @code{ret}, it would not consider the function epilog.
1179
1180 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1181 This calls the generate jit function passing as argument the offset
1182 difference from the @code{start} and @code{end} notes. The @code{address}
1183 call must be done after the @code{emit} call or either a fatal error
1184 will happen (if @lightning{} is built with assertions enable) or an
1185 undefined value will be returned.
1186
1187 @item jit_clear_state();
1188 Note that @code{jit_clear_state} was called after executing jit in
1189 this example. It was done because it must be called after any call
1190 to @code{jit_address} or @code{jit_print}.
1191
1192 @item jit_disassemble();
1193 @code{disassemble} will dump the generated code to standard output,
1194 unless @lightning{} was built with the disassembler disabled, in which
1195 case no output will be shown.
1196 @end table
1197
1198 @node RPN calculator
1199 @section A more complex example, an RPN calculator
1200
1201 We create a small stack-based RPN calculator which applies a series
1202 of operators to a given parameter and to other numeric operands.
1203 Unlike previous examples, the code generator is fully parameterized
1204 and is able to compile different formulas to different functions.
1205 Here is the code for the expression compiler; a sample usage will
1206 follow.
1207
1208 Since @lightning{} does not provide push/pop instruction, this
1209 example uses a stack-allocated area to store the data.  Such an
1210 area can be allocated using the macro @code{allocai}, which
1211 receives the number of bytes to allocate and returns the offset
1212 from the frame pointer register @code{FP} to the base of the
1213 area.
1214
1215 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1216 to access stack-allocated variables.  However, it is possible to
1217 use operations such as @code{add} to compute the address of the
1218 variables, and pass the address around.
1219
1220 @example
1221 #include <stdio.h>
1222 #include <lightning.h>
1223
1224 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1225
1226 static jit_state_t *_jit;
1227
1228 void stack_push(int reg, int *sp)
1229 @{
1230   jit_stxi_i (*sp, JIT_FP, reg);
1231   *sp += sizeof (int);
1232 @}
1233
1234 void stack_pop(int reg, int *sp)
1235 @{
1236   *sp -= sizeof (int);
1237   jit_ldxi_i (reg, JIT_FP, *sp);
1238 @}
1239
1240 jit_node_t *compile_rpn(char *expr)
1241 @{
1242   jit_node_t *in, *fn;
1243   int stack_base, stack_ptr;
1244
1245   fn = jit_note(NULL, 0);
1246   jit_prolog();
1247   in = jit_arg();
1248   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1249
1250   jit_getarg_i(JIT_R2, in);
1251
1252   while (*expr) @{
1253     char buf[32];
1254     int n;
1255     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1256       expr += n - 1;
1257       stack_push(JIT_R0, &stack_ptr);
1258       jit_movi(JIT_R0, atoi(buf));
1259     @} else if (*expr == 'x') @{
1260       stack_push(JIT_R0, &stack_ptr);
1261       jit_movr(JIT_R0, JIT_R2);
1262     @} else if (*expr == '+') @{
1263       stack_pop(JIT_R1, &stack_ptr);
1264       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1265     @} else if (*expr == '-') @{
1266       stack_pop(JIT_R1, &stack_ptr);
1267       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1268     @} else if (*expr == '*') @{
1269       stack_pop(JIT_R1, &stack_ptr);
1270       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1271     @} else if (*expr == '/') @{
1272       stack_pop(JIT_R1, &stack_ptr);
1273       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1274     @} else @{
1275       fprintf(stderr, "cannot compile: %s\n", expr);
1276       abort();
1277     @}
1278     ++expr;
1279   @}
1280   jit_retr(JIT_R0);
1281   jit_epilog();
1282   return fn;
1283 @}
1284 @end example
1285
1286 The principle on which the calculator is based is easy: the stack top
1287 is held in R0, while the remaining items of the stack are held in the
1288 memory area that we allocate with @code{allocai}.  Compiling a numeric
1289 operand or the argument @code{x} pushes the old stack top onto the
1290 stack and moves the operand into R0; compiling an operator pops the
1291 second operand off the stack into R1, and compiles the operation so
1292 that the result goes into R0, thus becoming the new stack top.
1293
1294 This example allocates a fixed area for 32 @code{int}s.  This is not
1295 a problem when the function is a leaf like in this case; in a full-blown
1296 compiler you will want to analyze the input and determine the number
1297 of needed stack slots---a very simple example of register allocation.
1298 The area is then managed like a stack using @code{stack_push} and
1299 @code{stack_pop}.
1300
1301 Source code for the client (which lies in the same source file) follows:
1302
1303 @example
1304 int main(int argc, char *argv[])
1305 @{
1306   jit_node_t *nc, *nf;
1307   pifi c2f, f2c;
1308   int i;
1309
1310   init_jit(argv[0]);
1311   _jit = jit_new_state();
1312
1313   nc = compile_rpn("32x9*5/+");
1314   nf = compile_rpn("x32-5*9/");
1315   (void)jit_emit();
1316   c2f = (pifi)jit_address(nc);
1317   f2c = (pifi)jit_address(nf);
1318   jit_clear_state();
1319
1320   printf("\nC:");
1321   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1322   printf("\nF:");
1323   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1324   printf("\n");
1325
1326   printf("\nF:");
1327   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1328   printf("\nC:");
1329   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1330   printf("\n");
1331
1332   jit_destroy_state();
1333   finish_jit();
1334   return 0;
1335 @}
1336 @end example
1337
1338 The client displays a conversion table between Celsius and Fahrenheit
1339 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1340 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1341 respectively.
1342
1343 Providing the formula as an argument to @code{compile_rpn} effectively
1344 parameterizes code generation, making it possible to use the same code
1345 to compile different functions; this is what makes dynamic code
1346 generation so powerful.
1347
1348 @node Fibonacci
1349 @section Fibonacci numbers
1350
1351 The code in this section calculates the Fibonacci sequence. That is
1352 modeled by the recurrence relation:
1353 @display
1354      f(0) = 0
1355      f(1) = f(2) = 1
1356      f(n) = f(n-1) + f(n-2)
1357 @end display
1358
1359 The purpose of this example is to introduce branches.  There are two
1360 kind of branches: backward branches and forward branches.  We'll
1361 present the calculation in a recursive and iterative form; the
1362 former only uses forward branches, while the latter uses both.
1363
1364 @example
1365 #include <stdio.h>
1366 #include <lightning.h>
1367
1368 static jit_state_t *_jit;
1369
1370 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1371
1372 int main(int argc, char *argv[])
1373 @{
1374   pifi       fib;
1375   jit_node_t *label;
1376   jit_node_t *call;
1377   jit_node_t *in;                 @rem{/* offset of the argument */}
1378   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1379   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1380
1381   init_jit(argv[0]);
1382   _jit = jit_new_state();
1383
1384   label = jit_label();
1385         jit_prolog   ();
1386   in =  jit_arg      ();
1387         jit_getarg   (JIT_V0, in);              @rem{/* R0 = n */}
1388  zero = jit_beqi     (JIT_R0, 0);
1389         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1390         jit_movi     (JIT_R0, 1);
1391   ref = jit_blei     (JIT_V0, 2);
1392         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1393         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1394         jit_prepare();
1395           jit_pushargr(JIT_V1);
1396         call = jit_finishi(NULL);
1397         jit_patch_at(call, label);
1398         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1399         jit_prepare();
1400           jit_pushargr(JIT_V2);
1401         call = jit_finishi(NULL);
1402         jit_patch_at(call, label);
1403         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1404         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1405
1406   jit_patch(ref);                               @rem{/* patch jump */}
1407   jit_patch(zero);                              @rem{/* patch jump */}
1408         jit_retr(JIT_R0);
1409
1410   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1411   fib = jit_emit();
1412   jit_clear_state();
1413   printf("fib(%d) = %d\n", 32, fib(32));
1414   jit_destroy_state();
1415   finish_jit();
1416   return 0;
1417 @}
1418 @end example
1419
1420 As said above, this is the first example of dynamically compiling
1421 branches.  Branch instructions have two operands containing the
1422 values to be compared, and return a @code{jit_note_t *} object
1423 to be patched.
1424
1425 Because labels final address are only known after calling @code{emit},
1426 it is required to call @code{patch} or @code{patch_at}, what does
1427 tell @lightning{} that the target to patch is actually a pointer to
1428 a @code{jit_node_t *} object, otherwise, it would assume that is
1429 a pointer to a C function. Note that conditional branches do not
1430 receive a label argument, so they must be patched.
1431
1432 You need to call @code{patch_at} on the return of value @code{calli},
1433 @code{finishi}, and @code{calli} if it is actually referencing a label
1434 in the jit code. All branch instructions do not receive a label
1435 argument. Note that @code{movi} is an special case, and patching it
1436 is usually done to get the final address of a label, usually to later
1437 call @code{jmpr}.
1438
1439 Now, here is the iterative version:
1440
1441 @example
1442 #include <stdio.h>
1443 #include <lightning.h>
1444
1445 static jit_state_t *_jit;
1446
1447 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1448
1449 int main(int argc, char *argv[])
1450 @{
1451   pifi       fib;
1452   jit_node_t *in;               @rem{/* offset of the argument */}
1453   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1454   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1455   jit_node_t *jump;             @rem{/* jump to start of loop */}
1456   jit_node_t *loop;             @rem{/* start of the loop */}
1457
1458   init_jit(argv[0]);
1459   _jit = jit_new_state();
1460
1461         jit_prolog   ();
1462   in =  jit_arg      ();
1463         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1464  zero = jit_beqi     (JIT_R0, 0);
1465         jit_movr     (JIT_R1, JIT_R0);
1466         jit_movi     (JIT_R0, 1);
1467   ref = jit_blti     (JIT_R1, 2);
1468         jit_subi     (JIT_R2, JIT_R2, 2);
1469         jit_movr     (JIT_R1, JIT_R0);
1470
1471   loop= jit_label();
1472         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1473         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1474         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1475         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1476   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1477   jit_patch_at(jump, loop);
1478
1479   jit_patch(ref);                               @rem{/* patch forward jump */}
1480   jit_patch(zero);                              @rem{/* patch forward jump */}
1481         jit_retr     (JIT_R0);
1482
1483   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1484   fib = jit_emit();
1485   jit_clear_state();
1486   printf("fib(%d) = %d\n", 36, fib(36));
1487   jit_destroy_state();
1488   finish_jit();
1489   return 0;
1490 @}
1491 @end example
1492
1493 This code calculates the recurrence relation using iteration (a
1494 @code{for} loop in high-level languages).  There are no function
1495 calls anymore: instead, there is a backward jump (the @code{bnei} at
1496 the end of the loop).
1497
1498 Note that the program must remember the address for backward jumps;
1499 for forward jumps it is only required to remember the jump code,
1500 and call @code{patch} for the implicit label.
1501
1502 @node Reentrancy
1503 @chapter Re-entrant usage of @lightning{}
1504
1505 @lightning{} uses the special @code{_jit} identifier. To be able
1506 to be able to use multiple jit generation states at the same
1507 time, it is required to used code similar to:
1508
1509 @example
1510     struct jit_state lightning;
1511     #define lightning _jit
1512 @end example
1513
1514 This will cause the symbol defined to @code{_jit} to be passed as
1515 the first argument to the underlying @lightning{} implementation,
1516 that is usually a function with an @code{_} (underscode) prefix
1517 and with an argument named @code{_jit}, in the pattern:
1518
1519 @example
1520     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1521     #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1522 @end example
1523
1524 The reason for this is to use the same syntax as the initial lightning
1525 implementation and to avoid needing the user to keep adding an extra
1526 argument to every call, as multiple jit states generating code in
1527 paralell should be very uncommon.
1528
1529 @node Registers
1530 @chapter Accessing the whole register file
1531
1532 As mentioned earlier in this chapter, all @lightning{} back-ends are
1533 guaranteed to have at least six general-purpose integer registers and
1534 six floating-point registers, but many back-ends will have more.
1535
1536 To access the entire register files, you can use the
1537 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1538 accept a parameter that identifies the register number, which
1539 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1540 and @code{JIT_F_NUM} respectively; the number need not be
1541 constant.  Of course, expressions like @code{JIT_R0} and
1542 @code{JIT_R(0)} denote the same register, and likewise for
1543 integer callee-saved, or floating-point, registers.
1544
1545 @section Scratch registers
1546
1547 For operations, @lightning{} does not support directly, like storing
1548 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1549 acquire and release a scratch register as in the following pattern:
1550
1551 @example
1552     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1553     jit_movi (reg, immediate);
1554     jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1555     jit_unget_reg (reg);
1556 @end example
1557
1558 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1559 reloads but don't follow branches, the code between both must be in
1560 the same basic block and must not contain any branches as in the
1561 following (bad) example.
1562
1563 @example
1564     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1565     jit_ldxi (reg, JIT_V0, offset);
1566     jump = jit_bnei (reg, V0);
1567     jit_movr (JIT_V1, reg);
1568     jit_patch (jump);
1569     jit_unget_reg (reg);
1570 @end example
1571
1572 @node Customizations
1573 @chapter Customizations
1574
1575 Frequently it is desirable to have more control over how code is
1576 generated or how memory is used during jit generation or execution.
1577
1578 @section Memory functions
1579 To aid in complete control of memory allocation and deallocation
1580 @lightning{} provides wrappers that default to standard @code{malloc},
1581 @code{realloc} and @code{free}. These are loosely based on the
1582 GNU GMP counterparts, with the difference that they use the same
1583 prototype of the system allocation functions, that is, no @code{size}
1584 for @code{free} or @code{old_size} for @code{realloc}.
1585
1586 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1587 @lightning{} guarantees that memory is only allocated or released
1588 using these wrapped functions, but you must note that if lightning
1589 was linked to GNU binutils, malloc is probably will be called multiple
1590 times from there when initializing the disassembler.
1591
1592 Because @code{init_jit} may call memory functions, if you need to call
1593 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1594 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1595 previous or default wrappers will be passed.
1596 @end deftypefun
1597
1598 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1599 Get the current memory allocation function. Also, unlike the GNU GMP
1600 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1601 @end deftypefun
1602
1603 @section Alternate code buffer
1604 To instruct @lightning{} to use an alternate code buffer it is required
1605 to call @code{jit_realize} before @code{jit_emit}, and then query states
1606 and customize as appropriate.
1607
1608 @deftypefun void jit_realize ()
1609 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1610 that no other @code{jit_xyz} call will be made.
1611 @end deftypefun
1612
1613 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1614 Returns NULL or the previous value set with @code{jit_set_code}, and
1615 sets the @var{code_size} argument to an appropriate value.
1616 If @code{jit_get_code} is called before @code{jit_emit}, the
1617 @var{code_size} argument is set to the expected amount of bytes
1618 required to generate code.
1619 If @code{jit_get_code} is called after @code{jit_emit}, the
1620 @var{code_size} argument is set to the exact amount of bytes used
1621 by the code.
1622 @end deftypefun
1623
1624 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1625 Instructs @lightning{} to output to the @var{code} argument and
1626 use @var{size} as a guard to not write to invalid memory. If during
1627 @code{jit_emit} @lightning{} finds out that the code would not fit
1628 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1629 @end deftypefun
1630
1631 A simple example of a loop using an alternate buffer is:
1632
1633 @example
1634   jit_uint8_t   *code;
1635   int           *(func)(int);      @rem{/* function pointer */}
1636   jit_word_t     code_size;
1637   jit_word_t     real_code_size;
1638   @rem{...}
1639   jit_realize();                   @rem{/* ready to generate code */}
1640   jit_get_code(&code_size);        @rem{/* get expected code size */}
1641   code_size = (code_size + 4095) & -4096;
1642   do (;;) @{
1643     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1644                 MAP_PRIVATE | MAP_ANON, -1, 0);
1645     jit_set_code(code, code_size);
1646     if ((func = jit_emit()) == NULL) @{
1647       munmap(code, code_size);
1648       code_size += 4096;
1649     @}
1650   @} while (func == NULL);
1651   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1652 @end example
1653
1654 The first call to @code{jit_get_code} should return @code{NULL} and set
1655 the @code{code_size} argument to the expected amount of bytes required
1656 to emit code.
1657 The second call to @code{jit_get_code} is after a successful call to
1658 @code{jit_emit}, and will return the value previously set with
1659 @code{jit_set_code} and set the @code{real_code_size} argument to the
1660 exact amount of bytes used to emit the code.
1661
1662 @section Alternate data buffer
1663 Sometimes it may be desirable to customize how, or to prevent
1664 @lightning{} from using an extra buffer for constants or debug
1665 annotation. Usually when also using an alternate code buffer.
1666
1667 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1668 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1669 and sets the @var{data_size} argument to how many bytes are required
1670 for the constants data buffer, and @var{note_size} to how many bytes
1671 are required to store the debug note information.
1672 Note that it always preallocate one debug note entry even if
1673 @code{jit_name} or @code{jit_note} are never called, but will return
1674 zero in the @var{data_size} argument if no constant is required;
1675 constants are only used for the @code{float} and @code{double} operations
1676 that have an immediate argument, and not in all @lightning{} ports.
1677 @end deftypefun
1678
1679 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1680
1681 @var{data} can be NULL if disabling constants and annotations, otherwise,
1682 a valid pointer must be passed. An assertion is done that the data will
1683 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1684 with @code{-DNDEBUG}).
1685
1686 @var{size} tells the space in bytes available in @var{data}.
1687
1688 @var{flags} can be zero to tell to just use the alternate data buffer,
1689 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1690
1691 @table @t
1692 @item JIT_DISABLE_DATA
1693 @cindex JIT_DISABLE_DATA
1694 Instructs @lightning{} to not use a constant table, but to use an
1695 alternate method to synthesize those, usually with a larger code
1696 sequence using stack space to transfer the value from a GPR to a
1697 FPR register.
1698
1699 @item JIT_DISABLE_NOTE
1700 @cindex JIT_DISABLE_NOTE
1701 Instructs @lightning{} to not store file or function name, and
1702 line numbers in the constant buffer.
1703 @end table
1704 @end deftypefun
1705
1706 A simple example of a preventing usage of a data buffer is:
1707
1708 @example
1709   @rem{...}
1710   jit_realize();                        @rem{/* ready to generate code */}
1711   jit_get_data(NULL, NULL);
1712   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1713   @rem{...}
1714 @end example
1715
1716 Or to only use a data buffer, if required:
1717
1718 @example
1719   jit_uint8_t   *data;
1720   jit_word_t     data_size;
1721   @rem{...}
1722   jit_realize();                        @rem{/* ready to generate code */}
1723   jit_get_data(&data_size, NULL);
1724   if (data_size)
1725     data = malloc(data_size);
1726   else
1727     data = NULL;
1728   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1729   @rem{...}
1730   if (data)
1731     free(data);
1732   @rem{...}
1733 @end example
1734
1735 @node Acknowledgements
1736 @chapter Acknowledgements
1737
1738 As far as I know, the first general-purpose portable dynamic code
1739 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1740 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1741 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1742 directly inspired @lightning{}.
1743
1744 Thanks go to Ian Piumarta, who kindly accepted to release his own
1745 program @sc{ccg} under the GNU General Public License, thereby allowing
1746 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1747 @sc{ccg} provides a way of dynamically assemble programs written in the
1748 underlying architecture's assembly language.  So it is not portable,
1749 yet very interesting.
1750
1751 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1752 was first developed as a tool to be used in GNU Smalltalk's dynamic
1753 translator from bytecodes to native code.