deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Registers::               Accessing the whole register file
  34 * Customizations::          Advanced code generation customizations
  35 * Acknowledgements::        Acknowledgements for GNU lightning
  36 @end menu
  37 @end ifnottex
  38
  39 @node Overview
  40 @chapter Introduction to @lightning{}
  41
  42 @iftex
  43 This document describes @value{TOPIC} the @lightning{} library for
  44 dynamic code generation.
  45 @end iftex
  46
  47 Dynamic code generation is the generation of machine code
  48 at runtime. It is typically used to strip a layer of interpretation
  49 by allowing compilation to occur at runtime.  One of the most
  50 well-known applications of dynamic code generation is perhaps that
  51 of interpreters that compile source code to an intermediate bytecode
  52 form, which is then recompiled to machine code at run-time: this
  53 approach effectively combines the portability of bytecode
  54 representations with the speed of machine code.  Another common
  55 application of dynamic code generation is in the field of hardware
  56 simulators and binary emulators, which can use the same techniques
  57 to translate simulated instructions to the instructions of the
  58 underlying machine.
  59
  60 Yet other applications come to mind: for example, windowing
  61 @dfn{bitblt} operations, matrix manipulations, and network packet
  62 filters.  Albeit very powerful and relatively well known within the
  63 compiler community, dynamic code generation techniques are rarely
  64 exploited to their full potential and, with the exception of the
  65 two applications described above, have remained curiosities because
  66 of their portability and functionality barriers: binary instructions
  67 are generated, so programs using dynamic code generation must be
  68 retargeted for each machine; in addition, coding a run-time code
  69 generator is a tedious and error-prone task more than a difficult one.
  70
  71 @lightning{} provides a portable, fast and easily retargetable dynamic
  72 code generation system.
  73
  74 To be portable, @lightning{} abstracts over current architectures'
  75 quirks and unorthogonalities.  The interface that it exposes to is that
  76 of a standardized RISC architecture loosely based on the SPARC and MIPS
  77 chips.  There are a few general-purpose registers (six, not including
  78 those used to receive and pass parameters between subroutines), and
  79 arithmetic operations involve three operands---either three registers
  80 or two registers and an arbitrarily sized immediate value.
  81
  82 On one hand, this architecture is general enough that it is possible to
  83 generate pretty efficient code even on CISC architectures such as the
  84 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  85 real architectures closely enough that, most of the time, the
  86 compiler's constant folding pass ends up generating code which
  87 assembles machine instructions without further tests.
  88
  89 @node Installation
  90 @chapter Configuring and installing @lightning{}
  91
  92 The first thing to do to use @lightning{} is to configure the
  93 program, picking the set of macros to be used on the host
  94 architecture; this configuration is automatically performed by
  95 the @file{configure} shell script; to run it, merely type:
  96 @example
  97      ./configure
  98 @end example
  99
 100 @lightning{} supports the @code{--enable-disassembler} option, that
 101 enables linking to GNU binutils and optionally print human readable
 102 disassembly of the jit code. This option can be disabled by the
 103 @code{--disable-disassembler} option.
 104
 105 Another option that @file{configure} accepts is
 106 @code{--enable-assertions}, which enables several consistency checks in
 107 the run-time assemblers.  These are not usually needed, so you can
 108 decide to simply forget about it; also remember that these consistency
 109 checks tend to slow down your code generator.
 110
 111 After you've configured @lightning{}, run @file{make} as usual.
 112
 113 @lightning{} has an extensive set of tests to validate it is working
 114 correctly in the build host. To test it run:
 115 @example
 116     make check
 117 @end example
 118
 119 The next important step is:
 120 @example
 121     make install
 122 @end example
 123
 124 This ends the process of installing @lightning{}.
 125
 126 @node The instruction set
 127 @chapter @lightning{}'s instruction set
 128
 129 @lightning{}'s instruction set was designed by deriving instructions
 130 that closely match those of most existing RISC architectures, or
 131 that can be easily syntesized if absent.  Each instruction is composed
 132 of:
 133 @itemize @bullet
 134 @item
 135 an operation, like @code{sub} or @code{mul}
 136
 137 @item
 138 most times, a register/immediate flag (@code{r} or @code{i})
 139
 140 @item
 141 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 142 @end itemize
 143
 144 Examples of legal mnemonics are @code{addr} (integer add, with three
 145 register operands) and @code{muli} (integer multiply, with two
 146 register operands and an immediate operand).  Each instruction takes
 147 two or three operands; in most cases, one of them can be an immediate
 148 value instead of a register.
 149
 150 Most @lightning{} integer operations are signed wordsize operations,
 151 with the exception of operations that convert types, or load or store
 152 values to/from memory. When applicable, the types and C types are as
 153 follow:
 154
 155 @example
 156      _c         @r{signed char}
 157      _uc        @r{unsigned char}
 158      _s         @r{short}
 159      _us        @r{unsigned short}
 160      _i         @r{int}
 161      _ui        @r{unsigned int}
 162      _l         @r{long}
 163      _f         @r{float}
 164      _d         @r{double}
 165 @end example
 166
 167 Most integer operations do not need a type modifier, and when loading or
 168 storing values to memory there is an alias to the proper operation
 169 using wordsize operands, that is, if ommited, the type is @r{int} on
 170 32-bit architectures and @r{long} on 64-bit architectures.  Note
 171 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 172
 173 When an unsigned operation result differs from the equivalent signed
 174 operation, there is a the @code{_u} modifier.
 175
 176 There are at least seven integer registers, of which six are
 177 general-purpose, while the last is used to contain the frame pointer
 178 (@code{FP}).  The frame pointer can be used to allocate and access local
 179 variables on the stack, using the @code{allocai} or @code{allocar}
 180 instruction.
 181
 182 Of the general-purpose registers, at least three are guaranteed to be
 183 preserved across function calls (@code{V0}, @code{V1} and
 184 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 185 @code{R2}).  Six registers are not very much, but this
 186 restriction was forced by the need to target CISC architectures
 187 which, like the x86, are poor of registers; anyway, backends can
 188 specify the actual number of available registers with the calls
 189 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 190 (for callee-save registers).
 191
 192 There are at least six floating-point registers, named @code{F0} to
 193 @code{F5}.  These are usually caller-save and are separate from the integer
 194 registers on the supported architectures; on Intel architectures,
 195 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 196 the register stack is mapped to a flat register file.  As for the
 197 integer registers, the macro @code{JIT_F_NUM} yields the number of
 198 floating-point registers.
 199
 200 The complete instruction set follows; as you can see, most non-memory
 201 operations only take integers (either signed or unsigned) as operands;
 202 this was done in order to reduce the instruction set, and because most
 203 architectures only provide word and long word operations on registers.
 204 There are instructions that allow operands to be extended to fit a larger
 205 data type, both in a signed and in an unsigned way.
 206
 207 @table @b
 208 @item Binary ALU operations
 209 These accept three operands; the last one can be an immediate.
 210 @code{addx} operations must directly follow @code{addc}, and
 211 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 212 Most, if not all, architectures do not support @r{float} or @r{double}
 213 immediate operands; lightning emulates those operations by moving the
 214 immediate to a temporary register and emiting the call with only
 215 register operands.
 216 @example
 217 addr         _f  _d  O1 = O2 + O3
 218 addi         _f  _d  O1 = O2 + O3
 219 addxr                O1 = O2 + (O3 + carry)
 220 addxi                O1 = O2 + (O3 + carry)
 221 addcr                O1 = O2 + O3, set carry
 222 addci                O1 = O2 + O3, set carry
 223 subr         _f  _d  O1 = O2 - O3
 224 subi         _f  _d  O1 = O2 - O3
 225 subxr                O1 = O2 - (O3 + carry)
 226 subxi                O1 = O2 - (O3 + carry)
 227 subcr                O1 = O2 - O3, set carry
 228 subci                O1 = O2 - O3, set carry
 229 rsbr         _f  _d  O1 = O3 - O1
 230 rsbi         _f  _d  O1 = O3 - O1
 231 mulr         _f  _d  O1 = O2 * O3
 232 muli         _f  _d  O1 = O2 * O3
 233 divr     _u  _f  _d  O1 = O2 / O3
 234 divi     _u  _f  _d  O1 = O2 / O3
 235 remr     _u          O1 = O2 % O3
 236 remi     _u          O1 = O2 % O3
 237 andr                 O1 = O2 & O3
 238 andi                 O1 = O2 & O3
 239 orr                  O1 = O2 | O3
 240 ori                  O1 = O2 | O3
 241 xorr                 O1 = O2 ^ O3
 242 xori                 O1 = O2 ^ O3
 243 lshr                 O1 = O2 << O3
 244 lshi                 O1 = O2 << O3
 245 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 246 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 247 movzr                O1 = O3 ? O1 : O2
 248 movnr                O1 = O3 ? O2 : O1
 249 @end example
 250
 251 @item Four operand binary ALU operations
 252 These accept two result registers, and two operands; the last one can
 253 be an immediate. The first two arguments cannot be the same register.
 254
 255 @code{qmul} stores the low word of the result in @code{O1} and the
 256 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 257 means there was no overflow. For signed multiplication, no overflow
 258 check is based on sign, and can be detected if @code{O2} is zero or
 259 minus one.
 260
 261 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 262 @code{O2}. It can be used as quick way to check if a division is
 263 exact, in which case the remainder is zero.
 264
 265 @example
 266 qmulr    _u       O1 O2 = O3 * O4
 267 qmuli    _u       O1 O2 = O3 * O4
 268 qdivr    _u       O1 O2 = O3 / O4
 269 qdivi    _u       O1 O2 = O3 / O4
 270 @end example
 271
 272 @item Unary ALU operations
 273 These accept two operands, both of which must be registers.
 274 @example
 275 negr         _f  _d  O1 = -O2
 276 comr                 O1 = ~O2
 277 @end example
 278
 279 These unary ALU operations are only defined for float operands.
 280 @example
 281 absr         _f  _d  O1 = fabs(O2)
 282 sqrtr                O1 = sqrt(O2)
 283 @end example
 284
 285 Besides requiring the @code{r} modifier, there are no unary operations
 286 with an immediate operand.
 287
 288 @item Compare instructions
 289 These accept three operands; again, the last can be an immediate.
 290 The last two operands are compared, and the first operand, that must be
 291 an integer register, is set to either 0 or 1, according to whether the
 292 given condition was met or not.
 293
 294 The conditions given below are for the standard behavior of C,
 295 where the ``unordered'' comparison result is mapped to false.
 296
 297 @example
 298 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 299 lti       _u  _f  _d  O1 =  (O2 <  O3)
 300 ler       _u  _f  _d  O1 =  (O2 <= O3)
 301 lei       _u  _f  _d  O1 =  (O2 <= O3)
 302 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 303 gti       _u  _f  _d  O1 =  (O2 >  O3)
 304 ger       _u  _f  _d  O1 =  (O2 >= O3)
 305 gei       _u  _f  _d  O1 =  (O2 >= O3)
 306 eqr           _f  _d  O1 =  (O2 == O3)
 307 eqi           _f  _d  O1 =  (O2 == O3)
 308 ner           _f  _d  O1 =  (O2 != O3)
 309 nei           _f  _d  O1 =  (O2 != O3)
 310 unltr         _f  _d  O1 = !(O2 >= O3)
 311 unler         _f  _d  O1 = !(O2 >  O3)
 312 ungtr         _f  _d  O1 = !(O2 <= O3)
 313 unger         _f  _d  O1 = !(O2 <  O3)
 314 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 315 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 316 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 317 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 318 @end example
 319
 320 @item Transfer operations
 321 These accept two operands; for @code{ext} both of them must be
 322 registers, while @code{mov} accepts an immediate value as the second
 323 operand.
 324
 325 Unlike @code{movr} and @code{movi}, the other instructions are used
 326 to truncate a wordsize operand to a smaller integer data type or to
 327 convert float data types. You can also use @code{extr} to convert an
 328 integer to a floating point value: the usual options are @code{extr_f}
 329 and @code{extr_d}.
 330
 331 @example
 332 movr                                 _f  _d  O1 = O2
 333 movi                                 _f  _d  O1 = O2
 334 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 335 truncr                               _f  _d  O1 = trunc(O2)
 336 @end example
 337
 338 In 64-bit architectures it may be required to use @code{truncr_f_i},
 339 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 340 the equivalent C code.  Only the @code{_i} modifier is available in
 341 32-bit architectures.
 342
 343 @example
 344 truncr_f_i    = <int> O1 = <float> O2
 345 truncr_f_l    = <long>O1 = <float> O2
 346 truncr_d_i    = <int> O1 = <double>O2
 347 truncr_d_l    = <long>O1 = <double>O2
 348 @end example
 349
 350 The float conversion operations are @emph{destination first,
 351 source second}, but the order of the types is reversed.  This happens
 352 for historical reasons.
 353
 354 @example
 355 extr_f_d    = <double>O1 = <float> O2
 356 extr_d_f    = <float> O1 = <double>O2
 357 @end example
 358
 359 @item Network extensions
 360 These accept two operands, both of which must be registers; these
 361 two instructions actually perform the same task, yet they are
 362 assigned to two mnemonics for the sake of convenience and
 363 completeness.  As usual, the first operand is the destination and
 364 the second is the source.
 365 The @code{_ul} variant is only available in 64-bit architectures.
 366 @example
 367 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 368 ntohr    _us _ui _ul @r{Network-to-host order }
 369 @end example
 370
 371 @item Load operations
 372 @code{ld} accepts two operands while @code{ldx} accepts three;
 373 in both cases, the last can be either a register or an immediate
 374 value. Values are extended (with or without sign, according to
 375 the data type specification) to fit a whole register.
 376 The @code{_ui} and @code{_l} types are only available in 64-bit
 377 architectures.  For convenience, there is a version without a
 378 type modifier for integer or pointer operands that uses the
 379 appropriate wordsize call.
 380 @example
 381 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 382 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 383 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 384 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 385 @end example
 386
 387 @item Store operations
 388 @code{st} accepts two operands while @code{stx} accepts three; in
 389 both cases, the first can be either a register or an immediate
 390 value. Values are sign-extended to fit a whole register.
 391 @example
 392 str     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 393 sti     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 394 stxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 395 stxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 396 @end example
 397 As for the load operations, the @code{_ui} and @code{_l} types are
 398 only available in 64-bit architectures, and for convenience, there
 399 is a version without a type modifier for integer or pointer operands
 400 that uses the appropriate wordsize call.
 401
 402 @item Argument management
 403 These are:
 404 @example
 405 prepare     (not specified)
 406 va_start    (not specified)
 407 pushargr                                   _f  _d
 408 pushargi                                   _f  _d
 409 va_push     (not specified)
 410 arg                                        _f  _d
 411 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 412 va_arg                                         _d
 413 putargr                                    _f  _d
 414 putargi                                    _f  _d
 415 ret         (not specified)
 416 retr                                       _f  _d
 417 reti                                       _f  _d
 418 va_end      (not specified)
 419 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 420 epilog      (not specified)
 421 @end example
 422 As with other operations that use a type modifier, the @code{_ui} and
 423 @code{_l} types are only available in 64-bit architectures, but there
 424 are operations without a type modifier that alias to the appropriate
 425 integer operation with wordsize operands.
 426
 427 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 428 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 429 A code snippet that wants to call another procedure and has to pass
 430 arguments must, in order: use the @code{prepare} instruction and use
 431 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 432 left to right order}; and use @code{finish} or @code{call} (explained below)
 433 to perform the actual call.
 434
 435 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 436 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 437 @code{va_push} is required when passing a @code{va_list} to another function,
 438 because not all architectures expect it as a single pointer. Known case
 439 is DEC Alpha, that requires it as a structure passed by value.
 440
 441 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 442 @code{arg} is different from other instruction in that it does not
 443 actually generate any code: instead, it is a function which returns
 444 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 445 a value'' means that @lightning{} code that compile these
 446 instructions return a value when expanded.} You should call
 447 @code{arg} as soon as possible, before any function call or, more
 448 easily, right after the @code{prolog} instructions
 449 (which is treated later).
 450
 451 @code{getarg} accepts a register argument and a value returned by
 452 @code{arg}, and will move that argument to the register, extending
 453 it (with or without sign, according to the data type specification)
 454 to fit a whole register.  These instructions are more intimately
 455 related to the usage of the @lightning{} instruction set in code
 456 that generates other code, so they will be treated more
 457 specifically in @ref{GNU lightning examples, , Generating code at
 458 run-time}.
 459
 460 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 461 it accepts as first argument a register or immediate, and as
 462 second argument a value returned by @code{arg}. It allows changing,
 463 or restoring an argument to the current function, and is a
 464 construct required to implement tail call optimization. Note that
 465 arguments in registers are very cheap, but will be overwritten
 466 at any moment, including on some operations, for example division,
 467 that on several ports is implemented as a function call.
 468
 469 Finally, the @code{retval} instruction fetches the return value of a
 470 called function in a register.  The @code{retval} instruction takes a
 471 register argument and copies the return value of the previously called
 472 function in that register.  A function with a return value should use
 473 @code{retr} or @code{reti} to put the return value in the return register
 474 before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 475
 476 @code{epilog} is an optional call, that marks the end of a function
 477 body. It is automatically generated by @lightning{} if starting a new
 478 function (what should be done after a @code{ret} call) or finishing
 479 generating jit.
 480 It is very important to note that the fact that @code{epilog} being
 481 optional may cause a common mistake. Consider this:
 482 @example
 483 fun1:
 484     prolog
 485     ...
 486     ret
 487 fun2:
 488     prolog
 489 @end example
 490 Because @code{epilog} is added when finding a new @code{prolog},
 491 this will cause the @code{fun2} label to actually be before the
 492 return from @code{fun1}. Because @lightning{} will actually
 493 understand it as:
 494 @example
 495 fun1:
 496     prolog
 497     ...
 498     ret
 499 fun2:
 500     epilog
 501     prolog
 502 @end example
 503
 504 You should observe a few rules when using these macros.  First of
 505 all, if calling a varargs function, you should use the @code{ellipsis}
 506 call to mark the position of the ellipsis in the C prototype.
 507
 508 You should not nest calls to @code{prepare} inside a
 509 @code{prepare/finish} block.  Doing this will result in undefined
 510 behavior. Note that for functions with zero arguments you can use
 511 just @code{call}.
 512
 513 @item Branch instructions
 514 Like @code{arg}, these also return a value which, in this case,
 515 is to be used to compile forward branches as explained in
 516 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 517 compared; of these, the last can be either a register or an immediate.
 518 They are:
 519 @example
 520 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 521 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 522 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 523 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 524 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 525 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 526 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 527 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 528 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 529 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 530 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 531 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 532
 533 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 534 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 535 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 536 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 537 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 538 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 539 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
 540 bunordr       _f  _d  @r{if }!(O2 != O2) ||  (O3 != O3)@r{ goto }O1
 541
 542 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 543 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 544 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 545 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 546 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 547 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 548 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 549 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 550 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 551 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 552 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 553 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 554 @end example
 555
 556 @item Jump and return operations
 557 These accept one argument except @code{ret} and @code{jmpi} which
 558 have none; the difference between @code{finishi} and @code{calli}
 559 is that the latter does not clean the stack from pushed parameters
 560 (if any) and the former must @strong{always} follow a @code{prepare}
 561 instruction.
 562 @example
 563 callr     (not specified)                @r{function call to register O1}
 564 calli     (not specified)                @r{function call to immediate O1}
 565 finishr   (not specified)                @r{function call to register O1}
 566 finishi   (not specified)                @r{function call to immediate O1}
 567 jmpr      (not specified)                @r{unconditional jump to register}
 568 jmpi      (not specified)                @r{unconditional jump}
 569 ret       (not specified)                @r{return from subroutine}
 570 retr      _c _uc _s _us _i _ui _l _f _d
 571 reti      _c _uc _s _us _i _ui _l _f _d
 572 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 573                                          @r{to register}
 574 @end example
 575
 576 Like branch instruction, @code{jmpi} also returns a value which is to
 577 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
 578 numbers}.
 579
 580 @item Labels
 581 There are 3 @lightning{} instructions to create labels:
 582 @example
 583 label     (not specified)                @r{simple label}
 584 forward   (not specified)                @r{forward label}
 585 indirect  (not specified)                @r{special simple label}
 586 @end example
 587
 588 @code{label} is normally used as @code{patch_at} argument for backward
 589 jumps.
 590
 591 @example
 592         jit_node_t *jump, *label;
 593 label = jit_label();
 594         ...
 595         jump = jit_beqr(JIT_R0, JIT_R1);
 596         jit_patch_at(jump, label);
 597 @end example
 598
 599 @code{forward} is used to patch code generation before the actual
 600 position of the label is known.
 601
 602 @example
 603         jit_node_t *jump, *label;
 604 label = jit_forward();
 605         jump = jit_beqr(JIT_R0, JIT_R1);
 606         jit_patch_at(jump, label);
 607         ...
 608         jit_link(label);
 609 @end example
 610
 611 @code{indirect} is useful when creating jump tables, and tells
 612 @lightning{} to not optimize out a label that is not the target of
 613 any jump, because an indirect jump may land where it is defined.
 614
 615 @example
 616         jit_node_t *jump, *label;
 617         ...
 618         jmpr(JIT_R0);                    @rem{/* may jump to label */}
 619         ...
 620 label = jit_indirect();
 621 @end example
 622
 623 @code{indirect} is an special case of @code{note} and @code{name}
 624 because it is a valid argument to @code{address}.
 625
 626 Note that the usual idiom to write the previous example is
 627 @example
 628         jit_node_t *addr, *jump;
 629 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 630         ...
 631         jmpr(JIT_R0);
 632         ...
 633         jit_patch(addr);                 @rem{/* implicit label added */}
 634 @end example
 635
 636 that automatically binds the implicit label added by @code{patch} with
 637 the @code{movi}, but on some special conditions it is required to create
 638 an "unbound" label.
 639
 640 @item Function prolog
 641
 642 These macros are used to set up a function prolog.  The @code{allocai}
 643 call accept a single integer argument and returns an offset value
 644 for stack storage access.  The @code{allocar} accepts two registers
 645 arguments, the first is set to the offset for stack access, and the
 646 second is the size in bytes argument.
 647
 648 @example
 649 prolog    (not specified)                @r{function prolog}
 650 allocai   (not specified)                @r{reserve space on the stack}
 651 allocar   (not specified)                @r{allocate space on the stack}
 652 @end example
 653
 654 @code{allocai} receives the number of bytes to allocate and returns
 655 the offset from the frame pointer register @code{FP} to the base of
 656 the area.
 657
 658 @code{allocar} receives two register arguments.  The first is where
 659 to store the offset from the frame pointer register @code{FP} to the
 660 base of the area.  The second argument is the size in bytes.  Note
 661 that @code{allocar} is dynamic allocation, and special attention
 662 should be taken when using it.  If called in a loop, every iteration
 663 will allocate stack space.  Stack space is aligned from 8 to 64 bytes
 664 depending on backend requirements, even if allocating only one byte.
 665 It is advisable to not use it with @code{frame} and @code{tramp}; it
 666 should work with @code{frame} with special care to call only once,
 667 but is not supported if used in @code{tramp}, even if called only
 668 once.
 669
 670 As a small appetizer, here is a small function that adds 1 to the input
 671 parameter (an @code{int}).  I'm using an assembly-like syntax here which
 672 is a bit different from the one used when writing real subroutines with
 673 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
 674 examples, , Generating code at run-time}.
 675
 676 @example
 677 incr:
 678      prolog
 679 in = arg                     @rem{! We have an integer argument}
 680      getarg    R0, in        @rem{! Move it to R0}
 681      addi      R0, R0, 1     @rem{! Add 1}
 682      retr      R0            @rem{! And return the result}
 683 @end example
 684
 685 And here is another function which uses the @code{printf} function from
 686 the standard C library to write a number in hexadecimal notation:
 687
 688 @example
 689 printhex:
 690      prolog
 691 in = arg                     @rem{! Same as above}
 692      getarg    R0, in
 693      prepare                 @rem{! Begin call sequence for printf}
 694      pushargi  "%x"          @rem{! Push format string}
 695      ellipsis                @rem{! Varargs start here}
 696      pushargr  R0            @rem{! Push second argument}
 697      finishi   printf        @rem{! Call printf}
 698      ret                     @rem{! Return to caller}
 699 @end example
 700
 701 @item Register liveness
 702
 703 During code generation, @lightning{} occasionally needs scratch registers
 704 or needs to use architecture-defined registers.  For that, @lightning{}
 705 internally maintains register liveness information.
 706
 707 In the following example, @code{qdivr} will need special registers like
 708 @code{R0} on some architectures.  As @lightning{} understands that
 709 @code{R0} is used in the subsequent instruction, it will create
 710 save/restore code for @code{R0} in case.
 711
 712 @example
 713 ...
 714 qdivr V0, V1, V2, V3
 715 movr  V3, R0
 716 ...
 717 @end example
 718
 719 The same is not true in the example that follows.  Here, @code{R0} is
 720 not alive after the division operation because @code{R0} is neither an
 721 argument register nor a callee-save register.  Thus, no save/restore
 722 code for @code{R0} will be created in case.
 723
 724 @example
 725 ...
 726 qdivr V0, V1, V2, V3
 727 jmpr  R1
 728 ...
 729 @end example
 730
 731 The @code{live} instruction can be used to mark a register as live after
 732 it as in the following example.  Here, @code{R0} will be preserved
 733 across the division.
 734
 735 @example
 736 ...
 737 qdivr V0, V1, V2, V3
 738 live R0
 739 jmpr R1
 740 ...
 741 @end example
 742
 743 The @code{live} instruction is useful at code entry and exit points,
 744 like after and before a @code{callr} instruction.
 745
 746 @item Trampolines, continuations and tail call optimization
 747
 748 Frequently it is required to generate jit code that must jump to
 749 code generated later, possibly from another @code{jit_context_t}.
 750 These require compatible stack frames.
 751
 752 @lightning{} provides two primitives from where trampolines,
 753 continuations and tail call optimization can be implemented.
 754
 755 @example
 756 frame   (not specified)                  @r{create stack frame}
 757 tramp   (not specified)                  @r{assume stack frame}
 758 @end example
 759
 760 @code{frame} receives an integer argument@footnote{It is not
 761 automatically computed because it does not know about the
 762 requirement of later generated code.} that defines the size in
 763 bytes for the stack frame of the current, @code{C} callable,
 764 jit function. To calculate this value, a good formula is maximum
 765 number of arguments to any called native function times
 766 eight@footnote{Times eight so that it works for double arguments.
 767 And would not need conditionals for ports that pass arguments in
 768 the stack.}, plus the sum of the arguments to any call to
 769 @code{jit_allocai}. @lightning{} automatically adjusts this value
 770 for any backend specific stack memory it may need, or any
 771 alignment constraint.
 772
 773 @code{frame} also instructs @lightning{} to save all callee
 774 save registers in the prolog and reload in the epilog.
 775
 776 @example
 777 main:                        @rem{! jit entry point}
 778      prolog                  @rem{! function prolog}
 779      frame  256              @rem{! save all callee save registers and}
 780                              @rem{! reserve at least 256 bytes in stack}
 781 main_loop:
 782      ...
 783      jmpi   handler          @rem{! jumps to external code}
 784      ...
 785      ret                     @rem{! return to the caller}
 786 @end example
 787
 788 @code{tramp} differs from @code{frame} only that a prolog and epilog
 789 will not be generated. Note that @code{prolog} must still be used.
 790 The code under @code{tramp} must be ready to be entered with a jump
 791 at the prolog position, and instead of a return, it must end with
 792 a non conditional jump. @code{tramp} exists solely for the fact
 793 that it allows optimizing out prolog and epilog code that would
 794 never be executed.
 795
 796 @example
 797 handler:                     @rem{! handler entry point}
 798      prolog                  @rem{! function prolog}
 799      tramp  256              @rem{! assumes all callee save registers}
 800                              @rem{! are saved and there is at least}
 801                              @rem{! 256 bytes in stack}
 802      ...
 803      jmpi   main_loop        @rem{! return to the main loop}
 804 @end example
 805
 806 @lightning{} only supports Tail Call Optimization using the
 807 @code{tramp} construct. Any other way is not guaranteed to
 808 work on all ports.
 809
 810 An example of a simple (recursive) tail call optimization:
 811
 812 @example
 813 factorial:                   @rem{! Entry point of the factorial function}
 814      prolog
 815 in = arg                     @rem{! Receive an integer argument}
 816      getarg R0, in           @rem{! Move argument to RO}
 817      prepare
 818          pushargi 1          @rem{! This is the accumulator}
 819          pushargr R0         @rem{! This is the argument}
 820      finishi fact            @rem{! Call the tail call optimized function}
 821      retval R0               @rem{! Fetch the result}
 822      retr R0                 @rem{! Return it}
 823      epilog                  @rem{! Epilog *before* label before prolog}
 824
 825 fact:                        @rem{! Entry point of the helper function}
 826      prolog
 827      frame 16                @rem{! Reserve 16 bytes in the stack}
 828 fact_entry:                  @rem{! This is the tail call entry point}
 829 ac = arg                     @rem{! The accumulator is the first argument}
 830 in = arg                     @rem{! The factorial argument}
 831      getarg R0, ac           @rem{! Move the accumulator to R0}
 832      getarg R1, in           @rem{! Move the argument to R1}
 833      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
 834      mulr R0, R0, R1         @rem{! accumulator *= argument}
 835      putargr R0, ac          @rem{! Update the accumulator}
 836      subi R1, R1, 1          @rem{! argument -= 1}
 837      putargr R1, in          @rem{! Update the argument}
 838      jmpi fact_entry         @rem{! Tail Call Optimize it!}
 839 fact_out:
 840      retr R0                 @rem{! Return the accumulator}
 841 @end example
 842
 843 @item Predicates
 844 @example
 845 forward_p      (not specified)           @r{forward label predicate}
 846 indirect_p     (not specified)           @r{indirect label predicate}
 847 target_p       (not specified)           @r{used label predicate}
 848 arg_register_p (not specified)           @r{argument kind predicate}
 849 callee_save_p  (not specified)           @r{callee save predicate}
 850 pointer_p      (not specified)           @r{pointer predicate}
 851 @end example
 852
 853 @code{forward_p} expects a @code{jit_node_t*} argument, and
 854 returns non zero if it is a forward label reference, that is,
 855 a label returned by @code{forward}, that still needs a
 856 @code{link} call.
 857
 858 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
 859 non zero if it is an indirect label reference, that is, a label that
 860 was returned by @code{indirect}.
 861
 862 @code{target_p} expects a @code{jit_node_t*} argument, that is any
 863 kind of label, and will return non zero if there is at least one
 864 jump or move referencing it.
 865
 866 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
 867 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
 868 will return non zero if the argument lives in a register. This call
 869 is useful to know the live range of register arguments, as those
 870 are very fast to read and write, but have volatile values.
 871
 872 @code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
 873 @code{JIT_Fn}, and will return non zero if the register is callee
 874 save. This call is useful because on several ports, the @code{JIT_Rn}
 875 and @code{JIT_Fn} registers are actually callee save; no need
 876 to save and load the values when making function calls.
 877
 878 @code{pointer_p} expects a pointer argument, and will return non
 879 zero if the pointer is inside the generated jit code. Must be
 880 called after @code{jit_emit} and before @code{jit_destroy_state}.
 881 @end table
 882
 883 @node GNU lightning examples
 884 @chapter Generating code at run-time
 885
 886 To use @lightning{}, you should include the @file{lightning.h} file that
 887 is put in your include directory by the @samp{make install} command.
 888
 889 Each of the instructions above translates to a macro or function call.
 890 All you have to do is prepend @code{jit_} (lowercase) to opcode names
 891 and @code{JIT_} (uppercase) to register names.  Of course, parameters
 892 are to be put between parentheses.
 893
 894 This small tutorial presents three examples:
 895
 896 @iftex
 897 @itemize @bullet
 898 @item
 899 The @code{incr} function found in @ref{The instruction set, ,
 900 @lightning{}'s instruction set}:
 901
 902 @item
 903 A simple function call to @code{printf}
 904
 905 @item
 906 An RPN calculator.
 907
 908 @item
 909 Fibonacci numbers
 910 @end itemize
 911 @end iftex
 912 @ifnottex
 913 @menu
 914 * incr::             A function which increments a number by one
 915 * printf::           A simple function call to printf
 916 * RPN calculator::   A more complex example, an RPN calculator
 917 * Fibonacci::        Calculating Fibonacci numbers
 918 @end menu
 919 @end ifnottex
 920
 921 @node incr
 922 @section A function which increments a number by one
 923
 924 Let's see how to create and use the sample @code{incr} function created
 925 in @ref{The instruction set, , @lightning{}'s instruction set}:
 926
 927 @example
 928 #include <stdio.h>
 929 #include <lightning.h>
 930
 931 static jit_state_t *_jit;
 932
 933 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
 934
 935 int main(int argc, char *argv[])
 936 @{
 937   jit_node_t  *in;
 938   pifi         incr;
 939
 940   init_jit(argv[0]);
 941   _jit = jit_new_state();
 942
 943   jit_prolog();                    @rem{/* @t{     prolog             } */}
 944   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
 945   jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0          } */}
 946   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
 947   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
 948
 949   incr = jit_emit();
 950   jit_clear_state();
 951
 952   @rem{/* call the generated code@comma{} passing 5 as an argument */}
 953   printf("%d + 1 = %d\n", 5, incr(5));
 954
 955   jit_destroy_state();
 956   finish_jit();
 957   return 0;
 958 @}
 959 @end example
 960
 961 Let's examine the code line by line (well, almost@dots{}):
 962
 963 @table @t
 964 @item #include <lightning.h>
 965 You already know about this.  It defines all of @lightning{}'s macros.
 966
 967 @item static jit_state_t *_jit;
 968 You might wonder about what is @code{jit_state_t}.  It is a structure
 969 that stores jit code generation information.  The name @code{_jit} is
 970 special, because since multiple jit generators can run at the same
 971 time, you must either @r{#define _jit my_jit_state} or name it
 972 @code{_jit}.
 973
 974 @item typedef int (*pifi)(int);
 975 Just a handy typedef for a pointer to a function that takes an
 976 @code{int} and returns another.
 977
 978 @item jit_node_t  *in;
 979 Declares a variable to hold an identifier for a function argument. It
 980 is an opaque pointer, that will hold the return of a call to @code{arg}
 981 and be used as argument to @code{getarg}.
 982
 983 @item pifi         incr;
 984 Declares a function pointer variable to a function that receives an
 985 @code{int} and returns an @code{int}.
 986
 987 @item init_jit(argv[0]);
 988 You must call this function before creating a @code{jit_state_t}
 989 object. This function does global state initialization, and may need
 990 to detect CPU or Operating System features.  It receives a string
 991 argument that is later used to read symbols from a shared object using
 992 GNU binutils if disassembly was enabled at configure time. If no
 993 disassembly will be performed a NULL pointer can be used as argument.
 994
 995 @item _jit = jit_new_state();
 996 This call initializes a @lightning{} jit state.
 997
 998 @item jit_prolog();
 999 Ok, so we start generating code for our beloved function@dots{}
1000
1001 @item in = jit_arg();
1002 @itemx jit_getarg(JIT_R0, in);
1003 We retrieve the first (and only) argument, an integer, and store it
1004 into the general-purpose register @code{R0}.
1005
1006 @item jit_addi(JIT_R0, JIT_R0, 1);
1007 We add one to the content of the register.
1008
1009 @item jit_retr(JIT_R0);
1010 This instruction generates a standard function epilog that returns
1011 the contents of the @code{R0} register.
1012
1013 @item incr = jit_emit();
1014 This instruction is very important.  It actually translates the
1015 @lightning{} macros used before to machine code, flushes the generated
1016 code area out of the processor's instruction cache and return a
1017 pointer to the start of the code.
1018
1019 @item jit_clear_state();
1020 This call cleanups any data not required for jit execution. Note
1021 that it must be called after any call to @code{jit_print} or
1022 @code{jit_address}, as this call destroy the @lightning{}
1023 intermediate representation.
1024
1025 @item printf("%d + 1 = %d", 5, incr(5));
1026 Calling our function is this simple---it is not distinguishable from
1027 a normal C function call, the only difference being that @code{incr}
1028 is a variable.
1029
1030 @item jit_destroy_state();
1031 Releases all memory associated with the jit context. It should be
1032 called after known the jit will no longer be called.
1033
1034 @item finish_jit();
1035 This call cleanups any global state hold by @lightning{}, and is
1036 advisable to call it once jit code will no longer be generated.
1037 @end table
1038
1039 @lightning{} abstracts two phases of dynamic code generation: selecting
1040 instructions that map the standard representation, and emitting binary
1041 code for these instructions.  The client program has the responsibility
1042 of describing the code to be generated using the standard @lightning{}
1043 instruction set.
1044
1045 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1046 architecture (on the right is the code that an assembly-language
1047 programmer would write):
1048
1049 @table @b
1050 @item SPARC
1051 @example
1052       save  %sp, -112, %sp
1053       mov  %i0, %g2                 retl
1054       inc  %g2                      inc %o0
1055       mov  %g2, %i0
1056       restore
1057       retl
1058       nop
1059 @end example
1060 In this case, @lightning{} introduces overhead to create a register
1061 window (not knowing that the procedure is a leaf procedure) and to
1062 move the argument to the general purpose register @code{R0} (which
1063 maps to @code{%g2} on the SPARC).
1064 @end table
1065
1066 @table @b
1067 @item x86_64
1068 @example
1069     sub   $0x30,%rsp
1070     mov   %rbp,(%rsp)
1071     mov   %rsp,%rbp
1072     sub   $0x18,%rsp
1073     mov   %rdi,%rax            mov %rdi, %rax
1074     add   $0x1,%rax            inc %rax
1075     mov   %rbp,%rsp
1076     mov   (%rsp),%rbp
1077     add   $0x30,%rsp
1078     retq                       retq
1079 @end example
1080 In this case, the main overhead is due to the function's prolog and
1081 epilog, and stack alignment after reserving stack space for word
1082 to/from float conversions or moving data from/to x87 to/from SSE.
1083 Note that besides allocating space to save callee saved registers,
1084 no registers are saved/restored because @lightning{} notices those
1085 registers are not modified. There is currently no logic to detect
1086 if it needs to allocate stack space for type conversions neither
1087 proper leaf function detection, but these are subject to change
1088 (FIXME).
1089 @end table
1090
1091 @node printf
1092 @section A simple function call to @code{printf}
1093
1094 Again, here is the code for the example:
1095
1096 @example
1097 #include <stdio.h>
1098 #include <lightning.h>
1099
1100 static jit_state_t *_jit;
1101
1102 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1103
1104 int main(int argc, char *argv[])
1105 @{
1106   pvfi          myFunction;             @rem{/* ptr to generated code */}
1107   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1108   jit_node_t    *in;                    @rem{/* to get the argument */}
1109
1110   init_jit(argv[0]);
1111   _jit = jit_new_state();
1112
1113   start = jit_note(__FILE__, __LINE__);
1114   jit_prolog();
1115   in = jit_arg();
1116   jit_getarg(JIT_R1, in);
1117   jit_prepare();
1118   jit_pushargi((jit_word_t)"generated %d bytes\n");
1119   jit_ellipsis();
1120   jit_pushargr(JIT_R1);
1121   jit_finishi(printf);
1122   jit_ret();
1123   jit_epilog();
1124   end = jit_note(__FILE__, __LINE__);
1125
1126   myFunction = jit_emit();
1127
1128   @rem{/* call the generated code@comma{} passing its size as argument */}
1129   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1130   jit_clear_state();
1131
1132   jit_disassemble();
1133
1134   jit_destroy_state();
1135   finish_jit();
1136   return 0;
1137 @}
1138 @end example
1139
1140 The function shows how many bytes were generated.  Most of the code
1141 is not very interesting, as it resembles very closely the program
1142 presented in @ref{incr, , A function which increments a number by one}.
1143
1144 For this reason, we're going to concentrate on just a few statements.
1145
1146 @table @t
1147 @item start = jit_note(__FILE__, __LINE__);
1148 @itemx @r{@dots{}}
1149 @itemx end = jit_note(__FILE__, __LINE__);
1150 These two instruction call the @code{jit_note} macro, which creates
1151 a note in the jit code; arguments to @code{jit_note} usually are a
1152 filename string and line number integer, but using NULL for the
1153 string argument is perfectly valid if only need to create a simple
1154 marker in the code.
1155
1156 @item jit_ellipsis();
1157 @code{ellipsis} usually is only required if calling varargs functions
1158 with double arguments, but it is a good practice to properly describe
1159 the @r{@dots{}} in the call sequence.
1160
1161 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1162 Note the use of the @code{(jit_word_t)} cast, that is used only
1163 to avoid a compiler warning, due to using a pointer where a
1164 wordsize integer type was expected.
1165
1166 @item jit_prepare();
1167 @itemx @r{@dots{}}
1168 @itemx jit_finishi(printf);
1169 Once the arguments to @code{printf} have been pushed, what means
1170 moving them to stack or register arguments, the @code{printf}
1171 function is called and the stack cleaned.  Note how @lightning{}
1172 abstracts the differences between different architectures and
1173 ABI's -- the client program does not know how parameter passing
1174 works on the host architecture.
1175
1176 @item jit_epilog();
1177 Usually it is not required to call @code{epilog}, but because it
1178 is implicitly called when noticing the end of a function, if the
1179 @code{end} variable was set with a @code{note} call after the
1180 @code{ret}, it would not consider the function epilog.
1181
1182 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1183 This calls the generate jit function passing as argument the offset
1184 difference from the @code{start} and @code{end} notes. The @code{address}
1185 call must be done after the @code{emit} call or either a fatal error
1186 will happen (if @lightning{} is built with assertions enable) or an
1187 undefined value will be returned.
1188
1189 @item jit_clear_state();
1190 Note that @code{jit_clear_state} was called after executing jit in
1191 this example. It was done because it must be called after any call
1192 to @code{jit_address} or @code{jit_print}.
1193
1194 @item jit_disassemble();
1195 @code{disassemble} will dump the generated code to standard output,
1196 unless @lightning{} was built with the disassembler disabled, in which
1197 case no output will be shown.
1198 @end table
1199
1200 @node RPN calculator
1201 @section A more complex example, an RPN calculator
1202
1203 We create a small stack-based RPN calculator which applies a series
1204 of operators to a given parameter and to other numeric operands.
1205 Unlike previous examples, the code generator is fully parameterized
1206 and is able to compile different formulas to different functions.
1207 Here is the code for the expression compiler; a sample usage will
1208 follow.
1209
1210 Since @lightning{} does not provide push/pop instruction, this
1211 example uses a stack-allocated area to store the data.  Such an
1212 area can be allocated using the macro @code{allocai}, which
1213 receives the number of bytes to allocate and returns the offset
1214 from the frame pointer register @code{FP} to the base of the
1215 area.
1216
1217 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1218 to access stack-allocated variables.  However, it is possible to
1219 use operations such as @code{add} to compute the address of the
1220 variables, and pass the address around.
1221
1222 @example
1223 #include <stdio.h>
1224 #include <lightning.h>
1225
1226 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1227
1228 static jit_state_t *_jit;
1229
1230 void stack_push(int reg, int *sp)
1231 @{
1232   jit_stxi_i (*sp, JIT_FP, reg);
1233   *sp += sizeof (int);
1234 @}
1235
1236 void stack_pop(int reg, int *sp)
1237 @{
1238   *sp -= sizeof (int);
1239   jit_ldxi_i (reg, JIT_FP, *sp);
1240 @}
1241
1242 jit_node_t *compile_rpn(char *expr)
1243 @{
1244   jit_node_t *in, *fn;
1245   int stack_base, stack_ptr;
1246
1247   fn = jit_note(NULL, 0);
1248   jit_prolog();
1249   in = jit_arg();
1250   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1251
1252   jit_getarg_i(JIT_R2, in);
1253
1254   while (*expr) @{
1255     char buf[32];
1256     int n;
1257     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1258       expr += n - 1;
1259       stack_push(JIT_R0, &stack_ptr);
1260       jit_movi(JIT_R0, atoi(buf));
1261     @} else if (*expr == 'x') @{
1262       stack_push(JIT_R0, &stack_ptr);
1263       jit_movr(JIT_R0, JIT_R2);
1264     @} else if (*expr == '+') @{
1265       stack_pop(JIT_R1, &stack_ptr);
1266       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1267     @} else if (*expr == '-') @{
1268       stack_pop(JIT_R1, &stack_ptr);
1269       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1270     @} else if (*expr == '*') @{
1271       stack_pop(JIT_R1, &stack_ptr);
1272       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1273     @} else if (*expr == '/') @{
1274       stack_pop(JIT_R1, &stack_ptr);
1275       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1276     @} else @{
1277       fprintf(stderr, "cannot compile: %s\n", expr);
1278       abort();
1279     @}
1280     ++expr;
1281   @}
1282   jit_retr(JIT_R0);
1283   jit_epilog();
1284   return fn;
1285 @}
1286 @end example
1287
1288 The principle on which the calculator is based is easy: the stack top
1289 is held in R0, while the remaining items of the stack are held in the
1290 memory area that we allocate with @code{allocai}.  Compiling a numeric
1291 operand or the argument @code{x} pushes the old stack top onto the
1292 stack and moves the operand into R0; compiling an operator pops the
1293 second operand off the stack into R1, and compiles the operation so
1294 that the result goes into R0, thus becoming the new stack top.
1295
1296 This example allocates a fixed area for 32 @code{int}s.  This is not
1297 a problem when the function is a leaf like in this case; in a full-blown
1298 compiler you will want to analyze the input and determine the number
1299 of needed stack slots---a very simple example of register allocation.
1300 The area is then managed like a stack using @code{stack_push} and
1301 @code{stack_pop}.
1302
1303 Source code for the client (which lies in the same source file) follows:
1304
1305 @example
1306 int main(int argc, char *argv[])
1307 @{
1308   jit_node_t *nc, *nf;
1309   pifi c2f, f2c;
1310   int i;
1311
1312   init_jit(argv[0]);
1313   _jit = jit_new_state();
1314
1315   nc = compile_rpn("32x9*5/+");
1316   nf = compile_rpn("x32-5*9/");
1317   (void)jit_emit();
1318   c2f = (pifi)jit_address(nc);
1319   f2c = (pifi)jit_address(nf);
1320   jit_clear_state();
1321
1322   printf("\nC:");
1323   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1324   printf("\nF:");
1325   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1326   printf("\n");
1327
1328   printf("\nF:");
1329   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1330   printf("\nC:");
1331   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1332   printf("\n");
1333
1334   jit_destroy_state();
1335   finish_jit();
1336   return 0;
1337 @}
1338 @end example
1339
1340 The client displays a conversion table between Celsius and Fahrenheit
1341 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1342 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1343 respectively.
1344
1345 Providing the formula as an argument to @code{compile_rpn} effectively
1346 parameterizes code generation, making it possible to use the same code
1347 to compile different functions; this is what makes dynamic code
1348 generation so powerful.
1349
1350 @node Fibonacci
1351 @section Fibonacci numbers
1352
1353 The code in this section calculates the Fibonacci sequence. That is
1354 modeled by the recurrence relation:
1355 @display
1356      f(0) = 0
1357      f(1) = f(2) = 1
1358      f(n) = f(n-1) + f(n-2)
1359 @end display
1360
1361 The purpose of this example is to introduce branches.  There are two
1362 kind of branches: backward branches and forward branches.  We'll
1363 present the calculation in a recursive and iterative form; the
1364 former only uses forward branches, while the latter uses both.
1365
1366 @example
1367 #include <stdio.h>
1368 #include <lightning.h>
1369
1370 static jit_state_t *_jit;
1371
1372 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1373
1374 int main(int argc, char *argv[])
1375 @{
1376   pifi       fib;
1377   jit_node_t *label;
1378   jit_node_t *call;
1379   jit_node_t *in;                 @rem{/* offset of the argument */}
1380   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1381   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1382
1383   init_jit(argv[0]);
1384   _jit = jit_new_state();
1385
1386   label = jit_label();
1387         jit_prolog   ();
1388   in =  jit_arg      ();
1389         jit_getarg   (JIT_V0, in);              @rem{/* R0 = n */}
1390  zero = jit_beqi     (JIT_R0, 0);
1391         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1392         jit_movi     (JIT_R0, 1);
1393   ref = jit_blei     (JIT_V0, 2);
1394         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1395         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1396         jit_prepare();
1397           jit_pushargr(JIT_V1);
1398         call = jit_finishi(NULL);
1399         jit_patch_at(call, label);
1400         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1401         jit_prepare();
1402           jit_pushargr(JIT_V2);
1403         call = jit_finishi(NULL);
1404         jit_patch_at(call, label);
1405         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1406         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1407
1408   jit_patch(ref);                               @rem{/* patch jump */}
1409   jit_patch(zero);                              @rem{/* patch jump */}
1410         jit_retr(JIT_R0);
1411
1412   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1413   fib = jit_emit();
1414   jit_clear_state();
1415   printf("fib(%d) = %d\n", 32, fib(32));
1416   jit_destroy_state();
1417   finish_jit();
1418   return 0;
1419 @}
1420 @end example
1421
1422 As said above, this is the first example of dynamically compiling
1423 branches.  Branch instructions have two operands containing the
1424 values to be compared, and return a @code{jit_note_t *} object
1425 to be patched.
1426
1427 Because labels final address are only known after calling @code{emit},
1428 it is required to call @code{patch} or @code{patch_at}, what does
1429 tell @lightning{} that the target to patch is actually a pointer to
1430 a @code{jit_node_t *} object, otherwise, it would assume that is
1431 a pointer to a C function. Note that conditional branches do not
1432 receive a label argument, so they must be patched.
1433
1434 You need to call @code{patch_at} on the return of value @code{calli},
1435 @code{finishi}, and @code{calli} if it is actually referencing a label
1436 in the jit code. All branch instructions do not receive a label
1437 argument. Note that @code{movi} is an special case, and patching it
1438 is usually done to get the final address of a label, usually to later
1439 call @code{jmpr}.
1440
1441 Now, here is the iterative version:
1442
1443 @example
1444 #include <stdio.h>
1445 #include <lightning.h>
1446
1447 static jit_state_t *_jit;
1448
1449 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1450
1451 int main(int argc, char *argv[])
1452 @{
1453   pifi       fib;
1454   jit_node_t *in;               @rem{/* offset of the argument */}
1455   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1456   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1457   jit_node_t *jump;             @rem{/* jump to start of loop */}
1458   jit_node_t *loop;             @rem{/* start of the loop */}
1459
1460   init_jit(argv[0]);
1461   _jit = jit_new_state();
1462
1463         jit_prolog   ();
1464   in =  jit_arg      ();
1465         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1466  zero = jit_beqi     (JIT_R0, 0);
1467         jit_movr     (JIT_R1, JIT_R0);
1468         jit_movi     (JIT_R0, 1);
1469   ref = jit_blti     (JIT_R1, 2);
1470         jit_subi     (JIT_R2, JIT_R2, 2);
1471         jit_movr     (JIT_R1, JIT_R0);
1472
1473   loop= jit_label();
1474         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1475         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1476         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1477         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1478   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1479   jit_patch_at(jump, loop);
1480
1481   jit_patch(ref);                               @rem{/* patch forward jump */}
1482   jit_patch(zero);                              @rem{/* patch forward jump */}
1483         jit_retr     (JIT_R0);
1484
1485   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1486   fib = jit_emit();
1487   jit_clear_state();
1488   printf("fib(%d) = %d\n", 36, fib(36));
1489   jit_destroy_state();
1490   finish_jit();
1491   return 0;
1492 @}
1493 @end example
1494
1495 This code calculates the recurrence relation using iteration (a
1496 @code{for} loop in high-level languages).  There are no function
1497 calls anymore: instead, there is a backward jump (the @code{bnei} at
1498 the end of the loop).
1499
1500 Note that the program must remember the address for backward jumps;
1501 for forward jumps it is only required to remember the jump code,
1502 and call @code{patch} for the implicit label.
1503
1504 @node Reentrancy
1505 @chapter Re-entrant usage of @lightning{}
1506
1507 @lightning{} uses the special @code{_jit} identifier. To be able
1508 to be able to use multiple jit generation states at the same
1509 time, it is required to used code similar to:
1510
1511 @example
1512     struct jit_state lightning;
1513     #define lightning _jit
1514 @end example
1515
1516 This will cause the symbol defined to @code{_jit} to be passed as
1517 the first argument to the underlying @lightning{} implementation,
1518 that is usually a function with an @code{_} (underscode) prefix
1519 and with an argument named @code{_jit}, in the pattern:
1520
1521 @example
1522     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1523     #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1524 @end example
1525
1526 The reason for this is to use the same syntax as the initial lightning
1527 implementation and to avoid needing the user to keep adding an extra
1528 argument to every call, as multiple jit states generating code in
1529 paralell should be very uncommon.
1530
1531 @node Registers
1532 @chapter Accessing the whole register file
1533
1534 As mentioned earlier in this chapter, all @lightning{} back-ends are
1535 guaranteed to have at least six general-purpose integer registers and
1536 six floating-point registers, but many back-ends will have more.
1537
1538 To access the entire register files, you can use the
1539 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1540 accept a parameter that identifies the register number, which
1541 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1542 and @code{JIT_F_NUM} respectively; the number need not be
1543 constant.  Of course, expressions like @code{JIT_R0} and
1544 @code{JIT_R(0)} denote the same register, and likewise for
1545 integer callee-saved, or floating-point, registers.
1546
1547 @section Scratch registers
1548
1549 For operations, @lightning{} does not support directly, like storing
1550 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1551 acquire and release a scratch register as in the following pattern:
1552
1553 @example
1554     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1555     jit_movi (reg, immediate);
1556     jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1557     jit_unget_reg (reg);
1558 @end example
1559
1560 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1561 reloads but don't follow branches, the code between both must be in
1562 the same basic block and must not contain any branches as in the
1563 following (bad) example.
1564
1565 @example
1566     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1567     jit_ldxi (reg, JIT_V0, offset);
1568     jump = jit_bnei (reg, V0);
1569     jit_movr (JIT_V1, reg);
1570     jit_patch (jump);
1571     jit_unget_reg (reg);
1572 @end example
1573
1574 @node Customizations
1575 @chapter Customizations
1576
1577 Frequently it is desirable to have more control over how code is
1578 generated or how memory is used during jit generation or execution.
1579
1580 @section Memory functions
1581 To aid in complete control of memory allocation and deallocation
1582 @lightning{} provides wrappers that default to standard @code{malloc},
1583 @code{realloc} and @code{free}. These are loosely based on the
1584 GNU GMP counterparts, with the difference that they use the same
1585 prototype of the system allocation functions, that is, no @code{size}
1586 for @code{free} or @code{old_size} for @code{realloc}.
1587
1588 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1589 @lightning{} guarantees that memory is only allocated or released
1590 using these wrapped functions, but you must note that if lightning
1591 was linked to GNU binutils, malloc is probably will be called multiple
1592 times from there when initializing the disassembler.
1593
1594 Because @code{init_jit} may call memory functions, if you need to call
1595 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1596 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1597 previous or default wrappers will be passed.
1598 @end deftypefun
1599
1600 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1601 Get the current memory allocation function. Also, unlike the GNU GMP
1602 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1603 @end deftypefun
1604
1605 @section Alternate code buffer
1606 To instruct @lightning{} to use an alternate code buffer it is required
1607 to call @code{jit_realize} before @code{jit_emit}, and then query states
1608 and customize as appropriate.
1609
1610 @deftypefun void jit_realize ()
1611 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1612 that no other @code{jit_xyz} call will be made.
1613 @end deftypefun
1614
1615 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1616 Returns NULL or the previous value set with @code{jit_set_code}, and
1617 sets the @var{code_size} argument to an appropriate value.
1618 If @code{jit_get_code} is called before @code{jit_emit}, the
1619 @var{code_size} argument is set to the expected amount of bytes
1620 required to generate code.
1621 If @code{jit_get_code} is called after @code{jit_emit}, the
1622 @var{code_size} argument is set to the exact amount of bytes used
1623 by the code.
1624 @end deftypefun
1625
1626 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1627 Instructs @lightning{} to output to the @var{code} argument and
1628 use @var{size} as a guard to not write to invalid memory. If during
1629 @code{jit_emit} @lightning{} finds out that the code would not fit
1630 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1631 @end deftypefun
1632
1633 A simple example of a loop using an alternate buffer is:
1634
1635 @example
1636   jit_uint8_t   *code;
1637   int           *(func)(int);      @rem{/* function pointer */}
1638   jit_word_t     code_size;
1639   jit_word_t     real_code_size;
1640   @rem{...}
1641   jit_realize();                   @rem{/* ready to generate code */}
1642   jit_get_code(&code_size);        @rem{/* get expected code size */}
1643   code_size = (code_size + 4095) & -4096;
1644   do (;;) @{
1645     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1646                 MAP_PRIVATE | MAP_ANON, -1, 0);
1647     jit_set_code(code, code_size);
1648     if ((func = jit_emit()) == NULL) @{
1649       munmap(code, code_size);
1650       code_size += 4096;
1651     @}
1652   @} while (func == NULL);
1653   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1654 @end example
1655
1656 The first call to @code{jit_get_code} should return @code{NULL} and set
1657 the @code{code_size} argument to the expected amount of bytes required
1658 to emit code.
1659 The second call to @code{jit_get_code} is after a successful call to
1660 @code{jit_emit}, and will return the value previously set with
1661 @code{jit_set_code} and set the @code{real_code_size} argument to the
1662 exact amount of bytes used to emit the code.
1663
1664 @section Alternate data buffer
1665 Sometimes it may be desirable to customize how, or to prevent
1666 @lightning{} from using an extra buffer for constants or debug
1667 annotation. Usually when also using an alternate code buffer.
1668
1669 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1670 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1671 and sets the @var{data_size} argument to how many bytes are required
1672 for the constants data buffer, and @var{note_size} to how many bytes
1673 are required to store the debug note information.
1674 Note that it always preallocate one debug note entry even if
1675 @code{jit_name} or @code{jit_note} are never called, but will return
1676 zero in the @var{data_size} argument if no constant is required;
1677 constants are only used for the @code{float} and @code{double} operations
1678 that have an immediate argument, and not in all @lightning{} ports.
1679 @end deftypefun
1680
1681 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1682
1683 @var{data} can be NULL if disabling constants and annotations, otherwise,
1684 a valid pointer must be passed. An assertion is done that the data will
1685 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1686 with @code{-DNDEBUG}).
1687
1688 @var{size} tells the space in bytes available in @var{data}.
1689
1690 @var{flags} can be zero to tell to just use the alternate data buffer,
1691 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1692
1693 @table @t
1694 @item JIT_DISABLE_DATA
1695 @cindex JIT_DISABLE_DATA
1696 Instructs @lightning{} to not use a constant table, but to use an
1697 alternate method to synthesize those, usually with a larger code
1698 sequence using stack space to transfer the value from a GPR to a
1699 FPR register.
1700
1701 @item JIT_DISABLE_NOTE
1702 @cindex JIT_DISABLE_NOTE
1703 Instructs @lightning{} to not store file or function name, and
1704 line numbers in the constant buffer.
1705 @end table
1706 @end deftypefun
1707
1708 A simple example of a preventing usage of a data buffer is:
1709
1710 @example
1711   @rem{...}
1712   jit_realize();                        @rem{/* ready to generate code */}
1713   jit_get_data(NULL, NULL);
1714   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1715   @rem{...}
1716 @end example
1717
1718 Or to only use a data buffer, if required:
1719
1720 @example
1721   jit_uint8_t   *data;
1722   jit_word_t     data_size;
1723   @rem{...}
1724   jit_realize();                        @rem{/* ready to generate code */}
1725   jit_get_data(&data_size, NULL);
1726   if (data_size)
1727     data = malloc(data_size);
1728   else
1729     data = NULL;
1730   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1731   @rem{...}
1732   if (data)
1733     free(data);
1734   @rem{...}
1735 @end example
1736
1737 @node Acknowledgements
1738 @chapter Acknowledgements
1739
1740 As far as I know, the first general-purpose portable dynamic code
1741 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1742 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1743 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1744 directly inspired @lightning{}.
1745
1746 Thanks go to Ian Piumarta, who kindly accepted to release his own
1747 program @sc{ccg} under the GNU General Public License, thereby allowing
1748 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1749 @sc{ccg} provides a way of dynamically assemble programs written in the
1750 underlying architecture's assembly language.  So it is not portable,
1751 yet very interesting.
1752
1753 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1754 was first developed as a tool to be used in GNU Smalltalk's dynamic
1755 translator from bytecodes to native code.