deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Registers::               Accessing the whole register file
  34 * Customizations::          Advanced code generation customizations
  35 * Acknowledgements::        Acknowledgements for GNU lightning
  36 @end menu
  37 @end ifnottex
  38
  39 @node Overview
  40 @chapter Introduction to @lightning{}
  41
  42 @iftex
  43 This document describes @value{TOPIC} the @lightning{} library for
  44 dynamic code generation.
  45 @end iftex
  46
  47 Dynamic code generation is the generation of machine code
  48 at runtime. It is typically used to strip a layer of interpretation
  49 by allowing compilation to occur at runtime.  One of the most
  50 well-known applications of dynamic code generation is perhaps that
  51 of interpreters that compile source code to an intermediate bytecode
  52 form, which is then recompiled to machine code at run-time: this
  53 approach effectively combines the portability of bytecode
  54 representations with the speed of machine code.  Another common
  55 application of dynamic code generation is in the field of hardware
  56 simulators and binary emulators, which can use the same techniques
  57 to translate simulated instructions to the instructions of the
  58 underlying machine.
  59
  60 Yet other applications come to mind: for example, windowing
  61 @dfn{bitblt} operations, matrix manipulations, and network packet
  62 filters.  Albeit very powerful and relatively well known within the
  63 compiler community, dynamic code generation techniques are rarely
  64 exploited to their full potential and, with the exception of the
  65 two applications described above, have remained curiosities because
  66 of their portability and functionality barriers: binary instructions
  67 are generated, so programs using dynamic code generation must be
  68 retargeted for each machine; in addition, coding a run-time code
  69 generator is a tedious and error-prone task more than a difficult one.
  70
  71 @lightning{} provides a portable, fast and easily retargetable dynamic
  72 code generation system.
  73
  74 To be portable, @lightning{} abstracts over current architectures'
  75 quirks and unorthogonalities.  The interface that it exposes to is that
  76 of a standardized RISC architecture loosely based on the SPARC and MIPS
  77 chips.  There are a few general-purpose registers (six, not including
  78 those used to receive and pass parameters between subroutines), and
  79 arithmetic operations involve three operands---either three registers
  80 or two registers and an arbitrarily sized immediate value.
  81
  82 On one hand, this architecture is general enough that it is possible to
  83 generate pretty efficient code even on CISC architectures such as the
  84 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  85 real architectures closely enough that, most of the time, the
  86 compiler's constant folding pass ends up generating code which
  87 assembles machine instructions without further tests.
  88
  89 @node Installation
  90 @chapter Configuring and installing @lightning{}
  91
  92 Here we will assume that your system already has the dependencies
  93 necessary to build @lightning{}. For more on dependencies, see
  94 @lightning{}'s @file{README-hacking} file.
  95
  96 The first thing to do to build @lightning{} is to configure the
  97 program, picking the set of macros to be used on the host
  98 architecture; this configuration is automatically performed by
  99 the @file{configure} shell script; to run it, merely type:
 100 @example
 101      ./configure
 102 @end example
 103
 104 @lightning{} supports the @code{--enable-disassembler} option, that
 105 enables linking to GNU binutils and optionally print human readable
 106 disassembly of the jit code. This option can be disabled by the
 107 @code{--disable-disassembler} option.
 108
 109 Another option that @file{configure} accepts is
 110 @code{--enable-assertions}, which enables several consistency checks in
 111 the run-time assemblers.  These are not usually needed, so you can
 112 decide to simply forget about it; also remember that these consistency
 113 checks tend to slow down your code generator.
 114
 115 After you've configured @lightning{}, run @file{make} as usual.
 116
 117 @lightning{} has an extensive set of tests to validate it is working
 118 correctly in the build host. To test it run:
 119 @example
 120     make check
 121 @end example
 122
 123 The next important step is:
 124 @example
 125     make install
 126 @end example
 127
 128 This ends the process of installing @lightning{}.
 129
 130 @node The instruction set
 131 @chapter @lightning{}'s instruction set
 132
 133 @lightning{}'s instruction set was designed by deriving instructions
 134 that closely match those of most existing RISC architectures, or
 135 that can be easily syntesized if absent.  Each instruction is composed
 136 of:
 137 @itemize @bullet
 138 @item
 139 an operation, like @code{sub} or @code{mul}
 140
 141 @item
 142 most times, a register/immediate flag (@code{r} or @code{i})
 143
 144 @item
 145 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 146 @end itemize
 147
 148 Examples of legal mnemonics are @code{addr} (integer add, with three
 149 register operands) and @code{muli} (integer multiply, with two
 150 register operands and an immediate operand).  Each instruction takes
 151 two or three operands; in most cases, one of them can be an immediate
 152 value instead of a register.
 153
 154 Most @lightning{} integer operations are signed wordsize operations,
 155 with the exception of operations that convert types, or load or store
 156 values to/from memory. When applicable, the types and C types are as
 157 follow:
 158
 159 @example
 160      _c         @r{signed char}
 161      _uc        @r{unsigned char}
 162      _s         @r{short}
 163      _us        @r{unsigned short}
 164      _i         @r{int}
 165      _ui        @r{unsigned int}
 166      _l         @r{long}
 167      _f         @r{float}
 168      _d         @r{double}
 169 @end example
 170
 171 Most integer operations do not need a type modifier, and when loading or
 172 storing values to memory there is an alias to the proper operation
 173 using wordsize operands, that is, if ommited, the type is @r{int} on
 174 32-bit architectures and @r{long} on 64-bit architectures.  Note
 175 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 176
 177 When an unsigned operation result differs from the equivalent signed
 178 operation, there is a the @code{_u} modifier.
 179
 180 There are at least seven integer registers, of which six are
 181 general-purpose, while the last is used to contain the frame pointer
 182 (@code{FP}).  The frame pointer can be used to allocate and access local
 183 variables on the stack, using the @code{allocai} or @code{allocar}
 184 instruction.
 185
 186 Of the general-purpose registers, at least three are guaranteed to be
 187 preserved across function calls (@code{V0}, @code{V1} and
 188 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 189 @code{R2}).  Six registers are not very much, but this
 190 restriction was forced by the need to target CISC architectures
 191 which, like the x86, are poor of registers; anyway, backends can
 192 specify the actual number of available registers with the calls
 193 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 194 (for callee-save registers).
 195
 196 There are at least six floating-point registers, named @code{F0} to
 197 @code{F5}.  These are usually caller-save and are separate from the integer
 198 registers on the supported architectures; on Intel architectures,
 199 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 200 the register stack is mapped to a flat register file.  As for the
 201 integer registers, the macro @code{JIT_F_NUM} yields the number of
 202 floating-point registers.
 203
 204 The complete instruction set follows; as you can see, most non-memory
 205 operations only take integers (either signed or unsigned) as operands;
 206 this was done in order to reduce the instruction set, and because most
 207 architectures only provide word and long word operations on registers.
 208 There are instructions that allow operands to be extended to fit a larger
 209 data type, both in a signed and in an unsigned way.
 210
 211 @table @b
 212 @item Binary ALU operations
 213 These accept three operands; the last one can be an immediate.
 214 @code{addx} operations must directly follow @code{addc}, and
 215 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 216 Most, if not all, architectures do not support @r{float} or @r{double}
 217 immediate operands; lightning emulates those operations by moving the
 218 immediate to a temporary register and emiting the call with only
 219 register operands.
 220 @example
 221 addr         _f  _d  O1 = O2 + O3
 222 addi         _f  _d  O1 = O2 + O3
 223 addxr                O1 = O2 + (O3 + carry)
 224 addxi                O1 = O2 + (O3 + carry)
 225 addcr                O1 = O2 + O3, set carry
 226 addci                O1 = O2 + O3, set carry
 227 subr         _f  _d  O1 = O2 - O3
 228 subi         _f  _d  O1 = O2 - O3
 229 subxr                O1 = O2 - (O3 + carry)
 230 subxi                O1 = O2 - (O3 + carry)
 231 subcr                O1 = O2 - O3, set carry
 232 subci                O1 = O2 - O3, set carry
 233 rsbr         _f  _d  O1 = O3 - O1
 234 rsbi         _f  _d  O1 = O3 - O1
 235 mulr         _f  _d  O1 = O2 * O3
 236 muli         _f  _d  O1 = O2 * O3
 237 divr     _u  _f  _d  O1 = O2 / O3
 238 divi     _u  _f  _d  O1 = O2 / O3
 239 remr     _u          O1 = O2 % O3
 240 remi     _u          O1 = O2 % O3
 241 andr                 O1 = O2 & O3
 242 andi                 O1 = O2 & O3
 243 orr                  O1 = O2 | O3
 244 ori                  O1 = O2 | O3
 245 xorr                 O1 = O2 ^ O3
 246 xori                 O1 = O2 ^ O3
 247 lshr                 O1 = O2 << O3
 248 lshi                 O1 = O2 << O3
 249 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 250 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 251 movzr                O1 = O3 ? O1 : O2
 252 movnr                O1 = O3 ? O2 : O1
 253 @end example
 254
 255 @item Four operand binary ALU operations
 256 These accept two result registers, and two operands; the last one can
 257 be an immediate. The first two arguments cannot be the same register.
 258
 259 @code{qmul} stores the low word of the result in @code{O1} and the
 260 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 261 means there was no overflow. For signed multiplication, no overflow
 262 check is based on sign, and can be detected if @code{O2} is zero or
 263 minus one.
 264
 265 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 266 @code{O2}. It can be used as quick way to check if a division is
 267 exact, in which case the remainder is zero.
 268
 269 @example
 270 qmulr    _u       O1 O2 = O3 * O4
 271 qmuli    _u       O1 O2 = O3 * O4
 272 qdivr    _u       O1 O2 = O3 / O4
 273 qdivi    _u       O1 O2 = O3 / O4
 274 @end example
 275
 276 @item Unary ALU operations
 277 These accept two operands, both of which must be registers.
 278 @example
 279 negr         _f  _d  O1 = -O2
 280 comr                 O1 = ~O2
 281 @end example
 282
 283 These unary ALU operations are only defined for float operands.
 284 @example
 285 absr         _f  _d  O1 = fabs(O2)
 286 sqrtr                O1 = sqrt(O2)
 287 @end example
 288
 289 Besides requiring the @code{r} modifier, there are no unary operations
 290 with an immediate operand.
 291
 292 @item Compare instructions
 293 These accept three operands; again, the last can be an immediate.
 294 The last two operands are compared, and the first operand, that must be
 295 an integer register, is set to either 0 or 1, according to whether the
 296 given condition was met or not.
 297
 298 The conditions given below are for the standard behavior of C,
 299 where the ``unordered'' comparison result is mapped to false.
 300
 301 @example
 302 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 303 lti       _u  _f  _d  O1 =  (O2 <  O3)
 304 ler       _u  _f  _d  O1 =  (O2 <= O3)
 305 lei       _u  _f  _d  O1 =  (O2 <= O3)
 306 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 307 gti       _u  _f  _d  O1 =  (O2 >  O3)
 308 ger       _u  _f  _d  O1 =  (O2 >= O3)
 309 gei       _u  _f  _d  O1 =  (O2 >= O3)
 310 eqr           _f  _d  O1 =  (O2 == O3)
 311 eqi           _f  _d  O1 =  (O2 == O3)
 312 ner           _f  _d  O1 =  (O2 != O3)
 313 nei           _f  _d  O1 =  (O2 != O3)
 314 unltr         _f  _d  O1 = !(O2 >= O3)
 315 unler         _f  _d  O1 = !(O2 >  O3)
 316 ungtr         _f  _d  O1 = !(O2 <= O3)
 317 unger         _f  _d  O1 = !(O2 <  O3)
 318 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 319 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 320 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 321 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 322 @end example
 323
 324 @item Transfer operations
 325 These accept two operands; for @code{ext} both of them must be
 326 registers, while @code{mov} accepts an immediate value as the second
 327 operand.
 328
 329 Unlike @code{movr} and @code{movi}, the other instructions are used
 330 to truncate a wordsize operand to a smaller integer data type or to
 331 convert float data types. You can also use @code{extr} to convert an
 332 integer to a floating point value: the usual options are @code{extr_f}
 333 and @code{extr_d}.
 334
 335 @example
 336 movr                                 _f  _d  O1 = O2
 337 movi                                 _f  _d  O1 = O2
 338 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 339 truncr                               _f  _d  O1 = trunc(O2)
 340 @end example
 341
 342 In 64-bit architectures it may be required to use @code{truncr_f_i},
 343 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 344 the equivalent C code.  Only the @code{_i} modifier is available in
 345 32-bit architectures.
 346
 347 @example
 348 truncr_f_i    = <int> O1 = <float> O2
 349 truncr_f_l    = <long>O1 = <float> O2
 350 truncr_d_i    = <int> O1 = <double>O2
 351 truncr_d_l    = <long>O1 = <double>O2
 352 @end example
 353
 354 The float conversion operations are @emph{destination first,
 355 source second}, but the order of the types is reversed.  This happens
 356 for historical reasons.
 357
 358 @example
 359 extr_f_d    = <double>O1 = <float> O2
 360 extr_d_f    = <float> O1 = <double>O2
 361 @end example
 362
 363 @item Network extensions
 364 These accept two operands, both of which must be registers; these
 365 two instructions actually perform the same task, yet they are
 366 assigned to two mnemonics for the sake of convenience and
 367 completeness.  As usual, the first operand is the destination and
 368 the second is the source.
 369 The @code{_ul} variant is only available in 64-bit architectures.
 370 @example
 371 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 372 ntohr    _us _ui _ul @r{Network-to-host order }
 373 @end example
 374
 375 @code{bswapr} can be used to unconditionally byte-swap an operand.
 376 On little-endian architectures, @code{htonr} and @code{ntohr} resolve
 377 to this.
 378 The @code{_ul} variant is only available in 64-bit architectures.
 379 @example
 380 bswapr    _us _ui _ul  01 = byte_swap(02)
 381 @end example
 382
 383 @item Load operations
 384 @code{ld} accepts two operands while @code{ldx} accepts three;
 385 in both cases, the last can be either a register or an immediate
 386 value. Values are extended (with or without sign, according to
 387 the data type specification) to fit a whole register.
 388 The @code{_ui} and @code{_l} types are only available in 64-bit
 389 architectures.  For convenience, there is a version without a
 390 type modifier for integer or pointer operands that uses the
 391 appropriate wordsize call.
 392 @example
 393 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 394 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 395 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 396 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 397 @end example
 398
 399 @item Store operations
 400 @code{st} accepts two operands while @code{stx} accepts three; in
 401 both cases, the first can be either a register or an immediate
 402 value. Values are sign-extended to fit a whole register.
 403 @example
 404 str     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 405 sti     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 406 stxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 407 stxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 408 @end example
 409 As for the load operations, the @code{_ui} and @code{_l} types are
 410 only available in 64-bit architectures, and for convenience, there
 411 is a version without a type modifier for integer or pointer operands
 412 that uses the appropriate wordsize call.
 413
 414 @item Argument management
 415 These are:
 416 @example
 417 prepare     (not specified)
 418 va_start    (not specified)
 419 pushargr                                   _f  _d
 420 pushargi                                   _f  _d
 421 va_push     (not specified)
 422 arg                                        _f  _d
 423 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 424 va_arg                                         _d
 425 putargr                                    _f  _d
 426 putargi                                    _f  _d
 427 ret         (not specified)
 428 retr                                       _f  _d
 429 reti                                       _f  _d
 430 va_end      (not specified)
 431 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 432 epilog      (not specified)
 433 @end example
 434 As with other operations that use a type modifier, the @code{_ui} and
 435 @code{_l} types are only available in 64-bit architectures, but there
 436 are operations without a type modifier that alias to the appropriate
 437 integer operation with wordsize operands.
 438
 439 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 440 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 441 A code snippet that wants to call another procedure and has to pass
 442 arguments must, in order: use the @code{prepare} instruction and use
 443 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 444 left to right order}; and use @code{finish} or @code{call} (explained below)
 445 to perform the actual call.
 446
 447 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 448 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 449 @code{va_push} is required when passing a @code{va_list} to another function,
 450 because not all architectures expect it as a single pointer. Known case
 451 is DEC Alpha, that requires it as a structure passed by value.
 452
 453 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 454 @code{arg} is different from other instruction in that it does not
 455 actually generate any code: instead, it is a function which returns
 456 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 457 a value'' means that @lightning{} code that compile these
 458 instructions return a value when expanded.} You should call
 459 @code{arg} as soon as possible, before any function call or, more
 460 easily, right after the @code{prolog} instructions
 461 (which is treated later).
 462
 463 @code{getarg} accepts a register argument and a value returned by
 464 @code{arg}, and will move that argument to the register, extending
 465 it (with or without sign, according to the data type specification)
 466 to fit a whole register.  These instructions are more intimately
 467 related to the usage of the @lightning{} instruction set in code
 468 that generates other code, so they will be treated more
 469 specifically in @ref{GNU lightning examples, , Generating code at
 470 run-time}.
 471
 472 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 473 it accepts as first argument a register or immediate, and as
 474 second argument a value returned by @code{arg}. It allows changing,
 475 or restoring an argument to the current function, and is a
 476 construct required to implement tail call optimization. Note that
 477 arguments in registers are very cheap, but will be overwritten
 478 at any moment, including on some operations, for example division,
 479 that on several ports is implemented as a function call.
 480
 481 Finally, the @code{retval} instruction fetches the return value of a
 482 called function in a register.  The @code{retval} instruction takes a
 483 register argument and copies the return value of the previously called
 484 function in that register.  A function with a return value should use
 485 @code{retr} or @code{reti} to put the return value in the return register
 486 before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 487
 488 @code{epilog} is an optional call, that marks the end of a function
 489 body. It is automatically generated by @lightning{} if starting a new
 490 function (what should be done after a @code{ret} call) or finishing
 491 generating jit.
 492 It is very important to note that the fact that @code{epilog} being
 493 optional may cause a common mistake. Consider this:
 494 @example
 495 fun1:
 496     prolog
 497     ...
 498     ret
 499 fun2:
 500     prolog
 501 @end example
 502 Because @code{epilog} is added when finding a new @code{prolog},
 503 this will cause the @code{fun2} label to actually be before the
 504 return from @code{fun1}. Because @lightning{} will actually
 505 understand it as:
 506 @example
 507 fun1:
 508     prolog
 509     ...
 510     ret
 511 fun2:
 512     epilog
 513     prolog
 514 @end example
 515
 516 You should observe a few rules when using these macros.  First of
 517 all, if calling a varargs function, you should use the @code{ellipsis}
 518 call to mark the position of the ellipsis in the C prototype.
 519
 520 You should not nest calls to @code{prepare} inside a
 521 @code{prepare/finish} block.  Doing this will result in undefined
 522 behavior. Note that for functions with zero arguments you can use
 523 just @code{call}.
 524
 525 @item Branch instructions
 526 Like @code{arg}, these also return a value which, in this case,
 527 is to be used to compile forward branches as explained in
 528 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 529 compared; of these, the last can be either a register or an immediate.
 530 They are:
 531 @example
 532 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 533 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 534 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 535 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 536 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 537 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 538 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 539 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 540 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 541 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 542 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 543 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 544
 545 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 546 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 547 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 548 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 549 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 550 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 551 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
 552 bunordr       _f  _d  @r{if }!(O2 != O2) ||  (O3 != O3)@r{ goto }O1
 553
 554 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 555 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 556 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 557 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 558 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 559 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 560 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 561 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 562 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 563 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 564 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 565 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 566 @end example
 567
 568 @item Jump and return operations
 569 These accept one argument except @code{ret} and @code{jmpi} which
 570 have none; the difference between @code{finishi} and @code{calli}
 571 is that the latter does not clean the stack from pushed parameters
 572 (if any) and the former must @strong{always} follow a @code{prepare}
 573 instruction.
 574 @example
 575 callr     (not specified)                @r{function call to register O1}
 576 calli     (not specified)                @r{function call to immediate O1}
 577 finishr   (not specified)                @r{function call to register O1}
 578 finishi   (not specified)                @r{function call to immediate O1}
 579 jmpr      (not specified)                @r{unconditional jump to register}
 580 jmpi      (not specified)                @r{unconditional jump}
 581 ret       (not specified)                @r{return from subroutine}
 582 retr      _c _uc _s _us _i _ui _l _f _d
 583 reti      _c _uc _s _us _i _ui _l _f _d
 584 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 585                                          @r{to register}
 586 @end example
 587
 588 Like branch instruction, @code{jmpi} also returns a value which is to
 589 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
 590 numbers}.
 591
 592 @item Labels
 593 There are 3 @lightning{} instructions to create labels:
 594 @example
 595 label     (not specified)                @r{simple label}
 596 forward   (not specified)                @r{forward label}
 597 indirect  (not specified)                @r{special simple label}
 598 @end example
 599
 600 @code{label} is normally used as @code{patch_at} argument for backward
 601 jumps.
 602
 603 @example
 604         jit_node_t *jump, *label;
 605 label = jit_label();
 606         ...
 607         jump = jit_beqr(JIT_R0, JIT_R1);
 608         jit_patch_at(jump, label);
 609 @end example
 610
 611 @code{forward} is used to patch code generation before the actual
 612 position of the label is known.
 613
 614 @example
 615         jit_node_t *jump, *label;
 616 label = jit_forward();
 617         jump = jit_beqr(JIT_R0, JIT_R1);
 618         jit_patch_at(jump, label);
 619         ...
 620         jit_link(label);
 621 @end example
 622
 623 @code{indirect} is useful when creating jump tables, and tells
 624 @lightning{} to not optimize out a label that is not the target of
 625 any jump, because an indirect jump may land where it is defined.
 626
 627 @example
 628         jit_node_t *jump, *label;
 629         ...
 630         jmpr(JIT_R0);                    @rem{/* may jump to label */}
 631         ...
 632 label = jit_indirect();
 633 @end example
 634
 635 @code{indirect} is an special case of @code{note} and @code{name}
 636 because it is a valid argument to @code{address}.
 637
 638 Note that the usual idiom to write the previous example is
 639 @example
 640         jit_node_t *addr, *jump;
 641 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 642         ...
 643         jmpr(JIT_R0);
 644         ...
 645         jit_patch(addr);                 @rem{/* implicit label added */}
 646 @end example
 647
 648 that automatically binds the implicit label added by @code{patch} with
 649 the @code{movi}, but on some special conditions it is required to create
 650 an "unbound" label.
 651
 652 @item Function prolog
 653
 654 These macros are used to set up a function prolog.  The @code{allocai}
 655 call accept a single integer argument and returns an offset value
 656 for stack storage access.  The @code{allocar} accepts two registers
 657 arguments, the first is set to the offset for stack access, and the
 658 second is the size in bytes argument.
 659
 660 @example
 661 prolog    (not specified)                @r{function prolog}
 662 allocai   (not specified)                @r{reserve space on the stack}
 663 allocar   (not specified)                @r{allocate space on the stack}
 664 @end example
 665
 666 @code{allocai} receives the number of bytes to allocate and returns
 667 the offset from the frame pointer register @code{FP} to the base of
 668 the area.
 669
 670 @code{allocar} receives two register arguments.  The first is where
 671 to store the offset from the frame pointer register @code{FP} to the
 672 base of the area.  The second argument is the size in bytes.  Note
 673 that @code{allocar} is dynamic allocation, and special attention
 674 should be taken when using it.  If called in a loop, every iteration
 675 will allocate stack space.  Stack space is aligned from 8 to 64 bytes
 676 depending on backend requirements, even if allocating only one byte.
 677 It is advisable to not use it with @code{frame} and @code{tramp}; it
 678 should work with @code{frame} with special care to call only once,
 679 but is not supported if used in @code{tramp}, even if called only
 680 once.
 681
 682 As a small appetizer, here is a small function that adds 1 to the input
 683 parameter (an @code{int}).  I'm using an assembly-like syntax here which
 684 is a bit different from the one used when writing real subroutines with
 685 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
 686 examples, , Generating code at run-time}.
 687
 688 @example
 689 incr:
 690      prolog
 691 in = arg                     @rem{! We have an integer argument}
 692      getarg    R0, in        @rem{! Move it to R0}
 693      addi      R0, R0, 1     @rem{! Add 1}
 694      retr      R0            @rem{! And return the result}
 695 @end example
 696
 697 And here is another function which uses the @code{printf} function from
 698 the standard C library to write a number in hexadecimal notation:
 699
 700 @example
 701 printhex:
 702      prolog
 703 in = arg                     @rem{! Same as above}
 704      getarg    R0, in
 705      prepare                 @rem{! Begin call sequence for printf}
 706      pushargi  "%x"          @rem{! Push format string}
 707      ellipsis                @rem{! Varargs start here}
 708      pushargr  R0            @rem{! Push second argument}
 709      finishi   printf        @rem{! Call printf}
 710      ret                     @rem{! Return to caller}
 711 @end example
 712
 713 @item Register liveness
 714
 715 During code generation, @lightning{} occasionally needs scratch registers
 716 or needs to use architecture-defined registers.  For that, @lightning{}
 717 internally maintains register liveness information.
 718
 719 In the following example, @code{qdivr} will need special registers like
 720 @code{R0} on some architectures.  As @lightning{} understands that
 721 @code{R0} is used in the subsequent instruction, it will create
 722 save/restore code for @code{R0} in case.
 723
 724 @example
 725 ...
 726 qdivr V0, V1, V2, V3
 727 movr  V3, R0
 728 ...
 729 @end example
 730
 731 The same is not true in the example that follows.  Here, @code{R0} is
 732 not alive after the division operation because @code{R0} is neither an
 733 argument register nor a callee-save register.  Thus, no save/restore
 734 code for @code{R0} will be created in case.
 735
 736 @example
 737 ...
 738 qdivr V0, V1, V2, V3
 739 jmpr  R1
 740 ...
 741 @end example
 742
 743 The @code{live} instruction can be used to mark a register as live after
 744 it as in the following example.  Here, @code{R0} will be preserved
 745 across the division.
 746
 747 @example
 748 ...
 749 qdivr V0, V1, V2, V3
 750 live R0
 751 jmpr R1
 752 ...
 753 @end example
 754
 755 The @code{live} instruction is useful at code entry and exit points,
 756 like after and before a @code{callr} instruction.
 757
 758 @item Trampolines, continuations and tail call optimization
 759
 760 Frequently it is required to generate jit code that must jump to
 761 code generated later, possibly from another @code{jit_context_t}.
 762 These require compatible stack frames.
 763
 764 @lightning{} provides two primitives from where trampolines,
 765 continuations and tail call optimization can be implemented.
 766
 767 @example
 768 frame   (not specified)                  @r{create stack frame}
 769 tramp   (not specified)                  @r{assume stack frame}
 770 @end example
 771
 772 @code{frame} receives an integer argument@footnote{It is not
 773 automatically computed because it does not know about the
 774 requirement of later generated code.} that defines the size in
 775 bytes for the stack frame of the current, @code{C} callable,
 776 jit function. To calculate this value, a good formula is maximum
 777 number of arguments to any called native function times
 778 eight@footnote{Times eight so that it works for double arguments.
 779 And would not need conditionals for ports that pass arguments in
 780 the stack.}, plus the sum of the arguments to any call to
 781 @code{jit_allocai}. @lightning{} automatically adjusts this value
 782 for any backend specific stack memory it may need, or any
 783 alignment constraint.
 784
 785 @code{frame} also instructs @lightning{} to save all callee
 786 save registers in the prolog and reload in the epilog.
 787
 788 @example
 789 main:                        @rem{! jit entry point}
 790      prolog                  @rem{! function prolog}
 791      frame  256              @rem{! save all callee save registers and}
 792                              @rem{! reserve at least 256 bytes in stack}
 793 main_loop:
 794      ...
 795      jmpi   handler          @rem{! jumps to external code}
 796      ...
 797      ret                     @rem{! return to the caller}
 798 @end example
 799
 800 @code{tramp} differs from @code{frame} only that a prolog and epilog
 801 will not be generated. Note that @code{prolog} must still be used.
 802 The code under @code{tramp} must be ready to be entered with a jump
 803 at the prolog position, and instead of a return, it must end with
 804 a non conditional jump. @code{tramp} exists solely for the fact
 805 that it allows optimizing out prolog and epilog code that would
 806 never be executed.
 807
 808 @example
 809 handler:                     @rem{! handler entry point}
 810      prolog                  @rem{! function prolog}
 811      tramp  256              @rem{! assumes all callee save registers}
 812                              @rem{! are saved and there is at least}
 813                              @rem{! 256 bytes in stack}
 814      ...
 815      jmpi   main_loop        @rem{! return to the main loop}
 816 @end example
 817
 818 @lightning{} only supports Tail Call Optimization using the
 819 @code{tramp} construct. Any other way is not guaranteed to
 820 work on all ports.
 821
 822 An example of a simple (recursive) tail call optimization:
 823
 824 @example
 825 factorial:                   @rem{! Entry point of the factorial function}
 826      prolog
 827 in = arg                     @rem{! Receive an integer argument}
 828      getarg R0, in           @rem{! Move argument to RO}
 829      prepare
 830          pushargi 1          @rem{! This is the accumulator}
 831          pushargr R0         @rem{! This is the argument}
 832      finishi fact            @rem{! Call the tail call optimized function}
 833      retval R0               @rem{! Fetch the result}
 834      retr R0                 @rem{! Return it}
 835      epilog                  @rem{! Epilog *before* label before prolog}
 836
 837 fact:                        @rem{! Entry point of the helper function}
 838      prolog
 839      frame 16                @rem{! Reserve 16 bytes in the stack}
 840 fact_entry:                  @rem{! This is the tail call entry point}
 841 ac = arg                     @rem{! The accumulator is the first argument}
 842 in = arg                     @rem{! The factorial argument}
 843      getarg R0, ac           @rem{! Move the accumulator to R0}
 844      getarg R1, in           @rem{! Move the argument to R1}
 845      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
 846      mulr R0, R0, R1         @rem{! accumulator *= argument}
 847      putargr R0, ac          @rem{! Update the accumulator}
 848      subi R1, R1, 1          @rem{! argument -= 1}
 849      putargr R1, in          @rem{! Update the argument}
 850      jmpi fact_entry         @rem{! Tail Call Optimize it!}
 851 fact_out:
 852      retr R0                 @rem{! Return the accumulator}
 853 @end example
 854
 855 @item Predicates
 856 @example
 857 forward_p      (not specified)           @r{forward label predicate}
 858 indirect_p     (not specified)           @r{indirect label predicate}
 859 target_p       (not specified)           @r{used label predicate}
 860 arg_register_p (not specified)           @r{argument kind predicate}
 861 callee_save_p  (not specified)           @r{callee save predicate}
 862 pointer_p      (not specified)           @r{pointer predicate}
 863 @end example
 864
 865 @code{forward_p} expects a @code{jit_node_t*} argument, and
 866 returns non zero if it is a forward label reference, that is,
 867 a label returned by @code{forward}, that still needs a
 868 @code{link} call.
 869
 870 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
 871 non zero if it is an indirect label reference, that is, a label that
 872 was returned by @code{indirect}.
 873
 874 @code{target_p} expects a @code{jit_node_t*} argument, that is any
 875 kind of label, and will return non zero if there is at least one
 876 jump or move referencing it.
 877
 878 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
 879 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
 880 will return non zero if the argument lives in a register. This call
 881 is useful to know the live range of register arguments, as those
 882 are very fast to read and write, but have volatile values.
 883
 884 @code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
 885 @code{JIT_Fn}, and will return non zero if the register is callee
 886 save. This call is useful because on several ports, the @code{JIT_Rn}
 887 and @code{JIT_Fn} registers are actually callee save; no need
 888 to save and load the values when making function calls.
 889
 890 @code{pointer_p} expects a pointer argument, and will return non
 891 zero if the pointer is inside the generated jit code. Must be
 892 called after @code{jit_emit} and before @code{jit_destroy_state}.
 893 @end table
 894
 895 @node GNU lightning examples
 896 @chapter Generating code at run-time
 897
 898 To use @lightning{}, you should include the @file{lightning.h} file that
 899 is put in your include directory by the @samp{make install} command.
 900
 901 Each of the instructions above translates to a macro or function call.
 902 All you have to do is prepend @code{jit_} (lowercase) to opcode names
 903 and @code{JIT_} (uppercase) to register names.  Of course, parameters
 904 are to be put between parentheses.
 905
 906 This small tutorial presents three examples:
 907
 908 @iftex
 909 @itemize @bullet
 910 @item
 911 The @code{incr} function found in @ref{The instruction set, ,
 912 @lightning{}'s instruction set}:
 913
 914 @item
 915 A simple function call to @code{printf}
 916
 917 @item
 918 An RPN calculator.
 919
 920 @item
 921 Fibonacci numbers
 922 @end itemize
 923 @end iftex
 924 @ifnottex
 925 @menu
 926 * incr::             A function which increments a number by one
 927 * printf::           A simple function call to printf
 928 * RPN calculator::   A more complex example, an RPN calculator
 929 * Fibonacci::        Calculating Fibonacci numbers
 930 @end menu
 931 @end ifnottex
 932
 933 @node incr
 934 @section A function which increments a number by one
 935
 936 Let's see how to create and use the sample @code{incr} function created
 937 in @ref{The instruction set, , @lightning{}'s instruction set}:
 938
 939 @example
 940 #include <stdio.h>
 941 #include <lightning.h>
 942
 943 static jit_state_t *_jit;
 944
 945 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
 946
 947 int main(int argc, char *argv[])
 948 @{
 949   jit_node_t  *in;
 950   pifi         incr;
 951
 952   init_jit(argv[0]);
 953   _jit = jit_new_state();
 954
 955   jit_prolog();                    @rem{/* @t{     prolog             } */}
 956   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
 957   jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0          } */}
 958   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
 959   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
 960
 961   incr = jit_emit();
 962   jit_clear_state();
 963
 964   @rem{/* call the generated code@comma{} passing 5 as an argument */}
 965   printf("%d + 1 = %d\n", 5, incr(5));
 966
 967   jit_destroy_state();
 968   finish_jit();
 969   return 0;
 970 @}
 971 @end example
 972
 973 Let's examine the code line by line (well, almost@dots{}):
 974
 975 @table @t
 976 @item #include <lightning.h>
 977 You already know about this.  It defines all of @lightning{}'s macros.
 978
 979 @item static jit_state_t *_jit;
 980 You might wonder about what is @code{jit_state_t}.  It is a structure
 981 that stores jit code generation information.  The name @code{_jit} is
 982 special, because since multiple jit generators can run at the same
 983 time, you must either @r{#define _jit my_jit_state} or name it
 984 @code{_jit}.
 985
 986 @item typedef int (*pifi)(int);
 987 Just a handy typedef for a pointer to a function that takes an
 988 @code{int} and returns another.
 989
 990 @item jit_node_t  *in;
 991 Declares a variable to hold an identifier for a function argument. It
 992 is an opaque pointer, that will hold the return of a call to @code{arg}
 993 and be used as argument to @code{getarg}.
 994
 995 @item pifi         incr;
 996 Declares a function pointer variable to a function that receives an
 997 @code{int} and returns an @code{int}.
 998
 999 @item init_jit(argv[0]);
1000 You must call this function before creating a @code{jit_state_t}
1001 object. This function does global state initialization, and may need
1002 to detect CPU or Operating System features.  It receives a string
1003 argument that is later used to read symbols from a shared object using
1004 GNU binutils if disassembly was enabled at configure time. If no
1005 disassembly will be performed a NULL pointer can be used as argument.
1006
1007 @item _jit = jit_new_state();
1008 This call initializes a @lightning{} jit state.
1009
1010 @item jit_prolog();
1011 Ok, so we start generating code for our beloved function@dots{}
1012
1013 @item in = jit_arg();
1014 @itemx jit_getarg(JIT_R0, in);
1015 We retrieve the first (and only) argument, an integer, and store it
1016 into the general-purpose register @code{R0}.
1017
1018 @item jit_addi(JIT_R0, JIT_R0, 1);
1019 We add one to the content of the register.
1020
1021 @item jit_retr(JIT_R0);
1022 This instruction generates a standard function epilog that returns
1023 the contents of the @code{R0} register.
1024
1025 @item incr = jit_emit();
1026 This instruction is very important.  It actually translates the
1027 @lightning{} macros used before to machine code, flushes the generated
1028 code area out of the processor's instruction cache and return a
1029 pointer to the start of the code.
1030
1031 @item jit_clear_state();
1032 This call cleanups any data not required for jit execution. Note
1033 that it must be called after any call to @code{jit_print} or
1034 @code{jit_address}, as this call destroy the @lightning{}
1035 intermediate representation.
1036
1037 @item printf("%d + 1 = %d", 5, incr(5));
1038 Calling our function is this simple---it is not distinguishable from
1039 a normal C function call, the only difference being that @code{incr}
1040 is a variable.
1041
1042 @item jit_destroy_state();
1043 Releases all memory associated with the jit context. It should be
1044 called after known the jit will no longer be called.
1045
1046 @item finish_jit();
1047 This call cleanups any global state hold by @lightning{}, and is
1048 advisable to call it once jit code will no longer be generated.
1049 @end table
1050
1051 @lightning{} abstracts two phases of dynamic code generation: selecting
1052 instructions that map the standard representation, and emitting binary
1053 code for these instructions.  The client program has the responsibility
1054 of describing the code to be generated using the standard @lightning{}
1055 instruction set.
1056
1057 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1058 architecture (on the right is the code that an assembly-language
1059 programmer would write):
1060
1061 @table @b
1062 @item SPARC
1063 @example
1064       save  %sp, -112, %sp
1065       mov  %i0, %g2                 retl
1066       inc  %g2                      inc %o0
1067       mov  %g2, %i0
1068       restore
1069       retl
1070       nop
1071 @end example
1072 In this case, @lightning{} introduces overhead to create a register
1073 window (not knowing that the procedure is a leaf procedure) and to
1074 move the argument to the general purpose register @code{R0} (which
1075 maps to @code{%g2} on the SPARC).
1076 @end table
1077
1078 @table @b
1079 @item x86_64
1080 @example
1081     sub   $0x30,%rsp
1082     mov   %rbp,(%rsp)
1083     mov   %rsp,%rbp
1084     sub   $0x18,%rsp
1085     mov   %rdi,%rax            mov %rdi, %rax
1086     add   $0x1,%rax            inc %rax
1087     mov   %rbp,%rsp
1088     mov   (%rsp),%rbp
1089     add   $0x30,%rsp
1090     retq                       retq
1091 @end example
1092 In this case, the main overhead is due to the function's prolog and
1093 epilog, and stack alignment after reserving stack space for word
1094 to/from float conversions or moving data from/to x87 to/from SSE.
1095 Note that besides allocating space to save callee saved registers,
1096 no registers are saved/restored because @lightning{} notices those
1097 registers are not modified. There is currently no logic to detect
1098 if it needs to allocate stack space for type conversions neither
1099 proper leaf function detection, but these are subject to change
1100 (FIXME).
1101 @end table
1102
1103 @node printf
1104 @section A simple function call to @code{printf}
1105
1106 Again, here is the code for the example:
1107
1108 @example
1109 #include <stdio.h>
1110 #include <lightning.h>
1111
1112 static jit_state_t *_jit;
1113
1114 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1115
1116 int main(int argc, char *argv[])
1117 @{
1118   pvfi          myFunction;             @rem{/* ptr to generated code */}
1119   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1120   jit_node_t    *in;                    @rem{/* to get the argument */}
1121
1122   init_jit(argv[0]);
1123   _jit = jit_new_state();
1124
1125   start = jit_note(__FILE__, __LINE__);
1126   jit_prolog();
1127   in = jit_arg();
1128   jit_getarg(JIT_R1, in);
1129   jit_prepare();
1130   jit_pushargi((jit_word_t)"generated %d bytes\n");
1131   jit_ellipsis();
1132   jit_pushargr(JIT_R1);
1133   jit_finishi(printf);
1134   jit_ret();
1135   jit_epilog();
1136   end = jit_note(__FILE__, __LINE__);
1137
1138   myFunction = jit_emit();
1139
1140   @rem{/* call the generated code@comma{} passing its size as argument */}
1141   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1142   jit_clear_state();
1143
1144   jit_disassemble();
1145
1146   jit_destroy_state();
1147   finish_jit();
1148   return 0;
1149 @}
1150 @end example
1151
1152 The function shows how many bytes were generated.  Most of the code
1153 is not very interesting, as it resembles very closely the program
1154 presented in @ref{incr, , A function which increments a number by one}.
1155
1156 For this reason, we're going to concentrate on just a few statements.
1157
1158 @table @t
1159 @item start = jit_note(__FILE__, __LINE__);
1160 @itemx @r{@dots{}}
1161 @itemx end = jit_note(__FILE__, __LINE__);
1162 These two instruction call the @code{jit_note} macro, which creates
1163 a note in the jit code; arguments to @code{jit_note} usually are a
1164 filename string and line number integer, but using NULL for the
1165 string argument is perfectly valid if only need to create a simple
1166 marker in the code.
1167
1168 @item jit_ellipsis();
1169 @code{ellipsis} usually is only required if calling varargs functions
1170 with double arguments, but it is a good practice to properly describe
1171 the @r{@dots{}} in the call sequence.
1172
1173 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1174 Note the use of the @code{(jit_word_t)} cast, that is used only
1175 to avoid a compiler warning, due to using a pointer where a
1176 wordsize integer type was expected.
1177
1178 @item jit_prepare();
1179 @itemx @r{@dots{}}
1180 @itemx jit_finishi(printf);
1181 Once the arguments to @code{printf} have been pushed, what means
1182 moving them to stack or register arguments, the @code{printf}
1183 function is called and the stack cleaned.  Note how @lightning{}
1184 abstracts the differences between different architectures and
1185 ABI's -- the client program does not know how parameter passing
1186 works on the host architecture.
1187
1188 @item jit_epilog();
1189 Usually it is not required to call @code{epilog}, but because it
1190 is implicitly called when noticing the end of a function, if the
1191 @code{end} variable was set with a @code{note} call after the
1192 @code{ret}, it would not consider the function epilog.
1193
1194 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1195 This calls the generate jit function passing as argument the offset
1196 difference from the @code{start} and @code{end} notes. The @code{address}
1197 call must be done after the @code{emit} call or either a fatal error
1198 will happen (if @lightning{} is built with assertions enable) or an
1199 undefined value will be returned.
1200
1201 @item jit_clear_state();
1202 Note that @code{jit_clear_state} was called after executing jit in
1203 this example. It was done because it must be called after any call
1204 to @code{jit_address} or @code{jit_print}.
1205
1206 @item jit_disassemble();
1207 @code{disassemble} will dump the generated code to standard output,
1208 unless @lightning{} was built with the disassembler disabled, in which
1209 case no output will be shown.
1210 @end table
1211
1212 @node RPN calculator
1213 @section A more complex example, an RPN calculator
1214
1215 We create a small stack-based RPN calculator which applies a series
1216 of operators to a given parameter and to other numeric operands.
1217 Unlike previous examples, the code generator is fully parameterized
1218 and is able to compile different formulas to different functions.
1219 Here is the code for the expression compiler; a sample usage will
1220 follow.
1221
1222 Since @lightning{} does not provide push/pop instruction, this
1223 example uses a stack-allocated area to store the data.  Such an
1224 area can be allocated using the macro @code{allocai}, which
1225 receives the number of bytes to allocate and returns the offset
1226 from the frame pointer register @code{FP} to the base of the
1227 area.
1228
1229 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1230 to access stack-allocated variables.  However, it is possible to
1231 use operations such as @code{add} to compute the address of the
1232 variables, and pass the address around.
1233
1234 @example
1235 #include <stdio.h>
1236 #include <lightning.h>
1237
1238 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1239
1240 static jit_state_t *_jit;
1241
1242 void stack_push(int reg, int *sp)
1243 @{
1244   jit_stxi_i (*sp, JIT_FP, reg);
1245   *sp += sizeof (int);
1246 @}
1247
1248 void stack_pop(int reg, int *sp)
1249 @{
1250   *sp -= sizeof (int);
1251   jit_ldxi_i (reg, JIT_FP, *sp);
1252 @}
1253
1254 jit_node_t *compile_rpn(char *expr)
1255 @{
1256   jit_node_t *in, *fn;
1257   int stack_base, stack_ptr;
1258
1259   fn = jit_note(NULL, 0);
1260   jit_prolog();
1261   in = jit_arg();
1262   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1263
1264   jit_getarg_i(JIT_R2, in);
1265
1266   while (*expr) @{
1267     char buf[32];
1268     int n;
1269     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1270       expr += n - 1;
1271       stack_push(JIT_R0, &stack_ptr);
1272       jit_movi(JIT_R0, atoi(buf));
1273     @} else if (*expr == 'x') @{
1274       stack_push(JIT_R0, &stack_ptr);
1275       jit_movr(JIT_R0, JIT_R2);
1276     @} else if (*expr == '+') @{
1277       stack_pop(JIT_R1, &stack_ptr);
1278       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1279     @} else if (*expr == '-') @{
1280       stack_pop(JIT_R1, &stack_ptr);
1281       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1282     @} else if (*expr == '*') @{
1283       stack_pop(JIT_R1, &stack_ptr);
1284       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1285     @} else if (*expr == '/') @{
1286       stack_pop(JIT_R1, &stack_ptr);
1287       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1288     @} else @{
1289       fprintf(stderr, "cannot compile: %s\n", expr);
1290       abort();
1291     @}
1292     ++expr;
1293   @}
1294   jit_retr(JIT_R0);
1295   jit_epilog();
1296   return fn;
1297 @}
1298 @end example
1299
1300 The principle on which the calculator is based is easy: the stack top
1301 is held in R0, while the remaining items of the stack are held in the
1302 memory area that we allocate with @code{allocai}.  Compiling a numeric
1303 operand or the argument @code{x} pushes the old stack top onto the
1304 stack and moves the operand into R0; compiling an operator pops the
1305 second operand off the stack into R1, and compiles the operation so
1306 that the result goes into R0, thus becoming the new stack top.
1307
1308 This example allocates a fixed area for 32 @code{int}s.  This is not
1309 a problem when the function is a leaf like in this case; in a full-blown
1310 compiler you will want to analyze the input and determine the number
1311 of needed stack slots---a very simple example of register allocation.
1312 The area is then managed like a stack using @code{stack_push} and
1313 @code{stack_pop}.
1314
1315 Source code for the client (which lies in the same source file) follows:
1316
1317 @example
1318 int main(int argc, char *argv[])
1319 @{
1320   jit_node_t *nc, *nf;
1321   pifi c2f, f2c;
1322   int i;
1323
1324   init_jit(argv[0]);
1325   _jit = jit_new_state();
1326
1327   nc = compile_rpn("32x9*5/+");
1328   nf = compile_rpn("x32-5*9/");
1329   (void)jit_emit();
1330   c2f = (pifi)jit_address(nc);
1331   f2c = (pifi)jit_address(nf);
1332   jit_clear_state();
1333
1334   printf("\nC:");
1335   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1336   printf("\nF:");
1337   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1338   printf("\n");
1339
1340   printf("\nF:");
1341   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1342   printf("\nC:");
1343   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1344   printf("\n");
1345
1346   jit_destroy_state();
1347   finish_jit();
1348   return 0;
1349 @}
1350 @end example
1351
1352 The client displays a conversion table between Celsius and Fahrenheit
1353 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1354 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1355 respectively.
1356
1357 Providing the formula as an argument to @code{compile_rpn} effectively
1358 parameterizes code generation, making it possible to use the same code
1359 to compile different functions; this is what makes dynamic code
1360 generation so powerful.
1361
1362 @node Fibonacci
1363 @section Fibonacci numbers
1364
1365 The code in this section calculates the Fibonacci sequence. That is
1366 modeled by the recurrence relation:
1367 @display
1368      f(0) = 0
1369      f(1) = f(2) = 1
1370      f(n) = f(n-1) + f(n-2)
1371 @end display
1372
1373 The purpose of this example is to introduce branches.  There are two
1374 kind of branches: backward branches and forward branches.  We'll
1375 present the calculation in a recursive and iterative form; the
1376 former only uses forward branches, while the latter uses both.
1377
1378 @example
1379 #include <stdio.h>
1380 #include <lightning.h>
1381
1382 static jit_state_t *_jit;
1383
1384 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1385
1386 int main(int argc, char *argv[])
1387 @{
1388   pifi       fib;
1389   jit_node_t *label;
1390   jit_node_t *call;
1391   jit_node_t *in;                 @rem{/* offset of the argument */}
1392   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1393   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1394
1395   init_jit(argv[0]);
1396   _jit = jit_new_state();
1397
1398   label = jit_label();
1399         jit_prolog   ();
1400   in =  jit_arg      ();
1401         jit_getarg   (JIT_V0, in);              @rem{/* R0 = n */}
1402  zero = jit_beqi     (JIT_R0, 0);
1403         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1404         jit_movi     (JIT_R0, 1);
1405   ref = jit_blei     (JIT_V0, 2);
1406         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1407         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1408         jit_prepare();
1409           jit_pushargr(JIT_V1);
1410         call = jit_finishi(NULL);
1411         jit_patch_at(call, label);
1412         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1413         jit_prepare();
1414           jit_pushargr(JIT_V2);
1415         call = jit_finishi(NULL);
1416         jit_patch_at(call, label);
1417         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1418         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1419
1420   jit_patch(ref);                               @rem{/* patch jump */}
1421   jit_patch(zero);                              @rem{/* patch jump */}
1422         jit_retr(JIT_R0);
1423
1424   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1425   fib = jit_emit();
1426   jit_clear_state();
1427   printf("fib(%d) = %d\n", 32, fib(32));
1428   jit_destroy_state();
1429   finish_jit();
1430   return 0;
1431 @}
1432 @end example
1433
1434 As said above, this is the first example of dynamically compiling
1435 branches.  Branch instructions have two operands containing the
1436 values to be compared, and return a @code{jit_note_t *} object
1437 to be patched.
1438
1439 Because labels final address are only known after calling @code{emit},
1440 it is required to call @code{patch} or @code{patch_at}, what does
1441 tell @lightning{} that the target to patch is actually a pointer to
1442 a @code{jit_node_t *} object, otherwise, it would assume that is
1443 a pointer to a C function. Note that conditional branches do not
1444 receive a label argument, so they must be patched.
1445
1446 You need to call @code{patch_at} on the return of value @code{calli},
1447 @code{finishi}, and @code{calli} if it is actually referencing a label
1448 in the jit code. All branch instructions do not receive a label
1449 argument. Note that @code{movi} is an special case, and patching it
1450 is usually done to get the final address of a label, usually to later
1451 call @code{jmpr}.
1452
1453 Now, here is the iterative version:
1454
1455 @example
1456 #include <stdio.h>
1457 #include <lightning.h>
1458
1459 static jit_state_t *_jit;
1460
1461 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1462
1463 int main(int argc, char *argv[])
1464 @{
1465   pifi       fib;
1466   jit_node_t *in;               @rem{/* offset of the argument */}
1467   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1468   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1469   jit_node_t *jump;             @rem{/* jump to start of loop */}
1470   jit_node_t *loop;             @rem{/* start of the loop */}
1471
1472   init_jit(argv[0]);
1473   _jit = jit_new_state();
1474
1475         jit_prolog   ();
1476   in =  jit_arg      ();
1477         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1478  zero = jit_beqi     (JIT_R0, 0);
1479         jit_movr     (JIT_R1, JIT_R0);
1480         jit_movi     (JIT_R0, 1);
1481   ref = jit_blti     (JIT_R1, 2);
1482         jit_subi     (JIT_R2, JIT_R2, 2);
1483         jit_movr     (JIT_R1, JIT_R0);
1484
1485   loop= jit_label();
1486         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1487         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1488         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1489         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1490   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1491   jit_patch_at(jump, loop);
1492
1493   jit_patch(ref);                               @rem{/* patch forward jump */}
1494   jit_patch(zero);                              @rem{/* patch forward jump */}
1495         jit_retr     (JIT_R0);
1496
1497   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1498   fib = jit_emit();
1499   jit_clear_state();
1500   printf("fib(%d) = %d\n", 36, fib(36));
1501   jit_destroy_state();
1502   finish_jit();
1503   return 0;
1504 @}
1505 @end example
1506
1507 This code calculates the recurrence relation using iteration (a
1508 @code{for} loop in high-level languages).  There are no function
1509 calls anymore: instead, there is a backward jump (the @code{bnei} at
1510 the end of the loop).
1511
1512 Note that the program must remember the address for backward jumps;
1513 for forward jumps it is only required to remember the jump code,
1514 and call @code{patch} for the implicit label.
1515
1516 @node Reentrancy
1517 @chapter Re-entrant usage of @lightning{}
1518
1519 @lightning{} uses the special @code{_jit} identifier. To be able
1520 to be able to use multiple jit generation states at the same
1521 time, it is required to used code similar to:
1522
1523 @example
1524     struct jit_state lightning;
1525     #define lightning _jit
1526 @end example
1527
1528 This will cause the symbol defined to @code{_jit} to be passed as
1529 the first argument to the underlying @lightning{} implementation,
1530 that is usually a function with an @code{_} (underscode) prefix
1531 and with an argument named @code{_jit}, in the pattern:
1532
1533 @example
1534     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1535     #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1536 @end example
1537
1538 The reason for this is to use the same syntax as the initial lightning
1539 implementation and to avoid needing the user to keep adding an extra
1540 argument to every call, as multiple jit states generating code in
1541 paralell should be very uncommon.
1542
1543 @node Registers
1544 @chapter Accessing the whole register file
1545
1546 As mentioned earlier in this chapter, all @lightning{} back-ends are
1547 guaranteed to have at least six general-purpose integer registers and
1548 six floating-point registers, but many back-ends will have more.
1549
1550 To access the entire register files, you can use the
1551 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1552 accept a parameter that identifies the register number, which
1553 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1554 and @code{JIT_F_NUM} respectively; the number need not be
1555 constant.  Of course, expressions like @code{JIT_R0} and
1556 @code{JIT_R(0)} denote the same register, and likewise for
1557 integer callee-saved, or floating-point, registers.
1558
1559 @section Scratch registers
1560
1561 For operations, @lightning{} does not support directly, like storing
1562 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1563 acquire and release a scratch register as in the following pattern:
1564
1565 @example
1566     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1567     jit_movi (reg, immediate);
1568     jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1569     jit_unget_reg (reg);
1570 @end example
1571
1572 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1573 reloads but don't follow branches, the code between both must be in
1574 the same basic block and must not contain any branches as in the
1575 following (bad) example.
1576
1577 @example
1578     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1579     jit_ldxi (reg, JIT_V0, offset);
1580     jump = jit_bnei (reg, V0);
1581     jit_movr (JIT_V1, reg);
1582     jit_patch (jump);
1583     jit_unget_reg (reg);
1584 @end example
1585
1586 @node Customizations
1587 @chapter Customizations
1588
1589 Frequently it is desirable to have more control over how code is
1590 generated or how memory is used during jit generation or execution.
1591
1592 @section Memory functions
1593 To aid in complete control of memory allocation and deallocation
1594 @lightning{} provides wrappers that default to standard @code{malloc},
1595 @code{realloc} and @code{free}. These are loosely based on the
1596 GNU GMP counterparts, with the difference that they use the same
1597 prototype of the system allocation functions, that is, no @code{size}
1598 for @code{free} or @code{old_size} for @code{realloc}.
1599
1600 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1601 @lightning{} guarantees that memory is only allocated or released
1602 using these wrapped functions, but you must note that if lightning
1603 was linked to GNU binutils, malloc is probably will be called multiple
1604 times from there when initializing the disassembler.
1605
1606 Because @code{init_jit} may call memory functions, if you need to call
1607 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1608 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1609 previous or default wrappers will be passed.
1610 @end deftypefun
1611
1612 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1613 Get the current memory allocation function. Also, unlike the GNU GMP
1614 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1615 @end deftypefun
1616
1617 @section Alternate code buffer
1618 To instruct @lightning{} to use an alternate code buffer it is required
1619 to call @code{jit_realize} before @code{jit_emit}, and then query states
1620 and customize as appropriate.
1621
1622 @deftypefun void jit_realize ()
1623 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1624 that no other @code{jit_xyz} call will be made.
1625 @end deftypefun
1626
1627 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1628 Returns NULL or the previous value set with @code{jit_set_code}, and
1629 sets the @var{code_size} argument to an appropriate value.
1630 If @code{jit_get_code} is called before @code{jit_emit}, the
1631 @var{code_size} argument is set to the expected amount of bytes
1632 required to generate code.
1633 If @code{jit_get_code} is called after @code{jit_emit}, the
1634 @var{code_size} argument is set to the exact amount of bytes used
1635 by the code.
1636 @end deftypefun
1637
1638 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1639 Instructs @lightning{} to output to the @var{code} argument and
1640 use @var{size} as a guard to not write to invalid memory. If during
1641 @code{jit_emit} @lightning{} finds out that the code would not fit
1642 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1643 @end deftypefun
1644
1645 A simple example of a loop using an alternate buffer is:
1646
1647 @example
1648   jit_uint8_t   *code;
1649   int           *(func)(int);      @rem{/* function pointer */}
1650   jit_word_t     code_size;
1651   jit_word_t     real_code_size;
1652   @rem{...}
1653   jit_realize();                   @rem{/* ready to generate code */}
1654   jit_get_code(&code_size);        @rem{/* get expected code size */}
1655   code_size = (code_size + 4095) & -4096;
1656   do (;;) @{
1657     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1658                 MAP_PRIVATE | MAP_ANON, -1, 0);
1659     jit_set_code(code, code_size);
1660     if ((func = jit_emit()) == NULL) @{
1661       munmap(code, code_size);
1662       code_size += 4096;
1663     @}
1664   @} while (func == NULL);
1665   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1666 @end example
1667
1668 The first call to @code{jit_get_code} should return @code{NULL} and set
1669 the @code{code_size} argument to the expected amount of bytes required
1670 to emit code.
1671 The second call to @code{jit_get_code} is after a successful call to
1672 @code{jit_emit}, and will return the value previously set with
1673 @code{jit_set_code} and set the @code{real_code_size} argument to the
1674 exact amount of bytes used to emit the code.
1675
1676 @section Alternate data buffer
1677 Sometimes it may be desirable to customize how, or to prevent
1678 @lightning{} from using an extra buffer for constants or debug
1679 annotation. Usually when also using an alternate code buffer.
1680
1681 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1682 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1683 and sets the @var{data_size} argument to how many bytes are required
1684 for the constants data buffer, and @var{note_size} to how many bytes
1685 are required to store the debug note information.
1686 Note that it always preallocate one debug note entry even if
1687 @code{jit_name} or @code{jit_note} are never called, but will return
1688 zero in the @var{data_size} argument if no constant is required;
1689 constants are only used for the @code{float} and @code{double} operations
1690 that have an immediate argument, and not in all @lightning{} ports.
1691 @end deftypefun
1692
1693 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1694
1695 @var{data} can be NULL if disabling constants and annotations, otherwise,
1696 a valid pointer must be passed. An assertion is done that the data will
1697 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1698 with @code{-DNDEBUG}).
1699
1700 @var{size} tells the space in bytes available in @var{data}.
1701
1702 @var{flags} can be zero to tell to just use the alternate data buffer,
1703 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1704
1705 @table @t
1706 @item JIT_DISABLE_DATA
1707 @cindex JIT_DISABLE_DATA
1708 Instructs @lightning{} to not use a constant table, but to use an
1709 alternate method to synthesize those, usually with a larger code
1710 sequence using stack space to transfer the value from a GPR to a
1711 FPR register.
1712
1713 @item JIT_DISABLE_NOTE
1714 @cindex JIT_DISABLE_NOTE
1715 Instructs @lightning{} to not store file or function name, and
1716 line numbers in the constant buffer.
1717 @end table
1718 @end deftypefun
1719
1720 A simple example of a preventing usage of a data buffer is:
1721
1722 @example
1723   @rem{...}
1724   jit_realize();                        @rem{/* ready to generate code */}
1725   jit_get_data(NULL, NULL);
1726   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1727   @rem{...}
1728 @end example
1729
1730 Or to only use a data buffer, if required:
1731
1732 @example
1733   jit_uint8_t   *data;
1734   jit_word_t     data_size;
1735   @rem{...}
1736   jit_realize();                        @rem{/* ready to generate code */}
1737   jit_get_data(&data_size, NULL);
1738   if (data_size)
1739     data = malloc(data_size);
1740   else
1741     data = NULL;
1742   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1743   @rem{...}
1744   if (data)
1745     free(data);
1746   @rem{...}
1747 @end example
1748
1749 @node Acknowledgements
1750 @chapter Acknowledgements
1751
1752 As far as I know, the first general-purpose portable dynamic code
1753 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1754 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1755 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1756 directly inspired @lightning{}.
1757
1758 Thanks go to Ian Piumarta, who kindly accepted to release his own
1759 program @sc{ccg} under the GNU General Public License, thereby allowing
1760 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1761 @sc{ccg} provides a way of dynamically assemble programs written in the
1762 underlying architecture's assembly language.  So it is not portable,
1763 yet very interesting.
1764
1765 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1766 was first developed as a tool to be used in GNU Smalltalk's dynamic
1767 translator from bytecodes to native code.