deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Customizations::          Advanced code generation customizations
  34 * Acknowledgements::        Acknowledgements for GNU lightning
  35 @end menu
  36 @end ifnottex
  37
  38 @node Overview
  39 @chapter Introduction to @lightning{}
  40
  41 @iftex
  42 This document describes @value{TOPIC} the @lightning{} library for
  43 dynamic code generation.
  44 @end iftex
  45
  46 Dynamic code generation is the generation of machine code
  47 at runtime. It is typically used to strip a layer of interpretation
  48 by allowing compilation to occur at runtime.  One of the most
  49 well-known applications of dynamic code generation is perhaps that
  50 of interpreters that compile source code to an intermediate bytecode
  51 form, which is then recompiled to machine code at run-time: this
  52 approach effectively combines the portability of bytecode
  53 representations with the speed of machine code.  Another common
  54 application of dynamic code generation is in the field of hardware
  55 simulators and binary emulators, which can use the same techniques
  56 to translate simulated instructions to the instructions of the
  57 underlying machine.
  58
  59 Yet other applications come to mind: for example, windowing
  60 @dfn{bitblt} operations, matrix manipulations, and network packet
  61 filters.  Albeit very powerful and relatively well known within the
  62 compiler community, dynamic code generation techniques are rarely
  63 exploited to their full potential and, with the exception of the
  64 two applications described above, have remained curiosities because
  65 of their portability and functionality barriers: binary instructions
  66 are generated, so programs using dynamic code generation must be
  67 retargeted for each machine; in addition, coding a run-time code
  68 generator is a tedious and error-prone task more than a difficult one.
  69
  70 @lightning{} provides a portable, fast and easily retargetable dynamic
  71 code generation system.
  72
  73 To be portable, @lightning{} abstracts over current architectures'
  74 quirks and unorthogonalities.  The interface that it exposes to is that
  75 of a standardized RISC architecture loosely based on the SPARC and MIPS
  76 chips.  There are a few general-purpose registers (six, not including
  77 those used to receive and pass parameters between subroutines), and
  78 arithmetic operations involve three operands---either three registers
  79 or two registers and an arbitrarily sized immediate value.
  80
  81 On one hand, this architecture is general enough that it is possible to
  82 generate pretty efficient code even on CISC architectures such as the
  83 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  84 real architectures closely enough that, most of the time, the
  85 compiler's constant folding pass ends up generating code which
  86 assembles machine instructions without further tests.
  87
  88 @node Installation
  89 @chapter Configuring and installing @lightning{}
  90
  91 The first thing to do to use @lightning{} is to configure the
  92 program, picking the set of macros to be used on the host
  93 architecture; this configuration is automatically performed by
  94 the @file{configure} shell script; to run it, merely type:
  95 @example
  96      ./configure
  97 @end example
  98
  99 @lightning{} supports the @code{--enable-disassembler} option, that
 100 enables linking to GNU binutils and optionally print human readable
 101 disassembly of the jit code. This option can be disabled by the
 102 @code{--disable-disassembler} option.
 103
 104 Another option that @file{configure} accepts is
 105 @code{--enable-assertions}, which enables several consistency checks in
 106 the run-time assemblers.  These are not usually needed, so you can
 107 decide to simply forget about it; also remember that these consistency
 108 checks tend to slow down your code generator.
 109
 110 After you've configured @lightning{}, run @file{make} as usual.
 111
 112 @lightning{} has an extensive set of tests to validate it is working
 113 correctly in the build host. To test it run:
 114 @example
 115     make check
 116 @end example
 117
 118 The next important step is:
 119 @example
 120     make install
 121 @end example
 122
 123 This ends the process of installing @lightning{}.
 124
 125 @node The instruction set
 126 @chapter @lightning{}'s instruction set
 127
 128 @lightning{}'s instruction set was designed by deriving instructions
 129 that closely match those of most existing RISC architectures, or
 130 that can be easily syntesized if absent.  Each instruction is composed
 131 of:
 132 @itemize @bullet
 133 @item
 134 an operation, like @code{sub} or @code{mul}
 135
 136 @item
 137 most times, a register/immediate flag (@code{r} or @code{i})
 138
 139 @item
 140 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 141 @end itemize
 142
 143 Examples of legal mnemonics are @code{addr} (integer add, with three
 144 register operands) and @code{muli} (integer multiply, with two
 145 register operands and an immediate operand).  Each instruction takes
 146 two or three operands; in most cases, one of them can be an immediate
 147 value instead of a register.
 148
 149 Most @lightning{} integer operations are signed wordsize operations,
 150 with the exception of operations that convert types, or load or store
 151 values to/from memory. When applicable, the types and C types are as
 152 follow:
 153
 154 @example
 155      _c         @r{signed char}
 156      _uc        @r{unsigned char}
 157      _s         @r{short}
 158      _us        @r{unsigned short}
 159      _i         @r{int}
 160      _ui        @r{unsigned int}
 161      _l         @r{long}
 162      _f         @r{float}
 163      _d         @r{double}
 164 @end example
 165
 166 Most integer operations do not need a type modifier, and when loading or
 167 storing values to memory there is an alias to the proper operation
 168 using wordsize operands, that is, if ommited, the type is @r{int} on
 169 32-bit architectures and @r{long} on 64-bit architectures.  Note
 170 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 171
 172 When an unsigned operation result differs from the equivalent signed
 173 operation, there is a the @code{_u} modifier.
 174
 175 There are at least seven integer registers, of which six are
 176 general-purpose, while the last is used to contain the frame pointer
 177 (@code{FP}).  The frame pointer can be used to allocate and access local
 178 variables on the stack, using the @code{allocai} or @code{allocar}
 179 instruction.
 180
 181 Of the general-purpose registers, at least three are guaranteed to be
 182 preserved across function calls (@code{V0}, @code{V1} and
 183 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 184 @code{R2}).  Six registers are not very much, but this
 185 restriction was forced by the need to target CISC architectures
 186 which, like the x86, are poor of registers; anyway, backends can
 187 specify the actual number of available registers with the calls
 188 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 189 (for callee-save registers).
 190
 191 There are at least six floating-point registers, named @code{F0} to
 192 @code{F5}.  These are usually caller-save and are separate from the integer
 193 registers on the supported architectures; on Intel architectures,
 194 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 195 the register stack is mapped to a flat register file.  As for the
 196 integer registers, the macro @code{JIT_F_NUM} yields the number of
 197 floating-point registers.
 198
 199 The complete instruction set follows; as you can see, most non-memory
 200 operations only take integers (either signed or unsigned) as operands;
 201 this was done in order to reduce the instruction set, and because most
 202 architectures only provide word and long word operations on registers.
 203 There are instructions that allow operands to be extended to fit a larger
 204 data type, both in a signed and in an unsigned way.
 205
 206 @table @b
 207 @item Binary ALU operations
 208 These accept three operands; the last one can be an immediate.
 209 @code{addx} operations must directly follow @code{addc}, and
 210 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 211 Most, if not all, architectures do not support @r{float} or @r{double}
 212 immediate operands; lightning emulates those operations by moving the
 213 immediate to a temporary register and emiting the call with only
 214 register operands.
 215 @example
 216 addr         _f  _d  O1 = O2 + O3
 217 addi         _f  _d  O1 = O2 + O3
 218 addxr                O1 = O2 + (O3 + carry)
 219 addxi                O1 = O2 + (O3 + carry)
 220 addcr                O1 = O2 + O3, set carry
 221 addci                O1 = O2 + O3, set carry
 222 subr         _f  _d  O1 = O2 - O3
 223 subi         _f  _d  O1 = O2 - O3
 224 subxr                O1 = O2 - (O3 + carry)
 225 subxi                O1 = O2 - (O3 + carry)
 226 subcr                O1 = O2 - O3, set carry
 227 subci                O1 = O2 - O3, set carry
 228 rsbr         _f  _d  O1 = O3 - O1
 229 rsbi         _f  _d  O1 = O3 - O1
 230 mulr         _f  _d  O1 = O2 * O3
 231 muli         _f  _d  O1 = O2 * O3
 232 divr     _u  _f  _d  O1 = O2 / O3
 233 divi     _u  _f  _d  O1 = O2 / O3
 234 remr     _u          O1 = O2 % O3
 235 remi     _u          O1 = O2 % O3
 236 andr                 O1 = O2 & O3
 237 andi                 O1 = O2 & O3
 238 orr                  O1 = O2 | O3
 239 ori                  O1 = O2 | O3
 240 xorr                 O1 = O2 ^ O3
 241 xori                 O1 = O2 ^ O3
 242 lshr                 O1 = O2 << O3
 243 lshi                 O1 = O2 << O3
 244 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 245 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 246 @end example
 247
 248 @item Four operand binary ALU operations
 249 These accept two result registers, and two operands; the last one can
 250 be an immediate. The first two arguments cannot be the same register.
 251
 252 @code{qmul} stores the low word of the result in @code{O1} and the
 253 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 254 means there was no overflow. For signed multiplication, no overflow
 255 check is based on sign, and can be detected if @code{O2} is zero or
 256 minus one.
 257
 258 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 259 @code{O2}. It can be used as quick way to check if a division is
 260 exact, in which case the remainder is zero.
 261
 262 @example
 263 qmulr    _u       O1 O2 = O3 * O4
 264 qmuli    _u       O1 O2 = O3 * O4
 265 qdivr    _u       O1 O2 = O3 / O4
 266 qdivi    _u       O1 O2 = O3 / O4
 267 @end example
 268
 269 @item Unary ALU operations
 270 These accept two operands, both of which must be registers.
 271 @example
 272 negr         _f  _d  O1 = -O2
 273 comr                 O1 = ~O2
 274 @end example
 275
 276 These unary ALU operations are only defined for float operands.
 277 @example
 278 absr         _f  _d  O1 = fabs(O2)
 279 sqrtr                O1 = sqrt(O2)
 280 @end example
 281
 282 Besides requiring the @code{r} modifier, there are no unary operations
 283 with an immediate operand.
 284
 285 @item Compare instructions
 286 These accept three operands; again, the last can be an immediate.
 287 The last two operands are compared, and the first operand, that must be
 288 an integer register, is set to either 0 or 1, according to whether the
 289 given condition was met or not.
 290
 291 The conditions given below are for the standard behavior of C,
 292 where the ``unordered'' comparison result is mapped to false.
 293
 294 @example
 295 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 296 lti       _u  _f  _d  O1 =  (O2 <  O3)
 297 ler       _u  _f  _d  O1 =  (O2 <= O3)
 298 lei       _u  _f  _d  O1 =  (O2 <= O3)
 299 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 300 gti       _u  _f  _d  O1 =  (O2 >  O3)
 301 ger       _u  _f  _d  O1 =  (O2 >= O3)
 302 gei       _u  _f  _d  O1 =  (O2 >= O3)
 303 eqr           _f  _d  O1 =  (O2 == O3)
 304 eqi           _f  _d  O1 =  (O2 == O3)
 305 ner           _f  _d  O1 =  (O2 != O3)
 306 nei           _f  _d  O1 =  (O2 != O3)
 307 unltr         _f  _d  O1 = !(O2 >= O3)
 308 unler         _f  _d  O1 = !(O2 >  O3)
 309 ungtr         _f  _d  O1 = !(O2 <= O3)
 310 unger         _f  _d  O1 = !(O2 <  O3)
 311 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 312 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 313 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 314 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 315 @end example
 316
 317 @item Transfer operations
 318 These accept two operands; for @code{ext} both of them must be
 319 registers, while @code{mov} accepts an immediate value as the second
 320 operand.
 321
 322 Unlike @code{movr} and @code{movi}, the other instructions are used
 323 to truncate a wordsize operand to a smaller integer data type or to
 324 convert float data types. You can also use @code{extr} to convert an
 325 integer to a floating point value: the usual options are @code{extr_f}
 326 and @code{extr_d}.
 327
 328 @example
 329 movr                                 _f  _d  O1 = O2
 330 movi                                 _f  _d  O1 = O2
 331 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 332 truncr                               _f  _d  O1 = trunc(O2)
 333 @end example
 334
 335 In 64-bit architectures it may be required to use @code{truncr_f_i},
 336 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 337 the equivalent C code.  Only the @code{_i} modifier is available in
 338 32-bit architectures.
 339
 340 @example
 341 truncr_f_i    = <int> O1 = <float> O2
 342 truncr_f_l    = <long>O1 = <float> O2
 343 truncr_d_i    = <int> O1 = <double>O2
 344 truncr_d_l    = <long>O1 = <double>O2
 345 @end example
 346
 347 The float conversion operations are @emph{destination first,
 348 source second}, but the order of the types is reversed.  This happens
 349 for historical reasons.
 350
 351 @example
 352 extr_f_d    = <double>O1 = <float> O2
 353 extr_d_f    = <float> O1 = <double>O2
 354 @end example
 355
 356 @item Network extensions
 357 These accept two operands, both of which must be registers; these
 358 two instructions actually perform the same task, yet they are
 359 assigned to two mnemonics for the sake of convenience and
 360 completeness.  As usual, the first operand is the destination and
 361 the second is the source.
 362 The @code{_ul} variant is only available in 64-bit architectures.
 363 @example
 364 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 365 ntohr    _us _ui _ul @r{Network-to-host order }
 366 @end example
 367
 368 @item Load operations
 369 @code{ld} accepts two operands while @code{ldx} accepts three;
 370 in both cases, the last can be either a register or an immediate
 371 value. Values are extended (with or without sign, according to
 372 the data type specification) to fit a whole register.
 373 The @code{_ui} and @code{_l} types are only available in 64-bit
 374 architectures.  For convenience, there is a version without a
 375 type modifier for integer or pointer operands that uses the
 376 appropriate wordsize call.
 377 @example
 378 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 379 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 380 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 381 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 382 @end example
 383
 384 @item Store operations
 385 @code{st} accepts two operands while @code{stx} accepts three; in
 386 both cases, the first can be either a register or an immediate
 387 value. Values are sign-extended to fit a whole register.
 388 @example
 389 str     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 390 sti     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *O1 = O2
 391 stxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 392 stxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  *(O1+O2) = O3
 393 @end example
 394 As for the load operations, the @code{_ui} and @code{_l} types are
 395 only available in 64-bit architectures, and for convenience, there
 396 is a version without a type modifier for integer or pointer operands
 397 that uses the appropriate wordsize call.
 398
 399 @item Argument management
 400 These are:
 401 @example
 402 prepare     (not specified)
 403 va_start    (not specified)
 404 pushargr                                   _f  _d
 405 pushargi                                   _f  _d
 406 va_push     (not specified)
 407 arg                                        _f  _d
 408 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 409 va_arg                                         _d
 410 putargr                                    _f  _d
 411 putargi                                    _f  _d
 412 ret         (not specified)
 413 retr                                       _f  _d
 414 reti                                       _f  _d
 415 va_end      (not specified)
 416 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 417 epilog      (not specified)
 418 @end example
 419 As with other operations that use a type modifier, the @code{_ui} and
 420 @code{_l} types are only available in 64-bit architectures, but there
 421 are operations without a type modifier that alias to the appropriate
 422 integer operation with wordsize operands.
 423
 424 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 425 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 426 A code snippet that wants to call another procedure and has to pass
 427 arguments must, in order: use the @code{prepare} instruction and use
 428 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 429 left to right order}; and use @code{finish} or @code{call} (explained below)
 430 to perform the actual call.
 431
 432 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 433 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 434 @code{va_push} is required when passing a @code{va_list} to another function,
 435 because not all architectures expect it as a single pointer. Known case
 436 is DEC Alpha, that requires it as a structure passed by value.
 437
 438 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 439 @code{arg} is different from other instruction in that it does not
 440 actually generate any code: instead, it is a function which returns
 441 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 442 a value'' means that @lightning{} code that compile these
 443 instructions return a value when expanded.} You should call
 444 @code{arg} as soon as possible, before any function call or, more
 445 easily, right after the @code{prolog} instructions
 446 (which is treated later).
 447
 448 @code{getarg} accepts a register argument and a value returned by
 449 @code{arg}, and will move that argument to the register, extending
 450 it (with or without sign, according to the data type specification)
 451 to fit a whole register.  These instructions are more intimately
 452 related to the usage of the @lightning{} instruction set in code
 453 that generates other code, so they will be treated more
 454 specifically in @ref{GNU lightning examples, , Generating code at
 455 run-time}.
 456
 457 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 458 it accepts as first argument a register or immediate, and as
 459 second argument a value returned by @code{arg}. It allows changing,
 460 or restoring an argument to the current function, and is a
 461 construct required to implement tail call optimization. Note that
 462 arguments in registers are very cheap, but will be overwritten
 463 at any moment, including on some operations, for example division,
 464 that on several ports is implemented as a function call.
 465
 466 Finally, the @code{retval} instruction fetches the return value of a
 467 called function in a register.  The @code{retval} instruction takes a
 468 register argument and copies the return value of the previously called
 469 function in that register.  A function with a return value should use
 470 @code{retr} or @code{reti} to put the return value in the return register
 471 before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 472
 473 @code{epilog} is an optional call, that marks the end of a function
 474 body. It is automatically generated by @lightning{} if starting a new
 475 function (what should be done after a @code{ret} call) or finishing
 476 generating jit.
 477 It is very important to note that the fact that @code{epilog} being
 478 optional may cause a common mistake. Consider this:
 479 @example
 480 fun1:
 481     prolog
 482     ...
 483     ret
 484 fun2:
 485     prolog
 486 @end example
 487 Because @code{epilog} is added when finding a new @code{prolog},
 488 this will cause the @code{fun2} label to actually be before the
 489 return from @code{fun1}. Because @lightning{} will actually
 490 understand it as:
 491 @example
 492 fun1:
 493     prolog
 494     ...
 495     ret
 496 fun2:
 497     epilog
 498     prolog
 499 @end example
 500
 501 You should observe a few rules when using these macros.  First of
 502 all, if calling a varargs function, you should use the @code{ellipsis}
 503 call to mark the position of the ellipsis in the C prototype.
 504
 505 You should not nest calls to @code{prepare} inside a
 506 @code{prepare/finish} block.  Doing this will result in undefined
 507 behavior. Note that for functions with zero arguments you can use
 508 just @code{call}.
 509
 510 @item Branch instructions
 511 Like @code{arg}, these also return a value which, in this case,
 512 is to be used to compile forward branches as explained in
 513 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 514 compared; of these, the last can be either a register or an immediate.
 515 They are:
 516 @example
 517 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 518 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 519 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 520 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 521 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 522 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 523 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 524 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 525 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 526 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 527 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 528 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 529
 530 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 531 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 532 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 533 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 534 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 535 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 536 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
 537 bunordr       _f  _d  @r{if }!(O2 != O2) ||  (O3 != O3)@r{ goto }O1
 538
 539 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 540 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 541 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 542 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 543 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 544 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 545 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 546 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 547 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 548 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 549 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 550 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 551 @end example
 552
 553 @item Jump and return operations
 554 These accept one argument except @code{ret} and @code{jmpi} which
 555 have none; the difference between @code{finishi} and @code{calli}
 556 is that the latter does not clean the stack from pushed parameters
 557 (if any) and the former must @strong{always} follow a @code{prepare}
 558 instruction.
 559 @example
 560 callr     (not specified)                @r{function call to register O1}
 561 calli     (not specified)                @r{function call to immediate O1}
 562 finishr   (not specified)                @r{function call to register O1}
 563 finishi   (not specified)                @r{function call to immediate O1}
 564 jmpr      (not specified)                @r{unconditional jump to register}
 565 jmpi      (not specified)                @r{unconditional jump}
 566 ret       (not specified)                @r{return from subroutine}
 567 retr      _c _uc _s _us _i _ui _l _f _d
 568 reti      _c _uc _s _us _i _ui _l _f _d
 569 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 570                                          @r{to register}
 571 @end example
 572
 573 Like branch instruction, @code{jmpi} also returns a value which is to
 574 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
 575 numbers}.
 576
 577 @item Labels
 578 There are 3 @lightning{} instructions to create labels:
 579 @example
 580 label     (not specified)                @r{simple label}
 581 forward   (not specified)                @r{forward label}
 582 indirect  (not specified)                @r{special simple label}
 583 @end example
 584
 585 @code{label} is normally used as @code{patch_at} argument for backward
 586 jumps.
 587
 588 @example
 589         jit_node_t *jump, *label;
 590 label = jit_label();
 591         ...
 592         jump = jit_beqr(JIT_R0, JIT_R1);
 593         jit_patch_at(jump, label);
 594 @end example
 595
 596 @code{forward} is used to patch code generation before the actual
 597 position of the label is known.
 598
 599 @example
 600         jit_node_t *jump, *label;
 601 label = jit_forward();
 602         jump = jit_beqr(JIT_R0, JIT_R1);
 603         jit_patch_at(jump, label);
 604         ...
 605         jit_link(label);
 606 @end example
 607
 608 @code{indirect} is useful when creating jump tables, and tells
 609 @lightning{} to not optimize out a label that is not the target of
 610 any jump, because an indirect jump may land where it is defined.
 611
 612 @example
 613         jit_node_t *jump, *label;
 614         ...
 615         jmpr(JIT_R0);                    @rem{/* may jump to label */}
 616         ...
 617 label = jit_indirect();
 618 @end example
 619
 620 @code{indirect} is an special case of @code{note} and @code{name}
 621 because it is a valid argument to @code{address}.
 622
 623 Note that the usual idiom to write the previous example is
 624 @example
 625         jit_node_t *addr, *jump;
 626 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 627         ...
 628         jmpr(JIT_R0);
 629         ...
 630         jit_patch(addr);                 @rem{/* implicit label added */}
 631 @end example
 632
 633 that automatically binds the implicit label added by @code{patch} with
 634 the @code{movi}, but on some special conditions it is required to create
 635 an "unbound" label.
 636
 637 @item Function prolog
 638
 639 These macros are used to set up a function prolog.  The @code{allocai}
 640 call accept a single integer argument and returns an offset value
 641 for stack storage access.  The @code{allocar} accepts two registers
 642 arguments, the first is set to the offset for stack access, and the
 643 second is the size in bytes argument.
 644
 645 @example
 646 prolog    (not specified)                @r{function prolog}
 647 allocai   (not specified)                @r{reserve space on the stack}
 648 allocar   (not specified)                @r{allocate space on the stack}
 649 @end example
 650
 651 @code{allocai} receives the number of bytes to allocate and returns
 652 the offset from the frame pointer register @code{FP} to the base of
 653 the area.
 654
 655 @code{allocar} receives two register arguments.  The first is where
 656 to store the offset from the frame pointer register @code{FP} to the
 657 base of the area.  The second argument is the size in bytes.  Note
 658 that @code{allocar} is dynamic allocation, and special attention
 659 should be taken when using it.  If called in a loop, every iteration
 660 will allocate stack space.  Stack space is aligned from 8 to 64 bytes
 661 depending on backend requirements, even if allocating only one byte.
 662 It is advisable to not use it with @code{frame} and @code{tramp}; it
 663 should work with @code{frame} with special care to call only once,
 664 but is not supported if used in @code{tramp}, even if called only
 665 once.
 666
 667 As a small appetizer, here is a small function that adds 1 to the input
 668 parameter (an @code{int}).  I'm using an assembly-like syntax here which
 669 is a bit different from the one used when writing real subroutines with
 670 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
 671 examples, , Generating code at run-time}.
 672
 673 @example
 674 incr:
 675      prolog
 676 in = arg                     @rem{! We have an integer argument}
 677      getarg    R0, in        @rem{! Move it to R0}
 678      addi      R0, R0, 1     @rem{! Add 1}
 679      retr      R0            @rem{! And return the result}
 680 @end example
 681
 682 And here is another function which uses the @code{printf} function from
 683 the standard C library to write a number in hexadecimal notation:
 684
 685 @example
 686 printhex:
 687      prolog
 688 in = arg                     @rem{! Same as above}
 689      getarg    R0, in
 690      prepare                 @rem{! Begin call sequence for printf}
 691      pushargi  "%x"          @rem{! Push format string}
 692      ellipsis                @rem{! Varargs start here}
 693      pushargr  R0            @rem{! Push second argument}
 694      finishi   printf        @rem{! Call printf}
 695      ret                     @rem{! Return to caller}
 696 @end example
 697
 698 @item Trampolines, continuations and tail call optimization
 699
 700 Frequently it is required to generate jit code that must jump to
 701 code generated later, possibly from another @code{jit_context_t}.
 702 These require compatible stack frames.
 703
 704 @lightning{} provides two primitives from where trampolines,
 705 continuations and tail call optimization can be implemented.
 706
 707 @example
 708 frame   (not specified)                  @r{create stack frame}
 709 tramp   (not specified)                  @r{assume stack frame}
 710 @end example
 711
 712 @code{frame} receives an integer argument@footnote{It is not
 713 automatically computed because it does not know about the
 714 requirement of later generated code.} that defines the size in
 715 bytes for the stack frame of the current, @code{C} callable,
 716 jit function. To calculate this value, a good formula is maximum
 717 number of arguments to any called native function times
 718 eight@footnote{Times eight so that it works for double arguments.
 719 And would not need conditionals for ports that pass arguments in
 720 the stack.}, plus the sum of the arguments to any call to
 721 @code{jit_allocai}. @lightning{} automatically adjusts this value
 722 for any backend specific stack memory it may need, or any
 723 alignment constraint.
 724
 725 @code{frame} also instructs @lightning{} to save all callee
 726 save registers in the prolog and reload in the epilog.
 727
 728 @example
 729 main:                        @rem{! jit entry point}
 730      prolog                  @rem{! function prolog}
 731      frame  256              @rem{! save all callee save registers and}
 732                              @rem{! reserve at least 256 bytes in stack}
 733 main_loop:
 734      ...
 735      jmpi   handler          @rem{! jumps to external code}
 736      ...
 737      ret                     @rem{! return to the caller}
 738 @end example
 739
 740 @code{tramp} differs from @code{frame} only that a prolog and epilog
 741 will not be generated. Note that @code{prolog} must still be used.
 742 The code under @code{tramp} must be ready to be entered with a jump
 743 at the prolog position, and instead of a return, it must end with
 744 a non conditional jump. @code{tramp} exists solely for the fact
 745 that it allows optimizing out prolog and epilog code that would
 746 never be executed.
 747
 748 @example
 749 handler:                     @rem{! handler entry point}
 750      prolog                  @rem{! function prolog}
 751      tramp  256              @rem{! assumes all callee save registers}
 752                              @rem{! are saved and there is at least}
 753                              @rem{! 256 bytes in stack}
 754      ...
 755      jmpi   main_loop        @rem{! return to the main loop}
 756 @end example
 757
 758 @lightning{} only supports Tail Call Optimization using the
 759 @code{tramp} construct. Any other way is not guaranteed to
 760 work on all ports.
 761
 762 An example of a simple (recursive) tail call optimization:
 763
 764 @example
 765 factorial:                   @rem{! Entry point of the factorial function}
 766      prolog
 767 in = arg                     @rem{! Receive an integer argument}
 768      getarg R0, in           @rem{! Move argument to RO}
 769      prepare
 770          pushargi 1          @rem{! This is the accumulator}
 771          pushargr R0         @rem{! This is the argument}
 772      finishi fact            @rem{! Call the tail call optimized function}
 773      retval R0               @rem{! Fetch the result}
 774      retr R0                 @rem{! Return it}
 775      epilog                  @rem{! Epilog *before* label before prolog}
 776
 777 fact:                        @rem{! Entry point of the helper function}
 778      prolog
 779      frame 16                @rem{! Reserve 16 bytes in the stack}
 780 fact_entry:                  @rem{! This is the tail call entry point}
 781 ac = arg                     @rem{! The accumulator is the first argument}
 782 in = arg                     @rem{! The factorial argument}
 783      getarg R0, ac           @rem{! Move the accumulator to R0}
 784      getarg R1, in           @rem{! Move the argument to R1}
 785      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
 786      mulr R0, R0, R1         @rem{! accumulator *= argument}
 787      putargr R0, ac          @rem{! Update the accumulator}
 788      subi R1, R1, 1          @rem{! argument -= 1}
 789      putargr R1, in          @rem{! Update the argument}
 790      jmpi fact_entry         @rem{! Tail Call Optimize it!}
 791 fact_out:
 792      retr R0                 @rem{! Return the accumulator}
 793 @end example
 794
 795 @item Predicates
 796 @example
 797 forward_p      (not specified)           @r{forward label predicate}
 798 indirect_p     (not specified)           @r{indirect label predicate}
 799 target_p       (not specified)           @r{used label predicate}
 800 arg_register_p (not specified)           @r{argument kind predicate}
 801 callee_save_p  (not specified)           @r{callee save predicate}
 802 pointer_p      (not specified)           @r{pointer predicate}
 803 @end example
 804
 805 @code{forward_p} expects a @code{jit_node_t*} argument, and
 806 returns non zero if it is a forward label reference, that is,
 807 a label returned by @code{forward}, that still needs a
 808 @code{link} call.
 809
 810 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
 811 non zero if it is an indirect label reference, that is, a label that
 812 was returned by @code{indirect}.
 813
 814 @code{target_p} expects a @code{jit_node_t*} argument, that is any
 815 kind of label, and will return non zero if there is at least one
 816 jump or move referencing it.
 817
 818 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
 819 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
 820 will return non zero if the argument lives in a register. This call
 821 is useful to know the live range of register arguments, as those
 822 are very fast to read and write, but have volatile values.
 823
 824 @code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
 825 @code{JIT_Fn}, and will return non zero if the register is callee
 826 save. This call is useful because on several ports, the @code{JIT_Rn}
 827 and @code{JIT_Fn} registers are actually callee save; no need
 828 to save and load the values when making function calls.
 829
 830 @code{pointer_p} expects a pointer argument, and will return non
 831 zero if the pointer is inside the generated jit code. Must be
 832 called after @code{jit_emit} and before @code{jit_destroy_state}.
 833 @end table
 834
 835 @node GNU lightning examples
 836 @chapter Generating code at run-time
 837
 838 To use @lightning{}, you should include the @file{lightning.h} file that
 839 is put in your include directory by the @samp{make install} command.
 840
 841 Each of the instructions above translates to a macro or function call.
 842 All you have to do is prepend @code{jit_} (lowercase) to opcode names
 843 and @code{JIT_} (uppercase) to register names.  Of course, parameters
 844 are to be put between parentheses.
 845
 846 This small tutorial presents three examples:
 847
 848 @iftex
 849 @itemize @bullet
 850 @item
 851 The @code{incr} function found in @ref{The instruction set, ,
 852 @lightning{}'s instruction set}:
 853
 854 @item
 855 A simple function call to @code{printf}
 856
 857 @item
 858 An RPN calculator.
 859
 860 @item
 861 Fibonacci numbers
 862 @end itemize
 863 @end iftex
 864 @ifnottex
 865 @menu
 866 * incr::             A function which increments a number by one
 867 * printf::           A simple function call to printf
 868 * RPN calculator::   A more complex example, an RPN calculator
 869 * Fibonacci::        Calculating Fibonacci numbers
 870 @end menu
 871 @end ifnottex
 872
 873 @node incr
 874 @section A function which increments a number by one
 875
 876 Let's see how to create and use the sample @code{incr} function created
 877 in @ref{The instruction set, , @lightning{}'s instruction set}:
 878
 879 @example
 880 #include <stdio.h>
 881 #include <lightning.h>
 882
 883 static jit_state_t *_jit;
 884
 885 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
 886
 887 int main(int argc, char *argv[])
 888 @{
 889   jit_node_t  *in;
 890   pifi         incr;
 891
 892   init_jit(argv[0]);
 893   _jit = jit_new_state();
 894
 895   jit_prolog();                    @rem{/* @t{     prolog             } */}
 896   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
 897   jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0          } */}
 898   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
 899   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
 900
 901   incr = jit_emit();
 902   jit_clear_state();
 903
 904   @rem{/* call the generated code@comma{} passing 5 as an argument */}
 905   printf("%d + 1 = %d\n", 5, incr(5));
 906
 907   jit_destroy_state();
 908   finish_jit();
 909   return 0;
 910 @}
 911 @end example
 912
 913 Let's examine the code line by line (well, almost@dots{}):
 914
 915 @table @t
 916 @item #include <lightning.h>
 917 You already know about this.  It defines all of @lightning{}'s macros.
 918
 919 @item static jit_state_t *_jit;
 920 You might wonder about what is @code{jit_state_t}.  It is a structure
 921 that stores jit code generation information.  The name @code{_jit} is
 922 special, because since multiple jit generators can run at the same
 923 time, you must either @r{#define _jit my_jit_state} or name it
 924 @code{_jit}.
 925
 926 @item typedef int (*pifi)(int);
 927 Just a handy typedef for a pointer to a function that takes an
 928 @code{int} and returns another.
 929
 930 @item jit_node_t  *in;
 931 Declares a variable to hold an identifier for a function argument. It
 932 is an opaque pointer, that will hold the return of a call to @code{arg}
 933 and be used as argument to @code{getarg}.
 934
 935 @item pifi         incr;
 936 Declares a function pointer variable to a function that receives an
 937 @code{int} and returns an @code{int}.
 938
 939 @item init_jit(argv[0]);
 940 You must call this function before creating a @code{jit_state_t}
 941 object. This function does global state initialization, and may need
 942 to detect CPU or Operating System features.  It receives a string
 943 argument that is later used to read symbols from a shared object using
 944 GNU binutils if disassembly was enabled at configure time. If no
 945 disassembly will be performed a NULL pointer can be used as argument.
 946
 947 @item _jit = jit_new_state();
 948 This call initializes a @lightning{} jit state.
 949
 950 @item jit_prolog();
 951 Ok, so we start generating code for our beloved function@dots{}
 952
 953 @item in = jit_arg();
 954 @itemx jit_getarg(JIT_R0, in);
 955 We retrieve the first (and only) argument, an integer, and store it
 956 into the general-purpose register @code{R0}.
 957
 958 @item jit_addi(JIT_R0, JIT_R0, 1);
 959 We add one to the content of the register.
 960
 961 @item jit_retr(JIT_R0);
 962 This instruction generates a standard function epilog that returns
 963 the contents of the @code{R0} register.
 964
 965 @item incr = jit_emit();
 966 This instruction is very important.  It actually translates the
 967 @lightning{} macros used before to machine code, flushes the generated
 968 code area out of the processor's instruction cache and return a
 969 pointer to the start of the code.
 970
 971 @item jit_clear_state();
 972 This call cleanups any data not required for jit execution. Note
 973 that it must be called after any call to @code{jit_print} or
 974 @code{jit_address}, as this call destroy the @lightning{}
 975 intermediate representation.
 976
 977 @item printf("%d + 1 = %d", 5, incr(5));
 978 Calling our function is this simple---it is not distinguishable from
 979 a normal C function call, the only difference being that @code{incr}
 980 is a variable.
 981
 982 @item jit_destroy_state();
 983 Releases all memory associated with the jit context. It should be
 984 called after known the jit will no longer be called.
 985
 986 @item finish_jit();
 987 This call cleanups any global state hold by @lightning{}, and is
 988 advisable to call it once jit code will no longer be generated.
 989 @end table
 990
 991 @lightning{} abstracts two phases of dynamic code generation: selecting
 992 instructions that map the standard representation, and emitting binary
 993 code for these instructions.  The client program has the responsibility
 994 of describing the code to be generated using the standard @lightning{}
 995 instruction set.
 996
 997 Let's examine the code generated for @code{incr} on the SPARC and x86_64
 998 architecture (on the right is the code that an assembly-language
 999 programmer would write):
1000
1001 @table @b
1002 @item SPARC
1003 @example
1004       save  %sp, -112, %sp
1005       mov  %i0, %g2                 retl
1006       inc  %g2                      inc %o0
1007       mov  %g2, %i0
1008       restore
1009       retl
1010       nop
1011 @end example
1012 In this case, @lightning{} introduces overhead to create a register
1013 window (not knowing that the procedure is a leaf procedure) and to
1014 move the argument to the general purpose register @code{R0} (which
1015 maps to @code{%g2} on the SPARC).
1016 @end table
1017
1018 @table @b
1019 @item x86_64
1020 @example
1021     sub   $0x30,%rsp
1022     mov   %rbp,(%rsp)
1023     mov   %rsp,%rbp
1024     sub   $0x18,%rsp
1025     mov   %rdi,%rax            mov %rdi, %rax
1026     add   $0x1,%rax            inc %rax
1027     mov   %rbp,%rsp
1028     mov   (%rsp),%rbp
1029     add   $0x30,%rsp
1030     retq                       retq
1031 @end example
1032 In this case, the main overhead is due to the function's prolog and
1033 epilog, and stack alignment after reserving stack space for word
1034 to/from float conversions or moving data from/to x87 to/from SSE.
1035 Note that besides allocating space to save callee saved registers,
1036 no registers are saved/restored because @lightning{} notices those
1037 registers are not modified. There is currently no logic to detect
1038 if it needs to allocate stack space for type conversions neither
1039 proper leaf function detection, but these are subject to change
1040 (FIXME).
1041 @end table
1042
1043 @node printf
1044 @section A simple function call to @code{printf}
1045
1046 Again, here is the code for the example:
1047
1048 @example
1049 #include <stdio.h>
1050 #include <lightning.h>
1051
1052 static jit_state_t *_jit;
1053
1054 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1055
1056 int main(int argc, char *argv[])
1057 @{
1058   pvfi          myFunction;             @rem{/* ptr to generated code */}
1059   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1060   jit_node_t    *in;                    @rem{/* to get the argument */}
1061
1062   init_jit(argv[0]);
1063   _jit = jit_new_state();
1064
1065   start = jit_note(__FILE__, __LINE__);
1066   jit_prolog();
1067   in = jit_arg();
1068   jit_getarg(JIT_R1, in);
1069   jit_prepare();
1070   jit_pushargi((jit_word_t)"generated %d bytes\n");
1071   jit_ellipsis();
1072   jit_pushargr(JIT_R1);
1073   jit_finishi(printf);
1074   jit_ret();
1075   jit_epilog();
1076   end = jit_note(__FILE__, __LINE__);
1077
1078   myFunction = jit_emit();
1079
1080   @rem{/* call the generated code@comma{} passing its size as argument */}
1081   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1082   jit_clear_state();
1083
1084   jit_disassemble();
1085
1086   jit_destroy_state();
1087   finish_jit();
1088   return 0;
1089 @}
1090 @end example
1091
1092 The function shows how many bytes were generated.  Most of the code
1093 is not very interesting, as it resembles very closely the program
1094 presented in @ref{incr, , A function which increments a number by one}.
1095
1096 For this reason, we're going to concentrate on just a few statements.
1097
1098 @table @t
1099 @item start = jit_note(__FILE__, __LINE__);
1100 @itemx @r{@dots{}}
1101 @itemx end = jit_note(__FILE__, __LINE__);
1102 These two instruction call the @code{jit_note} macro, which creates
1103 a note in the jit code; arguments to @code{jit_note} usually are a
1104 filename string and line number integer, but using NULL for the
1105 string argument is perfectly valid if only need to create a simple
1106 marker in the code.
1107
1108 @item jit_ellipsis();
1109 @code{ellipsis} usually is only required if calling varargs functions
1110 with double arguments, but it is a good practice to properly describe
1111 the @r{@dots{}} in the call sequence.
1112
1113 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1114 Note the use of the @code{(jit_word_t)} cast, that is used only
1115 to avoid a compiler warning, due to using a pointer where a
1116 wordsize integer type was expected.
1117
1118 @item jit_prepare();
1119 @itemx @r{@dots{}}
1120 @itemx jit_finishi(printf);
1121 Once the arguments to @code{printf} have been pushed, what means
1122 moving them to stack or register arguments, the @code{printf}
1123 function is called and the stack cleaned.  Note how @lightning{}
1124 abstracts the differences between different architectures and
1125 ABI's -- the client program does not know how parameter passing
1126 works on the host architecture.
1127
1128 @item jit_epilog();
1129 Usually it is not required to call @code{epilog}, but because it
1130 is implicitly called when noticing the end of a function, if the
1131 @code{end} variable was set with a @code{note} call after the
1132 @code{ret}, it would not consider the function epilog.
1133
1134 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1135 This calls the generate jit function passing as argument the offset
1136 difference from the @code{start} and @code{end} notes. The @code{address}
1137 call must be done after the @code{emit} call or either a fatal error
1138 will happen (if @lightning{} is built with assertions enable) or an
1139 undefined value will be returned.
1140
1141 @item jit_clear_state();
1142 Note that @code{jit_clear_state} was called after executing jit in
1143 this example. It was done because it must be called after any call
1144 to @code{jit_address} or @code{jit_print}.
1145
1146 @item jit_disassemble();
1147 @code{disassemble} will dump the generated code to standard output,
1148 unless @lightning{} was built with the disassembler disabled, in which
1149 case no output will be shown.
1150 @end table
1151
1152 @node RPN calculator
1153 @section A more complex example, an RPN calculator
1154
1155 We create a small stack-based RPN calculator which applies a series
1156 of operators to a given parameter and to other numeric operands.
1157 Unlike previous examples, the code generator is fully parameterized
1158 and is able to compile different formulas to different functions.
1159 Here is the code for the expression compiler; a sample usage will
1160 follow.
1161
1162 Since @lightning{} does not provide push/pop instruction, this
1163 example uses a stack-allocated area to store the data.  Such an
1164 area can be allocated using the macro @code{allocai}, which
1165 receives the number of bytes to allocate and returns the offset
1166 from the frame pointer register @code{FP} to the base of the
1167 area.
1168
1169 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1170 to access stack-allocated variables.  However, it is possible to
1171 use operations such as @code{add} to compute the address of the
1172 variables, and pass the address around.
1173
1174 @example
1175 #include <stdio.h>
1176 #include <lightning.h>
1177
1178 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1179
1180 static jit_state_t *_jit;
1181
1182 void stack_push(int reg, int *sp)
1183 @{
1184   jit_stxi_i (*sp, JIT_FP, reg);
1185   *sp += sizeof (int);
1186 @}
1187
1188 void stack_pop(int reg, int *sp)
1189 @{
1190   *sp -= sizeof (int);
1191   jit_ldxi_i (reg, JIT_FP, *sp);
1192 @}
1193
1194 jit_node_t *compile_rpn(char *expr)
1195 @{
1196   jit_node_t *in, *fn;
1197   int stack_base, stack_ptr;
1198
1199   fn = jit_note(NULL, 0);
1200   jit_prolog();
1201   in = jit_arg();
1202   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1203
1204   jit_getarg_i(JIT_R2, in);
1205
1206   while (*expr) @{
1207     char buf[32];
1208     int n;
1209     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1210       expr += n - 1;
1211       stack_push(JIT_R0, &stack_ptr);
1212       jit_movi(JIT_R0, atoi(buf));
1213     @} else if (*expr == 'x') @{
1214       stack_push(JIT_R0, &stack_ptr);
1215       jit_movr(JIT_R0, JIT_R2);
1216     @} else if (*expr == '+') @{
1217       stack_pop(JIT_R1, &stack_ptr);
1218       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1219     @} else if (*expr == '-') @{
1220       stack_pop(JIT_R1, &stack_ptr);
1221       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1222     @} else if (*expr == '*') @{
1223       stack_pop(JIT_R1, &stack_ptr);
1224       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1225     @} else if (*expr == '/') @{
1226       stack_pop(JIT_R1, &stack_ptr);
1227       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1228     @} else @{
1229       fprintf(stderr, "cannot compile: %s\n", expr);
1230       abort();
1231     @}
1232     ++expr;
1233   @}
1234   jit_retr(JIT_R0);
1235   jit_epilog();
1236   return fn;
1237 @}
1238 @end example
1239
1240 The principle on which the calculator is based is easy: the stack top
1241 is held in R0, while the remaining items of the stack are held in the
1242 memory area that we allocate with @code{allocai}.  Compiling a numeric
1243 operand or the argument @code{x} pushes the old stack top onto the
1244 stack and moves the operand into R0; compiling an operator pops the
1245 second operand off the stack into R1, and compiles the operation so
1246 that the result goes into R0, thus becoming the new stack top.
1247
1248 This example allocates a fixed area for 32 @code{int}s.  This is not
1249 a problem when the function is a leaf like in this case; in a full-blown
1250 compiler you will want to analyze the input and determine the number
1251 of needed stack slots---a very simple example of register allocation.
1252 The area is then managed like a stack using @code{stack_push} and
1253 @code{stack_pop}.
1254
1255 Source code for the client (which lies in the same source file) follows:
1256
1257 @example
1258 int main(int argc, char *argv[])
1259 @{
1260   jit_node_t *nc, *nf;
1261   pifi c2f, f2c;
1262   int i;
1263
1264   init_jit(argv[0]);
1265   _jit = jit_new_state();
1266
1267   nc = compile_rpn("32x9*5/+");
1268   nf = compile_rpn("x32-5*9/");
1269   (void)jit_emit();
1270   c2f = (pifi)jit_address(nc);
1271   f2c = (pifi)jit_address(nf);
1272   jit_clear_state();
1273
1274   printf("\nC:");
1275   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1276   printf("\nF:");
1277   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1278   printf("\n");
1279
1280   printf("\nF:");
1281   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1282   printf("\nC:");
1283   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1284   printf("\n");
1285
1286   jit_destroy_state();
1287   finish_jit();
1288   return 0;
1289 @}
1290 @end example
1291
1292 The client displays a conversion table between Celsius and Fahrenheit
1293 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1294 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1295 respectively.
1296
1297 Providing the formula as an argument to @code{compile_rpn} effectively
1298 parameterizes code generation, making it possible to use the same code
1299 to compile different functions; this is what makes dynamic code
1300 generation so powerful.
1301
1302 @node Fibonacci
1303 @section Fibonacci numbers
1304
1305 The code in this section calculates the Fibonacci sequence. That is
1306 modeled by the recurrence relation:
1307 @display
1308      f(0) = 0
1309      f(1) = f(2) = 1
1310      f(n) = f(n-1) + f(n-2)
1311 @end display
1312
1313 The purpose of this example is to introduce branches.  There are two
1314 kind of branches: backward branches and forward branches.  We'll
1315 present the calculation in a recursive and iterative form; the
1316 former only uses forward branches, while the latter uses both.
1317
1318 @example
1319 #include <stdio.h>
1320 #include <lightning.h>
1321
1322 static jit_state_t *_jit;
1323
1324 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1325
1326 int main(int argc, char *argv[])
1327 @{
1328   pifi       fib;
1329   jit_node_t *label;
1330   jit_node_t *call;
1331   jit_node_t *in;                 @rem{/* offset of the argument */}
1332   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1333   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1334
1335   init_jit(argv[0]);
1336   _jit = jit_new_state();
1337
1338   label = jit_label();
1339         jit_prolog   ();
1340   in =  jit_arg      ();
1341         jit_getarg   (JIT_V0, in);              @rem{/* R0 = n */}
1342  zero = jit_beqi     (JIT_R0, 0);
1343         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1344         jit_movi     (JIT_R0, 1);
1345   ref = jit_blei     (JIT_V0, 2);
1346         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1347         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1348         jit_prepare();
1349           jit_pushargr(JIT_V1);
1350         call = jit_finishi(NULL);
1351         jit_patch_at(call, label);
1352         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1353         jit_prepare();
1354           jit_pushargr(JIT_V2);
1355         call = jit_finishi(NULL);
1356         jit_patch_at(call, label);
1357         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1358         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1359
1360   jit_patch(ref);                               @rem{/* patch jump */}
1361   jit_patch(zero);                              @rem{/* patch jump */}
1362         jit_retr(JIT_R0);
1363
1364   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1365   fib = jit_emit();
1366   jit_clear_state();
1367   printf("fib(%d) = %d\n", 32, fib(32));
1368   jit_destroy_state();
1369   finish_jit();
1370   return 0;
1371 @}
1372 @end example
1373
1374 As said above, this is the first example of dynamically compiling
1375 branches.  Branch instructions have two operands containing the
1376 values to be compared, and return a @code{jit_note_t *} object
1377 to be patched.
1378
1379 Because labels final address are only known after calling @code{emit},
1380 it is required to call @code{patch} or @code{patch_at}, what does
1381 tell @lightning{} that the target to patch is actually a pointer to
1382 a @code{jit_node_t *} object, otherwise, it would assume that is
1383 a pointer to a C function. Note that conditional branches do not
1384 receive a label argument, so they must be patched.
1385
1386 You need to call @code{patch_at} on the return of value @code{calli},
1387 @code{finishi}, and @code{calli} if it is actually referencing a label
1388 in the jit code. All branch instructions do not receive a label
1389 argument. Note that @code{movi} is an special case, and patching it
1390 is usually done to get the final address of a label, usually to later
1391 call @code{jmpr}.
1392
1393 Now, here is the iterative version:
1394
1395 @example
1396 #include <stdio.h>
1397 #include <lightning.h>
1398
1399 static jit_state_t *_jit;
1400
1401 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1402
1403 int main(int argc, char *argv[])
1404 @{
1405   pifi       fib;
1406   jit_node_t *in;               @rem{/* offset of the argument */}
1407   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1408   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1409   jit_node_t *jump;             @rem{/* jump to start of loop */}
1410   jit_node_t *loop;             @rem{/* start of the loop */}
1411
1412   init_jit(argv[0]);
1413   _jit = jit_new_state();
1414
1415         jit_prolog   ();
1416   in =  jit_arg      ();
1417         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1418  zero = jit_beqi     (JIT_R0, 0);
1419         jit_movr     (JIT_R1, JIT_R0);
1420         jit_movi     (JIT_R0, 1);
1421   ref = jit_blti     (JIT_R1, 2);
1422         jit_subi     (JIT_R2, JIT_R2, 2);
1423         jit_movr     (JIT_R1, JIT_R0);
1424
1425   loop= jit_label();
1426         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1427         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1428         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1429         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1430   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1431   jit_patch_at(jump, loop);
1432
1433   jit_patch(ref);                               @rem{/* patch forward jump */}
1434   jit_patch(zero);                              @rem{/* patch forward jump */}
1435         jit_retr     (JIT_R0);
1436
1437   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1438   fib = jit_emit();
1439   jit_clear_state();
1440   printf("fib(%d) = %d\n", 36, fib(36));
1441   jit_destroy_state();
1442   finish_jit();
1443   return 0;
1444 @}
1445 @end example
1446
1447 This code calculates the recurrence relation using iteration (a
1448 @code{for} loop in high-level languages).  There are no function
1449 calls anymore: instead, there is a backward jump (the @code{bnei} at
1450 the end of the loop).
1451
1452 Note that the program must remember the address for backward jumps;
1453 for forward jumps it is only required to remember the jump code,
1454 and call @code{patch} for the implicit label.
1455
1456 @node Reentrancy
1457 @chapter Re-entrant usage of @lightning{}
1458
1459 @lightning{} uses the special @code{_jit} identifier. To be able
1460 to be able to use multiple jit generation states at the same
1461 time, it is required to used code similar to:
1462
1463 @example
1464     struct jit_state lightning;
1465     #define lightning _jit
1466 @end example
1467
1468 This will cause the symbol defined to @code{_jit} to be passed as
1469 the first argument to the underlying @lightning{} implementation,
1470 that is usually a function with an @code{_} (underscode) prefix
1471 and with an argument named @code{_jit}, in the pattern:
1472
1473 @example
1474     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1475     #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1476 @end example
1477
1478 The reason for this is to use the same syntax as the initial lightning
1479 implementation and to avoid needing the user to keep adding an extra
1480 argument to every call, as multiple jit states generating code in
1481 paralell should be very uncommon.
1482
1483 @section Registers
1484 @chapter Accessing the whole register file
1485
1486 As mentioned earlier in this chapter, all @lightning{} back-ends are
1487 guaranteed to have at least six general-purpose integer registers and
1488 six floating-point registers, but many back-ends will have more.
1489
1490 To access the entire register files, you can use the
1491 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1492 accept a parameter that identifies the register number, which
1493 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1494 and @code{JIT_F_NUM} respectively; the number need not be
1495 constant.  Of course, expressions like @code{JIT_R0} and
1496 @code{JIT_R(0)} denote the same register, and likewise for
1497 integer callee-saved, or floating-point, registers.
1498
1499 @node Customizations
1500 @chapter Customizations
1501
1502 Frequently it is desirable to have more control over how code is
1503 generated or how memory is used during jit generation or execution.
1504
1505 @section Memory functions
1506 To aid in complete control of memory allocation and deallocation
1507 @lightning{} provides wrappers that default to standard @code{malloc},
1508 @code{realloc} and @code{free}. These are loosely based on the
1509 GNU GMP counterparts, with the difference that they use the same
1510 prototype of the system allocation functions, that is, no @code{size}
1511 for @code{free} or @code{old_size} for @code{realloc}.
1512
1513 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1514 @lightning{} guarantees that memory is only allocated or released
1515 using these wrapped functions, but you must note that if lightning
1516 was linked to GNU binutils, malloc is probably will be called multiple
1517 times from there when initializing the disassembler.
1518
1519 Because @code{init_jit} may call memory functions, if you need to call
1520 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1521 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1522 previous or default wrappers will be passed.
1523 @end deftypefun
1524
1525 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1526 Get the current memory allocation function. Also, unlike the GNU GMP
1527 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1528 @end deftypefun
1529
1530 @section Alternate code buffer
1531 To instruct @lightning{} to use an alternate code buffer it is required
1532 to call @code{jit_realize} before @code{jit_emit}, and then query states
1533 and customize as appropriate.
1534
1535 @deftypefun void jit_realize ()
1536 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1537 that no other @code{jit_xyz} call will be made.
1538 @end deftypefun
1539
1540 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1541 Returns NULL or the previous value set with @code{jit_set_code}, and
1542 sets the @var{code_size} argument to an appropriate value.
1543 If @code{jit_get_code} is called before @code{jit_emit}, the
1544 @var{code_size} argument is set to the expected amount of bytes
1545 required to generate code.
1546 If @code{jit_get_code} is called after @code{jit_emit}, the
1547 @var{code_size} argument is set to the exact amount of bytes used
1548 by the code.
1549 @end deftypefun
1550
1551 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1552 Instructs @lightning{} to output to the @var{code} argument and
1553 use @var{size} as a guard to not write to invalid memory. If during
1554 @code{jit_emit} @lightning{} finds out that the code would not fit
1555 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1556 @end deftypefun
1557
1558 A simple example of a loop using an alternate buffer is:
1559
1560 @example
1561   jit_uint8_t   *code;
1562   int           *(func)(int);      @rem{/* function pointer */}
1563   jit_word_t     code_size;
1564   jit_word_t     real_code_size;
1565   @rem{...}
1566   jit_realize();                   @rem{/* ready to generate code */}
1567   jit_get_code(&code_size);        @rem{/* get expected code size */}
1568   code_size = (code_size + 4095) & -4096;
1569   do (;;) @{
1570     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1571                 MAP_PRIVATE | MAP_ANON, -1, 0);
1572     jit_set_code(code, code_size);
1573     if ((func = jit_emit()) == NULL) @{
1574       munmap(code, code_size);
1575       code_size += 4096;
1576     @}
1577   @} while (func == NULL);
1578   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1579 @end example
1580
1581 The first call to @code{jit_get_code} should return @code{NULL} and set
1582 the @code{code_size} argument to the expected amount of bytes required
1583 to emit code.
1584 The second call to @code{jit_get_code} is after a successful call to
1585 @code{jit_emit}, and will return the value previously set with
1586 @code{jit_set_code} and set the @code{real_code_size} argument to the
1587 exact amount of bytes used to emit the code.
1588
1589 @section Alternate data buffer
1590 Sometimes it may be desirable to customize how, or to prevent
1591 @lightning{} from using an extra buffer for constants or debug
1592 annotation. Usually when also using an alternate code buffer.
1593
1594 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1595 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1596 and sets the @var{data_size} argument to how many bytes are required
1597 for the constants data buffer, and @var{note_size} to how many bytes
1598 are required to store the debug note information.
1599 Note that it always preallocate one debug note entry even if
1600 @code{jit_name} or @code{jit_note} are never called, but will return
1601 zero in the @var{data_size} argument if no constant is required;
1602 constants are only used for the @code{float} and @code{double} operations
1603 that have an immediate argument, and not in all @lightning{} ports.
1604 @end deftypefun
1605
1606 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1607
1608 @var{data} can be NULL if disabling constants and annotations, otherwise,
1609 a valid pointer must be passed. An assertion is done that the data will
1610 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1611 with @code{-DNDEBUG}).
1612
1613 @var{size} tells the space in bytes available in @var{data}.
1614
1615 @var{flags} can be zero to tell to just use the alternate data buffer,
1616 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1617
1618 @table @t
1619 @item JIT_DISABLE_DATA
1620 @cindex JIT_DISABLE_DATA
1621 Instructs @lightning{} to not use a constant table, but to use an
1622 alternate method to synthesize those, usually with a larger code
1623 sequence using stack space to transfer the value from a GPR to a
1624 FPR register.
1625
1626 @item JIT_DISABLE_NOTE
1627 @cindex JIT_DISABLE_NOTE
1628 Instructs @lightning{} to not store file or function name, and
1629 line numbers in the constant buffer.
1630 @end table
1631 @end deftypefun
1632
1633 A simple example of a preventing usage of a data buffer is:
1634
1635 @example
1636   @rem{...}
1637   jit_realize();                        @rem{/* ready to generate code */}
1638   jit_get_data(NULL, NULL);
1639   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1640   @rem{...}
1641 @end example
1642
1643 Or to only use a data buffer, if required:
1644
1645 @example
1646   jit_uint8_t   *data;
1647   jit_word_t     data_size;
1648   @rem{...}
1649   jit_realize();                        @rem{/* ready to generate code */}
1650   jit_get_data(&data_size, NULL);
1651   if (data_size)
1652     data = malloc(data_size);
1653   else
1654     data = NULL;
1655   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1656   @rem{...}
1657   if (data)
1658     free(data);
1659   @rem{...}
1660 @end example
1661
1662 @node Acknowledgements
1663 @chapter Acknowledgements
1664
1665 As far as I know, the first general-purpose portable dynamic code
1666 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1667 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1668 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1669 directly inspired @lightning{}.
1670
1671 Thanks go to Ian Piumarta, who kindly accepted to release his own
1672 program @sc{ccg} under the GNU General Public License, thereby allowing
1673 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1674 @sc{ccg} provides a way of dynamically assemble programs written in the
1675 underlying architecture's assembly language.  So it is not portable,
1676 yet very interesting.
1677
1678 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1679 was first developed as a tool to be used in GNU Smalltalk's dynamic
1680 translator from bytecodes to native code.