deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Registers::               Accessing the whole register file
  34 * Customizations::          Advanced code generation customizations
  35 * Acknowledgements::        Acknowledgements for GNU lightning
  36 @end menu
  37 @end ifnottex
  38
  39 @node Overview
  40 @chapter Introduction to @lightning{}
  41
  42 @iftex
  43 This document describes @value{TOPIC} the @lightning{} library for
  44 dynamic code generation.
  45 @end iftex
  46
  47 Dynamic code generation is the generation of machine code
  48 at runtime. It is typically used to strip a layer of interpretation
  49 by allowing compilation to occur at runtime.  One of the most
  50 well-known applications of dynamic code generation is perhaps that
  51 of interpreters that compile source code to an intermediate bytecode
  52 form, which is then recompiled to machine code at run-time: this
  53 approach effectively combines the portability of bytecode
  54 representations with the speed of machine code.  Another common
  55 application of dynamic code generation is in the field of hardware
  56 simulators and binary emulators, which can use the same techniques
  57 to translate simulated instructions to the instructions of the
  58 underlying machine.
  59
  60 Yet other applications come to mind: for example, windowing
  61 @dfn{bitblt} operations, matrix manipulations, and network packet
  62 filters.  Albeit very powerful and relatively well known within the
  63 compiler community, dynamic code generation techniques are rarely
  64 exploited to their full potential and, with the exception of the
  65 two applications described above, have remained curiosities because
  66 of their portability and functionality barriers: binary instructions
  67 are generated, so programs using dynamic code generation must be
  68 retargeted for each machine; in addition, coding a run-time code
  69 generator is a tedious and error-prone task more than a difficult one.
  70
  71 @lightning{} provides a portable, fast and easily retargetable dynamic
  72 code generation system.
  73
  74 To be portable, @lightning{} abstracts over current architectures'
  75 quirks and unorthogonalities.  The interface that it exposes is that
  76 of a standardized RISC architecture loosely based on the SPARC and MIPS
  77 chips.  There are a few general-purpose registers (six, not including
  78 those used to receive and pass parameters between subroutines), and
  79 arithmetic operations involve three operands---either three registers
  80 or two registers and an arbitrarily sized immediate value.
  81
  82 On one hand, this architecture is general enough that it is possible to
  83 generate pretty efficient code even on CISC architectures such as the
  84 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  85 real architectures closely enough that, most of the time, the
  86 compiler's constant folding pass ends up generating code which
  87 assembles machine instructions without further tests.
  88
  89 @node Installation
  90 @chapter Configuring and installing @lightning{}
  91
  92 Here we will assume that your system already has the dependencies
  93 necessary to build @lightning{}. For more on dependencies, see
  94 @lightning{}'s @file{README-hacking} file.
  95
  96 The first thing to do to build @lightning{} is to configure the
  97 program, picking the set of macros to be used on the host
  98 architecture; this configuration is automatically performed by
  99 the @file{configure} shell script; to run it, merely type:
 100 @example
 101      ./configure
 102 @end example
 103
 104 The @file{configure} script accepts the @code{--enable-disassembler} option,
 105 which enables linking to GNU binutils and optionally printing human-readable
 106 disassembly of the jit code. This option can be disabled with the
 107 @code{--disable-disassembler} option.
 108
 109 @file{configure} also accepts the @code{--enable-devel-disassembler}
 110 option, useful to check exactly what machine instructions were generated
 111 for a @lightning{} instruction. It basically mixes @code{jit_print} and
 112 @code{jit_disassemble}.
 113
 114 The @code{--enable-assertions} option enables several consistency
 115 checks in the run-time assemblers.  These are not usually needed, so you
 116 can decide to simply forget about it; also remember that these consistency
 117 checks tend to slow down your code generator.
 118
 119 The @code{--enable-devel-strong-type-checking} option does extra type
 120 checking using @code{assert}. This option also enables
 121 @code{--enable-assertions} unless it is explicitly disabled.
 122
 123 The option @code{--enable-devel-get-jit-size} should only be used
 124 when doing updates or maintenance to lightning. It regenerates the
 125 @code{jit_$ARCH-sz.c} file, creating a table of maximum byte usage when
 126 translating a @lightning{} instruction to machine code.
 127
 128 After you've configured @lightning{}, run @file{make} as usual.
 129
 130 @lightning{} has an extensive set of tests to validate that it is working
 131 correctly on the build host. To run them:
 132 @example
 133     make check
 134 @end example
 135
 136 The next important step is:
 137 @example
 138     make install
 139 @end example
 140
 141 This ends the process of installing @lightning{}.
 142
 143 @node The instruction set
 144 @chapter @lightning{}'s instruction set
 145
 146 @lightning{}'s instruction set was designed by deriving instructions
 147 that closely match those of most existing RISC architectures, or
 148 that can be easily synthesized if absent.  Each instruction is composed
 149 of:
 150 @itemize @bullet
 151 @item
 152 an operation, like @code{sub} or @code{mul}
 153
 154 @item
 155 most times, a register/immediate flag (@code{r} or @code{i})
 156
 157 @item
 158 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 159 @end itemize
 160
 161 Examples of legal mnemonics are @code{addr} (integer add, with three
 162 register operands) and @code{muli} (integer multiply, with two
 163 register operands and an immediate operand).  Each instruction takes
 164 two or three operands; in most cases, one of them can be an immediate
 165 value instead of a register.
 166
 167 Most @lightning{} integer operations are signed wordsize operations,
 168 with the exception of operations that convert types, or load or store
 169 values to/from memory. When applicable, the types and C types are as
 170 follows:
 171
 172 @example
 173      _c         @r{signed char}
 174      _uc        @r{unsigned char}
 175      _s         @r{short}
 176      _us        @r{unsigned short}
 177      _i         @r{int}
 178      _ui        @r{unsigned int}
 179      _l         @r{long}
 180      _f         @r{float}
 181      _d         @r{double}
 182 @end example
 183
 184 Most integer operations do not need a type modifier, and when loading or
 185 storing values to memory there is an alias to the proper operation
 186 using wordsize operands, that is, if omitted, the type is @r{int} on
 187 32-bit architectures and @r{long} on 64-bit architectures.  Note
 188 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 189
 190 When an unsigned operation result differs from the equivalent signed
 191 operation, there is the @code{_u} modifier.
 192
 193 There are at least seven integer registers, of which six are
 194 general-purpose, while the last is used to contain the frame pointer
 195 (@code{FP}).  The frame pointer can be used to allocate and access local
 196 variables on the stack, using the @code{allocai} or @code{allocar}
 197 instruction.
 198
 199 Of the general-purpose registers, at least three are guaranteed to be
 200 preserved across function calls (@code{V0}, @code{V1} and
 201 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 202 @code{R2}).  Six registers are not very many, but this
 203 restriction was forced by the need to target CISC architectures
 204 which, like the x86, are short of registers; anyway, backends can
 205 specify the actual number of available registers with the macros
 206 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 207 (for callee-save registers).
 208
 209 There are at least six floating-point registers, named @code{F0} to
 210 @code{F5}.  These are usually caller-save and are separate from the integer
 211 registers on the supported architectures; on Intel architectures,
 212 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 213 the register stack is mapped to a flat register file.  As for the
 214 integer registers, the macro @code{JIT_F_NUM} yields the number of
 215 floating-point registers.
 216
 217 The complete instruction set follows; as you can see, most non-memory
 218 operations only take integers (either signed or unsigned) as operands;
 219 this was done in order to reduce the instruction set, and because most
 220 architectures only provide word and long word operations on registers.
 221 There are instructions that allow operands to be extended to fit a larger
 222 data type, both in a signed and in an unsigned way.
 223
 224 @table @b
 225 @item Binary ALU operations
 226 These accept three operands; the last one can be an immediate.
 227 @code{addx} operations must directly follow @code{addc}, and
 228 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 229 Most, if not all, architectures do not support @r{float} or @r{double}
 230 immediate operands; lightning emulates those operations by moving the
 231 immediate to a temporary register and emitting the call with only
 232 register operands.
 233 @example
 234 addr         _f  _d  O1 = O2 + O3
 235 addi         _f  _d  O1 = O2 + O3
 236 addxr                O1 = O2 + (O3 + carry)
 237 addxi                O1 = O2 + (O3 + carry)
 238 addcr                O1 = O2 + O3, set carry
 239 addci                O1 = O2 + O3, set carry
 240 subr         _f  _d  O1 = O2 - O3
 241 subi         _f  _d  O1 = O2 - O3
 242 subxr                O1 = O2 - (O3 + carry)
 243 subxi                O1 = O2 - (O3 + carry)
 244 subcr                O1 = O2 - O3, set carry
 245 subci                O1 = O2 - O3, set carry
 246 rsbr         _f  _d  O1 = O3 - O2
 247 rsbi         _f  _d  O1 = O3 - O2
 248 mulr         _f  _d  O1 = O2 * O3
 249 muli         _f  _d  O1 = O2 * O3
 250 hmulr    _u          O1 = ((O2 * O3) >> WORDSIZE)
 251 hmuli    _u          O1 = ((O2 * O3) >> WORDSIZE)
 252 divr     _u  _f  _d  O1 = O2 / O3
 253 divi     _u  _f  _d  O1 = O2 / O3
 254 remr     _u          O1 = O2 % O3
 255 remi     _u          O1 = O2 % O3
 256 andr                 O1 = O2 & O3
 257 andi                 O1 = O2 & O3
 258 orr                  O1 = O2 | O3
 259 ori                  O1 = O2 | O3
 260 xorr                 O1 = O2 ^ O3
 261 xori                 O1 = O2 ^ O3
 262 lshr                 O1 = O2 << O3
 263 lshi                 O1 = O2 << O3
 264 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 265 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 266 lrotr                O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3))
 267 lroti                O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3))
 268 rrotr                O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3))
 269 rroti                O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3))
 270 movzr                O1 = O3 ? O1 : O2
 271 movnr                O1 = O3 ? O2 : O1
 272 @end example
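
The pairing of @code{addc} and @code{addx} is what makes multi-word arithmetic possible. The following C sketch is not @lightning{} code; it only models the arithmetic of an @code{addc} on the low words followed directly by an @code{addx} on the high words. The @code{struct two_words} type and the @code{add_two_words} helper are hypothetical names introduced for this illustration.

```c
#include <stdint.h>

/* Hypothetical two-word integer, used only for this illustration. */
struct two_words {
    uintptr_t low;
    uintptr_t high;
};

/* Models addc on the low words followed directly by addx on the high words. */
static struct two_words add_two_words(struct two_words a, struct two_words b)
{
    struct two_words sum;
    uintptr_t carry;

    sum.low = a.low + b.low;              /* addc: O1 = O2 + O3, set carry */
    carry = sum.low < a.low;              /* unsigned wrap-around means a carry out */
    sum.high = a.high + (b.high + carry); /* addx: O1 = O2 + (O3 + carry) */
    return sum;
}
```

In this model the carry is an explicit variable, whereas with the real instructions it is an implicit state, which is why nothing may be emitted between the two.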
 273
 274 Note that @code{lrotr}, @code{lroti}, @code{rrotr} and @code{rroti}
 275 are described as the fallback operation. These are bit shift/rotation
 276 operations.
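
As a rough illustration of the fallback described above, a rotation can be written in C as two shifts joined by an @code{or}. This is only a sketch under assumptions: @code{WORDSIZE} is taken to be the width of @code{uintptr_t}, the helper names are hypothetical, and the masking of the count is added here because shifting by the full word width is undefined in C.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the word size equals the width of uintptr_t. */
#define WORDSIZE ((unsigned)(sizeof(uintptr_t) * CHAR_BIT))

/* Rotate left: bits shifted out at the top re-enter at the bottom. */
static uintptr_t rotate_left(uintptr_t value, unsigned count)
{
    count &= WORDSIZE - 1;
    if (count == 0)
        return value;   /* avoid the undefined shift by WORDSIZE */
    return (value << count) | (value >> (WORDSIZE - count));
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
static uintptr_t rotate_right(uintptr_t value, unsigned count)
{
    count &= WORDSIZE - 1;
    if (count == 0)
        return value;
    return (value >> count) | (value << (WORDSIZE - count));
}
```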
 277
 278 @item Four operand binary ALU operations
 279 These accept two result registers, and two operands; the last one can
 280 be an immediate. The first two arguments cannot be the same register.
 281
 282 @code{qmul} stores the low word of the result in @code{O1} and the
 283 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 284 means there was no overflow. For signed multiplication, the overflow
 285 check depends on the sign: there was no overflow if @code{O2} is zero
 286 (non-negative result) or minus one (negative result).
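
The overflow checks on the high word can be modelled in plain C. The sketch below assumes a 32-bit word so that the full product fits in a 64-bit integer; the @code{struct word_pair} type and all function names are hypothetical and are not part of @lightning{}.

```c
#include <stdint.h>

/* Result pair of a widening multiply, modelled for a 32-bit word. */
struct word_pair {
    uint32_t low;   /* what qmul would store in O1 */
    uint32_t high;  /* what qmul would store in O2 */
};

static struct word_pair widening_mul_unsigned(uint32_t a, uint32_t b)
{
    uint64_t bits = (uint64_t)a * (uint64_t)b;
    struct word_pair r = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return r;
}

static struct word_pair widening_mul_signed(int32_t a, int32_t b)
{
    /* The 64-bit product cannot overflow; converting it to uint64_t is modular. */
    uint64_t bits = (uint64_t)((int64_t)a * (int64_t)b);
    struct word_pair r = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return r;
}

/* Unsigned: any bit set in the high word means the result did not fit. */
static int unsigned_overflowed(struct word_pair r)
{
    return r.high != 0;
}

/* Signed: the high word must be all zeros or all ones, matching the low word's sign. */
static int signed_overflowed(struct word_pair r)
{
    uint32_t sign_fill = (r.low & UINT32_C(0x80000000)) ? UINT32_C(0xFFFFFFFF) : 0;
    return r.high != sign_fill;
}
```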
 287
 288 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 289 @code{O2}. It can be used as a quick way to check if a division is
 290 exact, in which case the remainder is zero.
 291
 292 @code{qlsh} shifts from 0 to @emph{wordsize}, doing a normal left
 293 shift for the first result register and setting the second result
 294 register to the overflow bits. @code{qlsh} can be used as a quick
 295 way to multiply by powers of two.
 296
 297 @code{qrsh} shifts from 0 to @emph{wordsize}, doing a normal right
 298 shift for the first result register and setting the second result
 299 register to the overflow bits. @code{qrsh} can be used as a quick
 300 way to divide by powers of two.
 301
 302 Note that @code{qlsh} and @code{qrsh} are basically implemented as
 303 two shifts. It is undefined behavior to pass a value not in the range
 304 0 to @emph{wordsize}. Most CPUs will usually @code{and} the shift
 305 amount with @emph{wordsize} - 1, or possibly use the @emph{remainder}.
 306 @lightning{} only generates code to specially handle 0 and @emph{wordsize}
 307 shifts. Since a code generator for a @emph{safe language} should
 308 usually check the shift amount, these instructions usually should be
 309 used as a fast path to check for division without remainder or
 310 multiplication that does not overflow.
 311
 312 @example
 313 qmulr    _u       O1 O2 = O3 * O4
 314 qmuli    _u       O1 O2 = O3 * O4
 315 qdivr    _u       O1 O2 = O3 / O4
 316 qdivi    _u       O1 O2 = O3 / O4
 317 qlshr    _u       O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
 318 qlshi    _u       O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
 319 qrshr    _u       O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
 320 qrshi    _u       O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
 321 @end example
 322
 323 These four operand ALU operations are only defined for float operands.
 324
 325 @example
 326 fmar         _f  _d  O1 =  O2 * O3 + O4
 327 fmai         _f  _d  O1 =  O2 * O3 + O4
 328 fmsr         _f  _d  O1 =  O2 * O3 - O4
 329 fmsi         _f  _d  O1 =  O2 * O3 - O4
 330 fnmar        _f  _d  O1 = -O2 * O3 - O4
 331 fnmai        _f  _d  O1 = -O2 * O3 - O4
 332 fnmsr        _f  _d  O1 = -O2 * O3 + O4
 333 fnmsi        _f  _d  O1 = -O2 * O3 + O4
 334 @end example
 335
 336 These are a family of fused multiply-add instructions.
 337 Note that @lightning{} does not handle rounding modes nor math exceptions.
 338 Also note that not all backends provide an instruction for the equivalent
 339 @lightning{} instruction presented above. Some are completely implemented
 340 as fallbacks and some are composed of one or more instructions. For common
 341 input this should not cause major issues, but note that when implemented by
 342 the CPU, the multiplication is calculated with infinite precision, and
 343 rounding is done only after the addition step. Due to this, for
 344 specially crafted input different ports might show different output. When
 345 implemented by the CPU, it is also possible to have exceptions that do
 346 not happen if implemented as a fallback.
 347
 348 @item Unary ALU operations
 349 These accept two operands, the first must be a register and the
 350 second is a register if the @code{r} modifier is used, otherwise,
 351 the @code{i} modifier is used and the second argument is a constant.
 352
 353 @example
 354 negr         _f  _d  O1 = -O2
 355 negi         _f  _d  O1 = -O2
 356 comr                 O1 = ~O2
 357 comi                 O1 = ~O2
 358 clor                 O1 = number of leading one bits in O2
 359 cloi                 O1 = number of leading one bits in O2
 360 clzr                 O1 = number of leading zero bits in O2
 361 clzi                 O1 = number of leading zero bits in O2
 362 ctor                 O1 = number of trailing one bits in O2
 363 ctoi                 O1 = number of trailing one bits in O2
 364 ctzr                 O1 = number of trailing zero bits in O2
 365 ctzi                 O1 = number of trailing zero bits in O2
 366 rbitr                O1 = bits of O2 reversed
 367 rbiti                O1 = bits of O2 reversed
 368 popcntr              O1 = number of bits set in O2
 369 popcnti              O1 = number of bits set in O2
 370 @end example
 371
 372 Note that @code{ctzr} is basically equivalent to the @code{C} call
 373 @code{ffs} but indexed at bit zero, not one.
 374
 375 Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
 376 value of zero is not an error, it just returns the number of bits
 377 in a word, 64 if @lightning{} generates 64 bit instructions, otherwise
 378 it returns 32.
 379
 380 The @code{clor} and @code{ctor} are just counterparts of the versions
 381 that search for zero bits.
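
The zero-input behavior and the relation between the zero-counting and one-counting variants can be modelled in portable C. This is a sketch only: @code{WORDSIZE} is assumed to be the width of @code{uintptr_t}, the function names are hypothetical, and the loops stand in for whatever instruction or fallback a backend actually uses.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the word size equals the width of uintptr_t. */
#define WORDSIZE ((unsigned)(sizeof(uintptr_t) * CHAR_BIT))

/* Models clz: scans from the top bit; yields WORDSIZE for a zero input. */
static unsigned count_leading_zeros(uintptr_t value)
{
    unsigned count = 0;
    uintptr_t probe = (uintptr_t)1 << (WORDSIZE - 1);

    while (probe != 0 && (value & probe) == 0) {
        count++;
        probe >>= 1;
    }
    return count;
}

/* Models ctz: scans from bit zero; yields WORDSIZE for a zero input. */
static unsigned count_trailing_zeros(uintptr_t value)
{
    unsigned count = 0;
    uintptr_t probe = 1;

    while (probe != 0 && (value & probe) == 0) {
        count++;
        probe <<= 1;
    }
    return count;
}

/* clo and cto are the same scans applied to the complemented input. */
static unsigned count_leading_ones(uintptr_t value)
{
    return count_leading_zeros((uintptr_t)~value);
}

static unsigned count_trailing_ones(uintptr_t value)
{
    return count_trailing_zeros((uintptr_t)~value);
}
```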
 382
 383 These unary ALU operations are only defined for float operands.
 384
 385 @example
 386 absr         _f  _d  O1 = fabs(O2)
 387 absi         _f  _d  O1 = fabs(O2)
 388 sqrtr        _f  _d  O1 = sqrt(O2)
 389 sqrti        _f  _d  O1 = sqrt(O2)
 390 @end example
 391
 392 Note that for @code{float} and @code{double} unary operations, @lightning{}
 393 will generate code to actually execute the operation at runtime.
 394
 395 @item Compare instructions
 396 These accept three operands; again, the last can be an immediate.
 397 The last two operands are compared, and the first operand, which must be
 398 an integer register, is set to 1 if the given condition was met and
 399 to 0 otherwise.
 400
 401 The conditions given below are for the standard behavior of C,
 402 where the ``unordered'' comparison result is mapped to false.
 403
 404 @example
 405 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 406 lti       _u  _f  _d  O1 =  (O2 <  O3)
 407 ler       _u  _f  _d  O1 =  (O2 <= O3)
 408 lei       _u  _f  _d  O1 =  (O2 <= O3)
 409 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 410 gti       _u  _f  _d  O1 =  (O2 >  O3)
 411 ger       _u  _f  _d  O1 =  (O2 >= O3)
 412 gei       _u  _f  _d  O1 =  (O2 >= O3)
 413 eqr           _f  _d  O1 =  (O2 == O3)
 414 eqi           _f  _d  O1 =  (O2 == O3)
 415 ner           _f  _d  O1 =  (O2 != O3)
 416 nei           _f  _d  O1 =  (O2 != O3)
 417 unltr         _f  _d  O1 = !(O2 >= O3)
 418 unler         _f  _d  O1 = !(O2 >  O3)
 419 ungtr         _f  _d  O1 = !(O2 <= O3)
 420 unger         _f  _d  O1 = !(O2 <  O3)
 421 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 422 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 423 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 424 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 425 @end example
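
The difference between the ordinary and the ``unordered'' comparisons only shows up when an operand is a NaN. The C sketch below mirrors four of the formulas in the table above; the function names are hypothetical, and it assumes IEEE 754 semantics are in effect (for example, no fast-math compiler options).

```c
#include <math.h>

/* ltr_d: ordinary comparison, false when either operand is a NaN. */
static int cmp_lt(double a, double b)
{
    return a < b;
}

/* unltr_d: "unordered or less than", true when either operand is a NaN. */
static int cmp_unlt(double a, double b)
{
    return !(a >= b);
}

/* ordr_d: true only when neither operand is a NaN. */
static int cmp_ord(double a, double b)
{
    return (a == a) && (b == b);
}

/* unordr_d: true when at least one operand is a NaN. */
static int cmp_unord(double a, double b)
{
    return (a != a) || (b != b);
}
```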
 426
 427 @item Transfer operations
 428 These accept two operands; for @code{ext} both of them must be
 429 registers, while @code{mov} accepts an immediate value as the second
 430 operand.
 431
 432 Unlike @code{movr} and @code{movi}, the other instructions are used
 433 to truncate a wordsize operand to a smaller integer data type or to
 434 convert float data types. You can also use @code{extr} to convert an
 435 integer to a floating point value: the usual options are @code{extr_f}
 436 and @code{extr_d}.
 437
 438 @example
 439 movr                                 _f  _d  O1 = O2
 440 movi                                 _f  _d  O1 = O2
 441 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 442 truncr                               _f  _d  O1 = trunc(O2)
 443 extr                                         O1 = sign_extend(O2[O3:O3+O4])
 444 extr_u                                       O1 = O2[O3:O3+O4]
 445 depr                                         O1[O3:O3+O4] = O2
 446 @end example
 447
 448 @code{extr}, @code{extr_u} and @code{depr} are useful to access @code{C}
 449 compatible bit fields, provided that these are contained in a machine
 450 word. @code{extr} is used to @emph{extract} and sign extend a value
 451 from a bit field. @code{extr_u} is used to @emph{extract} and zero
 452 extend a value from a bit field. @code{depr} is used to @emph{deposit}
 453 a value into a bit field.
 454
 455 @example
 456 extr(result, source, offset, length)
 457 extr_u(result, source, offset, length)
 458 depr(result, source, offset, length)
 459 @end example
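
The effect of these three operations can be sketched with shifts and masks in C. The sketch makes several assumptions that are not stated in this manual: the offset is counted from the least significant bit, @code{length} is at least 1, and @code{offset + length} does not exceed the word size. @code{WORDSIZE} is taken to be the width of @code{uintptr_t} and all function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the word size equals the width of uintptr_t. */
#define WORDSIZE ((unsigned)(sizeof(uintptr_t) * CHAR_BIT))

/* Mask with the low `length` bits set. */
static uintptr_t field_mask(unsigned length)
{
    return length >= WORDSIZE ? ~(uintptr_t)0 : ((uintptr_t)1 << length) - 1;
}

/* Models extr_u: extract the field and zero extend it. */
static uintptr_t extract_unsigned(uintptr_t source, unsigned offset, unsigned length)
{
    return (source >> offset) & field_mask(length);
}

/* Models extr: extract the field and sign extend it from its top bit. */
static intptr_t extract_signed(uintptr_t source, unsigned offset, unsigned length)
{
    uintptr_t field = extract_unsigned(source, offset, length);
    uintptr_t sign;

    if (length >= WORDSIZE)
        return (intptr_t)field;
    sign = (uintptr_t)1 << (length - 1);
    /* Flipping the sign bit and subtracting it back propagates the sign. */
    return (intptr_t)(field ^ sign) - (intptr_t)sign;
}

/* Models depr: returns `result` with the field replaced by the low bits of `source`. */
static uintptr_t deposit(uintptr_t result, uintptr_t source, unsigned offset, unsigned length)
{
    uintptr_t mask = field_mask(length) << offset;

    return (result & ~mask) | ((source << offset) & mask);
}
```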
 460
 461 A common way to declare @code{C} and @lightning{} compatible bit fields is:
 462 @example
 463 union @{
 464     struct @{
 465         jit_word_t  signed_bits: @code{length};
 466         jit_uword_t unsigned_bits: @code{length};
 467         ...
 468     @} s;
 469     jit_word_t  signed_value;
 470     jit_uword_t unsigned_value;
 471 @} u;
 472 @end example
 473
 474 In 64-bit architectures it may be required to use @code{truncr_f_i},
 475 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 476 the equivalent C code.  Only the @code{_i} modifier is available in
 477 32-bit architectures.
 478
 479 @example
 480 truncr_f_i    <int> O1 = <float> O2
 481 truncr_f_l    <long>O1 = <float> O2
 482 truncr_d_i    <int> O1 = <double>O2
 483 truncr_d_l    <long>O1 = <double>O2
 484 @end example
 485
 486 The float conversion operations are @emph{destination first,
 487 source second}, but the order of the types is reversed.  This happens
 488 for historical reasons.
 489
 490 @example
 491 extr_f_d      <double>O1 = <float> O2
 492 extr_d_f      <float> O1 = <double>O2
 493 @end example
 494
 495 The float to/from integer transfer operations are also @emph{destination
 496 first, source second}. These were added later, but follow the
 497 historic pattern.
 498
 499 @example
 500 movr_w_f     <float>O1 = <int>O2
 501 movi_w_f     <float>O1 = <int>O2
 502 movr_f_w     <int>O1 = <float>O2
 503 movi_f_w     <int>O1 = <float>O2
 504 movr_w_d     <double>O1 = <long>O2
 505 movi_w_d     <double>O1 = <long>O2
 506 movr_d_w     <long>O1 = <double>O2
 507 movi_d_w     <long>O1 = <double>O2
 508 movr_ww_d    <double>O1 = [<int>O2:<int>O3]
 509 movi_ww_d    <double>O1 = [<int>O2:<int>O3]
 510 movr_d_ww    [<int>O1:<int>O2] = <double>O3
 511 movi_d_ww    [<int>O1:<int>O2] = <double>O3
 512 @end example
 513
 514 These are used to transfer bits to/from floats to/from integers, and are
 515 useful to access bits of floating point values.
 516
 517 @code{movr_w_d}, @code{movi_w_d}, @code{movr_d_w} and @code{movi_d_w} are
 518 only available in 64-bit. Conversely, @code{movr_ww_d}, @code{movi_ww_d},
 519 @code{movr_d_ww} and @code{movi_d_ww} are only available in 32-bit.
 520 For the int pair to/from double transfers, integer arguments must respect
 521 endianness, to match how the CPU handles the verbatim byte values.
 522
 523 @item Network extensions
 524 These accept two operands, both of which must be registers; these
 525 two instructions actually perform the same task, yet they are
 526 assigned to two mnemonics for the sake of convenience and
 527 completeness.  As usual, the first operand is the destination and
 528 the second is the source.
 529 The @code{_ul} variant is only available in 64-bit architectures.
 530 @example
 531 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 532 ntohr    _us _ui _ul @r{Network-to-host order}
 533 @end example
 534
 535 @code{bswapr} can be used to unconditionally byte-swap an operand.
 536 On little-endian architectures, @code{htonr} and @code{ntohr} resolve
 537 to this.
 538 The @code{_ul} variant is only available in 64-bit architectures.
 539 @example
 540 bswapr    _us _ui _ul  O1 = byte_swap(O2)
 541 @end example
 542
 543 @item Load operations
 544 @code{ld} accepts two operands while @code{ldx} accepts three;
 545 in both cases, the last can be either a register or an immediate
 546 value. Values are extended (with or without sign, according to
 547 the data type specification) to fit a whole register.
 548 The @code{_ui} and @code{_l} types are only available in 64-bit
 549 architectures.  For convenience, there is a version without a
 550 type modifier for integer or pointer operands that uses the
 551 appropriate wordsize call.
 552 @example
 553 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 554 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 555 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 556 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 557 @end example
 558
 559 @item Store operations
 560 @code{st} accepts two operands while @code{stx} accepts three; in
 561 both cases, the first can be either a register or an immediate
 562 value. Values are sign-extended to fit a whole register.
 563 @example
 564 str     _c       _s       _i       _l  _f  _d  *O1 = O2
 565 sti     _c       _s       _i       _l  _f  _d  *O1 = O2
 566 stxr    _c       _s       _i       _l  _f  _d  *(O1+O2) = O3
 567 stxi    _c       _s       _i       _l  _f  _d  *(O1+O2) = O3
 568 @end example
 569 Note that the unsigned type modifier is not available, as the store
 570 only writes 1, 2, 4 or 8 bytes to the memory address.
 571 The @code{_l} type is only available in 64-bit architectures, and for
 572 convenience, there is a version without a type modifier for integer or
 573 pointer operands that uses the appropriate wordsize call.
 574
 575 @item Unaligned memory access
 576 These allow access to integers of size 3, in 32-bit, and extra sizes
 577 5, 6 and 7 in 64-bit.
 578 For floating point values only support for size 4 and 8 is provided.
 579 @example
 580 unldr       O1 = *(signed O3 byte integer)O2
 581 unldi       O1 = *(signed O3 byte integer)O2
 582 unldr_u     O1 = *(unsigned O3 byte integer)O2
 583 unldi_u     O1 = *(unsigned O3 byte integer)O2
 584 unldr_x     O1 = *(O3 byte float)O2
 585 unldi_x     O1 = *(O3 byte float)O2
 586 unstr       *(O3 byte integer)O1 = O2
 587 unsti       *(O3 byte integer)O1 = O2
 588 unstr_x     *(O3 byte float)O1 = O2
 589 unsti_x     *(O3 byte float)O1 = O2
 590 @end example
 591 With the exception of non-standard sized integers, these might be
 592 implemented as normal loads and stores if the processor supports
 593 unaligned memory access. Alternatively, a mode can be chosen at jit
 594 initialization time to generate, or not generate, code that traps on
 595 unaligned memory access. Letting the kernel trap means smaller generated
 596 code, as otherwise it is required to check alignment at runtime@footnote{This requires changing jit_cpu.unaligned to 0 to disable or 1 to enable unaligned code generation. Not all ports have the C jit_cpu.unaligned value.}.
 597
 598 @item Argument management
 599 These are:
 600 @example
 601 prepare     (not specified)
 602 va_start    (not specified)
 603 pushargr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 604 pushargi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 605 va_push     (not specified)
 606 arg         _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 607 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 608 va_arg                                         _d
 609 putargr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 610 putargi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 611 ret         (not specified)
 612 retr        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 613 reti        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 614 reti                                       _f  _d
 615 va_end      (not specified)
 616 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 617 epilog      (not specified)
 618 @end example
 619 As with other operations that use a type modifier, the @code{_ui} and
 620 @code{_l} types are only available in 64-bit architectures, but there
 621 are operations without a type modifier that alias to the appropriate
 622 integer operation with wordsize operands.
 623
 624 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 625 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 626 A code snippet that wants to call another procedure and has to pass
 627 arguments must, in order: use the @code{prepare} instruction and use
 628 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 629 left to right order}; and use @code{finish} or @code{call} (explained below)
 630 to perform the actual call.
 631
 632 Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret}, when
 633 handling integer types, can be used without a type modifier.
 634 It is suggested to use matching type modifiers for @code{arg}, @code{putarg}
 635 and @code{getarg}; otherwise problems will happen when generating jit for
 636 environments that require arguments to be truncated and zero or sign
 637 extended by the caller, and/or where excess arguments might be passed packed
 638 in the stack. Currently only Apple systems with @code{aarch64} CPUs are
 639 known to have this restriction.
 640
 641 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 642 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 643 @code{va_push} is required when passing a @code{va_list} to another function,
 644 because not all architectures expect it as a single pointer. A known case
 645 is DEC Alpha, which requires it as a structure passed by value.
 646
 647 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 648 @code{arg} is different from other instructions in that it does not
 649 actually generate any code: instead, it is a function which returns
 650 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 651 a value'' means that @lightning{} code that compiles these
 652 instructions returns a value when expanded.} You should call
 653 @code{arg} as soon as possible, before any function call or, more
 654 easily, right after the @code{prolog} instruction
 655 (which is treated later).
 656
 657 @code{getarg} accepts a register argument and a value returned by
 658 @code{arg}, and will move that argument to the register, extending
 659 it (with or without sign, according to the data type specification)
 660 to fit a whole register.  These instructions are more intimately
 661 related to the usage of the @lightning{} instruction set in code
 662 that generates other code, so they will be treated more
 663 specifically in @ref{GNU lightning examples, , Generating code at
 664 run-time}.
 665
 666 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 667 it accepts as first argument a register or immediate, and as
 668 second argument a value returned by @code{arg}. It allows changing,
 669 or restoring an argument to the current function, and is a
 670 construct required to implement tail call optimization. Note that
 671 arguments in registers are very cheap, but will be overwritten
 672 at any moment, including on some operations, for example division,
 673 that on several ports is implemented as a function call.
 674
Finally, the @code{retval} instruction fetches the return value of a
called function into a register.  It takes a register argument and
copies the return value of the previously called function into that
register.  A function with a return value should use
@code{retr} or @code{reti} to put the return value in the return register
before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 681
@code{epilog} is an optional call that marks the end of a function
body. @lightning{} generates it automatically when a new function is
started (which should be done after a @code{ret} call) or when jit
generation finishes.
Because @code{epilog} is optional, a common mistake is easy to make.
Consider this:
 688 @example
 689 fun1:
 690     prolog
 691     ...
 692     ret
 693 fun2:
 694     prolog
 695 @end example
Because @code{epilog} is added when a new @code{prolog} is found,
the @code{fun2} label will actually end up before the
return from @code{fun1}: @lightning{} understands the code as:
 700 @example
 701 fun1:
 702     prolog
 703     ...
 704     ret
 705 fun2:
 706     epilog
 707     prolog
 708 @end example
 709
 710 You should observe a few rules when using these macros.  First of
 711 all, if calling a varargs function, you should use the @code{ellipsis}
 712 call to mark the position of the ellipsis in the C prototype.
 713
 714 You should not nest calls to @code{prepare} inside a
 715 @code{prepare/finish} block.  Doing this will result in undefined
 716 behavior. Note that for functions with zero arguments you can use
 717 just @code{call}.
 718
 719 @item Branch instructions
 720 Like @code{arg}, these also return a value which, in this case,
 721 is to be used to compile forward branches as explained in
 722 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 723 compared; of these, the last can be either a register or an immediate.
 724 They are:
 725 @example
 726 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 727 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 728 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 729 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 730 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 731 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 732 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 733 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 734 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 735 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 736 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 737 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 738
 739 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 740 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 741 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 742 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 743 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 744 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 745 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
bunordr       _f  _d  @r{if } (O2 != O2) ||  (O3 != O3)@r{ goto }O1
 747
 748 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 749 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 750 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 751 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 752 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 753 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 754 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 755 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 756 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 757 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 758 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 759 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 760 @end example
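
The unordered branches differ from the ordered ones only when an
operand is a NaN.  As an illustration, the following plain C predicates
restate some of the conditions from the table above; each returns non
zero when the corresponding branch would be taken.  This is a sketch
that does not use @lightning{}: all @code{model_} names are
hypothetical, and @code{model_bunordr} is written as the complement of
@code{model_bordr}, which is an assumption about the intended meaning.

@example
/* Ordered comparison, for contrast: never taken if a or b is NaN. */
static int model_bltr(double a, double b)   @{ return a < b; @}

/* Unordered variants: also taken when a or b is NaN. */
static int model_bunltr(double a, double b) @{ return !(a >= b); @}
static int model_bunler(double a, double b) @{ return !(a >  b); @}
static int model_bungtr(double a, double b) @{ return !(a <= b); @}
static int model_bunger(double a, double b) @{ return !(a <  b); @}
static int model_buneqr(double a, double b)
@{
    return !(a < b) && !(a > b);
@}

/* Taken only if neither operand is NaN. */
static int model_bordr(double a, double b)
@{
    return (a == a) && (b == b);
@}

/* Assumed complement of bordr: taken if either operand is NaN. */
static int model_bunordr(double a, double b)
@{
    return (a != a) || (b != b);
@}
@end example

For ordinary numbers @code{model_bltr} and @code{model_bunltr} agree;
with a NaN as either operand, @code{model_bltr} returns zero while
@code{model_bunltr} returns non zero.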
 761
 762 Note that the @code{C} code does not have an @code{O1} argument. It is
 763 required to always use the return value as an argument to @code{patch},
 764 @code{patch_at} or @code{patch_abs}.
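
The overflow-checking branches listed in the table combine an
arithmetic operation with a test.  The following plain C sketch models
@code{boaddr} and its @code{_u} variant under the assumption that the
addition is performed with wraparound and the branch is taken when the
result does not fit; it does not use @lightning{}, and the
@code{model_} names are hypothetical.  @code{bxaddr} would simply
branch on the opposite outcome.

@example
#include <limits.h>

/* Model of boaddr: performs *o2 += o3 and returns non zero if the
   signed addition overflowed. */
static int model_boaddr(long *o2, long o3)
@{
    int overflow = (o3 > 0 && *o2 > LONG_MAX - o3) ||
                   (o3 < 0 && *o2 < LONG_MIN - o3);

    /* Add as unsigned to avoid undefined behaviour; converting the
       result back is implementation-defined but wraps on common
       compilers. */
    *o2 = (long)((unsigned long)*o2 + (unsigned long)o3);
    return overflow;
@}

/* Model of boaddr_u: an unsigned addition overflows when it
   carries, that is, when the sum is smaller than an operand. */
static int model_boaddr_u(unsigned long *o2, unsigned long o3)
@{
    unsigned long sum = *o2 + o3;
    int carry = sum < *o2;

    *o2 = sum;
    return carry;
@}
@end example

Adding 2 to 40 with @code{model_boaddr} leaves 42 and returns zero,
whereas adding 1 to @code{LONG_MAX} returns non zero.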
 765
 766 @item Jump and return operations
 767 These accept one argument except @code{ret} and @code{jmpi} which
 768 have none; the difference between @code{finishi} and @code{calli}
 769 is that the latter does not clean the stack from pushed parameters
 770 (if any) and the former must @strong{always} follow a @code{prepare}
 771 instruction.
 772 @example
 773 callr     (not specified)                @r{function call to register O1}
 774 calli     (not specified)                @r{function call to immediate O1}
 775 finishr   (not specified)                @r{function call to register O1}
 776 finishi   (not specified)                @r{function call to immediate O1}
 777 jmpr      (not specified)                @r{unconditional jump to register}
 778 jmpi      (not specified)                @r{unconditional jump}
 779 ret       (not specified)                @r{return from subroutine}
 780 retr      _c _uc _s _us _i _ui _l _f _d
 781 reti      _c _uc _s _us _i _ui _l _f _d
 782 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 783                                          @r{to register}
 784 @end example
 785
Like the branch instructions, @code{jmpi} also returns a value which is to
be used to compile forward branches. @xref{Fibonacci, , Fibonacci
numbers}.
 789
 790 @item Labels
 791 There are 3 @lightning{} instructions to create labels:
 792 @example
 793 label     (not specified)                @r{simple label}
 794 forward   (not specified)                @r{forward label}
 795 indirect  (not specified)                @r{special simple label}
 796 @end example
 797
 798 The following instruction is used to specify a minimal alignment for
 799 the next instruction, usually with a label:
 800 @example
 801 align     (not specified)                @r{align code}
 802 @end example
 803
 804 Similar to @code{align} is the next instruction, also usually used with
 805 a label:
 806 @example
 807 skip      (not specified)                @r{skip code}
 808 @end example
 809 It is used to specify a minimal number of bytes of nops to be inserted
 810 before the next instruction.
 811
 812 @code{label} is normally used as @code{patch_at} argument for backward
 813 jumps.
 814
 815 @example
 816         jit_node_t *jump, *label;
 817 label = jit_label();
 818         ...
 819         jump = jit_beqr(JIT_R0, JIT_R1);
 820         jit_patch_at(jump, label);
 821 @end example
 822
 823 @code{forward} is used to patch code generation before the actual
 824 position of the label is known.
 825
 826 @example
 827         jit_node_t *jump, *label;
 828 label = jit_forward();
 829         jump = jit_beqr(JIT_R0, JIT_R1);
 830         jit_patch_at(jump, label);
 831         ...
 832         jit_link(label);
 833 @end example
 834
 835 @code{indirect} is useful when creating jump tables, and tells
 836 @lightning{} to not optimize out a label that is not the target of
 837 any jump, because an indirect jump may land where it is defined.
 838
 839 @example
 840         jit_node_t *jump, *label;
 841         ...
        jit_jmpr(JIT_R0);                @rem{/* may jump to label */}
 843         ...
 844 label = jit_indirect();
 845 @end example
 846
@code{indirect} is a special case of @code{note} and @code{name}
because it is a valid argument to @code{address}.
 849
 850 Note that the usual idiom to write the previous example is
 851 @example
 852         jit_node_t *addr, *jump;
 853 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 854         ...
        jit_jmpr(JIT_R0);
 856         ...
 857         jit_patch(addr);                 @rem{/* implicit label added */}
 858 @end example
 859
This automatically binds the implicit label added by @code{patch} to
the @code{movi}, but under some special conditions it is necessary to
create an ``unbound'' label.
 863
 864 @code{align} is useful for creating multiple entry points to a
 865 (trampoline) function that are all accessible through a single
 866 function pointer.  @code{align} receives an integer argument that
 867 defines the minimal alignment of the address of a label directly
 868 following the @code{align} instruction.  The integer argument must be
 869 a power of two and the effective alignment will be a power of two no
 870 less than the argument to @code{align}.  If the argument to
 871 @code{align} is 16 or more, the effective alignment will match the
 872 specified minimal alignment exactly.
 873
 874 @example
 875           jit_node_t *forward, *label1, *label2, *jump;
 876           unsigned char *addr1, *addr2;
 877 forward = jit_forward();
 878           jit_align(16);
 879 label1  = jit_indirect();                @rem{/* first entry point */}
 880 jump    = jit_jmpi();                    @rem{/* jump to first handler */}
 881           jit_patch_at(jump, forward);
 882           jit_align(16);
 883 label2  = jit_indirect();                @rem{/* second entry point */}
 884           ...                            @rem{/* second handler */}
 885           jit_jmpr(...);
 886           jit_link(forward);
          ...                            @rem{/* first handler */}
 888           jit_jmpr(...);
 889           ...
 890           jit_emit();
 891           addr1 = jit_address(label1);
 892           addr2 = jit_address(label2);
 893           assert(addr2 - addr1 == 16);   @rem{/* only one of the addresses needs to be remembered */}
 894 @end example
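
The arithmetic behind the final @code{assert} can be sketched in plain
C.  The helpers below are hypothetical and only model how an address
is rounded up to a power-of-two boundary; they say nothing about how
@lightning{} pads the code buffer internally.

@example
#include <stdint.h>

/* Returns non zero if n is a power of two, as required for the
   argument of align. */
static int is_power_of_two(uintptr_t n)
@{
    return n != 0 && (n & (n - 1)) == 0;
@}

/* Rounds addr up to the next multiple of alignment, which must be
   a power of two.  An aligned address is returned unchanged. */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
@{
    return (addr + alignment - 1) & ~(alignment - 1);
@}
@end example

In the example above, @code{label1} sits on a 16-byte boundary.
Provided the @code{jmpi} that follows it occupies no more than 16
bytes, rounding the address just past the jump up to 16 gives
@code{addr1 + 16}, which is where @code{label2} lands.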
 895
 896 @code{skip} is useful for reserving space in the code buffer that can
 897 later be filled (possibly with the help of the pair of functions
 898 @code{jit_unprotect} and @code{jit_protect}).
 899
 900 @item Function prolog
 901
These macros are used to set up a function prolog.  The @code{allocai}
call accepts a single integer argument and returns an offset value
for stack storage access.  The @code{allocar} call accepts two register
arguments: the first is set to the offset for stack access, and the
second holds the size in bytes.
 907
 908 @example
 909 prolog    (not specified)                @r{function prolog}
 910 allocai   (not specified)                @r{reserve space on the stack}
 911 allocar   (not specified)                @r{allocate space on the stack}
 912 @end example
 913
 914 @code{allocai} receives the number of bytes to allocate and returns
 915 the offset from the frame pointer register @code{FP} to the base of
 916 the area.
 917
@code{allocar} receives two register arguments.  The first is where
to store the offset from the frame pointer register @code{FP} to the
base of the area.  The second argument is the size in bytes.  Note
that @code{allocar} performs dynamic allocation, and special care
should be taken when using it.  If called in a loop, every iteration
will allocate stack space.  Stack space is aligned from 8 to 64 bytes
depending on backend requirements, even if allocating only one byte.
It is advisable not to use it with @code{frame} and @code{tramp}; it
should work with @code{frame} if care is taken to call it only once,
but it is not supported in @code{tramp}, even if called only
once.
 929
As a small appetizer, here is a small function that adds 1 to the input
parameter (an @code{int}).  I'm using an assembly-like syntax here which
is a bit different from the one used when writing real subroutines with
@lightning{}; the real syntax will be introduced in @ref{GNU lightning
examples, , Generating code at run-time}.
 935
 936 @example
 937 incr:
 938      prolog
 939 in = arg                     @rem{! We have an integer argument}
 940      getarg    R0, in        @rem{! Move it to R0}
 941      addi      R0, R0, 1     @rem{! Add 1}
 942      retr      R0            @rem{! And return the result}
 943 @end example
 944
 945 And here is another function which uses the @code{printf} function from
 946 the standard C library to write a number in hexadecimal notation:
 947
 948 @example
 949 printhex:
 950      prolog
 951 in = arg                     @rem{! Same as above}
 952      getarg    R0, in
 953      prepare                 @rem{! Begin call sequence for printf}
 954      pushargi  "%x"          @rem{! Push format string}
 955      ellipsis                @rem{! Varargs start here}
 956      pushargr  R0            @rem{! Push second argument}
 957      finishi   printf        @rem{! Call printf}
 958      ret                     @rem{! Return to caller}
 959 @end example
 960
 961 @item Register liveness
 962
 963 During code generation, @lightning{} occasionally needs scratch registers
 964 or needs to use architecture-defined registers.  For that, @lightning{}
 965 internally maintains register liveness information.
 966
In the following example, @code{qdivr} will need special registers like
@code{R0} on some architectures.  As @lightning{} understands that
@code{R0} is used in the subsequent instruction, it will create
save/restore code for @code{R0} where needed.
 971
 972 @example
 973 ...
 974 qdivr V0, V1, V2, V3
 975 movr  V3, R0
 976 ...
 977 @end example
 978
The same is not true in the example that follows.  Here, @code{R0} is
not alive after the division operation because @code{R0} is neither an
argument register nor a callee-save register.  Thus, no save/restore
code for @code{R0} will be created.
 983
 984 @example
 985 ...
 986 qdivr V0, V1, V2, V3
 987 jmpr  R1
 988 ...
 989 @end example
 990
 991 The @code{live} instruction can be used to mark a register as live after
 992 it as in the following example.  Here, @code{R0} will be preserved
 993 across the division.
 994
 995 @example
 996 ...
 997 qdivr V0, V1, V2, V3
 998 live R0
 999 jmpr R1
1000 ...
1001 @end example
1002
1003 The @code{live} instruction is useful at code entry and exit points,
1004 like after and before a @code{callr} instruction.
1005
1006 @item Trampolines, continuations and tail call optimization
1007
1008 Frequently it is required to generate jit code that must jump to
1009 code generated later, possibly from another @code{jit_context_t}.
1010 These require compatible stack frames.
1011
@lightning{} provides two primitives with which trampolines,
continuations and tail call optimization can be implemented.
1014
1015 @example
1016 frame   (not specified)                  @r{create stack frame}
1017 tramp   (not specified)                  @r{assume stack frame}
1018 @end example
1019
@code{frame} receives an integer argument@footnote{It is not
automatically computed because @lightning{} does not know about the
requirements of later generated code.} that defines the size in
bytes for the stack frame of the current, @code{C} callable,
jit function. To calculate this value, a good formula is the maximum
number of arguments to any called native function times
eight@footnote{Times eight so that it works for double arguments
and needs no conditionals for ports that pass arguments on
the stack.}, plus the sum of the arguments to any call to
@code{jit_allocai}. @lightning{} automatically adjusts this value
for any backend specific stack memory it may need, or any
alignment constraint.
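
The suggested formula can be written down as a small plain C helper.
This is only a sketch of the arithmetic; @code{estimate_frame_size}
and its parameters are hypothetical names and are not part of
@lightning{}.

@example
#include <stddef.h>

/* Suggested frame size: eight bytes per argument of the call that
   passes the most arguments, plus every size given to allocai. */
static int estimate_frame_size(int max_call_args,
                               const int *allocai_sizes,
                               size_t n_allocai)
@{
    int size = max_call_args * 8;
    size_t i;

    for (i = 0; i < n_allocai; i++)
        size += allocai_sizes[i];
    return size;
@}
@end example

For instance, a function whose largest call passes three arguments and
which reserves 16 and 32 bytes with @code{allocai} would pass
3 * 8 + 16 + 32 = 72 to @code{frame}.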
1032
@code{frame} also instructs @lightning{} to save all callee
save registers in the prolog and reload them in the epilog.
1035
1036 @example
1037 main:                        @rem{! jit entry point}
1038      prolog                  @rem{! function prolog}
1039      frame  256              @rem{! save all callee save registers and}
1040                              @rem{! reserve at least 256 bytes in stack}
1041 main_loop:
1042      ...
1043      jmpi   handler          @rem{! jumps to external code}
1044      ...
1045      ret                     @rem{! return to the caller}
1046 @end example
1047
@code{tramp} differs from @code{frame} only in that a prolog and epilog
will not be generated. Note that @code{prolog} must still be used.
The code under @code{tramp} must be ready to be entered with a jump
at the prolog position, and instead of a return, it must end with
an unconditional jump. @code{tramp} exists solely because
it allows optimizing out prolog and epilog code that would
never be executed.
1055
1056 @example
1057 handler:                     @rem{! handler entry point}
1058      prolog                  @rem{! function prolog}
1059      tramp  256              @rem{! assumes all callee save registers}
1060                              @rem{! are saved and there is at least}
1061                              @rem{! 256 bytes in stack}
1062      ...
1063      jmpi   main_loop        @rem{! return to the main loop}
1064 @end example
1065
1066 @lightning{} only supports Tail Call Optimization using the
1067 @code{tramp} construct. Any other way is not guaranteed to
1068 work on all ports.
1069
1070 An example of a simple (recursive) tail call optimization:
1071
1072 @example
1073 factorial:                   @rem{! Entry point of the factorial function}
1074      prolog
1075 in = arg                     @rem{! Receive an integer argument}
     getarg R0, in           @rem{! Move argument to R0}
1077      prepare
1078          pushargi 1          @rem{! This is the accumulator}
1079          pushargr R0         @rem{! This is the argument}
1080      finishi fact            @rem{! Call the tail call optimized function}
1081      retval R0               @rem{! Fetch the result}
1082      retr R0                 @rem{! Return it}
1083      epilog                  @rem{! Epilog *before* label before prolog}
1084
1085 fact:                        @rem{! Entry point of the helper function}
1086      prolog
1087      frame 16                @rem{! Reserve 16 bytes in the stack}
1088 fact_entry:                  @rem{! This is the tail call entry point}
1089 ac = arg                     @rem{! The accumulator is the first argument}
1090 in = arg                     @rem{! The factorial argument}
1091      getarg R0, ac           @rem{! Move the accumulator to R0}
1092      getarg R1, in           @rem{! Move the argument to R1}
1093      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
1094      mulr R0, R0, R1         @rem{! accumulator *= argument}
1095      putargr R0, ac          @rem{! Update the accumulator}
1096      subi R1, R1, 1          @rem{! argument -= 1}
1097      putargr R1, in          @rem{! Update the argument}
1098      jmpi fact_entry         @rem{! Tail Call Optimize it!}
1099 fact_out:
1100      retr R0                 @rem{! Return the accumulator}
1101 @end example
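
To see what the jit code above computes, here is a plain C model of the
same control flow.  It is a sketch for illustration only: it does not
use @lightning{}, the @code{model_} names are hypothetical, and the
@code{goto} stands in for the @code{jmpi fact_entry} tail call.

@example
/* Model of the fact helper: the accumulator and the argument are
   updated in place and control jumps back to the entry point, so
   no new stack frame is created per step. */
static long model_fact(long ac, long in)
@{
fact_entry:
    if (in <= 1)
        return ac;          /* fact_out: return the accumulator */
    ac = ac * in;           /* mulr + putargr: accumulator *= argument */
    in = in - 1;            /* subi + putargr: argument -= 1 */
    goto fact_entry;        /* jmpi fact_entry */
@}

/* Model of the factorial entry point, which seeds the accumulator
   with 1 before calling the helper. */
static long model_factorial(long in)
@{
    return model_fact(1, in);
@}
@end example

Calling @code{model_factorial(5)} returns 120 after four trips through
@code{fact_entry}, all within a single activation of
@code{model_fact}.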
1102
1103 @item Predicates
1104 @example
1105 forward_p      (not specified)           @r{forward label predicate}
1106 indirect_p     (not specified)           @r{indirect label predicate}
1107 target_p       (not specified)           @r{used label predicate}
1108 arg_register_p (not specified)           @r{argument kind predicate}
1109 callee_save_p  (not specified)           @r{callee save predicate}
1110 pointer_p      (not specified)           @r{pointer predicate}
1111 @end example
1112
1113 @code{forward_p} expects a @code{jit_node_t*} argument, and
1114 returns non zero if it is a forward label reference, that is,
1115 a label returned by @code{forward}, that still needs a
1116 @code{link} call.
1117
1118 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
1119 non zero if it is an indirect label reference, that is, a label that
1120 was returned by @code{indirect}.
1121
1122 @code{target_p} expects a @code{jit_node_t*} argument, that is any
1123 kind of label, and will return non zero if there is at least one
1124 jump or move referencing it.
1125
1126 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
1127 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
1128 will return non zero if the argument lives in a register. This call
1129 is useful to know the live range of register arguments, as those
1130 are very fast to read and write, but have volatile values.
1131
@code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
@code{JIT_Fn}, and will return non zero if the register is callee
save. This call is useful because on several ports, the @code{JIT_Rn}
and @code{JIT_Fn} registers are actually callee save, so there is no need
to save and reload their values when making function calls.
1137
1138 @code{pointer_p} expects a pointer argument, and will return non
1139 zero if the pointer is inside the generated jit code. Must be
1140 called after @code{jit_emit} and before @code{jit_destroy_state}.
1141
1142 @item Atomic operations
1143 Only compare-and-swap is implemented. It accepts four operands;
1144 the second can be an immediate.
1145
The first argument is set to a boolean value telling whether the
operation succeeded.

All arguments must be different: the result register cannot also be
used to pass an argument.
1151
1152 The second argument is the address of a machine word.
1153
1154 The third argument is the old value.
1155
1156 The fourth argument is the new value.
1157
1158 @example
casr                                  O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
casi                                  O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1161 @end example
1162
If the value at the address in the second argument is equal to the third
argument, the value at that address is atomically replaced by the
fourth argument and the first argument is set to a non zero value.
1166
1167 If the value at the address in the second argument is not equal to the
1168 third argument nothing is done and the first argument is set to zero.
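
The behaviour described above matches the usual compare-and-swap
primitive.  As a point of comparison, the following sketch expresses it
with the C11 @code{<stdatomic.h>} interface; it does not use
@lightning{}, and @code{model_cas} is a hypothetical name chosen for
this illustration.

@example
#include <stdatomic.h>

/* Model of casr: returns 1 and stores new_value if *address equals
   old_value; otherwise returns 0 and leaves *address untouched. */
static int model_cas(_Atomic long *address, long old_value,
                     long new_value)
@{
    /* atomic_compare_exchange_strong overwrites its second argument
       on failure, so a local copy of old_value is passed. */
    long expected = old_value;

    return atomic_compare_exchange_strong(address, &expected,
                                          new_value) ? 1 : 0;
@}
@end example

With a word holding 5, @code{model_cas(&word, 5, 9)} returns 1 and
leaves 9 in the word; a second call with the same arguments returns 0
and changes nothing.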
1169 @end table
1170
1171 @node GNU lightning examples
1172 @chapter Generating code at run-time
1173
1174 To use @lightning{}, you should include the @file{lightning.h} file that
1175 is put in your include directory by the @samp{make install} command.
1176
1177 Each of the instructions above translates to a macro or function call.
1178 All you have to do is prepend @code{jit_} (lowercase) to opcode names
1179 and @code{JIT_} (uppercase) to register names.  Of course, parameters
1180 are to be put between parentheses.
1181
This small tutorial presents four examples:
1183
1184 @iftex
1185 @itemize @bullet
1186 @item
1187 The @code{incr} function found in @ref{The instruction set, ,
1188 @lightning{}'s instruction set}:
1189
1190 @item
1191 A simple function call to @code{printf}
1192
1193 @item
1194 An RPN calculator.
1195
1196 @item
1197 Fibonacci numbers
1198 @end itemize
1199 @end iftex
1200 @ifnottex
1201 @menu
1202 * incr::             A function which increments a number by one
1203 * printf::           A simple function call to printf
1204 * RPN calculator::   A more complex example, an RPN calculator
1205 * Fibonacci::        Calculating Fibonacci numbers
1206 @end menu
1207 @end ifnottex
1208
1209 @node incr
1210 @section A function which increments a number by one
1211
1212 Let's see how to create and use the sample @code{incr} function created
1213 in @ref{The instruction set, , @lightning{}'s instruction set}:
1214
1215 @example
1216 #include <stdio.h>
1217 #include <lightning.h>
1218
1219 static jit_state_t *_jit;
1220
1221 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
1222
1223 int main(int argc, char *argv[])
1224 @{
1225   jit_node_t  *in;
1226   pifi         incr;
1227
1228   init_jit(argv[0]);
1229   _jit = jit_new_state();
1230
1231   jit_prolog();                    @rem{/* @t{     prolog             } */}
1232   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
  jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0@comma{} in      } */}
1234   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
1235   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
1236
1237   incr = jit_emit();
1238   jit_clear_state();
1239
1240   @rem{/* call the generated code@comma{} passing 5 as an argument */}
1241   printf("%d + 1 = %d\n", 5, incr(5));
1242
1243   jit_destroy_state();
1244   finish_jit();
1245   return 0;
1246 @}
1247 @end example
1248
1249 Let's examine the code line by line (well, almost@dots{}):
1250
1251 @table @t
1252 @item #include <lightning.h>
1253 You already know about this.  It defines all of @lightning{}'s macros.
1254
1255 @item static jit_state_t *_jit;
1256 You might wonder about what is @code{jit_state_t}.  It is a structure
1257 that stores jit code generation information.  The name @code{_jit} is
1258 special, because since multiple jit generators can run at the same
1259 time, you must either @r{#define _jit my_jit_state} or name it
1260 @code{_jit}.
1261
1262 @item typedef int (*pifi)(int);
1263 Just a handy typedef for a pointer to a function that takes an
1264 @code{int} and returns another.
1265
1266 @item jit_node_t  *in;
1267 Declares a variable to hold an identifier for a function argument. It
1268 is an opaque pointer, that will hold the return of a call to @code{arg}
1269 and be used as argument to @code{getarg}.
1270
1271 @item pifi         incr;
1272 Declares a function pointer variable to a function that receives an
1273 @code{int} and returns an @code{int}.
1274
1275 @item init_jit(argv[0]);
1276 You must call this function before creating a @code{jit_state_t}
1277 object. This function does global state initialization, and may need
1278 to detect CPU or Operating System features.  It receives a string
1279 argument that is later used to read symbols from a shared object using
1280 GNU binutils if disassembly was enabled at configure time. If no
1281 disassembly will be performed a NULL pointer can be used as argument.
1282
1283 @item _jit = jit_new_state();
1284 This call initializes a @lightning{} jit state.
1285
1286 @item jit_prolog();
1287 Ok, so we start generating code for our beloved function@dots{}
1288
1289 @item in = jit_arg();
1290 @itemx jit_getarg(JIT_R0, in);
1291 We retrieve the first (and only) argument, an integer, and store it
1292 into the general-purpose register @code{R0}.
1293
1294 @item jit_addi(JIT_R0, JIT_R0, 1);
1295 We add one to the content of the register.
1296
1297 @item jit_retr(JIT_R0);
1298 This instruction generates a standard function epilog that returns
1299 the contents of the @code{R0} register.
1300
1301 @item incr = jit_emit();
This instruction is very important.  It actually translates the
@lightning{} macros used before to machine code, flushes the generated
code area out of the processor's instruction cache and returns a
pointer to the start of the code.
1306
1307 @item jit_clear_state();
This call cleans up any data not required for jit execution. Note
that it must be called after any call to @code{jit_print} or
@code{jit_address}, as this call destroys the @lightning{}
intermediate representation.
1312
@item printf("%d + 1 = %d\n", 5, incr(5));
1314 Calling our function is this simple---it is not distinguishable from
1315 a normal C function call, the only difference being that @code{incr}
1316 is a variable.
1317
1318 @item jit_destroy_state();
Releases all memory associated with the jit context. It should be
called once it is known that the jit code will no longer be called.
1321
1322 @item finish_jit();
This call cleans up any global state held by @lightning{}; it is
advisable to call it once jit code will no longer be generated.
1325 @end table
1326
1327 @lightning{} abstracts two phases of dynamic code generation: selecting
1328 instructions that map the standard representation, and emitting binary
1329 code for these instructions.  The client program has the responsibility
1330 of describing the code to be generated using the standard @lightning{}
1331 instruction set.
1332
Let's examine the code generated for @code{incr} on the SPARC and x86_64
architectures (on the right is the code that an assembly-language
programmer would write):
1336
1337 @table @b
1338 @item SPARC
1339 @example
1340       save  %sp, -112, %sp
1341       mov  %i0, %g2                 retl
1342       inc  %g2                      inc %o0
1343       mov  %g2, %i0
1344       restore
1345       retl
1346       nop
1347 @end example
1348 In this case, @lightning{} introduces overhead to create a register
1349 window (not knowing that the procedure is a leaf procedure) and to
1350 move the argument to the general purpose register @code{R0} (which
1351 maps to @code{%g2} on the SPARC).
1352 @end table
1353
1354 @table @b
1355 @item x86_64
1356 @example
1357     mov   %rdi,%rax
1358     add   $0x1,%rax
1359     ret
1360 @end example
1361 In this case, for the x86 port, @lightning{} has simple optimizations
1362 to understand it is a leaf function, and that it is not required to
1363 create a stack frame nor update the stack pointer.
1364 @end table
1365
1366 @node printf
1367 @section A simple function call to @code{printf}
1368
1369 Again, here is the code for the example:
1370
1371 @example
1372 #include <stdio.h>
1373 #include <lightning.h>
1374
1375 static jit_state_t *_jit;
1376
1377 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1378
1379 int main(int argc, char *argv[])
1380 @{
1381   pvfi          myFunction;             @rem{/* ptr to generated code */}
1382   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1383   jit_node_t    *in;                    @rem{/* to get the argument */}
1384
1385   init_jit(argv[0]);
1386   _jit = jit_new_state();
1387
1388   start = jit_note(__FILE__, __LINE__);
1389   jit_prolog();
1390   in = jit_arg();
1391   jit_getarg(JIT_R1, in);
1392   jit_prepare();
1393   jit_pushargi((jit_word_t)"generated %d bytes\n");
1394   jit_ellipsis();
1395   jit_pushargr(JIT_R1);
1396   jit_finishi(printf);
1397   jit_ret();
1398   jit_epilog();
1399   end = jit_note(__FILE__, __LINE__);
1400
1401   myFunction = jit_emit();
1402
1403   @rem{/* call the generated code@comma{} passing its size as argument */}
1404   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1405   jit_clear_state();
1406
1407   jit_disassemble();
1408
1409   jit_destroy_state();
1410   finish_jit();
1411   return 0;
1412 @}
1413 @end example
1414
1415 The function shows how many bytes were generated.  Most of the code
1416 is not very interesting, as it resembles very closely the program
1417 presented in @ref{incr, , A function which increments a number by one}.
1418
1419 For this reason, we're going to concentrate on just a few statements.
1420
1421 @table @t
1422 @item start = jit_note(__FILE__, __LINE__);
1423 @itemx @r{@dots{}}
1424 @itemx end = jit_note(__FILE__, __LINE__);
These two instructions call the @code{jit_note} macro, which creates
a note in the jit code; arguments to @code{jit_note} usually are a
filename string and line number integer, but using NULL for the
string argument is perfectly valid if you only need to create a simple
marker in the code.
1430
1431 @item jit_ellipsis();
1432 @code{ellipsis} usually is only required if calling varargs functions
1433 with double arguments, but it is a good practice to properly describe
1434 the @r{@dots{}} in the call sequence.
1435
1436 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1437 Note the use of the @code{(jit_word_t)} cast, that is used only
1438 to avoid a compiler warning, due to using a pointer where a
1439 wordsize integer type was expected.
1440
1441 @item jit_prepare();
1442 @itemx @r{@dots{}}
1443 @itemx jit_finishi(printf);
Once the arguments to @code{printf} have been pushed, which means
moving them to stack or register arguments, the @code{printf}
function is called and the stack cleaned.  Note how @lightning{}
abstracts the differences between different architectures and
ABIs: the client program does not need to know how parameter passing
works on the host architecture.
1450
1451 @item jit_epilog();
Calling @code{epilog} explicitly is usually not required, because it
is called implicitly when @lightning{} notices the end of a function.
It is called here because, if the @code{end} variable were set with a
@code{note} call placed right after the @code{ret}, the note would
precede the function epilog and the measured size would not include it.
1456
1457 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
This calls the generated jit function, passing as argument the offset
difference between the @code{start} and @code{end} notes.  The
@code{address} call must be made after the @code{emit} call; otherwise
either a fatal error will happen (if @lightning{} is built with
assertions enabled) or an undefined value will be returned.
1463
1464 @item jit_clear_state();
Note that @code{jit_clear_state} was called after executing the jit
code in this example.  This is because it must be called only after
all calls to @code{jit_address} or @code{jit_print} have been made.
1468
1469 @item jit_disassemble();
1470 @code{disassemble} will dump the generated code to standard output,
1471 unless @lightning{} was built with the disassembler disabled, in which
1472 case no output will be shown.
1473 @end table
1474
1475 @node RPN calculator
1476 @section A more complex example, an RPN calculator
1477
1478 We create a small stack-based RPN calculator which applies a series
1479 of operators to a given parameter and to other numeric operands.
1480 Unlike previous examples, the code generator is fully parameterized
1481 and is able to compile different formulas to different functions.
1482 Here is the code for the expression compiler; a sample usage will
1483 follow.
1484
Since @lightning{} does not provide push/pop instructions, this
1486 example uses a stack-allocated area to store the data.  Such an
1487 area can be allocated using the macro @code{allocai}, which
1488 receives the number of bytes to allocate and returns the offset
1489 from the frame pointer register @code{FP} to the base of the
1490 area.
1491
Usually, you will use the @code{ldxi} and @code{stxi} instructions
1493 to access stack-allocated variables.  However, it is possible to
1494 use operations such as @code{add} to compute the address of the
1495 variables, and pass the address around.
1496
1497 @example
#include <stdio.h>
#include <stdlib.h>             @rem{/* atoi and abort */}
#include <lightning.h>
1500
1501 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1502
1503 static jit_state_t *_jit;
1504
1505 void stack_push(int reg, int *sp)
1506 @{
1507   jit_stxi_i (*sp, JIT_FP, reg);
1508   *sp += sizeof (int);
1509 @}
1510
1511 void stack_pop(int reg, int *sp)
1512 @{
1513   *sp -= sizeof (int);
1514   jit_ldxi_i (reg, JIT_FP, *sp);
1515 @}
1516
1517 jit_node_t *compile_rpn(char *expr)
1518 @{
1519   jit_node_t *in, *fn;
1520   int stack_base, stack_ptr;
1521
1522   fn = jit_note(NULL, 0);
1523   jit_prolog();
1524   in = jit_arg();
1525   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1526
1527   jit_getarg(JIT_R2, in);
1528
1529   while (*expr) @{
1530     char buf[32];
1531     int n;
1532     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1533       expr += n - 1;
1534       stack_push(JIT_R0, &stack_ptr);
1535       jit_movi(JIT_R0, atoi(buf));
1536     @} else if (*expr == 'x') @{
1537       stack_push(JIT_R0, &stack_ptr);
1538       jit_movr(JIT_R0, JIT_R2);
1539     @} else if (*expr == '+') @{
1540       stack_pop(JIT_R1, &stack_ptr);
1541       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1542     @} else if (*expr == '-') @{
1543       stack_pop(JIT_R1, &stack_ptr);
1544       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1545     @} else if (*expr == '*') @{
1546       stack_pop(JIT_R1, &stack_ptr);
1547       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1548     @} else if (*expr == '/') @{
1549       stack_pop(JIT_R1, &stack_ptr);
1550       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1551     @} else @{
1552       fprintf(stderr, "cannot compile: %s\n", expr);
1553       abort();
1554     @}
1555     ++expr;
1556   @}
1557   jit_retr(JIT_R0);
1558   jit_epilog();
1559   return fn;
1560 @}
1561 @end example
1562
1563 The principle on which the calculator is based is easy: the stack top
1564 is held in R0, while the remaining items of the stack are held in the
1565 memory area that we allocate with @code{allocai}.  Compiling a numeric
1566 operand or the argument @code{x} pushes the old stack top onto the
1567 stack and moves the operand into R0; compiling an operator pops the
1568 second operand off the stack into R1, and compiles the operation so
1569 that the result goes into R0, thus becoming the new stack top.
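
The same discipline can be sketched as a plain C interpreter, which may
help when checking what the generated code is expected to compute.  This
is only an illustration and not part of @lightning{}: the function name
@code{eval_rpn}, the variables @code{r0} and @code{r1} (standing in for
R0 and R1) and the @code{slots} array (standing in for the
@code{allocai} area) are all hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate an RPN expression over the single variable x, keeping the
   stack top in r0 and the remaining items in slots[], in the same way
   the generated code keeps them in R0 and in the allocai area.
   Returns 0 on success and -1 on a malformed expression. */
int eval_rpn(const char *expr, int x, int *result)
{
  int slots[32];
  int sp = 0;                     /* number of items stored in slots[] */
  int r0 = 0;                     /* stack top; starts as a dummy value */
  int r1;

  while (*expr) {
    if (isdigit((unsigned char)*expr)) {
      char *end;
      long value = strtol(expr, &end, 10);
      if (sp == 32) return -1;
      slots[sp++] = r0;           /* push the old stack top */
      r0 = (int)value;
      expr = end;
      continue;
    }
    if (*expr == 'x') {
      if (sp == 32) return -1;
      slots[sp++] = r0;           /* push the old stack top */
      r0 = x;
    } else {
      /* slots[0] holds the dummy, so two real operands need sp >= 2 */
      if (sp < 2) return -1;
      r1 = slots[--sp];           /* pop the second operand */
      switch (*expr) {
      case '+': r0 = r1 + r0; break;
      case '-': r0 = r1 - r0; break;
      case '*': r0 = r1 * r0; break;
      case '/':
        if (r0 == 0) return -1;
        r0 = r1 / r0;
        break;
      default:
        return -1;
      }
    }
    ++expr;
  }
  if (sp != 1) return -1;         /* only the dummy may remain */
  *result = r0;
  return 0;
}
```

The first operand pushes a dummy value, exactly as the first
@code{stack_push} in @code{compile_rpn} stores whatever R0 happens to
contain; that slot is never read back.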
1570
This example allocates a fixed area for 32 @code{int}s.  This is not
a problem when the function is a leaf, as in this case; in a full-blown
compiler you will want to analyze the input and determine the number
of needed stack slots, which is a very simple example of register
allocation.
1575 The area is then managed like a stack using @code{stack_push} and
1576 @code{stack_pop}.
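
As a rough sketch of the analysis mentioned above, the number of slots
an expression needs can be found by tracking the stack depth without
generating any code.  The function name @code{rpn_slots_needed} is
hypothetical, and the count assumes the scheme used by
@code{compile_rpn}, where every operand (including the first) pushes
the previous stack top.

```c
#include <ctype.h>
#include <string.h>

/* Return the number of int slots that the push/pop scheme would use
   for expr, or -1 if the expression is malformed. */
int rpn_slots_needed(const char *expr)
{
  int depth = 0;                  /* slots in use at this point */
  int max_depth = 0;              /* high-water mark */

  while (*expr) {
    if (isdigit((unsigned char)*expr)) {
      /* a run of digits is a single operand */
      while (isdigit((unsigned char)*expr))
        ++expr;
      ++depth;
    } else if (*expr == 'x') {
      ++expr;
      ++depth;
    } else if (strchr("+-*/", *expr) != NULL) {
      if (depth < 2) return -1;   /* an operator needs two operands */
      ++expr;
      --depth;
    } else {
      return -1;
    }
    if (depth > max_depth)
      max_depth = depth;
  }
  return max_depth;
}
```

The result, multiplied by @code{sizeof (int)}, could then replace the
fixed @code{32 * sizeof (int)} passed to @code{jit_allocai}.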
1577
1578 Source code for the client (which lies in the same source file) follows:
1579
1580 @example
1581 int main(int argc, char *argv[])
1582 @{
1583   jit_node_t *nc, *nf;
1584   pifi c2f, f2c;
1585   int i;
1586
1587   init_jit(argv[0]);
1588   _jit = jit_new_state();
1589
1590   nc = compile_rpn("32x9*5/+");
1591   nf = compile_rpn("x32-5*9/");
1592   (void)jit_emit();
1593   c2f = (pifi)jit_address(nc);
1594   f2c = (pifi)jit_address(nf);
1595   jit_clear_state();
1596
1597   printf("\nC:");
1598   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1599   printf("\nF:");
1600   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1601   printf("\n");
1602
1603   printf("\nF:");
1604   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1605   printf("\nC:");
1606   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1607   printf("\n");
1608
1609   jit_destroy_state();
1610   finish_jit();
1611   return 0;
1612 @}
1613 @end example
1614
The client displays a conversion table between Celsius and Fahrenheit
degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius).  The
formulas are @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
respectively.
1619
1620 Providing the formula as an argument to @code{compile_rpn} effectively
1621 parameterizes code generation, making it possible to use the same code
1622 to compile different functions; this is what makes dynamic code
1623 generation so powerful.
1624
1625 @node Fibonacci
1626 @section Fibonacci numbers
1627
The code in this section calculates the Fibonacci sequence, which is
defined by the recurrence relation:
1630 @display
1631      f(0) = 0
1632      f(1) = f(2) = 1
1633      f(n) = f(n-1) + f(n-2)
1634 @end display
1635
The purpose of this example is to introduce branches.  There are two
kinds of branches: backward branches and forward branches.  We'll
present the calculation in a recursive and an iterative form; the
former only uses forward branches, while the latter uses both.
1640
1641 @example
1642 #include <stdio.h>
1643 #include <lightning.h>
1644
1645 static jit_state_t *_jit;
1646
1647 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1648
1649 int main(int argc, char *argv[])
1650 @{
1651   pifi       fib;
1652   jit_node_t *label;
1653   jit_node_t *call;
1654   jit_node_t *in;                 @rem{/* offset of the argument */}
1655   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1656   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1657
1658   init_jit(argv[0]);
1659   _jit = jit_new_state();
1660
1661   label = jit_label();
1662         jit_prolog   ();
1663   in =  jit_arg      ();
        jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1665  zero = jit_beqi     (JIT_R0, 0);
1666         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1667         jit_movi     (JIT_R0, 1);
1668   ref = jit_blei     (JIT_V0, 2);
1669         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1670         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1671         jit_prepare();
1672           jit_pushargr(JIT_V1);
1673         call = jit_finishi(NULL);
1674         jit_patch_at(call, label);
1675         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1676         jit_prepare();
1677           jit_pushargr(JIT_V2);
1678         call = jit_finishi(NULL);
1679         jit_patch_at(call, label);
1680         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1681         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1682
1683   jit_patch(ref);                               @rem{/* patch jump */}
1684   jit_patch(zero);                              @rem{/* patch jump */}
1685         jit_retr(JIT_R0);
1686
1687   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1688   fib = jit_emit();
1689   jit_clear_state();
1690   printf("fib(%d) = %d\n", 32, fib(32));
1691   jit_destroy_state();
1692   finish_jit();
1693   return 0;
1694 @}
1695 @end example
1696
As said above, this is the first example of dynamically compiling
branches.  Branch instructions have two operands containing the
values to be compared, and return a @code{jit_node_t *} object
to be patched.
1701
Because the final addresses of labels are only known after calling
@code{emit}, it is required to call @code{patch} or @code{patch_at},
which tells @lightning{} that the target to patch is actually a
@code{jit_node_t *} object; otherwise, it would assume that it is
a pointer to a C function.  Note that conditional branches do not
receive a label argument, so they must be patched.
1708
You need to call @code{patch_at} on the return value of @code{calli}
or @code{finishi} if it actually references a label in the jit code.
No branch instruction receives a label argument.  Note that
@code{movi} is a special case: patching it is usually done to get
the final address of a label, typically in order to later call
@code{jmpr}.
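
For comparison, the control flow of the recursive version can be
written in plain C as follows.  This is a reference sketch only; the
name @code{fib_rec} is hypothetical, and the comments map each test to
the branch that is assumed to implement it in the jit code.

```c
/* Plain C counterpart of the recursive jit code. */
int fib_rec(int n)
{
  if (n == 0)                     /* the beqi branch kept in `zero' */
    return 0;
  if (n <= 2)                     /* the blei branch kept in `ref' */
    return 1;
  /* the two finishi calls patched to `label' */
  return fib_rec(n - 1) + fib_rec(n - 2);
}
```

Both early returns are forward branches: their target, the final
@code{retr}, has not been generated yet when the branch is emitted.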
1715
1716 Now, here is the iterative version:
1717
1718 @example
1719 #include <stdio.h>
1720 #include <lightning.h>
1721
1722 static jit_state_t *_jit;
1723
1724 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1725
1726 int main(int argc, char *argv[])
1727 @{
1728   pifi       fib;
1729   jit_node_t *in;               @rem{/* offset of the argument */}
1730   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1731   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1732   jit_node_t *jump;             @rem{/* jump to start of loop */}
1733   jit_node_t *loop;             @rem{/* start of the loop */}
1734
1735   init_jit(argv[0]);
1736   _jit = jit_new_state();
1737
1738         jit_prolog   ();
1739   in =  jit_arg      ();
1740         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1741  zero = jit_beqi     (JIT_R0, 0);
1742         jit_movr     (JIT_R1, JIT_R0);
1743         jit_movi     (JIT_R0, 1);
  ref = jit_blei     (JIT_R1, 2);
        jit_subi     (JIT_R2, JIT_R1, 2);       @rem{/* R2 = n-2 iterations */}
1746         jit_movr     (JIT_R1, JIT_R0);
1747
1748   loop= jit_label();
1749         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1750         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1751         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1752         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1753   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1754   jit_patch_at(jump, loop);
1755
1756   jit_patch(ref);                               @rem{/* patch forward jump */}
1757   jit_patch(zero);                              @rem{/* patch forward jump */}
1758         jit_retr     (JIT_R0);
1759
1760   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1761   fib = jit_emit();
1762   jit_clear_state();
1763   printf("fib(%d) = %d\n", 36, fib(36));
1764   jit_destroy_state();
1765   finish_jit();
1766   return 0;
1767 @}
1768 @end example
1769
1770 This code calculates the recurrence relation using iteration (a
1771 @code{for} loop in high-level languages).  There are no function
1772 calls anymore: instead, there is a backward jump (the @code{bnei} at
1773 the end of the loop).
1774
1775 Note that the program must remember the address for backward jumps;
1776 for forward jumps it is only required to remember the jump code,
1777 and call @code{patch} for the implicit label.
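
The loop can also be read as the following plain C function, where each
variable stands in for the register of the same name.  It is a sketch
under two assumptions: the loop counter is initialized to @math{n-2},
and arguments up to 2 take the early exit.  The name @code{fib_iter} is
hypothetical.

```c
/* Plain C counterpart of the iterative jit code. */
int fib_iter(int n)
{
  int r0 = n, r1, r2, v0;

  if (r0 == 0)                    /* forward branch `zero' */
    return r0;
  r1 = r0;
  r0 = 1;
  if (r1 <= 2)                    /* forward branch `ref' */
    return r0;
  r2 = r1 - 2;                    /* number of iterations */
  r1 = r0;
  do {                            /* the label `loop' */
    --r2;                         /* decrement the counter */
    v0 = r0;
    r0 = r0 + r1;
    r1 = v0;
  } while (r2 != 0);              /* backward branch `jump' */
  return r0;
}
```

After @math{k} iterations @code{r0} holds @math{f(k+2)}, so the loop
must run exactly @math{n-2} times.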
1778
1779 @node Reentrancy
1780 @chapter Re-entrant usage of @lightning{}
1781
@lightning{} uses the special @code{_jit} identifier.  To be able
to use multiple jit generation states at the same time, it is
required to use code similar to:
1785
1786 @example
    jit_state_t *lightning;
    #define _jit lightning
1789 @end example
1790
This will cause the symbol that @code{_jit} is defined as to be passed
as the first argument to the underlying @lightning{} implementation,
which is usually a function with an @code{_} (underscore) prefix
and with an argument named @code{_jit}, in the pattern:
1795
1796 @example
1797     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
    #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v)
1799 @end example
1800
The reason for this is to use the same syntax as the initial lightning
implementation and to spare the user from adding an extra
argument to every call, as multiple jit states generating code in
parallel should be very uncommon.
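
The mechanism can be illustrated without @lightning{} itself.  In the
sketch below every name (@code{toy_state_t}, @code{toy_emit},
@code{toy_emit_impl}, @code{state_a}, @code{state_b}) is hypothetical;
only the idea of a macro that silently passes whatever @code{_jit}
names at the point of use is taken from the text above.

```c
/* A toy state standing in for jit_state_t. */
typedef struct toy_state {
  int emitted;                    /* count of "instructions" emitted */
} toy_state_t;

/* The implementation takes the state explicitly ... */
static void toy_emit_impl(toy_state_t *state, int count)
{
  state->emitted += count;
}

/* ... while the user-visible macro picks up the identifier _jit from
   the scope where it is expanded. */
#define toy_emit(count) toy_emit_impl(_jit, count)

static toy_state_t state_a, state_b;

int emit_into_a(int count)
{
  toy_state_t *_jit = &state_a;   /* _jit resolves to state_a here */
  toy_emit(count);
  return state_a.emitted;
}

int emit_into_b(int count)
{
  toy_state_t *_jit = &state_b;   /* and to state_b here */
  toy_emit(count);
  return state_b.emitted;
}
```

Each function generates into its own state even though both use the
same @code{toy_emit} call syntax.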
1805
1806 @node Registers
1807 @chapter Accessing the whole register file
1808
As mentioned earlier, all @lightning{} back-ends are
guaranteed to have at least six general-purpose integer registers and
six floating-point registers, but many back-ends will have more.
1812
1813 To access the entire register files, you can use the
1814 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1815 accept a parameter that identifies the register number, which
1816 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1817 and @code{JIT_F_NUM} respectively; the number need not be
1818 constant.  Of course, expressions like @code{JIT_R0} and
1819 @code{JIT_R(0)} denote the same register, and likewise for
1820 integer callee-saved, or floating-point, registers.
1821
1822 @section Scratch registers
1823
For operations that @lightning{} does not support directly, such as
storing a literal in memory, @code{jit_get_reg} and
@code{jit_unget_reg} can be used to acquire and release a scratch
register, as in the following pattern:
1827
1828 @example
1829     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1830     jit_movi (reg, immediate);
1831     jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1832     jit_unget_reg (reg);
1833 @end example
1834
As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
reloads but don't follow branches, the code between both must be in
the same basic block and must not contain any branches, unlike the
following (bad) example.
1839
1840 @example
1841     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1842     jit_ldxi (reg, JIT_V0, offset);
1843     jump = jit_bnei (reg, V0);
1844     jit_movr (JIT_V1, reg);
1845     jit_patch (jump);
1846     jit_unget_reg (reg);
1847 @end example
1848
1849 @node Customizations
1850 @chapter Customizations
1851
1852 Frequently it is desirable to have more control over how code is
1853 generated or how memory is used during jit generation or execution.
1854
1855 @section Memory functions
To aid in complete control of memory allocation and deallocation,
@lightning{} provides wrappers that default to the standard
@code{malloc}, @code{realloc} and @code{free}.  These are loosely based
on the GNU GMP counterparts, with the difference that they use the same
prototypes as the system allocation functions, that is, no @code{size}
for @code{free} or @code{old_size} for @code{realloc}.
1862
1863 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
@lightning{} guarantees that memory is only allocated or released
using these wrapped functions, but note that if @lightning{}
was linked to GNU binutils, @code{malloc} will probably be called
multiple times from there when initializing the disassembler.
1868
Because @code{init_jit} may call memory functions, if you need to call
@code{jit_set_memory_functions}, it must be called before
@code{init_jit}; otherwise, when calling @code{finish_jit}, a pointer
allocated with the previous or default wrappers will be passed to the
newly installed ones.
1873 @end deftypefun
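
As an illustration, the following wrappers have the prototypes expected
by @code{jit_set_memory_functions} and keep a count of live blocks.
The names @code{counting_alloc}, @code{counting_realloc},
@code{counting_free} and @code{counting_live} are hypothetical, and the
registration call is shown only as a comment so that the sketch
compiles without @lightning{}.

```c
#include <stdlib.h>

static size_t live_blocks;        /* blocks allocated and not yet freed */

void *counting_alloc(size_t size)
{
  void *ptr = malloc(size);
  if (ptr != NULL)
    ++live_blocks;
  return ptr;
}

void *counting_realloc(void *old, size_t size)
{
  void *ptr = realloc(old, size);
  /* realloc(NULL, size) behaves like malloc, so it creates a block;
     resizing an existing block leaves the count unchanged.  Calls
     with a size of zero are not accounted for in this sketch. */
  if (ptr != NULL && old == NULL)
    ++live_blocks;
  return ptr;
}

void counting_free(void *ptr)
{
  if (ptr != NULL)
    --live_blocks;
  free(ptr);
}

size_t counting_live(void)
{
  return live_blocks;
}

/* With lightning available, the wrappers would be installed first:

     jit_set_memory_functions(counting_alloc, counting_realloc,
                              counting_free);
     init_jit(argv[0]);
*/
```

Note that, as described above, no size argument is passed to the
@code{free} wrapper and no old size to the @code{realloc} wrapper.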
1874
1875 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
Gets the current memory allocation functions.  Unlike the GNU GMP
counterpart, it is an error to pass @code{NULL} pointers as arguments.
1878 @end deftypefun
1879
1880 @section Protection
Unless an alternate code buffer is used (see below), @code{jit_emit}
sets the access protections so that the code buffer's memory can be
read and executed, but not modified.  One can use the following
functions after @code{jit_emit} but before @code{jit_clear_state} to
temporarily lift the protection:
1886
1887 @deftypefun void jit_unprotect ()
Changes the access protection so that the code buffer's memory can be
read and modified.  Before the emitted code can be invoked,
@code{jit_protect} has to be called to reset the change.
1891
1892 This procedure has no effect when an alternate code buffer (see below) is used.
1893 @end deftypefun
1894
1895 @deftypefun void jit_protect ()
Changes the access protection so that the code buffer's memory can be
read and executed.
1898
1899 This procedure has no effect when an alternate code buffer (see below) is used.
1900 @end deftypefun
1901
1902 @section Alternate code buffer
1903 To instruct @lightning{} to use an alternate code buffer it is required
1904 to call @code{jit_realize} before @code{jit_emit}, and then query states
1905 and customize as appropriate.
1906
1907 @deftypefun void jit_realize ()
1908 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1909 that no other @code{jit_xyz} call will be made.
1910 @end deftypefun
1911
1912 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
Returns NULL or the previous value set with @code{jit_set_code}, and
sets the @var{code_size} argument to an appropriate value.
If @code{jit_get_code} is called before @code{jit_emit}, the
@var{code_size} argument is set to the expected number of bytes
required to generate code.
If @code{jit_get_code} is called after @code{jit_emit}, the
@var{code_size} argument is set to the exact number of bytes used
by the code.
1921 @end deftypefun
1922
@deftypefun void jit_set_code (jit_pointer_t @var{code}, jit_word_t @var{size})
Instructs @lightning{} to output to the @var{code} argument and
use @var{size} as a guard to not write to invalid memory.  If during
@code{jit_emit} @lightning{} finds out that the code would not fit
in @var{size} bytes, it halts code emission and @code{jit_emit}
returns @code{NULL}.
1928 @end deftypefun
1929
1930 A simple example of a loop using an alternate buffer is:
1931
1932 @example
1933   jit_uint8_t   *code;
  int          (*func)(int);       @rem{/* function pointer */}
1935   jit_word_t     code_size;
1936   jit_word_t     real_code_size;
1937   @rem{...}
1938   jit_realize();                   @rem{/* ready to generate code */}
1939   jit_get_code(&code_size);        @rem{/* get expected code size */}
1940   code_size = (code_size + 4095) & -4096;
  do @{
1942     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1943                 MAP_PRIVATE | MAP_ANON, -1, 0);
1944     jit_set_code(code, code_size);
1945     if ((func = jit_emit()) == NULL) @{
1946       munmap(code, code_size);
1947       code_size += 4096;
1948     @}
1949   @} while (func == NULL);
1950   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1951 @end example
1952
The first call to @code{jit_get_code} should return @code{NULL} and set
the @code{code_size} argument to the expected number of bytes required
to emit code.
The second call to @code{jit_get_code} is made after a successful call
to @code{jit_emit}, and will return the value previously set with
@code{jit_set_code} and set the @code{real_code_size} argument to the
exact number of bytes used to emit the code.
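
The expression @code{(code_size + 4095) & -4096} in the example rounds
the estimate up to a multiple of 4096 bytes, which the example assumes
to be the page size.  A slightly more general sketch is shown below;
the helper name @code{round_up_to_page} is hypothetical, and it assumes
that @code{page_size} is a power of two (on POSIX systems the actual
value can be queried with @code{sysconf(_SC_PAGESIZE)}).

```c
#include <stddef.h>

/* Round size up to the next multiple of page_size, which must be a
   power of two.  For page_size == 4096 the mask ~(page_size - 1)
   clears the same low bits as the -4096 used in the example. */
size_t round_up_to_page(size_t size, size_t page_size)
{
  return (size + page_size - 1) & ~(page_size - 1);
}
```

A size that is already a multiple of the page size is left unchanged,
so the buffer never grows by a full extra page.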
1960
1961 @section Alternate data buffer
Sometimes it may be desirable to customize how @lightning{} uses an
extra buffer for constants or debug annotations, or to prevent it
from using one.  This is usually done when also using an alternate
code buffer.
1965
1966 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1967 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1968 and sets the @var{data_size} argument to how many bytes are required
1969 for the constants data buffer, and @var{note_size} to how many bytes
1970 are required to store the debug note information.
Note that it always preallocates one debug note entry even if
@code{jit_name} or @code{jit_note} are never called, but will return
zero in the @var{data_size} argument if no constant is required;
constants are only used for the @code{float} and @code{double} operations
that have an immediate argument, and not in all @lightning{} ports.
1976 @end deftypefun
1977
1978 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1979
1980 @var{data} can be NULL if disabling constants and annotations, otherwise,
1981 a valid pointer must be passed. An assertion is done that the data will
1982 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1983 with @code{-DNDEBUG}).
1984
1985 @var{size} tells the space in bytes available in @var{data}.
1986
@var{flags} can be zero to just use the alternate data buffer,
or a combination of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}:
1989
1990 @table @t
1991 @item JIT_DISABLE_DATA
1992 @cindex JIT_DISABLE_DATA
Instructs @lightning{} to not use a constant table, but to use an
alternate method to synthesize those, usually with a larger code
sequence using stack space to transfer the value from a GPR to an
FPR.
1997
1998 @item JIT_DISABLE_NOTE
1999 @cindex JIT_DISABLE_NOTE
2000 Instructs @lightning{} to not store file or function name, and
2001 line numbers in the constant buffer.
2002 @end table
2003 @end deftypefun
2004
A simple example of preventing the use of a data buffer is:
2006
2007 @example
2008   @rem{...}
2009   jit_realize();                        @rem{/* ready to generate code */}
2010   jit_get_data(NULL, NULL);
2011   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
2012   @rem{...}
2013 @end example
2014
2015 Or to only use a data buffer, if required:
2016
2017 @example
2018   jit_uint8_t   *data;
2019   jit_word_t     data_size;
2020   @rem{...}
2021   jit_realize();                        @rem{/* ready to generate code */}
2022   jit_get_data(&data_size, NULL);
2023   if (data_size)
2024     data = malloc(data_size);
2025   else
2026     data = NULL;
2027   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
2028   @rem{...}
2029   if (data)
2030     free(data);
2031   @rem{...}
2032 @end example
2033
2034 @node Acknowledgements
2035 @chapter Acknowledgements
2036
2037 As far as I know, the first general-purpose portable dynamic code
2038 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
2039 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
2040 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
2041 directly inspired @lightning{}.
2042
Thanks go to Ian Piumarta, who kindly agreed to release his own
program @sc{ccg} under the GNU General Public License, thereby allowing
@lightning{} to use the run-time assemblers he had written for @sc{ccg}.
@sc{ccg} provides a way of dynamically assembling programs written in the
underlying architecture's assembly language.  So it is not portable,
yet very interesting.
2049
2050 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
2051 was first developed as a tool to be used in GNU Smalltalk's dynamic
2052 translator from bytecodes to native code.