deps/lightning/doc/body.texi

   1 @ifnottex
   2 @dircategory Software development
   3 @direntry
   4 * lightning: (lightning).       Library for dynamic code generation.
   5 @end direntry
   6 @end ifnottex
   7
   8 @ifnottex
   9 @node Top
  10 @top @lightning{}
  11
  12 @iftex
  13 @macro comma
  14 @verbatim{|,|}
  15 @end macro
  16 @end iftex
  17
  18 @ifnottex
  19 @macro comma
  20 @verb{|,|}
  21 @end macro
  22 @end ifnottex
  23
  24 This document describes @value{TOPIC} the @lightning{} library for
  25 dynamic code generation.
  26
  27 @menu
  28 * Overview::                What GNU lightning is
  29 * Installation::            Configuring and installing GNU lightning
  30 * The instruction set::     The RISC instruction set used in GNU lightning
  31 * GNU lightning examples::  GNU lightning's examples
  32 * Reentrancy::              Re-entrant usage of GNU lightning
  33 * Registers::               Accessing the whole register file
  34 * Customizations::          Advanced code generation customizations
  35 * Acknowledgements::        Acknowledgements for GNU lightning
  36 @end menu
  37 @end ifnottex
  38
  39 @node Overview
  40 @chapter Introduction to @lightning{}
  41
  42 @iftex
  43 This document describes @value{TOPIC} the @lightning{} library for
  44 dynamic code generation.
  45 @end iftex
  46
  47 Dynamic code generation is the generation of machine code
  48 at runtime. It is typically used to strip a layer of interpretation
  49 by allowing compilation to occur at runtime.  One of the most
  50 well-known applications of dynamic code generation is perhaps that
  51 of interpreters that compile source code to an intermediate bytecode
  52 form, which is then recompiled to machine code at run-time: this
  53 approach effectively combines the portability of bytecode
  54 representations with the speed of machine code.  Another common
  55 application of dynamic code generation is in the field of hardware
  56 simulators and binary emulators, which can use the same techniques
  57 to translate simulated instructions to the instructions of the
  58 underlying machine.
  59
  60 Yet other applications come to mind: for example, windowing
  61 @dfn{bitblt} operations, matrix manipulations, and network packet
  62 filters.  Albeit very powerful and relatively well known within the
  63 compiler community, dynamic code generation techniques are rarely
  64 exploited to their full potential and, with the exception of the
  65 two applications described above, have remained curiosities because
  66 of their portability and functionality barriers: binary instructions
  67 are generated, so programs using dynamic code generation must be
  68 retargeted for each machine; in addition, coding a run-time code
  69 generator is a tedious and error-prone task more than a difficult one.
  70
  71 @lightning{} provides a portable, fast and easily retargetable dynamic
  72 code generation system.
  73
  74 To be portable, @lightning{} abstracts over current architectures'
  75 quirks and unorthogonalities.  The interface that it exposes to is that
  76 of a standardized RISC architecture loosely based on the SPARC and MIPS
  77 chips.  There are a few general-purpose registers (six, not including
  78 those used to receive and pass parameters between subroutines), and
  79 arithmetic operations involve three operands---either three registers
  80 or two registers and an arbitrarily sized immediate value.
  81
  82 On one hand, this architecture is general enough that it is possible to
  83 generate pretty efficient code even on CISC architectures such as the
  84 Intel x86 or the Motorola 68k families.  On the other hand, it matches
  85 real architectures closely enough that, most of the time, the
  86 compiler's constant folding pass ends up generating code which
  87 assembles machine instructions without further tests.
  88
  89 @node Installation
  90 @chapter Configuring and installing @lightning{}
  91
  92 Here we will assume that your system already has the dependencies
  93 necessary to build @lightning{}. For more on dependencies, see
  94 @lightning{}'s @file{README-hacking} file.
  95
  96 The first thing to do to build @lightning{} is to configure the
  97 program, picking the set of macros to be used on the host
  98 architecture; this configuration is automatically performed by
  99 the @file{configure} shell script; to run it, merely type:
 100 @example
 101      ./configure
 102 @end example
 103
 104 The @file{configure} accepts the @code{--enable-disassembler} option,
 105 hat enables linking to GNU binutils and optionally print human readable
 106 disassembly of the jit code. This option can be disabled by the
 107 @code{--disable-disassembler} option.
 108
 109 @file{configure} also accepts the  @code{--enable-devel-disassembler},
 110 option useful to check exactly hat machine instructions were generated
 111 for a @lightning{} instrction. Basically mixing @code{jit_print} and
 112 @code{jit_disassembly}.
 113
 114 The @code{--enable-assertions} option, which enables several consistency
 115 hecks in the run-time assemblers.  These are not usually needed, so you
 116 can decide to simply forget about it; also remember that these consistency
 117 checks tend to slow down your code generator.
 118
 119 The @code{--enable-devel-strong-type-checking} option that does extra type
 120 checking using @code{assert}. This option also enables the
 121 @code{--enable-assertions} unless it is explicitly disabled.
 122
 123 The option @code{--enable-devel-get-jit-size} should only be used
 124 when doing updates or maintenance to lightning. It regenerates the
 125 @code{jit_$ARCH]-sz.c} creating a table or maximum bytes usage when
 126 translating a @lightning{} instruction to machine code.
 127
 128 After you've configured @lightning{}, run @file{make} as usual.
 129
 130 @lightning{} has an extensive set of tests to validate it is working
 131 correctly in the build host. To test it run:
 132 @example
 133     make check
 134 @end example
 135
 136 The next important step is:
 137 @example
 138     make install
 139 @end example
 140
 141 This ends the process of installing @lightning{}.
 142
 143 @node The instruction set
 144 @chapter @lightning{}'s instruction set
 145
 146 @lightning{}'s instruction set was designed by deriving instructions
 147 that closely match those of most existing RISC architectures, or
 148 that can be easily syntesized if absent.  Each instruction is composed
 149 of:
 150 @itemize @bullet
 151 @item
 152 an operation, like @code{sub} or @code{mul}
 153
 154 @item
 155 most times, a register/immediate flag (@code{r} or @code{i})
 156
 157 @item
 158 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
 159 @end itemize
 160
 161 Examples of legal mnemonics are @code{addr} (integer add, with three
 162 register operands) and @code{muli} (integer multiply, with two
 163 register operands and an immediate operand).  Each instruction takes
 164 two or three operands; in most cases, one of them can be an immediate
 165 value instead of a register.
 166
 167 Most @lightning{} integer operations are signed wordsize operations,
 168 with the exception of operations that convert types, or load or store
 169 values to/from memory. When applicable, the types and C types are as
 170 follow:
 171
 172 @example
 173      _c         @r{signed char}
 174      _uc        @r{unsigned char}
 175      _s         @r{short}
 176      _us        @r{unsigned short}
 177      _i         @r{int}
 178      _ui        @r{unsigned int}
 179      _l         @r{long}
 180      _f         @r{float}
 181      _d         @r{double}
 182 @end example
 183
 184 Most integer operations do not need a type modifier, and when loading or
 185 storing values to memory there is an alias to the proper operation
 186 using wordsize operands, that is, if ommited, the type is @r{int} on
 187 32-bit architectures and @r{long} on 64-bit architectures.  Note
 188 that lightning also expects @code{sizeof(void*)} to match the wordsize.
 189
 190 When an unsigned operation result differs from the equivalent signed
 191 operation, there is a the @code{_u} modifier.
 192
 193 There are at least seven integer registers, of which six are
 194 general-purpose, while the last is used to contain the frame pointer
 195 (@code{FP}).  The frame pointer can be used to allocate and access local
 196 variables on the stack, using the @code{allocai} or @code{allocar}
 197 instruction.
 198
 199 Of the general-purpose registers, at least three are guaranteed to be
 200 preserved across function calls (@code{V0}, @code{V1} and
 201 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
 202 @code{R2}).  Six registers are not very much, but this
 203 restriction was forced by the need to target CISC architectures
 204 which, like the x86, are poor of registers; anyway, backends can
 205 specify the actual number of available registers with the calls
 206 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
 207 (for callee-save registers).
 208
 209 There are at least six floating-point registers, named @code{F0} to
 210 @code{F5}.  These are usually caller-save and are separate from the integer
 211 registers on the supported architectures; on Intel architectures,
 212 in 32 bit mode if SSE2 is not available or use of X87 is forced,
 213 the register stack is mapped to a flat register file.  As for the
 214 integer registers, the macro @code{JIT_F_NUM} yields the number of
 215 floating-point registers.
 216
 217 The complete instruction set follows; as you can see, most non-memory
 218 operations only take integers (either signed or unsigned) as operands;
 219 this was done in order to reduce the instruction set, and because most
 220 architectures only provide word and long word operations on registers.
 221 There are instructions that allow operands to be extended to fit a larger
 222 data type, both in a signed and in an unsigned way.
 223
 224 @table @b
 225 @item Binary ALU operations
 226 These accept three operands; the last one can be an immediate.
 227 @code{addx} operations must directly follow @code{addc}, and
 228 @code{subx} must follow @code{subc}; otherwise, results are undefined.
 229 Most, if not all, architectures do not support @r{float} or @r{double}
 230 immediate operands; lightning emulates those operations by moving the
 231 immediate to a temporary register and emiting the call with only
 232 register operands.
 233 @example
 234 addr         _f  _d  O1 = O2 + O3
 235 addi         _f  _d  O1 = O2 + O3
 236 addxr                O1 = O2 + (O3 + carry)
 237 addxi                O1 = O2 + (O3 + carry)
 238 addcr                O1 = O2 + O3, set carry
 239 addci                O1 = O2 + O3, set carry
 240 subr         _f  _d  O1 = O2 - O3
 241 subi         _f  _d  O1 = O2 - O3
 242 subxr                O1 = O2 - (O3 + carry)
 243 subxi                O1 = O2 - (O3 + carry)
 244 subcr                O1 = O2 - O3, set carry
 245 subci                O1 = O2 - O3, set carry
 246 rsbr         _f  _d  O1 = O3 - O1
 247 rsbi         _f  _d  O1 = O3 - O1
 248 mulr         _f  _d  O1 = O2 * O3
 249 muli         _f  _d  O1 = O2 * O3
 250 divr     _u  _f  _d  O1 = O2 / O3
 251 divi     _u  _f  _d  O1 = O2 / O3
 252 remr     _u          O1 = O2 % O3
 253 remi     _u          O1 = O2 % O3
 254 andr                 O1 = O2 & O3
 255 andi                 O1 = O2 & O3
 256 orr                  O1 = O2 | O3
 257 ori                  O1 = O2 | O3
 258 xorr                 O1 = O2 ^ O3
 259 xori                 O1 = O2 ^ O3
 260 lshr                 O1 = O2 << O3
 261 lshi                 O1 = O2 << O3
 262 rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 263 rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
 264 movzr                O1 = O3 ? O1 : O2
 265 movnr                O1 = O3 ? O2 : O1
 266 @end example
 267
 268 @item Four operand binary ALU operations
 269 These accept two result registers, and two operands; the last one can
 270 be an immediate. The first two arguments cannot be the same register.
 271
 272 @code{qmul} stores the low word of the result in @code{O1} and the
 273 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
 274 means there was no overflow. For signed multiplication, no overflow
 275 check is based on sign, and can be detected if @code{O2} is zero or
 276 minus one.
 277
 278 @code{qdiv} stores the quotient in @code{O1} and the remainder in
 279 @code{O2}. It can be used as quick way to check if a division is
 280 exact, in which case the remainder is zero.
 281
 282 @example
 283 qmulr    _u       O1 O2 = O3 * O4
 284 qmuli    _u       O1 O2 = O3 * O4
 285 qdivr    _u       O1 O2 = O3 / O4
 286 qdivi    _u       O1 O2 = O3 / O4
 287 @end example
 288
 289 @item Unary ALU operations
 290 These accept two operands, both of which must be registers.
 291 @example
 292 negr         _f  _d  O1 = -O2
 293 comr                 O1 = ~O2
 294 clor                 O1 = number of leading one bits
 295 clzr                 O1 = number of leading zero bits
 296 ctor                 O1 = number of trailing one bits
 297 ctzr                 O1 = number of trailing zero bits
 298 @end example
 299
 300 Note that @code{ctzr} is basically equivalent of a @code{C} call
 301 @code{ffs} but indexed at bit zero, not one.
 302
 303 Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
 304 value of zero is not an error, it just returns the number of bits
 305 in a word, 64 if @lightning{} generates 64 bit instructions, otherwise
 306 it returns 32.
 307
 308 The @code{clor} and @code{ctor} are just counterparts of the versions
 309 that search for zero bits.
 310
 311 These unary ALU operations are only defined for float operands.
 312 @example
 313 absr         _f  _d  O1 = fabs(O2)
 314 sqrtr        _f  _d  O1 = sqrt(O2)
 315 @end example
 316
 317 Besides requiring the @code{r} modifier, there are no unary operations
 318 with an immediate operand.
 319
 320 @item Compare instructions
 321 These accept three operands; again, the last can be an immediate.
 322 The last two operands are compared, and the first operand, that must be
 323 an integer register, is set to either 0 or 1, according to whether the
 324 given condition was met or not.
 325
 326 The conditions given below are for the standard behavior of C,
 327 where the ``unordered'' comparison result is mapped to false.
 328
 329 @example
 330 ltr       _u  _f  _d  O1 =  (O2 <  O3)
 331 lti       _u  _f  _d  O1 =  (O2 <  O3)
 332 ler       _u  _f  _d  O1 =  (O2 <= O3)
 333 lei       _u  _f  _d  O1 =  (O2 <= O3)
 334 gtr       _u  _f  _d  O1 =  (O2 >  O3)
 335 gti       _u  _f  _d  O1 =  (O2 >  O3)
 336 ger       _u  _f  _d  O1 =  (O2 >= O3)
 337 gei       _u  _f  _d  O1 =  (O2 >= O3)
 338 eqr           _f  _d  O1 =  (O2 == O3)
 339 eqi           _f  _d  O1 =  (O2 == O3)
 340 ner           _f  _d  O1 =  (O2 != O3)
 341 nei           _f  _d  O1 =  (O2 != O3)
 342 unltr         _f  _d  O1 = !(O2 >= O3)
 343 unler         _f  _d  O1 = !(O2 >  O3)
 344 ungtr         _f  _d  O1 = !(O2 <= O3)
 345 unger         _f  _d  O1 = !(O2 <  O3)
 346 uneqr         _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
 347 ltgtr         _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
 348 ordr          _f  _d  O1 =  (O2 == O2) &&  (O3 == O3)
 349 unordr        _f  _d  O1 =  (O2 != O2) ||  (O3 != O3)
 350 @end example
 351
 352 @item Transfer operations
 353 These accept two operands; for @code{ext} both of them must be
 354 registers, while @code{mov} accepts an immediate value as the second
 355 operand.
 356
 357 Unlike @code{movr} and @code{movi}, the other instructions are used
 358 to truncate a wordsize operand to a smaller integer data type or to
 359 convert float data types. You can also use @code{extr} to convert an
 360 integer to a floating point value: the usual options are @code{extr_f}
 361 and @code{extr_d}.
 362
 363 @example
 364 movr                                 _f  _d  O1 = O2
 365 movi                                 _f  _d  O1 = O2
 366 extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
 367 truncr                               _f  _d  O1 = trunc(O2)
 368 @end example
 369
 370 In 64-bit architectures it may be required to use @code{truncr_f_i},
 371 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
 372 the equivalent C code.  Only the @code{_i} modifier is available in
 373 32-bit architectures.
 374
 375 @example
 376 truncr_f_i    = <int> O1 = <float> O2
 377 truncr_f_l    = <long>O1 = <float> O2
 378 truncr_d_i    = <int> O1 = <double>O2
 379 truncr_d_l    = <long>O1 = <double>O2
 380 @end example
 381
 382 The float conversion operations are @emph{destination first,
 383 source second}, but the order of the types is reversed.  This happens
 384 for historical reasons.
 385
 386 @example
 387 extr_f_d    = <double>O1 = <float> O2
 388 extr_d_f    = <float> O1 = <double>O2
 389 @end example
 390
 391 @item Network extensions
 392 These accept two operands, both of which must be registers; these
 393 two instructions actually perform the same task, yet they are
 394 assigned to two mnemonics for the sake of convenience and
 395 completeness.  As usual, the first operand is the destination and
 396 the second is the source.
 397 The @code{_ul} variant is only available in 64-bit architectures.
 398 @example
 399 htonr    _us _ui _ul @r{Host-to-network (big endian) order}
 400 ntohr    _us _ui _ul @r{Network-to-host order }
 401 @end example
 402
 403 @code{bswapr} can be used to unconditionally byte-swap an operand.
 404 On little-endian architectures, @code{htonr} and @code{ntohr} resolve
 405 to this.
 406 The @code{_ul} variant is only available in 64-bit architectures.
 407 @example
 408 bswapr    _us _ui _ul  01 = byte_swap(02)
 409 @end example
 410
 411 @item Load operations
 412 @code{ld} accepts two operands while @code{ldx} accepts three;
 413 in both cases, the last can be either a register or an immediate
 414 value. Values are extended (with or without sign, according to
 415 the data type specification) to fit a whole register.
 416 The @code{_ui} and @code{_l} types are only available in 64-bit
 417 architectures.  For convenience, there is a version without a
 418 type modifier for integer or pointer operands that uses the
 419 appropriate wordsize call.
 420 @example
 421 ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 422 ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
 423 ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 424 ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
 425 @end example
 426
 427 @item Store operations
 428 @code{st} accepts two operands while @code{stx} accepts three; in
 429 both cases, the first can be either a register or an immediate
 430 value. Values are sign-extended to fit a whole register.
 431 @example
 432 str     _c       _s       _i       _l  _f  _d  *O1 = O2
 433 sti     _c       _s       _i       _l  _f  _d  *O1 = O2
 434 stxr    _c       _s       _i       _l  _f  _d  *(O1+O2) = O3
 435 stxi    _c       _s       _i       _l  _f  _d  *(O1+O2) = O3
 436 @end example
 437 Note that the unsigned type modifier is not available, as the store
 438 only writes to the 1, 2, 4 or 8 sized memory address.
 439 The @code{_l} type is only available in 64-bit architectures, and for
 440 convenience, there is a version without a type modifier for integer or
 441 pointer operands that uses the appropriate wordsize call.
 442
 443 @item Argument management
 444 These are:
 445 @example
 446 prepare     (not specified)
 447 va_start    (not specified)
 448 pushargr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 449 pushargi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 450 va_push     (not specified)
 451 arg         _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 452 getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 453 va_arg                                         _d
 454 putargr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 455 putargi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 456 ret         (not specified)
 457 retr        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 458 reti        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 459 reti                                       _f  _d
 460 va_end      (not specified)
 461 retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
 462 epilog      (not specified)
 463 @end example
 464 As with other operations that use a type modifier, the @code{_ui} and
 465 @code{_l} types are only available in 64-bit architectures, but there
 466 are operations without a type modifier that alias to the appropriate
 467 integer operation with wordsize operands.
 468
 469 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
 470 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
 471 A code snippet that wants to call another procedure and has to pass
 472 arguments must, in order: use the @code{prepare} instruction and use
 473 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
 474 left to right order}; and use @code{finish} or @code{call} (explained below)
 475 to perform the actual call.
 476
 477 Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} when
 478 handling integer types can be used without a type modifier.
 479 It is suggested to use matching type modifiers to @code{arg}, @code{putarg}
 480 and @code{getarg} otherwise problems will happen if generating jit for
 481 environments that require arguments to be truncated and zero or sign
 482 extended by the caller and/or excess arguments might be passed packed
 483 in the stack. Currently only Apple systems with @code{aarch64} cpus are
 484 known to have this restriction.
 485
 486 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
 487 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
 488 @code{va_push} is required when passing a @code{va_list} to another function,
 489 because not all architectures expect it as a single pointer. Known case
 490 is DEC Alpha, that requires it as a structure passed by value.
 491
 492 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
 493 @code{arg} is different from other instruction in that it does not
 494 actually generate any code: instead, it is a function which returns
 495 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
 496 a value'' means that @lightning{} code that compile these
 497 instructions return a value when expanded.} You should call
 498 @code{arg} as soon as possible, before any function call or, more
 499 easily, right after the @code{prolog} instructions
 500 (which is treated later).
 501
 502 @code{getarg} accepts a register argument and a value returned by
 503 @code{arg}, and will move that argument to the register, extending
 504 it (with or without sign, according to the data type specification)
 505 to fit a whole register.  These instructions are more intimately
 506 related to the usage of the @lightning{} instruction set in code
 507 that generates other code, so they will be treated more
 508 specifically in @ref{GNU lightning examples, , Generating code at
 509 run-time}.
 510
 511 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
 512 it accepts as first argument a register or immediate, and as
 513 second argument a value returned by @code{arg}. It allows changing,
 514 or restoring an argument to the current function, and is a
 515 construct required to implement tail call optimization. Note that
 516 arguments in registers are very cheap, but will be overwritten
 517 at any moment, including on some operations, for example division,
 518 that on several ports is implemented as a function call.
 519
 520 Finally, the @code{retval} instruction fetches the return value of a
 521 called function in a register.  The @code{retval} instruction takes a
 522 register argument and copies the return value of the previously called
 523 function in that register.  A function with a return value should use
 524 @code{retr} or @code{reti} to put the return value in the return register
 525 before returning.  @xref{Fibonacci, the Fibonacci numbers}, for an example.
 526
 527 @code{epilog} is an optional call, that marks the end of a function
 528 body. It is automatically generated by @lightning{} if starting a new
 529 function (what should be done after a @code{ret} call) or finishing
 530 generating jit.
 531 It is very important to note that the fact that @code{epilog} being
 532 optional may cause a common mistake. Consider this:
 533 @example
 534 fun1:
 535     prolog
 536     ...
 537     ret
 538 fun2:
 539     prolog
 540 @end example
 541 Because @code{epilog} is added when finding a new @code{prolog},
 542 this will cause the @code{fun2} label to actually be before the
 543 return from @code{fun1}. Because @lightning{} will actually
 544 understand it as:
 545 @example
 546 fun1:
 547     prolog
 548     ...
 549     ret
 550 fun2:
 551     epilog
 552     prolog
 553 @end example
 554
 555 You should observe a few rules when using these macros.  First of
 556 all, if calling a varargs function, you should use the @code{ellipsis}
 557 call to mark the position of the ellipsis in the C prototype.
 558
 559 You should not nest calls to @code{prepare} inside a
 560 @code{prepare/finish} block.  Doing this will result in undefined
 561 behavior. Note that for functions with zero arguments you can use
 562 just @code{call}.
 563
 564 @item Branch instructions
 565 Like @code{arg}, these also return a value which, in this case,
 566 is to be used to compile forward branches as explained in
 567 @ref{Fibonacci, , Fibonacci numbers}.  They accept two operands to be
 568 compared; of these, the last can be either a register or an immediate.
 569 They are:
 570 @example
 571 bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 572 blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
 573 bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 574 blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
 575 bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 576 bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
 577 bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 578 bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
 579 beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 580 beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
 581 bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 582 bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
 583
 584 bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
 585 bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
 586 bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
 587 bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
 588 buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
 589 bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
 590 bordr         _f  _d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
 591 bunordr       _f  _d  @r{if }!(O2 != O2) ||  (O3 != O3)@r{ goto }O1
 592
 593 bmsr                  @r{if }O2 &  O3@r{ goto }O1
 594 bmsi                  @r{if }O2 &  O3@r{ goto }O1
 595 bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
 596 bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
 597 boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 598 boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
 599 bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 600 bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
 601 bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 602 bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
 603 bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 604 bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
 605 @end example
 606
 607 Note that the @code{C} code does not have an @code{O1} argument. It is
 608 required to always use the return value as an argument to @code{patch},
 609 @code{patch_at} or @code{patch_abs}.
 610
 611 @item Jump and return operations
 612 These accept one argument except @code{ret} and @code{jmpi} which
 613 have none; the difference between @code{finishi} and @code{calli}
 614 is that the latter does not clean the stack from pushed parameters
 615 (if any) and the former must @strong{always} follow a @code{prepare}
 616 instruction.
 617 @example
 618 callr     (not specified)                @r{function call to register O1}
 619 calli     (not specified)                @r{function call to immediate O1}
 620 finishr   (not specified)                @r{function call to register O1}
 621 finishi   (not specified)                @r{function call to immediate O1}
 622 jmpr      (not specified)                @r{unconditional jump to register}
 623 jmpi      (not specified)                @r{unconditional jump}
 624 ret       (not specified)                @r{return from subroutine}
 625 retr      _c _uc _s _us _i _ui _l _f _d
 626 reti      _c _uc _s _us _i _ui _l _f _d
 627 retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
 628                                          @r{to register}
 629 @end example
 630
 631 Like branch instruction, @code{jmpi} also returns a value which is to
 632 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
 633 numbers}.
 634
 635 @item Labels
 636 There are 3 @lightning{} instructions to create labels:
 637 @example
 638 label     (not specified)                @r{simple label}
 639 forward   (not specified)                @r{forward label}
 640 indirect  (not specified)                @r{special simple label}
 641 @end example
 642
 643 The following instruction is used to specify a minimal alignment for
 644 the next instruction, usually with a label:
 645 @example
 646 align     (not specified)                @r{align code}
 647 @end example
 648
 649 Similar to @code{align} is the next instruction, also usually used with
 650 a label:
 651 @example
 652 skip      (not specified)                @r{skip code}
 653 @end example
 654 It is used to specify a minimal number of bytes of nops to be inserted
 655 before the next instruction.
 656
 657 @code{label} is normally used as @code{patch_at} argument for backward
 658 jumps.
 659
 660 @example
 661         jit_node_t *jump, *label;
 662 label = jit_label();
 663         ...
 664         jump = jit_beqr(JIT_R0, JIT_R1);
 665         jit_patch_at(jump, label);
 666 @end example
 667
 668 @code{forward} is used to patch code generation before the actual
 669 position of the label is known.
 670
 671 @example
 672         jit_node_t *jump, *label;
 673 label = jit_forward();
 674         jump = jit_beqr(JIT_R0, JIT_R1);
 675         jit_patch_at(jump, label);
 676         ...
 677         jit_link(label);
 678 @end example
 679
 680 @code{indirect} is useful when creating jump tables, and tells
 681 @lightning{} to not optimize out a label that is not the target of
 682 any jump, because an indirect jump may land where it is defined.
 683
 684 @example
 685         jit_node_t *jump, *label;
 686         ...
 687         jmpr(JIT_R0);                    @rem{/* may jump to label */}
 688         ...
 689 label = jit_indirect();
 690 @end example
 691
 692 @code{indirect} is an special case of @code{note} and @code{name}
 693 because it is a valid argument to @code{address}.
 694
 695 Note that the usual idiom to write the previous example is
 696 @example
 697         jit_node_t *addr, *jump;
 698 addr  = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
 699         ...
 700         jmpr(JIT_R0);
 701         ...
 702         jit_patch(addr);                 @rem{/* implicit label added */}
 703 @end example
 704
 705 that automatically binds the implicit label added by @code{patch} with
 706 the @code{movi}, but on some special conditions it is required to create
 707 an "unbound" label.
 708
 709 @code{align} is useful for creating multiple entry points to a
 710 (trampoline) function that are all accessible through a single
 711 function pointer.  @code{align} receives an integer argument that
 712 defines the minimal alignment of the address of a label directly
 713 following the @code{align} instruction.  The integer argument must be
 714 a power of two and the effective alignment will be a power of two no
 715 less than the argument to @code{align}.  If the argument to
 716 @code{align} is 16 or more, the effective alignment will match the
 717 specified minimal alignment exactly.
 718
 719 @example
 720           jit_node_t *forward, *label1, *label2, *jump;
 721           unsigned char *addr1, *addr2;
 722 forward = jit_forward();
 723           jit_align(16);
 724 label1  = jit_indirect();                @rem{/* first entry point */}
 725 jump    = jit_jmpi();                    @rem{/* jump to first handler */}
 726           jit_patch_at(jump, forward);
 727           jit_align(16);
 728 label2  = jit_indirect();                @rem{/* second entry point */}
 729           ...                            @rem{/* second handler */}
 730           jit_jmpr(...);
 731           jit_link(forward);
 732           ...                            @rem{/* first handler /*}
 733           jit_jmpr(...);
 734           ...
 735           jit_emit();
 736           addr1 = jit_address(label1);
 737           addr2 = jit_address(label2);
 738           assert(addr2 - addr1 == 16);   @rem{/* only one of the addresses needs to be remembered */}
 739 @end example
 740
 741 @code{skip} is useful for reserving space in the code buffer that can
 742 later be filled (possibly with the help of the pair of functions
 743 @code{jit_unprotect} and @code{jit_protect}).
 744
 745 @item Function prolog
 746
 747 These macros are used to set up a function prolog.  The @code{allocai}
 748 call accept a single integer argument and returns an offset value
 749 for stack storage access.  The @code{allocar} accepts two registers
 750 arguments, the first is set to the offset for stack access, and the
 751 second is the size in bytes argument.
 752
 753 @example
 754 prolog    (not specified)                @r{function prolog}
 755 allocai   (not specified)                @r{reserve space on the stack}
 756 allocar   (not specified)                @r{allocate space on the stack}
 757 @end example
 758
 759 @code{allocai} receives the number of bytes to allocate and returns
 760 the offset from the frame pointer register @code{FP} to the base of
 761 the area.
 762
 763 @code{allocar} receives two register arguments.  The first is where
 764 to store the offset from the frame pointer register @code{FP} to the
 765 base of the area.  The second argument is the size in bytes.  Note
 766 that @code{allocar} is dynamic allocation, and special attention
 767 should be taken when using it.  If called in a loop, every iteration
 768 will allocate stack space.  Stack space is aligned from 8 to 64 bytes
 769 depending on backend requirements, even if allocating only one byte.
 770 It is advisable to not use it with @code{frame} and @code{tramp}; it
 771 should work with @code{frame} with special care to call only once,
 772 but is not supported if used in @code{tramp}, even if called only
 773 once.
 774
 775 As a small appetizer, here is a small function that adds 1 to the input
 776 parameter (an @code{int}).  I'm using an assembly-like syntax here which
 777 is a bit different from the one used when writing real subroutines with
 778 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
 779 examples, , Generating code at run-time}.
 780
 781 @example
 782 incr:
 783      prolog
 784 in = arg                     @rem{! We have an integer argument}
 785      getarg    R0, in        @rem{! Move it to R0}
 786      addi      R0, R0, 1     @rem{! Add 1}
 787      retr      R0            @rem{! And return the result}
 788 @end example
 789
 790 And here is another function which uses the @code{printf} function from
 791 the standard C library to write a number in hexadecimal notation:
 792
 793 @example
 794 printhex:
 795      prolog
 796 in = arg                     @rem{! Same as above}
 797      getarg    R0, in
 798      prepare                 @rem{! Begin call sequence for printf}
 799      pushargi  "%x"          @rem{! Push format string}
 800      ellipsis                @rem{! Varargs start here}
 801      pushargr  R0            @rem{! Push second argument}
 802      finishi   printf        @rem{! Call printf}
 803      ret                     @rem{! Return to caller}
 804 @end example
 805
 806 @item Register liveness
 807
 808 During code generation, @lightning{} occasionally needs scratch registers
 809 or needs to use architecture-defined registers.  For that, @lightning{}
 810 internally maintains register liveness information.
 811
 812 In the following example, @code{qdivr} will need special registers like
 813 @code{R0} on some architectures.  As @lightning{} understands that
 814 @code{R0} is used in the subsequent instruction, it will create
 815 save/restore code for @code{R0} in case.
 816
 817 @example
 818 ...
 819 qdivr V0, V1, V2, V3
 820 movr  V3, R0
 821 ...
 822 @end example
 823
 824 The same is not true in the example that follows.  Here, @code{R0} is
 825 not alive after the division operation because @code{R0} is neither an
 826 argument register nor a callee-save register.  Thus, no save/restore
 827 code for @code{R0} will be created in case.
 828
 829 @example
 830 ...
 831 qdivr V0, V1, V2, V3
 832 jmpr  R1
 833 ...
 834 @end example
 835
 836 The @code{live} instruction can be used to mark a register as live after
 837 it as in the following example.  Here, @code{R0} will be preserved
 838 across the division.
 839
 840 @example
 841 ...
 842 qdivr V0, V1, V2, V3
 843 live R0
 844 jmpr R1
 845 ...
 846 @end example
 847
 848 The @code{live} instruction is useful at code entry and exit points,
 849 like after and before a @code{callr} instruction.
 850
 851 @item Trampolines, continuations and tail call optimization
 852
 853 Frequently it is required to generate jit code that must jump to
 854 code generated later, possibly from another @code{jit_context_t}.
 855 These require compatible stack frames.
 856
 857 @lightning{} provides two primitives from where trampolines,
 858 continuations and tail call optimization can be implemented.
 859
 860 @example
 861 frame   (not specified)                  @r{create stack frame}
 862 tramp   (not specified)                  @r{assume stack frame}
 863 @end example
 864
 865 @code{frame} receives an integer argument@footnote{It is not
 866 automatically computed because it does not know about the
 867 requirement of later generated code.} that defines the size in
 868 bytes for the stack frame of the current, @code{C} callable,
 869 jit function. To calculate this value, a good formula is maximum
 870 number of arguments to any called native function times
 871 eight@footnote{Times eight so that it works for double arguments.
 872 And would not need conditionals for ports that pass arguments in
 873 the stack.}, plus the sum of the arguments to any call to
 874 @code{jit_allocai}. @lightning{} automatically adjusts this value
 875 for any backend specific stack memory it may need, or any
 876 alignment constraint.
 877
 878 @code{frame} also instructs @lightning{} to save all callee
 879 save registers in the prolog and reload in the epilog.
 880
 881 @example
 882 main:                        @rem{! jit entry point}
 883      prolog                  @rem{! function prolog}
 884      frame  256              @rem{! save all callee save registers and}
 885                              @rem{! reserve at least 256 bytes in stack}
 886 main_loop:
 887      ...
 888      jmpi   handler          @rem{! jumps to external code}
 889      ...
 890      ret                     @rem{! return to the caller}
 891 @end example
 892
 893 @code{tramp} differs from @code{frame} only that a prolog and epilog
 894 will not be generated. Note that @code{prolog} must still be used.
 895 The code under @code{tramp} must be ready to be entered with a jump
 896 at the prolog position, and instead of a return, it must end with
 897 a non conditional jump. @code{tramp} exists solely for the fact
 898 that it allows optimizing out prolog and epilog code that would
 899 never be executed.
 900
 901 @example
 902 handler:                     @rem{! handler entry point}
 903      prolog                  @rem{! function prolog}
 904      tramp  256              @rem{! assumes all callee save registers}
 905                              @rem{! are saved and there is at least}
 906                              @rem{! 256 bytes in stack}
 907      ...
 908      jmpi   main_loop        @rem{! return to the main loop}
 909 @end example
 910
 911 @lightning{} only supports Tail Call Optimization using the
 912 @code{tramp} construct. Any other way is not guaranteed to
 913 work on all ports.
 914
 915 An example of a simple (recursive) tail call optimization:
 916
 917 @example
 918 factorial:                   @rem{! Entry point of the factorial function}
 919      prolog
 920 in = arg                     @rem{! Receive an integer argument}
 921      getarg R0, in           @rem{! Move argument to RO}
 922      prepare
 923          pushargi 1          @rem{! This is the accumulator}
 924          pushargr R0         @rem{! This is the argument}
 925      finishi fact            @rem{! Call the tail call optimized function}
 926      retval R0               @rem{! Fetch the result}
 927      retr R0                 @rem{! Return it}
 928      epilog                  @rem{! Epilog *before* label before prolog}
 929
 930 fact:                        @rem{! Entry point of the helper function}
 931      prolog
 932      frame 16                @rem{! Reserve 16 bytes in the stack}
 933 fact_entry:                  @rem{! This is the tail call entry point}
 934 ac = arg                     @rem{! The accumulator is the first argument}
 935 in = arg                     @rem{! The factorial argument}
 936      getarg R0, ac           @rem{! Move the accumulator to R0}
 937      getarg R1, in           @rem{! Move the argument to R1}
 938      blei fact_out, R1, 1    @rem{! Done if argument is one or less}
 939      mulr R0, R0, R1         @rem{! accumulator *= argument}
 940      putargr R0, ac          @rem{! Update the accumulator}
 941      subi R1, R1, 1          @rem{! argument -= 1}
 942      putargr R1, in          @rem{! Update the argument}
 943      jmpi fact_entry         @rem{! Tail Call Optimize it!}
 944 fact_out:
 945      retr R0                 @rem{! Return the accumulator}
 946 @end example
 947
 948 @item Predicates
 949 @example
 950 forward_p      (not specified)           @r{forward label predicate}
 951 indirect_p     (not specified)           @r{indirect label predicate}
 952 target_p       (not specified)           @r{used label predicate}
 953 arg_register_p (not specified)           @r{argument kind predicate}
 954 callee_save_p  (not specified)           @r{callee save predicate}
 955 pointer_p      (not specified)           @r{pointer predicate}
 956 @end example
 957
 958 @code{forward_p} expects a @code{jit_node_t*} argument, and
 959 returns non zero if it is a forward label reference, that is,
 960 a label returned by @code{forward}, that still needs a
 961 @code{link} call.
 962
 963 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
 964 non zero if it is an indirect label reference, that is, a label that
 965 was returned by @code{indirect}.
 966
 967 @code{target_p} expects a @code{jit_node_t*} argument, that is any
 968 kind of label, and will return non zero if there is at least one
 969 jump or move referencing it.
 970
 971 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
 972 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
 973 will return non zero if the argument lives in a register. This call
 974 is useful to know the live range of register arguments, as those
 975 are very fast to read and write, but have volatile values.
 976
 977 @code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
 978 @code{JIT_Fn}, and will return non zero if the register is callee
 979 save. This call is useful because on several ports, the @code{JIT_Rn}
 980 and @code{JIT_Fn} registers are actually callee save; no need
 981 to save and load the values when making function calls.
 982
 983 @code{pointer_p} expects a pointer argument, and will return non
 984 zero if the pointer is inside the generated jit code. Must be
 985 called after @code{jit_emit} and before @code{jit_destroy_state}.
 986
 987 @item Atomic operations
 988 Only compare-and-swap is implemented. It accepts four operands;
 989 the second can be an immediate.
 990
 991 The first argument is set with a boolean value telling if the operation
 992 did succeed.
 993
 994 Arguments must be different, cannot use the result register to also pass
 995 an argument.
 996
 997 The second argument is the address of a machine word.
 998
 999 The third argument is the old value.
1000
1001 The fourth argument is the new value.
1002
1003 @example
1004 casr                                  01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1005 casi                                  01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1006 @end example
1007
1008 If value at the address in the second argument is equal to the third
1009 argument, the address value is atomically modified to the value of the
1010 fourth argument and the first argument is set to a non zero value.
1011
1012 If the value at the address in the second argument is not equal to the
1013 third argument nothing is done and the first argument is set to zero.
1014 @end table
1015
1016 @node GNU lightning examples
1017 @chapter Generating code at run-time
1018
1019 To use @lightning{}, you should include the @file{lightning.h} file that
1020 is put in your include directory by the @samp{make install} command.
1021
1022 Each of the instructions above translates to a macro or function call.
1023 All you have to do is prepend @code{jit_} (lowercase) to opcode names
1024 and @code{JIT_} (uppercase) to register names.  Of course, parameters
1025 are to be put between parentheses.
1026
1027 This small tutorial presents three examples:
1028
1029 @iftex
1030 @itemize @bullet
1031 @item
1032 The @code{incr} function found in @ref{The instruction set, ,
1033 @lightning{}'s instruction set}:
1034
1035 @item
1036 A simple function call to @code{printf}
1037
1038 @item
1039 An RPN calculator.
1040
1041 @item
1042 Fibonacci numbers
1043 @end itemize
1044 @end iftex
1045 @ifnottex
1046 @menu
1047 * incr::             A function which increments a number by one
1048 * printf::           A simple function call to printf
1049 * RPN calculator::   A more complex example, an RPN calculator
1050 * Fibonacci::        Calculating Fibonacci numbers
1051 @end menu
1052 @end ifnottex
1053
1054 @node incr
1055 @section A function which increments a number by one
1056
1057 Let's see how to create and use the sample @code{incr} function created
1058 in @ref{The instruction set, , @lightning{}'s instruction set}:
1059
1060 @example
1061 #include <stdio.h>
1062 #include <lightning.h>
1063
1064 static jit_state_t *_jit;
1065
1066 typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}
1067
1068 int main(int argc, char *argv[])
1069 @{
1070   jit_node_t  *in;
1071   pifi         incr;
1072
1073   init_jit(argv[0]);
1074   _jit = jit_new_state();
1075
1076   jit_prolog();                    @rem{/* @t{     prolog             } */}
1077   in = jit_arg();                  @rem{/* @t{     in = arg           } */}
1078   jit_getarg(JIT_R0, in);          @rem{/* @t{     getarg R0          } */}
1079   jit_addi(JIT_R0, JIT_R0, 1);     @rem{/* @t{     addi   R0@comma{} R0@comma{} 1   } */}
1080   jit_retr(JIT_R0);                @rem{/* @t{     retr   R0          } */}
1081
1082   incr = jit_emit();
1083   jit_clear_state();
1084
1085   @rem{/* call the generated code@comma{} passing 5 as an argument */}
1086   printf("%d + 1 = %d\n", 5, incr(5));
1087
1088   jit_destroy_state();
1089   finish_jit();
1090   return 0;
1091 @}
1092 @end example
1093
1094 Let's examine the code line by line (well, almost@dots{}):
1095
1096 @table @t
1097 @item #include <lightning.h>
1098 You already know about this.  It defines all of @lightning{}'s macros.
1099
1100 @item static jit_state_t *_jit;
1101 You might wonder about what is @code{jit_state_t}.  It is a structure
1102 that stores jit code generation information.  The name @code{_jit} is
1103 special, because since multiple jit generators can run at the same
1104 time, you must either @r{#define _jit my_jit_state} or name it
1105 @code{_jit}.
1106
1107 @item typedef int (*pifi)(int);
1108 Just a handy typedef for a pointer to a function that takes an
1109 @code{int} and returns another.
1110
1111 @item jit_node_t  *in;
1112 Declares a variable to hold an identifier for a function argument. It
1113 is an opaque pointer, that will hold the return of a call to @code{arg}
1114 and be used as argument to @code{getarg}.
1115
1116 @item pifi         incr;
1117 Declares a function pointer variable to a function that receives an
1118 @code{int} and returns an @code{int}.
1119
1120 @item init_jit(argv[0]);
1121 You must call this function before creating a @code{jit_state_t}
1122 object. This function does global state initialization, and may need
1123 to detect CPU or Operating System features.  It receives a string
1124 argument that is later used to read symbols from a shared object using
1125 GNU binutils if disassembly was enabled at configure time. If no
1126 disassembly will be performed a NULL pointer can be used as argument.
1127
1128 @item _jit = jit_new_state();
1129 This call initializes a @lightning{} jit state.
1130
1131 @item jit_prolog();
1132 Ok, so we start generating code for our beloved function@dots{}
1133
1134 @item in = jit_arg();
1135 @itemx jit_getarg(JIT_R0, in);
1136 We retrieve the first (and only) argument, an integer, and store it
1137 into the general-purpose register @code{R0}.
1138
1139 @item jit_addi(JIT_R0, JIT_R0, 1);
1140 We add one to the content of the register.
1141
1142 @item jit_retr(JIT_R0);
1143 This instruction generates a standard function epilog that returns
1144 the contents of the @code{R0} register.
1145
1146 @item incr = jit_emit();
1147 This instruction is very important.  It actually translates the
1148 @lightning{} macros used before to machine code, flushes the generated
1149 code area out of the processor's instruction cache and return a
1150 pointer to the start of the code.
1151
1152 @item jit_clear_state();
1153 This call cleanups any data not required for jit execution. Note
1154 that it must be called after any call to @code{jit_print} or
1155 @code{jit_address}, as this call destroy the @lightning{}
1156 intermediate representation.
1157
1158 @item printf("%d + 1 = %d", 5, incr(5));
1159 Calling our function is this simple---it is not distinguishable from
1160 a normal C function call, the only difference being that @code{incr}
1161 is a variable.
1162
1163 @item jit_destroy_state();
1164 Releases all memory associated with the jit context. It should be
1165 called after known the jit will no longer be called.
1166
1167 @item finish_jit();
1168 This call cleanups any global state hold by @lightning{}, and is
1169 advisable to call it once jit code will no longer be generated.
1170 @end table
1171
1172 @lightning{} abstracts two phases of dynamic code generation: selecting
1173 instructions that map the standard representation, and emitting binary
1174 code for these instructions.  The client program has the responsibility
1175 of describing the code to be generated using the standard @lightning{}
1176 instruction set.
1177
1178 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1179 architecture (on the right is the code that an assembly-language
1180 programmer would write):
1181
1182 @table @b
1183 @item SPARC
1184 @example
1185       save  %sp, -112, %sp
1186       mov  %i0, %g2                 retl
1187       inc  %g2                      inc %o0
1188       mov  %g2, %i0
1189       restore
1190       retl
1191       nop
1192 @end example
1193 In this case, @lightning{} introduces overhead to create a register
1194 window (not knowing that the procedure is a leaf procedure) and to
1195 move the argument to the general purpose register @code{R0} (which
1196 maps to @code{%g2} on the SPARC).
1197 @end table
1198
1199 @table @b
1200 @item x86_64
1201 @example
1202     mov   %rdi,%rax
1203     add   $0x1,%rax
1204     ret
1205 @end example
1206 In this case, for the x86 port, @lightning{} has simple optimizations
1207 to understand it is a leaf function, and that it is not required to
1208 create a stack frame nor update the stack pointer.
1209 @end table
1210
1211 @node printf
1212 @section A simple function call to @code{printf}
1213
1214 Again, here is the code for the example:
1215
1216 @example
1217 #include <stdio.h>
1218 #include <lightning.h>
1219
1220 static jit_state_t *_jit;
1221
1222 typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}
1223
1224 int main(int argc, char *argv[])
1225 @{
1226   pvfi          myFunction;             @rem{/* ptr to generated code */}
1227   jit_node_t    *start, *end;           @rem{/* a couple of labels */}
1228   jit_node_t    *in;                    @rem{/* to get the argument */}
1229
1230   init_jit(argv[0]);
1231   _jit = jit_new_state();
1232
1233   start = jit_note(__FILE__, __LINE__);
1234   jit_prolog();
1235   in = jit_arg();
1236   jit_getarg(JIT_R1, in);
1237   jit_prepare();
1238   jit_pushargi((jit_word_t)"generated %d bytes\n");
1239   jit_ellipsis();
1240   jit_pushargr(JIT_R1);
1241   jit_finishi(printf);
1242   jit_ret();
1243   jit_epilog();
1244   end = jit_note(__FILE__, __LINE__);
1245
1246   myFunction = jit_emit();
1247
1248   @rem{/* call the generated code@comma{} passing its size as argument */}
1249   myFunction((char*)jit_address(end) - (char*)jit_address(start));
1250   jit_clear_state();
1251
1252   jit_disassemble();
1253
1254   jit_destroy_state();
1255   finish_jit();
1256   return 0;
1257 @}
1258 @end example
1259
1260 The function shows how many bytes were generated.  Most of the code
1261 is not very interesting, as it resembles very closely the program
1262 presented in @ref{incr, , A function which increments a number by one}.
1263
1264 For this reason, we're going to concentrate on just a few statements.
1265
1266 @table @t
1267 @item start = jit_note(__FILE__, __LINE__);
1268 @itemx @r{@dots{}}
1269 @itemx end = jit_note(__FILE__, __LINE__);
1270 These two instruction call the @code{jit_note} macro, which creates
1271 a note in the jit code; arguments to @code{jit_note} usually are a
1272 filename string and line number integer, but using NULL for the
1273 string argument is perfectly valid if only need to create a simple
1274 marker in the code.
1275
1276 @item jit_ellipsis();
1277 @code{ellipsis} usually is only required if calling varargs functions
1278 with double arguments, but it is a good practice to properly describe
1279 the @r{@dots{}} in the call sequence.
1280
1281 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1282 Note the use of the @code{(jit_word_t)} cast, that is used only
1283 to avoid a compiler warning, due to using a pointer where a
1284 wordsize integer type was expected.
1285
1286 @item jit_prepare();
1287 @itemx @r{@dots{}}
1288 @itemx jit_finishi(printf);
1289 Once the arguments to @code{printf} have been pushed, what means
1290 moving them to stack or register arguments, the @code{printf}
1291 function is called and the stack cleaned.  Note how @lightning{}
1292 abstracts the differences between different architectures and
1293 ABI's -- the client program does not know how parameter passing
1294 works on the host architecture.
1295
1296 @item jit_epilog();
1297 Usually it is not required to call @code{epilog}, but because it
1298 is implicitly called when noticing the end of a function, if the
1299 @code{end} variable was set with a @code{note} call after the
1300 @code{ret}, it would not consider the function epilog.
1301
1302 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1303 This calls the generate jit function passing as argument the offset
1304 difference from the @code{start} and @code{end} notes. The @code{address}
1305 call must be done after the @code{emit} call or either a fatal error
1306 will happen (if @lightning{} is built with assertions enable) or an
1307 undefined value will be returned.
1308
1309 @item jit_clear_state();
1310 Note that @code{jit_clear_state} was called after executing jit in
1311 this example. It was done because it must be called after any call
1312 to @code{jit_address} or @code{jit_print}.
1313
1314 @item jit_disassemble();
1315 @code{disassemble} will dump the generated code to standard output,
1316 unless @lightning{} was built with the disassembler disabled, in which
1317 case no output will be shown.
1318 @end table
1319
1320 @node RPN calculator
1321 @section A more complex example, an RPN calculator
1322
1323 We create a small stack-based RPN calculator which applies a series
1324 of operators to a given parameter and to other numeric operands.
1325 Unlike previous examples, the code generator is fully parameterized
1326 and is able to compile different formulas to different functions.
1327 Here is the code for the expression compiler; a sample usage will
1328 follow.
1329
1330 Since @lightning{} does not provide push/pop instruction, this
1331 example uses a stack-allocated area to store the data.  Such an
1332 area can be allocated using the macro @code{allocai}, which
1333 receives the number of bytes to allocate and returns the offset
1334 from the frame pointer register @code{FP} to the base of the
1335 area.
1336
1337 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1338 to access stack-allocated variables.  However, it is possible to
1339 use operations such as @code{add} to compute the address of the
1340 variables, and pass the address around.
1341
1342 @example
1343 #include <stdio.h>
1344 #include <lightning.h>
1345
1346 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1347
1348 static jit_state_t *_jit;
1349
1350 void stack_push(int reg, int *sp)
1351 @{
1352   jit_stxi_i (*sp, JIT_FP, reg);
1353   *sp += sizeof (int);
1354 @}
1355
1356 void stack_pop(int reg, int *sp)
1357 @{
1358   *sp -= sizeof (int);
1359   jit_ldxi_i (reg, JIT_FP, *sp);
1360 @}
1361
1362 jit_node_t *compile_rpn(char *expr)
1363 @{
1364   jit_node_t *in, *fn;
1365   int stack_base, stack_ptr;
1366
1367   fn = jit_note(NULL, 0);
1368   jit_prolog();
1369   in = jit_arg();
1370   stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1371
1372   jit_getarg(JIT_R2, in);
1373
1374   while (*expr) @{
1375     char buf[32];
1376     int n;
1377     if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1378       expr += n - 1;
1379       stack_push(JIT_R0, &stack_ptr);
1380       jit_movi(JIT_R0, atoi(buf));
1381     @} else if (*expr == 'x') @{
1382       stack_push(JIT_R0, &stack_ptr);
1383       jit_movr(JIT_R0, JIT_R2);
1384     @} else if (*expr == '+') @{
1385       stack_pop(JIT_R1, &stack_ptr);
1386       jit_addr(JIT_R0, JIT_R1, JIT_R0);
1387     @} else if (*expr == '-') @{
1388       stack_pop(JIT_R1, &stack_ptr);
1389       jit_subr(JIT_R0, JIT_R1, JIT_R0);
1390     @} else if (*expr == '*') @{
1391       stack_pop(JIT_R1, &stack_ptr);
1392       jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1393     @} else if (*expr == '/') @{
1394       stack_pop(JIT_R1, &stack_ptr);
1395       jit_divr(JIT_R0, JIT_R1, JIT_R0);
1396     @} else @{
1397       fprintf(stderr, "cannot compile: %s\n", expr);
1398       abort();
1399     @}
1400     ++expr;
1401   @}
1402   jit_retr(JIT_R0);
1403   jit_epilog();
1404   return fn;
1405 @}
1406 @end example
1407
1408 The principle on which the calculator is based is easy: the stack top
1409 is held in R0, while the remaining items of the stack are held in the
1410 memory area that we allocate with @code{allocai}.  Compiling a numeric
1411 operand or the argument @code{x} pushes the old stack top onto the
1412 stack and moves the operand into R0; compiling an operator pops the
1413 second operand off the stack into R1, and compiles the operation so
1414 that the result goes into R0, thus becoming the new stack top.
1415
1416 This example allocates a fixed area for 32 @code{int}s.  This is not
1417 a problem when the function is a leaf like in this case; in a full-blown
1418 compiler you will want to analyze the input and determine the number
1419 of needed stack slots---a very simple example of register allocation.
1420 The area is then managed like a stack using @code{stack_push} and
1421 @code{stack_pop}.
1422
1423 Source code for the client (which lies in the same source file) follows:
1424
1425 @example
1426 int main(int argc, char *argv[])
1427 @{
1428   jit_node_t *nc, *nf;
1429   pifi c2f, f2c;
1430   int i;
1431
1432   init_jit(argv[0]);
1433   _jit = jit_new_state();
1434
1435   nc = compile_rpn("32x9*5/+");
1436   nf = compile_rpn("x32-5*9/");
1437   (void)jit_emit();
1438   c2f = (pifi)jit_address(nc);
1439   f2c = (pifi)jit_address(nf);
1440   jit_clear_state();
1441
1442   printf("\nC:");
1443   for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1444   printf("\nF:");
1445   for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1446   printf("\n");
1447
1448   printf("\nF:");
1449   for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1450   printf("\nC:");
1451   for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1452   printf("\n");
1453
1454   jit_destroy_state();
1455   finish_jit();
1456   return 0;
1457 @}
1458 @end example
1459
1460 The client displays a conversion table between Celsius and Fahrenheit
1461 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1462 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1463 respectively.
1464
1465 Providing the formula as an argument to @code{compile_rpn} effectively
1466 parameterizes code generation, making it possible to use the same code
1467 to compile different functions; this is what makes dynamic code
1468 generation so powerful.
1469
1470 @node Fibonacci
1471 @section Fibonacci numbers
1472
1473 The code in this section calculates the Fibonacci sequence. That is
1474 modeled by the recurrence relation:
1475 @display
1476      f(0) = 0
1477      f(1) = f(2) = 1
1478      f(n) = f(n-1) + f(n-2)
1479 @end display
1480
1481 The purpose of this example is to introduce branches.  There are two
1482 kind of branches: backward branches and forward branches.  We'll
1483 present the calculation in a recursive and iterative form; the
1484 former only uses forward branches, while the latter uses both.
1485
1486 @example
1487 #include <stdio.h>
1488 #include <lightning.h>
1489
1490 static jit_state_t *_jit;
1491
1492 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1493
1494 int main(int argc, char *argv[])
1495 @{
1496   pifi       fib;
1497   jit_node_t *label;
1498   jit_node_t *call;
1499   jit_node_t *in;                 @rem{/* offset of the argument */}
1500   jit_node_t *ref;                @rem{/* to patch the forward reference */}
1501   jit_node_t *zero;               @rem{/* to patch the forward reference */}
1502
1503   init_jit(argv[0]);
1504   _jit = jit_new_state();
1505
1506   label = jit_label();
1507         jit_prolog   ();
1508   in =  jit_arg      ();
1509         jit_getarg   (JIT_V0, in);              @rem{/* R0 = n */}
1510  zero = jit_beqi     (JIT_R0, 0);
1511         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1512         jit_movi     (JIT_R0, 1);
1513   ref = jit_blei     (JIT_V0, 2);
1514         jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
1515         jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
1516         jit_prepare();
1517           jit_pushargr(JIT_V1);
1518         call = jit_finishi(NULL);
1519         jit_patch_at(call, label);
1520         jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
1521         jit_prepare();
1522           jit_pushargr(JIT_V2);
1523         call = jit_finishi(NULL);
1524         jit_patch_at(call, label);
1525         jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
1526         jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}
1527
1528   jit_patch(ref);                               @rem{/* patch jump */}
1529   jit_patch(zero);                              @rem{/* patch jump */}
1530         jit_retr(JIT_R0);
1531
1532   @rem{/* call the generated code@comma{} passing 32 as an argument */}
1533   fib = jit_emit();
1534   jit_clear_state();
1535   printf("fib(%d) = %d\n", 32, fib(32));
1536   jit_destroy_state();
1537   finish_jit();
1538   return 0;
1539 @}
1540 @end example
1541
1542 As said above, this is the first example of dynamically compiling
1543 branches.  Branch instructions have two operands containing the
1544 values to be compared, and return a @code{jit_note_t *} object
1545 to be patched.
1546
1547 Because labels final address are only known after calling @code{emit},
1548 it is required to call @code{patch} or @code{patch_at}, what does
1549 tell @lightning{} that the target to patch is actually a pointer to
1550 a @code{jit_node_t *} object, otherwise, it would assume that is
1551 a pointer to a C function. Note that conditional branches do not
1552 receive a label argument, so they must be patched.
1553
1554 You need to call @code{patch_at} on the return of value @code{calli},
1555 @code{finishi}, and @code{calli} if it is actually referencing a label
1556 in the jit code. All branch instructions do not receive a label
1557 argument. Note that @code{movi} is an special case, and patching it
1558 is usually done to get the final address of a label, usually to later
1559 call @code{jmpr}.
1560
1561 Now, here is the iterative version:
1562
1563 @example
1564 #include <stdio.h>
1565 #include <lightning.h>
1566
1567 static jit_state_t *_jit;
1568
1569 typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}
1570
1571 int main(int argc, char *argv[])
1572 @{
1573   pifi       fib;
1574   jit_node_t *in;               @rem{/* offset of the argument */}
1575   jit_node_t *ref;              @rem{/* to patch the forward reference */}
1576   jit_node_t *zero;             @rem{/* to patch the forward reference */}
1577   jit_node_t *jump;             @rem{/* jump to start of loop */}
1578   jit_node_t *loop;             @rem{/* start of the loop */}
1579
1580   init_jit(argv[0]);
1581   _jit = jit_new_state();
1582
1583         jit_prolog   ();
1584   in =  jit_arg      ();
1585         jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
1586  zero = jit_beqi     (JIT_R0, 0);
1587         jit_movr     (JIT_R1, JIT_R0);
1588         jit_movi     (JIT_R0, 1);
1589   ref = jit_blti     (JIT_R1, 2);
1590         jit_subi     (JIT_R2, JIT_R2, 2);
1591         jit_movr     (JIT_R1, JIT_R0);
1592
1593   loop= jit_label();
1594         jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
1595         jit_movr     (JIT_V0, JIT_R0);          /* V0 = R0 */
1596         jit_addr     (JIT_R0, JIT_R0, JIT_R1);  /* R0 = R0 + R1 */
1597         jit_movr     (JIT_R1, JIT_V0);          /* R1 = V0 */
1598   jump= jit_bnei     (JIT_R2, 0);               /* if (R2) goto loop; */
1599   jit_patch_at(jump, loop);
1600
1601   jit_patch(ref);                               @rem{/* patch forward jump */}
1602   jit_patch(zero);                              @rem{/* patch forward jump */}
1603         jit_retr     (JIT_R0);
1604
1605   @rem{/* call the generated code@comma{} passing 36 as an argument */}
1606   fib = jit_emit();
1607   jit_clear_state();
1608   printf("fib(%d) = %d\n", 36, fib(36));
1609   jit_destroy_state();
1610   finish_jit();
1611   return 0;
1612 @}
1613 @end example
1614
1615 This code calculates the recurrence relation using iteration (a
1616 @code{for} loop in high-level languages).  There are no function
1617 calls anymore: instead, there is a backward jump (the @code{bnei} at
1618 the end of the loop).
1619
1620 Note that the program must remember the address for backward jumps;
1621 for forward jumps it is only required to remember the jump code,
1622 and call @code{patch} for the implicit label.
1623
1624 @node Reentrancy
1625 @chapter Re-entrant usage of @lightning{}
1626
1627 @lightning{} uses the special @code{_jit} identifier. To be able
1628 to be able to use multiple jit generation states at the same
1629 time, it is required to used code similar to:
1630
1631 @example
1632     struct jit_state lightning;
1633     #define lightning _jit
1634 @end example
1635
1636 This will cause the symbol defined to @code{_jit} to be passed as
1637 the first argument to the underlying @lightning{} implementation,
1638 that is usually a function with an @code{_} (underscode) prefix
1639 and with an argument named @code{_jit}, in the pattern:
1640
1641 @example
1642     static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1643     #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1644 @end example
1645
1646 The reason for this is to use the same syntax as the initial lightning
1647 implementation and to avoid needing the user to keep adding an extra
1648 argument to every call, as multiple jit states generating code in
1649 paralell should be very uncommon.
1650
1651 @node Registers
1652 @chapter Accessing the whole register file
1653
1654 As mentioned earlier in this chapter, all @lightning{} back-ends are
1655 guaranteed to have at least six general-purpose integer registers and
1656 six floating-point registers, but many back-ends will have more.
1657
1658 To access the entire register files, you can use the
1659 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros.  They
1660 accept a parameter that identifies the register number, which
1661 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1662 and @code{JIT_F_NUM} respectively; the number need not be
1663 constant.  Of course, expressions like @code{JIT_R0} and
1664 @code{JIT_R(0)} denote the same register, and likewise for
1665 integer callee-saved, or floating-point, registers.
1666
1667 @section Scratch registers
1668
1669 For operations, @lightning{} does not support directly, like storing
1670 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1671 acquire and release a scratch register as in the following pattern:
1672
1673 @example
1674     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1675     jit_movi (reg, immediate);
1676     jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1677     jit_unget_reg (reg);
1678 @end example
1679
1680 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1681 reloads but don't follow branches, the code between both must be in
1682 the same basic block and must not contain any branches as in the
1683 following (bad) example.
1684
1685 @example
1686     jit_int32_t reg = jit_get_reg (jit_class_gpr);
1687     jit_ldxi (reg, JIT_V0, offset);
1688     jump = jit_bnei (reg, V0);
1689     jit_movr (JIT_V1, reg);
1690     jit_patch (jump);
1691     jit_unget_reg (reg);
1692 @end example
1693
1694 @node Customizations
1695 @chapter Customizations
1696
1697 Frequently it is desirable to have more control over how code is
1698 generated or how memory is used during jit generation or execution.
1699
1700 @section Memory functions
1701 To aid in complete control of memory allocation and deallocation
1702 @lightning{} provides wrappers that default to standard @code{malloc},
1703 @code{realloc} and @code{free}. These are loosely based on the
1704 GNU GMP counterparts, with the difference that they use the same
1705 prototype of the system allocation functions, that is, no @code{size}
1706 for @code{free} or @code{old_size} for @code{realloc}.
1707
1708 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1709 @lightning{} guarantees that memory is only allocated or released
1710 using these wrapped functions, but you must note that if lightning
1711 was linked to GNU binutils, malloc is probably will be called multiple
1712 times from there when initializing the disassembler.
1713
1714 Because @code{init_jit} may call memory functions, if you need to call
1715 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1716 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1717 previous or default wrappers will be passed.
1718 @end deftypefun
1719
1720 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1721 Get the current memory allocation function. Also, unlike the GNU GMP
1722 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1723 @end deftypefun
1724
1725 @section Protection
1726 Unless an alternate code buffer is used (see below), @code{jit_emit}
1727 set the access protections that the code buffer's memory can be read and
1728 executed, but not modified.  One can use the following functions after
1729 @code{jit_emit} but before @code{jit_clear} to temporarily lift the
1730 protection:
1731
1732 @deftypefun void jit_unprotect ()
1733 Changes the access protection that the code buffer's memory can be read and
1734 modified.  Before the emitted code can be invoked, @code{jit_protect}
1735 has to be called to reset the change.
1736
1737 This procedure has no effect when an alternate code buffer (see below) is used.
1738 @end deftypefun
1739
1740 @deftypefun void jit_protect ()
1741 Changes the access protection that the code buffer's memory can be read and
1742 executed.
1743
1744 This procedure has no effect when an alternate code buffer (see below) is used.
1745 @end deftypefun
1746
1747 @section Alternate code buffer
1748 To instruct @lightning{} to use an alternate code buffer it is required
1749 to call @code{jit_realize} before @code{jit_emit}, and then query states
1750 and customize as appropriate.
1751
1752 @deftypefun void jit_realize ()
1753 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1754 that no other @code{jit_xyz} call will be made.
1755 @end deftypefun
1756
1757 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1758 Returns NULL or the previous value set with @code{jit_set_code}, and
1759 sets the @var{code_size} argument to an appropriate value.
1760 If @code{jit_get_code} is called before @code{jit_emit}, the
1761 @var{code_size} argument is set to the expected amount of bytes
1762 required to generate code.
1763 If @code{jit_get_code} is called after @code{jit_emit}, the
1764 @var{code_size} argument is set to the exact amount of bytes used
1765 by the code.
1766 @end deftypefun
1767
1768 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1769 Instructs @lightning{} to output to the @var{code} argument and
1770 use @var{size} as a guard to not write to invalid memory. If during
1771 @code{jit_emit} @lightning{} finds out that the code would not fit
1772 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1773 @end deftypefun
1774
1775 A simple example of a loop using an alternate buffer is:
1776
1777 @example
1778   jit_uint8_t   *code;
1779   int           *(func)(int);      @rem{/* function pointer */}
1780   jit_word_t     code_size;
1781   jit_word_t     real_code_size;
1782   @rem{...}
1783   jit_realize();                   @rem{/* ready to generate code */}
1784   jit_get_code(&code_size);        @rem{/* get expected code size */}
1785   code_size = (code_size + 4095) & -4096;
1786   do (;;) @{
1787     code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1788                 MAP_PRIVATE | MAP_ANON, -1, 0);
1789     jit_set_code(code, code_size);
1790     if ((func = jit_emit()) == NULL) @{
1791       munmap(code, code_size);
1792       code_size += 4096;
1793     @}
1794   @} while (func == NULL);
1795   jit_get_code(&real_code_size);   @rem{/* query exact size of the code */}
1796 @end example
1797
1798 The first call to @code{jit_get_code} should return @code{NULL} and set
1799 the @code{code_size} argument to the expected amount of bytes required
1800 to emit code.
1801 The second call to @code{jit_get_code} is after a successful call to
1802 @code{jit_emit}, and will return the value previously set with
1803 @code{jit_set_code} and set the @code{real_code_size} argument to the
1804 exact amount of bytes used to emit the code.
1805
1806 @section Alternate data buffer
1807 Sometimes it may be desirable to customize how, or to prevent
1808 @lightning{} from using an extra buffer for constants or debug
1809 annotation. Usually when also using an alternate code buffer.
1810
1811 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1812 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1813 and sets the @var{data_size} argument to how many bytes are required
1814 for the constants data buffer, and @var{note_size} to how many bytes
1815 are required to store the debug note information.
1816 Note that it always preallocate one debug note entry even if
1817 @code{jit_name} or @code{jit_note} are never called, but will return
1818 zero in the @var{data_size} argument if no constant is required;
1819 constants are only used for the @code{float} and @code{double} operations
1820 that have an immediate argument, and not in all @lightning{} ports.
1821 @end deftypefun
1822
1823 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1824
1825 @var{data} can be NULL if disabling constants and annotations, otherwise,
1826 a valid pointer must be passed. An assertion is done that the data will
1827 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1828 with @code{-DNDEBUG}).
1829
1830 @var{size} tells the space in bytes available in @var{data}.
1831
1832 @var{flags} can be zero to tell to just use the alternate data buffer,
1833 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1834
1835 @table @t
1836 @item JIT_DISABLE_DATA
1837 @cindex JIT_DISABLE_DATA
1838 Instructs @lightning{} to not use a constant table, but to use an
1839 alternate method to synthesize those, usually with a larger code
1840 sequence using stack space to transfer the value from a GPR to a
1841 FPR register.
1842
1843 @item JIT_DISABLE_NOTE
1844 @cindex JIT_DISABLE_NOTE
1845 Instructs @lightning{} to not store file or function name, and
1846 line numbers in the constant buffer.
1847 @end table
1848 @end deftypefun
1849
1850 A simple example of a preventing usage of a data buffer is:
1851
1852 @example
1853   @rem{...}
1854   jit_realize();                        @rem{/* ready to generate code */}
1855   jit_get_data(NULL, NULL);
1856   jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1857   @rem{...}
1858 @end example
1859
1860 Or to only use a data buffer, if required:
1861
1862 @example
1863   jit_uint8_t   *data;
1864   jit_word_t     data_size;
1865   @rem{...}
1866   jit_realize();                        @rem{/* ready to generate code */}
1867   jit_get_data(&data_size, NULL);
1868   if (data_size)
1869     data = malloc(data_size);
1870   else
1871     data = NULL;
1872   jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1873   @rem{...}
1874   if (data)
1875     free(data);
1876   @rem{...}
1877 @end example
1878
1879 @node Acknowledgements
1880 @chapter Acknowledgements
1881
1882 As far as I know, the first general-purpose portable dynamic code
1883 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1884 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1885 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1886 directly inspired @lightning{}.
1887
1888 Thanks go to Ian Piumarta, who kindly accepted to release his own
1889 program @sc{ccg} under the GNU General Public License, thereby allowing
1890 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1891 @sc{ccg} provides a way of dynamically assemble programs written in the
1892 underlying architecture's assembly language.  So it is not portable,
1893 yet very interesting.
1894
1895 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1896 was first developed as a tool to be used in GNU Smalltalk's dynamic
1897 translator from bytecodes to native code.