2 @dircategory Software development
4 * lightning: (lightning). Library for dynamic code generation.
24 This document describes @value{TOPIC} the @lightning{} library for
25 dynamic code generation.
28 * Overview:: What GNU lightning is
29 * Installation:: Configuring and installing GNU lightning
30 * The instruction set:: The RISC instruction set used in GNU lightning
31 * GNU lightning examples:: GNU lightning's examples
32 * Reentrancy:: Re-entrant usage of GNU lightning
33 * Registers:: Accessing the whole register file
34 * Customizations:: Advanced code generation customizations
35 * Acknowledgements:: Acknowledgements for GNU lightning
40 @chapter Introduction to @lightning{}
43 This document describes @value{TOPIC} the @lightning{} library for
44 dynamic code generation.
47 Dynamic code generation is the generation of machine code
48 at runtime. It is typically used to strip a layer of interpretation
49 by allowing compilation to occur at runtime. One of the most
50 well-known applications of dynamic code generation is perhaps that
51 of interpreters that compile source code to an intermediate bytecode
52 form, which is then recompiled to machine code at run-time: this
53 approach effectively combines the portability of bytecode
54 representations with the speed of machine code. Another common
55 application of dynamic code generation is in the field of hardware
56 simulators and binary emulators, which can use the same techniques
57 to translate simulated instructions to the instructions of the
60 Yet other applications come to mind: for example, windowing
61 @dfn{bitblt} operations, matrix manipulations, and network packet
62 filters. Albeit very powerful and relatively well known within the
63 compiler community, dynamic code generation techniques are rarely
64 exploited to their full potential and, with the exception of the
65 two applications described above, have remained curiosities because
66 of their portability and functionality barriers: binary instructions
67 are generated, so programs using dynamic code generation must be
68 retargeted for each machine; in addition, coding a run-time code
69 generator is a tedious and error-prone task more than a difficult one.
71 @lightning{} provides a portable, fast and easily retargetable dynamic
72 code generation system.
74 To be portable, @lightning{} abstracts over current architectures'
75 quirks and unorthogonalities. The interface that it exposes to is that
76 of a standardized RISC architecture loosely based on the SPARC and MIPS
77 chips. There are a few general-purpose registers (six, not including
78 those used to receive and pass parameters between subroutines), and
79 arithmetic operations involve three operands---either three registers
80 or two registers and an arbitrarily sized immediate value.
82 On one hand, this architecture is general enough that it is possible to
83 generate pretty efficient code even on CISC architectures such as the
84 Intel x86 or the Motorola 68k families. On the other hand, it matches
85 real architectures closely enough that, most of the time, the
86 compiler's constant folding pass ends up generating code which
87 assembles machine instructions without further tests.
90 @chapter Configuring and installing @lightning{}
92 Here we will assume that your system already has the dependencies
93 necessary to build @lightning{}. For more on dependencies, see
94 @lightning{}'s @file{README-hacking} file.
96 The first thing to do to build @lightning{} is to configure the
97 program, picking the set of macros to be used on the host
98 architecture; this configuration is automatically performed by
99 the @file{configure} shell script; to run it, merely type:
104 The @file{configure} accepts the @code{--enable-disassembler} option,
105 hat enables linking to GNU binutils and optionally print human readable
106 disassembly of the jit code. This option can be disabled by the
107 @code{--disable-disassembler} option.
109 @file{configure} also accepts the @code{--enable-devel-disassembler},
110 option useful to check exactly hat machine instructions were generated
111 for a @lightning{} instrction. Basically mixing @code{jit_print} and
112 @code{jit_disassembly}.
114 The @code{--enable-assertions} option, which enables several consistency
115 hecks in the run-time assemblers. These are not usually needed, so you
116 can decide to simply forget about it; also remember that these consistency
117 checks tend to slow down your code generator.
119 The @code{--enable-devel-strong-type-checking} option that does extra type
120 checking using @code{assert}. This option also enables the
121 @code{--enable-assertions} unless it is explicitly disabled.
123 The option @code{--enable-devel-get-jit-size} should only be used
124 when doing updates or maintenance to lightning. It regenerates the
125 @code{jit_$ARCH]-sz.c} creating a table or maximum bytes usage when
126 translating a @lightning{} instruction to machine code.
128 After you've configured @lightning{}, run @file{make} as usual.
130 @lightning{} has an extensive set of tests to validate it is working
131 correctly in the build host. To test it run:
136 The next important step is:
141 This ends the process of installing @lightning{}.
143 @node The instruction set
144 @chapter @lightning{}'s instruction set
146 @lightning{}'s instruction set was designed by deriving instructions
147 that closely match those of most existing RISC architectures, or
148 that can be easily syntesized if absent. Each instruction is composed
152 an operation, like @code{sub} or @code{mul}
155 most times, a register/immediate flag (@code{r} or @code{i})
158 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
161 Examples of legal mnemonics are @code{addr} (integer add, with three
162 register operands) and @code{muli} (integer multiply, with two
163 register operands and an immediate operand). Each instruction takes
164 two or three operands; in most cases, one of them can be an immediate
165 value instead of a register.
167 Most @lightning{} integer operations are signed wordsize operations,
168 with the exception of operations that convert types, or load or store
169 values to/from memory. When applicable, the types and C types are as
174 _uc @r{unsigned char}
176 _us @r{unsigned short}
184 Most integer operations do not need a type modifier, and when loading or
185 storing values to memory there is an alias to the proper operation
186 using wordsize operands, that is, if ommited, the type is @r{int} on
187 32-bit architectures and @r{long} on 64-bit architectures. Note
188 that lightning also expects @code{sizeof(void*)} to match the wordsize.
190 When an unsigned operation result differs from the equivalent signed
191 operation, there is a the @code{_u} modifier.
193 There are at least seven integer registers, of which six are
194 general-purpose, while the last is used to contain the frame pointer
195 (@code{FP}). The frame pointer can be used to allocate and access local
196 variables on the stack, using the @code{allocai} or @code{allocar}
199 Of the general-purpose registers, at least three are guaranteed to be
200 preserved across function calls (@code{V0}, @code{V1} and
201 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
202 @code{R2}). Six registers are not very much, but this
203 restriction was forced by the need to target CISC architectures
204 which, like the x86, are poor of registers; anyway, backends can
205 specify the actual number of available registers with the calls
206 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
207 (for callee-save registers).
209 There are at least six floating-point registers, named @code{F0} to
210 @code{F5}. These are usually caller-save and are separate from the integer
211 registers on the supported architectures; on Intel architectures,
212 in 32 bit mode if SSE2 is not available or use of X87 is forced,
213 the register stack is mapped to a flat register file. As for the
214 integer registers, the macro @code{JIT_F_NUM} yields the number of
215 floating-point registers.
217 The complete instruction set follows; as you can see, most non-memory
218 operations only take integers (either signed or unsigned) as operands;
219 this was done in order to reduce the instruction set, and because most
220 architectures only provide word and long word operations on registers.
221 There are instructions that allow operands to be extended to fit a larger
222 data type, both in a signed and in an unsigned way.
225 @item Binary ALU operations
226 These accept three operands; the last one can be an immediate.
227 @code{addx} operations must directly follow @code{addc}, and
228 @code{subx} must follow @code{subc}; otherwise, results are undefined.
229 Most, if not all, architectures do not support @r{float} or @r{double}
230 immediate operands; lightning emulates those operations by moving the
231 immediate to a temporary register and emiting the call with only
234 addr _f _d O1 = O2 + O3
235 addi _f _d O1 = O2 + O3
236 addxr O1 = O2 + (O3 + carry)
237 addxi O1 = O2 + (O3 + carry)
238 addcr O1 = O2 + O3, set carry
239 addci O1 = O2 + O3, set carry
240 subr _f _d O1 = O2 - O3
241 subi _f _d O1 = O2 - O3
242 subxr O1 = O2 - (O3 + carry)
243 subxi O1 = O2 - (O3 + carry)
244 subcr O1 = O2 - O3, set carry
245 subci O1 = O2 - O3, set carry
246 rsbr _f _d O1 = O3 - O1
247 rsbi _f _d O1 = O3 - O1
248 mulr _f _d O1 = O2 * O3
249 muli _f _d O1 = O2 * O3
250 divr _u _f _d O1 = O2 / O3
251 divi _u _f _d O1 = O2 / O3
262 rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
263 rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
264 movzr O1 = O3 ? O1 : O2
265 movnr O1 = O3 ? O2 : O1
268 @item Four operand binary ALU operations
269 These accept two result registers, and two operands; the last one can
270 be an immediate. The first two arguments cannot be the same register.
272 @code{qmul} stores the low word of the result in @code{O1} and the
273 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
274 means there was no overflow. For signed multiplication, no overflow
275 check is based on sign, and can be detected if @code{O2} is zero or
278 @code{qdiv} stores the quotient in @code{O1} and the remainder in
279 @code{O2}. It can be used as quick way to check if a division is
280 exact, in which case the remainder is zero.
283 qmulr _u O1 O2 = O3 * O4
284 qmuli _u O1 O2 = O3 * O4
285 qdivr _u O1 O2 = O3 / O4
286 qdivi _u O1 O2 = O3 / O4
289 @item Unary ALU operations
290 These accept two operands, both of which must be registers.
294 clor O1 = number of leading one bits
295 clzr O1 = number of leading zero bits
296 ctor O1 = number of trailing one bits
297 ctzr O1 = number of trailing zero bits
300 Note that @code{ctzr} is basically equivalent of a @code{C} call
301 @code{ffs} but indexed at bit zero, not one.
303 Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
304 value of zero is not an error, it just returns the number of bits
305 in a word, 64 if @lightning{} generates 64 bit instructions, otherwise
308 The @code{clor} and @code{ctor} are just counterparts of the versions
309 that search for zero bits.
311 These unary ALU operations are only defined for float operands.
313 absr _f _d O1 = fabs(O2)
314 sqrtr _f _d O1 = sqrt(O2)
317 Besides requiring the @code{r} modifier, there are no unary operations
318 with an immediate operand.
320 @item Compare instructions
321 These accept three operands; again, the last can be an immediate.
322 The last two operands are compared, and the first operand, that must be
323 an integer register, is set to either 0 or 1, according to whether the
324 given condition was met or not.
326 The conditions given below are for the standard behavior of C,
327 where the ``unordered'' comparison result is mapped to false.
330 ltr _u _f _d O1 = (O2 < O3)
331 lti _u _f _d O1 = (O2 < O3)
332 ler _u _f _d O1 = (O2 <= O3)
333 lei _u _f _d O1 = (O2 <= O3)
334 gtr _u _f _d O1 = (O2 > O3)
335 gti _u _f _d O1 = (O2 > O3)
336 ger _u _f _d O1 = (O2 >= O3)
337 gei _u _f _d O1 = (O2 >= O3)
338 eqr _f _d O1 = (O2 == O3)
339 eqi _f _d O1 = (O2 == O3)
340 ner _f _d O1 = (O2 != O3)
341 nei _f _d O1 = (O2 != O3)
342 unltr _f _d O1 = !(O2 >= O3)
343 unler _f _d O1 = !(O2 > O3)
344 ungtr _f _d O1 = !(O2 <= O3)
345 unger _f _d O1 = !(O2 < O3)
346 uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
347 ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
348 ordr _f _d O1 = (O2 == O2) && (O3 == O3)
349 unordr _f _d O1 = (O2 != O2) || (O3 != O3)
352 @item Transfer operations
353 These accept two operands; for @code{ext} both of them must be
354 registers, while @code{mov} accepts an immediate value as the second
357 Unlike @code{movr} and @code{movi}, the other instructions are used
358 to truncate a wordsize operand to a smaller integer data type or to
359 convert float data types. You can also use @code{extr} to convert an
360 integer to a floating point value: the usual options are @code{extr_f}
366 extr _c _uc _s _us _i _ui _f _d O1 = O2
367 truncr _f _d O1 = trunc(O2)
370 In 64-bit architectures it may be required to use @code{truncr_f_i},
371 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
372 the equivalent C code. Only the @code{_i} modifier is available in
373 32-bit architectures.
376 truncr_f_i = <int> O1 = <float> O2
377 truncr_f_l = <long>O1 = <float> O2
378 truncr_d_i = <int> O1 = <double>O2
379 truncr_d_l = <long>O1 = <double>O2
382 The float conversion operations are @emph{destination first,
383 source second}, but the order of the types is reversed. This happens
384 for historical reasons.
387 extr_f_d = <double>O1 = <float> O2
388 extr_d_f = <float> O1 = <double>O2
391 @item Network extensions
392 These accept two operands, both of which must be registers; these
393 two instructions actually perform the same task, yet they are
394 assigned to two mnemonics for the sake of convenience and
395 completeness. As usual, the first operand is the destination and
396 the second is the source.
397 The @code{_ul} variant is only available in 64-bit architectures.
399 htonr _us _ui _ul @r{Host-to-network (big endian) order}
400 ntohr _us _ui _ul @r{Network-to-host order }
403 @code{bswapr} can be used to unconditionally byte-swap an operand.
404 On little-endian architectures, @code{htonr} and @code{ntohr} resolve
406 The @code{_ul} variant is only available in 64-bit architectures.
408 bswapr _us _ui _ul 01 = byte_swap(02)
411 @item Load operations
412 @code{ld} accepts two operands while @code{ldx} accepts three;
413 in both cases, the last can be either a register or an immediate
414 value. Values are extended (with or without sign, according to
415 the data type specification) to fit a whole register.
416 The @code{_ui} and @code{_l} types are only available in 64-bit
417 architectures. For convenience, there is a version without a
418 type modifier for integer or pointer operands that uses the
419 appropriate wordsize call.
421 ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
422 ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
423 ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
424 ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
427 @item Store operations
428 @code{st} accepts two operands while @code{stx} accepts three; in
429 both cases, the first can be either a register or an immediate
430 value. Values are sign-extended to fit a whole register.
432 str _c _s _i _l _f _d *O1 = O2
433 sti _c _s _i _l _f _d *O1 = O2
434 stxr _c _s _i _l _f _d *(O1+O2) = O3
435 stxi _c _s _i _l _f _d *(O1+O2) = O3
437 Note that the unsigned type modifier is not available, as the store
438 only writes to the 1, 2, 4 or 8 sized memory address.
439 The @code{_l} type is only available in 64-bit architectures, and for
440 convenience, there is a version without a type modifier for integer or
441 pointer operands that uses the appropriate wordsize call.
443 @item Argument management
446 prepare (not specified)
447 va_start (not specified)
448 pushargr _c _uc _s _us _i _ui _l _f _d
449 pushargi _c _uc _s _us _i _ui _l _f _d
450 va_push (not specified)
451 arg _c _uc _s _us _i _ui _l _f _d
452 getarg _c _uc _s _us _i _ui _l _f _d
454 putargr _c _uc _s _us _i _ui _l _f _d
455 putargi _c _uc _s _us _i _ui _l _f _d
457 retr _c _uc _s _us _i _ui _l _f _d
458 reti _c _uc _s _us _i _ui _l _f _d
460 va_end (not specified)
461 retval _c _uc _s _us _i _ui _l _f _d
462 epilog (not specified)
464 As with other operations that use a type modifier, the @code{_ui} and
465 @code{_l} types are only available in 64-bit architectures, but there
466 are operations without a type modifier that alias to the appropriate
467 integer operation with wordsize operands.
469 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
470 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
471 A code snippet that wants to call another procedure and has to pass
472 arguments must, in order: use the @code{prepare} instruction and use
473 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
474 left to right order}; and use @code{finish} or @code{call} (explained below)
475 to perform the actual call.
477 Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} when
478 handling integer types can be used without a type modifier.
479 It is suggested to use matching type modifiers to @code{arg}, @code{putarg}
480 and @code{getarg} otherwise problems will happen if generating jit for
481 environments that require arguments to be truncated and zero or sign
482 extended by the caller and/or excess arguments might be passed packed
483 in the stack. Currently only Apple systems with @code{aarch64} cpus are
484 known to have this restriction.
486 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
487 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
488 @code{va_push} is required when passing a @code{va_list} to another function,
489 because not all architectures expect it as a single pointer. Known case
490 is DEC Alpha, that requires it as a structure passed by value.
492 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
493 @code{arg} is different from other instruction in that it does not
494 actually generate any code: instead, it is a function which returns
495 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
496 a value'' means that @lightning{} code that compile these
497 instructions return a value when expanded.} You should call
498 @code{arg} as soon as possible, before any function call or, more
499 easily, right after the @code{prolog} instructions
500 (which is treated later).
502 @code{getarg} accepts a register argument and a value returned by
503 @code{arg}, and will move that argument to the register, extending
504 it (with or without sign, according to the data type specification)
505 to fit a whole register. These instructions are more intimately
506 related to the usage of the @lightning{} instruction set in code
507 that generates other code, so they will be treated more
508 specifically in @ref{GNU lightning examples, , Generating code at
511 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
512 it accepts as first argument a register or immediate, and as
513 second argument a value returned by @code{arg}. It allows changing,
514 or restoring an argument to the current function, and is a
515 construct required to implement tail call optimization. Note that
516 arguments in registers are very cheap, but will be overwritten
517 at any moment, including on some operations, for example division,
518 that on several ports is implemented as a function call.
520 Finally, the @code{retval} instruction fetches the return value of a
521 called function in a register. The @code{retval} instruction takes a
522 register argument and copies the return value of the previously called
523 function in that register. A function with a return value should use
524 @code{retr} or @code{reti} to put the return value in the return register
525 before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
527 @code{epilog} is an optional call, that marks the end of a function
528 body. It is automatically generated by @lightning{} if starting a new
529 function (what should be done after a @code{ret} call) or finishing
531 It is very important to note that the fact that @code{epilog} being
532 optional may cause a common mistake. Consider this:
541 Because @code{epilog} is added when finding a new @code{prolog},
542 this will cause the @code{fun2} label to actually be before the
543 return from @code{fun1}. Because @lightning{} will actually
555 You should observe a few rules when using these macros. First of
556 all, if calling a varargs function, you should use the @code{ellipsis}
557 call to mark the position of the ellipsis in the C prototype.
559 You should not nest calls to @code{prepare} inside a
560 @code{prepare/finish} block. Doing this will result in undefined
561 behavior. Note that for functions with zero arguments you can use
564 @item Branch instructions
565 Like @code{arg}, these also return a value which, in this case,
566 is to be used to compile forward branches as explained in
567 @ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
568 compared; of these, the last can be either a register or an immediate.
571 bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
572 blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
573 bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
574 blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
575 bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
576 bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
577 bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
578 bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
579 beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
580 beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
581 bner _f _d @r{if }(O2 != O3)@r{ goto }O1
582 bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
584 bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
585 bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
586 bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
587 bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
588 buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
589 bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
590 bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
591 bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
593 bmsr @r{if }O2 & O3@r{ goto }O1
594 bmsi @r{if }O2 & O3@r{ goto }O1
595 bmcr @r{if }!(O2 & O3)@r{ goto }O1
596 bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
597 boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
598 boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
599 bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
600 bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
601 bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
602 bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
603 bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
604 bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
607 Note that the @code{C} code does not have an @code{O1} argument. It is
608 required to always use the return value as an argument to @code{patch},
609 @code{patch_at} or @code{patch_abs}.
611 @item Jump and return operations
612 These accept one argument except @code{ret} and @code{jmpi} which
613 have none; the difference between @code{finishi} and @code{calli}
614 is that the latter does not clean the stack from pushed parameters
615 (if any) and the former must @strong{always} follow a @code{prepare}
618 callr (not specified) @r{function call to register O1}
619 calli (not specified) @r{function call to immediate O1}
620 finishr (not specified) @r{function call to register O1}
621 finishi (not specified) @r{function call to immediate O1}
622 jmpr (not specified) @r{unconditional jump to register}
623 jmpi (not specified) @r{unconditional jump}
624 ret (not specified) @r{return from subroutine}
625 retr _c _uc _s _us _i _ui _l _f _d
626 reti _c _uc _s _us _i _ui _l _f _d
627 retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
631 Like branch instruction, @code{jmpi} also returns a value which is to
632 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
636 There are 3 @lightning{} instructions to create labels:
638 label (not specified) @r{simple label}
639 forward (not specified) @r{forward label}
640 indirect (not specified) @r{special simple label}
643 The following instruction is used to specify a minimal alignment for
644 the next instruction, usually with a label:
646 align (not specified) @r{align code}
649 Similar to @code{align} is the next instruction, also usually used with
652 skip (not specified) @r{skip code}
654 It is used to specify a minimal number of bytes of nops to be inserted
655 before the next instruction.
657 @code{label} is normally used as @code{patch_at} argument for backward
661 jit_node_t *jump, *label;
664 jump = jit_beqr(JIT_R0, JIT_R1);
665 jit_patch_at(jump, label);
668 @code{forward} is used to patch code generation before the actual
669 position of the label is known.
672 jit_node_t *jump, *label;
673 label = jit_forward();
674 jump = jit_beqr(JIT_R0, JIT_R1);
675 jit_patch_at(jump, label);
680 @code{indirect} is useful when creating jump tables, and tells
681 @lightning{} to not optimize out a label that is not the target of
682 any jump, because an indirect jump may land where it is defined.
685 jit_node_t *jump, *label;
687 jmpr(JIT_R0); @rem{/* may jump to label */}
689 label = jit_indirect();
692 @code{indirect} is an special case of @code{note} and @code{name}
693 because it is a valid argument to @code{address}.
695 Note that the usual idiom to write the previous example is
697 jit_node_t *addr, *jump;
698 addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
702 jit_patch(addr); @rem{/* implicit label added */}
705 that automatically binds the implicit label added by @code{patch} with
706 the @code{movi}, but on some special conditions it is required to create
709 @code{align} is useful for creating multiple entry points to a
710 (trampoline) function that are all accessible through a single
711 function pointer. @code{align} receives an integer argument that
712 defines the minimal alignment of the address of a label directly
713 following the @code{align} instruction. The integer argument must be
714 a power of two and the effective alignment will be a power of two no
715 less than the argument to @code{align}. If the argument to
716 @code{align} is 16 or more, the effective alignment will match the
717 specified minimal alignment exactly.
720 jit_node_t *forward, *label1, *label2, *jump;
721 unsigned char *addr1, *addr2;
722 forward = jit_forward();
724 label1 = jit_indirect(); @rem{/* first entry point */}
725 jump = jit_jmpi(); @rem{/* jump to first handler */}
726 jit_patch_at(jump, forward);
728 label2 = jit_indirect(); @rem{/* second entry point */}
729 ... @rem{/* second handler */}
732 ... @rem{/* first handler /*}
736 addr1 = jit_address(label1);
737 addr2 = jit_address(label2);
738 assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */}
741 @code{skip} is useful for reserving space in the code buffer that can
742 later be filled (possibly with the help of the pair of functions
743 @code{jit_unprotect} and @code{jit_protect}).
745 @item Function prolog
747 These macros are used to set up a function prolog. The @code{allocai}
748 call accept a single integer argument and returns an offset value
749 for stack storage access. The @code{allocar} accepts two registers
750 arguments, the first is set to the offset for stack access, and the
751 second is the size in bytes argument.
754 prolog (not specified) @r{function prolog}
755 allocai (not specified) @r{reserve space on the stack}
756 allocar (not specified) @r{allocate space on the stack}
759 @code{allocai} receives the number of bytes to allocate and returns
760 the offset from the frame pointer register @code{FP} to the base of
763 @code{allocar} receives two register arguments. The first is where
764 to store the offset from the frame pointer register @code{FP} to the
765 base of the area. The second argument is the size in bytes. Note
766 that @code{allocar} is dynamic allocation, and special attention
767 should be taken when using it. If called in a loop, every iteration
768 will allocate stack space. Stack space is aligned from 8 to 64 bytes
769 depending on backend requirements, even if allocating only one byte.
770 It is advisable to not use it with @code{frame} and @code{tramp}; it
771 should work with @code{frame} with special care to call only once,
772 but is not supported if used in @code{tramp}, even if called only
775 As a small appetizer, here is a small function that adds 1 to the input
776 parameter (an @code{int}). I'm using an assembly-like syntax here which
777 is a bit different from the one used when writing real subroutines with
778 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
779 examples, , Generating code at run-time}.
784 in = arg @rem{! We have an integer argument}
785 getarg R0, in @rem{! Move it to R0}
786 addi R0, R0, 1 @rem{! Add 1}
787 retr R0 @rem{! And return the result}
790 And here is another function which uses the @code{printf} function from
791 the standard C library to write a number in hexadecimal notation:
796 in = arg @rem{! Same as above}
798 prepare @rem{! Begin call sequence for printf}
799 pushargi "%x" @rem{! Push format string}
800 ellipsis @rem{! Varargs start here}
801 pushargr R0 @rem{! Push second argument}
802 finishi printf @rem{! Call printf}
803 ret @rem{! Return to caller}
806 @item Register liveness
808 During code generation, @lightning{} occasionally needs scratch registers
809 or needs to use architecture-defined registers. For that, @lightning{}
810 internally maintains register liveness information.
812 In the following example, @code{qdivr} will need special registers like
813 @code{R0} on some architectures. As @lightning{} understands that
814 @code{R0} is used in the subsequent instruction, it will create
815 save/restore code for @code{R0} in case.
824 The same is not true in the example that follows. Here, @code{R0} is
825 not alive after the division operation because @code{R0} is neither an
826 argument register nor a callee-save register. Thus, no save/restore
827 code for @code{R0} will be created in case.
836 The @code{live} instruction can be used to mark a register as live after
837 it as in the following example. Here, @code{R0} will be preserved
848 The @code{live} instruction is useful at code entry and exit points,
849 like after and before a @code{callr} instruction.
851 @item Trampolines, continuations and tail call optimization
853 Frequently it is required to generate jit code that must jump to
854 code generated later, possibly from another @code{jit_context_t}.
855 These require compatible stack frames.
857 @lightning{} provides two primitives from where trampolines,
858 continuations and tail call optimization can be implemented.
861 frame (not specified) @r{create stack frame}
862 tramp (not specified) @r{assume stack frame}
865 @code{frame} receives an integer argument@footnote{It is not
866 automatically computed because it does not know about the
867 requirement of later generated code.} that defines the size in
868 bytes for the stack frame of the current, @code{C} callable,
869 jit function. To calculate this value, a good formula is maximum
870 number of arguments to any called native function times
871 eight@footnote{Times eight so that it works for double arguments.
872 And would not need conditionals for ports that pass arguments in
873 the stack.}, plus the sum of the arguments to any call to
874 @code{jit_allocai}. @lightning{} automatically adjusts this value
875 for any backend specific stack memory it may need, or any
876 alignment constraint.
878 @code{frame} also instructs @lightning{} to save all callee
879 save registers in the prolog and reload in the epilog.
882 main: @rem{! jit entry point}
883 prolog @rem{! function prolog}
884 frame 256 @rem{! save all callee save registers and}
885 @rem{! reserve at least 256 bytes in stack}
888 jmpi handler @rem{! jumps to external code}
890 ret @rem{! return to the caller}
893 @code{tramp} differs from @code{frame} only that a prolog and epilog
894 will not be generated. Note that @code{prolog} must still be used.
895 The code under @code{tramp} must be ready to be entered with a jump
896 at the prolog position, and instead of a return, it must end with
897 a non conditional jump. @code{tramp} exists solely for the fact
898 that it allows optimizing out prolog and epilog code that would
902 handler: @rem{! handler entry point}
903 prolog @rem{! function prolog}
904 tramp 256 @rem{! assumes all callee save registers}
905 @rem{! are saved and there is at least}
906 @rem{! 256 bytes in stack}
908 jmpi main_loop @rem{! return to the main loop}
911 @lightning{} only supports Tail Call Optimization using the
912 @code{tramp} construct. Any other way is not guaranteed to
915 An example of a simple (recursive) tail call optimization:
918 factorial: @rem{! Entry point of the factorial function}
920 in = arg @rem{! Receive an integer argument}
921 getarg R0, in @rem{! Move argument to RO}
923 pushargi 1 @rem{! This is the accumulator}
924 pushargr R0 @rem{! This is the argument}
925 finishi fact @rem{! Call the tail call optimized function}
926 retval R0 @rem{! Fetch the result}
927 retr R0 @rem{! Return it}
928 epilog @rem{! Epilog *before* label before prolog}
930 fact: @rem{! Entry point of the helper function}
932 frame 16 @rem{! Reserve 16 bytes in the stack}
933 fact_entry: @rem{! This is the tail call entry point}
934 ac = arg @rem{! The accumulator is the first argument}
935 in = arg @rem{! The factorial argument}
936 getarg R0, ac @rem{! Move the accumulator to R0}
937 getarg R1, in @rem{! Move the argument to R1}
938 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
939 mulr R0, R0, R1 @rem{! accumulator *= argument}
940 putargr R0, ac @rem{! Update the accumulator}
941 subi R1, R1, 1 @rem{! argument -= 1}
942 putargr R1, in @rem{! Update the argument}
943 jmpi fact_entry @rem{! Tail Call Optimize it!}
945 retr R0 @rem{! Return the accumulator}
950 forward_p (not specified) @r{forward label predicate}
951 indirect_p (not specified) @r{indirect label predicate}
952 target_p (not specified) @r{used label predicate}
953 arg_register_p (not specified) @r{argument kind predicate}
954 callee_save_p (not specified) @r{callee save predicate}
955 pointer_p (not specified) @r{pointer predicate}
958 @code{forward_p} expects a @code{jit_node_t*} argument, and
959 returns non zero if it is a forward label reference, that is,
960 a label returned by @code{forward}, that still needs a
963 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
964 non zero if it is an indirect label reference, that is, a label that
965 was returned by @code{indirect}.
967 @code{target_p} expects a @code{jit_node_t*} argument, that is any
968 kind of label, and will return non zero if there is at least one
969 jump or move referencing it.
971 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
972 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
973 will return non zero if the argument lives in a register. This call
974 is useful to know the live range of register arguments, as those
975 are very fast to read and write, but have volatile values.
977 @code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
978 @code{JIT_Fn}, and will return non zero if the register is callee
979 save. This call is useful because on several ports, the @code{JIT_Rn}
980 and @code{JIT_Fn} registers are actually callee save; no need
981 to save and load the values when making function calls.
983 @code{pointer_p} expects a pointer argument, and will return non
984 zero if the pointer is inside the generated jit code. Must be
985 called after @code{jit_emit} and before @code{jit_destroy_state}.
987 @item Atomic operations
988 Only compare-and-swap is implemented. It accepts four operands;
989 the second can be an immediate.
991 The first argument is set with a boolean value telling if the operation
994 Arguments must be different, cannot use the result register to also pass
997 The second argument is the address of a machine word.
999 The third argument is the old value.
1001 The fourth argument is the new value.
1004 casr 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1005 casi 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1008 If value at the address in the second argument is equal to the third
1009 argument, the address value is atomically modified to the value of the
1010 fourth argument and the first argument is set to a non zero value.
1012 If the value at the address in the second argument is not equal to the
1013 third argument nothing is done and the first argument is set to zero.
1016 @node GNU lightning examples
1017 @chapter Generating code at run-time
1019 To use @lightning{}, you should include the @file{lightning.h} file that
1020 is put in your include directory by the @samp{make install} command.
1022 Each of the instructions above translates to a macro or function call.
1023 All you have to do is prepend @code{jit_} (lowercase) to opcode names
1024 and @code{JIT_} (uppercase) to register names. Of course, parameters
1025 are to be put between parentheses.
1027 This small tutorial presents three examples:
1032 The @code{incr} function found in @ref{The instruction set, ,
1033 @lightning{}'s instruction set}:
1036 A simple function call to @code{printf}
1047 * incr:: A function which increments a number by one
1048 * printf:: A simple function call to printf
1049 * RPN calculator:: A more complex example, an RPN calculator
1050 * Fibonacci:: Calculating Fibonacci numbers
1055 @section A function which increments a number by one
1057 Let's see how to create and use the sample @code{incr} function created
1058 in @ref{The instruction set, , @lightning{}'s instruction set}:
1062 #include <lightning.h>
1064 static jit_state_t *_jit;
1066 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1068 int main(int argc, char *argv[])
1074 _jit = jit_new_state();
1076 jit_prolog(); @rem{/* @t{ prolog } */}
1077 in = jit_arg(); @rem{/* @t{ in = arg } */}
1078 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
1079 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
1080 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
1085 @rem{/* call the generated code@comma{} passing 5 as an argument */}
1086 printf("%d + 1 = %d\n", 5, incr(5));
1088 jit_destroy_state();
1094 Let's examine the code line by line (well, almost@dots{}):
1097 @item #include <lightning.h>
1098 You already know about this. It defines all of @lightning{}'s macros.
1100 @item static jit_state_t *_jit;
1101 You might wonder about what is @code{jit_state_t}. It is a structure
1102 that stores jit code generation information. The name @code{_jit} is
1103 special, because since multiple jit generators can run at the same
1104 time, you must either @r{#define _jit my_jit_state} or name it
1107 @item typedef int (*pifi)(int);
1108 Just a handy typedef for a pointer to a function that takes an
1109 @code{int} and returns another.
1111 @item jit_node_t *in;
1112 Declares a variable to hold an identifier for a function argument. It
1113 is an opaque pointer, that will hold the return of a call to @code{arg}
1114 and be used as argument to @code{getarg}.
1117 Declares a function pointer variable to a function that receives an
1118 @code{int} and returns an @code{int}.
1120 @item init_jit(argv[0]);
1121 You must call this function before creating a @code{jit_state_t}
1122 object. This function does global state initialization, and may need
1123 to detect CPU or Operating System features. It receives a string
1124 argument that is later used to read symbols from a shared object using
1125 GNU binutils if disassembly was enabled at configure time. If no
1126 disassembly will be performed a NULL pointer can be used as argument.
1128 @item _jit = jit_new_state();
1129 This call initializes a @lightning{} jit state.
1132 Ok, so we start generating code for our beloved function@dots{}
1134 @item in = jit_arg();
1135 @itemx jit_getarg(JIT_R0, in);
1136 We retrieve the first (and only) argument, an integer, and store it
1137 into the general-purpose register @code{R0}.
1139 @item jit_addi(JIT_R0, JIT_R0, 1);
1140 We add one to the content of the register.
1142 @item jit_retr(JIT_R0);
1143 This instruction generates a standard function epilog that returns
1144 the contents of the @code{R0} register.
1146 @item incr = jit_emit();
1147 This instruction is very important. It actually translates the
1148 @lightning{} macros used before to machine code, flushes the generated
1149 code area out of the processor's instruction cache and return a
1150 pointer to the start of the code.
1152 @item jit_clear_state();
1153 This call cleanups any data not required for jit execution. Note
1154 that it must be called after any call to @code{jit_print} or
1155 @code{jit_address}, as this call destroy the @lightning{}
1156 intermediate representation.
1158 @item printf("%d + 1 = %d", 5, incr(5));
1159 Calling our function is this simple---it is not distinguishable from
1160 a normal C function call, the only difference being that @code{incr}
1163 @item jit_destroy_state();
1164 Releases all memory associated with the jit context. It should be
1165 called after known the jit will no longer be called.
1168 This call cleanups any global state hold by @lightning{}, and is
1169 advisable to call it once jit code will no longer be generated.
1172 @lightning{} abstracts two phases of dynamic code generation: selecting
1173 instructions that map the standard representation, and emitting binary
1174 code for these instructions. The client program has the responsibility
1175 of describing the code to be generated using the standard @lightning{}
1178 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1179 architecture (on the right is the code that an assembly-language
1180 programmer would write):
1193 In this case, @lightning{} introduces overhead to create a register
1194 window (not knowing that the procedure is a leaf procedure) and to
1195 move the argument to the general purpose register @code{R0} (which
1196 maps to @code{%g2} on the SPARC).
1206 In this case, for the x86 port, @lightning{} has simple optimizations
1207 to understand it is a leaf function, and that it is not required to
1208 create a stack frame nor update the stack pointer.
1212 @section A simple function call to @code{printf}
1214 Again, here is the code for the example:
1218 #include <lightning.h>
1220 static jit_state_t *_jit;
1222 typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1224 int main(int argc, char *argv[])
1226 pvfi myFunction; @rem{/* ptr to generated code */}
1227 jit_node_t *start, *end; @rem{/* a couple of labels */}
1228 jit_node_t *in; @rem{/* to get the argument */}
1231 _jit = jit_new_state();
1233 start = jit_note(__FILE__, __LINE__);
1236 jit_getarg(JIT_R1, in);
1238 jit_pushargi((jit_word_t)"generated %d bytes\n");
1240 jit_pushargr(JIT_R1);
1241 jit_finishi(printf);
1244 end = jit_note(__FILE__, __LINE__);
1246 myFunction = jit_emit();
1248 @rem{/* call the generated code@comma{} passing its size as argument */}
1249 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1254 jit_destroy_state();
1260 The function shows how many bytes were generated. Most of the code
1261 is not very interesting, as it resembles very closely the program
1262 presented in @ref{incr, , A function which increments a number by one}.
1264 For this reason, we're going to concentrate on just a few statements.
1267 @item start = jit_note(__FILE__, __LINE__);
1269 @itemx end = jit_note(__FILE__, __LINE__);
1270 These two instruction call the @code{jit_note} macro, which creates
1271 a note in the jit code; arguments to @code{jit_note} usually are a
1272 filename string and line number integer, but using NULL for the
1273 string argument is perfectly valid if only need to create a simple
1276 @item jit_ellipsis();
1277 @code{ellipsis} usually is only required if calling varargs functions
1278 with double arguments, but it is a good practice to properly describe
1279 the @r{@dots{}} in the call sequence.
1281 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1282 Note the use of the @code{(jit_word_t)} cast, that is used only
1283 to avoid a compiler warning, due to using a pointer where a
1284 wordsize integer type was expected.
1286 @item jit_prepare();
1288 @itemx jit_finishi(printf);
1289 Once the arguments to @code{printf} have been pushed, what means
1290 moving them to stack or register arguments, the @code{printf}
1291 function is called and the stack cleaned. Note how @lightning{}
1292 abstracts the differences between different architectures and
1293 ABI's -- the client program does not know how parameter passing
1294 works on the host architecture.
1297 Usually it is not required to call @code{epilog}, but because it
1298 is implicitly called when noticing the end of a function, if the
1299 @code{end} variable was set with a @code{note} call after the
1300 @code{ret}, it would not consider the function epilog.
1302 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1303 This calls the generate jit function passing as argument the offset
1304 difference from the @code{start} and @code{end} notes. The @code{address}
1305 call must be done after the @code{emit} call or either a fatal error
1306 will happen (if @lightning{} is built with assertions enable) or an
1307 undefined value will be returned.
1309 @item jit_clear_state();
1310 Note that @code{jit_clear_state} was called after executing jit in
1311 this example. It was done because it must be called after any call
1312 to @code{jit_address} or @code{jit_print}.
1314 @item jit_disassemble();
1315 @code{disassemble} will dump the generated code to standard output,
1316 unless @lightning{} was built with the disassembler disabled, in which
1317 case no output will be shown.
1320 @node RPN calculator
1321 @section A more complex example, an RPN calculator
1323 We create a small stack-based RPN calculator which applies a series
1324 of operators to a given parameter and to other numeric operands.
1325 Unlike previous examples, the code generator is fully parameterized
1326 and is able to compile different formulas to different functions.
1327 Here is the code for the expression compiler; a sample usage will
1330 Since @lightning{} does not provide push/pop instruction, this
1331 example uses a stack-allocated area to store the data. Such an
1332 area can be allocated using the macro @code{allocai}, which
1333 receives the number of bytes to allocate and returns the offset
1334 from the frame pointer register @code{FP} to the base of the
1337 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1338 to access stack-allocated variables. However, it is possible to
1339 use operations such as @code{add} to compute the address of the
1340 variables, and pass the address around.
1344 #include <lightning.h>
1346 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1348 static jit_state_t *_jit;
1350 void stack_push(int reg, int *sp)
1352 jit_stxi_i (*sp, JIT_FP, reg);
1353 *sp += sizeof (int);
1356 void stack_pop(int reg, int *sp)
1358 *sp -= sizeof (int);
1359 jit_ldxi_i (reg, JIT_FP, *sp);
1362 jit_node_t *compile_rpn(char *expr)
1364 jit_node_t *in, *fn;
1365 int stack_base, stack_ptr;
1367 fn = jit_note(NULL, 0);
1370 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1372 jit_getarg(JIT_R2, in);
1377 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1379 stack_push(JIT_R0, &stack_ptr);
1380 jit_movi(JIT_R0, atoi(buf));
1381 @} else if (*expr == 'x') @{
1382 stack_push(JIT_R0, &stack_ptr);
1383 jit_movr(JIT_R0, JIT_R2);
1384 @} else if (*expr == '+') @{
1385 stack_pop(JIT_R1, &stack_ptr);
1386 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1387 @} else if (*expr == '-') @{
1388 stack_pop(JIT_R1, &stack_ptr);
1389 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1390 @} else if (*expr == '*') @{
1391 stack_pop(JIT_R1, &stack_ptr);
1392 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1393 @} else if (*expr == '/') @{
1394 stack_pop(JIT_R1, &stack_ptr);
1395 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1397 fprintf(stderr, "cannot compile: %s\n", expr);
1408 The principle on which the calculator is based is easy: the stack top
1409 is held in R0, while the remaining items of the stack are held in the
1410 memory area that we allocate with @code{allocai}. Compiling a numeric
1411 operand or the argument @code{x} pushes the old stack top onto the
1412 stack and moves the operand into R0; compiling an operator pops the
1413 second operand off the stack into R1, and compiles the operation so
1414 that the result goes into R0, thus becoming the new stack top.
1416 This example allocates a fixed area for 32 @code{int}s. This is not
1417 a problem when the function is a leaf like in this case; in a full-blown
1418 compiler you will want to analyze the input and determine the number
1419 of needed stack slots---a very simple example of register allocation.
1420 The area is then managed like a stack using @code{stack_push} and
1423 Source code for the client (which lies in the same source file) follows:
1426 int main(int argc, char *argv[])
1428 jit_node_t *nc, *nf;
1433 _jit = jit_new_state();
1435 nc = compile_rpn("32x9*5/+");
1436 nf = compile_rpn("x32-5*9/");
1438 c2f = (pifi)jit_address(nc);
1439 f2c = (pifi)jit_address(nf);
1443 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1445 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1449 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1451 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1454 jit_destroy_state();
1460 The client displays a conversion table between Celsius and Fahrenheit
1461 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1462 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1465 Providing the formula as an argument to @code{compile_rpn} effectively
1466 parameterizes code generation, making it possible to use the same code
1467 to compile different functions; this is what makes dynamic code
1468 generation so powerful.
1471 @section Fibonacci numbers
1473 The code in this section calculates the Fibonacci sequence. That is
1474 modeled by the recurrence relation:
1478 f(n) = f(n-1) + f(n-2)
1481 The purpose of this example is to introduce branches. There are two
1482 kind of branches: backward branches and forward branches. We'll
1483 present the calculation in a recursive and iterative form; the
1484 former only uses forward branches, while the latter uses both.
1488 #include <lightning.h>
1490 static jit_state_t *_jit;
1492 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1494 int main(int argc, char *argv[])
1499 jit_node_t *in; @rem{/* offset of the argument */}
1500 jit_node_t *ref; @rem{/* to patch the forward reference */}
1501 jit_node_t *zero; @rem{/* to patch the forward reference */}
1504 _jit = jit_new_state();
1506 label = jit_label();
1509 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1510 zero = jit_beqi (JIT_R0, 0);
1511 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1512 jit_movi (JIT_R0, 1);
1513 ref = jit_blei (JIT_V0, 2);
1514 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1515 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1517 jit_pushargr(JIT_V1);
1518 call = jit_finishi(NULL);
1519 jit_patch_at(call, label);
1520 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1522 jit_pushargr(JIT_V2);
1523 call = jit_finishi(NULL);
1524 jit_patch_at(call, label);
1525 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1526 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1528 jit_patch(ref); @rem{/* patch jump */}
1529 jit_patch(zero); @rem{/* patch jump */}
1532 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1535 printf("fib(%d) = %d\n", 32, fib(32));
1536 jit_destroy_state();
1542 As said above, this is the first example of dynamically compiling
1543 branches. Branch instructions have two operands containing the
1544 values to be compared, and return a @code{jit_note_t *} object
1547 Because labels final address are only known after calling @code{emit},
1548 it is required to call @code{patch} or @code{patch_at}, what does
1549 tell @lightning{} that the target to patch is actually a pointer to
1550 a @code{jit_node_t *} object, otherwise, it would assume that is
1551 a pointer to a C function. Note that conditional branches do not
1552 receive a label argument, so they must be patched.
1554 You need to call @code{patch_at} on the return of value @code{calli},
1555 @code{finishi}, and @code{calli} if it is actually referencing a label
1556 in the jit code. All branch instructions do not receive a label
1557 argument. Note that @code{movi} is an special case, and patching it
1558 is usually done to get the final address of a label, usually to later
1561 Now, here is the iterative version:
1565 #include <lightning.h>
1567 static jit_state_t *_jit;
1569 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1571 int main(int argc, char *argv[])
1574 jit_node_t *in; @rem{/* offset of the argument */}
1575 jit_node_t *ref; @rem{/* to patch the forward reference */}
1576 jit_node_t *zero; @rem{/* to patch the forward reference */}
1577 jit_node_t *jump; @rem{/* jump to start of loop */}
1578 jit_node_t *loop; @rem{/* start of the loop */}
1581 _jit = jit_new_state();
1585 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1586 zero = jit_beqi (JIT_R0, 0);
1587 jit_movr (JIT_R1, JIT_R0);
1588 jit_movi (JIT_R0, 1);
1589 ref = jit_blti (JIT_R1, 2);
1590 jit_subi (JIT_R2, JIT_R2, 2);
1591 jit_movr (JIT_R1, JIT_R0);
1594 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1595 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1596 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1597 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1598 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1599 jit_patch_at(jump, loop);
1601 jit_patch(ref); @rem{/* patch forward jump */}
1602 jit_patch(zero); @rem{/* patch forward jump */}
1605 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1608 printf("fib(%d) = %d\n", 36, fib(36));
1609 jit_destroy_state();
1615 This code calculates the recurrence relation using iteration (a
1616 @code{for} loop in high-level languages). There are no function
1617 calls anymore: instead, there is a backward jump (the @code{bnei} at
1618 the end of the loop).
1620 Note that the program must remember the address for backward jumps;
1621 for forward jumps it is only required to remember the jump code,
1622 and call @code{patch} for the implicit label.
1625 @chapter Re-entrant usage of @lightning{}
1627 @lightning{} uses the special @code{_jit} identifier. To be able
1628 to be able to use multiple jit generation states at the same
1629 time, it is required to used code similar to:
1632 struct jit_state lightning;
1633 #define lightning _jit
1636 This will cause the symbol defined to @code{_jit} to be passed as
1637 the first argument to the underlying @lightning{} implementation,
1638 that is usually a function with an @code{_} (underscode) prefix
1639 and with an argument named @code{_jit}, in the pattern:
1642 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1643 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1646 The reason for this is to use the same syntax as the initial lightning
1647 implementation and to avoid needing the user to keep adding an extra
1648 argument to every call, as multiple jit states generating code in
1649 paralell should be very uncommon.
1652 @chapter Accessing the whole register file
1654 As mentioned earlier in this chapter, all @lightning{} back-ends are
1655 guaranteed to have at least six general-purpose integer registers and
1656 six floating-point registers, but many back-ends will have more.
1658 To access the entire register files, you can use the
1659 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1660 accept a parameter that identifies the register number, which
1661 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1662 and @code{JIT_F_NUM} respectively; the number need not be
1663 constant. Of course, expressions like @code{JIT_R0} and
1664 @code{JIT_R(0)} denote the same register, and likewise for
1665 integer callee-saved, or floating-point, registers.
1667 @section Scratch registers
1669 For operations, @lightning{} does not support directly, like storing
1670 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1671 acquire and release a scratch register as in the following pattern:
1674 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1675 jit_movi (reg, immediate);
1676 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1677 jit_unget_reg (reg);
1680 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1681 reloads but don't follow branches, the code between both must be in
1682 the same basic block and must not contain any branches as in the
1683 following (bad) example.
1686 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1687 jit_ldxi (reg, JIT_V0, offset);
1688 jump = jit_bnei (reg, V0);
1689 jit_movr (JIT_V1, reg);
1691 jit_unget_reg (reg);
1694 @node Customizations
1695 @chapter Customizations
1697 Frequently it is desirable to have more control over how code is
1698 generated or how memory is used during jit generation or execution.
1700 @section Memory functions
1701 To aid in complete control of memory allocation and deallocation
1702 @lightning{} provides wrappers that default to standard @code{malloc},
1703 @code{realloc} and @code{free}. These are loosely based on the
1704 GNU GMP counterparts, with the difference that they use the same
1705 prototype of the system allocation functions, that is, no @code{size}
1706 for @code{free} or @code{old_size} for @code{realloc}.
1708 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1709 @lightning{} guarantees that memory is only allocated or released
1710 using these wrapped functions, but you must note that if lightning
1711 was linked to GNU binutils, malloc is probably will be called multiple
1712 times from there when initializing the disassembler.
1714 Because @code{init_jit} may call memory functions, if you need to call
1715 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1716 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1717 previous or default wrappers will be passed.
1720 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1721 Get the current memory allocation function. Also, unlike the GNU GMP
1722 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1726 Unless an alternate code buffer is used (see below), @code{jit_emit}
1727 set the access protections that the code buffer's memory can be read and
1728 executed, but not modified. One can use the following functions after
1729 @code{jit_emit} but before @code{jit_clear} to temporarily lift the
1732 @deftypefun void jit_unprotect ()
1733 Changes the access protection that the code buffer's memory can be read and
1734 modified. Before the emitted code can be invoked, @code{jit_protect}
1735 has to be called to reset the change.
1737 This procedure has no effect when an alternate code buffer (see below) is used.
1740 @deftypefun void jit_protect ()
1741 Changes the access protection that the code buffer's memory can be read and
1744 This procedure has no effect when an alternate code buffer (see below) is used.
1747 @section Alternate code buffer
1748 To instruct @lightning{} to use an alternate code buffer it is required
1749 to call @code{jit_realize} before @code{jit_emit}, and then query states
1750 and customize as appropriate.
1752 @deftypefun void jit_realize ()
1753 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1754 that no other @code{jit_xyz} call will be made.
1757 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1758 Returns NULL or the previous value set with @code{jit_set_code}, and
1759 sets the @var{code_size} argument to an appropriate value.
1760 If @code{jit_get_code} is called before @code{jit_emit}, the
1761 @var{code_size} argument is set to the expected amount of bytes
1762 required to generate code.
1763 If @code{jit_get_code} is called after @code{jit_emit}, the
1764 @var{code_size} argument is set to the exact amount of bytes used
1768 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1769 Instructs @lightning{} to output to the @var{code} argument and
1770 use @var{size} as a guard to not write to invalid memory. If during
1771 @code{jit_emit} @lightning{} finds out that the code would not fit
1772 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1775 A simple example of a loop using an alternate buffer is:
1779 int *(func)(int); @rem{/* function pointer */}
1780 jit_word_t code_size;
1781 jit_word_t real_code_size;
1783 jit_realize(); @rem{/* ready to generate code */}
1784 jit_get_code(&code_size); @rem{/* get expected code size */}
1785 code_size = (code_size + 4095) & -4096;
1787 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1788 MAP_PRIVATE | MAP_ANON, -1, 0);
1789 jit_set_code(code, code_size);
1790 if ((func = jit_emit()) == NULL) @{
1791 munmap(code, code_size);
1794 @} while (func == NULL);
1795 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1798 The first call to @code{jit_get_code} should return @code{NULL} and set
1799 the @code{code_size} argument to the expected amount of bytes required
1801 The second call to @code{jit_get_code} is after a successful call to
1802 @code{jit_emit}, and will return the value previously set with
1803 @code{jit_set_code} and set the @code{real_code_size} argument to the
1804 exact amount of bytes used to emit the code.
1806 @section Alternate data buffer
1807 Sometimes it may be desirable to customize how, or to prevent
1808 @lightning{} from using an extra buffer for constants or debug
1809 annotation. Usually when also using an alternate code buffer.
1811 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1812 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1813 and sets the @var{data_size} argument to how many bytes are required
1814 for the constants data buffer, and @var{note_size} to how many bytes
1815 are required to store the debug note information.
1816 Note that it always preallocate one debug note entry even if
1817 @code{jit_name} or @code{jit_note} are never called, but will return
1818 zero in the @var{data_size} argument if no constant is required;
1819 constants are only used for the @code{float} and @code{double} operations
1820 that have an immediate argument, and not in all @lightning{} ports.
1823 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1825 @var{data} can be NULL if disabling constants and annotations, otherwise,
1826 a valid pointer must be passed. An assertion is done that the data will
1827 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1828 with @code{-DNDEBUG}).
1830 @var{size} tells the space in bytes available in @var{data}.
1832 @var{flags} can be zero to tell to just use the alternate data buffer,
1833 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1836 @item JIT_DISABLE_DATA
1837 @cindex JIT_DISABLE_DATA
1838 Instructs @lightning{} to not use a constant table, but to use an
1839 alternate method to synthesize those, usually with a larger code
1840 sequence using stack space to transfer the value from a GPR to a
1843 @item JIT_DISABLE_NOTE
1844 @cindex JIT_DISABLE_NOTE
1845 Instructs @lightning{} to not store file or function name, and
1846 line numbers in the constant buffer.
1850 A simple example of a preventing usage of a data buffer is:
1854 jit_realize(); @rem{/* ready to generate code */}
1855 jit_get_data(NULL, NULL);
1856 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1860 Or to only use a data buffer, if required:
1864 jit_word_t data_size;
1866 jit_realize(); @rem{/* ready to generate code */}
1867 jit_get_data(&data_size, NULL);
1869 data = malloc(data_size);
1872 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1879 @node Acknowledgements
1880 @chapter Acknowledgements
1882 As far as I know, the first general-purpose portable dynamic code
1883 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1884 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1885 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1886 directly inspired @lightning{}.
1888 Thanks go to Ian Piumarta, who kindly accepted to release his own
1889 program @sc{ccg} under the GNU General Public License, thereby allowing
1890 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1891 @sc{ccg} provides a way of dynamically assemble programs written in the
1892 underlying architecture's assembly language. So it is not portable,
1893 yet very interesting.
1895 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1896 was first developed as a tool to be used in GNU Smalltalk's dynamic
1897 translator from bytecodes to native code.