2 @dircategory Software development
4 * lightning: (lightning). Library for dynamic code generation.
24 This document describes @value{TOPIC} the @lightning{} library for
25 dynamic code generation.
28 * Overview:: What GNU lightning is
29 * Installation:: Configuring and installing GNU lightning
30 * The instruction set:: The RISC instruction set used in GNU lightning
31 * GNU lightning examples:: GNU lightning's examples
32 * Reentrancy:: Re-entrant usage of GNU lightning
33 * Registers:: Accessing the whole register file
34 * Customizations:: Advanced code generation customizations
35 * Acknowledgements:: Acknowledgements for GNU lightning
40 @chapter Introduction to @lightning{}
43 This document describes @value{TOPIC} the @lightning{} library for
44 dynamic code generation.
47 Dynamic code generation is the generation of machine code
48 at runtime. It is typically used to strip a layer of interpretation
49 by allowing compilation to occur at runtime. One of the most
50 well-known applications of dynamic code generation is perhaps that
51 of interpreters that compile source code to an intermediate bytecode
52 form, which is then recompiled to machine code at run-time: this
53 approach effectively combines the portability of bytecode
54 representations with the speed of machine code. Another common
55 application of dynamic code generation is in the field of hardware
56 simulators and binary emulators, which can use the same techniques
57 to translate simulated instructions to the instructions of the
60 Yet other applications come to mind: for example, windowing
61 @dfn{bitblt} operations, matrix manipulations, and network packet
62 filters. Albeit very powerful and relatively well known within the
63 compiler community, dynamic code generation techniques are rarely
64 exploited to their full potential and, with the exception of the
65 two applications described above, have remained curiosities because
66 of their portability and functionality barriers: binary instructions
67 are generated, so programs using dynamic code generation must be
68 retargeted for each machine; in addition, coding a run-time code
69 generator is a tedious and error-prone task more than a difficult one.
71 @lightning{} provides a portable, fast and easily retargetable dynamic
72 code generation system.
74 To be portable, @lightning{} abstracts over current architectures'
75 quirks and unorthogonalities. The interface that it exposes to is that
76 of a standardized RISC architecture loosely based on the SPARC and MIPS
77 chips. There are a few general-purpose registers (six, not including
78 those used to receive and pass parameters between subroutines), and
79 arithmetic operations involve three operands---either three registers
80 or two registers and an arbitrarily sized immediate value.
82 On one hand, this architecture is general enough that it is possible to
83 generate pretty efficient code even on CISC architectures such as the
84 Intel x86 or the Motorola 68k families. On the other hand, it matches
85 real architectures closely enough that, most of the time, the
86 compiler's constant folding pass ends up generating code which
87 assembles machine instructions without further tests.
90 @chapter Configuring and installing @lightning{}
92 Here we will assume that your system already has the dependencies
93 necessary to build @lightning{}. For more on dependencies, see
94 @lightning{}'s @file{README-hacking} file.
96 The first thing to do to build @lightning{} is to configure the
97 program, picking the set of macros to be used on the host
98 architecture; this configuration is automatically performed by
99 the @file{configure} shell script; to run it, merely type:
104 The @file{configure} accepts the @code{--enable-disassembler} option,
105 hat enables linking to GNU binutils and optionally print human readable
106 disassembly of the jit code. This option can be disabled by the
107 @code{--disable-disassembler} option.
109 @file{configure} also accepts the @code{--enable-devel-disassembler},
110 option useful to check exactly hat machine instructions were generated
111 for a @lightning{} instrction. Basically mixing @code{jit_print} and
112 @code{jit_disassembly}.
114 The @code{--enable-assertions} option, which enables several consistency
115 hecks in the run-time assemblers. These are not usually needed, so you
116 can decide to simply forget about it; also remember that these consistency
117 checks tend to slow down your code generator.
119 The @code{--enable-devel-strong-type-checking} option that does extra type
120 checking using @code{assert}. This option also enables the
121 @code{--enable-assertions} unless it is explicitly disabled.
123 The option @code{--enable-devel-get-jit-size} should only be used
124 when doing updates or maintenance to lightning. It regenerates the
125 @code{jit_$ARCH]-sz.c} creating a table or maximum bytes usage when
126 translating a @lightning{} instruction to machine code.
128 After you've configured @lightning{}, run @file{make} as usual.
130 @lightning{} has an extensive set of tests to validate it is working
131 correctly in the build host. To test it run:
136 The next important step is:
141 This ends the process of installing @lightning{}.
143 @node The instruction set
144 @chapter @lightning{}'s instruction set
146 @lightning{}'s instruction set was designed by deriving instructions
147 that closely match those of most existing RISC architectures, or
148 that can be easily syntesized if absent. Each instruction is composed
152 an operation, like @code{sub} or @code{mul}
155 most times, a register/immediate flag (@code{r} or @code{i})
158 an unsigned modifier (@code{u}), a type identifier or two, when applicable.
161 Examples of legal mnemonics are @code{addr} (integer add, with three
162 register operands) and @code{muli} (integer multiply, with two
163 register operands and an immediate operand). Each instruction takes
164 two or three operands; in most cases, one of them can be an immediate
165 value instead of a register.
167 Most @lightning{} integer operations are signed wordsize operations,
168 with the exception of operations that convert types, or load or store
169 values to/from memory. When applicable, the types and C types are as
174 _uc @r{unsigned char}
176 _us @r{unsigned short}
184 Most integer operations do not need a type modifier, and when loading or
185 storing values to memory there is an alias to the proper operation
186 using wordsize operands, that is, if ommited, the type is @r{int} on
187 32-bit architectures and @r{long} on 64-bit architectures. Note
188 that lightning also expects @code{sizeof(void*)} to match the wordsize.
190 When an unsigned operation result differs from the equivalent signed
191 operation, there is a the @code{_u} modifier.
193 There are at least seven integer registers, of which six are
194 general-purpose, while the last is used to contain the frame pointer
195 (@code{FP}). The frame pointer can be used to allocate and access local
196 variables on the stack, using the @code{allocai} or @code{allocar}
199 Of the general-purpose registers, at least three are guaranteed to be
200 preserved across function calls (@code{V0}, @code{V1} and
201 @code{V2}) and at least three are not (@code{R0}, @code{R1} and
202 @code{R2}). Six registers are not very much, but this
203 restriction was forced by the need to target CISC architectures
204 which, like the x86, are poor of registers; anyway, backends can
205 specify the actual number of available registers with the calls
206 @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
207 (for callee-save registers).
209 There are at least six floating-point registers, named @code{F0} to
210 @code{F5}. These are usually caller-save and are separate from the integer
211 registers on the supported architectures; on Intel architectures,
212 in 32 bit mode if SSE2 is not available or use of X87 is forced,
213 the register stack is mapped to a flat register file. As for the
214 integer registers, the macro @code{JIT_F_NUM} yields the number of
215 floating-point registers.
217 The complete instruction set follows; as you can see, most non-memory
218 operations only take integers (either signed or unsigned) as operands;
219 this was done in order to reduce the instruction set, and because most
220 architectures only provide word and long word operations on registers.
221 There are instructions that allow operands to be extended to fit a larger
222 data type, both in a signed and in an unsigned way.
225 @item Binary ALU operations
226 These accept three operands; the last one can be an immediate.
227 @code{addx} operations must directly follow @code{addc}, and
228 @code{subx} must follow @code{subc}; otherwise, results are undefined.
229 Most, if not all, architectures do not support @r{float} or @r{double}
230 immediate operands; lightning emulates those operations by moving the
231 immediate to a temporary register and emiting the call with only
234 addr _f _d O1 = O2 + O3
235 addi _f _d O1 = O2 + O3
236 addxr O1 = O2 + (O3 + carry)
237 addxi O1 = O2 + (O3 + carry)
238 addcr O1 = O2 + O3, set carry
239 addci O1 = O2 + O3, set carry
240 subr _f _d O1 = O2 - O3
241 subi _f _d O1 = O2 - O3
242 subxr O1 = O2 - (O3 + carry)
243 subxi O1 = O2 - (O3 + carry)
244 subcr O1 = O2 - O3, set carry
245 subci O1 = O2 - O3, set carry
246 rsbr _f _d O1 = O3 - O1
247 rsbi _f _d O1 = O3 - O1
248 mulr _f _d O1 = O2 * O3
249 muli _f _d O1 = O2 * O3
250 hmulr _u O1 = ((O2 * O3) >> WORDSIZE)
251 hmuli _u O1 = ((O2 * O3) >> WORDSIZE)
252 divr _u _f _d O1 = O2 / O3
253 divi _u _f _d O1 = O2 / O3
264 rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
265 rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
266 lrotr O1 = (O2 << O3) | (O3 >> (WORDSIZE - O3))
267 lroti O1 = (O2 << O3) | (O3 >> (WORDSIZE - O3))
268 rrotr O1 = (O2 >> O3) | (O3 << (WORDSIZE - O3))
269 rroti O1 = (O2 >> O3) | (O3 << (WORDSIZE - O3))
270 movzr O1 = O3 ? O1 : O2
271 movnr O1 = O3 ? O2 : O1
274 Note that @code{lrotr}, @code{lroti}, @code{rrotr} and @code{rroti}
275 are described as the fallback operation. These are bit shift/rotation
278 @item Four operand binary ALU operations
279 These accept two result registers, and two operands; the last one can
280 be an immediate. The first two arguments cannot be the same register.
282 @code{qmul} stores the low word of the result in @code{O1} and the
283 high word in @code{O2}. For unsigned multiplication, @code{O2} zero
284 means there was no overflow. For signed multiplication, no overflow
285 check is based on sign, and can be detected if @code{O2} is zero or
288 @code{qdiv} stores the quotient in @code{O1} and the remainder in
289 @code{O2}. It can be used as quick way to check if a division is
290 exact, in which case the remainder is zero.
292 @code{qlsh} shifts from 0 to @emph{wordsize}, doing a normal left
293 shift for the first result register and setting the second result
294 resister to the overflow bits. @code{qlsh} can be used as a quick
295 way to multiply by powers of two.
297 @code{qrsh} shifts from 0 to @emph{wordsize}, doing a normal right
298 shift for the first result register and setting the second result
299 register to the overflow bits. @code{qrsh} can be used as a quick
300 way to divide by powers of two.
302 Note that @code{qlsh} and @code{qrsh} are basically implemented as
303 two shifts. It is undefined behavior to pass a value not in the range
304 0 to @emph{wordsize}. Most cpus will usually @code{and} the shift
305 amount with @emph{wordsize} - 1, or possible use the @emph{remainder}.
306 @lightning{} only generates code to specially handle 0 and @emph{wordsize}
307 shifts. Since in a code generator for a @emph{safe language} should
308 usually check the shift amount, these instructions usually should be
309 used as a fast path to check for division without remainder or
310 multiplication that does not overflow.
313 qmulr _u O1 O2 = O3 * O4
314 qmuli _u O1 O2 = O3 * O4
315 qdivr _u O1 O2 = O3 / O4
316 qdivi _u O1 O2 = O3 / O4
317 qlshr _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
318 qlshi _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
319 qrshr _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
320 qrshi _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
323 These four operand ALU operations are only defined for float operands.
326 fmar _f _d O1 = O2 * O3 + O4
327 fmai _f _d O1 = O2 * O3 + O4
328 fmsr _f _d O1 = O2 * O3 - O4
329 fmsi _f _d O1 = O2 * O3 - O4
330 fnmar _f _d O1 = -O2 * O3 - O4
331 fnmai _f _d O1 = -O2 * O3 - O4
332 fnmsr _f _d O1 = -O2 * O3 + O4
333 fnmsi _f _d O1 = -O2 * O3 + O4
336 These are a family of fused multiply-add instructions.
337 Note that @lightning{} does not handle rounding modes nor math exceptions.
338 Also note that not all backends provide a instruction for the equivalent
339 @lightning{} instruction presented above. Some are completely implemented
340 as fallbacks and some are composed of one or more instructions. For common
341 input this should not cause major issues, but note that when implemented by
342 the cpu, these are implemented as the multiplication calculated with infinite
343 precision, and after the addition step rounding is done. Due to this, For
344 specially crafted input different ports might show different output. When
345 implemented by the CPU, it is also possible to have exceptions that do
346 not happen if implemented as a fallback.
348 @item Unary ALU operations
349 These accept two operands, the first must be a register and the
350 second is a register if the @code{r} modifier is used, otherwise,
351 the @code{i} modifier is used and the second argument is a constant.
358 clor O1 = number of leading one bits in O2
359 cloi O1 = number of leading one bits in O2
360 clzr O1 = number of leading zero bits in O2
361 clzi O1 = number of leading zero bits in O2
362 ctor O1 = number of trailing one bits in O2
363 ctoi O1 = number of trailing one bits in O2
364 ctzr O1 = number of trailing zero bits in O2
365 ctzi O1 = number of trailing zero bits in O2
366 rbitr O1 = bits of O2 reversed
367 rbiti O1 = bits of O2 reversed
368 popcntr O1 = number of bits set in O2
369 popcnti O1 = number of bits set in O2
372 Note that @code{ctzr} is basically equivalent of a @code{C} call
373 @code{ffs} but indexed at bit zero, not one.
375 Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
376 value of zero is not an error, it just returns the number of bits
377 in a word, 64 if @lightning{} generates 64 bit instructions, otherwise
380 The @code{clor} and @code{ctor} are just counterparts of the versions
381 that search for zero bits.
383 These unary ALU operations are only defined for float operands.
386 absr _f _d O1 = fabs(O2)
387 absi _f _d O1 = fabs(O2)
388 sqrtr _f _d O1 = sqrt(O2)
389 sqrti _f _d O1 = sqrt(O2)
392 Note that for @code{float} and @code{double} unary operations, @lightning{}
393 will generate code to actually execute the operation at runtime.
395 @item Compare instructions
396 These accept three operands; again, the last can be an immediate.
397 The last two operands are compared, and the first operand, that must be
398 an integer register, is set to either 0 or 1, according to whether the
399 given condition was met or not.
401 The conditions given below are for the standard behavior of C,
402 where the ``unordered'' comparison result is mapped to false.
405 ltr _u _f _d O1 = (O2 < O3)
406 lti _u _f _d O1 = (O2 < O3)
407 ler _u _f _d O1 = (O2 <= O3)
408 lei _u _f _d O1 = (O2 <= O3)
409 gtr _u _f _d O1 = (O2 > O3)
410 gti _u _f _d O1 = (O2 > O3)
411 ger _u _f _d O1 = (O2 >= O3)
412 gei _u _f _d O1 = (O2 >= O3)
413 eqr _f _d O1 = (O2 == O3)
414 eqi _f _d O1 = (O2 == O3)
415 ner _f _d O1 = (O2 != O3)
416 nei _f _d O1 = (O2 != O3)
417 unltr _f _d O1 = !(O2 >= O3)
418 unler _f _d O1 = !(O2 > O3)
419 ungtr _f _d O1 = !(O2 <= O3)
420 unger _f _d O1 = !(O2 < O3)
421 uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
422 ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
423 ordr _f _d O1 = (O2 == O2) && (O3 == O3)
424 unordr _f _d O1 = (O2 != O2) || (O3 != O3)
427 @item Transfer operations
428 These accept two operands; for @code{ext} both of them must be
429 registers, while @code{mov} accepts an immediate value as the second
432 Unlike @code{movr} and @code{movi}, the other instructions are used
433 to truncate a wordsize operand to a smaller integer data type or to
434 convert float data types. You can also use @code{extr} to convert an
435 integer to a floating point value: the usual options are @code{extr_f}
441 extr _c _uc _s _us _i _ui _f _d O1 = O2
442 truncr _f _d O1 = trunc(O2)
443 extr O1 = sign_extend(O2[O3:O3+04])
444 extr_u O1 = O2[O3:O3+04]
445 depr O1[O3:O3+O4] = O2
448 @code{extr}, @code{extr_u} and @code{depr} are useful to access @code{C}
449 compatible bit fields, provided that these are contained in a machine
450 word. @code{extr} is used to @emph{extract} and signed extend a value
451 from a bit field. @code{extr_u} is used to @emph{extract} and zero
452 extend a value from a bit field. @code{depr} is used to @emph{deposit}
453 a value into a bit field.
456 extr(result, source, offset, length)
457 extr_u(result, source, offset, length)
458 depr(result, source, offset, length)
461 A common way to declare @code{C} and @lightning{} compatible bit fields is:
465 jit_word_t signed_bits: @code{length};
466 jit_uword_t unsigned_bits: @code{length};
469 jit_word_t signed_value;
470 jit_uword_t unsigned_value;
474 In 64-bit architectures it may be required to use @code{truncr_f_i},
475 @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
476 the equivalent C code. Only the @code{_i} modifier is available in
477 32-bit architectures.
480 truncr_f_i <int> O1 = <float> O2
481 truncr_f_l <long>O1 = <float> O2
482 truncr_d_i <int> O1 = <double>O2
483 truncr_d_l <long>O1 = <double>O2
486 The float conversion operations are @emph{destination first,
487 source second}, but the order of the types is reversed. This happens
488 for historical reasons.
491 extr_f_d <double>O1 = <float> O2
492 extr_d_f <float> O1 = <double>O2
495 The float to/from integer transfer operations are also @emph{destination
496 first, source second}. These were added later, but follow the pattern
497 of historic patterns.
500 movr_w_f <float>O1 = <int>O2
501 movi_w_f <float>O1 = <int>O2
502 movr_f_w <int>O1 = <float>O2
503 movi_f_w <int>O1 = <float>O2
504 movr_w_d <double>O1 = <long>O2
505 movi_w_d <double>O1 = <long>O2
506 movr_d_w <long>O1 = <double>O2
507 movi_d_w <long>O1 = <double>O2
508 movr_ww_d <double>O1 = [<int>O2:<int>O3]
509 movi_ww_d <double>O1 = [<int>O2:<int>O3]
510 movr_d_ww [<int>O1:<int>O2] = <double>O3
511 movi_d_ww [<int>O1:<int>O2] = <double>O3
514 These are used to transfer bits to/from floats to/from integers, and are
515 useful to access bits of floating point values.
517 @code{movr_w_d}, @code{movi_w_d}, @code{movr_d_w} and @code{movi_d_w} are
518 only available in 64-bit. Conversely, @code{movr_ww_d}, @code{movi_ww_d},
519 @code{movr_d_ww} and @code{movi_d_ww} are only available in 32-bit.
520 For the int pair to/from double transfers, integer arguments must respect
521 endianess, to match how the cpu handles the verbatim byte values.
523 @item Network extensions
524 These accept two operands, both of which must be registers; these
525 two instructions actually perform the same task, yet they are
526 assigned to two mnemonics for the sake of convenience and
527 completeness. As usual, the first operand is the destination and
528 the second is the source.
529 The @code{_ul} variant is only available in 64-bit architectures.
531 htonr _us _ui _ul @r{Host-to-network (big endian) order}
532 ntohr _us _ui _ul @r{Network-to-host order }
535 @code{bswapr} can be used to unconditionally byte-swap an operand.
536 On little-endian architectures, @code{htonr} and @code{ntohr} resolve
538 The @code{_ul} variant is only available in 64-bit architectures.
540 bswapr _us _ui _ul 01 = byte_swap(02)
543 @item Load operations
544 @code{ld} accepts two operands while @code{ldx} accepts three;
545 in both cases, the last can be either a register or an immediate
546 value. Values are extended (with or without sign, according to
547 the data type specification) to fit a whole register.
548 The @code{_ui} and @code{_l} types are only available in 64-bit
549 architectures. For convenience, there is a version without a
550 type modifier for integer or pointer operands that uses the
551 appropriate wordsize call.
553 ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
554 ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
555 ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
556 ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
559 @item Store operations
560 @code{st} accepts two operands while @code{stx} accepts three; in
561 both cases, the first can be either a register or an immediate
562 value. Values are sign-extended to fit a whole register.
564 str _c _s _i _l _f _d *O1 = O2
565 sti _c _s _i _l _f _d *O1 = O2
566 stxr _c _s _i _l _f _d *(O1+O2) = O3
567 stxi _c _s _i _l _f _d *(O1+O2) = O3
569 Note that the unsigned type modifier is not available, as the store
570 only writes to the 1, 2, 4 or 8 sized memory address.
571 The @code{_l} type is only available in 64-bit architectures, and for
572 convenience, there is a version without a type modifier for integer or
573 pointer operands that uses the appropriate wordsize call.
575 @item Unaligned memory access
576 These allow access to integers of size 3, in 32-bit, and extra sizes
577 5, 6 and 7 in 64-bit.
578 For floating point values only support for size 4 and 8 is provided.
580 unldr O1 = *(signed O3 byte integer)* = O2
581 unldi O1 = *(signed O3 byte integer)* = O2
582 unldr_u O1 = *(unsigned O3 byte integer)* = O2
583 unldi_u O1 = *(unsigned O3 byte integer)* = O2
584 unldr_x O1 = *(O3 byte float)* = O2
585 unldi_x O1 = *(O3 byte float)* = O2
586 unstr *(O3 byte integer)O1 = O2
587 unsti *(O3 byte integer)O1 = O2
588 unstr_x *(O3 byte float)O1 = O2
589 unsti_x *(O3 byte float)O1 = O2
591 With the exception of non standard sized integers, these might be
592 implemented as normal loads and stores, if the processor supports
593 unaligned memory access, or, mode can be chosen at jit initialization
594 time, to generate or not generate, code that does trap on unaligned
595 memory access. Letting the kernel trap means smaller code generation
596 as it is required to check alignment at runtime@footnote{This requires changing jit_cpu.unaligned to 0 to disable or 1 to enable unaligned code generation. Not all ports have the C jit_cpu.unaligned value.}.
598 @item Argument management
601 prepare (not specified)
602 va_start (not specified)
603 pushargr _c _uc _s _us _i _ui _l _f _d
604 pushargi _c _uc _s _us _i _ui _l _f _d
605 va_push (not specified)
606 arg _c _uc _s _us _i _ui _l _f _d
607 getarg _c _uc _s _us _i _ui _l _f _d
609 putargr _c _uc _s _us _i _ui _l _f _d
610 putargi _c _uc _s _us _i _ui _l _f _d
612 retr _c _uc _s _us _i _ui _l _f _d
613 reti _c _uc _s _us _i _ui _l _f _d
615 va_end (not specified)
616 retval _c _uc _s _us _i _ui _l _f _d
617 epilog (not specified)
619 As with other operations that use a type modifier, the @code{_ui} and
620 @code{_l} types are only available in 64-bit architectures, but there
621 are operations without a type modifier that alias to the appropriate
622 integer operation with wordsize operands.
624 @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
625 while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
626 A code snippet that wants to call another procedure and has to pass
627 arguments must, in order: use the @code{prepare} instruction and use
628 the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
629 left to right order}; and use @code{finish} or @code{call} (explained below)
630 to perform the actual call.
632 Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} when
633 handling integer types can be used without a type modifier.
634 It is suggested to use matching type modifiers to @code{arg}, @code{putarg}
635 and @code{getarg} otherwise problems will happen if generating jit for
636 environments that require arguments to be truncated and zero or sign
637 extended by the caller and/or excess arguments might be passed packed
638 in the stack. Currently only Apple systems with @code{aarch64} cpus are
639 known to have this restriction.
641 @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
642 arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
643 @code{va_push} is required when passing a @code{va_list} to another function,
644 because not all architectures expect it as a single pointer. Known case
645 is DEC Alpha, that requires it as a structure passed by value.
647 @code{arg}, @code{getarg} and @code{putarg} are used by the callee.
648 @code{arg} is different from other instruction in that it does not
649 actually generate any code: instead, it is a function which returns
650 a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
651 a value'' means that @lightning{} code that compile these
652 instructions return a value when expanded.} You should call
653 @code{arg} as soon as possible, before any function call or, more
654 easily, right after the @code{prolog} instructions
655 (which is treated later).
657 @code{getarg} accepts a register argument and a value returned by
658 @code{arg}, and will move that argument to the register, extending
659 it (with or without sign, according to the data type specification)
660 to fit a whole register. These instructions are more intimately
661 related to the usage of the @lightning{} instruction set in code
662 that generates other code, so they will be treated more
663 specifically in @ref{GNU lightning examples, , Generating code at
666 @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
667 it accepts as first argument a register or immediate, and as
668 second argument a value returned by @code{arg}. It allows changing,
669 or restoring an argument to the current function, and is a
670 construct required to implement tail call optimization. Note that
671 arguments in registers are very cheap, but will be overwritten
672 at any moment, including on some operations, for example division,
673 that on several ports is implemented as a function call.
675 Finally, the @code{retval} instruction fetches the return value of a
676 called function in a register. The @code{retval} instruction takes a
677 register argument and copies the return value of the previously called
678 function in that register. A function with a return value should use
679 @code{retr} or @code{reti} to put the return value in the return register
680 before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
682 @code{epilog} is an optional call, that marks the end of a function
683 body. It is automatically generated by @lightning{} if starting a new
684 function (what should be done after a @code{ret} call) or finishing
686 It is very important to note that the fact that @code{epilog} being
687 optional may cause a common mistake. Consider this:
696 Because @code{epilog} is added when finding a new @code{prolog},
697 this will cause the @code{fun2} label to actually be before the
698 return from @code{fun1}. Because @lightning{} will actually
710 You should observe a few rules when using these macros. First of
711 all, if calling a varargs function, you should use the @code{ellipsis}
712 call to mark the position of the ellipsis in the C prototype.
714 You should not nest calls to @code{prepare} inside a
715 @code{prepare/finish} block. Doing this will result in undefined
716 behavior. Note that for functions with zero arguments you can use
719 @item Branch instructions
720 Like @code{arg}, these also return a value which, in this case,
721 is to be used to compile forward branches as explained in
722 @ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
723 compared; of these, the last can be either a register or an immediate.
726 bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
727 blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
728 bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
729 blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
730 bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
731 bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
732 bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
733 bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
734 beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
735 beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
736 bner _f _d @r{if }(O2 != O3)@r{ goto }O1
737 bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
739 bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
740 bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
741 bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
742 bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
743 buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
744 bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
745 bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
746 bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
748 bmsr @r{if }O2 & O3@r{ goto }O1
749 bmsi @r{if }O2 & O3@r{ goto }O1
750 bmcr @r{if }!(O2 & O3)@r{ goto }O1
751 bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
752 boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
753 boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
754 bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
755 bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
756 bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
757 bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
758 bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
759 bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
762 Note that the @code{C} code does not have an @code{O1} argument. It is
763 required to always use the return value as an argument to @code{patch},
764 @code{patch_at} or @code{patch_abs}.
766 @item Jump and return operations
767 These accept one argument except @code{ret} and @code{jmpi} which
768 have none; the difference between @code{finishi} and @code{calli}
769 is that the latter does not clean the stack from pushed parameters
770 (if any) and the former must @strong{always} follow a @code{prepare}
773 callr (not specified) @r{function call to register O1}
774 calli (not specified) @r{function call to immediate O1}
775 finishr (not specified) @r{function call to register O1}
776 finishi (not specified) @r{function call to immediate O1}
777 jmpr (not specified) @r{unconditional jump to register}
778 jmpi (not specified) @r{unconditional jump}
779 ret (not specified) @r{return from subroutine}
780 retr _c _uc _s _us _i _ui _l _f _d
781 reti _c _uc _s _us _i _ui _l _f _d
782 retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
786 Like branch instruction, @code{jmpi} also returns a value which is to
787 be used to compile forward branches. @xref{Fibonacci, , Fibonacci
791 There are 3 @lightning{} instructions to create labels:
793 label (not specified) @r{simple label}
794 forward (not specified) @r{forward label}
795 indirect (not specified) @r{special simple label}
798 The following instruction is used to specify a minimal alignment for
799 the next instruction, usually with a label:
801 align (not specified) @r{align code}
804 Similar to @code{align} is the next instruction, also usually used with
807 skip (not specified) @r{skip code}
809 It is used to specify a minimal number of bytes of nops to be inserted
810 before the next instruction.
812 @code{label} is normally used as @code{patch_at} argument for backward
816 jit_node_t *jump, *label;
819 jump = jit_beqr(JIT_R0, JIT_R1);
820 jit_patch_at(jump, label);
823 @code{forward} is used to patch code generation before the actual
824 position of the label is known.
827 jit_node_t *jump, *label;
828 label = jit_forward();
829 jump = jit_beqr(JIT_R0, JIT_R1);
830 jit_patch_at(jump, label);
835 @code{indirect} is useful when creating jump tables, and tells
836 @lightning{} to not optimize out a label that is not the target of
837 any jump, because an indirect jump may land where it is defined.
840 jit_node_t *jump, *label;
842 jmpr(JIT_R0); @rem{/* may jump to label */}
844 label = jit_indirect();
847 @code{indirect} is an special case of @code{note} and @code{name}
848 because it is a valid argument to @code{address}.
850 Note that the usual idiom to write the previous example is
852 jit_node_t *addr, *jump;
853 addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
857 jit_patch(addr); @rem{/* implicit label added */}
860 that automatically binds the implicit label added by @code{patch} with
861 the @code{movi}, but on some special conditions it is required to create
864 @code{align} is useful for creating multiple entry points to a
865 (trampoline) function that are all accessible through a single
866 function pointer. @code{align} receives an integer argument that
867 defines the minimal alignment of the address of a label directly
868 following the @code{align} instruction. The integer argument must be
869 a power of two and the effective alignment will be a power of two no
870 less than the argument to @code{align}. If the argument to
871 @code{align} is 16 or more, the effective alignment will match the
872 specified minimal alignment exactly.
875 jit_node_t *forward, *label1, *label2, *jump;
876 unsigned char *addr1, *addr2;
877 forward = jit_forward();
879 label1 = jit_indirect(); @rem{/* first entry point */}
880 jump = jit_jmpi(); @rem{/* jump to first handler */}
881 jit_patch_at(jump, forward);
883 label2 = jit_indirect(); @rem{/* second entry point */}
884 ... @rem{/* second handler */}
887 ... @rem{/* first handler /*}
891 addr1 = jit_address(label1);
892 addr2 = jit_address(label2);
893 assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */}
896 @code{skip} is useful for reserving space in the code buffer that can
897 later be filled (possibly with the help of the pair of functions
898 @code{jit_unprotect} and @code{jit_protect}).
900 @item Function prolog
902 These macros are used to set up a function prolog. The @code{allocai}
903 call accept a single integer argument and returns an offset value
904 for stack storage access. The @code{allocar} accepts two registers
905 arguments, the first is set to the offset for stack access, and the
906 second is the size in bytes argument.
909 prolog (not specified) @r{function prolog}
910 allocai (not specified) @r{reserve space on the stack}
911 allocar (not specified) @r{allocate space on the stack}
914 @code{allocai} receives the number of bytes to allocate and returns
915 the offset from the frame pointer register @code{FP} to the base of
918 @code{allocar} receives two register arguments. The first is where
919 to store the offset from the frame pointer register @code{FP} to the
920 base of the area. The second argument is the size in bytes. Note
921 that @code{allocar} is dynamic allocation, and special attention
922 should be taken when using it. If called in a loop, every iteration
923 will allocate stack space. Stack space is aligned from 8 to 64 bytes
924 depending on backend requirements, even if allocating only one byte.
925 It is advisable to not use it with @code{frame} and @code{tramp}; it
926 should work with @code{frame} with special care to call only once,
927 but is not supported if used in @code{tramp}, even if called only
930 As a small appetizer, here is a small function that adds 1 to the input
931 parameter (an @code{int}). I'm using an assembly-like syntax here which
932 is a bit different from the one used when writing real subroutines with
933 @lightning{}; the real syntax will be introduced in @xref{GNU lightning
934 examples, , Generating code at run-time}.
939 in = arg @rem{! We have an integer argument}
940 getarg R0, in @rem{! Move it to R0}
941 addi R0, R0, 1 @rem{! Add 1}
942 retr R0 @rem{! And return the result}
945 And here is another function which uses the @code{printf} function from
946 the standard C library to write a number in hexadecimal notation:
951 in = arg @rem{! Same as above}
953 prepare @rem{! Begin call sequence for printf}
954 pushargi "%x" @rem{! Push format string}
955 ellipsis @rem{! Varargs start here}
956 pushargr R0 @rem{! Push second argument}
957 finishi printf @rem{! Call printf}
958 ret @rem{! Return to caller}
961 @item Register liveness
963 During code generation, @lightning{} occasionally needs scratch registers
964 or needs to use architecture-defined registers. For that, @lightning{}
965 internally maintains register liveness information.
967 In the following example, @code{qdivr} will need special registers like
968 @code{R0} on some architectures. As @lightning{} understands that
969 @code{R0} is used in the subsequent instruction, it will create
970 save/restore code for @code{R0} in case.
979 The same is not true in the example that follows. Here, @code{R0} is
980 not alive after the division operation because @code{R0} is neither an
981 argument register nor a callee-save register. Thus, no save/restore
982 code for @code{R0} will be created in case.
991 The @code{live} instruction can be used to mark a register as live after
992 it as in the following example. Here, @code{R0} will be preserved
1003 The @code{live} instruction is useful at code entry and exit points,
1004 like after and before a @code{callr} instruction.
1006 @item Trampolines, continuations and tail call optimization
1008 Frequently it is required to generate jit code that must jump to
1009 code generated later, possibly from another @code{jit_context_t}.
1010 These require compatible stack frames.
1012 @lightning{} provides two primitives from where trampolines,
1013 continuations and tail call optimization can be implemented.
1016 frame (not specified) @r{create stack frame}
1017 tramp (not specified) @r{assume stack frame}
1020 @code{frame} receives an integer argument@footnote{It is not
1021 automatically computed because it does not know about the
1022 requirement of later generated code.} that defines the size in
1023 bytes for the stack frame of the current, @code{C} callable,
1024 jit function. To calculate this value, a good formula is maximum
1025 number of arguments to any called native function times
1026 eight@footnote{Times eight so that it works for double arguments.
1027 And would not need conditionals for ports that pass arguments in
1028 the stack.}, plus the sum of the arguments to any call to
1029 @code{jit_allocai}. @lightning{} automatically adjusts this value
1030 for any backend specific stack memory it may need, or any
1031 alignment constraint.
1033 @code{frame} also instructs @lightning{} to save all callee
1034 save registers in the prolog and reload in the epilog.
1037 main: @rem{! jit entry point}
1038 prolog @rem{! function prolog}
1039 frame 256 @rem{! save all callee save registers and}
1040 @rem{! reserve at least 256 bytes in stack}
1043 jmpi handler @rem{! jumps to external code}
1045 ret @rem{! return to the caller}
1048 @code{tramp} differs from @code{frame} only that a prolog and epilog
1049 will not be generated. Note that @code{prolog} must still be used.
1050 The code under @code{tramp} must be ready to be entered with a jump
1051 at the prolog position, and instead of a return, it must end with
1052 a non conditional jump. @code{tramp} exists solely for the fact
1053 that it allows optimizing out prolog and epilog code that would
1057 handler: @rem{! handler entry point}
1058 prolog @rem{! function prolog}
1059 tramp 256 @rem{! assumes all callee save registers}
1060 @rem{! are saved and there is at least}
1061 @rem{! 256 bytes in stack}
1063 jmpi main_loop @rem{! return to the main loop}
1066 @lightning{} only supports Tail Call Optimization using the
1067 @code{tramp} construct. Any other way is not guaranteed to
1070 An example of a simple (recursive) tail call optimization:
1073 factorial: @rem{! Entry point of the factorial function}
1075 in = arg @rem{! Receive an integer argument}
1076 getarg R0, in @rem{! Move argument to RO}
1078 pushargi 1 @rem{! This is the accumulator}
1079 pushargr R0 @rem{! This is the argument}
1080 finishi fact @rem{! Call the tail call optimized function}
1081 retval R0 @rem{! Fetch the result}
1082 retr R0 @rem{! Return it}
1083 epilog @rem{! Epilog *before* label before prolog}
1085 fact: @rem{! Entry point of the helper function}
1087 frame 16 @rem{! Reserve 16 bytes in the stack}
1088 fact_entry: @rem{! This is the tail call entry point}
1089 ac = arg @rem{! The accumulator is the first argument}
1090 in = arg @rem{! The factorial argument}
1091 getarg R0, ac @rem{! Move the accumulator to R0}
1092 getarg R1, in @rem{! Move the argument to R1}
1093 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
1094 mulr R0, R0, R1 @rem{! accumulator *= argument}
1095 putargr R0, ac @rem{! Update the accumulator}
1096 subi R1, R1, 1 @rem{! argument -= 1}
1097 putargr R1, in @rem{! Update the argument}
1098 jmpi fact_entry @rem{! Tail Call Optimize it!}
1100 retr R0 @rem{! Return the accumulator}
1105 forward_p (not specified) @r{forward label predicate}
1106 indirect_p (not specified) @r{indirect label predicate}
1107 target_p (not specified) @r{used label predicate}
1108 arg_register_p (not specified) @r{argument kind predicate}
1109 callee_save_p (not specified) @r{callee save predicate}
1110 pointer_p (not specified) @r{pointer predicate}
1113 @code{forward_p} expects a @code{jit_node_t*} argument, and
1114 returns non zero if it is a forward label reference, that is,
1115 a label returned by @code{forward}, that still needs a
1118 @code{indirect_p} expects a @code{jit_node_t*} argument, and returns
1119 non zero if it is an indirect label reference, that is, a label that
1120 was returned by @code{indirect}.
1122 @code{target_p} expects a @code{jit_node_t*} argument, that is any
1123 kind of label, and will return non zero if there is at least one
1124 jump or move referencing it.
1126 @code{arg_register_p} expects a @code{jit_node_t*} argument, that must
1127 have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
1128 will return non zero if the argument lives in a register. This call
1129 is useful to know the live range of register arguments, as those
1130 are very fast to read and write, but have volatile values.
1132 @code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
1133 @code{JIT_Fn}, and will return non zero if the register is callee
1134 save. This call is useful because on several ports, the @code{JIT_Rn}
1135 and @code{JIT_Fn} registers are actually callee save; no need
1136 to save and load the values when making function calls.
1138 @code{pointer_p} expects a pointer argument, and will return non
1139 zero if the pointer is inside the generated jit code. Must be
1140 called after @code{jit_emit} and before @code{jit_destroy_state}.
1142 @item Atomic operations
1143 Only compare-and-swap is implemented. It accepts four operands;
1144 the second can be an immediate.
1146 The first argument is set with a boolean value telling if the operation
1149 Arguments must be different, cannot use the result register to also pass
1152 The second argument is the address of a machine word.
1154 The third argument is the old value.
1156 The fourth argument is the new value.
1159 casr 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1160 casi 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
1163 If value at the address in the second argument is equal to the third
1164 argument, the address value is atomically modified to the value of the
1165 fourth argument and the first argument is set to a non zero value.
1167 If the value at the address in the second argument is not equal to the
1168 third argument nothing is done and the first argument is set to zero.
1171 @node GNU lightning examples
1172 @chapter Generating code at run-time
1174 To use @lightning{}, you should include the @file{lightning.h} file that
1175 is put in your include directory by the @samp{make install} command.
1177 Each of the instructions above translates to a macro or function call.
1178 All you have to do is prepend @code{jit_} (lowercase) to opcode names
1179 and @code{JIT_} (uppercase) to register names. Of course, parameters
1180 are to be put between parentheses.
1182 This small tutorial presents three examples:
1187 The @code{incr} function found in @ref{The instruction set, ,
1188 @lightning{}'s instruction set}:
1191 A simple function call to @code{printf}
1202 * incr:: A function which increments a number by one
1203 * printf:: A simple function call to printf
1204 * RPN calculator:: A more complex example, an RPN calculator
1205 * Fibonacci:: Calculating Fibonacci numbers
1210 @section A function which increments a number by one
1212 Let's see how to create and use the sample @code{incr} function created
1213 in @ref{The instruction set, , @lightning{}'s instruction set}:
1217 #include <lightning.h>
1219 static jit_state_t *_jit;
1221 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1223 int main(int argc, char *argv[])
1229 _jit = jit_new_state();
1231 jit_prolog(); @rem{/* @t{ prolog } */}
1232 in = jit_arg(); @rem{/* @t{ in = arg } */}
1233 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
1234 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
1235 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
1240 @rem{/* call the generated code@comma{} passing 5 as an argument */}
1241 printf("%d + 1 = %d\n", 5, incr(5));
1243 jit_destroy_state();
1249 Let's examine the code line by line (well, almost@dots{}):
1252 @item #include <lightning.h>
1253 You already know about this. It defines all of @lightning{}'s macros.
1255 @item static jit_state_t *_jit;
1256 You might wonder about what is @code{jit_state_t}. It is a structure
1257 that stores jit code generation information. The name @code{_jit} is
1258 special, because since multiple jit generators can run at the same
1259 time, you must either @r{#define _jit my_jit_state} or name it
1262 @item typedef int (*pifi)(int);
1263 Just a handy typedef for a pointer to a function that takes an
1264 @code{int} and returns another.
1266 @item jit_node_t *in;
1267 Declares a variable to hold an identifier for a function argument. It
1268 is an opaque pointer, that will hold the return of a call to @code{arg}
1269 and be used as argument to @code{getarg}.
1272 Declares a function pointer variable to a function that receives an
1273 @code{int} and returns an @code{int}.
1275 @item init_jit(argv[0]);
1276 You must call this function before creating a @code{jit_state_t}
1277 object. This function does global state initialization, and may need
1278 to detect CPU or Operating System features. It receives a string
1279 argument that is later used to read symbols from a shared object using
1280 GNU binutils if disassembly was enabled at configure time. If no
1281 disassembly will be performed a NULL pointer can be used as argument.
1283 @item _jit = jit_new_state();
1284 This call initializes a @lightning{} jit state.
1287 Ok, so we start generating code for our beloved function@dots{}
1289 @item in = jit_arg();
1290 @itemx jit_getarg(JIT_R0, in);
1291 We retrieve the first (and only) argument, an integer, and store it
1292 into the general-purpose register @code{R0}.
1294 @item jit_addi(JIT_R0, JIT_R0, 1);
1295 We add one to the content of the register.
1297 @item jit_retr(JIT_R0);
1298 This instruction generates a standard function epilog that returns
1299 the contents of the @code{R0} register.
1301 @item incr = jit_emit();
1302 This instruction is very important. It actually translates the
1303 @lightning{} macros used before to machine code, flushes the generated
1304 code area out of the processor's instruction cache and return a
1305 pointer to the start of the code.
1307 @item jit_clear_state();
1308 This call cleanups any data not required for jit execution. Note
1309 that it must be called after any call to @code{jit_print} or
1310 @code{jit_address}, as this call destroy the @lightning{}
1311 intermediate representation.
1313 @item printf("%d + 1 = %d", 5, incr(5));
1314 Calling our function is this simple---it is not distinguishable from
1315 a normal C function call, the only difference being that @code{incr}
1318 @item jit_destroy_state();
1319 Releases all memory associated with the jit context. It should be
1320 called after known the jit will no longer be called.
1323 This call cleanups any global state hold by @lightning{}, and is
1324 advisable to call it once jit code will no longer be generated.
1327 @lightning{} abstracts two phases of dynamic code generation: selecting
1328 instructions that map the standard representation, and emitting binary
1329 code for these instructions. The client program has the responsibility
1330 of describing the code to be generated using the standard @lightning{}
1333 Let's examine the code generated for @code{incr} on the SPARC and x86_64
1334 architecture (on the right is the code that an assembly-language
1335 programmer would write):
1348 In this case, @lightning{} introduces overhead to create a register
1349 window (not knowing that the procedure is a leaf procedure) and to
1350 move the argument to the general purpose register @code{R0} (which
1351 maps to @code{%g2} on the SPARC).
1361 In this case, for the x86 port, @lightning{} has simple optimizations
1362 to understand it is a leaf function, and that it is not required to
1363 create a stack frame nor update the stack pointer.
1367 @section A simple function call to @code{printf}
1369 Again, here is the code for the example:
1373 #include <lightning.h>
1375 static jit_state_t *_jit;
1377 typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1379 int main(int argc, char *argv[])
1381 pvfi myFunction; @rem{/* ptr to generated code */}
1382 jit_node_t *start, *end; @rem{/* a couple of labels */}
1383 jit_node_t *in; @rem{/* to get the argument */}
1386 _jit = jit_new_state();
1388 start = jit_note(__FILE__, __LINE__);
1391 jit_getarg(JIT_R1, in);
1393 jit_pushargi((jit_word_t)"generated %d bytes\n");
1395 jit_pushargr(JIT_R1);
1396 jit_finishi(printf);
1399 end = jit_note(__FILE__, __LINE__);
1401 myFunction = jit_emit();
1403 @rem{/* call the generated code@comma{} passing its size as argument */}
1404 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1409 jit_destroy_state();
1415 The function shows how many bytes were generated. Most of the code
1416 is not very interesting, as it resembles very closely the program
1417 presented in @ref{incr, , A function which increments a number by one}.
1419 For this reason, we're going to concentrate on just a few statements.
1422 @item start = jit_note(__FILE__, __LINE__);
1424 @itemx end = jit_note(__FILE__, __LINE__);
1425 These two instruction call the @code{jit_note} macro, which creates
1426 a note in the jit code; arguments to @code{jit_note} usually are a
1427 filename string and line number integer, but using NULL for the
1428 string argument is perfectly valid if only need to create a simple
1431 @item jit_ellipsis();
1432 @code{ellipsis} usually is only required if calling varargs functions
1433 with double arguments, but it is a good practice to properly describe
1434 the @r{@dots{}} in the call sequence.
1436 @item jit_pushargi((jit_word_t)"generated %d bytes\n");
1437 Note the use of the @code{(jit_word_t)} cast, that is used only
1438 to avoid a compiler warning, due to using a pointer where a
1439 wordsize integer type was expected.
1441 @item jit_prepare();
1443 @itemx jit_finishi(printf);
1444 Once the arguments to @code{printf} have been pushed, what means
1445 moving them to stack or register arguments, the @code{printf}
1446 function is called and the stack cleaned. Note how @lightning{}
1447 abstracts the differences between different architectures and
1448 ABI's -- the client program does not know how parameter passing
1449 works on the host architecture.
1452 Usually it is not required to call @code{epilog}, but because it
1453 is implicitly called when noticing the end of a function, if the
1454 @code{end} variable was set with a @code{note} call after the
1455 @code{ret}, it would not consider the function epilog.
1457 @item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1458 This calls the generate jit function passing as argument the offset
1459 difference from the @code{start} and @code{end} notes. The @code{address}
1460 call must be done after the @code{emit} call or either a fatal error
1461 will happen (if @lightning{} is built with assertions enable) or an
1462 undefined value will be returned.
1464 @item jit_clear_state();
1465 Note that @code{jit_clear_state} was called after executing jit in
1466 this example. It was done because it must be called after any call
1467 to @code{jit_address} or @code{jit_print}.
1469 @item jit_disassemble();
1470 @code{disassemble} will dump the generated code to standard output,
1471 unless @lightning{} was built with the disassembler disabled, in which
1472 case no output will be shown.
1475 @node RPN calculator
1476 @section A more complex example, an RPN calculator
1478 We create a small stack-based RPN calculator which applies a series
1479 of operators to a given parameter and to other numeric operands.
1480 Unlike previous examples, the code generator is fully parameterized
1481 and is able to compile different formulas to different functions.
1482 Here is the code for the expression compiler; a sample usage will
1485 Since @lightning{} does not provide push/pop instruction, this
1486 example uses a stack-allocated area to store the data. Such an
1487 area can be allocated using the macro @code{allocai}, which
1488 receives the number of bytes to allocate and returns the offset
1489 from the frame pointer register @code{FP} to the base of the
1492 Usually, you will use the @code{ldxi} and @code{stxi} instruction
1493 to access stack-allocated variables. However, it is possible to
1494 use operations such as @code{add} to compute the address of the
1495 variables, and pass the address around.
1499 #include <lightning.h>
1501 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1503 static jit_state_t *_jit;
1505 void stack_push(int reg, int *sp)
1507 jit_stxi_i (*sp, JIT_FP, reg);
1508 *sp += sizeof (int);
1511 void stack_pop(int reg, int *sp)
1513 *sp -= sizeof (int);
1514 jit_ldxi_i (reg, JIT_FP, *sp);
1517 jit_node_t *compile_rpn(char *expr)
1519 jit_node_t *in, *fn;
1520 int stack_base, stack_ptr;
1522 fn = jit_note(NULL, 0);
1525 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1527 jit_getarg(JIT_R2, in);
1532 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1534 stack_push(JIT_R0, &stack_ptr);
1535 jit_movi(JIT_R0, atoi(buf));
1536 @} else if (*expr == 'x') @{
1537 stack_push(JIT_R0, &stack_ptr);
1538 jit_movr(JIT_R0, JIT_R2);
1539 @} else if (*expr == '+') @{
1540 stack_pop(JIT_R1, &stack_ptr);
1541 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1542 @} else if (*expr == '-') @{
1543 stack_pop(JIT_R1, &stack_ptr);
1544 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1545 @} else if (*expr == '*') @{
1546 stack_pop(JIT_R1, &stack_ptr);
1547 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1548 @} else if (*expr == '/') @{
1549 stack_pop(JIT_R1, &stack_ptr);
1550 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1552 fprintf(stderr, "cannot compile: %s\n", expr);
1563 The principle on which the calculator is based is easy: the stack top
1564 is held in R0, while the remaining items of the stack are held in the
1565 memory area that we allocate with @code{allocai}. Compiling a numeric
1566 operand or the argument @code{x} pushes the old stack top onto the
1567 stack and moves the operand into R0; compiling an operator pops the
1568 second operand off the stack into R1, and compiles the operation so
1569 that the result goes into R0, thus becoming the new stack top.
1571 This example allocates a fixed area for 32 @code{int}s. This is not
1572 a problem when the function is a leaf like in this case; in a full-blown
1573 compiler you will want to analyze the input and determine the number
1574 of needed stack slots---a very simple example of register allocation.
1575 The area is then managed like a stack using @code{stack_push} and
1578 Source code for the client (which lies in the same source file) follows:
1581 int main(int argc, char *argv[])
1583 jit_node_t *nc, *nf;
1588 _jit = jit_new_state();
1590 nc = compile_rpn("32x9*5/+");
1591 nf = compile_rpn("x32-5*9/");
1593 c2f = (pifi)jit_address(nc);
1594 f2c = (pifi)jit_address(nf);
1598 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1600 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1604 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1606 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1609 jit_destroy_state();
1615 The client displays a conversion table between Celsius and Fahrenheit
1616 degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1617 formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1620 Providing the formula as an argument to @code{compile_rpn} effectively
1621 parameterizes code generation, making it possible to use the same code
1622 to compile different functions; this is what makes dynamic code
1623 generation so powerful.
1626 @section Fibonacci numbers
1628 The code in this section calculates the Fibonacci sequence. That is
1629 modeled by the recurrence relation:
1633 f(n) = f(n-1) + f(n-2)
1636 The purpose of this example is to introduce branches. There are two
1637 kind of branches: backward branches and forward branches. We'll
1638 present the calculation in a recursive and iterative form; the
1639 former only uses forward branches, while the latter uses both.
1643 #include <lightning.h>
1645 static jit_state_t *_jit;
1647 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1649 int main(int argc, char *argv[])
1654 jit_node_t *in; @rem{/* offset of the argument */}
1655 jit_node_t *ref; @rem{/* to patch the forward reference */}
1656 jit_node_t *zero; @rem{/* to patch the forward reference */}
1659 _jit = jit_new_state();
1661 label = jit_label();
1664 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1665 zero = jit_beqi (JIT_R0, 0);
1666 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1667 jit_movi (JIT_R0, 1);
1668 ref = jit_blei (JIT_V0, 2);
1669 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1670 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1672 jit_pushargr(JIT_V1);
1673 call = jit_finishi(NULL);
1674 jit_patch_at(call, label);
1675 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1677 jit_pushargr(JIT_V2);
1678 call = jit_finishi(NULL);
1679 jit_patch_at(call, label);
1680 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1681 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1683 jit_patch(ref); @rem{/* patch jump */}
1684 jit_patch(zero); @rem{/* patch jump */}
1687 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1690 printf("fib(%d) = %d\n", 32, fib(32));
1691 jit_destroy_state();
1697 As said above, this is the first example of dynamically compiling
1698 branches. Branch instructions have two operands containing the
1699 values to be compared, and return a @code{jit_note_t *} object
1702 Because labels final address are only known after calling @code{emit},
1703 it is required to call @code{patch} or @code{patch_at}, what does
1704 tell @lightning{} that the target to patch is actually a pointer to
1705 a @code{jit_node_t *} object, otherwise, it would assume that is
1706 a pointer to a C function. Note that conditional branches do not
1707 receive a label argument, so they must be patched.
1709 You need to call @code{patch_at} on the return of value @code{calli},
1710 @code{finishi}, and @code{calli} if it is actually referencing a label
1711 in the jit code. All branch instructions do not receive a label
1712 argument. Note that @code{movi} is an special case, and patching it
1713 is usually done to get the final address of a label, usually to later
1716 Now, here is the iterative version:
1720 #include <lightning.h>
1722 static jit_state_t *_jit;
1724 typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1726 int main(int argc, char *argv[])
1729 jit_node_t *in; @rem{/* offset of the argument */}
1730 jit_node_t *ref; @rem{/* to patch the forward reference */}
1731 jit_node_t *zero; @rem{/* to patch the forward reference */}
1732 jit_node_t *jump; @rem{/* jump to start of loop */}
1733 jit_node_t *loop; @rem{/* start of the loop */}
1736 _jit = jit_new_state();
1740 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1741 zero = jit_beqi (JIT_R0, 0);
1742 jit_movr (JIT_R1, JIT_R0);
1743 jit_movi (JIT_R0, 1);
1744 ref = jit_blti (JIT_R1, 2);
1745 jit_subi (JIT_R2, JIT_R2, 2);
1746 jit_movr (JIT_R1, JIT_R0);
1749 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1750 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1751 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1752 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1753 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1754 jit_patch_at(jump, loop);
1756 jit_patch(ref); @rem{/* patch forward jump */}
1757 jit_patch(zero); @rem{/* patch forward jump */}
1760 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1763 printf("fib(%d) = %d\n", 36, fib(36));
1764 jit_destroy_state();
1770 This code calculates the recurrence relation using iteration (a
1771 @code{for} loop in high-level languages). There are no function
1772 calls anymore: instead, there is a backward jump (the @code{bnei} at
1773 the end of the loop).
1775 Note that the program must remember the address for backward jumps;
1776 for forward jumps it is only required to remember the jump code,
1777 and call @code{patch} for the implicit label.
1780 @chapter Re-entrant usage of @lightning{}
1782 @lightning{} uses the special @code{_jit} identifier. To be able
1783 to be able to use multiple jit generation states at the same
1784 time, it is required to used code similar to:
1787 struct jit_state lightning;
1788 #define lightning _jit
1791 This will cause the symbol defined to @code{_jit} to be passed as
1792 the first argument to the underlying @lightning{} implementation,
1793 that is usually a function with an @code{_} (underscode) prefix
1794 and with an argument named @code{_jit}, in the pattern:
1797 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1798 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1801 The reason for this is to use the same syntax as the initial lightning
1802 implementation and to avoid needing the user to keep adding an extra
1803 argument to every call, as multiple jit states generating code in
1804 paralell should be very uncommon.
1807 @chapter Accessing the whole register file
1809 As mentioned earlier in this chapter, all @lightning{} back-ends are
1810 guaranteed to have at least six general-purpose integer registers and
1811 six floating-point registers, but many back-ends will have more.
1813 To access the entire register files, you can use the
1814 @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1815 accept a parameter that identifies the register number, which
1816 must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1817 and @code{JIT_F_NUM} respectively; the number need not be
1818 constant. Of course, expressions like @code{JIT_R0} and
1819 @code{JIT_R(0)} denote the same register, and likewise for
1820 integer callee-saved, or floating-point, registers.
1822 @section Scratch registers
1824 For operations, @lightning{} does not support directly, like storing
1825 a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1826 acquire and release a scratch register as in the following pattern:
1829 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1830 jit_movi (reg, immediate);
1831 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1832 jit_unget_reg (reg);
1835 As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1836 reloads but don't follow branches, the code between both must be in
1837 the same basic block and must not contain any branches as in the
1838 following (bad) example.
1841 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1842 jit_ldxi (reg, JIT_V0, offset);
1843 jump = jit_bnei (reg, V0);
1844 jit_movr (JIT_V1, reg);
1846 jit_unget_reg (reg);
1849 @node Customizations
1850 @chapter Customizations
1852 Frequently it is desirable to have more control over how code is
1853 generated or how memory is used during jit generation or execution.
1855 @section Memory functions
1856 To aid in complete control of memory allocation and deallocation
1857 @lightning{} provides wrappers that default to standard @code{malloc},
1858 @code{realloc} and @code{free}. These are loosely based on the
1859 GNU GMP counterparts, with the difference that they use the same
1860 prototype of the system allocation functions, that is, no @code{size}
1861 for @code{free} or @code{old_size} for @code{realloc}.
1863 @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1864 @lightning{} guarantees that memory is only allocated or released
1865 using these wrapped functions, but you must note that if lightning
1866 was linked to GNU binutils, malloc is probably will be called multiple
1867 times from there when initializing the disassembler.
1869 Because @code{init_jit} may call memory functions, if you need to call
1870 @code{jit_set_memory_functions}, it must be called before @code{init_jit},
1871 otherwise, when calling @code{finish_jit}, a pointer allocated with the
1872 previous or default wrappers will be passed.
1875 @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1876 Get the current memory allocation function. Also, unlike the GNU GMP
1877 counterpart, it is an error to pass @code{NULL} pointers as arguments.
1881 Unless an alternate code buffer is used (see below), @code{jit_emit}
1882 set the access protections that the code buffer's memory can be read and
1883 executed, but not modified. One can use the following functions after
1884 @code{jit_emit} but before @code{jit_clear} to temporarily lift the
1887 @deftypefun void jit_unprotect ()
1888 Changes the access protection that the code buffer's memory can be read and
1889 modified. Before the emitted code can be invoked, @code{jit_protect}
1890 has to be called to reset the change.
1892 This procedure has no effect when an alternate code buffer (see below) is used.
1895 @deftypefun void jit_protect ()
1896 Changes the access protection that the code buffer's memory can be read and
1899 This procedure has no effect when an alternate code buffer (see below) is used.
1902 @section Alternate code buffer
1903 To instruct @lightning{} to use an alternate code buffer it is required
1904 to call @code{jit_realize} before @code{jit_emit}, and then query states
1905 and customize as appropriate.
1907 @deftypefun void jit_realize ()
1908 Must be called once, before @code{jit_emit}, to instruct @lightning{}
1909 that no other @code{jit_xyz} call will be made.
1912 @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1913 Returns NULL or the previous value set with @code{jit_set_code}, and
1914 sets the @var{code_size} argument to an appropriate value.
1915 If @code{jit_get_code} is called before @code{jit_emit}, the
1916 @var{code_size} argument is set to the expected amount of bytes
1917 required to generate code.
1918 If @code{jit_get_code} is called after @code{jit_emit}, the
1919 @var{code_size} argument is set to the exact amount of bytes used
1923 @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1924 Instructs @lightning{} to output to the @var{code} argument and
1925 use @var{size} as a guard to not write to invalid memory. If during
1926 @code{jit_emit} @lightning{} finds out that the code would not fit
1927 in @var{size} bytes, it halts code emit and returns @code{NULL}.
1930 A simple example of a loop using an alternate buffer is:
1934 int *(func)(int); @rem{/* function pointer */}
1935 jit_word_t code_size;
1936 jit_word_t real_code_size;
1938 jit_realize(); @rem{/* ready to generate code */}
1939 jit_get_code(&code_size); @rem{/* get expected code size */}
1940 code_size = (code_size + 4095) & -4096;
1942 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1943 MAP_PRIVATE | MAP_ANON, -1, 0);
1944 jit_set_code(code, code_size);
1945 if ((func = jit_emit()) == NULL) @{
1946 munmap(code, code_size);
1949 @} while (func == NULL);
1950 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1953 The first call to @code{jit_get_code} should return @code{NULL} and set
1954 the @code{code_size} argument to the expected amount of bytes required
1956 The second call to @code{jit_get_code} is after a successful call to
1957 @code{jit_emit}, and will return the value previously set with
1958 @code{jit_set_code} and set the @code{real_code_size} argument to the
1959 exact amount of bytes used to emit the code.
1961 @section Alternate data buffer
1962 Sometimes it may be desirable to customize how, or to prevent
1963 @lightning{} from using an extra buffer for constants or debug
1964 annotation. Usually when also using an alternate code buffer.
1966 @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1967 Returns @code{NULL} or the previous value set with @code{jit_set_data},
1968 and sets the @var{data_size} argument to how many bytes are required
1969 for the constants data buffer, and @var{note_size} to how many bytes
1970 are required to store the debug note information.
1971 Note that it always preallocate one debug note entry even if
1972 @code{jit_name} or @code{jit_note} are never called, but will return
1973 zero in the @var{data_size} argument if no constant is required;
1974 constants are only used for the @code{float} and @code{double} operations
1975 that have an immediate argument, and not in all @lightning{} ports.
1978 @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1980 @var{data} can be NULL if disabling constants and annotations, otherwise,
1981 a valid pointer must be passed. An assertion is done that the data will
1982 fit in @var{size} bytes (but that is a noop if @lightning{} was built
1983 with @code{-DNDEBUG}).
1985 @var{size} tells the space in bytes available in @var{data}.
1987 @var{flags} can be zero to tell to just use the alternate data buffer,
1988 or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1991 @item JIT_DISABLE_DATA
1992 @cindex JIT_DISABLE_DATA
1993 Instructs @lightning{} to not use a constant table, but to use an
1994 alternate method to synthesize those, usually with a larger code
1995 sequence using stack space to transfer the value from a GPR to a
1998 @item JIT_DISABLE_NOTE
1999 @cindex JIT_DISABLE_NOTE
2000 Instructs @lightning{} to not store file or function name, and
2001 line numbers in the constant buffer.
2005 A simple example of a preventing usage of a data buffer is:
2009 jit_realize(); @rem{/* ready to generate code */}
2010 jit_get_data(NULL, NULL);
2011 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
2015 Or to only use a data buffer, if required:
2019 jit_word_t data_size;
2021 jit_realize(); @rem{/* ready to generate code */}
2022 jit_get_data(&data_size, NULL);
2024 data = malloc(data_size);
2027 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
2034 @node Acknowledgements
2035 @chapter Acknowledgements
2037 As far as I know, the first general-purpose portable dynamic code
2038 generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
2039 Further work by Dawson R. Engler resulted in the @sc{vcode} system;
2040 unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
2041 directly inspired @lightning{}.
2043 Thanks go to Ian Piumarta, who kindly accepted to release his own
2044 program @sc{ccg} under the GNU General Public License, thereby allowing
2045 @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
2046 @sc{ccg} provides a way of dynamically assemble programs written in the
2047 underlying architecture's assembly language. So it is not portable,
2048 yet very interesting.
2050 I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
2051 was first developed as a tool to be used in GNU Smalltalk's dynamic
2052 translator from bytecodes to native code.