Makefile: Fix build on Windows and Android
[pcsx_rearmed.git] / deps / lightning / doc / body.texi
CommitLineData
4a71579b
PC
1@ifnottex
2@dircategory Software development
3@direntry
4* lightning: (lightning). Library for dynamic code generation.
5@end direntry
6@end ifnottex
7
8@ifnottex
9@node Top
10@top @lightning{}
11
12@iftex
13@macro comma
14@verbatim{|,|}
15@end macro
16@end iftex
17
18@ifnottex
19@macro comma
20@verb{|,|}
21@end macro
22@end ifnottex
23
24This document describes @value{TOPIC} the @lightning{} library for
25dynamic code generation.
26
27@menu
28* Overview:: What GNU lightning is
29* Installation:: Configuring and installing GNU lightning
30* The instruction set:: The RISC instruction set used in GNU lightning
31* GNU lightning examples:: GNU lightning's examples
32* Reentrancy:: Re-entrant usage of GNU lightning
519a9ea1 33* Registers:: Accessing the whole register file
4a71579b
PC
34* Customizations:: Advanced code generation customizations
35* Acknowledgements:: Acknowledgements for GNU lightning
36@end menu
37@end ifnottex
38
39@node Overview
40@chapter Introduction to @lightning{}
41
42@iftex
43This document describes @value{TOPIC} the @lightning{} library for
44dynamic code generation.
45@end iftex
46
519a9ea1
PC
47Dynamic code generation is the generation of machine code
48at runtime. It is typically used to strip a layer of interpretation
4a71579b
PC
49by allowing compilation to occur at runtime. One of the most
50well-known applications of dynamic code generation is perhaps that
51of interpreters that compile source code to an intermediate bytecode
52form, which is then recompiled to machine code at run-time: this
53approach effectively combines the portability of bytecode
54representations with the speed of machine code. Another common
55application of dynamic code generation is in the field of hardware
56simulators and binary emulators, which can use the same techniques
519a9ea1 57to translate simulated instructions to the instructions of the
4a71579b
PC
58underlying machine.
59
60Yet other applications come to mind: for example, windowing
61@dfn{bitblt} operations, matrix manipulations, and network packet
62filters. Albeit very powerful and relatively well known within the
63compiler community, dynamic code generation techniques are rarely
64exploited to their full potential and, with the exception of the
65two applications described above, have remained curiosities because
66of their portability and functionality barriers: binary instructions
67are generated, so programs using dynamic code generation must be
68retargeted for each machine; in addition, coding a run-time code
69generator is a tedious and error-prone task more than a difficult one.
70
71@lightning{} provides a portable, fast and easily retargetable dynamic
519a9ea1 72code generation system.
4a71579b
PC
73
74To be portable, @lightning{} abstracts over current architectures'
75quirks and unorthogonalities. The interface that it exposes to is that
76of a standardized RISC architecture loosely based on the SPARC and MIPS
77chips. There are a few general-purpose registers (six, not including
78those used to receive and pass parameters between subroutines), and
79arithmetic operations involve three operands---either three registers
80or two registers and an arbitrarily sized immediate value.
81
82On one hand, this architecture is general enough that it is possible to
83generate pretty efficient code even on CISC architectures such as the
84Intel x86 or the Motorola 68k families. On the other hand, it matches
85real architectures closely enough that, most of the time, the
86compiler's constant folding pass ends up generating code which
87assembles machine instructions without further tests.
88
89@node Installation
90@chapter Configuring and installing @lightning{}
91
92The first thing to do to use @lightning{} is to configure the
93program, picking the set of macros to be used on the host
94architecture; this configuration is automatically performed by
95the @file{configure} shell script; to run it, merely type:
96@example
97 ./configure
98@end example
99
100@lightning{} supports the @code{--enable-disassembler} option, that
101enables linking to GNU binutils and optionally print human readable
102disassembly of the jit code. This option can be disabled by the
103@code{--disable-disassembler} option.
104
105Another option that @file{configure} accepts is
106@code{--enable-assertions}, which enables several consistency checks in
107the run-time assemblers. These are not usually needed, so you can
108decide to simply forget about it; also remember that these consistency
109checks tend to slow down your code generator.
110
111After you've configured @lightning{}, run @file{make} as usual.
112
113@lightning{} has an extensive set of tests to validate it is working
114correctly in the build host. To test it run:
115@example
116 make check
117@end example
118
119The next important step is:
120@example
121 make install
122@end example
123
124This ends the process of installing @lightning{}.
125
126@node The instruction set
127@chapter @lightning{}'s instruction set
128
129@lightning{}'s instruction set was designed by deriving instructions
130that closely match those of most existing RISC architectures, or
131that can be easily syntesized if absent. Each instruction is composed
132of:
133@itemize @bullet
134@item
135an operation, like @code{sub} or @code{mul}
136
137@item
138most times, a register/immediate flag (@code{r} or @code{i})
139
140@item
141an unsigned modifier (@code{u}), a type identifier or two, when applicable.
142@end itemize
143
144Examples of legal mnemonics are @code{addr} (integer add, with three
145register operands) and @code{muli} (integer multiply, with two
146register operands and an immediate operand). Each instruction takes
147two or three operands; in most cases, one of them can be an immediate
148value instead of a register.
149
150Most @lightning{} integer operations are signed wordsize operations,
151with the exception of operations that convert types, or load or store
152values to/from memory. When applicable, the types and C types are as
153follow:
154
155@example
156 _c @r{signed char}
157 _uc @r{unsigned char}
158 _s @r{short}
159 _us @r{unsigned short}
160 _i @r{int}
161 _ui @r{unsigned int}
162 _l @r{long}
163 _f @r{float}
164 _d @r{double}
165@end example
166
167Most integer operations do not need a type modifier, and when loading or
168storing values to memory there is an alias to the proper operation
169using wordsize operands, that is, if ommited, the type is @r{int} on
17032-bit architectures and @r{long} on 64-bit architectures. Note
171that lightning also expects @code{sizeof(void*)} to match the wordsize.
172
173When an unsigned operation result differs from the equivalent signed
174operation, there is a the @code{_u} modifier.
175
176There are at least seven integer registers, of which six are
177general-purpose, while the last is used to contain the frame pointer
178(@code{FP}). The frame pointer can be used to allocate and access local
179variables on the stack, using the @code{allocai} or @code{allocar}
180instruction.
181
182Of the general-purpose registers, at least three are guaranteed to be
183preserved across function calls (@code{V0}, @code{V1} and
184@code{V2}) and at least three are not (@code{R0}, @code{R1} and
185@code{R2}). Six registers are not very much, but this
186restriction was forced by the need to target CISC architectures
187which, like the x86, are poor of registers; anyway, backends can
188specify the actual number of available registers with the calls
189@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
190(for callee-save registers).
191
192There are at least six floating-point registers, named @code{F0} to
193@code{F5}. These are usually caller-save and are separate from the integer
194registers on the supported architectures; on Intel architectures,
195in 32 bit mode if SSE2 is not available or use of X87 is forced,
196the register stack is mapped to a flat register file. As for the
197integer registers, the macro @code{JIT_F_NUM} yields the number of
198floating-point registers.
199
200The complete instruction set follows; as you can see, most non-memory
201operations only take integers (either signed or unsigned) as operands;
202this was done in order to reduce the instruction set, and because most
203architectures only provide word and long word operations on registers.
204There are instructions that allow operands to be extended to fit a larger
205data type, both in a signed and in an unsigned way.
206
207@table @b
208@item Binary ALU operations
209These accept three operands; the last one can be an immediate.
210@code{addx} operations must directly follow @code{addc}, and
211@code{subx} must follow @code{subc}; otherwise, results are undefined.
212Most, if not all, architectures do not support @r{float} or @r{double}
213immediate operands; lightning emulates those operations by moving the
214immediate to a temporary register and emiting the call with only
215register operands.
216@example
217addr _f _d O1 = O2 + O3
218addi _f _d O1 = O2 + O3
219addxr O1 = O2 + (O3 + carry)
220addxi O1 = O2 + (O3 + carry)
221addcr O1 = O2 + O3, set carry
222addci O1 = O2 + O3, set carry
223subr _f _d O1 = O2 - O3
224subi _f _d O1 = O2 - O3
225subxr O1 = O2 - (O3 + carry)
226subxi O1 = O2 - (O3 + carry)
227subcr O1 = O2 - O3, set carry
228subci O1 = O2 - O3, set carry
229rsbr _f _d O1 = O3 - O1
230rsbi _f _d O1 = O3 - O1
231mulr _f _d O1 = O2 * O3
232muli _f _d O1 = O2 * O3
233divr _u _f _d O1 = O2 / O3
234divi _u _f _d O1 = O2 / O3
235remr _u O1 = O2 % O3
236remi _u O1 = O2 % O3
237andr O1 = O2 & O3
238andi O1 = O2 & O3
239orr O1 = O2 | O3
240ori O1 = O2 | O3
241xorr O1 = O2 ^ O3
242xori O1 = O2 ^ O3
243lshr O1 = O2 << O3
244lshi O1 = O2 << O3
245rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
246rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
1f22b268
PC
247movzr O1 = O3 ? O1 : O2
248movnr O1 = O3 ? O2 : O1
4a71579b
PC
249@end example
250
251@item Four operand binary ALU operations
252These accept two result registers, and two operands; the last one can
253be an immediate. The first two arguments cannot be the same register.
254
255@code{qmul} stores the low word of the result in @code{O1} and the
256high word in @code{O2}. For unsigned multiplication, @code{O2} zero
257means there was no overflow. For signed multiplication, no overflow
258check is based on sign, and can be detected if @code{O2} is zero or
259minus one.
260
261@code{qdiv} stores the quotient in @code{O1} and the remainder in
262@code{O2}. It can be used as quick way to check if a division is
263exact, in which case the remainder is zero.
264
265@example
266qmulr _u O1 O2 = O3 * O4
267qmuli _u O1 O2 = O3 * O4
268qdivr _u O1 O2 = O3 / O4
269qdivi _u O1 O2 = O3 / O4
270@end example
271
272@item Unary ALU operations
273These accept two operands, both of which must be registers.
274@example
275negr _f _d O1 = -O2
276comr O1 = ~O2
277@end example
278
279These unary ALU operations are only defined for float operands.
280@example
281absr _f _d O1 = fabs(O2)
282sqrtr O1 = sqrt(O2)
283@end example
284
285Besides requiring the @code{r} modifier, there are no unary operations
286with an immediate operand.
287
288@item Compare instructions
289These accept three operands; again, the last can be an immediate.
290The last two operands are compared, and the first operand, that must be
291an integer register, is set to either 0 or 1, according to whether the
292given condition was met or not.
293
294The conditions given below are for the standard behavior of C,
295where the ``unordered'' comparison result is mapped to false.
296
297@example
298ltr _u _f _d O1 = (O2 < O3)
299lti _u _f _d O1 = (O2 < O3)
300ler _u _f _d O1 = (O2 <= O3)
301lei _u _f _d O1 = (O2 <= O3)
302gtr _u _f _d O1 = (O2 > O3)
303gti _u _f _d O1 = (O2 > O3)
304ger _u _f _d O1 = (O2 >= O3)
305gei _u _f _d O1 = (O2 >= O3)
306eqr _f _d O1 = (O2 == O3)
307eqi _f _d O1 = (O2 == O3)
308ner _f _d O1 = (O2 != O3)
309nei _f _d O1 = (O2 != O3)
310unltr _f _d O1 = !(O2 >= O3)
311unler _f _d O1 = !(O2 > O3)
312ungtr _f _d O1 = !(O2 <= O3)
313unger _f _d O1 = !(O2 < O3)
314uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
315ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
316ordr _f _d O1 = (O2 == O2) && (O3 == O3)
317unordr _f _d O1 = (O2 != O2) || (O3 != O3)
318@end example
319
320@item Transfer operations
321These accept two operands; for @code{ext} both of them must be
322registers, while @code{mov} accepts an immediate value as the second
323operand.
324
325Unlike @code{movr} and @code{movi}, the other instructions are used
326to truncate a wordsize operand to a smaller integer data type or to
327convert float data types. You can also use @code{extr} to convert an
328integer to a floating point value: the usual options are @code{extr_f}
329and @code{extr_d}.
330
331@example
332movr _f _d O1 = O2
333movi _f _d O1 = O2
334extr _c _uc _s _us _i _ui _f _d O1 = O2
335truncr _f _d O1 = trunc(O2)
336@end example
337
338In 64-bit architectures it may be required to use @code{truncr_f_i},
339@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
340the equivalent C code. Only the @code{_i} modifier is available in
34132-bit architectures.
342
343@example
344truncr_f_i = <int> O1 = <float> O2
345truncr_f_l = <long>O1 = <float> O2
346truncr_d_i = <int> O1 = <double>O2
347truncr_d_l = <long>O1 = <double>O2
348@end example
349
350The float conversion operations are @emph{destination first,
351source second}, but the order of the types is reversed. This happens
352for historical reasons.
353
354@example
355extr_f_d = <double>O1 = <float> O2
356extr_d_f = <float> O1 = <double>O2
357@end example
358
359@item Network extensions
360These accept two operands, both of which must be registers; these
361two instructions actually perform the same task, yet they are
362assigned to two mnemonics for the sake of convenience and
363completeness. As usual, the first operand is the destination and
364the second is the source.
365The @code{_ul} variant is only available in 64-bit architectures.
366@example
367htonr _us _ui _ul @r{Host-to-network (big endian) order}
368ntohr _us _ui _ul @r{Network-to-host order }
369@end example
370
371@item Load operations
372@code{ld} accepts two operands while @code{ldx} accepts three;
373in both cases, the last can be either a register or an immediate
374value. Values are extended (with or without sign, according to
375the data type specification) to fit a whole register.
376The @code{_ui} and @code{_l} types are only available in 64-bit
377architectures. For convenience, there is a version without a
378type modifier for integer or pointer operands that uses the
379appropriate wordsize call.
380@example
381ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
382ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
383ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
384ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
385@end example
386
387@item Store operations
388@code{st} accepts two operands while @code{stx} accepts three; in
389both cases, the first can be either a register or an immediate
390value. Values are sign-extended to fit a whole register.
391@example
392str _c _uc _s _us _i _ui _l _f _d *O1 = O2
393sti _c _uc _s _us _i _ui _l _f _d *O1 = O2
394stxr _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
395stxi _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
396@end example
397As for the load operations, the @code{_ui} and @code{_l} types are
398only available in 64-bit architectures, and for convenience, there
399is a version without a type modifier for integer or pointer operands
400that uses the appropriate wordsize call.
401
402@item Argument management
403These are:
404@example
405prepare (not specified)
406va_start (not specified)
407pushargr _f _d
408pushargi _f _d
409va_push (not specified)
410arg _f _d
411getarg _c _uc _s _us _i _ui _l _f _d
412va_arg _d
413putargr _f _d
414putargi _f _d
415ret (not specified)
416retr _f _d
417reti _f _d
418va_end (not specified)
419retval _c _uc _s _us _i _ui _l _f _d
420epilog (not specified)
421@end example
422As with other operations that use a type modifier, the @code{_ui} and
423@code{_l} types are only available in 64-bit architectures, but there
424are operations without a type modifier that alias to the appropriate
425integer operation with wordsize operands.
426
427@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
428while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
429A code snippet that wants to call another procedure and has to pass
430arguments must, in order: use the @code{prepare} instruction and use
431the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
432left to right order}; and use @code{finish} or @code{call} (explained below)
433to perform the actual call.
434
435@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
436arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
437@code{va_push} is required when passing a @code{va_list} to another function,
438because not all architectures expect it as a single pointer. Known case
439is DEC Alpha, that requires it as a structure passed by value.
440
441@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
442@code{arg} is different from other instruction in that it does not
443actually generate any code: instead, it is a function which returns
444a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
445a value'' means that @lightning{} code that compile these
446instructions return a value when expanded.} You should call
447@code{arg} as soon as possible, before any function call or, more
448easily, right after the @code{prolog} instructions
449(which is treated later).
450
451@code{getarg} accepts a register argument and a value returned by
452@code{arg}, and will move that argument to the register, extending
453it (with or without sign, according to the data type specification)
454to fit a whole register. These instructions are more intimately
455related to the usage of the @lightning{} instruction set in code
456that generates other code, so they will be treated more
457specifically in @ref{GNU lightning examples, , Generating code at
458run-time}.
459
460@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
461it accepts as first argument a register or immediate, and as
462second argument a value returned by @code{arg}. It allows changing,
463or restoring an argument to the current function, and is a
464construct required to implement tail call optimization. Note that
465arguments in registers are very cheap, but will be overwritten
466at any moment, including on some operations, for example division,
467that on several ports is implemented as a function call.
468
469Finally, the @code{retval} instruction fetches the return value of a
470called function in a register. The @code{retval} instruction takes a
471register argument and copies the return value of the previously called
472function in that register. A function with a return value should use
473@code{retr} or @code{reti} to put the return value in the return register
474before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
475
476@code{epilog} is an optional call, that marks the end of a function
477body. It is automatically generated by @lightning{} if starting a new
478function (what should be done after a @code{ret} call) or finishing
479generating jit.
480It is very important to note that the fact that @code{epilog} being
481optional may cause a common mistake. Consider this:
482@example
483fun1:
484 prolog
485 ...
486 ret
487fun2:
488 prolog
489@end example
490Because @code{epilog} is added when finding a new @code{prolog},
491this will cause the @code{fun2} label to actually be before the
492return from @code{fun1}. Because @lightning{} will actually
493understand it as:
494@example
495fun1:
496 prolog
497 ...
498 ret
499fun2:
500 epilog
501 prolog
502@end example
503
504You should observe a few rules when using these macros. First of
505all, if calling a varargs function, you should use the @code{ellipsis}
506call to mark the position of the ellipsis in the C prototype.
507
508You should not nest calls to @code{prepare} inside a
509@code{prepare/finish} block. Doing this will result in undefined
510behavior. Note that for functions with zero arguments you can use
511just @code{call}.
512
513@item Branch instructions
514Like @code{arg}, these also return a value which, in this case,
515is to be used to compile forward branches as explained in
516@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
517compared; of these, the last can be either a register or an immediate.
518They are:
519@example
520bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
521blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
522bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
523blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
524bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
525bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
526bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
527bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
528beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
529beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
530bner _f _d @r{if }(O2 != O3)@r{ goto }O1
531bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
532
533bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
534bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
535bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
536bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
537buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
538bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
539bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
540bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
541
542bmsr @r{if }O2 & O3@r{ goto }O1
543bmsi @r{if }O2 & O3@r{ goto }O1
544bmcr @r{if }!(O2 & O3)@r{ goto }O1
545bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
546boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
547boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
548bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
549bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
550bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
551bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
552bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
553bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
554@end example
555
556@item Jump and return operations
557These accept one argument except @code{ret} and @code{jmpi} which
558have none; the difference between @code{finishi} and @code{calli}
559is that the latter does not clean the stack from pushed parameters
560(if any) and the former must @strong{always} follow a @code{prepare}
561instruction.
562@example
563callr (not specified) @r{function call to register O1}
564calli (not specified) @r{function call to immediate O1}
565finishr (not specified) @r{function call to register O1}
566finishi (not specified) @r{function call to immediate O1}
567jmpr (not specified) @r{unconditional jump to register}
568jmpi (not specified) @r{unconditional jump}
569ret (not specified) @r{return from subroutine}
570retr _c _uc _s _us _i _ui _l _f _d
571reti _c _uc _s _us _i _ui _l _f _d
572retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
573 @r{to register}
574@end example
575
576Like branch instruction, @code{jmpi} also returns a value which is to
577be used to compile forward branches. @xref{Fibonacci, , Fibonacci
578numbers}.
579
580@item Labels
581There are 3 @lightning{} instructions to create labels:
582@example
583label (not specified) @r{simple label}
584forward (not specified) @r{forward label}
585indirect (not specified) @r{special simple label}
586@end example
587
588@code{label} is normally used as @code{patch_at} argument for backward
589jumps.
590
591@example
592 jit_node_t *jump, *label;
593label = jit_label();
594 ...
595 jump = jit_beqr(JIT_R0, JIT_R1);
596 jit_patch_at(jump, label);
597@end example
598
599@code{forward} is used to patch code generation before the actual
600position of the label is known.
601
602@example
603 jit_node_t *jump, *label;
604label = jit_forward();
605 jump = jit_beqr(JIT_R0, JIT_R1);
606 jit_patch_at(jump, label);
607 ...
608 jit_link(label);
609@end example
610
611@code{indirect} is useful when creating jump tables, and tells
612@lightning{} to not optimize out a label that is not the target of
613any jump, because an indirect jump may land where it is defined.
614
615@example
616 jit_node_t *jump, *label;
617 ...
618 jmpr(JIT_R0); @rem{/* may jump to label */}
619 ...
620label = jit_indirect();
621@end example
622
623@code{indirect} is an special case of @code{note} and @code{name}
624because it is a valid argument to @code{address}.
625
626Note that the usual idiom to write the previous example is
627@example
628 jit_node_t *addr, *jump;
629addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
630 ...
631 jmpr(JIT_R0);
632 ...
633 jit_patch(addr); @rem{/* implicit label added */}
634@end example
635
636that automatically binds the implicit label added by @code{patch} with
637the @code{movi}, but on some special conditions it is required to create
638an "unbound" label.
639
640@item Function prolog
641
642These macros are used to set up a function prolog. The @code{allocai}
643call accept a single integer argument and returns an offset value
644for stack storage access. The @code{allocar} accepts two registers
645arguments, the first is set to the offset for stack access, and the
646second is the size in bytes argument.
647
648@example
649prolog (not specified) @r{function prolog}
650allocai (not specified) @r{reserve space on the stack}
651allocar (not specified) @r{allocate space on the stack}
652@end example
653
654@code{allocai} receives the number of bytes to allocate and returns
655the offset from the frame pointer register @code{FP} to the base of
656the area.
657
658@code{allocar} receives two register arguments. The first is where
659to store the offset from the frame pointer register @code{FP} to the
660base of the area. The second argument is the size in bytes. Note
661that @code{allocar} is dynamic allocation, and special attention
662should be taken when using it. If called in a loop, every iteration
663will allocate stack space. Stack space is aligned from 8 to 64 bytes
664depending on backend requirements, even if allocating only one byte.
665It is advisable to not use it with @code{frame} and @code{tramp}; it
666should work with @code{frame} with special care to call only once,
667but is not supported if used in @code{tramp}, even if called only
668once.
669
670As a small appetizer, here is a small function that adds 1 to the input
671parameter (an @code{int}). I'm using an assembly-like syntax here which
672is a bit different from the one used when writing real subroutines with
673@lightning{}; the real syntax will be introduced in @xref{GNU lightning
674examples, , Generating code at run-time}.
675
676@example
677incr:
678 prolog
679in = arg @rem{! We have an integer argument}
680 getarg R0, in @rem{! Move it to R0}
681 addi R0, R0, 1 @rem{! Add 1}
682 retr R0 @rem{! And return the result}
683@end example
684
685And here is another function which uses the @code{printf} function from
686the standard C library to write a number in hexadecimal notation:
687
688@example
689printhex:
690 prolog
691in = arg @rem{! Same as above}
692 getarg R0, in
693 prepare @rem{! Begin call sequence for printf}
694 pushargi "%x" @rem{! Push format string}
695 ellipsis @rem{! Varargs start here}
696 pushargr R0 @rem{! Push second argument}
697 finishi printf @rem{! Call printf}
698 ret @rem{! Return to caller}
699@end example
700
519a9ea1
PC
701@item Register liveness
702
703During code generation, @lightning{} occasionally needs scratch registers
704or needs to use architecture-defined registers. For that, @lightning{}
705internally maintains register liveness information.
706
707In the following example, @code{qdivr} will need special registers like
708@code{R0} on some architectures. As @lightning{} understands that
709@code{R0} is used in the subsequent instruction, it will create
710save/restore code for @code{R0} in case.
711
712@example
713...
714qdivr V0, V1, V2, V3
715movr V3, R0
716...
717@end example
718
719The same is not true in the example that follows. Here, @code{R0} is
720not alive after the division operation because @code{R0} is neither an
721argument register nor a callee-save register. Thus, no save/restore
722code for @code{R0} will be created in case.
723
724@example
725...
726qdivr V0, V1, V2, V3
727jmpr R1
728...
729@end example
730
731The @code{live} instruction can be used to mark a register as live after
732it as in the following example. Here, @code{R0} will be preserved
733across the division.
734
735@example
736...
737qdivr V0, V1, V2, V3
738live R0
739jmpr R1
740...
741@end example
742
743The @code{live} instruction is useful at code entry and exit points,
744like after and before a @code{callr} instruction.
745
4a71579b
PC
746@item Trampolines, continuations and tail call optimization
747
748Frequently it is required to generate jit code that must jump to
749code generated later, possibly from another @code{jit_context_t}.
750These require compatible stack frames.
751
752@lightning{} provides two primitives from where trampolines,
753continuations and tail call optimization can be implemented.
754
755@example
756frame (not specified) @r{create stack frame}
757tramp (not specified) @r{assume stack frame}
758@end example
759
760@code{frame} receives an integer argument@footnote{It is not
761automatically computed because it does not know about the
762requirement of later generated code.} that defines the size in
763bytes for the stack frame of the current, @code{C} callable,
764jit function. To calculate this value, a good formula is maximum
765number of arguments to any called native function times
766eight@footnote{Times eight so that it works for double arguments.
767And would not need conditionals for ports that pass arguments in
768the stack.}, plus the sum of the arguments to any call to
769@code{jit_allocai}. @lightning{} automatically adjusts this value
770for any backend specific stack memory it may need, or any
771alignment constraint.
772
773@code{frame} also instructs @lightning{} to save all callee
774save registers in the prolog and reload in the epilog.
775
776@example
777main: @rem{! jit entry point}
778 prolog @rem{! function prolog}
779 frame 256 @rem{! save all callee save registers and}
780 @rem{! reserve at least 256 bytes in stack}
781main_loop:
782 ...
783 jmpi handler @rem{! jumps to external code}
784 ...
785 ret @rem{! return to the caller}
786@end example
787
788@code{tramp} differs from @code{frame} only that a prolog and epilog
789will not be generated. Note that @code{prolog} must still be used.
790The code under @code{tramp} must be ready to be entered with a jump
791at the prolog position, and instead of a return, it must end with
792a non conditional jump. @code{tramp} exists solely for the fact
793that it allows optimizing out prolog and epilog code that would
794never be executed.
795
796@example
797handler: @rem{! handler entry point}
798 prolog @rem{! function prolog}
799 tramp 256 @rem{! assumes all callee save registers}
800 @rem{! are saved and there is at least}
801 @rem{! 256 bytes in stack}
802 ...
803 jmpi main_loop @rem{! return to the main loop}
804@end example
805
806@lightning{} only supports Tail Call Optimization using the
807@code{tramp} construct. Any other way is not guaranteed to
808work on all ports.
809
810An example of a simple (recursive) tail call optimization:
811
812@example
813factorial: @rem{! Entry point of the factorial function}
814 prolog
815in = arg @rem{! Receive an integer argument}
816 getarg R0, in @rem{! Move argument to RO}
817 prepare
818 pushargi 1 @rem{! This is the accumulator}
819 pushargr R0 @rem{! This is the argument}
820 finishi fact @rem{! Call the tail call optimized function}
821 retval R0 @rem{! Fetch the result}
822 retr R0 @rem{! Return it}
823 epilog @rem{! Epilog *before* label before prolog}
824
825fact: @rem{! Entry point of the helper function}
826 prolog
827 frame 16 @rem{! Reserve 16 bytes in the stack}
828fact_entry: @rem{! This is the tail call entry point}
829ac = arg @rem{! The accumulator is the first argument}
830in = arg @rem{! The factorial argument}
831 getarg R0, ac @rem{! Move the accumulator to R0}
832 getarg R1, in @rem{! Move the argument to R1}
833 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
834 mulr R0, R0, R1 @rem{! accumulator *= argument}
835 putargr R0, ac @rem{! Update the accumulator}
836 subi R1, R1, 1 @rem{! argument -= 1}
837 putargr R1, in @rem{! Update the argument}
838 jmpi fact_entry @rem{! Tail Call Optimize it!}
839fact_out:
840 retr R0 @rem{! Return the accumulator}
841@end example
842
843@item Predicates
844@example
845forward_p (not specified) @r{forward label predicate}
846indirect_p (not specified) @r{indirect label predicate}
847target_p (not specified) @r{used label predicate}
848arg_register_p (not specified) @r{argument kind predicate}
849callee_save_p (not specified) @r{callee save predicate}
850pointer_p (not specified) @r{pointer predicate}
851@end example
852
853@code{forward_p} expects a @code{jit_node_t*} argument, and
854returns non zero if it is a forward label reference, that is,
855a label returned by @code{forward}, that still needs a
856@code{link} call.
857
858@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
859non zero if it is an indirect label reference, that is, a label that
860was returned by @code{indirect}.
861
862@code{target_p} expects a @code{jit_node_t*} argument, that is any
863kind of label, and will return non zero if there is at least one
864jump or move referencing it.
865
866@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
867have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
868will return non zero if the argument lives in a register. This call
869is useful to know the live range of register arguments, as those
870are very fast to read and write, but have volatile values.
871
872@code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
873@code{JIT_Fn}, and will return non zero if the register is callee
874save. This call is useful because on several ports, the @code{JIT_Rn}
875and @code{JIT_Fn} registers are actually callee save; no need
876to save and load the values when making function calls.
877
878@code{pointer_p} expects a pointer argument, and will return non
879zero if the pointer is inside the generated jit code. Must be
880called after @code{jit_emit} and before @code{jit_destroy_state}.
881@end table
882
883@node GNU lightning examples
884@chapter Generating code at run-time
885
886To use @lightning{}, you should include the @file{lightning.h} file that
887is put in your include directory by the @samp{make install} command.
888
889Each of the instructions above translates to a macro or function call.
890All you have to do is prepend @code{jit_} (lowercase) to opcode names
891and @code{JIT_} (uppercase) to register names. Of course, parameters
892are to be put between parentheses.
893
894This small tutorial presents three examples:
895
896@iftex
897@itemize @bullet
898@item
899The @code{incr} function found in @ref{The instruction set, ,
900@lightning{}'s instruction set}:
901
902@item
903A simple function call to @code{printf}
904
905@item
906An RPN calculator.
907
908@item
909Fibonacci numbers
910@end itemize
911@end iftex
912@ifnottex
913@menu
914* incr:: A function which increments a number by one
915* printf:: A simple function call to printf
916* RPN calculator:: A more complex example, an RPN calculator
917* Fibonacci:: Calculating Fibonacci numbers
918@end menu
919@end ifnottex
920
921@node incr
922@section A function which increments a number by one
923
924Let's see how to create and use the sample @code{incr} function created
925in @ref{The instruction set, , @lightning{}'s instruction set}:
926
927@example
928#include <stdio.h>
929#include <lightning.h>
930
931static jit_state_t *_jit;
932
933typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
934
935int main(int argc, char *argv[])
936@{
937 jit_node_t *in;
938 pifi incr;
939
940 init_jit(argv[0]);
941 _jit = jit_new_state();
942
943 jit_prolog(); @rem{/* @t{ prolog } */}
944 in = jit_arg(); @rem{/* @t{ in = arg } */}
945 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
946 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
947 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
948
949 incr = jit_emit();
950 jit_clear_state();
951
952 @rem{/* call the generated code@comma{} passing 5 as an argument */}
953 printf("%d + 1 = %d\n", 5, incr(5));
954
955 jit_destroy_state();
956 finish_jit();
957 return 0;
958@}
959@end example
960
961Let's examine the code line by line (well, almost@dots{}):
962
963@table @t
964@item #include <lightning.h>
965You already know about this. It defines all of @lightning{}'s macros.
966
967@item static jit_state_t *_jit;
968You might wonder about what is @code{jit_state_t}. It is a structure
969that stores jit code generation information. The name @code{_jit} is
970special, because since multiple jit generators can run at the same
971time, you must either @r{#define _jit my_jit_state} or name it
972@code{_jit}.
973
974@item typedef int (*pifi)(int);
975Just a handy typedef for a pointer to a function that takes an
976@code{int} and returns another.
977
978@item jit_node_t *in;
979Declares a variable to hold an identifier for a function argument. It
980is an opaque pointer, that will hold the return of a call to @code{arg}
981and be used as argument to @code{getarg}.
982
983@item pifi incr;
984Declares a function pointer variable to a function that receives an
985@code{int} and returns an @code{int}.
986
987@item init_jit(argv[0]);
988You must call this function before creating a @code{jit_state_t}
989object. This function does global state initialization, and may need
990to detect CPU or Operating System features. It receives a string
991argument that is later used to read symbols from a shared object using
992GNU binutils if disassembly was enabled at configure time. If no
993disassembly will be performed a NULL pointer can be used as argument.
994
995@item _jit = jit_new_state();
996This call initializes a @lightning{} jit state.
997
998@item jit_prolog();
999Ok, so we start generating code for our beloved function@dots{}
1000
1001@item in = jit_arg();
1002@itemx jit_getarg(JIT_R0, in);
1003We retrieve the first (and only) argument, an integer, and store it
1004into the general-purpose register @code{R0}.
1005
1006@item jit_addi(JIT_R0, JIT_R0, 1);
1007We add one to the content of the register.
1008
1009@item jit_retr(JIT_R0);
1010This instruction generates a standard function epilog that returns
1011the contents of the @code{R0} register.
1012
1013@item incr = jit_emit();
1014This instruction is very important. It actually translates the
1015@lightning{} macros used before to machine code, flushes the generated
1016code area out of the processor's instruction cache and return a
1017pointer to the start of the code.
1018
1019@item jit_clear_state();
1020This call cleanups any data not required for jit execution. Note
1021that it must be called after any call to @code{jit_print} or
1022@code{jit_address}, as this call destroy the @lightning{}
1023intermediate representation.
1024
1025@item printf("%d + 1 = %d", 5, incr(5));
1026Calling our function is this simple---it is not distinguishable from
1027a normal C function call, the only difference being that @code{incr}
1028is a variable.
1029
1030@item jit_destroy_state();
1031Releases all memory associated with the jit context. It should be
1032called after known the jit will no longer be called.
1033
1034@item finish_jit();
1035This call cleanups any global state hold by @lightning{}, and is
1036advisable to call it once jit code will no longer be generated.
1037@end table
1038
1039@lightning{} abstracts two phases of dynamic code generation: selecting
1040instructions that map the standard representation, and emitting binary
1041code for these instructions. The client program has the responsibility
1042of describing the code to be generated using the standard @lightning{}
1043instruction set.
1044
1045Let's examine the code generated for @code{incr} on the SPARC and x86_64
1046architecture (on the right is the code that an assembly-language
1047programmer would write):
1048
1049@table @b
1050@item SPARC
1051@example
1052 save %sp, -112, %sp
1053 mov %i0, %g2 retl
1054 inc %g2 inc %o0
1055 mov %g2, %i0
519a9ea1
PC
1056 restore
1057 retl
1058 nop
4a71579b
PC
1059@end example
1060In this case, @lightning{} introduces overhead to create a register
1061window (not knowing that the procedure is a leaf procedure) and to
1062move the argument to the general purpose register @code{R0} (which
1063maps to @code{%g2} on the SPARC).
1064@end table
1065
1066@table @b
1067@item x86_64
1068@example
1069 sub $0x30,%rsp
1070 mov %rbp,(%rsp)
1071 mov %rsp,%rbp
1072 sub $0x18,%rsp
1073 mov %rdi,%rax mov %rdi, %rax
1074 add $0x1,%rax inc %rax
1075 mov %rbp,%rsp
1076 mov (%rsp),%rbp
1077 add $0x30,%rsp
1078 retq retq
1079@end example
1080In this case, the main overhead is due to the function's prolog and
1081epilog, and stack alignment after reserving stack space for word
1082to/from float conversions or moving data from/to x87 to/from SSE.
1083Note that besides allocating space to save callee saved registers,
1084no registers are saved/restored because @lightning{} notices those
1085registers are not modified. There is currently no logic to detect
1086if it needs to allocate stack space for type conversions neither
1087proper leaf function detection, but these are subject to change
1088(FIXME).
1089@end table
1090
1091@node printf
1092@section A simple function call to @code{printf}
1093
1094Again, here is the code for the example:
1095
1096@example
1097#include <stdio.h>
1098#include <lightning.h>
1099
1100static jit_state_t *_jit;
1101
1102typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1103
1104int main(int argc, char *argv[])
1105@{
1106 pvfi myFunction; @rem{/* ptr to generated code */}
1107 jit_node_t *start, *end; @rem{/* a couple of labels */}
1108 jit_node_t *in; @rem{/* to get the argument */}
1109
1110 init_jit(argv[0]);
1111 _jit = jit_new_state();
1112
1113 start = jit_note(__FILE__, __LINE__);
1114 jit_prolog();
1115 in = jit_arg();
1116 jit_getarg(JIT_R1, in);
1117 jit_prepare();
1118 jit_pushargi((jit_word_t)"generated %d bytes\n");
1119 jit_ellipsis();
1120 jit_pushargr(JIT_R1);
1121 jit_finishi(printf);
1122 jit_ret();
1123 jit_epilog();
1124 end = jit_note(__FILE__, __LINE__);
1125
1126 myFunction = jit_emit();
1127
1128 @rem{/* call the generated code@comma{} passing its size as argument */}
1129 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1130 jit_clear_state();
1131
1132 jit_disassemble();
1133
1134 jit_destroy_state();
1135 finish_jit();
1136 return 0;
1137@}
1138@end example
1139
1140The function shows how many bytes were generated. Most of the code
1141is not very interesting, as it resembles very closely the program
1142presented in @ref{incr, , A function which increments a number by one}.
1143
1144For this reason, we're going to concentrate on just a few statements.
1145
1146@table @t
1147@item start = jit_note(__FILE__, __LINE__);
1148@itemx @r{@dots{}}
1149@itemx end = jit_note(__FILE__, __LINE__);
1150These two instruction call the @code{jit_note} macro, which creates
1151a note in the jit code; arguments to @code{jit_note} usually are a
1152filename string and line number integer, but using NULL for the
1153string argument is perfectly valid if only need to create a simple
1154marker in the code.
1155
1156@item jit_ellipsis();
1157@code{ellipsis} usually is only required if calling varargs functions
1158with double arguments, but it is a good practice to properly describe
1159the @r{@dots{}} in the call sequence.
1160
1161@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1162Note the use of the @code{(jit_word_t)} cast, that is used only
1163to avoid a compiler warning, due to using a pointer where a
1164wordsize integer type was expected.
1165
1166@item jit_prepare();
1167@itemx @r{@dots{}}
1168@itemx jit_finishi(printf);
1169Once the arguments to @code{printf} have been pushed, what means
1170moving them to stack or register arguments, the @code{printf}
1171function is called and the stack cleaned. Note how @lightning{}
1172abstracts the differences between different architectures and
1173ABI's -- the client program does not know how parameter passing
1174works on the host architecture.
1175
1176@item jit_epilog();
1177Usually it is not required to call @code{epilog}, but because it
1178is implicitly called when noticing the end of a function, if the
1179@code{end} variable was set with a @code{note} call after the
1180@code{ret}, it would not consider the function epilog.
1181
1182@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1183This calls the generate jit function passing as argument the offset
1184difference from the @code{start} and @code{end} notes. The @code{address}
1185call must be done after the @code{emit} call or either a fatal error
1186will happen (if @lightning{} is built with assertions enable) or an
1187undefined value will be returned.
1188
1189@item jit_clear_state();
1190Note that @code{jit_clear_state} was called after executing jit in
1191this example. It was done because it must be called after any call
1192to @code{jit_address} or @code{jit_print}.
1193
1194@item jit_disassemble();
1195@code{disassemble} will dump the generated code to standard output,
1196unless @lightning{} was built with the disassembler disabled, in which
1197case no output will be shown.
1198@end table
1199
1200@node RPN calculator
1201@section A more complex example, an RPN calculator
1202
1203We create a small stack-based RPN calculator which applies a series
1204of operators to a given parameter and to other numeric operands.
1205Unlike previous examples, the code generator is fully parameterized
1206and is able to compile different formulas to different functions.
1207Here is the code for the expression compiler; a sample usage will
1208follow.
1209
1210Since @lightning{} does not provide push/pop instruction, this
1211example uses a stack-allocated area to store the data. Such an
1212area can be allocated using the macro @code{allocai}, which
1213receives the number of bytes to allocate and returns the offset
1214from the frame pointer register @code{FP} to the base of the
1215area.
1216
1217Usually, you will use the @code{ldxi} and @code{stxi} instruction
1218to access stack-allocated variables. However, it is possible to
1219use operations such as @code{add} to compute the address of the
1220variables, and pass the address around.
1221
1222@example
1223#include <stdio.h>
1224#include <lightning.h>
1225
1226typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1227
1228static jit_state_t *_jit;
1229
1230void stack_push(int reg, int *sp)
1231@{
1232 jit_stxi_i (*sp, JIT_FP, reg);
1233 *sp += sizeof (int);
1234@}
1235
1236void stack_pop(int reg, int *sp)
1237@{
1238 *sp -= sizeof (int);
1239 jit_ldxi_i (reg, JIT_FP, *sp);
1240@}
1241
1242jit_node_t *compile_rpn(char *expr)
1243@{
1244 jit_node_t *in, *fn;
1245 int stack_base, stack_ptr;
1246
1247 fn = jit_note(NULL, 0);
1248 jit_prolog();
1249 in = jit_arg();
1250 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1251
1252 jit_getarg_i(JIT_R2, in);
1253
1254 while (*expr) @{
1255 char buf[32];
1256 int n;
1257 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1258 expr += n - 1;
1259 stack_push(JIT_R0, &stack_ptr);
1260 jit_movi(JIT_R0, atoi(buf));
1261 @} else if (*expr == 'x') @{
1262 stack_push(JIT_R0, &stack_ptr);
1263 jit_movr(JIT_R0, JIT_R2);
1264 @} else if (*expr == '+') @{
1265 stack_pop(JIT_R1, &stack_ptr);
1266 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1267 @} else if (*expr == '-') @{
1268 stack_pop(JIT_R1, &stack_ptr);
1269 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1270 @} else if (*expr == '*') @{
1271 stack_pop(JIT_R1, &stack_ptr);
1272 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1273 @} else if (*expr == '/') @{
1274 stack_pop(JIT_R1, &stack_ptr);
1275 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1276 @} else @{
1277 fprintf(stderr, "cannot compile: %s\n", expr);
1278 abort();
1279 @}
1280 ++expr;
1281 @}
1282 jit_retr(JIT_R0);
1283 jit_epilog();
1284 return fn;
1285@}
1286@end example
1287
1288The principle on which the calculator is based is easy: the stack top
1289is held in R0, while the remaining items of the stack are held in the
1290memory area that we allocate with @code{allocai}. Compiling a numeric
1291operand or the argument @code{x} pushes the old stack top onto the
1292stack and moves the operand into R0; compiling an operator pops the
1293second operand off the stack into R1, and compiles the operation so
1294that the result goes into R0, thus becoming the new stack top.
1295
1296This example allocates a fixed area for 32 @code{int}s. This is not
1297a problem when the function is a leaf like in this case; in a full-blown
1298compiler you will want to analyze the input and determine the number
1299of needed stack slots---a very simple example of register allocation.
1300The area is then managed like a stack using @code{stack_push} and
1301@code{stack_pop}.
1302
1303Source code for the client (which lies in the same source file) follows:
1304
1305@example
1306int main(int argc, char *argv[])
1307@{
1308 jit_node_t *nc, *nf;
1309 pifi c2f, f2c;
1310 int i;
1311
1312 init_jit(argv[0]);
1313 _jit = jit_new_state();
1314
1315 nc = compile_rpn("32x9*5/+");
1316 nf = compile_rpn("x32-5*9/");
1317 (void)jit_emit();
1318 c2f = (pifi)jit_address(nc);
1319 f2c = (pifi)jit_address(nf);
1320 jit_clear_state();
1321
1322 printf("\nC:");
1323 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1324 printf("\nF:");
1325 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1326 printf("\n");
1327
1328 printf("\nF:");
1329 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1330 printf("\nC:");
1331 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1332 printf("\n");
1333
1334 jit_destroy_state();
1335 finish_jit();
1336 return 0;
1337@}
1338@end example
1339
1340The client displays a conversion table between Celsius and Fahrenheit
1341degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1342formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1343respectively.
1344
1345Providing the formula as an argument to @code{compile_rpn} effectively
1346parameterizes code generation, making it possible to use the same code
1347to compile different functions; this is what makes dynamic code
1348generation so powerful.
1349
1350@node Fibonacci
1351@section Fibonacci numbers
1352
1353The code in this section calculates the Fibonacci sequence. That is
1354modeled by the recurrence relation:
1355@display
1356 f(0) = 0
1357 f(1) = f(2) = 1
1358 f(n) = f(n-1) + f(n-2)
1359@end display
1360
1361The purpose of this example is to introduce branches. There are two
1362kind of branches: backward branches and forward branches. We'll
1363present the calculation in a recursive and iterative form; the
1364former only uses forward branches, while the latter uses both.
1365
1366@example
1367#include <stdio.h>
1368#include <lightning.h>
1369
1370static jit_state_t *_jit;
1371
1372typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1373
1374int main(int argc, char *argv[])
1375@{
1376 pifi fib;
1377 jit_node_t *label;
1378 jit_node_t *call;
1379 jit_node_t *in; @rem{/* offset of the argument */}
1380 jit_node_t *ref; @rem{/* to patch the forward reference */}
1381 jit_node_t *zero; @rem{/* to patch the forward reference */}
1382
1383 init_jit(argv[0]);
1384 _jit = jit_new_state();
1385
1386 label = jit_label();
1387 jit_prolog ();
1388 in = jit_arg ();
1389 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1390 zero = jit_beqi (JIT_R0, 0);
1391 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1392 jit_movi (JIT_R0, 1);
1393 ref = jit_blei (JIT_V0, 2);
1394 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1395 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1396 jit_prepare();
1397 jit_pushargr(JIT_V1);
1398 call = jit_finishi(NULL);
1399 jit_patch_at(call, label);
1400 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1401 jit_prepare();
1402 jit_pushargr(JIT_V2);
1403 call = jit_finishi(NULL);
1404 jit_patch_at(call, label);
1405 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1406 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1407
1408 jit_patch(ref); @rem{/* patch jump */}
1409 jit_patch(zero); @rem{/* patch jump */}
1410 jit_retr(JIT_R0);
1411
1412 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1413 fib = jit_emit();
1414 jit_clear_state();
1415 printf("fib(%d) = %d\n", 32, fib(32));
1416 jit_destroy_state();
1417 finish_jit();
1418 return 0;
1419@}
1420@end example
1421
1422As said above, this is the first example of dynamically compiling
1423branches. Branch instructions have two operands containing the
1424values to be compared, and return a @code{jit_note_t *} object
1425to be patched.
1426
1427Because labels final address are only known after calling @code{emit},
1428it is required to call @code{patch} or @code{patch_at}, what does
1429tell @lightning{} that the target to patch is actually a pointer to
1430a @code{jit_node_t *} object, otherwise, it would assume that is
1431a pointer to a C function. Note that conditional branches do not
1432receive a label argument, so they must be patched.
1433
1434You need to call @code{patch_at} on the return of value @code{calli},
1435@code{finishi}, and @code{calli} if it is actually referencing a label
1436in the jit code. All branch instructions do not receive a label
1437argument. Note that @code{movi} is an special case, and patching it
1438is usually done to get the final address of a label, usually to later
1439call @code{jmpr}.
1440
1441Now, here is the iterative version:
1442
1443@example
1444#include <stdio.h>
1445#include <lightning.h>
1446
1447static jit_state_t *_jit;
1448
1449typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1450
1451int main(int argc, char *argv[])
1452@{
1453 pifi fib;
1454 jit_node_t *in; @rem{/* offset of the argument */}
1455 jit_node_t *ref; @rem{/* to patch the forward reference */}
1456 jit_node_t *zero; @rem{/* to patch the forward reference */}
1457 jit_node_t *jump; @rem{/* jump to start of loop */}
1458 jit_node_t *loop; @rem{/* start of the loop */}
1459
1460 init_jit(argv[0]);
1461 _jit = jit_new_state();
1462
1463 jit_prolog ();
1464 in = jit_arg ();
1465 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1466 zero = jit_beqi (JIT_R0, 0);
1467 jit_movr (JIT_R1, JIT_R0);
1468 jit_movi (JIT_R0, 1);
1469 ref = jit_blti (JIT_R1, 2);
1470 jit_subi (JIT_R2, JIT_R2, 2);
1471 jit_movr (JIT_R1, JIT_R0);
1472
1473 loop= jit_label();
1474 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1475 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1476 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1477 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1478 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1479 jit_patch_at(jump, loop);
1480
1481 jit_patch(ref); @rem{/* patch forward jump */}
1482 jit_patch(zero); @rem{/* patch forward jump */}
1483 jit_retr (JIT_R0);
1484
1485 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1486 fib = jit_emit();
1487 jit_clear_state();
1488 printf("fib(%d) = %d\n", 36, fib(36));
1489 jit_destroy_state();
1490 finish_jit();
1491 return 0;
1492@}
1493@end example
1494
1495This code calculates the recurrence relation using iteration (a
1496@code{for} loop in high-level languages). There are no function
1497calls anymore: instead, there is a backward jump (the @code{bnei} at
1498the end of the loop).
1499
1500Note that the program must remember the address for backward jumps;
1501for forward jumps it is only required to remember the jump code,
1502and call @code{patch} for the implicit label.
1503
1504@node Reentrancy
1505@chapter Re-entrant usage of @lightning{}
1506
1507@lightning{} uses the special @code{_jit} identifier. To be able
1508to be able to use multiple jit generation states at the same
1509time, it is required to used code similar to:
1510
1511@example
1512 struct jit_state lightning;
1513 #define lightning _jit
1514@end example
1515
1516This will cause the symbol defined to @code{_jit} to be passed as
1517the first argument to the underlying @lightning{} implementation,
1518that is usually a function with an @code{_} (underscode) prefix
1519and with an argument named @code{_jit}, in the pattern:
1520
1521@example
1522 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1523 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1524@end example
1525
1526The reason for this is to use the same syntax as the initial lightning
1527implementation and to avoid needing the user to keep adding an extra
1528argument to every call, as multiple jit states generating code in
1529paralell should be very uncommon.
1530
519a9ea1 1531@node Registers
4a71579b
PC
1532@chapter Accessing the whole register file
1533
1534As mentioned earlier in this chapter, all @lightning{} back-ends are
1535guaranteed to have at least six general-purpose integer registers and
1536six floating-point registers, but many back-ends will have more.
1537
1538To access the entire register files, you can use the
1539@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1540accept a parameter that identifies the register number, which
1541must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1542and @code{JIT_F_NUM} respectively; the number need not be
1543constant. Of course, expressions like @code{JIT_R0} and
1544@code{JIT_R(0)} denote the same register, and likewise for
1545integer callee-saved, or floating-point, registers.
1546
519a9ea1
PC
1547@section Scratch registers
1548
1549For operations, @lightning{} does not support directly, like storing
1550a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1551acquire and release a scratch register as in the following pattern:
1552
1553@example
1554 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1555 jit_movi (reg, immediate);
1556 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1557 jit_unget_reg (reg);
1558@end example
1559
1560As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1561reloads but don't follow branches, the code between both must be in
1562the same basic block and must not contain any branches as in the
1563following (bad) example.
1564
1565@example
1566 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1567 jit_ldxi (reg, JIT_V0, offset);
1568 jump = jit_bnei (reg, V0);
1569 jit_movr (JIT_V1, reg);
1570 jit_patch (jump);
1571 jit_unget_reg (reg);
1572@end example
1573
4a71579b
PC
1574@node Customizations
1575@chapter Customizations
1576
1577Frequently it is desirable to have more control over how code is
1578generated or how memory is used during jit generation or execution.
1579
1580@section Memory functions
1581To aid in complete control of memory allocation and deallocation
1582@lightning{} provides wrappers that default to standard @code{malloc},
1583@code{realloc} and @code{free}. These are loosely based on the
1584GNU GMP counterparts, with the difference that they use the same
1585prototype of the system allocation functions, that is, no @code{size}
1586for @code{free} or @code{old_size} for @code{realloc}.
1587
1588@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1589@lightning{} guarantees that memory is only allocated or released
1590using these wrapped functions, but you must note that if lightning
1591was linked to GNU binutils, malloc is probably will be called multiple
1592times from there when initializing the disassembler.
1593
1594Because @code{init_jit} may call memory functions, if you need to call
1595@code{jit_set_memory_functions}, it must be called before @code{init_jit},
1596otherwise, when calling @code{finish_jit}, a pointer allocated with the
1597previous or default wrappers will be passed.
1598@end deftypefun
1599
1600@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1601Get the current memory allocation function. Also, unlike the GNU GMP
1602counterpart, it is an error to pass @code{NULL} pointers as arguments.
1603@end deftypefun
1604
1605@section Alternate code buffer
1606To instruct @lightning{} to use an alternate code buffer it is required
1607to call @code{jit_realize} before @code{jit_emit}, and then query states
1608and customize as appropriate.
1609
1610@deftypefun void jit_realize ()
1611Must be called once, before @code{jit_emit}, to instruct @lightning{}
1612that no other @code{jit_xyz} call will be made.
1613@end deftypefun
1614
1615@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1616Returns NULL or the previous value set with @code{jit_set_code}, and
1617sets the @var{code_size} argument to an appropriate value.
1618If @code{jit_get_code} is called before @code{jit_emit}, the
1619@var{code_size} argument is set to the expected amount of bytes
1620required to generate code.
1621If @code{jit_get_code} is called after @code{jit_emit}, the
1622@var{code_size} argument is set to the exact amount of bytes used
1623by the code.
1624@end deftypefun
1625
1626@deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1627Instructs @lightning{} to output to the @var{code} argument and
1628use @var{size} as a guard to not write to invalid memory. If during
1629@code{jit_emit} @lightning{} finds out that the code would not fit
1630in @var{size} bytes, it halts code emit and returns @code{NULL}.
1631@end deftypefun
1632
1633A simple example of a loop using an alternate buffer is:
1634
1635@example
1636 jit_uint8_t *code;
1637 int *(func)(int); @rem{/* function pointer */}
1638 jit_word_t code_size;
1639 jit_word_t real_code_size;
1640 @rem{...}
1641 jit_realize(); @rem{/* ready to generate code */}
1642 jit_get_code(&code_size); @rem{/* get expected code size */}
1643 code_size = (code_size + 4095) & -4096;
1644 do (;;) @{
1645 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1646 MAP_PRIVATE | MAP_ANON, -1, 0);
1647 jit_set_code(code, code_size);
1648 if ((func = jit_emit()) == NULL) @{
1649 munmap(code, code_size);
1650 code_size += 4096;
1651 @}
1652 @} while (func == NULL);
1653 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1654@end example
1655
1656The first call to @code{jit_get_code} should return @code{NULL} and set
1657the @code{code_size} argument to the expected amount of bytes required
1658to emit code.
1659The second call to @code{jit_get_code} is after a successful call to
1660@code{jit_emit}, and will return the value previously set with
1661@code{jit_set_code} and set the @code{real_code_size} argument to the
1662exact amount of bytes used to emit the code.
1663
1664@section Alternate data buffer
1665Sometimes it may be desirable to customize how, or to prevent
1666@lightning{} from using an extra buffer for constants or debug
1667annotation. Usually when also using an alternate code buffer.
1668
1669@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1670Returns @code{NULL} or the previous value set with @code{jit_set_data},
1671and sets the @var{data_size} argument to how many bytes are required
1672for the constants data buffer, and @var{note_size} to how many bytes
1673are required to store the debug note information.
1674Note that it always preallocate one debug note entry even if
1675@code{jit_name} or @code{jit_note} are never called, but will return
1676zero in the @var{data_size} argument if no constant is required;
1677constants are only used for the @code{float} and @code{double} operations
1678that have an immediate argument, and not in all @lightning{} ports.
1679@end deftypefun
1680
1681@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1682
1683@var{data} can be NULL if disabling constants and annotations, otherwise,
1684a valid pointer must be passed. An assertion is done that the data will
1685fit in @var{size} bytes (but that is a noop if @lightning{} was built
1686with @code{-DNDEBUG}).
1687
1688@var{size} tells the space in bytes available in @var{data}.
1689
1690@var{flags} can be zero to tell to just use the alternate data buffer,
1691or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1692
1693@table @t
1694@item JIT_DISABLE_DATA
1695@cindex JIT_DISABLE_DATA
1696Instructs @lightning{} to not use a constant table, but to use an
1697alternate method to synthesize those, usually with a larger code
1698sequence using stack space to transfer the value from a GPR to a
1699FPR register.
1700
1701@item JIT_DISABLE_NOTE
1702@cindex JIT_DISABLE_NOTE
1703Instructs @lightning{} to not store file or function name, and
1704line numbers in the constant buffer.
1705@end table
1706@end deftypefun
1707
1708A simple example of a preventing usage of a data buffer is:
1709
1710@example
1711 @rem{...}
1712 jit_realize(); @rem{/* ready to generate code */}
1713 jit_get_data(NULL, NULL);
1714 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1715 @rem{...}
1716@end example
1717
1718Or to only use a data buffer, if required:
1719
1720@example
1721 jit_uint8_t *data;
1722 jit_word_t data_size;
1723 @rem{...}
1724 jit_realize(); @rem{/* ready to generate code */}
1725 jit_get_data(&data_size, NULL);
1726 if (data_size)
1727 data = malloc(data_size);
1728 else
1729 data = NULL;
1730 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1731 @rem{...}
1732 if (data)
1733 free(data);
1734 @rem{...}
1735@end example
1736
1737@node Acknowledgements
1738@chapter Acknowledgements
1739
1740As far as I know, the first general-purpose portable dynamic code
1741generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1742Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1743unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1744directly inspired @lightning{}.
1745
1746Thanks go to Ian Piumarta, who kindly accepted to release his own
1747program @sc{ccg} under the GNU General Public License, thereby allowing
1748@lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1749@sc{ccg} provides a way of dynamically assemble programs written in the
1750underlying architecture's assembly language. So it is not portable,
1751yet very interesting.
1752
1753I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1754was first developed as a tool to be used in GNU Smalltalk's dynamic
1755translator from bytecodes to native code.