Fix GTE directly referencing psxRegs
[pcsx_rearmed.git] / deps / lightning / doc / body.texi
CommitLineData
4a71579b
PC
1@ifnottex
2@dircategory Software development
3@direntry
4* lightning: (lightning). Library for dynamic code generation.
5@end direntry
6@end ifnottex
7
8@ifnottex
9@node Top
10@top @lightning{}
11
12@iftex
13@macro comma
14@verbatim{|,|}
15@end macro
16@end iftex
17
18@ifnottex
19@macro comma
20@verb{|,|}
21@end macro
22@end ifnottex
23
24This document describes @value{TOPIC} the @lightning{} library for
25dynamic code generation.
26
27@menu
28* Overview:: What GNU lightning is
29* Installation:: Configuring and installing GNU lightning
30* The instruction set:: The RISC instruction set used in GNU lightning
31* GNU lightning examples:: GNU lightning's examples
32* Reentrancy:: Re-entrant usage of GNU lightning
33* Customizations:: Advanced code generation customizations
34* Acknowledgements:: Acknowledgements for GNU lightning
35@end menu
36@end ifnottex
37
38@node Overview
39@chapter Introduction to @lightning{}
40
41@iftex
42This document describes @value{TOPIC} the @lightning{} library for
43dynamic code generation.
44@end iftex
45
46Dynamic code generation is the generation of machine code
47at runtime. It is typically used to strip a layer of interpretation
48by allowing compilation to occur at runtime. One of the most
49well-known applications of dynamic code generation is perhaps that
50of interpreters that compile source code to an intermediate bytecode
51form, which is then recompiled to machine code at run-time: this
52approach effectively combines the portability of bytecode
53representations with the speed of machine code. Another common
54application of dynamic code generation is in the field of hardware
55simulators and binary emulators, which can use the same techniques
56to translate simulated instructions to the instructions of the
57underlying machine.
58
59Yet other applications come to mind: for example, windowing
60@dfn{bitblt} operations, matrix manipulations, and network packet
61filters. Albeit very powerful and relatively well known within the
62compiler community, dynamic code generation techniques are rarely
63exploited to their full potential and, with the exception of the
64two applications described above, have remained curiosities because
65of their portability and functionality barriers: binary instructions
66are generated, so programs using dynamic code generation must be
67retargeted for each machine; in addition, coding a run-time code
68generator is a tedious and error-prone task more than a difficult one.
69
70@lightning{} provides a portable, fast and easily retargetable dynamic
71code generation system.
72
73To be portable, @lightning{} abstracts over current architectures'
74quirks and unorthogonalities. The interface that it exposes to is that
75of a standardized RISC architecture loosely based on the SPARC and MIPS
76chips. There are a few general-purpose registers (six, not including
77those used to receive and pass parameters between subroutines), and
78arithmetic operations involve three operands---either three registers
79or two registers and an arbitrarily sized immediate value.
80
81On one hand, this architecture is general enough that it is possible to
82generate pretty efficient code even on CISC architectures such as the
83Intel x86 or the Motorola 68k families. On the other hand, it matches
84real architectures closely enough that, most of the time, the
85compiler's constant folding pass ends up generating code which
86assembles machine instructions without further tests.
87
88@node Installation
89@chapter Configuring and installing @lightning{}
90
91The first thing to do to use @lightning{} is to configure the
92program, picking the set of macros to be used on the host
93architecture; this configuration is automatically performed by
94the @file{configure} shell script; to run it, merely type:
95@example
96 ./configure
97@end example
98
99@lightning{} supports the @code{--enable-disassembler} option, that
100enables linking to GNU binutils and optionally print human readable
101disassembly of the jit code. This option can be disabled by the
102@code{--disable-disassembler} option.
103
104Another option that @file{configure} accepts is
105@code{--enable-assertions}, which enables several consistency checks in
106the run-time assemblers. These are not usually needed, so you can
107decide to simply forget about it; also remember that these consistency
108checks tend to slow down your code generator.
109
110After you've configured @lightning{}, run @file{make} as usual.
111
112@lightning{} has an extensive set of tests to validate it is working
113correctly in the build host. To test it run:
114@example
115 make check
116@end example
117
118The next important step is:
119@example
120 make install
121@end example
122
123This ends the process of installing @lightning{}.
124
125@node The instruction set
126@chapter @lightning{}'s instruction set
127
128@lightning{}'s instruction set was designed by deriving instructions
129that closely match those of most existing RISC architectures, or
130that can be easily syntesized if absent. Each instruction is composed
131of:
132@itemize @bullet
133@item
134an operation, like @code{sub} or @code{mul}
135
136@item
137most times, a register/immediate flag (@code{r} or @code{i})
138
139@item
140an unsigned modifier (@code{u}), a type identifier or two, when applicable.
141@end itemize
142
143Examples of legal mnemonics are @code{addr} (integer add, with three
144register operands) and @code{muli} (integer multiply, with two
145register operands and an immediate operand). Each instruction takes
146two or three operands; in most cases, one of them can be an immediate
147value instead of a register.
148
149Most @lightning{} integer operations are signed wordsize operations,
150with the exception of operations that convert types, or load or store
151values to/from memory. When applicable, the types and C types are as
152follow:
153
154@example
155 _c @r{signed char}
156 _uc @r{unsigned char}
157 _s @r{short}
158 _us @r{unsigned short}
159 _i @r{int}
160 _ui @r{unsigned int}
161 _l @r{long}
162 _f @r{float}
163 _d @r{double}
164@end example
165
166Most integer operations do not need a type modifier, and when loading or
167storing values to memory there is an alias to the proper operation
168using wordsize operands, that is, if ommited, the type is @r{int} on
16932-bit architectures and @r{long} on 64-bit architectures. Note
170that lightning also expects @code{sizeof(void*)} to match the wordsize.
171
172When an unsigned operation result differs from the equivalent signed
173operation, there is a the @code{_u} modifier.
174
175There are at least seven integer registers, of which six are
176general-purpose, while the last is used to contain the frame pointer
177(@code{FP}). The frame pointer can be used to allocate and access local
178variables on the stack, using the @code{allocai} or @code{allocar}
179instruction.
180
181Of the general-purpose registers, at least three are guaranteed to be
182preserved across function calls (@code{V0}, @code{V1} and
183@code{V2}) and at least three are not (@code{R0}, @code{R1} and
184@code{R2}). Six registers are not very much, but this
185restriction was forced by the need to target CISC architectures
186which, like the x86, are poor of registers; anyway, backends can
187specify the actual number of available registers with the calls
188@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
189(for callee-save registers).
190
191There are at least six floating-point registers, named @code{F0} to
192@code{F5}. These are usually caller-save and are separate from the integer
193registers on the supported architectures; on Intel architectures,
194in 32 bit mode if SSE2 is not available or use of X87 is forced,
195the register stack is mapped to a flat register file. As for the
196integer registers, the macro @code{JIT_F_NUM} yields the number of
197floating-point registers.
198
199The complete instruction set follows; as you can see, most non-memory
200operations only take integers (either signed or unsigned) as operands;
201this was done in order to reduce the instruction set, and because most
202architectures only provide word and long word operations on registers.
203There are instructions that allow operands to be extended to fit a larger
204data type, both in a signed and in an unsigned way.
205
206@table @b
207@item Binary ALU operations
208These accept three operands; the last one can be an immediate.
209@code{addx} operations must directly follow @code{addc}, and
210@code{subx} must follow @code{subc}; otherwise, results are undefined.
211Most, if not all, architectures do not support @r{float} or @r{double}
212immediate operands; lightning emulates those operations by moving the
213immediate to a temporary register and emiting the call with only
214register operands.
215@example
216addr _f _d O1 = O2 + O3
217addi _f _d O1 = O2 + O3
218addxr O1 = O2 + (O3 + carry)
219addxi O1 = O2 + (O3 + carry)
220addcr O1 = O2 + O3, set carry
221addci O1 = O2 + O3, set carry
222subr _f _d O1 = O2 - O3
223subi _f _d O1 = O2 - O3
224subxr O1 = O2 - (O3 + carry)
225subxi O1 = O2 - (O3 + carry)
226subcr O1 = O2 - O3, set carry
227subci O1 = O2 - O3, set carry
228rsbr _f _d O1 = O3 - O1
229rsbi _f _d O1 = O3 - O1
230mulr _f _d O1 = O2 * O3
231muli _f _d O1 = O2 * O3
232divr _u _f _d O1 = O2 / O3
233divi _u _f _d O1 = O2 / O3
234remr _u O1 = O2 % O3
235remi _u O1 = O2 % O3
236andr O1 = O2 & O3
237andi O1 = O2 & O3
238orr O1 = O2 | O3
239ori O1 = O2 | O3
240xorr O1 = O2 ^ O3
241xori O1 = O2 ^ O3
242lshr O1 = O2 << O3
243lshi O1 = O2 << O3
244rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
245rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
246@end example
247
248@item Four operand binary ALU operations
249These accept two result registers, and two operands; the last one can
250be an immediate. The first two arguments cannot be the same register.
251
252@code{qmul} stores the low word of the result in @code{O1} and the
253high word in @code{O2}. For unsigned multiplication, @code{O2} zero
254means there was no overflow. For signed multiplication, no overflow
255check is based on sign, and can be detected if @code{O2} is zero or
256minus one.
257
258@code{qdiv} stores the quotient in @code{O1} and the remainder in
259@code{O2}. It can be used as quick way to check if a division is
260exact, in which case the remainder is zero.
261
262@example
263qmulr _u O1 O2 = O3 * O4
264qmuli _u O1 O2 = O3 * O4
265qdivr _u O1 O2 = O3 / O4
266qdivi _u O1 O2 = O3 / O4
267@end example
268
269@item Unary ALU operations
270These accept two operands, both of which must be registers.
271@example
272negr _f _d O1 = -O2
273comr O1 = ~O2
274@end example
275
276These unary ALU operations are only defined for float operands.
277@example
278absr _f _d O1 = fabs(O2)
279sqrtr O1 = sqrt(O2)
280@end example
281
282Besides requiring the @code{r} modifier, there are no unary operations
283with an immediate operand.
284
285@item Compare instructions
286These accept three operands; again, the last can be an immediate.
287The last two operands are compared, and the first operand, that must be
288an integer register, is set to either 0 or 1, according to whether the
289given condition was met or not.
290
291The conditions given below are for the standard behavior of C,
292where the ``unordered'' comparison result is mapped to false.
293
294@example
295ltr _u _f _d O1 = (O2 < O3)
296lti _u _f _d O1 = (O2 < O3)
297ler _u _f _d O1 = (O2 <= O3)
298lei _u _f _d O1 = (O2 <= O3)
299gtr _u _f _d O1 = (O2 > O3)
300gti _u _f _d O1 = (O2 > O3)
301ger _u _f _d O1 = (O2 >= O3)
302gei _u _f _d O1 = (O2 >= O3)
303eqr _f _d O1 = (O2 == O3)
304eqi _f _d O1 = (O2 == O3)
305ner _f _d O1 = (O2 != O3)
306nei _f _d O1 = (O2 != O3)
307unltr _f _d O1 = !(O2 >= O3)
308unler _f _d O1 = !(O2 > O3)
309ungtr _f _d O1 = !(O2 <= O3)
310unger _f _d O1 = !(O2 < O3)
311uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
312ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
313ordr _f _d O1 = (O2 == O2) && (O3 == O3)
314unordr _f _d O1 = (O2 != O2) || (O3 != O3)
315@end example
316
317@item Transfer operations
318These accept two operands; for @code{ext} both of them must be
319registers, while @code{mov} accepts an immediate value as the second
320operand.
321
322Unlike @code{movr} and @code{movi}, the other instructions are used
323to truncate a wordsize operand to a smaller integer data type or to
324convert float data types. You can also use @code{extr} to convert an
325integer to a floating point value: the usual options are @code{extr_f}
326and @code{extr_d}.
327
328@example
329movr _f _d O1 = O2
330movi _f _d O1 = O2
331extr _c _uc _s _us _i _ui _f _d O1 = O2
332truncr _f _d O1 = trunc(O2)
333@end example
334
335In 64-bit architectures it may be required to use @code{truncr_f_i},
336@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
337the equivalent C code. Only the @code{_i} modifier is available in
33832-bit architectures.
339
340@example
341truncr_f_i = <int> O1 = <float> O2
342truncr_f_l = <long>O1 = <float> O2
343truncr_d_i = <int> O1 = <double>O2
344truncr_d_l = <long>O1 = <double>O2
345@end example
346
347The float conversion operations are @emph{destination first,
348source second}, but the order of the types is reversed. This happens
349for historical reasons.
350
351@example
352extr_f_d = <double>O1 = <float> O2
353extr_d_f = <float> O1 = <double>O2
354@end example
355
356@item Network extensions
357These accept two operands, both of which must be registers; these
358two instructions actually perform the same task, yet they are
359assigned to two mnemonics for the sake of convenience and
360completeness. As usual, the first operand is the destination and
361the second is the source.
362The @code{_ul} variant is only available in 64-bit architectures.
363@example
364htonr _us _ui _ul @r{Host-to-network (big endian) order}
365ntohr _us _ui _ul @r{Network-to-host order }
366@end example
367
368@item Load operations
369@code{ld} accepts two operands while @code{ldx} accepts three;
370in both cases, the last can be either a register or an immediate
371value. Values are extended (with or without sign, according to
372the data type specification) to fit a whole register.
373The @code{_ui} and @code{_l} types are only available in 64-bit
374architectures. For convenience, there is a version without a
375type modifier for integer or pointer operands that uses the
376appropriate wordsize call.
377@example
378ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
379ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
380ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
381ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
382@end example
383
384@item Store operations
385@code{st} accepts two operands while @code{stx} accepts three; in
386both cases, the first can be either a register or an immediate
387value. Values are sign-extended to fit a whole register.
388@example
389str _c _uc _s _us _i _ui _l _f _d *O1 = O2
390sti _c _uc _s _us _i _ui _l _f _d *O1 = O2
391stxr _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
392stxi _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
393@end example
394As for the load operations, the @code{_ui} and @code{_l} types are
395only available in 64-bit architectures, and for convenience, there
396is a version without a type modifier for integer or pointer operands
397that uses the appropriate wordsize call.
398
399@item Argument management
400These are:
401@example
402prepare (not specified)
403va_start (not specified)
404pushargr _f _d
405pushargi _f _d
406va_push (not specified)
407arg _f _d
408getarg _c _uc _s _us _i _ui _l _f _d
409va_arg _d
410putargr _f _d
411putargi _f _d
412ret (not specified)
413retr _f _d
414reti _f _d
415va_end (not specified)
416retval _c _uc _s _us _i _ui _l _f _d
417epilog (not specified)
418@end example
419As with other operations that use a type modifier, the @code{_ui} and
420@code{_l} types are only available in 64-bit architectures, but there
421are operations without a type modifier that alias to the appropriate
422integer operation with wordsize operands.
423
424@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
425while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
426A code snippet that wants to call another procedure and has to pass
427arguments must, in order: use the @code{prepare} instruction and use
428the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
429left to right order}; and use @code{finish} or @code{call} (explained below)
430to perform the actual call.
431
432@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
433arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
434@code{va_push} is required when passing a @code{va_list} to another function,
435because not all architectures expect it as a single pointer. Known case
436is DEC Alpha, that requires it as a structure passed by value.
437
438@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
439@code{arg} is different from other instruction in that it does not
440actually generate any code: instead, it is a function which returns
441a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
442a value'' means that @lightning{} code that compile these
443instructions return a value when expanded.} You should call
444@code{arg} as soon as possible, before any function call or, more
445easily, right after the @code{prolog} instructions
446(which is treated later).
447
448@code{getarg} accepts a register argument and a value returned by
449@code{arg}, and will move that argument to the register, extending
450it (with or without sign, according to the data type specification)
451to fit a whole register. These instructions are more intimately
452related to the usage of the @lightning{} instruction set in code
453that generates other code, so they will be treated more
454specifically in @ref{GNU lightning examples, , Generating code at
455run-time}.
456
457@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
458it accepts as first argument a register or immediate, and as
459second argument a value returned by @code{arg}. It allows changing,
460or restoring an argument to the current function, and is a
461construct required to implement tail call optimization. Note that
462arguments in registers are very cheap, but will be overwritten
463at any moment, including on some operations, for example division,
464that on several ports is implemented as a function call.
465
466Finally, the @code{retval} instruction fetches the return value of a
467called function in a register. The @code{retval} instruction takes a
468register argument and copies the return value of the previously called
469function in that register. A function with a return value should use
470@code{retr} or @code{reti} to put the return value in the return register
471before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
472
473@code{epilog} is an optional call, that marks the end of a function
474body. It is automatically generated by @lightning{} if starting a new
475function (what should be done after a @code{ret} call) or finishing
476generating jit.
477It is very important to note that the fact that @code{epilog} being
478optional may cause a common mistake. Consider this:
479@example
480fun1:
481 prolog
482 ...
483 ret
484fun2:
485 prolog
486@end example
487Because @code{epilog} is added when finding a new @code{prolog},
488this will cause the @code{fun2} label to actually be before the
489return from @code{fun1}. Because @lightning{} will actually
490understand it as:
491@example
492fun1:
493 prolog
494 ...
495 ret
496fun2:
497 epilog
498 prolog
499@end example
500
501You should observe a few rules when using these macros. First of
502all, if calling a varargs function, you should use the @code{ellipsis}
503call to mark the position of the ellipsis in the C prototype.
504
505You should not nest calls to @code{prepare} inside a
506@code{prepare/finish} block. Doing this will result in undefined
507behavior. Note that for functions with zero arguments you can use
508just @code{call}.
509
510@item Branch instructions
511Like @code{arg}, these also return a value which, in this case,
512is to be used to compile forward branches as explained in
513@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
514compared; of these, the last can be either a register or an immediate.
515They are:
516@example
517bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
518blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
519bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
520blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
521bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
522bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
523bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
524bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
525beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
526beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
527bner _f _d @r{if }(O2 != O3)@r{ goto }O1
528bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
529
530bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
531bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
532bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
533bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
534buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
535bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
536bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
537bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
538
539bmsr @r{if }O2 & O3@r{ goto }O1
540bmsi @r{if }O2 & O3@r{ goto }O1
541bmcr @r{if }!(O2 & O3)@r{ goto }O1
542bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
543boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
544boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
545bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
546bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
547bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
548bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
549bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
550bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
551@end example
552
553@item Jump and return operations
554These accept one argument except @code{ret} and @code{jmpi} which
555have none; the difference between @code{finishi} and @code{calli}
556is that the latter does not clean the stack from pushed parameters
557(if any) and the former must @strong{always} follow a @code{prepare}
558instruction.
559@example
560callr (not specified) @r{function call to register O1}
561calli (not specified) @r{function call to immediate O1}
562finishr (not specified) @r{function call to register O1}
563finishi (not specified) @r{function call to immediate O1}
564jmpr (not specified) @r{unconditional jump to register}
565jmpi (not specified) @r{unconditional jump}
566ret (not specified) @r{return from subroutine}
567retr _c _uc _s _us _i _ui _l _f _d
568reti _c _uc _s _us _i _ui _l _f _d
569retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
570 @r{to register}
571@end example
572
573Like branch instruction, @code{jmpi} also returns a value which is to
574be used to compile forward branches. @xref{Fibonacci, , Fibonacci
575numbers}.
576
577@item Labels
578There are 3 @lightning{} instructions to create labels:
579@example
580label (not specified) @r{simple label}
581forward (not specified) @r{forward label}
582indirect (not specified) @r{special simple label}
583@end example
584
585@code{label} is normally used as @code{patch_at} argument for backward
586jumps.
587
588@example
589 jit_node_t *jump, *label;
590label = jit_label();
591 ...
592 jump = jit_beqr(JIT_R0, JIT_R1);
593 jit_patch_at(jump, label);
594@end example
595
596@code{forward} is used to patch code generation before the actual
597position of the label is known.
598
599@example
600 jit_node_t *jump, *label;
601label = jit_forward();
602 jump = jit_beqr(JIT_R0, JIT_R1);
603 jit_patch_at(jump, label);
604 ...
605 jit_link(label);
606@end example
607
608@code{indirect} is useful when creating jump tables, and tells
609@lightning{} to not optimize out a label that is not the target of
610any jump, because an indirect jump may land where it is defined.
611
612@example
613 jit_node_t *jump, *label;
614 ...
615 jmpr(JIT_R0); @rem{/* may jump to label */}
616 ...
617label = jit_indirect();
618@end example
619
620@code{indirect} is an special case of @code{note} and @code{name}
621because it is a valid argument to @code{address}.
622
623Note that the usual idiom to write the previous example is
624@example
625 jit_node_t *addr, *jump;
626addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
627 ...
628 jmpr(JIT_R0);
629 ...
630 jit_patch(addr); @rem{/* implicit label added */}
631@end example
632
633that automatically binds the implicit label added by @code{patch} with
634the @code{movi}, but on some special conditions it is required to create
635an "unbound" label.
636
637@item Function prolog
638
639These macros are used to set up a function prolog. The @code{allocai}
640call accept a single integer argument and returns an offset value
641for stack storage access. The @code{allocar} accepts two registers
642arguments, the first is set to the offset for stack access, and the
643second is the size in bytes argument.
644
645@example
646prolog (not specified) @r{function prolog}
647allocai (not specified) @r{reserve space on the stack}
648allocar (not specified) @r{allocate space on the stack}
649@end example
650
651@code{allocai} receives the number of bytes to allocate and returns
652the offset from the frame pointer register @code{FP} to the base of
653the area.
654
655@code{allocar} receives two register arguments. The first is where
656to store the offset from the frame pointer register @code{FP} to the
657base of the area. The second argument is the size in bytes. Note
658that @code{allocar} is dynamic allocation, and special attention
659should be taken when using it. If called in a loop, every iteration
660will allocate stack space. Stack space is aligned from 8 to 64 bytes
661depending on backend requirements, even if allocating only one byte.
662It is advisable to not use it with @code{frame} and @code{tramp}; it
663should work with @code{frame} with special care to call only once,
664but is not supported if used in @code{tramp}, even if called only
665once.
666
667As a small appetizer, here is a small function that adds 1 to the input
668parameter (an @code{int}). I'm using an assembly-like syntax here which
669is a bit different from the one used when writing real subroutines with
670@lightning{}; the real syntax will be introduced in @xref{GNU lightning
671examples, , Generating code at run-time}.
672
673@example
674incr:
675 prolog
676in = arg @rem{! We have an integer argument}
677 getarg R0, in @rem{! Move it to R0}
678 addi R0, R0, 1 @rem{! Add 1}
679 retr R0 @rem{! And return the result}
680@end example
681
682And here is another function which uses the @code{printf} function from
683the standard C library to write a number in hexadecimal notation:
684
685@example
686printhex:
687 prolog
688in = arg @rem{! Same as above}
689 getarg R0, in
690 prepare @rem{! Begin call sequence for printf}
691 pushargi "%x" @rem{! Push format string}
692 ellipsis @rem{! Varargs start here}
693 pushargr R0 @rem{! Push second argument}
694 finishi printf @rem{! Call printf}
695 ret @rem{! Return to caller}
696@end example
697
698@item Trampolines, continuations and tail call optimization
699
700Frequently it is required to generate jit code that must jump to
701code generated later, possibly from another @code{jit_context_t}.
702These require compatible stack frames.
703
704@lightning{} provides two primitives from where trampolines,
705continuations and tail call optimization can be implemented.
706
707@example
708frame (not specified) @r{create stack frame}
709tramp (not specified) @r{assume stack frame}
710@end example
711
712@code{frame} receives an integer argument@footnote{It is not
713automatically computed because it does not know about the
714requirement of later generated code.} that defines the size in
715bytes for the stack frame of the current, @code{C} callable,
716jit function. To calculate this value, a good formula is maximum
717number of arguments to any called native function times
718eight@footnote{Times eight so that it works for double arguments.
719And would not need conditionals for ports that pass arguments in
720the stack.}, plus the sum of the arguments to any call to
721@code{jit_allocai}. @lightning{} automatically adjusts this value
722for any backend specific stack memory it may need, or any
723alignment constraint.
724
725@code{frame} also instructs @lightning{} to save all callee
726save registers in the prolog and reload in the epilog.
727
728@example
729main: @rem{! jit entry point}
730 prolog @rem{! function prolog}
731 frame 256 @rem{! save all callee save registers and}
732 @rem{! reserve at least 256 bytes in stack}
733main_loop:
734 ...
735 jmpi handler @rem{! jumps to external code}
736 ...
737 ret @rem{! return to the caller}
738@end example
739
740@code{tramp} differs from @code{frame} only that a prolog and epilog
741will not be generated. Note that @code{prolog} must still be used.
742The code under @code{tramp} must be ready to be entered with a jump
743at the prolog position, and instead of a return, it must end with
744a non conditional jump. @code{tramp} exists solely for the fact
745that it allows optimizing out prolog and epilog code that would
746never be executed.
747
748@example
749handler: @rem{! handler entry point}
750 prolog @rem{! function prolog}
751 tramp 256 @rem{! assumes all callee save registers}
752 @rem{! are saved and there is at least}
753 @rem{! 256 bytes in stack}
754 ...
755 jmpi main_loop @rem{! return to the main loop}
756@end example
757
758@lightning{} only supports Tail Call Optimization using the
759@code{tramp} construct. Any other way is not guaranteed to
760work on all ports.
761
762An example of a simple (recursive) tail call optimization:
763
764@example
765factorial: @rem{! Entry point of the factorial function}
766 prolog
767in = arg @rem{! Receive an integer argument}
768 getarg R0, in @rem{! Move argument to RO}
769 prepare
770 pushargi 1 @rem{! This is the accumulator}
771 pushargr R0 @rem{! This is the argument}
772 finishi fact @rem{! Call the tail call optimized function}
773 retval R0 @rem{! Fetch the result}
774 retr R0 @rem{! Return it}
775 epilog @rem{! Epilog *before* label before prolog}
776
777fact: @rem{! Entry point of the helper function}
778 prolog
779 frame 16 @rem{! Reserve 16 bytes in the stack}
780fact_entry: @rem{! This is the tail call entry point}
781ac = arg @rem{! The accumulator is the first argument}
782in = arg @rem{! The factorial argument}
783 getarg R0, ac @rem{! Move the accumulator to R0}
784 getarg R1, in @rem{! Move the argument to R1}
785 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
786 mulr R0, R0, R1 @rem{! accumulator *= argument}
787 putargr R0, ac @rem{! Update the accumulator}
788 subi R1, R1, 1 @rem{! argument -= 1}
789 putargr R1, in @rem{! Update the argument}
790 jmpi fact_entry @rem{! Tail Call Optimize it!}
791fact_out:
792 retr R0 @rem{! Return the accumulator}
793@end example
794
795@item Predicates
796@example
797forward_p (not specified) @r{forward label predicate}
798indirect_p (not specified) @r{indirect label predicate}
799target_p (not specified) @r{used label predicate}
800arg_register_p (not specified) @r{argument kind predicate}
801callee_save_p (not specified) @r{callee save predicate}
802pointer_p (not specified) @r{pointer predicate}
803@end example
804
805@code{forward_p} expects a @code{jit_node_t*} argument, and
806returns non zero if it is a forward label reference, that is,
807a label returned by @code{forward}, that still needs a
808@code{link} call.
809
810@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
811non zero if it is an indirect label reference, that is, a label that
812was returned by @code{indirect}.
813
814@code{target_p} expects a @code{jit_node_t*} argument, that is any
815kind of label, and will return non zero if there is at least one
816jump or move referencing it.
817
818@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
819have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
820will return non zero if the argument lives in a register. This call
821is useful to know the live range of register arguments, as those
822are very fast to read and write, but have volatile values.
823
824@code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
825@code{JIT_Fn}, and will return non zero if the register is callee
826save. This call is useful because on several ports, the @code{JIT_Rn}
827and @code{JIT_Fn} registers are actually callee save; no need
828to save and load the values when making function calls.
829
830@code{pointer_p} expects a pointer argument, and will return non
831zero if the pointer is inside the generated jit code. Must be
832called after @code{jit_emit} and before @code{jit_destroy_state}.
833@end table
834
835@node GNU lightning examples
836@chapter Generating code at run-time
837
838To use @lightning{}, you should include the @file{lightning.h} file that
839is put in your include directory by the @samp{make install} command.
840
841Each of the instructions above translates to a macro or function call.
842All you have to do is prepend @code{jit_} (lowercase) to opcode names
843and @code{JIT_} (uppercase) to register names. Of course, parameters
844are to be put between parentheses.
845
846This small tutorial presents three examples:
847
848@iftex
849@itemize @bullet
850@item
851The @code{incr} function found in @ref{The instruction set, ,
852@lightning{}'s instruction set}:
853
854@item
855A simple function call to @code{printf}
856
857@item
858An RPN calculator.
859
860@item
861Fibonacci numbers
862@end itemize
863@end iftex
864@ifnottex
865@menu
866* incr:: A function which increments a number by one
867* printf:: A simple function call to printf
868* RPN calculator:: A more complex example, an RPN calculator
869* Fibonacci:: Calculating Fibonacci numbers
870@end menu
871@end ifnottex
872
873@node incr
874@section A function which increments a number by one
875
876Let's see how to create and use the sample @code{incr} function created
877in @ref{The instruction set, , @lightning{}'s instruction set}:
878
879@example
880#include <stdio.h>
881#include <lightning.h>
882
883static jit_state_t *_jit;
884
885typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
886
887int main(int argc, char *argv[])
888@{
889 jit_node_t *in;
890 pifi incr;
891
892 init_jit(argv[0]);
893 _jit = jit_new_state();
894
895 jit_prolog(); @rem{/* @t{ prolog } */}
896 in = jit_arg(); @rem{/* @t{ in = arg } */}
897 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
898 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
899 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
900
901 incr = jit_emit();
902 jit_clear_state();
903
904 @rem{/* call the generated code@comma{} passing 5 as an argument */}
905 printf("%d + 1 = %d\n", 5, incr(5));
906
907 jit_destroy_state();
908 finish_jit();
909 return 0;
910@}
911@end example
912
913Let's examine the code line by line (well, almost@dots{}):
914
915@table @t
916@item #include <lightning.h>
917You already know about this. It defines all of @lightning{}'s macros.
918
919@item static jit_state_t *_jit;
920You might wonder about what is @code{jit_state_t}. It is a structure
921that stores jit code generation information. The name @code{_jit} is
922special, because since multiple jit generators can run at the same
923time, you must either @r{#define _jit my_jit_state} or name it
924@code{_jit}.
925
926@item typedef int (*pifi)(int);
927Just a handy typedef for a pointer to a function that takes an
928@code{int} and returns another.
929
930@item jit_node_t *in;
931Declares a variable to hold an identifier for a function argument. It
932is an opaque pointer, that will hold the return of a call to @code{arg}
933and be used as argument to @code{getarg}.
934
935@item pifi incr;
936Declares a function pointer variable to a function that receives an
937@code{int} and returns an @code{int}.
938
939@item init_jit(argv[0]);
940You must call this function before creating a @code{jit_state_t}
941object. This function does global state initialization, and may need
942to detect CPU or Operating System features. It receives a string
943argument that is later used to read symbols from a shared object using
944GNU binutils if disassembly was enabled at configure time. If no
945disassembly will be performed a NULL pointer can be used as argument.
946
947@item _jit = jit_new_state();
948This call initializes a @lightning{} jit state.
949
950@item jit_prolog();
951Ok, so we start generating code for our beloved function@dots{}
952
953@item in = jit_arg();
954@itemx jit_getarg(JIT_R0, in);
955We retrieve the first (and only) argument, an integer, and store it
956into the general-purpose register @code{R0}.
957
958@item jit_addi(JIT_R0, JIT_R0, 1);
959We add one to the content of the register.
960
961@item jit_retr(JIT_R0);
962This instruction generates a standard function epilog that returns
963the contents of the @code{R0} register.
964
965@item incr = jit_emit();
966This instruction is very important. It actually translates the
967@lightning{} macros used before to machine code, flushes the generated
968code area out of the processor's instruction cache and return a
969pointer to the start of the code.
970
971@item jit_clear_state();
972This call cleanups any data not required for jit execution. Note
973that it must be called after any call to @code{jit_print} or
974@code{jit_address}, as this call destroy the @lightning{}
975intermediate representation.
976
977@item printf("%d + 1 = %d", 5, incr(5));
978Calling our function is this simple---it is not distinguishable from
979a normal C function call, the only difference being that @code{incr}
980is a variable.
981
982@item jit_destroy_state();
983Releases all memory associated with the jit context. It should be
984called after known the jit will no longer be called.
985
986@item finish_jit();
987This call cleanups any global state hold by @lightning{}, and is
988advisable to call it once jit code will no longer be generated.
989@end table
990
991@lightning{} abstracts two phases of dynamic code generation: selecting
992instructions that map the standard representation, and emitting binary
993code for these instructions. The client program has the responsibility
994of describing the code to be generated using the standard @lightning{}
995instruction set.
996
997Let's examine the code generated for @code{incr} on the SPARC and x86_64
998architecture (on the right is the code that an assembly-language
999programmer would write):
1000
1001@table @b
1002@item SPARC
1003@example
1004 save %sp, -112, %sp
1005 mov %i0, %g2 retl
1006 inc %g2 inc %o0
1007 mov %g2, %i0
1008 restore
1009 retl
1010 nop
1011@end example
1012In this case, @lightning{} introduces overhead to create a register
1013window (not knowing that the procedure is a leaf procedure) and to
1014move the argument to the general purpose register @code{R0} (which
1015maps to @code{%g2} on the SPARC).
1016@end table
1017
1018@table @b
1019@item x86_64
1020@example
1021 sub $0x30,%rsp
1022 mov %rbp,(%rsp)
1023 mov %rsp,%rbp
1024 sub $0x18,%rsp
1025 mov %rdi,%rax mov %rdi, %rax
1026 add $0x1,%rax inc %rax
1027 mov %rbp,%rsp
1028 mov (%rsp),%rbp
1029 add $0x30,%rsp
1030 retq retq
1031@end example
1032In this case, the main overhead is due to the function's prolog and
1033epilog, and stack alignment after reserving stack space for word
1034to/from float conversions or moving data from/to x87 to/from SSE.
1035Note that besides allocating space to save callee saved registers,
1036no registers are saved/restored because @lightning{} notices those
1037registers are not modified. There is currently no logic to detect
1038if it needs to allocate stack space for type conversions neither
1039proper leaf function detection, but these are subject to change
1040(FIXME).
1041@end table
1042
1043@node printf
1044@section A simple function call to @code{printf}
1045
1046Again, here is the code for the example:
1047
1048@example
1049#include <stdio.h>
1050#include <lightning.h>
1051
1052static jit_state_t *_jit;
1053
1054typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1055
1056int main(int argc, char *argv[])
1057@{
1058 pvfi myFunction; @rem{/* ptr to generated code */}
1059 jit_node_t *start, *end; @rem{/* a couple of labels */}
1060 jit_node_t *in; @rem{/* to get the argument */}
1061
1062 init_jit(argv[0]);
1063 _jit = jit_new_state();
1064
1065 start = jit_note(__FILE__, __LINE__);
1066 jit_prolog();
1067 in = jit_arg();
1068 jit_getarg(JIT_R1, in);
1069 jit_prepare();
1070 jit_pushargi((jit_word_t)"generated %d bytes\n");
1071 jit_ellipsis();
1072 jit_pushargr(JIT_R1);
1073 jit_finishi(printf);
1074 jit_ret();
1075 jit_epilog();
1076 end = jit_note(__FILE__, __LINE__);
1077
1078 myFunction = jit_emit();
1079
1080 @rem{/* call the generated code@comma{} passing its size as argument */}
1081 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1082 jit_clear_state();
1083
1084 jit_disassemble();
1085
1086 jit_destroy_state();
1087 finish_jit();
1088 return 0;
1089@}
1090@end example
1091
1092The function shows how many bytes were generated. Most of the code
1093is not very interesting, as it resembles very closely the program
1094presented in @ref{incr, , A function which increments a number by one}.
1095
1096For this reason, we're going to concentrate on just a few statements.
1097
1098@table @t
1099@item start = jit_note(__FILE__, __LINE__);
1100@itemx @r{@dots{}}
1101@itemx end = jit_note(__FILE__, __LINE__);
1102These two instruction call the @code{jit_note} macro, which creates
1103a note in the jit code; arguments to @code{jit_note} usually are a
1104filename string and line number integer, but using NULL for the
1105string argument is perfectly valid if only need to create a simple
1106marker in the code.
1107
1108@item jit_ellipsis();
1109@code{ellipsis} usually is only required if calling varargs functions
1110with double arguments, but it is a good practice to properly describe
1111the @r{@dots{}} in the call sequence.
1112
1113@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1114Note the use of the @code{(jit_word_t)} cast, that is used only
1115to avoid a compiler warning, due to using a pointer where a
1116wordsize integer type was expected.
1117
1118@item jit_prepare();
1119@itemx @r{@dots{}}
1120@itemx jit_finishi(printf);
1121Once the arguments to @code{printf} have been pushed, what means
1122moving them to stack or register arguments, the @code{printf}
1123function is called and the stack cleaned. Note how @lightning{}
1124abstracts the differences between different architectures and
1125ABI's -- the client program does not know how parameter passing
1126works on the host architecture.
1127
1128@item jit_epilog();
1129Usually it is not required to call @code{epilog}, but because it
1130is implicitly called when noticing the end of a function, if the
1131@code{end} variable was set with a @code{note} call after the
1132@code{ret}, it would not consider the function epilog.
1133
1134@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1135This calls the generate jit function passing as argument the offset
1136difference from the @code{start} and @code{end} notes. The @code{address}
1137call must be done after the @code{emit} call or either a fatal error
1138will happen (if @lightning{} is built with assertions enable) or an
1139undefined value will be returned.
1140
1141@item jit_clear_state();
1142Note that @code{jit_clear_state} was called after executing jit in
1143this example. It was done because it must be called after any call
1144to @code{jit_address} or @code{jit_print}.
1145
1146@item jit_disassemble();
1147@code{disassemble} will dump the generated code to standard output,
1148unless @lightning{} was built with the disassembler disabled, in which
1149case no output will be shown.
1150@end table
1151
1152@node RPN calculator
1153@section A more complex example, an RPN calculator
1154
1155We create a small stack-based RPN calculator which applies a series
1156of operators to a given parameter and to other numeric operands.
1157Unlike previous examples, the code generator is fully parameterized
1158and is able to compile different formulas to different functions.
1159Here is the code for the expression compiler; a sample usage will
1160follow.
1161
1162Since @lightning{} does not provide push/pop instruction, this
1163example uses a stack-allocated area to store the data. Such an
1164area can be allocated using the macro @code{allocai}, which
1165receives the number of bytes to allocate and returns the offset
1166from the frame pointer register @code{FP} to the base of the
1167area.
1168
1169Usually, you will use the @code{ldxi} and @code{stxi} instruction
1170to access stack-allocated variables. However, it is possible to
1171use operations such as @code{add} to compute the address of the
1172variables, and pass the address around.
1173
1174@example
1175#include <stdio.h>
1176#include <lightning.h>
1177
1178typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1179
1180static jit_state_t *_jit;
1181
1182void stack_push(int reg, int *sp)
1183@{
1184 jit_stxi_i (*sp, JIT_FP, reg);
1185 *sp += sizeof (int);
1186@}
1187
1188void stack_pop(int reg, int *sp)
1189@{
1190 *sp -= sizeof (int);
1191 jit_ldxi_i (reg, JIT_FP, *sp);
1192@}
1193
1194jit_node_t *compile_rpn(char *expr)
1195@{
1196 jit_node_t *in, *fn;
1197 int stack_base, stack_ptr;
1198
1199 fn = jit_note(NULL, 0);
1200 jit_prolog();
1201 in = jit_arg();
1202 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1203
1204 jit_getarg_i(JIT_R2, in);
1205
1206 while (*expr) @{
1207 char buf[32];
1208 int n;
1209 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1210 expr += n - 1;
1211 stack_push(JIT_R0, &stack_ptr);
1212 jit_movi(JIT_R0, atoi(buf));
1213 @} else if (*expr == 'x') @{
1214 stack_push(JIT_R0, &stack_ptr);
1215 jit_movr(JIT_R0, JIT_R2);
1216 @} else if (*expr == '+') @{
1217 stack_pop(JIT_R1, &stack_ptr);
1218 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1219 @} else if (*expr == '-') @{
1220 stack_pop(JIT_R1, &stack_ptr);
1221 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1222 @} else if (*expr == '*') @{
1223 stack_pop(JIT_R1, &stack_ptr);
1224 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1225 @} else if (*expr == '/') @{
1226 stack_pop(JIT_R1, &stack_ptr);
1227 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1228 @} else @{
1229 fprintf(stderr, "cannot compile: %s\n", expr);
1230 abort();
1231 @}
1232 ++expr;
1233 @}
1234 jit_retr(JIT_R0);
1235 jit_epilog();
1236 return fn;
1237@}
1238@end example
1239
1240The principle on which the calculator is based is easy: the stack top
1241is held in R0, while the remaining items of the stack are held in the
1242memory area that we allocate with @code{allocai}. Compiling a numeric
1243operand or the argument @code{x} pushes the old stack top onto the
1244stack and moves the operand into R0; compiling an operator pops the
1245second operand off the stack into R1, and compiles the operation so
1246that the result goes into R0, thus becoming the new stack top.
1247
1248This example allocates a fixed area for 32 @code{int}s. This is not
1249a problem when the function is a leaf like in this case; in a full-blown
1250compiler you will want to analyze the input and determine the number
1251of needed stack slots---a very simple example of register allocation.
1252The area is then managed like a stack using @code{stack_push} and
1253@code{stack_pop}.
1254
1255Source code for the client (which lies in the same source file) follows:
1256
1257@example
1258int main(int argc, char *argv[])
1259@{
1260 jit_node_t *nc, *nf;
1261 pifi c2f, f2c;
1262 int i;
1263
1264 init_jit(argv[0]);
1265 _jit = jit_new_state();
1266
1267 nc = compile_rpn("32x9*5/+");
1268 nf = compile_rpn("x32-5*9/");
1269 (void)jit_emit();
1270 c2f = (pifi)jit_address(nc);
1271 f2c = (pifi)jit_address(nf);
1272 jit_clear_state();
1273
1274 printf("\nC:");
1275 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1276 printf("\nF:");
1277 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1278 printf("\n");
1279
1280 printf("\nF:");
1281 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1282 printf("\nC:");
1283 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1284 printf("\n");
1285
1286 jit_destroy_state();
1287 finish_jit();
1288 return 0;
1289@}
1290@end example
1291
1292The client displays a conversion table between Celsius and Fahrenheit
1293degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1294formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1295respectively.
1296
1297Providing the formula as an argument to @code{compile_rpn} effectively
1298parameterizes code generation, making it possible to use the same code
1299to compile different functions; this is what makes dynamic code
1300generation so powerful.
1301
1302@node Fibonacci
1303@section Fibonacci numbers
1304
1305The code in this section calculates the Fibonacci sequence. That is
1306modeled by the recurrence relation:
1307@display
1308 f(0) = 0
1309 f(1) = f(2) = 1
1310 f(n) = f(n-1) + f(n-2)
1311@end display
1312
1313The purpose of this example is to introduce branches. There are two
1314kind of branches: backward branches and forward branches. We'll
1315present the calculation in a recursive and iterative form; the
1316former only uses forward branches, while the latter uses both.
1317
1318@example
1319#include <stdio.h>
1320#include <lightning.h>
1321
1322static jit_state_t *_jit;
1323
1324typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1325
1326int main(int argc, char *argv[])
1327@{
1328 pifi fib;
1329 jit_node_t *label;
1330 jit_node_t *call;
1331 jit_node_t *in; @rem{/* offset of the argument */}
1332 jit_node_t *ref; @rem{/* to patch the forward reference */}
1333 jit_node_t *zero; @rem{/* to patch the forward reference */}
1334
1335 init_jit(argv[0]);
1336 _jit = jit_new_state();
1337
1338 label = jit_label();
1339 jit_prolog ();
1340 in = jit_arg ();
1341 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1342 zero = jit_beqi (JIT_R0, 0);
1343 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1344 jit_movi (JIT_R0, 1);
1345 ref = jit_blei (JIT_V0, 2);
1346 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1347 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1348 jit_prepare();
1349 jit_pushargr(JIT_V1);
1350 call = jit_finishi(NULL);
1351 jit_patch_at(call, label);
1352 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1353 jit_prepare();
1354 jit_pushargr(JIT_V2);
1355 call = jit_finishi(NULL);
1356 jit_patch_at(call, label);
1357 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1358 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1359
1360 jit_patch(ref); @rem{/* patch jump */}
1361 jit_patch(zero); @rem{/* patch jump */}
1362 jit_retr(JIT_R0);
1363
1364 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1365 fib = jit_emit();
1366 jit_clear_state();
1367 printf("fib(%d) = %d\n", 32, fib(32));
1368 jit_destroy_state();
1369 finish_jit();
1370 return 0;
1371@}
1372@end example
1373
1374As said above, this is the first example of dynamically compiling
1375branches. Branch instructions have two operands containing the
1376values to be compared, and return a @code{jit_note_t *} object
1377to be patched.
1378
1379Because labels final address are only known after calling @code{emit},
1380it is required to call @code{patch} or @code{patch_at}, what does
1381tell @lightning{} that the target to patch is actually a pointer to
1382a @code{jit_node_t *} object, otherwise, it would assume that is
1383a pointer to a C function. Note that conditional branches do not
1384receive a label argument, so they must be patched.
1385
1386You need to call @code{patch_at} on the return of value @code{calli},
1387@code{finishi}, and @code{calli} if it is actually referencing a label
1388in the jit code. All branch instructions do not receive a label
1389argument. Note that @code{movi} is an special case, and patching it
1390is usually done to get the final address of a label, usually to later
1391call @code{jmpr}.
1392
1393Now, here is the iterative version:
1394
1395@example
1396#include <stdio.h>
1397#include <lightning.h>
1398
1399static jit_state_t *_jit;
1400
1401typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1402
1403int main(int argc, char *argv[])
1404@{
1405 pifi fib;
1406 jit_node_t *in; @rem{/* offset of the argument */}
1407 jit_node_t *ref; @rem{/* to patch the forward reference */}
1408 jit_node_t *zero; @rem{/* to patch the forward reference */}
1409 jit_node_t *jump; @rem{/* jump to start of loop */}
1410 jit_node_t *loop; @rem{/* start of the loop */}
1411
1412 init_jit(argv[0]);
1413 _jit = jit_new_state();
1414
1415 jit_prolog ();
1416 in = jit_arg ();
1417 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1418 zero = jit_beqi (JIT_R0, 0);
1419 jit_movr (JIT_R1, JIT_R0);
1420 jit_movi (JIT_R0, 1);
1421 ref = jit_blti (JIT_R1, 2);
1422 jit_subi (JIT_R2, JIT_R2, 2);
1423 jit_movr (JIT_R1, JIT_R0);
1424
1425 loop= jit_label();
1426 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1427 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1428 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1429 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1430 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1431 jit_patch_at(jump, loop);
1432
1433 jit_patch(ref); @rem{/* patch forward jump */}
1434 jit_patch(zero); @rem{/* patch forward jump */}
1435 jit_retr (JIT_R0);
1436
1437 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1438 fib = jit_emit();
1439 jit_clear_state();
1440 printf("fib(%d) = %d\n", 36, fib(36));
1441 jit_destroy_state();
1442 finish_jit();
1443 return 0;
1444@}
1445@end example
1446
1447This code calculates the recurrence relation using iteration (a
1448@code{for} loop in high-level languages). There are no function
1449calls anymore: instead, there is a backward jump (the @code{bnei} at
1450the end of the loop).
1451
1452Note that the program must remember the address for backward jumps;
1453for forward jumps it is only required to remember the jump code,
1454and call @code{patch} for the implicit label.
1455
1456@node Reentrancy
1457@chapter Re-entrant usage of @lightning{}
1458
1459@lightning{} uses the special @code{_jit} identifier. To be able
1460to be able to use multiple jit generation states at the same
1461time, it is required to used code similar to:
1462
1463@example
1464 struct jit_state lightning;
1465 #define lightning _jit
1466@end example
1467
1468This will cause the symbol defined to @code{_jit} to be passed as
1469the first argument to the underlying @lightning{} implementation,
1470that is usually a function with an @code{_} (underscode) prefix
1471and with an argument named @code{_jit}, in the pattern:
1472
1473@example
1474 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1475 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1476@end example
1477
1478The reason for this is to use the same syntax as the initial lightning
1479implementation and to avoid needing the user to keep adding an extra
1480argument to every call, as multiple jit states generating code in
1481paralell should be very uncommon.
1482
1483@section Registers
1484@chapter Accessing the whole register file
1485
1486As mentioned earlier in this chapter, all @lightning{} back-ends are
1487guaranteed to have at least six general-purpose integer registers and
1488six floating-point registers, but many back-ends will have more.
1489
1490To access the entire register files, you can use the
1491@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1492accept a parameter that identifies the register number, which
1493must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1494and @code{JIT_F_NUM} respectively; the number need not be
1495constant. Of course, expressions like @code{JIT_R0} and
1496@code{JIT_R(0)} denote the same register, and likewise for
1497integer callee-saved, or floating-point, registers.
1498
1499@node Customizations
1500@chapter Customizations
1501
1502Frequently it is desirable to have more control over how code is
1503generated or how memory is used during jit generation or execution.
1504
1505@section Memory functions
1506To aid in complete control of memory allocation and deallocation
1507@lightning{} provides wrappers that default to standard @code{malloc},
1508@code{realloc} and @code{free}. These are loosely based on the
1509GNU GMP counterparts, with the difference that they use the same
1510prototype of the system allocation functions, that is, no @code{size}
1511for @code{free} or @code{old_size} for @code{realloc}.
1512
1513@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1514@lightning{} guarantees that memory is only allocated or released
1515using these wrapped functions, but you must note that if lightning
1516was linked to GNU binutils, malloc is probably will be called multiple
1517times from there when initializing the disassembler.
1518
1519Because @code{init_jit} may call memory functions, if you need to call
1520@code{jit_set_memory_functions}, it must be called before @code{init_jit},
1521otherwise, when calling @code{finish_jit}, a pointer allocated with the
1522previous or default wrappers will be passed.
1523@end deftypefun
1524
1525@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1526Get the current memory allocation function. Also, unlike the GNU GMP
1527counterpart, it is an error to pass @code{NULL} pointers as arguments.
1528@end deftypefun
1529
1530@section Alternate code buffer
1531To instruct @lightning{} to use an alternate code buffer it is required
1532to call @code{jit_realize} before @code{jit_emit}, and then query states
1533and customize as appropriate.
1534
1535@deftypefun void jit_realize ()
1536Must be called once, before @code{jit_emit}, to instruct @lightning{}
1537that no other @code{jit_xyz} call will be made.
1538@end deftypefun
1539
1540@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1541Returns NULL or the previous value set with @code{jit_set_code}, and
1542sets the @var{code_size} argument to an appropriate value.
1543If @code{jit_get_code} is called before @code{jit_emit}, the
1544@var{code_size} argument is set to the expected amount of bytes
1545required to generate code.
1546If @code{jit_get_code} is called after @code{jit_emit}, the
1547@var{code_size} argument is set to the exact amount of bytes used
1548by the code.
1549@end deftypefun
1550
1551@deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1552Instructs @lightning{} to output to the @var{code} argument and
1553use @var{size} as a guard to not write to invalid memory. If during
1554@code{jit_emit} @lightning{} finds out that the code would not fit
1555in @var{size} bytes, it halts code emit and returns @code{NULL}.
1556@end deftypefun
1557
1558A simple example of a loop using an alternate buffer is:
1559
1560@example
1561 jit_uint8_t *code;
1562 int *(func)(int); @rem{/* function pointer */}
1563 jit_word_t code_size;
1564 jit_word_t real_code_size;
1565 @rem{...}
1566 jit_realize(); @rem{/* ready to generate code */}
1567 jit_get_code(&code_size); @rem{/* get expected code size */}
1568 code_size = (code_size + 4095) & -4096;
1569 do (;;) @{
1570 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1571 MAP_PRIVATE | MAP_ANON, -1, 0);
1572 jit_set_code(code, code_size);
1573 if ((func = jit_emit()) == NULL) @{
1574 munmap(code, code_size);
1575 code_size += 4096;
1576 @}
1577 @} while (func == NULL);
1578 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1579@end example
1580
1581The first call to @code{jit_get_code} should return @code{NULL} and set
1582the @code{code_size} argument to the expected amount of bytes required
1583to emit code.
1584The second call to @code{jit_get_code} is after a successful call to
1585@code{jit_emit}, and will return the value previously set with
1586@code{jit_set_code} and set the @code{real_code_size} argument to the
1587exact amount of bytes used to emit the code.
1588
1589@section Alternate data buffer
1590Sometimes it may be desirable to customize how, or to prevent
1591@lightning{} from using an extra buffer for constants or debug
1592annotation. Usually when also using an alternate code buffer.
1593
1594@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1595Returns @code{NULL} or the previous value set with @code{jit_set_data},
1596and sets the @var{data_size} argument to how many bytes are required
1597for the constants data buffer, and @var{note_size} to how many bytes
1598are required to store the debug note information.
1599Note that it always preallocate one debug note entry even if
1600@code{jit_name} or @code{jit_note} are never called, but will return
1601zero in the @var{data_size} argument if no constant is required;
1602constants are only used for the @code{float} and @code{double} operations
1603that have an immediate argument, and not in all @lightning{} ports.
1604@end deftypefun
1605
1606@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1607
1608@var{data} can be NULL if disabling constants and annotations, otherwise,
1609a valid pointer must be passed. An assertion is done that the data will
1610fit in @var{size} bytes (but that is a noop if @lightning{} was built
1611with @code{-DNDEBUG}).
1612
1613@var{size} tells the space in bytes available in @var{data}.
1614
1615@var{flags} can be zero to tell to just use the alternate data buffer,
1616or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1617
1618@table @t
1619@item JIT_DISABLE_DATA
1620@cindex JIT_DISABLE_DATA
1621Instructs @lightning{} to not use a constant table, but to use an
1622alternate method to synthesize those, usually with a larger code
1623sequence using stack space to transfer the value from a GPR to a
1624FPR register.
1625
1626@item JIT_DISABLE_NOTE
1627@cindex JIT_DISABLE_NOTE
1628Instructs @lightning{} to not store file or function name, and
1629line numbers in the constant buffer.
1630@end table
1631@end deftypefun
1632
1633A simple example of a preventing usage of a data buffer is:
1634
1635@example
1636 @rem{...}
1637 jit_realize(); @rem{/* ready to generate code */}
1638 jit_get_data(NULL, NULL);
1639 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1640 @rem{...}
1641@end example
1642
1643Or to only use a data buffer, if required:
1644
1645@example
1646 jit_uint8_t *data;
1647 jit_word_t data_size;
1648 @rem{...}
1649 jit_realize(); @rem{/* ready to generate code */}
1650 jit_get_data(&data_size, NULL);
1651 if (data_size)
1652 data = malloc(data_size);
1653 else
1654 data = NULL;
1655 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1656 @rem{...}
1657 if (data)
1658 free(data);
1659 @rem{...}
1660@end example
1661
1662@node Acknowledgements
1663@chapter Acknowledgements
1664
1665As far as I know, the first general-purpose portable dynamic code
1666generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1667Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1668unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1669directly inspired @lightning{}.
1670
1671Thanks go to Ian Piumarta, who kindly accepted to release his own
1672program @sc{ccg} under the GNU General Public License, thereby allowing
1673@lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1674@sc{ccg} provides a way of dynamically assemble programs written in the
1675underlying architecture's assembly language. So it is not portable,
1676yet very interesting.
1677
1678I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1679was first developed as a tool to be used in GNU Smalltalk's dynamic
1680translator from bytecodes to native code.