Merge pull request #657 from pcercuei/update-lightrec-20220529
[pcsx_rearmed.git] / deps / lightning / doc / body.texi
CommitLineData
4a71579b
PC
1@ifnottex
2@dircategory Software development
3@direntry
4* lightning: (lightning). Library for dynamic code generation.
5@end direntry
6@end ifnottex
7
8@ifnottex
9@node Top
10@top @lightning{}
11
12@iftex
13@macro comma
14@verbatim{|,|}
15@end macro
16@end iftex
17
18@ifnottex
19@macro comma
20@verb{|,|}
21@end macro
22@end ifnottex
23
24This document describes @value{TOPIC} the @lightning{} library for
25dynamic code generation.
26
27@menu
28* Overview:: What GNU lightning is
29* Installation:: Configuring and installing GNU lightning
30* The instruction set:: The RISC instruction set used in GNU lightning
31* GNU lightning examples:: GNU lightning's examples
32* Reentrancy:: Re-entrant usage of GNU lightning
519a9ea1 33* Registers:: Accessing the whole register file
4a71579b
PC
34* Customizations:: Advanced code generation customizations
35* Acknowledgements:: Acknowledgements for GNU lightning
36@end menu
37@end ifnottex
38
39@node Overview
40@chapter Introduction to @lightning{}
41
42@iftex
43This document describes @value{TOPIC} the @lightning{} library for
44dynamic code generation.
45@end iftex
46
519a9ea1
PC
47Dynamic code generation is the generation of machine code
48at runtime. It is typically used to strip a layer of interpretation
4a71579b
PC
49by allowing compilation to occur at runtime. One of the most
50well-known applications of dynamic code generation is perhaps that
51of interpreters that compile source code to an intermediate bytecode
52form, which is then recompiled to machine code at run-time: this
53approach effectively combines the portability of bytecode
54representations with the speed of machine code. Another common
55application of dynamic code generation is in the field of hardware
56simulators and binary emulators, which can use the same techniques
519a9ea1 57to translate simulated instructions to the instructions of the
4a71579b
PC
58underlying machine.
59
60Yet other applications come to mind: for example, windowing
61@dfn{bitblt} operations, matrix manipulations, and network packet
62filters. Albeit very powerful and relatively well known within the
63compiler community, dynamic code generation techniques are rarely
64exploited to their full potential and, with the exception of the
65two applications described above, have remained curiosities because
66of their portability and functionality barriers: binary instructions
67are generated, so programs using dynamic code generation must be
68retargeted for each machine; in addition, coding a run-time code
69generator is a tedious and error-prone task more than a difficult one.
70
71@lightning{} provides a portable, fast and easily retargetable dynamic
519a9ea1 72code generation system.
4a71579b
PC
73
74To be portable, @lightning{} abstracts over current architectures'
75quirks and unorthogonalities. The interface that it exposes to is that
76of a standardized RISC architecture loosely based on the SPARC and MIPS
77chips. There are a few general-purpose registers (six, not including
78those used to receive and pass parameters between subroutines), and
79arithmetic operations involve three operands---either three registers
80or two registers and an arbitrarily sized immediate value.
81
82On one hand, this architecture is general enough that it is possible to
83generate pretty efficient code even on CISC architectures such as the
84Intel x86 or the Motorola 68k families. On the other hand, it matches
85real architectures closely enough that, most of the time, the
86compiler's constant folding pass ends up generating code which
87assembles machine instructions without further tests.
88
89@node Installation
90@chapter Configuring and installing @lightning{}
91
40a44dcb
PC
92Here we will assume that your system already has the dependencies
93necessary to build @lightning{}. For more on dependencies, see
94@lightning{}'s @file{README-hacking} file.
95
96The first thing to do to build @lightning{} is to configure the
4a71579b
PC
97program, picking the set of macros to be used on the host
98architecture; this configuration is automatically performed by
99the @file{configure} shell script; to run it, merely type:
100@example
101 ./configure
102@end example
103
104@lightning{} supports the @code{--enable-disassembler} option, that
105enables linking to GNU binutils and optionally print human readable
106disassembly of the jit code. This option can be disabled by the
107@code{--disable-disassembler} option.
108
109Another option that @file{configure} accepts is
110@code{--enable-assertions}, which enables several consistency checks in
111the run-time assemblers. These are not usually needed, so you can
112decide to simply forget about it; also remember that these consistency
113checks tend to slow down your code generator.
114
115After you've configured @lightning{}, run @file{make} as usual.
116
117@lightning{} has an extensive set of tests to validate it is working
118correctly in the build host. To test it run:
119@example
120 make check
121@end example
122
123The next important step is:
124@example
125 make install
126@end example
127
128This ends the process of installing @lightning{}.
129
130@node The instruction set
131@chapter @lightning{}'s instruction set
132
133@lightning{}'s instruction set was designed by deriving instructions
134that closely match those of most existing RISC architectures, or
135that can be easily syntesized if absent. Each instruction is composed
136of:
137@itemize @bullet
138@item
139an operation, like @code{sub} or @code{mul}
140
141@item
142most times, a register/immediate flag (@code{r} or @code{i})
143
144@item
145an unsigned modifier (@code{u}), a type identifier or two, when applicable.
146@end itemize
147
148Examples of legal mnemonics are @code{addr} (integer add, with three
149register operands) and @code{muli} (integer multiply, with two
150register operands and an immediate operand). Each instruction takes
151two or three operands; in most cases, one of them can be an immediate
152value instead of a register.
153
154Most @lightning{} integer operations are signed wordsize operations,
155with the exception of operations that convert types, or load or store
156values to/from memory. When applicable, the types and C types are as
157follow:
158
159@example
160 _c @r{signed char}
161 _uc @r{unsigned char}
162 _s @r{short}
163 _us @r{unsigned short}
164 _i @r{int}
165 _ui @r{unsigned int}
166 _l @r{long}
167 _f @r{float}
168 _d @r{double}
169@end example
170
171Most integer operations do not need a type modifier, and when loading or
172storing values to memory there is an alias to the proper operation
173using wordsize operands, that is, if ommited, the type is @r{int} on
17432-bit architectures and @r{long} on 64-bit architectures. Note
175that lightning also expects @code{sizeof(void*)} to match the wordsize.
176
177When an unsigned operation result differs from the equivalent signed
178operation, there is a the @code{_u} modifier.
179
180There are at least seven integer registers, of which six are
181general-purpose, while the last is used to contain the frame pointer
182(@code{FP}). The frame pointer can be used to allocate and access local
183variables on the stack, using the @code{allocai} or @code{allocar}
184instruction.
185
186Of the general-purpose registers, at least three are guaranteed to be
187preserved across function calls (@code{V0}, @code{V1} and
188@code{V2}) and at least three are not (@code{R0}, @code{R1} and
189@code{R2}). Six registers are not very much, but this
190restriction was forced by the need to target CISC architectures
191which, like the x86, are poor of registers; anyway, backends can
192specify the actual number of available registers with the calls
193@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
194(for callee-save registers).
195
196There are at least six floating-point registers, named @code{F0} to
197@code{F5}. These are usually caller-save and are separate from the integer
198registers on the supported architectures; on Intel architectures,
199in 32 bit mode if SSE2 is not available or use of X87 is forced,
200the register stack is mapped to a flat register file. As for the
201integer registers, the macro @code{JIT_F_NUM} yields the number of
202floating-point registers.
203
204The complete instruction set follows; as you can see, most non-memory
205operations only take integers (either signed or unsigned) as operands;
206this was done in order to reduce the instruction set, and because most
207architectures only provide word and long word operations on registers.
208There are instructions that allow operands to be extended to fit a larger
209data type, both in a signed and in an unsigned way.
210
211@table @b
212@item Binary ALU operations
213These accept three operands; the last one can be an immediate.
214@code{addx} operations must directly follow @code{addc}, and
215@code{subx} must follow @code{subc}; otherwise, results are undefined.
216Most, if not all, architectures do not support @r{float} or @r{double}
217immediate operands; lightning emulates those operations by moving the
218immediate to a temporary register and emiting the call with only
219register operands.
220@example
221addr _f _d O1 = O2 + O3
222addi _f _d O1 = O2 + O3
223addxr O1 = O2 + (O3 + carry)
224addxi O1 = O2 + (O3 + carry)
225addcr O1 = O2 + O3, set carry
226addci O1 = O2 + O3, set carry
227subr _f _d O1 = O2 - O3
228subi _f _d O1 = O2 - O3
229subxr O1 = O2 - (O3 + carry)
230subxi O1 = O2 - (O3 + carry)
231subcr O1 = O2 - O3, set carry
232subci O1 = O2 - O3, set carry
233rsbr _f _d O1 = O3 - O1
234rsbi _f _d O1 = O3 - O1
235mulr _f _d O1 = O2 * O3
236muli _f _d O1 = O2 * O3
237divr _u _f _d O1 = O2 / O3
238divi _u _f _d O1 = O2 / O3
239remr _u O1 = O2 % O3
240remi _u O1 = O2 % O3
241andr O1 = O2 & O3
242andi O1 = O2 & O3
243orr O1 = O2 | O3
244ori O1 = O2 | O3
245xorr O1 = O2 ^ O3
246xori O1 = O2 ^ O3
247lshr O1 = O2 << O3
248lshi O1 = O2 << O3
249rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
250rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
1f22b268
PC
251movzr O1 = O3 ? O1 : O2
252movnr O1 = O3 ? O2 : O1
4a71579b
PC
253@end example
254
255@item Four operand binary ALU operations
256These accept two result registers, and two operands; the last one can
257be an immediate. The first two arguments cannot be the same register.
258
259@code{qmul} stores the low word of the result in @code{O1} and the
260high word in @code{O2}. For unsigned multiplication, @code{O2} zero
261means there was no overflow. For signed multiplication, no overflow
262check is based on sign, and can be detected if @code{O2} is zero or
263minus one.
264
265@code{qdiv} stores the quotient in @code{O1} and the remainder in
266@code{O2}. It can be used as quick way to check if a division is
267exact, in which case the remainder is zero.
268
269@example
270qmulr _u O1 O2 = O3 * O4
271qmuli _u O1 O2 = O3 * O4
272qdivr _u O1 O2 = O3 / O4
273qdivi _u O1 O2 = O3 / O4
274@end example
275
276@item Unary ALU operations
277These accept two operands, both of which must be registers.
278@example
279negr _f _d O1 = -O2
280comr O1 = ~O2
281@end example
282
283These unary ALU operations are only defined for float operands.
284@example
285absr _f _d O1 = fabs(O2)
286sqrtr O1 = sqrt(O2)
287@end example
288
289Besides requiring the @code{r} modifier, there are no unary operations
290with an immediate operand.
291
292@item Compare instructions
293These accept three operands; again, the last can be an immediate.
294The last two operands are compared, and the first operand, that must be
295an integer register, is set to either 0 or 1, according to whether the
296given condition was met or not.
297
298The conditions given below are for the standard behavior of C,
299where the ``unordered'' comparison result is mapped to false.
300
301@example
302ltr _u _f _d O1 = (O2 < O3)
303lti _u _f _d O1 = (O2 < O3)
304ler _u _f _d O1 = (O2 <= O3)
305lei _u _f _d O1 = (O2 <= O3)
306gtr _u _f _d O1 = (O2 > O3)
307gti _u _f _d O1 = (O2 > O3)
308ger _u _f _d O1 = (O2 >= O3)
309gei _u _f _d O1 = (O2 >= O3)
310eqr _f _d O1 = (O2 == O3)
311eqi _f _d O1 = (O2 == O3)
312ner _f _d O1 = (O2 != O3)
313nei _f _d O1 = (O2 != O3)
314unltr _f _d O1 = !(O2 >= O3)
315unler _f _d O1 = !(O2 > O3)
316ungtr _f _d O1 = !(O2 <= O3)
317unger _f _d O1 = !(O2 < O3)
318uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
319ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
320ordr _f _d O1 = (O2 == O2) && (O3 == O3)
321unordr _f _d O1 = (O2 != O2) || (O3 != O3)
322@end example
323
324@item Transfer operations
325These accept two operands; for @code{ext} both of them must be
326registers, while @code{mov} accepts an immediate value as the second
327operand.
328
329Unlike @code{movr} and @code{movi}, the other instructions are used
330to truncate a wordsize operand to a smaller integer data type or to
331convert float data types. You can also use @code{extr} to convert an
332integer to a floating point value: the usual options are @code{extr_f}
333and @code{extr_d}.
334
335@example
336movr _f _d O1 = O2
337movi _f _d O1 = O2
338extr _c _uc _s _us _i _ui _f _d O1 = O2
339truncr _f _d O1 = trunc(O2)
340@end example
341
342In 64-bit architectures it may be required to use @code{truncr_f_i},
343@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
344the equivalent C code. Only the @code{_i} modifier is available in
34532-bit architectures.
346
347@example
348truncr_f_i = <int> O1 = <float> O2
349truncr_f_l = <long>O1 = <float> O2
350truncr_d_i = <int> O1 = <double>O2
351truncr_d_l = <long>O1 = <double>O2
352@end example
353
354The float conversion operations are @emph{destination first,
355source second}, but the order of the types is reversed. This happens
356for historical reasons.
357
358@example
359extr_f_d = <double>O1 = <float> O2
360extr_d_f = <float> O1 = <double>O2
361@end example
362
363@item Network extensions
364These accept two operands, both of which must be registers; these
365two instructions actually perform the same task, yet they are
366assigned to two mnemonics for the sake of convenience and
367completeness. As usual, the first operand is the destination and
368the second is the source.
369The @code{_ul} variant is only available in 64-bit architectures.
370@example
371htonr _us _ui _ul @r{Host-to-network (big endian) order}
372ntohr _us _ui _ul @r{Network-to-host order }
373@end example
374
40a44dcb
PC
375@code{bswapr} can be used to unconditionally byte-swap an operand.
376On little-endian architectures, @code{htonr} and @code{ntohr} resolve
377to this.
378The @code{_ul} variant is only available in 64-bit architectures.
379@example
380bswapr _us _ui _ul 01 = byte_swap(02)
381@end example
382
4a71579b
PC
383@item Load operations
384@code{ld} accepts two operands while @code{ldx} accepts three;
385in both cases, the last can be either a register or an immediate
386value. Values are extended (with or without sign, according to
387the data type specification) to fit a whole register.
388The @code{_ui} and @code{_l} types are only available in 64-bit
389architectures. For convenience, there is a version without a
390type modifier for integer or pointer operands that uses the
391appropriate wordsize call.
392@example
393ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
394ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
395ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
396ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
397@end example
398
399@item Store operations
400@code{st} accepts two operands while @code{stx} accepts three; in
401both cases, the first can be either a register or an immediate
402value. Values are sign-extended to fit a whole register.
403@example
404str _c _uc _s _us _i _ui _l _f _d *O1 = O2
405sti _c _uc _s _us _i _ui _l _f _d *O1 = O2
406stxr _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
407stxi _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
408@end example
409As for the load operations, the @code{_ui} and @code{_l} types are
410only available in 64-bit architectures, and for convenience, there
411is a version without a type modifier for integer or pointer operands
412that uses the appropriate wordsize call.
413
414@item Argument management
415These are:
416@example
417prepare (not specified)
418va_start (not specified)
419pushargr _f _d
420pushargi _f _d
421va_push (not specified)
422arg _f _d
423getarg _c _uc _s _us _i _ui _l _f _d
424va_arg _d
425putargr _f _d
426putargi _f _d
427ret (not specified)
428retr _f _d
429reti _f _d
430va_end (not specified)
431retval _c _uc _s _us _i _ui _l _f _d
432epilog (not specified)
433@end example
434As with other operations that use a type modifier, the @code{_ui} and
435@code{_l} types are only available in 64-bit architectures, but there
436are operations without a type modifier that alias to the appropriate
437integer operation with wordsize operands.
438
439@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
440while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
441A code snippet that wants to call another procedure and has to pass
442arguments must, in order: use the @code{prepare} instruction and use
443the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
444left to right order}; and use @code{finish} or @code{call} (explained below)
445to perform the actual call.
446
447@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
448arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
449@code{va_push} is required when passing a @code{va_list} to another function,
450because not all architectures expect it as a single pointer. Known case
451is DEC Alpha, that requires it as a structure passed by value.
452
453@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
454@code{arg} is different from other instruction in that it does not
455actually generate any code: instead, it is a function which returns
456a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
457a value'' means that @lightning{} code that compile these
458instructions return a value when expanded.} You should call
459@code{arg} as soon as possible, before any function call or, more
460easily, right after the @code{prolog} instructions
461(which is treated later).
462
463@code{getarg} accepts a register argument and a value returned by
464@code{arg}, and will move that argument to the register, extending
465it (with or without sign, according to the data type specification)
466to fit a whole register. These instructions are more intimately
467related to the usage of the @lightning{} instruction set in code
468that generates other code, so they will be treated more
469specifically in @ref{GNU lightning examples, , Generating code at
470run-time}.
471
472@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
473it accepts as first argument a register or immediate, and as
474second argument a value returned by @code{arg}. It allows changing,
475or restoring an argument to the current function, and is a
476construct required to implement tail call optimization. Note that
477arguments in registers are very cheap, but will be overwritten
478at any moment, including on some operations, for example division,
479that on several ports is implemented as a function call.
480
481Finally, the @code{retval} instruction fetches the return value of a
482called function in a register. The @code{retval} instruction takes a
483register argument and copies the return value of the previously called
484function in that register. A function with a return value should use
485@code{retr} or @code{reti} to put the return value in the return register
486before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
487
488@code{epilog} is an optional call, that marks the end of a function
489body. It is automatically generated by @lightning{} if starting a new
490function (what should be done after a @code{ret} call) or finishing
491generating jit.
492It is very important to note that the fact that @code{epilog} being
493optional may cause a common mistake. Consider this:
494@example
495fun1:
496 prolog
497 ...
498 ret
499fun2:
500 prolog
501@end example
502Because @code{epilog} is added when finding a new @code{prolog},
503this will cause the @code{fun2} label to actually be before the
504return from @code{fun1}. Because @lightning{} will actually
505understand it as:
506@example
507fun1:
508 prolog
509 ...
510 ret
511fun2:
512 epilog
513 prolog
514@end example
515
516You should observe a few rules when using these macros. First of
517all, if calling a varargs function, you should use the @code{ellipsis}
518call to mark the position of the ellipsis in the C prototype.
519
520You should not nest calls to @code{prepare} inside a
521@code{prepare/finish} block. Doing this will result in undefined
522behavior. Note that for functions with zero arguments you can use
523just @code{call}.
524
525@item Branch instructions
526Like @code{arg}, these also return a value which, in this case,
527is to be used to compile forward branches as explained in
528@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
529compared; of these, the last can be either a register or an immediate.
530They are:
531@example
532bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
533blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
534bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
535blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
536bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
537bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
538bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
539bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
540beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
541beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
542bner _f _d @r{if }(O2 != O3)@r{ goto }O1
543bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
544
545bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
546bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
547bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
548bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
549buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
550bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
551bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
552bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
553
554bmsr @r{if }O2 & O3@r{ goto }O1
555bmsi @r{if }O2 & O3@r{ goto }O1
556bmcr @r{if }!(O2 & O3)@r{ goto }O1
557bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
558boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
559boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
560bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
561bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
562bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
563bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
564bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
565bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
566@end example
567
568@item Jump and return operations
569These accept one argument except @code{ret} and @code{jmpi} which
570have none; the difference between @code{finishi} and @code{calli}
571is that the latter does not clean the stack from pushed parameters
572(if any) and the former must @strong{always} follow a @code{prepare}
573instruction.
574@example
575callr (not specified) @r{function call to register O1}
576calli (not specified) @r{function call to immediate O1}
577finishr (not specified) @r{function call to register O1}
578finishi (not specified) @r{function call to immediate O1}
579jmpr (not specified) @r{unconditional jump to register}
580jmpi (not specified) @r{unconditional jump}
581ret (not specified) @r{return from subroutine}
582retr _c _uc _s _us _i _ui _l _f _d
583reti _c _uc _s _us _i _ui _l _f _d
584retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
585 @r{to register}
586@end example
587
588Like branch instruction, @code{jmpi} also returns a value which is to
589be used to compile forward branches. @xref{Fibonacci, , Fibonacci
590numbers}.
591
592@item Labels
593There are 3 @lightning{} instructions to create labels:
594@example
595label (not specified) @r{simple label}
596forward (not specified) @r{forward label}
597indirect (not specified) @r{special simple label}
598@end example
599
600@code{label} is normally used as @code{patch_at} argument for backward
601jumps.
602
603@example
604 jit_node_t *jump, *label;
605label = jit_label();
606 ...
607 jump = jit_beqr(JIT_R0, JIT_R1);
608 jit_patch_at(jump, label);
609@end example
610
611@code{forward} is used to patch code generation before the actual
612position of the label is known.
613
614@example
615 jit_node_t *jump, *label;
616label = jit_forward();
617 jump = jit_beqr(JIT_R0, JIT_R1);
618 jit_patch_at(jump, label);
619 ...
620 jit_link(label);
621@end example
622
623@code{indirect} is useful when creating jump tables, and tells
624@lightning{} to not optimize out a label that is not the target of
625any jump, because an indirect jump may land where it is defined.
626
627@example
628 jit_node_t *jump, *label;
629 ...
630 jmpr(JIT_R0); @rem{/* may jump to label */}
631 ...
632label = jit_indirect();
633@end example
634
635@code{indirect} is an special case of @code{note} and @code{name}
636because it is a valid argument to @code{address}.
637
638Note that the usual idiom to write the previous example is
639@example
640 jit_node_t *addr, *jump;
641addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
642 ...
643 jmpr(JIT_R0);
644 ...
645 jit_patch(addr); @rem{/* implicit label added */}
646@end example
647
648that automatically binds the implicit label added by @code{patch} with
649the @code{movi}, but on some special conditions it is required to create
650an "unbound" label.
651
652@item Function prolog
653
654These macros are used to set up a function prolog. The @code{allocai}
655call accept a single integer argument and returns an offset value
656for stack storage access. The @code{allocar} accepts two registers
657arguments, the first is set to the offset for stack access, and the
658second is the size in bytes argument.
659
660@example
661prolog (not specified) @r{function prolog}
662allocai (not specified) @r{reserve space on the stack}
663allocar (not specified) @r{allocate space on the stack}
664@end example
665
666@code{allocai} receives the number of bytes to allocate and returns
667the offset from the frame pointer register @code{FP} to the base of
668the area.
669
670@code{allocar} receives two register arguments. The first is where
671to store the offset from the frame pointer register @code{FP} to the
672base of the area. The second argument is the size in bytes. Note
673that @code{allocar} is dynamic allocation, and special attention
674should be taken when using it. If called in a loop, every iteration
675will allocate stack space. Stack space is aligned from 8 to 64 bytes
676depending on backend requirements, even if allocating only one byte.
677It is advisable to not use it with @code{frame} and @code{tramp}; it
678should work with @code{frame} with special care to call only once,
679but is not supported if used in @code{tramp}, even if called only
680once.
681
682As a small appetizer, here is a small function that adds 1 to the input
683parameter (an @code{int}). I'm using an assembly-like syntax here which
684is a bit different from the one used when writing real subroutines with
685@lightning{}; the real syntax will be introduced in @xref{GNU lightning
686examples, , Generating code at run-time}.
687
688@example
689incr:
690 prolog
691in = arg @rem{! We have an integer argument}
692 getarg R0, in @rem{! Move it to R0}
693 addi R0, R0, 1 @rem{! Add 1}
694 retr R0 @rem{! And return the result}
695@end example
696
697And here is another function which uses the @code{printf} function from
698the standard C library to write a number in hexadecimal notation:
699
700@example
701printhex:
702 prolog
703in = arg @rem{! Same as above}
704 getarg R0, in
705 prepare @rem{! Begin call sequence for printf}
706 pushargi "%x" @rem{! Push format string}
707 ellipsis @rem{! Varargs start here}
708 pushargr R0 @rem{! Push second argument}
709 finishi printf @rem{! Call printf}
710 ret @rem{! Return to caller}
711@end example
712
519a9ea1
PC
713@item Register liveness
714
715During code generation, @lightning{} occasionally needs scratch registers
716or needs to use architecture-defined registers. For that, @lightning{}
717internally maintains register liveness information.
718
719In the following example, @code{qdivr} will need special registers like
720@code{R0} on some architectures. As @lightning{} understands that
721@code{R0} is used in the subsequent instruction, it will create
722save/restore code for @code{R0} in case.
723
724@example
725...
726qdivr V0, V1, V2, V3
727movr V3, R0
728...
729@end example
730
731The same is not true in the example that follows. Here, @code{R0} is
732not alive after the division operation because @code{R0} is neither an
733argument register nor a callee-save register. Thus, no save/restore
734code for @code{R0} will be created in case.
735
736@example
737...
738qdivr V0, V1, V2, V3
739jmpr R1
740...
741@end example
742
743The @code{live} instruction can be used to mark a register as live after
744it as in the following example. Here, @code{R0} will be preserved
745across the division.
746
747@example
748...
749qdivr V0, V1, V2, V3
750live R0
751jmpr R1
752...
753@end example
754
755The @code{live} instruction is useful at code entry and exit points,
756like after and before a @code{callr} instruction.
757
4a71579b
PC
758@item Trampolines, continuations and tail call optimization
759
760Frequently it is required to generate jit code that must jump to
761code generated later, possibly from another @code{jit_context_t}.
762These require compatible stack frames.
763
764@lightning{} provides two primitives from where trampolines,
765continuations and tail call optimization can be implemented.
766
767@example
768frame (not specified) @r{create stack frame}
769tramp (not specified) @r{assume stack frame}
770@end example
771
772@code{frame} receives an integer argument@footnote{It is not
773automatically computed because it does not know about the
774requirement of later generated code.} that defines the size in
775bytes for the stack frame of the current, @code{C} callable,
776jit function. To calculate this value, a good formula is maximum
777number of arguments to any called native function times
778eight@footnote{Times eight so that it works for double arguments.
779And would not need conditionals for ports that pass arguments in
780the stack.}, plus the sum of the arguments to any call to
781@code{jit_allocai}. @lightning{} automatically adjusts this value
782for any backend specific stack memory it may need, or any
783alignment constraint.
784
785@code{frame} also instructs @lightning{} to save all callee
786save registers in the prolog and reload in the epilog.
787
788@example
789main: @rem{! jit entry point}
790 prolog @rem{! function prolog}
791 frame 256 @rem{! save all callee save registers and}
792 @rem{! reserve at least 256 bytes in stack}
793main_loop:
794 ...
795 jmpi handler @rem{! jumps to external code}
796 ...
797 ret @rem{! return to the caller}
798@end example
799
800@code{tramp} differs from @code{frame} only that a prolog and epilog
801will not be generated. Note that @code{prolog} must still be used.
802The code under @code{tramp} must be ready to be entered with a jump
803at the prolog position, and instead of a return, it must end with
804a non conditional jump. @code{tramp} exists solely for the fact
805that it allows optimizing out prolog and epilog code that would
806never be executed.
807
808@example
809handler: @rem{! handler entry point}
810 prolog @rem{! function prolog}
811 tramp 256 @rem{! assumes all callee save registers}
812 @rem{! are saved and there is at least}
813 @rem{! 256 bytes in stack}
814 ...
815 jmpi main_loop @rem{! return to the main loop}
816@end example
817
818@lightning{} only supports Tail Call Optimization using the
819@code{tramp} construct. Any other way is not guaranteed to
820work on all ports.
821
822An example of a simple (recursive) tail call optimization:
823
824@example
825factorial: @rem{! Entry point of the factorial function}
826 prolog
827in = arg @rem{! Receive an integer argument}
828 getarg R0, in @rem{! Move argument to RO}
829 prepare
830 pushargi 1 @rem{! This is the accumulator}
831 pushargr R0 @rem{! This is the argument}
832 finishi fact @rem{! Call the tail call optimized function}
833 retval R0 @rem{! Fetch the result}
834 retr R0 @rem{! Return it}
835 epilog @rem{! Epilog *before* label before prolog}
836
837fact: @rem{! Entry point of the helper function}
838 prolog
839 frame 16 @rem{! Reserve 16 bytes in the stack}
840fact_entry: @rem{! This is the tail call entry point}
841ac = arg @rem{! The accumulator is the first argument}
842in = arg @rem{! The factorial argument}
843 getarg R0, ac @rem{! Move the accumulator to R0}
844 getarg R1, in @rem{! Move the argument to R1}
845 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
846 mulr R0, R0, R1 @rem{! accumulator *= argument}
847 putargr R0, ac @rem{! Update the accumulator}
848 subi R1, R1, 1 @rem{! argument -= 1}
849 putargr R1, in @rem{! Update the argument}
850 jmpi fact_entry @rem{! Tail Call Optimize it!}
851fact_out:
852 retr R0 @rem{! Return the accumulator}
853@end example
854
855@item Predicates
856@example
857forward_p (not specified) @r{forward label predicate}
858indirect_p (not specified) @r{indirect label predicate}
859target_p (not specified) @r{used label predicate}
860arg_register_p (not specified) @r{argument kind predicate}
861callee_save_p (not specified) @r{callee save predicate}
862pointer_p (not specified) @r{pointer predicate}
863@end example
864
865@code{forward_p} expects a @code{jit_node_t*} argument, and
866returns non zero if it is a forward label reference, that is,
867a label returned by @code{forward}, that still needs a
868@code{link} call.
869
870@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
871non zero if it is an indirect label reference, that is, a label that
872was returned by @code{indirect}.
873
874@code{target_p} expects a @code{jit_node_t*} argument, that is any
875kind of label, and will return non zero if there is at least one
876jump or move referencing it.
877
878@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
879have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
880will return non zero if the argument lives in a register. This call
881is useful to know the live range of register arguments, as those
882are very fast to read and write, but have volatile values.
883
884@code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
885@code{JIT_Fn}, and will return non zero if the register is callee
886save. This call is useful because on several ports, the @code{JIT_Rn}
887and @code{JIT_Fn} registers are actually callee save; no need
888to save and load the values when making function calls.
889
890@code{pointer_p} expects a pointer argument, and will return non
891zero if the pointer is inside the generated jit code. Must be
892called after @code{jit_emit} and before @code{jit_destroy_state}.
893@end table
894
895@node GNU lightning examples
896@chapter Generating code at run-time
897
898To use @lightning{}, you should include the @file{lightning.h} file that
899is put in your include directory by the @samp{make install} command.
900
901Each of the instructions above translates to a macro or function call.
902All you have to do is prepend @code{jit_} (lowercase) to opcode names
903and @code{JIT_} (uppercase) to register names. Of course, parameters
904are to be put between parentheses.
905
906This small tutorial presents three examples:
907
908@iftex
909@itemize @bullet
910@item
911The @code{incr} function found in @ref{The instruction set, ,
912@lightning{}'s instruction set}:
913
914@item
915A simple function call to @code{printf}
916
917@item
918An RPN calculator.
919
920@item
921Fibonacci numbers
922@end itemize
923@end iftex
924@ifnottex
925@menu
926* incr:: A function which increments a number by one
927* printf:: A simple function call to printf
928* RPN calculator:: A more complex example, an RPN calculator
929* Fibonacci:: Calculating Fibonacci numbers
930@end menu
931@end ifnottex
932
933@node incr
934@section A function which increments a number by one
935
936Let's see how to create and use the sample @code{incr} function created
937in @ref{The instruction set, , @lightning{}'s instruction set}:
938
939@example
940#include <stdio.h>
941#include <lightning.h>
942
943static jit_state_t *_jit;
944
945typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
946
947int main(int argc, char *argv[])
948@{
949 jit_node_t *in;
950 pifi incr;
951
952 init_jit(argv[0]);
953 _jit = jit_new_state();
954
955 jit_prolog(); @rem{/* @t{ prolog } */}
956 in = jit_arg(); @rem{/* @t{ in = arg } */}
957 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
958 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
959 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
960
961 incr = jit_emit();
962 jit_clear_state();
963
964 @rem{/* call the generated code@comma{} passing 5 as an argument */}
965 printf("%d + 1 = %d\n", 5, incr(5));
966
967 jit_destroy_state();
968 finish_jit();
969 return 0;
970@}
971@end example
972
973Let's examine the code line by line (well, almost@dots{}):
974
975@table @t
976@item #include <lightning.h>
977You already know about this. It defines all of @lightning{}'s macros.
978
979@item static jit_state_t *_jit;
980You might wonder about what is @code{jit_state_t}. It is a structure
981that stores jit code generation information. The name @code{_jit} is
982special, because since multiple jit generators can run at the same
983time, you must either @r{#define _jit my_jit_state} or name it
984@code{_jit}.
985
986@item typedef int (*pifi)(int);
987Just a handy typedef for a pointer to a function that takes an
988@code{int} and returns another.
989
990@item jit_node_t *in;
991Declares a variable to hold an identifier for a function argument. It
992is an opaque pointer, that will hold the return of a call to @code{arg}
993and be used as argument to @code{getarg}.
994
995@item pifi incr;
996Declares a function pointer variable to a function that receives an
997@code{int} and returns an @code{int}.
998
999@item init_jit(argv[0]);
1000You must call this function before creating a @code{jit_state_t}
1001object. This function does global state initialization, and may need
1002to detect CPU or Operating System features. It receives a string
1003argument that is later used to read symbols from a shared object using
1004GNU binutils if disassembly was enabled at configure time. If no
1005disassembly will be performed a NULL pointer can be used as argument.
1006
1007@item _jit = jit_new_state();
1008This call initializes a @lightning{} jit state.
1009
1010@item jit_prolog();
1011Ok, so we start generating code for our beloved function@dots{}
1012
1013@item in = jit_arg();
1014@itemx jit_getarg(JIT_R0, in);
1015We retrieve the first (and only) argument, an integer, and store it
1016into the general-purpose register @code{R0}.
1017
1018@item jit_addi(JIT_R0, JIT_R0, 1);
1019We add one to the content of the register.
1020
1021@item jit_retr(JIT_R0);
1022This instruction generates a standard function epilog that returns
1023the contents of the @code{R0} register.
1024
1025@item incr = jit_emit();
1026This instruction is very important. It actually translates the
1027@lightning{} macros used before to machine code, flushes the generated
1028code area out of the processor's instruction cache and return a
1029pointer to the start of the code.
1030
1031@item jit_clear_state();
1032This call cleanups any data not required for jit execution. Note
1033that it must be called after any call to @code{jit_print} or
1034@code{jit_address}, as this call destroy the @lightning{}
1035intermediate representation.
1036
1037@item printf("%d + 1 = %d", 5, incr(5));
1038Calling our function is this simple---it is not distinguishable from
1039a normal C function call, the only difference being that @code{incr}
1040is a variable.
1041
1042@item jit_destroy_state();
1043Releases all memory associated with the jit context. It should be
1044called after known the jit will no longer be called.
1045
1046@item finish_jit();
1047This call cleanups any global state hold by @lightning{}, and is
1048advisable to call it once jit code will no longer be generated.
1049@end table
1050
1051@lightning{} abstracts two phases of dynamic code generation: selecting
1052instructions that map the standard representation, and emitting binary
1053code for these instructions. The client program has the responsibility
1054of describing the code to be generated using the standard @lightning{}
1055instruction set.
1056
1057Let's examine the code generated for @code{incr} on the SPARC and x86_64
1058architecture (on the right is the code that an assembly-language
1059programmer would write):
1060
1061@table @b
1062@item SPARC
1063@example
1064 save %sp, -112, %sp
1065 mov %i0, %g2 retl
1066 inc %g2 inc %o0
1067 mov %g2, %i0
519a9ea1
PC
1068 restore
1069 retl
1070 nop
4a71579b
PC
1071@end example
1072In this case, @lightning{} introduces overhead to create a register
1073window (not knowing that the procedure is a leaf procedure) and to
1074move the argument to the general purpose register @code{R0} (which
1075maps to @code{%g2} on the SPARC).
1076@end table
1077
1078@table @b
1079@item x86_64
1080@example
1081 sub $0x30,%rsp
1082 mov %rbp,(%rsp)
1083 mov %rsp,%rbp
1084 sub $0x18,%rsp
1085 mov %rdi,%rax mov %rdi, %rax
1086 add $0x1,%rax inc %rax
1087 mov %rbp,%rsp
1088 mov (%rsp),%rbp
1089 add $0x30,%rsp
1090 retq retq
1091@end example
1092In this case, the main overhead is due to the function's prolog and
1093epilog, and stack alignment after reserving stack space for word
1094to/from float conversions or moving data from/to x87 to/from SSE.
1095Note that besides allocating space to save callee saved registers,
1096no registers are saved/restored because @lightning{} notices those
1097registers are not modified. There is currently no logic to detect
1098if it needs to allocate stack space for type conversions neither
1099proper leaf function detection, but these are subject to change
1100(FIXME).
1101@end table
1102
1103@node printf
1104@section A simple function call to @code{printf}
1105
1106Again, here is the code for the example:
1107
1108@example
1109#include <stdio.h>
1110#include <lightning.h>
1111
1112static jit_state_t *_jit;
1113
1114typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1115
1116int main(int argc, char *argv[])
1117@{
1118 pvfi myFunction; @rem{/* ptr to generated code */}
1119 jit_node_t *start, *end; @rem{/* a couple of labels */}
1120 jit_node_t *in; @rem{/* to get the argument */}
1121
1122 init_jit(argv[0]);
1123 _jit = jit_new_state();
1124
1125 start = jit_note(__FILE__, __LINE__);
1126 jit_prolog();
1127 in = jit_arg();
1128 jit_getarg(JIT_R1, in);
1129 jit_prepare();
1130 jit_pushargi((jit_word_t)"generated %d bytes\n");
1131 jit_ellipsis();
1132 jit_pushargr(JIT_R1);
1133 jit_finishi(printf);
1134 jit_ret();
1135 jit_epilog();
1136 end = jit_note(__FILE__, __LINE__);
1137
1138 myFunction = jit_emit();
1139
1140 @rem{/* call the generated code@comma{} passing its size as argument */}
1141 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1142 jit_clear_state();
1143
1144 jit_disassemble();
1145
1146 jit_destroy_state();
1147 finish_jit();
1148 return 0;
1149@}
1150@end example
1151
1152The function shows how many bytes were generated. Most of the code
1153is not very interesting, as it resembles very closely the program
1154presented in @ref{incr, , A function which increments a number by one}.
1155
1156For this reason, we're going to concentrate on just a few statements.
1157
1158@table @t
1159@item start = jit_note(__FILE__, __LINE__);
1160@itemx @r{@dots{}}
1161@itemx end = jit_note(__FILE__, __LINE__);
1162These two instruction call the @code{jit_note} macro, which creates
1163a note in the jit code; arguments to @code{jit_note} usually are a
1164filename string and line number integer, but using NULL for the
1165string argument is perfectly valid if only need to create a simple
1166marker in the code.
1167
1168@item jit_ellipsis();
1169@code{ellipsis} usually is only required if calling varargs functions
1170with double arguments, but it is a good practice to properly describe
1171the @r{@dots{}} in the call sequence.
1172
1173@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1174Note the use of the @code{(jit_word_t)} cast, that is used only
1175to avoid a compiler warning, due to using a pointer where a
1176wordsize integer type was expected.
1177
1178@item jit_prepare();
1179@itemx @r{@dots{}}
1180@itemx jit_finishi(printf);
1181Once the arguments to @code{printf} have been pushed, what means
1182moving them to stack or register arguments, the @code{printf}
1183function is called and the stack cleaned. Note how @lightning{}
1184abstracts the differences between different architectures and
1185ABI's -- the client program does not know how parameter passing
1186works on the host architecture.
1187
1188@item jit_epilog();
1189Usually it is not required to call @code{epilog}, but because it
1190is implicitly called when noticing the end of a function, if the
1191@code{end} variable was set with a @code{note} call after the
1192@code{ret}, it would not consider the function epilog.
1193
1194@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1195This calls the generate jit function passing as argument the offset
1196difference from the @code{start} and @code{end} notes. The @code{address}
1197call must be done after the @code{emit} call or either a fatal error
1198will happen (if @lightning{} is built with assertions enable) or an
1199undefined value will be returned.
1200
1201@item jit_clear_state();
1202Note that @code{jit_clear_state} was called after executing jit in
1203this example. It was done because it must be called after any call
1204to @code{jit_address} or @code{jit_print}.
1205
1206@item jit_disassemble();
1207@code{disassemble} will dump the generated code to standard output,
1208unless @lightning{} was built with the disassembler disabled, in which
1209case no output will be shown.
1210@end table
1211
1212@node RPN calculator
1213@section A more complex example, an RPN calculator
1214
1215We create a small stack-based RPN calculator which applies a series
1216of operators to a given parameter and to other numeric operands.
1217Unlike previous examples, the code generator is fully parameterized
1218and is able to compile different formulas to different functions.
1219Here is the code for the expression compiler; a sample usage will
1220follow.
1221
1222Since @lightning{} does not provide push/pop instruction, this
1223example uses a stack-allocated area to store the data. Such an
1224area can be allocated using the macro @code{allocai}, which
1225receives the number of bytes to allocate and returns the offset
1226from the frame pointer register @code{FP} to the base of the
1227area.
1228
1229Usually, you will use the @code{ldxi} and @code{stxi} instruction
1230to access stack-allocated variables. However, it is possible to
1231use operations such as @code{add} to compute the address of the
1232variables, and pass the address around.
1233
1234@example
1235#include <stdio.h>
1236#include <lightning.h>
1237
1238typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1239
1240static jit_state_t *_jit;
1241
1242void stack_push(int reg, int *sp)
1243@{
1244 jit_stxi_i (*sp, JIT_FP, reg);
1245 *sp += sizeof (int);
1246@}
1247
1248void stack_pop(int reg, int *sp)
1249@{
1250 *sp -= sizeof (int);
1251 jit_ldxi_i (reg, JIT_FP, *sp);
1252@}
1253
1254jit_node_t *compile_rpn(char *expr)
1255@{
1256 jit_node_t *in, *fn;
1257 int stack_base, stack_ptr;
1258
1259 fn = jit_note(NULL, 0);
1260 jit_prolog();
1261 in = jit_arg();
1262 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1263
1264 jit_getarg_i(JIT_R2, in);
1265
1266 while (*expr) @{
1267 char buf[32];
1268 int n;
1269 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1270 expr += n - 1;
1271 stack_push(JIT_R0, &stack_ptr);
1272 jit_movi(JIT_R0, atoi(buf));
1273 @} else if (*expr == 'x') @{
1274 stack_push(JIT_R0, &stack_ptr);
1275 jit_movr(JIT_R0, JIT_R2);
1276 @} else if (*expr == '+') @{
1277 stack_pop(JIT_R1, &stack_ptr);
1278 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1279 @} else if (*expr == '-') @{
1280 stack_pop(JIT_R1, &stack_ptr);
1281 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1282 @} else if (*expr == '*') @{
1283 stack_pop(JIT_R1, &stack_ptr);
1284 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1285 @} else if (*expr == '/') @{
1286 stack_pop(JIT_R1, &stack_ptr);
1287 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1288 @} else @{
1289 fprintf(stderr, "cannot compile: %s\n", expr);
1290 abort();
1291 @}
1292 ++expr;
1293 @}
1294 jit_retr(JIT_R0);
1295 jit_epilog();
1296 return fn;
1297@}
1298@end example
1299
1300The principle on which the calculator is based is easy: the stack top
1301is held in R0, while the remaining items of the stack are held in the
1302memory area that we allocate with @code{allocai}. Compiling a numeric
1303operand or the argument @code{x} pushes the old stack top onto the
1304stack and moves the operand into R0; compiling an operator pops the
1305second operand off the stack into R1, and compiles the operation so
1306that the result goes into R0, thus becoming the new stack top.
1307
1308This example allocates a fixed area for 32 @code{int}s. This is not
1309a problem when the function is a leaf like in this case; in a full-blown
1310compiler you will want to analyze the input and determine the number
1311of needed stack slots---a very simple example of register allocation.
1312The area is then managed like a stack using @code{stack_push} and
1313@code{stack_pop}.
1314
1315Source code for the client (which lies in the same source file) follows:
1316
1317@example
1318int main(int argc, char *argv[])
1319@{
1320 jit_node_t *nc, *nf;
1321 pifi c2f, f2c;
1322 int i;
1323
1324 init_jit(argv[0]);
1325 _jit = jit_new_state();
1326
1327 nc = compile_rpn("32x9*5/+");
1328 nf = compile_rpn("x32-5*9/");
1329 (void)jit_emit();
1330 c2f = (pifi)jit_address(nc);
1331 f2c = (pifi)jit_address(nf);
1332 jit_clear_state();
1333
1334 printf("\nC:");
1335 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1336 printf("\nF:");
1337 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1338 printf("\n");
1339
1340 printf("\nF:");
1341 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1342 printf("\nC:");
1343 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1344 printf("\n");
1345
1346 jit_destroy_state();
1347 finish_jit();
1348 return 0;
1349@}
1350@end example
1351
1352The client displays a conversion table between Celsius and Fahrenheit
1353degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1354formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1355respectively.
1356
1357Providing the formula as an argument to @code{compile_rpn} effectively
1358parameterizes code generation, making it possible to use the same code
1359to compile different functions; this is what makes dynamic code
1360generation so powerful.
1361
1362@node Fibonacci
1363@section Fibonacci numbers
1364
1365The code in this section calculates the Fibonacci sequence. That is
1366modeled by the recurrence relation:
1367@display
1368 f(0) = 0
1369 f(1) = f(2) = 1
1370 f(n) = f(n-1) + f(n-2)
1371@end display
1372
1373The purpose of this example is to introduce branches. There are two
1374kind of branches: backward branches and forward branches. We'll
1375present the calculation in a recursive and iterative form; the
1376former only uses forward branches, while the latter uses both.
1377
1378@example
1379#include <stdio.h>
1380#include <lightning.h>
1381
1382static jit_state_t *_jit;
1383
1384typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1385
1386int main(int argc, char *argv[])
1387@{
1388 pifi fib;
1389 jit_node_t *label;
1390 jit_node_t *call;
1391 jit_node_t *in; @rem{/* offset of the argument */}
1392 jit_node_t *ref; @rem{/* to patch the forward reference */}
1393 jit_node_t *zero; @rem{/* to patch the forward reference */}
1394
1395 init_jit(argv[0]);
1396 _jit = jit_new_state();
1397
1398 label = jit_label();
1399 jit_prolog ();
1400 in = jit_arg ();
1401 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1402 zero = jit_beqi (JIT_R0, 0);
1403 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1404 jit_movi (JIT_R0, 1);
1405 ref = jit_blei (JIT_V0, 2);
1406 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1407 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1408 jit_prepare();
1409 jit_pushargr(JIT_V1);
1410 call = jit_finishi(NULL);
1411 jit_patch_at(call, label);
1412 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1413 jit_prepare();
1414 jit_pushargr(JIT_V2);
1415 call = jit_finishi(NULL);
1416 jit_patch_at(call, label);
1417 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1418 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1419
1420 jit_patch(ref); @rem{/* patch jump */}
1421 jit_patch(zero); @rem{/* patch jump */}
1422 jit_retr(JIT_R0);
1423
1424 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1425 fib = jit_emit();
1426 jit_clear_state();
1427 printf("fib(%d) = %d\n", 32, fib(32));
1428 jit_destroy_state();
1429 finish_jit();
1430 return 0;
1431@}
1432@end example
1433
1434As said above, this is the first example of dynamically compiling
1435branches. Branch instructions have two operands containing the
1436values to be compared, and return a @code{jit_note_t *} object
1437to be patched.
1438
1439Because labels final address are only known after calling @code{emit},
1440it is required to call @code{patch} or @code{patch_at}, what does
1441tell @lightning{} that the target to patch is actually a pointer to
1442a @code{jit_node_t *} object, otherwise, it would assume that is
1443a pointer to a C function. Note that conditional branches do not
1444receive a label argument, so they must be patched.
1445
1446You need to call @code{patch_at} on the return of value @code{calli},
1447@code{finishi}, and @code{calli} if it is actually referencing a label
1448in the jit code. All branch instructions do not receive a label
1449argument. Note that @code{movi} is an special case, and patching it
1450is usually done to get the final address of a label, usually to later
1451call @code{jmpr}.
1452
1453Now, here is the iterative version:
1454
1455@example
1456#include <stdio.h>
1457#include <lightning.h>
1458
1459static jit_state_t *_jit;
1460
1461typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1462
1463int main(int argc, char *argv[])
1464@{
1465 pifi fib;
1466 jit_node_t *in; @rem{/* offset of the argument */}
1467 jit_node_t *ref; @rem{/* to patch the forward reference */}
1468 jit_node_t *zero; @rem{/* to patch the forward reference */}
1469 jit_node_t *jump; @rem{/* jump to start of loop */}
1470 jit_node_t *loop; @rem{/* start of the loop */}
1471
1472 init_jit(argv[0]);
1473 _jit = jit_new_state();
1474
1475 jit_prolog ();
1476 in = jit_arg ();
1477 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1478 zero = jit_beqi (JIT_R0, 0);
1479 jit_movr (JIT_R1, JIT_R0);
1480 jit_movi (JIT_R0, 1);
1481 ref = jit_blti (JIT_R1, 2);
1482 jit_subi (JIT_R2, JIT_R2, 2);
1483 jit_movr (JIT_R1, JIT_R0);
1484
1485 loop= jit_label();
1486 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1487 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1488 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1489 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1490 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1491 jit_patch_at(jump, loop);
1492
1493 jit_patch(ref); @rem{/* patch forward jump */}
1494 jit_patch(zero); @rem{/* patch forward jump */}
1495 jit_retr (JIT_R0);
1496
1497 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1498 fib = jit_emit();
1499 jit_clear_state();
1500 printf("fib(%d) = %d\n", 36, fib(36));
1501 jit_destroy_state();
1502 finish_jit();
1503 return 0;
1504@}
1505@end example
1506
1507This code calculates the recurrence relation using iteration (a
1508@code{for} loop in high-level languages). There are no function
1509calls anymore: instead, there is a backward jump (the @code{bnei} at
1510the end of the loop).
1511
1512Note that the program must remember the address for backward jumps;
1513for forward jumps it is only required to remember the jump code,
1514and call @code{patch} for the implicit label.
1515
1516@node Reentrancy
1517@chapter Re-entrant usage of @lightning{}
1518
1519@lightning{} uses the special @code{_jit} identifier. To be able
1520to be able to use multiple jit generation states at the same
1521time, it is required to used code similar to:
1522
1523@example
1524 struct jit_state lightning;
1525 #define lightning _jit
1526@end example
1527
1528This will cause the symbol defined to @code{_jit} to be passed as
1529the first argument to the underlying @lightning{} implementation,
1530that is usually a function with an @code{_} (underscode) prefix
1531and with an argument named @code{_jit}, in the pattern:
1532
1533@example
1534 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1535 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1536@end example
1537
1538The reason for this is to use the same syntax as the initial lightning
1539implementation and to avoid needing the user to keep adding an extra
1540argument to every call, as multiple jit states generating code in
1541paralell should be very uncommon.
1542
519a9ea1 1543@node Registers
4a71579b
PC
1544@chapter Accessing the whole register file
1545
1546As mentioned earlier in this chapter, all @lightning{} back-ends are
1547guaranteed to have at least six general-purpose integer registers and
1548six floating-point registers, but many back-ends will have more.
1549
1550To access the entire register files, you can use the
1551@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1552accept a parameter that identifies the register number, which
1553must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1554and @code{JIT_F_NUM} respectively; the number need not be
1555constant. Of course, expressions like @code{JIT_R0} and
1556@code{JIT_R(0)} denote the same register, and likewise for
1557integer callee-saved, or floating-point, registers.
1558
519a9ea1
PC
1559@section Scratch registers
1560
1561For operations, @lightning{} does not support directly, like storing
1562a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1563acquire and release a scratch register as in the following pattern:
1564
1565@example
1566 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1567 jit_movi (reg, immediate);
1568 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1569 jit_unget_reg (reg);
1570@end example
1571
1572As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1573reloads but don't follow branches, the code between both must be in
1574the same basic block and must not contain any branches as in the
1575following (bad) example.
1576
1577@example
1578 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1579 jit_ldxi (reg, JIT_V0, offset);
1580 jump = jit_bnei (reg, V0);
1581 jit_movr (JIT_V1, reg);
1582 jit_patch (jump);
1583 jit_unget_reg (reg);
1584@end example
1585
4a71579b
PC
1586@node Customizations
1587@chapter Customizations
1588
1589Frequently it is desirable to have more control over how code is
1590generated or how memory is used during jit generation or execution.
1591
1592@section Memory functions
1593To aid in complete control of memory allocation and deallocation
1594@lightning{} provides wrappers that default to standard @code{malloc},
1595@code{realloc} and @code{free}. These are loosely based on the
1596GNU GMP counterparts, with the difference that they use the same
1597prototype of the system allocation functions, that is, no @code{size}
1598for @code{free} or @code{old_size} for @code{realloc}.
1599
1600@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1601@lightning{} guarantees that memory is only allocated or released
1602using these wrapped functions, but you must note that if lightning
1603was linked to GNU binutils, malloc is probably will be called multiple
1604times from there when initializing the disassembler.
1605
1606Because @code{init_jit} may call memory functions, if you need to call
1607@code{jit_set_memory_functions}, it must be called before @code{init_jit},
1608otherwise, when calling @code{finish_jit}, a pointer allocated with the
1609previous or default wrappers will be passed.
1610@end deftypefun
1611
1612@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1613Get the current memory allocation function. Also, unlike the GNU GMP
1614counterpart, it is an error to pass @code{NULL} pointers as arguments.
1615@end deftypefun
1616
1617@section Alternate code buffer
1618To instruct @lightning{} to use an alternate code buffer it is required
1619to call @code{jit_realize} before @code{jit_emit}, and then query states
1620and customize as appropriate.
1621
1622@deftypefun void jit_realize ()
1623Must be called once, before @code{jit_emit}, to instruct @lightning{}
1624that no other @code{jit_xyz} call will be made.
1625@end deftypefun
1626
1627@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1628Returns NULL or the previous value set with @code{jit_set_code}, and
1629sets the @var{code_size} argument to an appropriate value.
1630If @code{jit_get_code} is called before @code{jit_emit}, the
1631@var{code_size} argument is set to the expected amount of bytes
1632required to generate code.
1633If @code{jit_get_code} is called after @code{jit_emit}, the
1634@var{code_size} argument is set to the exact amount of bytes used
1635by the code.
1636@end deftypefun
1637
1638@deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1639Instructs @lightning{} to output to the @var{code} argument and
1640use @var{size} as a guard to not write to invalid memory. If during
1641@code{jit_emit} @lightning{} finds out that the code would not fit
1642in @var{size} bytes, it halts code emit and returns @code{NULL}.
1643@end deftypefun
1644
1645A simple example of a loop using an alternate buffer is:
1646
1647@example
1648 jit_uint8_t *code;
1649 int *(func)(int); @rem{/* function pointer */}
1650 jit_word_t code_size;
1651 jit_word_t real_code_size;
1652 @rem{...}
1653 jit_realize(); @rem{/* ready to generate code */}
1654 jit_get_code(&code_size); @rem{/* get expected code size */}
1655 code_size = (code_size + 4095) & -4096;
1656 do (;;) @{
1657 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1658 MAP_PRIVATE | MAP_ANON, -1, 0);
1659 jit_set_code(code, code_size);
1660 if ((func = jit_emit()) == NULL) @{
1661 munmap(code, code_size);
1662 code_size += 4096;
1663 @}
1664 @} while (func == NULL);
1665 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1666@end example
1667
1668The first call to @code{jit_get_code} should return @code{NULL} and set
1669the @code{code_size} argument to the expected amount of bytes required
1670to emit code.
1671The second call to @code{jit_get_code} is after a successful call to
1672@code{jit_emit}, and will return the value previously set with
1673@code{jit_set_code} and set the @code{real_code_size} argument to the
1674exact amount of bytes used to emit the code.
1675
1676@section Alternate data buffer
1677Sometimes it may be desirable to customize how, or to prevent
1678@lightning{} from using an extra buffer for constants or debug
1679annotation. Usually when also using an alternate code buffer.
1680
1681@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1682Returns @code{NULL} or the previous value set with @code{jit_set_data},
1683and sets the @var{data_size} argument to how many bytes are required
1684for the constants data buffer, and @var{note_size} to how many bytes
1685are required to store the debug note information.
1686Note that it always preallocate one debug note entry even if
1687@code{jit_name} or @code{jit_note} are never called, but will return
1688zero in the @var{data_size} argument if no constant is required;
1689constants are only used for the @code{float} and @code{double} operations
1690that have an immediate argument, and not in all @lightning{} ports.
1691@end deftypefun
1692
1693@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1694
1695@var{data} can be NULL if disabling constants and annotations, otherwise,
1696a valid pointer must be passed. An assertion is done that the data will
1697fit in @var{size} bytes (but that is a noop if @lightning{} was built
1698with @code{-DNDEBUG}).
1699
1700@var{size} tells the space in bytes available in @var{data}.
1701
1702@var{flags} can be zero to tell to just use the alternate data buffer,
1703or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1704
1705@table @t
1706@item JIT_DISABLE_DATA
1707@cindex JIT_DISABLE_DATA
1708Instructs @lightning{} to not use a constant table, but to use an
1709alternate method to synthesize those, usually with a larger code
1710sequence using stack space to transfer the value from a GPR to a
1711FPR register.
1712
1713@item JIT_DISABLE_NOTE
1714@cindex JIT_DISABLE_NOTE
1715Instructs @lightning{} to not store file or function name, and
1716line numbers in the constant buffer.
1717@end table
1718@end deftypefun
1719
1720A simple example of a preventing usage of a data buffer is:
1721
1722@example
1723 @rem{...}
1724 jit_realize(); @rem{/* ready to generate code */}
1725 jit_get_data(NULL, NULL);
1726 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1727 @rem{...}
1728@end example
1729
1730Or to only use a data buffer, if required:
1731
1732@example
1733 jit_uint8_t *data;
1734 jit_word_t data_size;
1735 @rem{...}
1736 jit_realize(); @rem{/* ready to generate code */}
1737 jit_get_data(&data_size, NULL);
1738 if (data_size)
1739 data = malloc(data_size);
1740 else
1741 data = NULL;
1742 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1743 @rem{...}
1744 if (data)
1745 free(data);
1746 @rem{...}
1747@end example
1748
1749@node Acknowledgements
1750@chapter Acknowledgements
1751
1752As far as I know, the first general-purpose portable dynamic code
1753generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1754Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1755unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1756directly inspired @lightning{}.
1757
1758Thanks go to Ian Piumarta, who kindly accepted to release his own
1759program @sc{ccg} under the GNU General Public License, thereby allowing
1760@lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1761@sc{ccg} provides a way of dynamically assemble programs written in the
1762underlying architecture's assembly language. So it is not portable,
1763yet very interesting.
1764
1765I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1766was first developed as a tool to be used in GNU Smalltalk's dynamic
1767translator from bytecodes to native code.