Update lightrec 20220910 (#686)
[pcsx_rearmed.git] / deps / lightning / doc / body.texi
CommitLineData
4a71579b
PC
1@ifnottex
2@dircategory Software development
3@direntry
4* lightning: (lightning). Library for dynamic code generation.
5@end direntry
6@end ifnottex
7
8@ifnottex
9@node Top
10@top @lightning{}
11
12@iftex
13@macro comma
14@verbatim{|,|}
15@end macro
16@end iftex
17
18@ifnottex
19@macro comma
20@verb{|,|}
21@end macro
22@end ifnottex
23
24This document describes @value{TOPIC} the @lightning{} library for
25dynamic code generation.
26
27@menu
28* Overview:: What GNU lightning is
29* Installation:: Configuring and installing GNU lightning
30* The instruction set:: The RISC instruction set used in GNU lightning
31* GNU lightning examples:: GNU lightning's examples
32* Reentrancy:: Re-entrant usage of GNU lightning
519a9ea1 33* Registers:: Accessing the whole register file
4a71579b
PC
34* Customizations:: Advanced code generation customizations
35* Acknowledgements:: Acknowledgements for GNU lightning
36@end menu
37@end ifnottex
38
39@node Overview
40@chapter Introduction to @lightning{}
41
42@iftex
43This document describes @value{TOPIC} the @lightning{} library for
44dynamic code generation.
45@end iftex
46
519a9ea1
PC
47Dynamic code generation is the generation of machine code
48at runtime. It is typically used to strip a layer of interpretation
4a71579b
PC
49by allowing compilation to occur at runtime. One of the most
50well-known applications of dynamic code generation is perhaps that
51of interpreters that compile source code to an intermediate bytecode
52form, which is then recompiled to machine code at run-time: this
53approach effectively combines the portability of bytecode
54representations with the speed of machine code. Another common
55application of dynamic code generation is in the field of hardware
56simulators and binary emulators, which can use the same techniques
519a9ea1 57to translate simulated instructions to the instructions of the
4a71579b
PC
58underlying machine.
59
60Yet other applications come to mind: for example, windowing
61@dfn{bitblt} operations, matrix manipulations, and network packet
62filters. Albeit very powerful and relatively well known within the
63compiler community, dynamic code generation techniques are rarely
64exploited to their full potential and, with the exception of the
65two applications described above, have remained curiosities because
66of their portability and functionality barriers: binary instructions
67are generated, so programs using dynamic code generation must be
68retargeted for each machine; in addition, coding a run-time code
69generator is a tedious and error-prone task more than a difficult one.
70
71@lightning{} provides a portable, fast and easily retargetable dynamic
519a9ea1 72code generation system.
4a71579b
PC
73
74To be portable, @lightning{} abstracts over current architectures'
75quirks and unorthogonalities. The interface that it exposes to is that
76of a standardized RISC architecture loosely based on the SPARC and MIPS
77chips. There are a few general-purpose registers (six, not including
78those used to receive and pass parameters between subroutines), and
79arithmetic operations involve three operands---either three registers
80or two registers and an arbitrarily sized immediate value.
81
82On one hand, this architecture is general enough that it is possible to
83generate pretty efficient code even on CISC architectures such as the
84Intel x86 or the Motorola 68k families. On the other hand, it matches
85real architectures closely enough that, most of the time, the
86compiler's constant folding pass ends up generating code which
87assembles machine instructions without further tests.
88
89@node Installation
90@chapter Configuring and installing @lightning{}
91
40a44dcb
PC
92Here we will assume that your system already has the dependencies
93necessary to build @lightning{}. For more on dependencies, see
94@lightning{}'s @file{README-hacking} file.
95
96The first thing to do to build @lightning{} is to configure the
4a71579b
PC
97program, picking the set of macros to be used on the host
98architecture; this configuration is automatically performed by
99the @file{configure} shell script; to run it, merely type:
100@example
101 ./configure
102@end example
103
104@lightning{} supports the @code{--enable-disassembler} option, that
105enables linking to GNU binutils and optionally print human readable
106disassembly of the jit code. This option can be disabled by the
107@code{--disable-disassembler} option.
108
109Another option that @file{configure} accepts is
110@code{--enable-assertions}, which enables several consistency checks in
111the run-time assemblers. These are not usually needed, so you can
112decide to simply forget about it; also remember that these consistency
113checks tend to slow down your code generator.
114
115After you've configured @lightning{}, run @file{make} as usual.
116
117@lightning{} has an extensive set of tests to validate it is working
118correctly in the build host. To test it run:
119@example
120 make check
121@end example
122
123The next important step is:
124@example
125 make install
126@end example
127
128This ends the process of installing @lightning{}.
129
130@node The instruction set
131@chapter @lightning{}'s instruction set
132
133@lightning{}'s instruction set was designed by deriving instructions
134that closely match those of most existing RISC architectures, or
135that can be easily syntesized if absent. Each instruction is composed
136of:
137@itemize @bullet
138@item
139an operation, like @code{sub} or @code{mul}
140
141@item
142most times, a register/immediate flag (@code{r} or @code{i})
143
144@item
145an unsigned modifier (@code{u}), a type identifier or two, when applicable.
146@end itemize
147
148Examples of legal mnemonics are @code{addr} (integer add, with three
149register operands) and @code{muli} (integer multiply, with two
150register operands and an immediate operand). Each instruction takes
151two or three operands; in most cases, one of them can be an immediate
152value instead of a register.
153
154Most @lightning{} integer operations are signed wordsize operations,
155with the exception of operations that convert types, or load or store
156values to/from memory. When applicable, the types and C types are as
157follow:
158
159@example
160 _c @r{signed char}
161 _uc @r{unsigned char}
162 _s @r{short}
163 _us @r{unsigned short}
164 _i @r{int}
165 _ui @r{unsigned int}
166 _l @r{long}
167 _f @r{float}
168 _d @r{double}
169@end example
170
171Most integer operations do not need a type modifier, and when loading or
172storing values to memory there is an alias to the proper operation
173using wordsize operands, that is, if ommited, the type is @r{int} on
17432-bit architectures and @r{long} on 64-bit architectures. Note
175that lightning also expects @code{sizeof(void*)} to match the wordsize.
176
177When an unsigned operation result differs from the equivalent signed
178operation, there is a the @code{_u} modifier.
179
180There are at least seven integer registers, of which six are
181general-purpose, while the last is used to contain the frame pointer
182(@code{FP}). The frame pointer can be used to allocate and access local
183variables on the stack, using the @code{allocai} or @code{allocar}
184instruction.
185
186Of the general-purpose registers, at least three are guaranteed to be
187preserved across function calls (@code{V0}, @code{V1} and
188@code{V2}) and at least three are not (@code{R0}, @code{R1} and
189@code{R2}). Six registers are not very much, but this
190restriction was forced by the need to target CISC architectures
191which, like the x86, are poor of registers; anyway, backends can
192specify the actual number of available registers with the calls
193@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
194(for callee-save registers).
195
196There are at least six floating-point registers, named @code{F0} to
197@code{F5}. These are usually caller-save and are separate from the integer
198registers on the supported architectures; on Intel architectures,
199in 32 bit mode if SSE2 is not available or use of X87 is forced,
200the register stack is mapped to a flat register file. As for the
201integer registers, the macro @code{JIT_F_NUM} yields the number of
202floating-point registers.
203
204The complete instruction set follows; as you can see, most non-memory
205operations only take integers (either signed or unsigned) as operands;
206this was done in order to reduce the instruction set, and because most
207architectures only provide word and long word operations on registers.
208There are instructions that allow operands to be extended to fit a larger
209data type, both in a signed and in an unsigned way.
210
211@table @b
212@item Binary ALU operations
213These accept three operands; the last one can be an immediate.
214@code{addx} operations must directly follow @code{addc}, and
215@code{subx} must follow @code{subc}; otherwise, results are undefined.
216Most, if not all, architectures do not support @r{float} or @r{double}
217immediate operands; lightning emulates those operations by moving the
218immediate to a temporary register and emiting the call with only
219register operands.
220@example
221addr _f _d O1 = O2 + O3
222addi _f _d O1 = O2 + O3
223addxr O1 = O2 + (O3 + carry)
224addxi O1 = O2 + (O3 + carry)
225addcr O1 = O2 + O3, set carry
226addci O1 = O2 + O3, set carry
227subr _f _d O1 = O2 - O3
228subi _f _d O1 = O2 - O3
229subxr O1 = O2 - (O3 + carry)
230subxi O1 = O2 - (O3 + carry)
231subcr O1 = O2 - O3, set carry
232subci O1 = O2 - O3, set carry
233rsbr _f _d O1 = O3 - O1
234rsbi _f _d O1 = O3 - O1
235mulr _f _d O1 = O2 * O3
236muli _f _d O1 = O2 * O3
237divr _u _f _d O1 = O2 / O3
238divi _u _f _d O1 = O2 / O3
239remr _u O1 = O2 % O3
240remi _u O1 = O2 % O3
241andr O1 = O2 & O3
242andi O1 = O2 & O3
243orr O1 = O2 | O3
244ori O1 = O2 | O3
245xorr O1 = O2 ^ O3
246xori O1 = O2 ^ O3
247lshr O1 = O2 << O3
248lshi O1 = O2 << O3
249rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
250rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
1f22b268
PC
251movzr O1 = O3 ? O1 : O2
252movnr O1 = O3 ? O2 : O1
4a71579b
PC
253@end example
254
255@item Four operand binary ALU operations
256These accept two result registers, and two operands; the last one can
257be an immediate. The first two arguments cannot be the same register.
258
259@code{qmul} stores the low word of the result in @code{O1} and the
260high word in @code{O2}. For unsigned multiplication, @code{O2} zero
261means there was no overflow. For signed multiplication, no overflow
262check is based on sign, and can be detected if @code{O2} is zero or
263minus one.
264
265@code{qdiv} stores the quotient in @code{O1} and the remainder in
266@code{O2}. It can be used as quick way to check if a division is
267exact, in which case the remainder is zero.
268
269@example
270qmulr _u O1 O2 = O3 * O4
271qmuli _u O1 O2 = O3 * O4
272qdivr _u O1 O2 = O3 / O4
273qdivi _u O1 O2 = O3 / O4
274@end example
275
276@item Unary ALU operations
277These accept two operands, both of which must be registers.
278@example
279negr _f _d O1 = -O2
280comr O1 = ~O2
281@end example
282
283These unary ALU operations are only defined for float operands.
284@example
285absr _f _d O1 = fabs(O2)
286sqrtr O1 = sqrt(O2)
287@end example
288
289Besides requiring the @code{r} modifier, there are no unary operations
290with an immediate operand.
291
292@item Compare instructions
293These accept three operands; again, the last can be an immediate.
294The last two operands are compared, and the first operand, that must be
295an integer register, is set to either 0 or 1, according to whether the
296given condition was met or not.
297
298The conditions given below are for the standard behavior of C,
299where the ``unordered'' comparison result is mapped to false.
300
301@example
302ltr _u _f _d O1 = (O2 < O3)
303lti _u _f _d O1 = (O2 < O3)
304ler _u _f _d O1 = (O2 <= O3)
305lei _u _f _d O1 = (O2 <= O3)
306gtr _u _f _d O1 = (O2 > O3)
307gti _u _f _d O1 = (O2 > O3)
308ger _u _f _d O1 = (O2 >= O3)
309gei _u _f _d O1 = (O2 >= O3)
310eqr _f _d O1 = (O2 == O3)
311eqi _f _d O1 = (O2 == O3)
312ner _f _d O1 = (O2 != O3)
313nei _f _d O1 = (O2 != O3)
314unltr _f _d O1 = !(O2 >= O3)
315unler _f _d O1 = !(O2 > O3)
316ungtr _f _d O1 = !(O2 <= O3)
317unger _f _d O1 = !(O2 < O3)
318uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3)
319ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3)
320ordr _f _d O1 = (O2 == O2) && (O3 == O3)
321unordr _f _d O1 = (O2 != O2) || (O3 != O3)
322@end example
323
324@item Transfer operations
325These accept two operands; for @code{ext} both of them must be
326registers, while @code{mov} accepts an immediate value as the second
327operand.
328
329Unlike @code{movr} and @code{movi}, the other instructions are used
330to truncate a wordsize operand to a smaller integer data type or to
331convert float data types. You can also use @code{extr} to convert an
332integer to a floating point value: the usual options are @code{extr_f}
333and @code{extr_d}.
334
335@example
336movr _f _d O1 = O2
337movi _f _d O1 = O2
338extr _c _uc _s _us _i _ui _f _d O1 = O2
339truncr _f _d O1 = trunc(O2)
340@end example
341
342In 64-bit architectures it may be required to use @code{truncr_f_i},
343@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
344the equivalent C code. Only the @code{_i} modifier is available in
34532-bit architectures.
346
347@example
348truncr_f_i = <int> O1 = <float> O2
349truncr_f_l = <long>O1 = <float> O2
350truncr_d_i = <int> O1 = <double>O2
351truncr_d_l = <long>O1 = <double>O2
352@end example
353
354The float conversion operations are @emph{destination first,
355source second}, but the order of the types is reversed. This happens
356for historical reasons.
357
358@example
359extr_f_d = <double>O1 = <float> O2
360extr_d_f = <float> O1 = <double>O2
361@end example
362
363@item Network extensions
364These accept two operands, both of which must be registers; these
365two instructions actually perform the same task, yet they are
366assigned to two mnemonics for the sake of convenience and
367completeness. As usual, the first operand is the destination and
368the second is the source.
369The @code{_ul} variant is only available in 64-bit architectures.
370@example
371htonr _us _ui _ul @r{Host-to-network (big endian) order}
372ntohr _us _ui _ul @r{Network-to-host order }
373@end example
374
40a44dcb
PC
375@code{bswapr} can be used to unconditionally byte-swap an operand.
376On little-endian architectures, @code{htonr} and @code{ntohr} resolve
377to this.
378The @code{_ul} variant is only available in 64-bit architectures.
379@example
380bswapr _us _ui _ul 01 = byte_swap(02)
381@end example
382
4a71579b
PC
383@item Load operations
384@code{ld} accepts two operands while @code{ldx} accepts three;
385in both cases, the last can be either a register or an immediate
386value. Values are extended (with or without sign, according to
387the data type specification) to fit a whole register.
388The @code{_ui} and @code{_l} types are only available in 64-bit
389architectures. For convenience, there is a version without a
390type modifier for integer or pointer operands that uses the
391appropriate wordsize call.
392@example
393ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2
394ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2
395ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
396ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3)
397@end example
398
399@item Store operations
400@code{st} accepts two operands while @code{stx} accepts three; in
401both cases, the first can be either a register or an immediate
402value. Values are sign-extended to fit a whole register.
403@example
404str _c _uc _s _us _i _ui _l _f _d *O1 = O2
405sti _c _uc _s _us _i _ui _l _f _d *O1 = O2
406stxr _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
407stxi _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
408@end example
409As for the load operations, the @code{_ui} and @code{_l} types are
410only available in 64-bit architectures, and for convenience, there
411is a version without a type modifier for integer or pointer operands
412that uses the appropriate wordsize call.
413
414@item Argument management
415These are:
416@example
417prepare (not specified)
418va_start (not specified)
419pushargr _f _d
420pushargi _f _d
421va_push (not specified)
422arg _f _d
423getarg _c _uc _s _us _i _ui _l _f _d
424va_arg _d
425putargr _f _d
426putargi _f _d
427ret (not specified)
428retr _f _d
429reti _f _d
430va_end (not specified)
431retval _c _uc _s _us _i _ui _l _f _d
432epilog (not specified)
433@end example
434As with other operations that use a type modifier, the @code{_ui} and
435@code{_l} types are only available in 64-bit architectures, but there
436are operations without a type modifier that alias to the appropriate
437integer operation with wordsize operands.
438
439@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
440while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
441A code snippet that wants to call another procedure and has to pass
442arguments must, in order: use the @code{prepare} instruction and use
443the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
444left to right order}; and use @code{finish} or @code{call} (explained below)
445to perform the actual call.
446
447@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
448arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
449@code{va_push} is required when passing a @code{va_list} to another function,
450because not all architectures expect it as a single pointer. Known case
451is DEC Alpha, that requires it as a structure passed by value.
452
453@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
454@code{arg} is different from other instruction in that it does not
455actually generate any code: instead, it is a function which returns
456a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
457a value'' means that @lightning{} code that compile these
458instructions return a value when expanded.} You should call
459@code{arg} as soon as possible, before any function call or, more
460easily, right after the @code{prolog} instructions
461(which is treated later).
462
463@code{getarg} accepts a register argument and a value returned by
464@code{arg}, and will move that argument to the register, extending
465it (with or without sign, according to the data type specification)
466to fit a whole register. These instructions are more intimately
467related to the usage of the @lightning{} instruction set in code
468that generates other code, so they will be treated more
469specifically in @ref{GNU lightning examples, , Generating code at
470run-time}.
471
472@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
473it accepts as first argument a register or immediate, and as
474second argument a value returned by @code{arg}. It allows changing,
475or restoring an argument to the current function, and is a
476construct required to implement tail call optimization. Note that
477arguments in registers are very cheap, but will be overwritten
478at any moment, including on some operations, for example division,
479that on several ports is implemented as a function call.
480
481Finally, the @code{retval} instruction fetches the return value of a
482called function in a register. The @code{retval} instruction takes a
483register argument and copies the return value of the previously called
484function in that register. A function with a return value should use
485@code{retr} or @code{reti} to put the return value in the return register
486before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.
487
488@code{epilog} is an optional call, that marks the end of a function
489body. It is automatically generated by @lightning{} if starting a new
490function (what should be done after a @code{ret} call) or finishing
491generating jit.
492It is very important to note that the fact that @code{epilog} being
493optional may cause a common mistake. Consider this:
494@example
495fun1:
496 prolog
497 ...
498 ret
499fun2:
500 prolog
501@end example
502Because @code{epilog} is added when finding a new @code{prolog},
503this will cause the @code{fun2} label to actually be before the
504return from @code{fun1}. Because @lightning{} will actually
505understand it as:
506@example
507fun1:
508 prolog
509 ...
510 ret
511fun2:
512 epilog
513 prolog
514@end example
515
516You should observe a few rules when using these macros. First of
517all, if calling a varargs function, you should use the @code{ellipsis}
518call to mark the position of the ellipsis in the C prototype.
519
520You should not nest calls to @code{prepare} inside a
521@code{prepare/finish} block. Doing this will result in undefined
522behavior. Note that for functions with zero arguments you can use
523just @code{call}.
524
525@item Branch instructions
526Like @code{arg}, these also return a value which, in this case,
527is to be used to compile forward branches as explained in
528@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
529compared; of these, the last can be either a register or an immediate.
530They are:
531@example
532bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1
533blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1
534bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
535blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1
536bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1
537bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1
538bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
539bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1
540beqr _f _d @r{if }(O2 == O3)@r{ goto }O1
541beqi _f _d @r{if }(O2 == O3)@r{ goto }O1
542bner _f _d @r{if }(O2 != O3)@r{ goto }O1
543bnei _f _d @r{if }(O2 != O3)@r{ goto }O1
544
545bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1
546bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1
547bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1
548bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1
549buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
550bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
551bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
552bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1
553
554bmsr @r{if }O2 & O3@r{ goto }O1
555bmsi @r{if }O2 & O3@r{ goto }O1
556bmcr @r{if }!(O2 & O3)@r{ goto }O1
557bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
558boaddr _u O2 += O3@r{, goto }O1@r{ if overflow}
559boaddi _u O2 += O3@r{, goto }O1@r{ if overflow}
560bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow}
561bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow}
562bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow}
563bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow}
564bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow}
565bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
566@end example
567
568@item Jump and return operations
569These accept one argument except @code{ret} and @code{jmpi} which
570have none; the difference between @code{finishi} and @code{calli}
571is that the latter does not clean the stack from pushed parameters
572(if any) and the former must @strong{always} follow a @code{prepare}
573instruction.
574@example
575callr (not specified) @r{function call to register O1}
576calli (not specified) @r{function call to immediate O1}
577finishr (not specified) @r{function call to register O1}
578finishi (not specified) @r{function call to immediate O1}
579jmpr (not specified) @r{unconditional jump to register}
580jmpi (not specified) @r{unconditional jump}
581ret (not specified) @r{return from subroutine}
582retr _c _uc _s _us _i _ui _l _f _d
583reti _c _uc _s _us _i _ui _l _f _d
584retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
585 @r{to register}
586@end example
587
588Like branch instruction, @code{jmpi} also returns a value which is to
589be used to compile forward branches. @xref{Fibonacci, , Fibonacci
590numbers}.
591
592@item Labels
593There are 3 @lightning{} instructions to create labels:
594@example
595label (not specified) @r{simple label}
596forward (not specified) @r{forward label}
597indirect (not specified) @r{special simple label}
598@end example
599
ba3814c1
PC
600The following instruction is used to specify a minimal alignment for
601the next instruction, usually with a label:
602@example
603align (not specified) @r{align code}
604@end example
605
4a71579b
PC
606@code{label} is normally used as @code{patch_at} argument for backward
607jumps.
608
609@example
610 jit_node_t *jump, *label;
611label = jit_label();
612 ...
613 jump = jit_beqr(JIT_R0, JIT_R1);
614 jit_patch_at(jump, label);
615@end example
616
617@code{forward} is used to patch code generation before the actual
618position of the label is known.
619
620@example
621 jit_node_t *jump, *label;
622label = jit_forward();
623 jump = jit_beqr(JIT_R0, JIT_R1);
624 jit_patch_at(jump, label);
625 ...
626 jit_link(label);
627@end example
628
629@code{indirect} is useful when creating jump tables, and tells
630@lightning{} to not optimize out a label that is not the target of
631any jump, because an indirect jump may land where it is defined.
632
633@example
634 jit_node_t *jump, *label;
635 ...
636 jmpr(JIT_R0); @rem{/* may jump to label */}
637 ...
638label = jit_indirect();
639@end example
640
641@code{indirect} is an special case of @code{note} and @code{name}
642because it is a valid argument to @code{address}.
643
644Note that the usual idiom to write the previous example is
645@example
646 jit_node_t *addr, *jump;
647addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
648 ...
649 jmpr(JIT_R0);
650 ...
651 jit_patch(addr); @rem{/* implicit label added */}
652@end example
653
654that automatically binds the implicit label added by @code{patch} with
655the @code{movi}, but on some special conditions it is required to create
656an "unbound" label.
657
ba3814c1
PC
658@code{align} is useful for creating multiple entry points to a
659(trampoline) function that are all accessible through a single
660function pointer. @code{align} receives an integer argument that
661defines the minimal alignment of the address of a label directly
662following the @code{align} instruction. The integer argument must be
663a power of two and the effective alignment will be a power of two no
664less than the argument to @code{align}. If the argument to
665@code{align} is 16 or more, the effective alignment will match the
666specified minimal alignment exactly.
667
668@example
669 jit_node_t *forward, *label1, *label2, *jump;
670 unsigned char *addr1, *addr2;
671forward = jit_forward();
672 jit_align(16);
673label1 = jit_indirect(); @rem{/* first entry point */}
674jump = jit_jmpi(); @rem{/* jump to first handler */}
675 jit_patch_at(jump, forward);
676 jit_align(16);
677label2 = jit_indirect(); @rem{/* second entry point */}
678 ... @rem{/* second handler */}
679 jit_jmpr(...);
680 jit_link(forward);
681 ... @rem{/* first handler /*}
682 jit_jmpr(...);
683 ...
684 jit_emit();
685 addr1 = jit_address(label1);
686 addr2 = jit_address(label2);
687 assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */}
688@end example
689
4a71579b
PC
690@item Function prolog
691
692These macros are used to set up a function prolog. The @code{allocai}
693call accept a single integer argument and returns an offset value
694for stack storage access. The @code{allocar} accepts two registers
695arguments, the first is set to the offset for stack access, and the
696second is the size in bytes argument.
697
698@example
699prolog (not specified) @r{function prolog}
700allocai (not specified) @r{reserve space on the stack}
701allocar (not specified) @r{allocate space on the stack}
702@end example
703
704@code{allocai} receives the number of bytes to allocate and returns
705the offset from the frame pointer register @code{FP} to the base of
706the area.
707
708@code{allocar} receives two register arguments. The first is where
709to store the offset from the frame pointer register @code{FP} to the
710base of the area. The second argument is the size in bytes. Note
711that @code{allocar} is dynamic allocation, and special attention
712should be taken when using it. If called in a loop, every iteration
713will allocate stack space. Stack space is aligned from 8 to 64 bytes
714depending on backend requirements, even if allocating only one byte.
715It is advisable to not use it with @code{frame} and @code{tramp}; it
716should work with @code{frame} with special care to call only once,
717but is not supported if used in @code{tramp}, even if called only
718once.
719
720As a small appetizer, here is a small function that adds 1 to the input
721parameter (an @code{int}). I'm using an assembly-like syntax here which
722is a bit different from the one used when writing real subroutines with
723@lightning{}; the real syntax will be introduced in @xref{GNU lightning
724examples, , Generating code at run-time}.
725
726@example
727incr:
728 prolog
729in = arg @rem{! We have an integer argument}
730 getarg R0, in @rem{! Move it to R0}
731 addi R0, R0, 1 @rem{! Add 1}
732 retr R0 @rem{! And return the result}
733@end example
734
735And here is another function which uses the @code{printf} function from
736the standard C library to write a number in hexadecimal notation:
737
738@example
739printhex:
740 prolog
741in = arg @rem{! Same as above}
742 getarg R0, in
743 prepare @rem{! Begin call sequence for printf}
744 pushargi "%x" @rem{! Push format string}
745 ellipsis @rem{! Varargs start here}
746 pushargr R0 @rem{! Push second argument}
747 finishi printf @rem{! Call printf}
748 ret @rem{! Return to caller}
749@end example
750
519a9ea1
PC
751@item Register liveness
752
753During code generation, @lightning{} occasionally needs scratch registers
754or needs to use architecture-defined registers. For that, @lightning{}
755internally maintains register liveness information.
756
757In the following example, @code{qdivr} will need special registers like
758@code{R0} on some architectures. As @lightning{} understands that
759@code{R0} is used in the subsequent instruction, it will create
760save/restore code for @code{R0} in case.
761
762@example
763...
764qdivr V0, V1, V2, V3
765movr V3, R0
766...
767@end example
768
769The same is not true in the example that follows. Here, @code{R0} is
770not alive after the division operation because @code{R0} is neither an
771argument register nor a callee-save register. Thus, no save/restore
772code for @code{R0} will be created in case.
773
774@example
775...
776qdivr V0, V1, V2, V3
777jmpr R1
778...
779@end example
780
781The @code{live} instruction can be used to mark a register as live after
782it as in the following example. Here, @code{R0} will be preserved
783across the division.
784
785@example
786...
787qdivr V0, V1, V2, V3
788live R0
789jmpr R1
790...
791@end example
792
793The @code{live} instruction is useful at code entry and exit points,
794like after and before a @code{callr} instruction.
795
4a71579b
PC
796@item Trampolines, continuations and tail call optimization
797
798Frequently it is required to generate jit code that must jump to
799code generated later, possibly from another @code{jit_context_t}.
800These require compatible stack frames.
801
802@lightning{} provides two primitives from where trampolines,
803continuations and tail call optimization can be implemented.
804
805@example
806frame (not specified) @r{create stack frame}
807tramp (not specified) @r{assume stack frame}
808@end example
809
810@code{frame} receives an integer argument@footnote{It is not
811automatically computed because it does not know about the
812requirement of later generated code.} that defines the size in
813bytes for the stack frame of the current, @code{C} callable,
814jit function. To calculate this value, a good formula is maximum
815number of arguments to any called native function times
816eight@footnote{Times eight so that it works for double arguments.
817And would not need conditionals for ports that pass arguments in
818the stack.}, plus the sum of the arguments to any call to
819@code{jit_allocai}. @lightning{} automatically adjusts this value
820for any backend specific stack memory it may need, or any
821alignment constraint.
822
823@code{frame} also instructs @lightning{} to save all callee
824save registers in the prolog and reload in the epilog.
825
826@example
827main: @rem{! jit entry point}
828 prolog @rem{! function prolog}
829 frame 256 @rem{! save all callee save registers and}
830 @rem{! reserve at least 256 bytes in stack}
831main_loop:
832 ...
833 jmpi handler @rem{! jumps to external code}
834 ...
835 ret @rem{! return to the caller}
836@end example
837
838@code{tramp} differs from @code{frame} only that a prolog and epilog
839will not be generated. Note that @code{prolog} must still be used.
840The code under @code{tramp} must be ready to be entered with a jump
841at the prolog position, and instead of a return, it must end with
842a non conditional jump. @code{tramp} exists solely for the fact
843that it allows optimizing out prolog and epilog code that would
844never be executed.
845
846@example
847handler: @rem{! handler entry point}
848 prolog @rem{! function prolog}
849 tramp 256 @rem{! assumes all callee save registers}
850 @rem{! are saved and there is at least}
851 @rem{! 256 bytes in stack}
852 ...
853 jmpi main_loop @rem{! return to the main loop}
854@end example
855
856@lightning{} only supports Tail Call Optimization using the
857@code{tramp} construct. Any other way is not guaranteed to
858work on all ports.
859
860An example of a simple (recursive) tail call optimization:
861
862@example
863factorial: @rem{! Entry point of the factorial function}
864 prolog
865in = arg @rem{! Receive an integer argument}
866 getarg R0, in @rem{! Move argument to RO}
867 prepare
868 pushargi 1 @rem{! This is the accumulator}
869 pushargr R0 @rem{! This is the argument}
870 finishi fact @rem{! Call the tail call optimized function}
871 retval R0 @rem{! Fetch the result}
872 retr R0 @rem{! Return it}
873 epilog @rem{! Epilog *before* label before prolog}
874
875fact: @rem{! Entry point of the helper function}
876 prolog
877 frame 16 @rem{! Reserve 16 bytes in the stack}
878fact_entry: @rem{! This is the tail call entry point}
879ac = arg @rem{! The accumulator is the first argument}
880in = arg @rem{! The factorial argument}
881 getarg R0, ac @rem{! Move the accumulator to R0}
882 getarg R1, in @rem{! Move the argument to R1}
883 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
884 mulr R0, R0, R1 @rem{! accumulator *= argument}
885 putargr R0, ac @rem{! Update the accumulator}
886 subi R1, R1, 1 @rem{! argument -= 1}
887 putargr R1, in @rem{! Update the argument}
888 jmpi fact_entry @rem{! Tail Call Optimize it!}
889fact_out:
890 retr R0 @rem{! Return the accumulator}
891@end example
892
893@item Predicates
894@example
895forward_p (not specified) @r{forward label predicate}
896indirect_p (not specified) @r{indirect label predicate}
897target_p (not specified) @r{used label predicate}
898arg_register_p (not specified) @r{argument kind predicate}
899callee_save_p (not specified) @r{callee save predicate}
900pointer_p (not specified) @r{pointer predicate}
901@end example
902
903@code{forward_p} expects a @code{jit_node_t*} argument, and
904returns non zero if it is a forward label reference, that is,
905a label returned by @code{forward}, that still needs a
906@code{link} call.
907
908@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
909non zero if it is an indirect label reference, that is, a label that
910was returned by @code{indirect}.
911
912@code{target_p} expects a @code{jit_node_t*} argument, that is any
913kind of label, and will return non zero if there is at least one
914jump or move referencing it.
915
916@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
917have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
918will return non zero if the argument lives in a register. This call
919is useful to know the live range of register arguments, as those
920are very fast to read and write, but have volatile values.
921
922@code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
923@code{JIT_Fn}, and will return non zero if the register is callee
924save. This call is useful because on several ports, the @code{JIT_Rn}
925and @code{JIT_Fn} registers are actually callee save; no need
926to save and load the values when making function calls.
927
928@code{pointer_p} expects a pointer argument, and will return non
929zero if the pointer is inside the generated jit code. Must be
930called after @code{jit_emit} and before @code{jit_destroy_state}.
ba3814c1
PC
931
932@item Atomic operations
933Only compare-and-swap is implemented. It accepts four operands;
934the second can be an immediate.
935
936The first argument is set with a boolean value telling if the operation
937did succeed.
938
939Arguments must be different, cannot use the result register to also pass
940an argument.
941
942The second argument is the address of a machine word.
943
944The third argument is the old value.
945
946The fourth argument is the new value.
947
948@example
949casr 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
950casi 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0
951@end example
952
953If value at the address in the second argument is equal to the third
954argument, the address value is atomically modified to the value of the
955fourth argument and the first argument is set to a non zero value.
956
957If the value at the address in the second argument is not equal to the
958third argument nothing is done and the first argument is set to zero.
4a71579b
PC
959@end table
960
961@node GNU lightning examples
962@chapter Generating code at run-time
963
964To use @lightning{}, you should include the @file{lightning.h} file that
965is put in your include directory by the @samp{make install} command.
966
967Each of the instructions above translates to a macro or function call.
968All you have to do is prepend @code{jit_} (lowercase) to opcode names
969and @code{JIT_} (uppercase) to register names. Of course, parameters
970are to be put between parentheses.
971
972This small tutorial presents three examples:
973
974@iftex
975@itemize @bullet
976@item
977The @code{incr} function found in @ref{The instruction set, ,
978@lightning{}'s instruction set}:
979
980@item
981A simple function call to @code{printf}
982
983@item
984An RPN calculator.
985
986@item
987Fibonacci numbers
988@end itemize
989@end iftex
990@ifnottex
991@menu
992* incr:: A function which increments a number by one
993* printf:: A simple function call to printf
994* RPN calculator:: A more complex example, an RPN calculator
995* Fibonacci:: Calculating Fibonacci numbers
996@end menu
997@end ifnottex
998
999@node incr
1000@section A function which increments a number by one
1001
1002Let's see how to create and use the sample @code{incr} function created
1003in @ref{The instruction set, , @lightning{}'s instruction set}:
1004
1005@example
1006#include <stdio.h>
1007#include <lightning.h>
1008
1009static jit_state_t *_jit;
1010
1011typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1012
1013int main(int argc, char *argv[])
1014@{
1015 jit_node_t *in;
1016 pifi incr;
1017
1018 init_jit(argv[0]);
1019 _jit = jit_new_state();
1020
1021 jit_prolog(); @rem{/* @t{ prolog } */}
1022 in = jit_arg(); @rem{/* @t{ in = arg } */}
1023 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
1024 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
1025 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
1026
1027 incr = jit_emit();
1028 jit_clear_state();
1029
1030 @rem{/* call the generated code@comma{} passing 5 as an argument */}
1031 printf("%d + 1 = %d\n", 5, incr(5));
1032
1033 jit_destroy_state();
1034 finish_jit();
1035 return 0;
1036@}
1037@end example
1038
1039Let's examine the code line by line (well, almost@dots{}):
1040
1041@table @t
1042@item #include <lightning.h>
1043You already know about this. It defines all of @lightning{}'s macros.
1044
1045@item static jit_state_t *_jit;
1046You might wonder about what is @code{jit_state_t}. It is a structure
1047that stores jit code generation information. The name @code{_jit} is
1048special, because since multiple jit generators can run at the same
1049time, you must either @r{#define _jit my_jit_state} or name it
1050@code{_jit}.
1051
1052@item typedef int (*pifi)(int);
1053Just a handy typedef for a pointer to a function that takes an
1054@code{int} and returns another.
1055
1056@item jit_node_t *in;
1057Declares a variable to hold an identifier for a function argument. It
1058is an opaque pointer, that will hold the return of a call to @code{arg}
1059and be used as argument to @code{getarg}.
1060
1061@item pifi incr;
1062Declares a function pointer variable to a function that receives an
1063@code{int} and returns an @code{int}.
1064
1065@item init_jit(argv[0]);
1066You must call this function before creating a @code{jit_state_t}
1067object. This function does global state initialization, and may need
1068to detect CPU or Operating System features. It receives a string
1069argument that is later used to read symbols from a shared object using
1070GNU binutils if disassembly was enabled at configure time. If no
1071disassembly will be performed a NULL pointer can be used as argument.
1072
1073@item _jit = jit_new_state();
1074This call initializes a @lightning{} jit state.
1075
1076@item jit_prolog();
1077Ok, so we start generating code for our beloved function@dots{}
1078
1079@item in = jit_arg();
1080@itemx jit_getarg(JIT_R0, in);
1081We retrieve the first (and only) argument, an integer, and store it
1082into the general-purpose register @code{R0}.
1083
1084@item jit_addi(JIT_R0, JIT_R0, 1);
1085We add one to the content of the register.
1086
1087@item jit_retr(JIT_R0);
1088This instruction generates a standard function epilog that returns
1089the contents of the @code{R0} register.
1090
1091@item incr = jit_emit();
1092This instruction is very important. It actually translates the
1093@lightning{} macros used before to machine code, flushes the generated
1094code area out of the processor's instruction cache and return a
1095pointer to the start of the code.
1096
1097@item jit_clear_state();
1098This call cleanups any data not required for jit execution. Note
1099that it must be called after any call to @code{jit_print} or
1100@code{jit_address}, as this call destroy the @lightning{}
1101intermediate representation.
1102
1103@item printf("%d + 1 = %d", 5, incr(5));
1104Calling our function is this simple---it is not distinguishable from
1105a normal C function call, the only difference being that @code{incr}
1106is a variable.
1107
1108@item jit_destroy_state();
1109Releases all memory associated with the jit context. It should be
1110called after known the jit will no longer be called.
1111
1112@item finish_jit();
1113This call cleanups any global state hold by @lightning{}, and is
1114advisable to call it once jit code will no longer be generated.
1115@end table
1116
1117@lightning{} abstracts two phases of dynamic code generation: selecting
1118instructions that map the standard representation, and emitting binary
1119code for these instructions. The client program has the responsibility
1120of describing the code to be generated using the standard @lightning{}
1121instruction set.
1122
1123Let's examine the code generated for @code{incr} on the SPARC and x86_64
1124architecture (on the right is the code that an assembly-language
1125programmer would write):
1126
1127@table @b
1128@item SPARC
1129@example
1130 save %sp, -112, %sp
1131 mov %i0, %g2 retl
1132 inc %g2 inc %o0
1133 mov %g2, %i0
519a9ea1
PC
1134 restore
1135 retl
1136 nop
4a71579b
PC
1137@end example
1138In this case, @lightning{} introduces overhead to create a register
1139window (not knowing that the procedure is a leaf procedure) and to
1140move the argument to the general purpose register @code{R0} (which
1141maps to @code{%g2} on the SPARC).
1142@end table
1143
1144@table @b
1145@item x86_64
1146@example
1147 sub $0x30,%rsp
1148 mov %rbp,(%rsp)
1149 mov %rsp,%rbp
1150 sub $0x18,%rsp
1151 mov %rdi,%rax mov %rdi, %rax
1152 add $0x1,%rax inc %rax
1153 mov %rbp,%rsp
1154 mov (%rsp),%rbp
1155 add $0x30,%rsp
1156 retq retq
1157@end example
1158In this case, the main overhead is due to the function's prolog and
1159epilog, and stack alignment after reserving stack space for word
1160to/from float conversions or moving data from/to x87 to/from SSE.
1161Note that besides allocating space to save callee saved registers,
1162no registers are saved/restored because @lightning{} notices those
1163registers are not modified. There is currently no logic to detect
1164if it needs to allocate stack space for type conversions neither
1165proper leaf function detection, but these are subject to change
1166(FIXME).
1167@end table
1168
1169@node printf
1170@section A simple function call to @code{printf}
1171
1172Again, here is the code for the example:
1173
1174@example
1175#include <stdio.h>
1176#include <lightning.h>
1177
1178static jit_state_t *_jit;
1179
1180typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1181
1182int main(int argc, char *argv[])
1183@{
1184 pvfi myFunction; @rem{/* ptr to generated code */}
1185 jit_node_t *start, *end; @rem{/* a couple of labels */}
1186 jit_node_t *in; @rem{/* to get the argument */}
1187
1188 init_jit(argv[0]);
1189 _jit = jit_new_state();
1190
1191 start = jit_note(__FILE__, __LINE__);
1192 jit_prolog();
1193 in = jit_arg();
1194 jit_getarg(JIT_R1, in);
1195 jit_prepare();
1196 jit_pushargi((jit_word_t)"generated %d bytes\n");
1197 jit_ellipsis();
1198 jit_pushargr(JIT_R1);
1199 jit_finishi(printf);
1200 jit_ret();
1201 jit_epilog();
1202 end = jit_note(__FILE__, __LINE__);
1203
1204 myFunction = jit_emit();
1205
1206 @rem{/* call the generated code@comma{} passing its size as argument */}
1207 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1208 jit_clear_state();
1209
1210 jit_disassemble();
1211
1212 jit_destroy_state();
1213 finish_jit();
1214 return 0;
1215@}
1216@end example
1217
1218The function shows how many bytes were generated. Most of the code
1219is not very interesting, as it resembles very closely the program
1220presented in @ref{incr, , A function which increments a number by one}.
1221
1222For this reason, we're going to concentrate on just a few statements.
1223
1224@table @t
1225@item start = jit_note(__FILE__, __LINE__);
1226@itemx @r{@dots{}}
1227@itemx end = jit_note(__FILE__, __LINE__);
1228These two instruction call the @code{jit_note} macro, which creates
1229a note in the jit code; arguments to @code{jit_note} usually are a
1230filename string and line number integer, but using NULL for the
1231string argument is perfectly valid if only need to create a simple
1232marker in the code.
1233
1234@item jit_ellipsis();
1235@code{ellipsis} usually is only required if calling varargs functions
1236with double arguments, but it is a good practice to properly describe
1237the @r{@dots{}} in the call sequence.
1238
1239@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1240Note the use of the @code{(jit_word_t)} cast, that is used only
1241to avoid a compiler warning, due to using a pointer where a
1242wordsize integer type was expected.
1243
1244@item jit_prepare();
1245@itemx @r{@dots{}}
1246@itemx jit_finishi(printf);
1247Once the arguments to @code{printf} have been pushed, what means
1248moving them to stack or register arguments, the @code{printf}
1249function is called and the stack cleaned. Note how @lightning{}
1250abstracts the differences between different architectures and
1251ABI's -- the client program does not know how parameter passing
1252works on the host architecture.
1253
1254@item jit_epilog();
1255Usually it is not required to call @code{epilog}, but because it
1256is implicitly called when noticing the end of a function, if the
1257@code{end} variable was set with a @code{note} call after the
1258@code{ret}, it would not consider the function epilog.
1259
1260@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
1261This calls the generate jit function passing as argument the offset
1262difference from the @code{start} and @code{end} notes. The @code{address}
1263call must be done after the @code{emit} call or either a fatal error
1264will happen (if @lightning{} is built with assertions enable) or an
1265undefined value will be returned.
1266
1267@item jit_clear_state();
1268Note that @code{jit_clear_state} was called after executing jit in
1269this example. It was done because it must be called after any call
1270to @code{jit_address} or @code{jit_print}.
1271
1272@item jit_disassemble();
1273@code{disassemble} will dump the generated code to standard output,
1274unless @lightning{} was built with the disassembler disabled, in which
1275case no output will be shown.
1276@end table
1277
1278@node RPN calculator
1279@section A more complex example, an RPN calculator
1280
1281We create a small stack-based RPN calculator which applies a series
1282of operators to a given parameter and to other numeric operands.
1283Unlike previous examples, the code generator is fully parameterized
1284and is able to compile different formulas to different functions.
1285Here is the code for the expression compiler; a sample usage will
1286follow.
1287
1288Since @lightning{} does not provide push/pop instruction, this
1289example uses a stack-allocated area to store the data. Such an
1290area can be allocated using the macro @code{allocai}, which
1291receives the number of bytes to allocate and returns the offset
1292from the frame pointer register @code{FP} to the base of the
1293area.
1294
1295Usually, you will use the @code{ldxi} and @code{stxi} instruction
1296to access stack-allocated variables. However, it is possible to
1297use operations such as @code{add} to compute the address of the
1298variables, and pass the address around.
1299
1300@example
1301#include <stdio.h>
1302#include <lightning.h>
1303
1304typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1305
1306static jit_state_t *_jit;
1307
1308void stack_push(int reg, int *sp)
1309@{
1310 jit_stxi_i (*sp, JIT_FP, reg);
1311 *sp += sizeof (int);
1312@}
1313
1314void stack_pop(int reg, int *sp)
1315@{
1316 *sp -= sizeof (int);
1317 jit_ldxi_i (reg, JIT_FP, *sp);
1318@}
1319
1320jit_node_t *compile_rpn(char *expr)
1321@{
1322 jit_node_t *in, *fn;
1323 int stack_base, stack_ptr;
1324
1325 fn = jit_note(NULL, 0);
1326 jit_prolog();
1327 in = jit_arg();
1328 stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
1329
1330 jit_getarg_i(JIT_R2, in);
1331
1332 while (*expr) @{
1333 char buf[32];
1334 int n;
1335 if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
1336 expr += n - 1;
1337 stack_push(JIT_R0, &stack_ptr);
1338 jit_movi(JIT_R0, atoi(buf));
1339 @} else if (*expr == 'x') @{
1340 stack_push(JIT_R0, &stack_ptr);
1341 jit_movr(JIT_R0, JIT_R2);
1342 @} else if (*expr == '+') @{
1343 stack_pop(JIT_R1, &stack_ptr);
1344 jit_addr(JIT_R0, JIT_R1, JIT_R0);
1345 @} else if (*expr == '-') @{
1346 stack_pop(JIT_R1, &stack_ptr);
1347 jit_subr(JIT_R0, JIT_R1, JIT_R0);
1348 @} else if (*expr == '*') @{
1349 stack_pop(JIT_R1, &stack_ptr);
1350 jit_mulr(JIT_R0, JIT_R1, JIT_R0);
1351 @} else if (*expr == '/') @{
1352 stack_pop(JIT_R1, &stack_ptr);
1353 jit_divr(JIT_R0, JIT_R1, JIT_R0);
1354 @} else @{
1355 fprintf(stderr, "cannot compile: %s\n", expr);
1356 abort();
1357 @}
1358 ++expr;
1359 @}
1360 jit_retr(JIT_R0);
1361 jit_epilog();
1362 return fn;
1363@}
1364@end example
1365
1366The principle on which the calculator is based is easy: the stack top
1367is held in R0, while the remaining items of the stack are held in the
1368memory area that we allocate with @code{allocai}. Compiling a numeric
1369operand or the argument @code{x} pushes the old stack top onto the
1370stack and moves the operand into R0; compiling an operator pops the
1371second operand off the stack into R1, and compiles the operation so
1372that the result goes into R0, thus becoming the new stack top.
1373
1374This example allocates a fixed area for 32 @code{int}s. This is not
1375a problem when the function is a leaf like in this case; in a full-blown
1376compiler you will want to analyze the input and determine the number
1377of needed stack slots---a very simple example of register allocation.
1378The area is then managed like a stack using @code{stack_push} and
1379@code{stack_pop}.
1380
1381Source code for the client (which lies in the same source file) follows:
1382
1383@example
1384int main(int argc, char *argv[])
1385@{
1386 jit_node_t *nc, *nf;
1387 pifi c2f, f2c;
1388 int i;
1389
1390 init_jit(argv[0]);
1391 _jit = jit_new_state();
1392
1393 nc = compile_rpn("32x9*5/+");
1394 nf = compile_rpn("x32-5*9/");
1395 (void)jit_emit();
1396 c2f = (pifi)jit_address(nc);
1397 f2c = (pifi)jit_address(nf);
1398 jit_clear_state();
1399
1400 printf("\nC:");
1401 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1402 printf("\nF:");
1403 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1404 printf("\n");
1405
1406 printf("\nF:");
1407 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1408 printf("\nC:");
1409 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1410 printf("\n");
1411
1412 jit_destroy_state();
1413 finish_jit();
1414 return 0;
1415@}
1416@end example
1417
1418The client displays a conversion table between Celsius and Fahrenheit
1419degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1420formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1421respectively.
1422
1423Providing the formula as an argument to @code{compile_rpn} effectively
1424parameterizes code generation, making it possible to use the same code
1425to compile different functions; this is what makes dynamic code
1426generation so powerful.
1427
1428@node Fibonacci
1429@section Fibonacci numbers
1430
1431The code in this section calculates the Fibonacci sequence. That is
1432modeled by the recurrence relation:
1433@display
1434 f(0) = 0
1435 f(1) = f(2) = 1
1436 f(n) = f(n-1) + f(n-2)
1437@end display
1438
1439The purpose of this example is to introduce branches. There are two
1440kind of branches: backward branches and forward branches. We'll
1441present the calculation in a recursive and iterative form; the
1442former only uses forward branches, while the latter uses both.
1443
1444@example
1445#include <stdio.h>
1446#include <lightning.h>
1447
1448static jit_state_t *_jit;
1449
1450typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1451
1452int main(int argc, char *argv[])
1453@{
1454 pifi fib;
1455 jit_node_t *label;
1456 jit_node_t *call;
1457 jit_node_t *in; @rem{/* offset of the argument */}
1458 jit_node_t *ref; @rem{/* to patch the forward reference */}
1459 jit_node_t *zero; @rem{/* to patch the forward reference */}
1460
1461 init_jit(argv[0]);
1462 _jit = jit_new_state();
1463
1464 label = jit_label();
1465 jit_prolog ();
1466 in = jit_arg ();
1467 jit_getarg (JIT_V0, in); @rem{/* R0 = n */}
1468 zero = jit_beqi (JIT_R0, 0);
1469 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1470 jit_movi (JIT_R0, 1);
1471 ref = jit_blei (JIT_V0, 2);
1472 jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */}
1473 jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */}
1474 jit_prepare();
1475 jit_pushargr(JIT_V1);
1476 call = jit_finishi(NULL);
1477 jit_patch_at(call, label);
1478 jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */}
1479 jit_prepare();
1480 jit_pushargr(JIT_V2);
1481 call = jit_finishi(NULL);
1482 jit_patch_at(call, label);
1483 jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */}
1484 jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */}
1485
1486 jit_patch(ref); @rem{/* patch jump */}
1487 jit_patch(zero); @rem{/* patch jump */}
1488 jit_retr(JIT_R0);
1489
1490 @rem{/* call the generated code@comma{} passing 32 as an argument */}
1491 fib = jit_emit();
1492 jit_clear_state();
1493 printf("fib(%d) = %d\n", 32, fib(32));
1494 jit_destroy_state();
1495 finish_jit();
1496 return 0;
1497@}
1498@end example
1499
1500As said above, this is the first example of dynamically compiling
1501branches. Branch instructions have two operands containing the
1502values to be compared, and return a @code{jit_note_t *} object
1503to be patched.
1504
1505Because labels final address are only known after calling @code{emit},
1506it is required to call @code{patch} or @code{patch_at}, what does
1507tell @lightning{} that the target to patch is actually a pointer to
1508a @code{jit_node_t *} object, otherwise, it would assume that is
1509a pointer to a C function. Note that conditional branches do not
1510receive a label argument, so they must be patched.
1511
1512You need to call @code{patch_at} on the return of value @code{calli},
1513@code{finishi}, and @code{calli} if it is actually referencing a label
1514in the jit code. All branch instructions do not receive a label
1515argument. Note that @code{movi} is an special case, and patching it
1516is usually done to get the final address of a label, usually to later
1517call @code{jmpr}.
1518
1519Now, here is the iterative version:
1520
1521@example
1522#include <stdio.h>
1523#include <lightning.h>
1524
1525static jit_state_t *_jit;
1526
1527typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1528
1529int main(int argc, char *argv[])
1530@{
1531 pifi fib;
1532 jit_node_t *in; @rem{/* offset of the argument */}
1533 jit_node_t *ref; @rem{/* to patch the forward reference */}
1534 jit_node_t *zero; @rem{/* to patch the forward reference */}
1535 jit_node_t *jump; @rem{/* jump to start of loop */}
1536 jit_node_t *loop; @rem{/* start of the loop */}
1537
1538 init_jit(argv[0]);
1539 _jit = jit_new_state();
1540
1541 jit_prolog ();
1542 in = jit_arg ();
1543 jit_getarg (JIT_R0, in); @rem{/* R0 = n */}
1544 zero = jit_beqi (JIT_R0, 0);
1545 jit_movr (JIT_R1, JIT_R0);
1546 jit_movi (JIT_R0, 1);
1547 ref = jit_blti (JIT_R1, 2);
1548 jit_subi (JIT_R2, JIT_R2, 2);
1549 jit_movr (JIT_R1, JIT_R0);
1550
1551 loop= jit_label();
1552 jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */}
1553 jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */
1554 jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */
1555 jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */
1556 jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */
1557 jit_patch_at(jump, loop);
1558
1559 jit_patch(ref); @rem{/* patch forward jump */}
1560 jit_patch(zero); @rem{/* patch forward jump */}
1561 jit_retr (JIT_R0);
1562
1563 @rem{/* call the generated code@comma{} passing 36 as an argument */}
1564 fib = jit_emit();
1565 jit_clear_state();
1566 printf("fib(%d) = %d\n", 36, fib(36));
1567 jit_destroy_state();
1568 finish_jit();
1569 return 0;
1570@}
1571@end example
1572
1573This code calculates the recurrence relation using iteration (a
1574@code{for} loop in high-level languages). There are no function
1575calls anymore: instead, there is a backward jump (the @code{bnei} at
1576the end of the loop).
1577
1578Note that the program must remember the address for backward jumps;
1579for forward jumps it is only required to remember the jump code,
1580and call @code{patch} for the implicit label.
1581
1582@node Reentrancy
1583@chapter Re-entrant usage of @lightning{}
1584
1585@lightning{} uses the special @code{_jit} identifier. To be able
1586to be able to use multiple jit generation states at the same
1587time, it is required to used code similar to:
1588
1589@example
1590 struct jit_state lightning;
1591 #define lightning _jit
1592@end example
1593
1594This will cause the symbol defined to @code{_jit} to be passed as
1595the first argument to the underlying @lightning{} implementation,
1596that is usually a function with an @code{_} (underscode) prefix
1597and with an argument named @code{_jit}, in the pattern:
1598
1599@example
1600 static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
1601 #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v);
1602@end example
1603
1604The reason for this is to use the same syntax as the initial lightning
1605implementation and to avoid needing the user to keep adding an extra
1606argument to every call, as multiple jit states generating code in
1607paralell should be very uncommon.
1608
519a9ea1 1609@node Registers
4a71579b
PC
1610@chapter Accessing the whole register file
1611
1612As mentioned earlier in this chapter, all @lightning{} back-ends are
1613guaranteed to have at least six general-purpose integer registers and
1614six floating-point registers, but many back-ends will have more.
1615
1616To access the entire register files, you can use the
1617@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1618accept a parameter that identifies the register number, which
1619must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1620and @code{JIT_F_NUM} respectively; the number need not be
1621constant. Of course, expressions like @code{JIT_R0} and
1622@code{JIT_R(0)} denote the same register, and likewise for
1623integer callee-saved, or floating-point, registers.
1624
519a9ea1
PC
1625@section Scratch registers
1626
1627For operations, @lightning{} does not support directly, like storing
1628a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
1629acquire and release a scratch register as in the following pattern:
1630
1631@example
1632 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1633 jit_movi (reg, immediate);
1634 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1635 jit_unget_reg (reg);
1636@end example
1637
1638As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
1639reloads but don't follow branches, the code between both must be in
1640the same basic block and must not contain any branches as in the
1641following (bad) example.
1642
1643@example
1644 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1645 jit_ldxi (reg, JIT_V0, offset);
1646 jump = jit_bnei (reg, V0);
1647 jit_movr (JIT_V1, reg);
1648 jit_patch (jump);
1649 jit_unget_reg (reg);
1650@end example
1651
4a71579b
PC
1652@node Customizations
1653@chapter Customizations
1654
1655Frequently it is desirable to have more control over how code is
1656generated or how memory is used during jit generation or execution.
1657
1658@section Memory functions
1659To aid in complete control of memory allocation and deallocation
1660@lightning{} provides wrappers that default to standard @code{malloc},
1661@code{realloc} and @code{free}. These are loosely based on the
1662GNU GMP counterparts, with the difference that they use the same
1663prototype of the system allocation functions, that is, no @code{size}
1664for @code{free} or @code{old_size} for @code{realloc}.
1665
1666@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
1667@lightning{} guarantees that memory is only allocated or released
1668using these wrapped functions, but you must note that if lightning
1669was linked to GNU binutils, malloc is probably will be called multiple
1670times from there when initializing the disassembler.
1671
1672Because @code{init_jit} may call memory functions, if you need to call
1673@code{jit_set_memory_functions}, it must be called before @code{init_jit},
1674otherwise, when calling @code{finish_jit}, a pointer allocated with the
1675previous or default wrappers will be passed.
1676@end deftypefun
1677
1678@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
1679Get the current memory allocation function. Also, unlike the GNU GMP
1680counterpart, it is an error to pass @code{NULL} pointers as arguments.
1681@end deftypefun
1682
1683@section Alternate code buffer
1684To instruct @lightning{} to use an alternate code buffer it is required
1685to call @code{jit_realize} before @code{jit_emit}, and then query states
1686and customize as appropriate.
1687
1688@deftypefun void jit_realize ()
1689Must be called once, before @code{jit_emit}, to instruct @lightning{}
1690that no other @code{jit_xyz} call will be made.
1691@end deftypefun
1692
1693@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
1694Returns NULL or the previous value set with @code{jit_set_code}, and
1695sets the @var{code_size} argument to an appropriate value.
1696If @code{jit_get_code} is called before @code{jit_emit}, the
1697@var{code_size} argument is set to the expected amount of bytes
1698required to generate code.
1699If @code{jit_get_code} is called after @code{jit_emit}, the
1700@var{code_size} argument is set to the exact amount of bytes used
1701by the code.
1702@end deftypefun
1703
1704@deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size})
1705Instructs @lightning{} to output to the @var{code} argument and
1706use @var{size} as a guard to not write to invalid memory. If during
1707@code{jit_emit} @lightning{} finds out that the code would not fit
1708in @var{size} bytes, it halts code emit and returns @code{NULL}.
1709@end deftypefun
1710
1711A simple example of a loop using an alternate buffer is:
1712
1713@example
1714 jit_uint8_t *code;
1715 int *(func)(int); @rem{/* function pointer */}
1716 jit_word_t code_size;
1717 jit_word_t real_code_size;
1718 @rem{...}
1719 jit_realize(); @rem{/* ready to generate code */}
1720 jit_get_code(&code_size); @rem{/* get expected code size */}
1721 code_size = (code_size + 4095) & -4096;
1722 do (;;) @{
1723 code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
1724 MAP_PRIVATE | MAP_ANON, -1, 0);
1725 jit_set_code(code, code_size);
1726 if ((func = jit_emit()) == NULL) @{
1727 munmap(code, code_size);
1728 code_size += 4096;
1729 @}
1730 @} while (func == NULL);
1731 jit_get_code(&real_code_size); @rem{/* query exact size of the code */}
1732@end example
1733
1734The first call to @code{jit_get_code} should return @code{NULL} and set
1735the @code{code_size} argument to the expected amount of bytes required
1736to emit code.
1737The second call to @code{jit_get_code} is after a successful call to
1738@code{jit_emit}, and will return the value previously set with
1739@code{jit_set_code} and set the @code{real_code_size} argument to the
1740exact amount of bytes used to emit the code.
1741
1742@section Alternate data buffer
1743Sometimes it may be desirable to customize how, or to prevent
1744@lightning{} from using an extra buffer for constants or debug
1745annotation. Usually when also using an alternate code buffer.
1746
1747@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1748Returns @code{NULL} or the previous value set with @code{jit_set_data},
1749and sets the @var{data_size} argument to how many bytes are required
1750for the constants data buffer, and @var{note_size} to how many bytes
1751are required to store the debug note information.
1752Note that it always preallocate one debug note entry even if
1753@code{jit_name} or @code{jit_note} are never called, but will return
1754zero in the @var{data_size} argument if no constant is required;
1755constants are only used for the @code{float} and @code{double} operations
1756that have an immediate argument, and not in all @lightning{} ports.
1757@end deftypefun
1758
1759@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1760
1761@var{data} can be NULL if disabling constants and annotations, otherwise,
1762a valid pointer must be passed. An assertion is done that the data will
1763fit in @var{size} bytes (but that is a noop if @lightning{} was built
1764with @code{-DNDEBUG}).
1765
1766@var{size} tells the space in bytes available in @var{data}.
1767
1768@var{flags} can be zero to tell to just use the alternate data buffer,
1769or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}
1770
1771@table @t
1772@item JIT_DISABLE_DATA
1773@cindex JIT_DISABLE_DATA
1774Instructs @lightning{} to not use a constant table, but to use an
1775alternate method to synthesize those, usually with a larger code
1776sequence using stack space to transfer the value from a GPR to a
1777FPR register.
1778
1779@item JIT_DISABLE_NOTE
1780@cindex JIT_DISABLE_NOTE
1781Instructs @lightning{} to not store file or function name, and
1782line numbers in the constant buffer.
1783@end table
1784@end deftypefun
1785
1786A simple example of a preventing usage of a data buffer is:
1787
1788@example
1789 @rem{...}
1790 jit_realize(); @rem{/* ready to generate code */}
1791 jit_get_data(NULL, NULL);
1792 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1793 @rem{...}
1794@end example
1795
1796Or to only use a data buffer, if required:
1797
1798@example
1799 jit_uint8_t *data;
1800 jit_word_t data_size;
1801 @rem{...}
1802 jit_realize(); @rem{/* ready to generate code */}
1803 jit_get_data(&data_size, NULL);
1804 if (data_size)
1805 data = malloc(data_size);
1806 else
1807 data = NULL;
1808 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
1809 @rem{...}
1810 if (data)
1811 free(data);
1812 @rem{...}
1813@end example
1814
1815@node Acknowledgements
1816@chapter Acknowledgements
1817
1818As far as I know, the first general-purpose portable dynamic code
1819generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1820Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1821unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1822directly inspired @lightning{}.
1823
1824Thanks go to Ian Piumarta, who kindly accepted to release his own
1825program @sc{ccg} under the GNU General Public License, thereby allowing
1826@lightning{} to use the run-time assemblers he had wrote for @sc{ccg}.
1827@sc{ccg} provides a way of dynamically assemble programs written in the
1828underlying architecture's assembly language. So it is not portable,
1829yet very interesting.
1830
1831I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1832was first developed as a tool to be used in GNU Smalltalk's dynamic
1833translator from bytecodes to native code.