@ifnottex
@dircategory Software development
@direntry
* lightning: (lightning).       Library for dynamic code generation.
@end direntry
@end ifnottex

@ifnottex
@node Top
@top @lightning{}

@iftex
@macro comma
@verbatim{|,|}
@end macro
@end iftex

@ifnottex
@macro comma
@verb{|,|}
@end macro
@end ifnottex

This document describes @value{TOPIC} the @lightning{} library for
dynamic code generation.

@menu
* Overview::                 What GNU lightning is
* Installation::             Configuring and installing GNU lightning
* The instruction set::      The RISC instruction set used in GNU lightning
* GNU lightning examples::   GNU lightning's examples
* Reentrancy::               Re-entrant usage of GNU lightning
* Registers::                Accessing the whole register file
* Customizations::           Advanced code generation customizations
* Acknowledgements::         Acknowledgements for GNU lightning
@end menu
@end ifnottex

@node Overview
@chapter Introduction to @lightning{}

@iftex
This document describes @value{TOPIC} the @lightning{} library for
dynamic code generation.
@end iftex

Dynamic code generation is the generation of machine code
at runtime. It is typically used to strip a layer of interpretation
by allowing compilation to occur at runtime. One of the most
well-known applications of dynamic code generation is perhaps that
of interpreters that compile source code to an intermediate bytecode
form, which is then recompiled to machine code at run-time: this
approach effectively combines the portability of bytecode
representations with the speed of machine code. Another common
application of dynamic code generation is in the field of hardware
simulators and binary emulators, which can use the same techniques
to translate simulated instructions to the instructions of the
underlying machine.

Yet other applications come to mind: for example, windowing
@dfn{bitblt} operations, matrix manipulations, and network packet
filters. Although very powerful and relatively well known within the
compiler community, dynamic code generation techniques are rarely
exploited to their full potential and, with the exception of the
two applications described above, have remained curiosities because
of their portability and functionality barriers: binary instructions
are generated, so programs using dynamic code generation must be
retargeted for each machine; in addition, coding a run-time code
generator is a tedious and error-prone task more than a difficult one.

@lightning{} provides a portable, fast and easily retargetable dynamic
code generation system.

To be portable, @lightning{} abstracts over current architectures'
quirks and unorthogonalities. The interface that it exposes is that
of a standardized RISC architecture loosely based on the SPARC and MIPS
chips. There are a few general-purpose registers (six, not including
those used to receive and pass parameters between subroutines), and
arithmetic operations involve three operands---either three registers
or two registers and an arbitrarily sized immediate value.

On one hand, this architecture is general enough that it is possible to
generate reasonably efficient code even on CISC architectures such as the
Intel x86 or the Motorola 68k families. On the other hand, it matches
real architectures closely enough that, most of the time, the
compiler's constant folding pass ends up generating code which
assembles machine instructions without further tests.

@node Installation
@chapter Configuring and installing @lightning{}

Here we will assume that your system already has the dependencies
necessary to build @lightning{}. For more on dependencies, see
@lightning{}'s @file{README-hacking} file.

The first thing to do to build @lightning{} is to configure the
program, picking the set of macros to be used on the host
architecture; this configuration is automatically performed by
the @file{configure} shell script; to run it, merely type:
@example
    ./configure
@end example

The @file{configure} script accepts the @code{--enable-disassembler} option,
which enables linking to GNU binutils and optionally printing a
human-readable disassembly of the jit code. This option can be disabled by the
@code{--disable-disassembler} option.

@file{configure} also accepts the @code{--enable-devel-disassembler}
option, useful to check exactly what machine instructions were generated
for a @lightning{} instruction. It basically mixes @code{jit_print} and
@code{jit_disassembly}.

The @code{--enable-assertions} option enables several consistency
checks in the run-time assemblers. These are not usually needed, so you
can decide to simply forget about it; also remember that these consistency
checks tend to slow down your code generator.

The @code{--enable-devel-strong-type-checking} option does extra type
checking using @code{assert}. This option also enables
@code{--enable-assertions} unless it is explicitly disabled.

The option @code{--enable-devel-get-jit-size} should only be used
when doing updates or maintenance to lightning. It regenerates the
@code{jit_$ARCH-sz.c} file, creating a table of maximum byte usage when
translating a @lightning{} instruction to machine code.

After you've configured @lightning{}, run @file{make} as usual.

@lightning{} has an extensive set of tests to validate that it is working
correctly on the build host. To test it, run:
@example
    make check
@end example

The next important step is:
@example
    make install
@end example

This ends the process of installing @lightning{}.

@node The instruction set
@chapter @lightning{}'s instruction set

@lightning{}'s instruction set was designed by deriving instructions
that closely match those of most existing RISC architectures, or
that can be easily synthesized if absent. Each instruction is composed
of:
@itemize @bullet
@item
an operation, like @code{sub} or @code{mul}

@item
most times, a register/immediate flag (@code{r} or @code{i})

@item
an unsigned modifier (@code{u}), a type identifier or two, when applicable.
@end itemize

Examples of legal mnemonics are @code{addr} (integer add, with three
register operands) and @code{muli} (integer multiply, with two
register operands and an immediate operand). Each instruction takes
two or three operands; in most cases, one of them can be an immediate
value instead of a register.

Most @lightning{} integer operations are signed wordsize operations,
with the exception of operations that convert types, or load or store
values to/from memory. When applicable, the types and C types are as
follows:

@example
    _c          @r{signed char}
    _uc         @r{unsigned char}
    _s          @r{short}
    _us         @r{unsigned short}
    _i          @r{int}
    _ui         @r{unsigned int}
    _l          @r{long}
    _f          @r{float}
    _d          @r{double}
@end example

Most integer operations do not need a type modifier, and when loading or
storing values to memory there is an alias to the proper operation
using wordsize operands, that is, if omitted, the type is @r{int} on
32-bit architectures and @r{long} on 64-bit architectures. Note
that lightning also expects @code{sizeof(void*)} to match the wordsize.

When an unsigned operation result differs from the equivalent signed
operation, there is the @code{_u} modifier.

There are at least seven integer registers, of which six are
general-purpose, while the last is used to contain the frame pointer
(@code{FP}). The frame pointer can be used to allocate and access local
variables on the stack, using the @code{allocai} or @code{allocar}
instruction.

Of the general-purpose registers, at least three are guaranteed to be
preserved across function calls (@code{V0}, @code{V1} and
@code{V2}) and at least three are not (@code{R0}, @code{R1} and
@code{R2}). Six registers are not very much, but this
restriction was forced by the need to target CISC architectures
which, like the x86, are short of registers; anyway, backends can
specify the actual number of available registers with the macros
@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
(for callee-save registers).

There are at least six floating-point registers, named @code{F0} to
@code{F5}. These are usually caller-save and are separate from the integer
registers on the supported architectures; on Intel architectures,
in 32 bit mode if SSE2 is not available or use of X87 is forced,
the register stack is mapped to a flat register file. As for the
integer registers, the macro @code{JIT_F_NUM} yields the number of
floating-point registers.

The complete instruction set follows; as you can see, most non-memory
operations only take integers (either signed or unsigned) as operands;
this was done in order to reduce the instruction set, and because most
architectures only provide word and long word operations on registers.
There are instructions that allow operands to be extended to fit a larger
data type, both in a signed and in an unsigned way.

@table @b
@item Binary ALU operations
These accept three operands; the last one can be an immediate.
@code{addx} operations must directly follow @code{addc}, and
@code{subx} must follow @code{subc}; otherwise, results are undefined.
Most, if not all, architectures do not support @r{float} or @r{double}
immediate operands; lightning emulates those operations by moving the
immediate to a temporary register and emitting the call with only
register operands.
@example
addr        _f _d       O1 = O2 + O3
addi        _f _d       O1 = O2 + O3
addxr                   O1 = O2 + (O3 + carry)
addxi                   O1 = O2 + (O3 + carry)
addcr                   O1 = O2 + O3, set carry
addci                   O1 = O2 + O3, set carry
subr        _f _d       O1 = O2 - O3
subi        _f _d       O1 = O2 - O3
subxr                   O1 = O2 - (O3 + carry)
subxi                   O1 = O2 - (O3 + carry)
subcr                   O1 = O2 - O3, set carry
subci                   O1 = O2 - O3, set carry
rsbr        _f _d       O1 = O3 - O2
rsbi        _f _d       O1 = O3 - O2
mulr        _f _d       O1 = O2 * O3
muli        _f _d       O1 = O2 * O3
hmulr       _u          O1 = ((O2 * O3) >> WORDSIZE)
hmuli       _u          O1 = ((O2 * O3) >> WORDSIZE)
divr        _u _f _d    O1 = O2 / O3
divi        _u _f _d    O1 = O2 / O3
remr        _u          O1 = O2 % O3
remi        _u          O1 = O2 % O3
andr                    O1 = O2 & O3
andi                    O1 = O2 & O3
orr                     O1 = O2 | O3
ori                     O1 = O2 | O3
xorr                    O1 = O2 ^ O3
xori                    O1 = O2 ^ O3
lshr                    O1 = O2 << O3
lshi                    O1 = O2 << O3
rshr        _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
rshi        _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
lrotr                   O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3))
lroti                   O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3))
rrotr                   O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3))
rroti                   O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3))
movzr                   O1 = O3 ? O1 : O2
movnr                   O1 = O3 ? O2 : O1
@end example

Note that @code{lrotr}, @code{lroti}, @code{rrotr} and @code{rroti}
are described by their fallback implementation. These are bit rotation
operations.
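
As a rough illustration of the high-word multiply and rotation formulas in the table above, the following is a plain C model for a 32-bit word. It is only a sketch and does not use the @lightning{} API: the @code{model_*} function names and the 32-bit @code{WORDSIZE} are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define WORDSIZE 32   /* assumption: model a 32-bit word */

/* hmulr: high word of the signed double-width product.
   Assumes >> on a negative int64_t is an arithmetic shift,
   as it is on mainstream compilers. */
int32_t model_hmulr(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> WORDSIZE);
}

/* hmulr_u: high word of the unsigned double-width product. */
uint32_t model_hmulr_u(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> WORDSIZE);
}

/* lrotr: rotate left. A zero count is handled apart because
   shifting a 32-bit value by 32 is undefined in C. */
uint32_t model_lrotr(uint32_t v, unsigned n)
{
    assert(n < WORDSIZE);
    return n ? (v << n) | (v >> (WORDSIZE - n)) : v;
}

/* rrotr: rotate right, with the same special case for zero. */
uint32_t model_rrotr(uint32_t v, unsigned n)
{
    assert(n < WORDSIZE);
    return n ? (v >> n) | (v << (WORDSIZE - n)) : v;
}
```

In this model, both halves of a rotation come from the same source operand, which is what distinguishes a rotation from a plain shift.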

@item Four operand binary ALU operations
These accept two result registers, and two operands; the last one can
be an immediate. The first two arguments cannot be the same register.

@code{qmul} stores the low word of the result in @code{O1} and the
high word in @code{O2}. For unsigned multiplication, @code{O2} zero
means there was no overflow. For signed multiplication, the no-overflow
check is based on sign, and can be detected if @code{O2} is zero or
minus one.

@code{qdiv} stores the quotient in @code{O1} and the remainder in
@code{O2}. It can be used as a quick way to check if a division is
exact, in which case the remainder is zero.

@code{qlsh} shifts from 0 to @emph{wordsize}, doing a normal left
shift for the first result register and setting the second result
register to the overflow bits. @code{qlsh} can be used as a quick
way to multiply by powers of two.

@code{qrsh} shifts from 0 to @emph{wordsize}, doing a normal right
shift for the first result register and setting the second result
register to the overflow bits. @code{qrsh} can be used as a quick
way to divide by powers of two.

Note that @code{qlsh} and @code{qrsh} are basically implemented as
two shifts. It is undefined behavior to pass a value not in the range
0 to @emph{wordsize}. Most CPUs will usually @code{and} the shift
amount with @emph{wordsize} - 1, or possibly use the @emph{remainder}.
@lightning{} only generates code to specially handle 0 and @emph{wordsize}
shifts. Since a code generator for a @emph{safe language} should
usually check the shift amount, these instructions usually should be
used as a fast path to check for division without remainder or
multiplication that does not overflow.

@example
qmulr       _u          O1 O2 = O3 * O4
qmuli       _u          O1 O2 = O3 * O4
qdivr       _u          O1 O2 = O3 / O4
qdivi       _u          O1 O2 = O3 / O4
qlshr       _u          O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
qlshi       _u          O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4)
qrshr       _u          O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
qrshi       _u          O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4)
@end example
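
To make the low/high word split more concrete, here is a hedged plain C model of @code{qmulr} and @code{qlshr_u} for a 32-bit word. It is not @lightning{} API code; the struct and function names are hypothetical. The @code{model_qmulr_fits} helper is this sketch's own, stricter reading of the signed no-overflow test: the high word must be the sign extension of the low word.

```c
#include <assert.h>
#include <stdint.h>

/* lo plays the role of O1, hi the role of O2 (hypothetical types). */
typedef struct { int32_t lo, hi; } spair_t;
typedef struct { uint32_t lo, hi; } upair_t;

/* qmulr model: full 64-bit signed product split in two words.
   Converting out-of-range values to int32_t is implementation-defined
   in C; mainstream compilers wrap. */
spair_t model_qmulr(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    spair_t r;
    r.lo = (int32_t)(uint32_t)p;
    r.hi = (int32_t)(p >> 32);
    return r;
}

/* The signed product fits in one word when hi is the sign
   extension of lo (0 for lo >= 0, -1 for lo < 0). */
int model_qmulr_fits(spair_t r)
{
    return r.hi == (r.lo < 0 ? -1 : 0);
}

/* qlshr_u model: lo is the normal shift, hi the bits shifted out. */
upair_t model_qlshr_u(uint32_t v, unsigned n)
{
    uint64_t wide;
    upair_t r;
    assert(n <= 32);            /* other shift counts are undefined */
    wide = (uint64_t)v << n;
    r.lo = (uint32_t)wide;
    r.hi = (uint32_t)(wide >> 32);
    return r;
}
```

Under this model, @code{65536 * 32768} gives a zero high word although the result does not fit in a signed 32-bit word, so a generator relying on the high word alone may also want to compare it with the sign of the low word.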

These four operand ALU operations are only defined for float operands.

@example
fmar        _f _d       O1 = O2 * O3 + O4
fmai        _f _d       O1 = O2 * O3 + O4
fmsr        _f _d       O1 = O2 * O3 - O4
fmsi        _f _d       O1 = O2 * O3 - O4
fnmar       _f _d       O1 = -O2 * O3 - O4
fnmai       _f _d       O1 = -O2 * O3 - O4
fnmsr       _f _d       O1 = -O2 * O3 + O4
fnmsi       _f _d       O1 = -O2 * O3 + O4
@end example

These are a family of fused multiply-add instructions.
Note that @lightning{} does not handle rounding modes nor math exceptions.
Also note that not all backends provide an instruction for the equivalent
@lightning{} instruction presented above. Some are completely implemented
as fallbacks and some are composed of one or more instructions. For common
input this should not cause major issues, but note that when implemented by
the CPU, these are implemented as the multiplication calculated with infinite
precision, and rounding is done after the addition step. Due to this, for
specially crafted input different ports might show different output. When
implemented by the CPU, it is also possible to have exceptions that do
not happen if implemented as a fallback.

@item Unary ALU operations
These accept two operands; the first must be a register, and the
second is a register if the @code{r} modifier is used; otherwise,
the @code{i} modifier is used and the second argument is a constant.

@example
negr        _f _d       O1 = -O2
negi        _f _d       O1 = -O2
comr                    O1 = ~O2
comi                    O1 = ~O2
clor                    O1 = number of leading one bits in O2
cloi                    O1 = number of leading one bits in O2
clzr                    O1 = number of leading zero bits in O2
clzi                    O1 = number of leading zero bits in O2
ctor                    O1 = number of trailing one bits in O2
ctoi                    O1 = number of trailing one bits in O2
ctzr                    O1 = number of trailing zero bits in O2
ctzi                    O1 = number of trailing zero bits in O2
rbitr                   O1 = bits of O2 reversed
rbiti                   O1 = bits of O2 reversed
popcntr                 O1 = number of bits set in O2
popcnti                 O1 = number of bits set in O2
@end example

Note that @code{ctzr} is basically equivalent to the @code{C} call
@code{ffs}, but indexed at bit zero, not one.

Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
value of zero is not an error; it just returns the number of bits
in a word: 64 if @lightning{} generates 64 bit instructions, otherwise
32.

The @code{clor} and @code{ctor} instructions are just counterparts of
the versions that search for zero bits.
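
The zero-input behavior described above can be sketched with a plain C model for a 32-bit word. This is only an illustration, not @lightning{} API code; the @code{model_*} names and the 32-bit word size are assumptions of the example.

```c
#include <assert.h>
#include <stdint.h>

#define WORDSIZE 32   /* assumption: model a 32-bit word */

/* clzr model: count zero bits from the top; yields WORDSIZE for 0. */
unsigned model_clzr(uint32_t v)
{
    unsigned n = 0;
    while (n < WORDSIZE && !(v & (UINT32_C(1) << (WORDSIZE - 1 - n))))
        n++;
    return n;
}

/* ctzr model: count zero bits from the bottom; yields WORDSIZE for 0. */
unsigned model_ctzr(uint32_t v)
{
    unsigned n = 0;
    while (n < WORDSIZE && !(v & (UINT32_C(1) << n)))
        n++;
    return n;
}

/* clor and ctor are modelled as the zero-bit counts of the complement. */
unsigned model_clor(uint32_t v) { return model_clzr(~v); }
unsigned model_ctor(uint32_t v) { return model_ctzr(~v); }
```

For a nonzero input such as 8, @code{model_ctzr} returns 3, one less than the value @code{ffs} would return for the same input.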

These unary ALU operations are only defined for float operands.

@example
absr        _f _d       O1 = fabs(O2)
absi        _f _d       O1 = fabs(O2)
sqrtr       _f _d       O1 = sqrt(O2)
sqrti       _f _d       O1 = sqrt(O2)
@end example

Note that for @code{float} and @code{double} unary operations, @lightning{}
will generate code to actually execute the operation at runtime.

@item Compare instructions
These accept three operands; again, the last can be an immediate.
The last two operands are compared, and the first operand, which must be
an integer register, is set to 1 if the given condition was met and
to 0 otherwise.

The conditions given below are for the standard behavior of C,
where the ``unordered'' comparison result is mapped to false.

@example
ltr         _u _f _d    O1 = (O2 < O3)
lti         _u _f _d    O1 = (O2 < O3)
ler         _u _f _d    O1 = (O2 <= O3)
lei         _u _f _d    O1 = (O2 <= O3)
gtr         _u _f _d    O1 = (O2 > O3)
gti         _u _f _d    O1 = (O2 > O3)
ger         _u _f _d    O1 = (O2 >= O3)
gei         _u _f _d    O1 = (O2 >= O3)
eqr         _f _d       O1 = (O2 == O3)
eqi         _f _d       O1 = (O2 == O3)
ner         _f _d       O1 = (O2 != O3)
nei         _f _d       O1 = (O2 != O3)
unltr       _f _d       O1 = !(O2 >= O3)
unler       _f _d       O1 = !(O2 > O3)
ungtr       _f _d       O1 = !(O2 <= O3)
unger       _f _d       O1 = !(O2 < O3)
uneqr       _f _d       O1 = !(O2 < O3) && !(O2 > O3)
ltgtr       _f _d       O1 = !(O2 >= O3) || !(O2 <= O3)
ordr        _f _d       O1 = (O2 == O2) && (O3 == O3)
unordr      _f _d       O1 = (O2 != O2) || (O3 != O3)
@end example
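
The difference between the ordered and the @code{un} prefixed floating-point conditions only shows up when an operand is a NaN. The following plain C sketch transcribes a few formulas from the table; the @code{model_*} names are hypothetical, and it assumes IEEE 754 comparison semantics (no @code{-ffast-math} style options).

```c
#include <assert.h>
#include <math.h>

/* Direct transcriptions of the formulas in the table above. */
int model_ltr_d(double a, double b)    { return a < b; }
int model_unltr_d(double a, double b)  { return !(a >= b); }
int model_uneqr_d(double a, double b)  { return !(a < b) && !(a > b); }
int model_ordr_d(double a, double b)   { return (a == a) && (b == b); }
int model_unordr_d(double a, double b) { return (a != a) || (b != b); }
```

With a NaN operand, @code{model_ltr_d} yields 0 while @code{model_unltr_d} yields 1; for ordinary numbers both agree.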

@item Transfer operations
These accept two operands; for @code{ext} both of them must be
registers, while @code{mov} accepts an immediate value as the second
operand.

Unlike @code{movr} and @code{movi}, the other instructions are used
to truncate a wordsize operand to a smaller integer data type or to
convert float data types. You can also use @code{extr} to convert an
integer to a floating point value: the usual options are @code{extr_f}
and @code{extr_d}.

@example
movr        _f _d                       O1 = O2
movi        _f _d                       O1 = O2
extr        _c _uc _s _us _i _ui _f _d  O1 = O2
truncr      _f _d                       O1 = trunc(O2)
extr                                    O1 = sign_extend(O2[O3:O3+O4])
extr_u                                  O1 = O2[O3:O3+O4]
depr                                    O1[O3:O3+O4] = O2
@end example

@code{extr}, @code{extr_u} and @code{depr} are useful to access @code{C}
compatible bit fields, provided that these are contained in a machine
word. @code{extr} is used to @emph{extract} and sign extend a value
from a bit field. @code{extr_u} is used to @emph{extract} and zero
extend a value from a bit field. @code{depr} is used to @emph{deposit}
a value into a bit field.

@example
extr(result, source, offset, length)
extr_u(result, source, offset, length)
depr(result, source, offset, length)
@end example

A common way to declare @code{C} and @lightning{} compatible bit fields is:
@example
union @{
    struct @{
        jit_word_t signed_bits: @code{length};
        jit_uword_t unsigned_bits: @code{length};
        ...
    @} s;
    jit_word_t signed_value;
    jit_uword_t unsigned_value;
@} u;
@end example
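
The extract and deposit semantics can be sketched in plain C as follows. This is a model under stated assumptions, not the @lightning{} implementation: it uses a 32-bit word, counts @code{offset} from the least significant bit, and the @code{model_*} and @code{field_mask} names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WORDSIZE 32   /* assumption: model a 32-bit word */

/* Mask with the low `length` bits set; length must be 1..WORDSIZE. */
static uint32_t field_mask(unsigned length)
{
    return length == WORDSIZE ? UINT32_MAX : (UINT32_C(1) << length) - 1;
}

/* extr_u model: extract the field and zero extend it. */
uint32_t model_extr_u(uint32_t source, unsigned offset, unsigned length)
{
    assert(length >= 1 && offset + length <= WORDSIZE);
    return (source >> offset) & field_mask(length);
}

/* extr model: extract the field and sign extend it.
   (v ^ sign) - sign propagates the top bit of the field; the final
   conversion to int32_t wraps on mainstream compilers. */
int32_t model_extr(uint32_t source, unsigned offset, unsigned length)
{
    uint32_t v = model_extr_u(source, offset, length);
    uint32_t sign = UINT32_C(1) << (length - 1);
    return (int32_t)((v ^ sign) - sign);
}

/* depr model: deposit the low `length` bits of source into result. */
uint32_t model_depr(uint32_t result, uint32_t source,
                    unsigned offset, unsigned length)
{
    uint32_t mask;
    assert(length >= 1 && offset + length <= WORDSIZE);
    mask = field_mask(length) << offset;
    return (result & ~mask) | ((source << offset) & mask);
}
```

The bit numbering is the main assumption here: a real target may number bit field offsets differently depending on byte order.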

In 64-bit architectures it may be required to use @code{truncr_f_i},
@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
the equivalent C code. Only the @code{_i} modifier is available in
32-bit architectures.

@example
truncr_f_i    <int> O1 = <float> O2
truncr_f_l    <long>O1 = <float> O2
truncr_d_i    <int> O1 = <double>O2
truncr_d_l    <long>O1 = <double>O2
@end example

The float conversion operations are @emph{destination first,
source second}, but the order of the types is reversed. This happens
for historical reasons.

@example
extr_f_d    <double>O1 = <float> O2
extr_d_f    <float> O1 = <double>O2
@end example

The float to/from integer transfer operations are also @emph{destination
first, source second}. These were added later, but follow the
historic pattern.

@example
movr_w_f    <float>O1 = <int>O2
movi_w_f    <float>O1 = <int>O2
movr_f_w    <int>O1 = <float>O2
movi_f_w    <int>O1 = <float>O2
movr_w_d    <double>O1 = <long>O2
movi_w_d    <double>O1 = <long>O2
movr_d_w    <long>O1 = <double>O2
movi_d_w    <long>O1 = <double>O2
movr_ww_d   <double>O1 = [<int>O2:<int>O3]
movi_ww_d   <double>O1 = [<int>O2:<int>O3]
movr_d_ww   [<int>O1:<int>O2] = <double>O3
movi_d_ww   [<int>O1:<int>O2] = <double>O3
@end example

These are used to transfer bits to/from floats to/from integers, and are
useful to access bits of floating point values.

@code{movr_w_d}, @code{movi_w_d}, @code{movr_d_w} and @code{movi_d_w} are
only available in 64-bit. Conversely, @code{movr_ww_d}, @code{movi_ww_d},
@code{movr_d_ww} and @code{movi_d_ww} are only available in 32-bit.
For the int pair to/from double transfers, integer arguments must respect
endianness, to match how the CPU handles the verbatim byte values.
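
A bit transfer differs from a numeric conversion: the bytes are copied verbatim. The sketch below models @code{movr_f_w} and @code{movr_w_f} in plain C with @code{memcpy}. The function names are hypothetical, and it assumes a 32-bit IEEE 754 @code{float}.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* movr_f_w model: copy the bits of a float into an integer word. */
uint32_t model_movr_f_w(float f)
{
    uint32_t w;
    assert(sizeof w == sizeof f);   /* assumption: 32-bit float */
    memcpy(&w, &f, sizeof w);
    return w;
}

/* movr_w_f model: copy the bits of an integer word into a float. */
float model_movr_w_f(uint32_t w)
{
    float f;
    assert(sizeof f == sizeof w);
    memcpy(&f, &w, sizeof f);
    return f;
}
```

Under the IEEE 754 assumption, the value 1.0 transfers to the word @code{0x3F800000}, whereas a @code{truncr} style conversion would give the integer 1.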

@item Network extensions
These accept two operands, both of which must be registers; these
two instructions actually perform the same task, yet they are
assigned to two mnemonics for the sake of convenience and
completeness. As usual, the first operand is the destination and
the second is the source.
The @code{_ul} variant is only available in 64-bit architectures.
@example
htonr       _us _ui _ul  @r{Host-to-network (big endian) order}
ntohr       _us _ui _ul  @r{Network-to-host order}
@end example

@code{bswapr} can be used to unconditionally byte-swap an operand.
On little-endian architectures, @code{htonr} and @code{ntohr} resolve
to this.
The @code{_ul} variant is only available in 64-bit architectures.
@example
bswapr      _us _ui _ul  O1 = byte_swap(O2)
@end example
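
The relation between @code{bswapr} and @code{htonr} can be sketched in plain C. The names below are hypothetical and this is not @lightning{} API code; the host byte order is probed at run time only to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bswapr_us model: swap the two bytes of a 16-bit value. */
uint16_t model_bswapr_us(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* bswapr_ui model: reverse the four bytes of a 32-bit value. */
uint32_t model_bswapr_ui(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* First byte of v as stored in host memory. */
unsigned char first_byte(uint32_t v)
{
    unsigned char b;
    memcpy(&b, &v, 1);
    return b;
}

/* htonr_ui model: byte swap on little-endian hosts, identity otherwise. */
uint32_t model_htonr_ui(uint32_t v)
{
    int little_endian = (first_byte(1) == 1);
    return little_endian ? model_bswapr_ui(v) : v;
}
```

Whatever the host byte order, the first byte in memory of @code{model_htonr_ui(0x11223344)} is @code{0x11}, which is the big-endian layout.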

@item Load operations
@code{ld} accepts two operands while @code{ldx} accepts three;
in both cases, the last can be either a register or an immediate
value. Values are extended (with or without sign, according to
the data type specification) to fit a whole register.
The @code{_ui} and @code{_l} types are only available in 64-bit
architectures. For convenience, there is a version without a
type modifier for integer or pointer operands that uses the
appropriate wordsize call.
@example
ldr         _c _uc _s _us _i _ui _l _f _d  O1 = *O2
ldi         _c _uc _s _us _i _ui _l _f _d  O1 = *O2
ldxr        _c _uc _s _us _i _ui _l _f _d  O1 = *(O2+O3)
ldxi        _c _uc _s _us _i _ui _l _f _d  O1 = *(O2+O3)
@end example
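
How a loaded value is extended to a whole register depends only on the type modifier. The plain C sketch below models a few load variants; the @code{model_*} names are hypothetical, @code{long} stands in for a word-sized register, and a 2-byte @code{short} is assumed.

```c
#include <assert.h>
#include <string.h>

/* ldr_c model: load one byte and sign extend it to a word. */
long model_ldr_c(const void *address)
{
    signed char v;
    memcpy(&v, address, sizeof v);
    return v;
}

/* ldr_uc model: load one byte and zero extend it to a word. */
long model_ldr_uc(const void *address)
{
    unsigned char v;
    memcpy(&v, address, sizeof v);
    return v;
}

/* ldr_s and ldr_us models: the same for a short. */
long model_ldr_s(const void *address)
{
    short v;
    memcpy(&v, address, sizeof v);
    return v;
}

long model_ldr_us(const void *address)
{
    unsigned short v;
    memcpy(&v, address, sizeof v);
    return v;
}

/* ldxr_c model: the same load from base + index. */
long model_ldxr_c(const void *base, long index)
{
    return model_ldr_c((const char *)base + index);
}
```

The same byte @code{0xFF} therefore reads back as -1 through the @code{_c} variant and as 255 through the @code{_uc} variant.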

@item Store operations
@code{st} accepts two operands while @code{stx} accepts three; in
both cases, the first can be either a register or an immediate
value. Values are sign-extended to fit a whole register.
@example
str         _c _s _i _l _f _d  *O1 = O2
sti         _c _s _i _l _f _d  *O1 = O2
stxr        _c _s _i _l _f _d  *(O1+O2) = O3
stxi        _c _s _i _l _f _d  *(O1+O2) = O3
@end example
Note that the unsigned type modifier is not available, as the store
only writes 1, 2, 4 or 8 bytes to the memory address.
The @code{_l} type is only available in 64-bit architectures, and for
convenience, there is a version without a type modifier for integer or
pointer operands that uses the appropriate wordsize call.

@item Unaligned memory access
These allow access to integers of size 3 in 32-bit, and of the extra sizes
5, 6 and 7 in 64-bit.
For floating point values, only sizes 4 and 8 are supported.
@example
unldr       O1 = *(signed O3 byte integer *)O2
unldi       O1 = *(signed O3 byte integer *)O2
unldr_u     O1 = *(unsigned O3 byte integer *)O2
unldi_u     O1 = *(unsigned O3 byte integer *)O2
unldr_x     O1 = *(O3 byte float *)O2
unldi_x     O1 = *(O3 byte float *)O2
unstr       *(O3 byte integer *)O1 = O2
unsti       *(O3 byte integer *)O1 = O2
unstr_x     *(O3 byte float *)O1 = O2
unsti_x     *(O3 byte float *)O1 = O2
@end example
With the exception of non-standard sized integers, these might be
implemented as normal loads and stores if the processor supports
unaligned memory access; otherwise, a mode can be chosen at jit
initialization time to generate, or not generate, code that traps on
unaligned memory access. Letting the kernel trap means smaller generated
code, as otherwise it is required to check alignment at runtime@footnote{This requires changing @code{jit_cpu.unaligned} to 0 to disable or 1 to enable unaligned code generation. Not all ports have the C @code{jit_cpu.unaligned} value.}.
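
A load of a non-standard size, such as a 3 byte integer, can be modelled in plain C by assembling the value byte by byte. This is only a sketch: the @code{model_*} names are hypothetical, a 32-bit word is assumed, and bytes are combined in little-endian order, which a real port would adapt to the host.

```c
#include <assert.h>
#include <stdint.h>

/* unldr_u model: read `size` bytes (1..4) from any address and
   zero extend. Assumption: little-endian byte order. */
uint32_t model_unldr_u(const unsigned char *p, unsigned size)
{
    uint32_t v = 0;
    unsigned i;
    assert(size >= 1 && size <= 4);
    for (i = 0; i < size; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* unldr model: the same read, sign extended from bit 8*size - 1.
   The final conversion to int32_t wraps on mainstream compilers. */
int32_t model_unldr(const unsigned char *p, unsigned size)
{
    uint32_t v = model_unldr_u(p, size);
    uint32_t sign = UINT32_C(1) << (8 * size - 1);
    return (int32_t)((v ^ sign) - sign);
}
```

Because the model only reads single bytes, it never performs a misaligned word access, which is the property the non-trapping code path needs.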

@item Argument management
These are:
@example
prepare     (not specified)
va_start    (not specified)
pushargr    _c _uc _s _us _i _ui _l _f _d
pushargi    _c _uc _s _us _i _ui _l _f _d
va_push     (not specified)
arg         _c _uc _s _us _i _ui _l _f _d
getarg      _c _uc _s _us _i _ui _l _f _d
va_arg      _d
putargr     _c _uc _s _us _i _ui _l _f _d
putargi     _c _uc _s _us _i _ui _l _f _d
ret         (not specified)
retr        _c _uc _s _us _i _ui _l _f _d
reti        _c _uc _s _us _i _ui _l _f _d
reti        _f _d
va_end      (not specified)
retval      _c _uc _s _us _i _ui _l _f _d
epilog      (not specified)
@end example
As with other operations that use a type modifier, the @code{_ui} and
@code{_l} types are only available in 64-bit architectures, but there
are operations without a type modifier that alias to the appropriate
integer operation with wordsize operands.

@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
A code snippet that wants to call another procedure and has to pass
arguments must, in order: use the @code{prepare} instruction and use
the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
left to right order}; and use @code{finish} or @code{call} (explained below)
to perform the actual call.

Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret}, when
handling integer types, can be used without a type modifier.
It is suggested to use matching type modifiers for @code{arg}, @code{putarg}
and @code{getarg}; otherwise, problems will happen when generating jit for
environments that require arguments to be truncated and zero or sign
extended by the caller, and/or where excess arguments might be passed packed
on the stack. Currently only Apple systems with @code{aarch64} CPUs are
known to have this restriction.

@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
@code{va_push} is required when passing a @code{va_list} to another function,
because not all architectures expect it as a single pointer. A known case
is DEC Alpha, which requires it as a structure passed by value.

@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
@code{arg} is different from other instructions in that it does not
actually generate any code: instead, it is a function which returns
a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
a value'' means that @lightning{} code that compiles these
instructions returns a value when expanded.} You should call
@code{arg} as soon as possible, before any function call or, more
easily, right after the @code{prolog} instruction
(which is treated later).

@code{getarg} accepts a register argument and a value returned by
@code{arg}, and will move that argument to the register, extending
it (with or without sign, according to the data type specification)
to fit a whole register. These instructions are more intimately
related to the usage of the @lightning{} instruction set in code
that generates other code, so they will be treated more
specifically in @ref{GNU lightning examples, , Generating code at
run-time}.

@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
it accepts as first argument a register or immediate, and as
second argument a value returned by @code{arg}. It allows changing,
or restoring an argument to the current function, and is a
construct required to implement tail call optimization. Note that
arguments in registers are very cheap, but may be overwritten
at any moment, including on some operations, for example division,
that on several ports is implemented as a function call.

Finally, the @code{retval} instruction fetches the return value of a
called function in a register. The @code{retval} instruction takes a
register argument and copies the return value of the previously called
function in that register. A function with a return value should use
@code{retr} or @code{reti} to put the return value in the return register
before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.

@code{epilog} is an optional call that marks the end of a function
body. It is automatically generated by @lightning{} if starting a new
function (which should be done after a @code{ret} call) or finishing
generating jit.
It is very important to note that @code{epilog} being
optional may cause a common mistake. Consider this:
@example
fun1:
    prolog
    ...
    ret
fun2:
    prolog
@end example
Because @code{epilog} is added when finding a new @code{prolog},
this will cause the @code{fun2} label to actually be before the
return from @code{fun1}, because @lightning{} will actually
understand it as:
@example
fun1:
    prolog
    ...
    ret
fun2:
    epilog
    prolog
@end example

You should observe a few rules when using these macros. First of
all, if calling a varargs function, you should use the @code{ellipsis}
call to mark the position of the ellipsis in the C prototype.

You should not nest calls to @code{prepare} inside a
@code{prepare/finish} block. Doing this will result in undefined
behavior. Note that for functions with zero arguments you can use
just @code{call}.
718
719@item Branch instructions
720Like @code{arg}, these also return a value which, in this case,
721is to be used to compile forward branches as explained in
722@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
723compared; of these, the last can be either a register or an immediate.
724They are:
@example
bltr      _u  _f  _d  @r{if }(O2 < O3)@r{ goto }O1
blti      _u  _f  _d  @r{if }(O2 < O3)@r{ goto }O1
bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
bgtr      _u  _f  _d  @r{if }(O2 > O3)@r{ goto }O1
bgti      _u  _f  _d  @r{if }(O2 > O3)@r{ goto }O1
bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1

bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
bunler        _f  _d  @r{if }!(O2 > O3)@r{ goto }O1
bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
bunger        _f  _d  @r{if }!(O2 < O3)@r{ goto }O1
buneqr        _f  _d  @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
bordr         _f  _d  @r{if }(O2 == O2) && (O3 == O3)@r{ goto }O1
bunordr       _f  _d  @r{if }(O2 != O2) || (O3 != O3)@r{ goto }O1

bmsr                  @r{if }O2 & O3@r{ goto }O1
bmsi                  @r{if }O2 & O3@r{ goto }O1
bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
@end example
761
762Note that the @code{C} code does not have an @code{O1} argument. It is
763required to always use the return value as an argument to @code{patch},
764@code{patch_at} or @code{patch_abs}.
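
The branch conditions can be easier to follow when written as ordinary C predicates. The following is only a sketch in plain C, not @lightning{} code: all function names are hypothetical, @code{intptr_t} is assumed to stand in for a machine word, and @code{bunordr} is modelled with the usual meaning of ``unordered'' (at least one operand is a NaN).

```c
#include <math.h>
#include <stdint.h>

/* Unordered floating-point branches: the condition holds when the
 * ordered comparison is true OR when either operand is a NaN. */
int cond_bunltr(double a, double b) { return !(a >= b); }
int cond_bunler(double a, double b) { return !(a > b); }
int cond_bungtr(double a, double b) { return !(a <= b); }
int cond_bunger(double a, double b) { return !(a < b); }
int cond_buneqr(double a, double b) { return !(a < b) && !(a > b); }

/* Ordered: neither operand is a NaN.  Unordered: at least one is. */
int cond_bordr(double a, double b)   { return (a == a) && (b == b); }
int cond_bunordr(double a, double b) { return isnan(a) || isnan(b); }

/* bmsr/bmsi and bmcr/bmci: test the bits selected by the mask. */
int cond_bms(intptr_t value, intptr_t mask) { return (value & mask) != 0; }
int cond_bmc(intptr_t value, intptr_t mask) { return (value & mask) == 0; }

/* boaddr (signed): would a + b overflow a machine word?
 * The check is done before adding, so no undefined behavior occurs. */
int add_overflows(intptr_t a, intptr_t b)
{
    return (b > 0 && a > INTPTR_MAX - b) || (b < 0 && a < INTPTR_MIN - b);
}

/* boaddr_u (unsigned): overflow here means a carry out of the word. */
int add_carries(uintptr_t a, uintptr_t b)
{
    return (uintptr_t)(a + b) < a;
}
```

The @code{bx} variants branch on the negation of the last two predicates.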
765
766@item Jump and return operations
767These accept one argument except @code{ret} and @code{jmpi} which
768have none; the difference between @code{finishi} and @code{calli}
769is that the latter does not clean the stack from pushed parameters
770(if any) and the former must @strong{always} follow a @code{prepare}
771instruction.
772@example
773callr (not specified) @r{function call to register O1}
774calli (not specified) @r{function call to immediate O1}
775finishr (not specified) @r{function call to register O1}
776finishi (not specified) @r{function call to immediate O1}
777jmpr (not specified) @r{unconditional jump to register}
778jmpi (not specified) @r{unconditional jump}
779ret (not specified) @r{return from subroutine}
780retr _c _uc _s _us _i _ui _l _f _d
781reti _c _uc _s _us _i _ui _l _f _d
782retval _c _uc _s _us _i _ui _l _f _d @r{move return value}
783 @r{to register}
784@end example
785
Like branch instructions, @code{jmpi} also returns a value which is to
be used to compile forward branches.  @xref{Fibonacci, , Fibonacci
numbers}.
789
790@item Labels
791There are 3 @lightning{} instructions to create labels:
792@example
793label (not specified) @r{simple label}
794forward (not specified) @r{forward label}
795indirect (not specified) @r{special simple label}
796@end example
797
798The following instruction is used to specify a minimal alignment for
799the next instruction, usually with a label:
800@example
801align (not specified) @r{align code}
802@end example
803
804Similar to @code{align} is the next instruction, also usually used with
805a label:
806@example
807skip (not specified) @r{skip code}
808@end example
809It is used to specify a minimal number of bytes of nops to be inserted
810before the next instruction.
811
812@code{label} is normally used as @code{patch_at} argument for backward
813jumps.
814
815@example
816 jit_node_t *jump, *label;
817label = jit_label();
818 ...
819 jump = jit_beqr(JIT_R0, JIT_R1);
820 jit_patch_at(jump, label);
821@end example
822
823@code{forward} is used to patch code generation before the actual
824position of the label is known.
825
826@example
827 jit_node_t *jump, *label;
828label = jit_forward();
829 jump = jit_beqr(JIT_R0, JIT_R1);
830 jit_patch_at(jump, label);
831 ...
832 jit_link(label);
833@end example
834
835@code{indirect} is useful when creating jump tables, and tells
836@lightning{} to not optimize out a label that is not the target of
837any jump, because an indirect jump may land where it is defined.
838
@example
        jit_node_t *jump, *label;
        ...
        jit_jmpr(JIT_R0);               @rem{/* may jump to label */}
        ...
label = jit_indirect();
@end example
846
@code{indirect} is a special case of @code{note} and @code{name}
because it is a valid argument to @code{address}.
849
Note that the usual idiom to write the previous example is
@example
       jit_node_t *addr, *jump;
addr = jit_movi(JIT_R0, 0);             @rem{/* immediate is ignored */}
       ...
       jit_jmpr(JIT_R0);
       ...
       jit_patch(addr);                 @rem{/* implicit label added */}
@end example

which automatically binds the implicit label added by @code{patch} to
the @code{movi}, but under some special conditions it is required to
create an "unbound" label.
863
864@code{align} is useful for creating multiple entry points to a
865(trampoline) function that are all accessible through a single
866function pointer. @code{align} receives an integer argument that
867defines the minimal alignment of the address of a label directly
868following the @code{align} instruction. The integer argument must be
869a power of two and the effective alignment will be a power of two no
870less than the argument to @code{align}. If the argument to
871@code{align} is 16 or more, the effective alignment will match the
872specified minimal alignment exactly.
873
874@example
875 jit_node_t *forward, *label1, *label2, *jump;
876 unsigned char *addr1, *addr2;
877forward = jit_forward();
878 jit_align(16);
879label1 = jit_indirect(); @rem{/* first entry point */}
880jump = jit_jmpi(); @rem{/* jump to first handler */}
881 jit_patch_at(jump, forward);
882 jit_align(16);
883label2 = jit_indirect(); @rem{/* second entry point */}
884 ... @rem{/* second handler */}
885 jit_jmpr(...);
886 jit_link(forward);
          ...                           @rem{/* first handler */}
888 jit_jmpr(...);
889 ...
890 jit_emit();
891 addr1 = jit_address(label1);
892 addr2 = jit_address(label2);
893 assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */}
894@end example
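
The effect of @code{align} on the address of the following label can be modelled as rounding the current code address up to a multiple of the alignment. The sketch below is plain C and only illustrates the arithmetic; @code{align_up} is a hypothetical helper, not part of @lightning{}.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two, as required by the align instruction. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

In the example above, the first entry point holds a single jump that is assumed to be shorter than 16 bytes, so the second entry point lands exactly 16 bytes later: for instance @code{align_up(0x1000 + 5, 16) - 0x1000} is 16.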
895
896@code{skip} is useful for reserving space in the code buffer that can
897later be filled (possibly with the help of the pair of functions
898@code{jit_unprotect} and @code{jit_protect}).
899
900@item Function prolog
901
These macros are used to set up a function prolog.  The @code{allocai}
call accepts a single integer argument and returns an offset value
for stack storage access.  The @code{allocar} call accepts two register
arguments: the first is set to the offset for stack access, and the
second holds the size in bytes.
907
908@example
909prolog (not specified) @r{function prolog}
910allocai (not specified) @r{reserve space on the stack}
911allocar (not specified) @r{allocate space on the stack}
912@end example
913
914@code{allocai} receives the number of bytes to allocate and returns
915the offset from the frame pointer register @code{FP} to the base of
916the area.
917
918@code{allocar} receives two register arguments. The first is where
919to store the offset from the frame pointer register @code{FP} to the
920base of the area. The second argument is the size in bytes. Note
921that @code{allocar} is dynamic allocation, and special attention
922should be taken when using it. If called in a loop, every iteration
923will allocate stack space. Stack space is aligned from 8 to 64 bytes
924depending on backend requirements, even if allocating only one byte.
925It is advisable to not use it with @code{frame} and @code{tramp}; it
926should work with @code{frame} with special care to call only once,
927but is not supported if used in @code{tramp}, even if called only
928once.
929
As a small appetizer, here is a small function that adds 1 to the input
parameter (an @code{int}).  I'm using an assembly-like syntax here which
is a bit different from the one used when writing real subroutines with
@lightning{}; the real syntax will be introduced in @ref{GNU lightning
examples, , Generating code at run-time}.
935
936@example
937incr:
938 prolog
939in = arg @rem{! We have an integer argument}
940 getarg R0, in @rem{! Move it to R0}
941 addi R0, R0, 1 @rem{! Add 1}
942 retr R0 @rem{! And return the result}
943@end example
944
945And here is another function which uses the @code{printf} function from
946the standard C library to write a number in hexadecimal notation:
947
948@example
949printhex:
950 prolog
951in = arg @rem{! Same as above}
952 getarg R0, in
953 prepare @rem{! Begin call sequence for printf}
954 pushargi "%x" @rem{! Push format string}
955 ellipsis @rem{! Varargs start here}
956 pushargr R0 @rem{! Push second argument}
957 finishi printf @rem{! Call printf}
958 ret @rem{! Return to caller}
959@end example
960
961@item Register liveness
962
963During code generation, @lightning{} occasionally needs scratch registers
964or needs to use architecture-defined registers. For that, @lightning{}
965internally maintains register liveness information.
966
In the following example, @code{qdivr} will need special registers like
@code{R0} on some architectures.  As @lightning{} understands that
@code{R0} is used in the subsequent instruction, it will create
save/restore code for @code{R0} where needed.
971
972@example
973...
974qdivr V0, V1, V2, V3
975movr V3, R0
976...
977@end example
978
The same is not true in the example that follows.  Here, @code{R0} is
not alive after the division operation because @code{R0} is neither an
argument register nor a callee-save register.  Thus, no save/restore
code for @code{R0} will be created.
983
984@example
985...
986qdivr V0, V1, V2, V3
987jmpr R1
988...
989@end example
990
991The @code{live} instruction can be used to mark a register as live after
992it as in the following example. Here, @code{R0} will be preserved
993across the division.
994
995@example
996...
997qdivr V0, V1, V2, V3
998live R0
999jmpr R1
1000...
1001@end example
1002
1003The @code{live} instruction is useful at code entry and exit points,
1004like after and before a @code{callr} instruction.
1005
1006@item Trampolines, continuations and tail call optimization
1007
1008Frequently it is required to generate jit code that must jump to
1009code generated later, possibly from another @code{jit_context_t}.
1010These require compatible stack frames.
1011
@lightning{} provides two primitives from which trampolines,
continuations and tail call optimization can be implemented.
1014
1015@example
1016frame (not specified) @r{create stack frame}
1017tramp (not specified) @r{assume stack frame}
1018@end example
1019
@code{frame} receives an integer argument@footnote{It is not
automatically computed because @lightning{} does not know about the
requirements of later generated code.} that defines the size in
bytes for the stack frame of the current, @code{C} callable,
jit function.  To calculate this value, a good formula is the maximum
number of arguments to any called native function times
eight@footnote{Times eight so that it works for double arguments,
and so that no conditionals are needed for ports that pass arguments
on the stack.}, plus the sum of the arguments to any call to
@code{jit_allocai}.  @lightning{} automatically adjusts this value
for any backend specific stack memory it may need, or any
alignment constraint.
1032
1033@code{frame} also instructs @lightning{} to save all callee
1034save registers in the prolog and reload in the epilog.
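
The suggested formula for the @code{frame} argument can be written down as a small helper. This is a sketch in plain C under the stated rule of thumb only; @code{frame_size_estimate} and its parameters are hypothetical names, and the result is a starting value that @lightning{} may still adjust.

```c
#include <stddef.h>

/* Rule of thumb for the frame argument: 8 bytes for each argument of
 * the native call that takes the most arguments, plus the sum of all
 * sizes that will be requested through jit_allocai. */
size_t frame_size_estimate(size_t max_call_args,
                           const size_t *allocai_sizes,
                           size_t allocai_count)
{
    size_t total = max_call_args * 8;
    size_t i;

    for (i = 0; i < allocai_count; i++)
        total += allocai_sizes[i];
    return total;
}
```

For example, code that calls native functions with at most 3 arguments and reserves 128 and 16 bytes with @code{jit_allocai} would use 3 * 8 + 128 + 16 = 168.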
1035
1036@example
1037main: @rem{! jit entry point}
1038 prolog @rem{! function prolog}
1039 frame 256 @rem{! save all callee save registers and}
1040 @rem{! reserve at least 256 bytes in stack}
1041main_loop:
1042 ...
1043 jmpi handler @rem{! jumps to external code}
1044 ...
1045 ret @rem{! return to the caller}
1046@end example
1047
@code{tramp} differs from @code{frame} only in that a prolog and epilog
will not be generated.  Note that @code{prolog} must still be used.
The code under @code{tramp} must be ready to be entered with a jump
at the prolog position, and instead of a return, it must end with
an unconditional jump.  @code{tramp} exists solely because
it allows optimizing out prolog and epilog code that would
never be executed.
1055
1056@example
1057handler: @rem{! handler entry point}
1058 prolog @rem{! function prolog}
1059 tramp 256 @rem{! assumes all callee save registers}
1060 @rem{! are saved and there is at least}
1061 @rem{! 256 bytes in stack}
1062 ...
1063 jmpi main_loop @rem{! return to the main loop}
1064@end example
1065
1066@lightning{} only supports Tail Call Optimization using the
1067@code{tramp} construct. Any other way is not guaranteed to
1068work on all ports.
1069
1070An example of a simple (recursive) tail call optimization:
1071
1072@example
1073factorial: @rem{! Entry point of the factorial function}
1074 prolog
1075in = arg @rem{! Receive an integer argument}
        getarg R0, in           @rem{! Move argument to R0}
1077 prepare
1078 pushargi 1 @rem{! This is the accumulator}
1079 pushargr R0 @rem{! This is the argument}
1080 finishi fact @rem{! Call the tail call optimized function}
1081 retval R0 @rem{! Fetch the result}
1082 retr R0 @rem{! Return it}
1083 epilog @rem{! Epilog *before* label before prolog}
1084
1085fact: @rem{! Entry point of the helper function}
1086 prolog
1087 frame 16 @rem{! Reserve 16 bytes in the stack}
1088fact_entry: @rem{! This is the tail call entry point}
1089ac = arg @rem{! The accumulator is the first argument}
1090in = arg @rem{! The factorial argument}
1091 getarg R0, ac @rem{! Move the accumulator to R0}
1092 getarg R1, in @rem{! Move the argument to R1}
1093 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
1094 mulr R0, R0, R1 @rem{! accumulator *= argument}
1095 putargr R0, ac @rem{! Update the accumulator}
1096 subi R1, R1, 1 @rem{! argument -= 1}
1097 putargr R1, in @rem{! Update the argument}
1098 jmpi fact_entry @rem{! Tail Call Optimize it!}
1099fact_out:
1100 retr R0 @rem{! Return the accumulator}
1101@end example
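
For readers more used to C, the control flow of the example above roughly corresponds to the following sketch, where the tail call is replaced by a jump back to the entry point after updating the arguments in place. It only illustrates the transformation; the function names mirror the labels above and the use of @code{long} is an assumption.

```c
/* Helper with an accumulator: the recursive call fact(acc * n, n - 1)
 * is in tail position, so it can become a jump to fact_entry. */
long fact(long acc, long n)
{
fact_entry:
    if (n <= 1)             /* blei fact_out, R1, 1 */
        return acc;         /* retr R0              */
    acc *= n;               /* mulr + putargr ac    */
    n -= 1;                 /* subi + putargr in    */
    goto fact_entry;        /* jmpi fact_entry      */
}

/* Entry point: starts the helper with an accumulator of 1. */
long factorial(long n)
{
    return fact(1, n);
}
```

Because the arguments are overwritten before jumping, the stack does not grow however large the argument is.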
1102
1103@item Predicates
1104@example
1105forward_p (not specified) @r{forward label predicate}
1106indirect_p (not specified) @r{indirect label predicate}
1107target_p (not specified) @r{used label predicate}
1108arg_register_p (not specified) @r{argument kind predicate}
1109callee_save_p (not specified) @r{callee save predicate}
1110pointer_p (not specified) @r{pointer predicate}
1111@end example
1112
1113@code{forward_p} expects a @code{jit_node_t*} argument, and
1114returns non zero if it is a forward label reference, that is,
1115a label returned by @code{forward}, that still needs a
1116@code{link} call.
1117
1118@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
1119non zero if it is an indirect label reference, that is, a label that
1120was returned by @code{indirect}.
1121
1122@code{target_p} expects a @code{jit_node_t*} argument, that is any
1123kind of label, and will return non zero if there is at least one
1124jump or move referencing it.
1125
1126@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
1127have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
1128will return non zero if the argument lives in a register. This call
1129is useful to know the live range of register arguments, as those
1130are very fast to read and write, but have volatile values.
1131
@code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
@code{JIT_Fn}, and will return non zero if the register is callee
save.  This call is useful because on several ports, the @code{JIT_Rn}
and @code{JIT_Fn} registers are actually callee save, so there is no
need to save and load the values when making function calls.
1137
1138@code{pointer_p} expects a pointer argument, and will return non
1139zero if the pointer is inside the generated jit code. Must be
1140called after @code{jit_emit} and before @code{jit_destroy_state}.
1141
1142@item Atomic operations
Only compare-and-swap is implemented.  It accepts four operands;
the second can be an immediate.

The first argument is set to a boolean value telling whether the
operation succeeded.

All arguments must be different; the result register cannot also be
used to pass an argument.

The second argument is the address of a machine word.

The third argument is the old value.

The fourth argument is the new value.

@example
casr                                  O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
casi                                  O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
@end example

If the value at the address in the second argument is equal to the third
argument, the word at that address is atomically modified to the value
of the fourth argument and the first argument is set to a non zero value.
1166
1167If the value at the address in the second argument is not equal to the
1168third argument nothing is done and the first argument is set to zero.
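
The same semantics can be expressed with the C11 @code{<stdatomic.h>} interface. The following is a sketch in plain C that models what @code{casr} computes; it is not how @lightning{} implements it, @code{cas_word} is a hypothetical name, and @code{intptr_t} is assumed to stand in for a machine word.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of casr: if *addr equals old_value, atomically store
 * new_value and return 1; otherwise leave *addr alone and return 0. */
int cas_word(_Atomic intptr_t *addr, intptr_t old_value, intptr_t new_value)
{
    /* atomic_compare_exchange_strong overwrites `expected` on failure,
     * so a local copy keeps the caller's old_value untouched. */
    intptr_t expected = old_value;

    return atomic_compare_exchange_strong(addr, &expected, new_value) ? 1 : 0;
}
```

A typical use is a retry loop that reads the word, computes a new value and repeats until the swap succeeds.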
1169@end table
1170
1171@node GNU lightning examples
1172@chapter Generating code at run-time
1173
1174To use @lightning{}, you should include the @file{lightning.h} file that
1175is put in your include directory by the @samp{make install} command.
1176
1177Each of the instructions above translates to a macro or function call.
1178All you have to do is prepend @code{jit_} (lowercase) to opcode names
1179and @code{JIT_} (uppercase) to register names. Of course, parameters
1180are to be put between parentheses.
1181
This small tutorial presents four examples:
1183
1184@iftex
1185@itemize @bullet
1186@item
1187The @code{incr} function found in @ref{The instruction set, ,
1188@lightning{}'s instruction set}:
1189
1190@item
1191A simple function call to @code{printf}
1192
1193@item
1194An RPN calculator.
1195
1196@item
1197Fibonacci numbers
1198@end itemize
1199@end iftex
1200@ifnottex
1201@menu
1202* incr:: A function which increments a number by one
1203* printf:: A simple function call to printf
1204* RPN calculator:: A more complex example, an RPN calculator
1205* Fibonacci:: Calculating Fibonacci numbers
1206@end menu
1207@end ifnottex
1208
1209@node incr
1210@section A function which increments a number by one
1211
1212Let's see how to create and use the sample @code{incr} function created
1213in @ref{The instruction set, , @lightning{}'s instruction set}:
1214
1215@example
1216#include <stdio.h>
1217#include <lightning.h>
1218
1219static jit_state_t *_jit;
1220
1221typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1222
1223int main(int argc, char *argv[])
1224@{
1225 jit_node_t *in;
1226 pifi incr;
1227
1228 init_jit(argv[0]);
1229 _jit = jit_new_state();
1230
1231 jit_prolog(); @rem{/* @t{ prolog } */}
1232 in = jit_arg(); @rem{/* @t{ in = arg } */}
1233 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
1234 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
1235 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
1236
1237 incr = jit_emit();
1238 jit_clear_state();
1239
1240 @rem{/* call the generated code@comma{} passing 5 as an argument */}
1241 printf("%d + 1 = %d\n", 5, incr(5));
1242
1243 jit_destroy_state();
1244 finish_jit();
1245 return 0;
1246@}
1247@end example
1248
1249Let's examine the code line by line (well, almost@dots{}):
1250
1251@table @t
1252@item #include <lightning.h>
1253You already know about this. It defines all of @lightning{}'s macros.
1254
1255@item static jit_state_t *_jit;
You might wonder what @code{jit_state_t} is.  It is a structure
that stores jit code generation information.  The name @code{_jit} is
special: since multiple jit generators can run at the same
time, you must either @r{#define _jit my_jit_state} or name the
variable @code{_jit}.
1261
1262@item typedef int (*pifi)(int);
1263Just a handy typedef for a pointer to a function that takes an
1264@code{int} and returns another.
1265
1266@item jit_node_t *in;
1267Declares a variable to hold an identifier for a function argument. It
1268is an opaque pointer, that will hold the return of a call to @code{arg}
1269and be used as argument to @code{getarg}.
1270
1271@item pifi incr;
1272Declares a function pointer variable to a function that receives an
1273@code{int} and returns an @code{int}.
1274
1275@item init_jit(argv[0]);
1276You must call this function before creating a @code{jit_state_t}
1277object. This function does global state initialization, and may need
1278to detect CPU or Operating System features. It receives a string
1279argument that is later used to read symbols from a shared object using
1280GNU binutils if disassembly was enabled at configure time. If no
1281disassembly will be performed a NULL pointer can be used as argument.
1282
1283@item _jit = jit_new_state();
1284This call initializes a @lightning{} jit state.
1285
1286@item jit_prolog();
1287Ok, so we start generating code for our beloved function@dots{}
1288
1289@item in = jit_arg();
1290@itemx jit_getarg(JIT_R0, in);
1291We retrieve the first (and only) argument, an integer, and store it
1292into the general-purpose register @code{R0}.
1293
1294@item jit_addi(JIT_R0, JIT_R0, 1);
1295We add one to the content of the register.
1296
1297@item jit_retr(JIT_R0);
1298This instruction generates a standard function epilog that returns
1299the contents of the @code{R0} register.
1300
1301@item incr = jit_emit();
This instruction is very important.  It actually translates the
@lightning{} macros used before to machine code, flushes the generated
code area out of the processor's instruction cache and returns a
pointer to the start of the code.

@item jit_clear_state();
This call cleans up any data not required for jit execution.  Note
that it must be called after any call to @code{jit_print} or
@code{jit_address}, as this call destroys the @lightning{}
intermediate representation.
1312
1313@item printf("%d + 1 = %d", 5, incr(5));
1314Calling our function is this simple---it is not distinguishable from
1315a normal C function call, the only difference being that @code{incr}
1316is a variable.
1317
1318@item jit_destroy_state();
Releases all memory associated with the jit context.  It should be
called once it is known that the jit code will no longer be called.

@item finish_jit();
This call cleans up any global state held by @lightning{}; it is
advisable to call it once jit code will no longer be generated.
1325@end table
1326
1327@lightning{} abstracts two phases of dynamic code generation: selecting
1328instructions that map the standard representation, and emitting binary
1329code for these instructions. The client program has the responsibility
1330of describing the code to be generated using the standard @lightning{}
1331instruction set.
1332
Let's examine the code generated for @code{incr} on the SPARC and x86_64
architectures (on the right is the code that an assembly-language
programmer would write):
1336
1337@table @b
1338@item SPARC
@example
      save  %sp, -112, %sp
      mov   %i0, %g2                  retl
      inc   %g2                       inc %o0
      mov   %g2, %i0
      restore
      retl
      nop
@end example
1348In this case, @lightning{} introduces overhead to create a register
1349window (not knowing that the procedure is a leaf procedure) and to
1350move the argument to the general purpose register @code{R0} (which
1351maps to @code{%g2} on the SPARC).
1352@end table
1353
1354@table @b
1355@item x86_64
@example
    mov   %rdi,%rax
    add   $0x1,%rax
    ret
@end example
In this case, for the x86 port, @lightning{} has simple optimizations
to understand it is a leaf function, and that it is not required to
create a stack frame nor update the stack pointer.
1364@end table
1365
1366@node printf
1367@section A simple function call to @code{printf}
1368
1369Again, here is the code for the example:
1370
1371@example
1372#include <stdio.h>
1373#include <lightning.h>
1374
1375static jit_state_t *_jit;
1376
1377typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1378
1379int main(int argc, char *argv[])
1380@{
1381 pvfi myFunction; @rem{/* ptr to generated code */}
1382 jit_node_t *start, *end; @rem{/* a couple of labels */}
1383 jit_node_t *in; @rem{/* to get the argument */}
1384
1385 init_jit(argv[0]);
1386 _jit = jit_new_state();
1387
1388 start = jit_note(__FILE__, __LINE__);
1389 jit_prolog();
1390 in = jit_arg();
1391 jit_getarg(JIT_R1, in);
1392 jit_prepare();
1393 jit_pushargi((jit_word_t)"generated %d bytes\n");
1394 jit_ellipsis();
1395 jit_pushargr(JIT_R1);
1396 jit_finishi(printf);
1397 jit_ret();
1398 jit_epilog();
1399 end = jit_note(__FILE__, __LINE__);
1400
1401 myFunction = jit_emit();
1402
1403 @rem{/* call the generated code@comma{} passing its size as argument */}
1404 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1405 jit_clear_state();
1406
1407 jit_disassemble();
1408
1409 jit_destroy_state();
1410 finish_jit();
1411 return 0;
1412@}
1413@end example
1414
1415The function shows how many bytes were generated. Most of the code
1416is not very interesting, as it resembles very closely the program
1417presented in @ref{incr, , A function which increments a number by one}.
1418
1419For this reason, we're going to concentrate on just a few statements.
1420
1421@table @t
1422@item start = jit_note(__FILE__, __LINE__);
1423@itemx @r{@dots{}}
1424@itemx end = jit_note(__FILE__, __LINE__);
These two instructions call the @code{jit_note} macro, which creates
a note in the jit code; arguments to @code{jit_note} usually are a
filename string and line number integer, but using NULL for the
string argument is perfectly valid if you only need to create a simple
marker in the code.
1430
1431@item jit_ellipsis();
1432@code{ellipsis} usually is only required if calling varargs functions
1433with double arguments, but it is a good practice to properly describe
1434the @r{@dots{}} in the call sequence.
1435
1436@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1437Note the use of the @code{(jit_word_t)} cast, that is used only
1438to avoid a compiler warning, due to using a pointer where a
1439wordsize integer type was expected.
1440
1441@item jit_prepare();
1442@itemx @r{@dots{}}
1443@itemx jit_finishi(printf);
Once the arguments to @code{printf} have been pushed, which means
moving them to stack or register arguments, the @code{printf}
function is called and the stack cleaned.  Note how @lightning{}
abstracts the differences between different architectures and
ABIs -- the client program does not know how parameter passing
works on the host architecture.
1450
1451@item jit_epilog();
Usually it is not required to call @code{epilog}, because it is
implicitly added when the end of a function is noticed.  Here it is
called explicitly because otherwise the @code{end} note, created
right after the @code{ret}, would not take the function epilog
into account.
1456
1457@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
This calls the generated jit function, passing as argument the offset
difference between the @code{start} and @code{end} notes.  The
@code{address} call must be done after the @code{emit} call; otherwise
either a fatal error will happen (if @lightning{} is built with
assertions enabled) or an undefined value will be returned.
1463
1464@item jit_clear_state();
1465Note that @code{jit_clear_state} was called after executing jit in
1466this example. It was done because it must be called after any call
1467to @code{jit_address} or @code{jit_print}.
1468
1469@item jit_disassemble();
1470@code{disassemble} will dump the generated code to standard output,
1471unless @lightning{} was built with the disassembler disabled, in which
1472case no output will be shown.
1473@end table
1474
1475@node RPN calculator
1476@section A more complex example, an RPN calculator
1477
1478We create a small stack-based RPN calculator which applies a series
1479of operators to a given parameter and to other numeric operands.
1480Unlike previous examples, the code generator is fully parameterized
1481and is able to compile different formulas to different functions.
1482Here is the code for the expression compiler; a sample usage will
1483follow.
1484
Since @lightning{} does not provide push/pop instructions, this
example uses a stack-allocated area to store the data.  Such an
area can be allocated using the macro @code{allocai}, which
receives the number of bytes to allocate and returns the offset
from the frame pointer register @code{FP} to the base of the
area.

Usually, you will use the @code{ldxi} and @code{stxi} instructions
to access stack-allocated variables.  However, it is possible to
use operations such as @code{add} to compute the address of the
variables, and pass the address around.
1496
@example
#include <stdio.h>
#include <stdlib.h>             @rem{/* for atoi and abort */}
#include <lightning.h>

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

static jit_state_t *_jit;

@rem{/* Store reg in the next free slot of the stack area */}
void stack_push(int reg, int *sp)
@{
  jit_stxi_i (*sp, JIT_FP, reg);
  *sp += sizeof (int);
@}

@rem{/* Reload reg from the most recently used slot */}
void stack_pop(int reg, int *sp)
@{
  *sp -= sizeof (int);
  jit_ldxi_i (reg, JIT_FP, *sp);
@}

jit_node_t *compile_rpn(char *expr)
@{
  jit_node_t *in, *fn;
  int stack_base, stack_ptr;

  fn = jit_note(NULL, 0);
  jit_prolog();
  in = jit_arg();
  stack_ptr = stack_base = jit_allocai (32 * sizeof (int));

  jit_getarg(JIT_R2, in);       @rem{/* R2 holds the parameter x */}

  while (*expr) @{
    char buf[32];
    int n;
    @rem{/* the field width keeps a long number from overflowing buf */}
    if (sscanf(expr, "%31[0-9]%n", buf, &n)) @{
      expr += n - 1;
      stack_push(JIT_R0, &stack_ptr);
      jit_movi(JIT_R0, atoi(buf));
    @} else if (*expr == 'x') @{
      stack_push(JIT_R0, &stack_ptr);
      jit_movr(JIT_R0, JIT_R2);
    @} else if (*expr == '+') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_addr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '-') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_subr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '*') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_mulr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '/') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_divr(JIT_R0, JIT_R1, JIT_R0);
    @} else @{
      fprintf(stderr, "cannot compile: %s\n", expr);
      abort();
    @}
    ++expr;
  @}
  jit_retr(JIT_R0);
  jit_epilog();
  return fn;
@}
@end example
1562
1563The principle on which the calculator is based is easy: the stack top
1564is held in R0, while the remaining items of the stack are held in the
1565memory area that we allocate with @code{allocai}. Compiling a numeric
1566operand or the argument @code{x} pushes the old stack top onto the
1567stack and moves the operand into R0; compiling an operator pops the
1568second operand off the stack into R1, and compiles the operation so
1569that the result goes into R0, thus becoming the new stack top.
1570
1571This example allocates a fixed area for 32 @code{int}s. This is not
1572a problem when the function is a leaf like in this case; in a full-blown
1573compiler you will want to analyze the input and determine the number
1574of needed stack slots---a very simple example of register allocation.
1575The area is then managed like a stack using @code{stack_push} and
1576@code{stack_pop}.
1577
1578Source code for the client (which lies in the same source file) follows:
1579
1580@example
1581int main(int argc, char *argv[])
1582@{
1583 jit_node_t *nc, *nf;
1584 pifi c2f, f2c;
1585 int i;
1586
1587 init_jit(argv[0]);
1588 _jit = jit_new_state();
1589
1590 nc = compile_rpn("32x9*5/+");
1591 nf = compile_rpn("x32-5*9/");
1592 (void)jit_emit();
1593 c2f = (pifi)jit_address(nc);
1594 f2c = (pifi)jit_address(nf);
1595 jit_clear_state();
1596
1597 printf("\nC:");
1598 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1599 printf("\nF:");
1600 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1601 printf("\n");
1602
1603 printf("\nF:");
1604 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1605 printf("\nC:");
1606 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1607 printf("\n");
1608
1609 jit_destroy_state();
1610 finish_jit();
1611 return 0;
1612@}
1613@end example
1614
The client displays a conversion table between Celsius and Fahrenheit
degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
formulas are @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
respectively.
1619
1620Providing the formula as an argument to @code{compile_rpn} effectively
1621parameterizes code generation, making it possible to use the same code
1622to compile different functions; this is what makes dynamic code
1623generation so powerful.
1624
1625@node Fibonacci
1626@section Fibonacci numbers
1627
1628The code in this section calculates the Fibonacci sequence. That is
1629modeled by the recurrence relation:
1630@display
1631 f(0) = 0
1632 f(1) = f(2) = 1
1633 f(n) = f(n-1) + f(n-2)
1634@end display
1635
The purpose of this example is to introduce branches. There are two
kinds of branches: backward branches and forward branches. We'll
present the calculation in a recursive and an iterative form; the
former only uses forward branches, while the latter uses both.
1640
@example
#include <stdio.h>
#include <lightning.h>

static jit_state_t *_jit;

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main(int argc, char *argv[])
@{
  pifi        fib;
  jit_node_t *label;
  jit_node_t *call;
  jit_node_t *in;               @rem{/* offset of the argument */}
  jit_node_t *ref;              @rem{/* to patch the forward reference */}
  jit_node_t *zero;             @rem{/* to patch the forward reference */}

  init_jit(argv[0]);
  _jit = jit_new_state();

  label = jit_label();
          jit_prolog   ();
  in =    jit_arg      ();
          jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
  zero =  jit_beqi     (JIT_R0, 0);
          jit_movr     (JIT_V0, JIT_R0);          @rem{/* V0 = R0 */}
          jit_movi     (JIT_R0, 1);
  ref =   jit_blei     (JIT_V0, 2);
          jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
          jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
          jit_prepare();
            jit_pushargr(JIT_V1);
  call =  jit_finishi(NULL);
          jit_patch_at(call, label);
          jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
          jit_prepare();
            jit_pushargr(JIT_V2);
  call =  jit_finishi(NULL);
          jit_patch_at(call, label);
          jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
          jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}

  jit_patch(ref);                                 @rem{/* patch jump */}
  jit_patch(zero);                                @rem{/* patch jump */}
          jit_retr(JIT_R0);

  @rem{/* call the generated code@comma{} passing 32 as an argument */}
  fib = jit_emit();
  jit_clear_state();
  printf("fib(%d) = %d\n", 32, fib(32));
  jit_destroy_state();
  finish_jit();
  return 0;
@}
@end example
1696
As said above, this is the first example of dynamically compiling
branches. Branch instructions have two operands containing the
values to be compared, and return a @code{jit_node_t *} object
to be patched.

Because the final address of a label is only known after calling
@code{emit}, you must call @code{patch} or @code{patch_at}. This
tells @lightning{} that the target to patch is actually a pointer to
a @code{jit_node_t *} object; otherwise, it would assume that the
target is a pointer to a C function. Note that conditional branches
do not receive a label argument, so they must be patched.

You need to call @code{patch_at} on the value returned by
@code{calli} or @code{finishi} if it actually references a label
in the jit code. No branch instruction receives a label
argument. Note that @code{movi} is a special case: patching it
is usually done to get the final address of a label, usually to later
call @code{jmpr}.
1715
1716Now, here is the iterative version:
1717
@example
#include <stdio.h>
#include <lightning.h>

static jit_state_t *_jit;

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main(int argc, char *argv[])
@{
  pifi        fib;
  jit_node_t *in;               @rem{/* offset of the argument */}
  jit_node_t *ref;              @rem{/* to patch the forward reference */}
  jit_node_t *zero;             @rem{/* to patch the forward reference */}
  jit_node_t *jump;             @rem{/* jump to start of loop */}
  jit_node_t *loop;             @rem{/* start of the loop */}

  init_jit(argv[0]);
  _jit = jit_new_state();

          jit_prolog   ();
  in =    jit_arg      ();
          jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
  zero =  jit_beqi     (JIT_R0, 0);
          jit_movr     (JIT_R1, JIT_R0);          @rem{/* R1 = n */}
          jit_movi     (JIT_R0, 1);
  ref =   jit_blei     (JIT_R1, 2);               @rem{/* f(1) = f(2) = 1 */}
          jit_subi     (JIT_R2, JIT_R1, 2);       @rem{/* R2 = n-2 iterations */}
          jit_movr     (JIT_R1, JIT_R0);

  loop=   jit_label();
          jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
          jit_movr     (JIT_V0, JIT_R0);          @rem{/* V0 = R0 */}
          jit_addr     (JIT_R0, JIT_R0, JIT_R1);  @rem{/* R0 = R0 + R1 */}
          jit_movr     (JIT_R1, JIT_V0);          @rem{/* R1 = V0 */}
  jump=   jit_bnei     (JIT_R2, 0);               @rem{/* if (R2) goto loop; */}
          jit_patch_at(jump, loop);

  jit_patch(ref);                                 @rem{/* patch forward jump */}
  jit_patch(zero);                                @rem{/* patch forward jump */}
          jit_retr     (JIT_R0);

  @rem{/* call the generated code@comma{} passing 36 as an argument */}
  fib = jit_emit();
  jit_clear_state();
  printf("fib(%d) = %d\n", 36, fib(36));
  jit_destroy_state();
  finish_jit();
  return 0;
@}
@end example
1769
1770This code calculates the recurrence relation using iteration (a
1771@code{for} loop in high-level languages). There are no function
1772calls anymore: instead, there is a backward jump (the @code{bnei} at
1773the end of the loop).
1774
1775Note that the program must remember the address for backward jumps;
1776for forward jumps it is only required to remember the jump code,
1777and call @code{patch} for the implicit label.
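
To make the control flow of the iterative version easier to follow,
here is a rough plain C model of the loop. It is a sketch, not
generated code: the function name @code{fib_iter_model} is
hypothetical, the variables @code{r0}, @code{r1}, @code{r2} and
@code{v0} stand in for the registers of the same names, and it assumes
that the counter starts at @math{n-2} and that the inputs 1 and 2
return early, as the recurrence relation requires.

```c
/* r0, r1, r2 and v0 stand in for JIT_R0, JIT_R1, JIT_R2 and JIT_V0. */
static int fib_iter_model(int n)
{
    int r0 = n, r1, r2, v0;

    if (r0 == 0) goto done;     /* forward branch: f(0) = 0 */
    r1 = r0;
    r0 = 1;
    if (r1 <= 2) goto done;     /* forward branch: f(1) = f(2) = 1 */
    r2 = r1 - 2;                /* iterations still to run */
    r1 = r0;
loop:                           /* target of the backward branch */
    r2 = r2 - 1;                /* decrement the counter */
    v0 = r0;                    /* remember f(k) */
    r0 = r0 + r1;               /* f(k+1) = f(k) + f(k-1) */
    r1 = v0;
    if (r2 != 0) goto loop;     /* backward branch */
done:                           /* both forward branches land here */
    return r0;
}
```

The two @code{goto done} statements correspond to the forward jumps
that are patched once the code after the loop has been reached, while
@code{goto loop} corresponds to the backward jump whose target is
already known when the branch is generated.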
1778
1779@node Reentrancy
1780@chapter Re-entrant usage of @lightning{}
1781
@lightning{} uses the special @code{_jit} identifier. To be able
to use multiple jit generation states at the same
time, it is required to use code similar to:
1785
@example
    jit_state_t *lightning;
    #define _jit lightning
@end example
1790
This causes the symbol that @code{_jit} is defined as to be passed as
the first argument to the underlying @lightning{} implementation,
which is usually a function with an @code{_} (underscore) prefix
and an argument named @code{_jit}, following the pattern:
1795
@example
    static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
    #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v)
@end example
1800
The reason for this is to keep the same syntax as the initial lightning
implementation and to spare the user from adding an extra
argument to every call, as multiple jit states generating code in
parallel should be very uncommon.
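
The macro pattern can be tried out in isolation with a small mock. The
sketch below does not use @lightning{} at all: @code{demo_state_t},
@code{_demo_op}, @code{demo_op} and @code{two_states} are hypothetical
names chosen only to show how a macro picks up whatever @code{_jit}
names in the current scope.

```c
/* Hypothetical stand-in for jit_state_t, just enough to show the pattern. */
typedef struct demo_state { int nodes; } demo_state_t;

/* Implementation function: underscore prefix, explicit state argument. */
static void _demo_op(demo_state_t *_jit, int u, int v)
{
    (void)u;
    (void)v;
    _jit->nodes += 1;           /* record one "instruction" in this state */
}

/* User-facing macro: passes whatever _jit refers to at the call site. */
#define demo_op(u, v) _demo_op(_jit, u, v)

/* Drive two independent states by rebinding _jit between calls. */
static int two_states(void)
{
    demo_state_t a = { 0 }, b = { 0 };
    demo_state_t *_jit;

    _jit = &a;
    demo_op(1, 2);              /* goes to state a */
    _jit = &b;
    demo_op(3, 4);              /* goes to state b */
    demo_op(5, 6);              /* goes to state b */
    return a.nodes * 10 + b.nodes;
}
```

Here @code{_jit} is a local variable that is reassigned; defining
@code{_jit} as a macro that expands to another identifier has the same
effect on every call made through the user-facing macros.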
1805
@node Registers
@chapter Accessing the whole register file
1808
As mentioned earlier in this manual, all @lightning{} back-ends are
guaranteed to have at least six general-purpose integer registers and
six floating-point registers, but many back-ends will have more.
1812
1813To access the entire register files, you can use the
1814@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1815accept a parameter that identifies the register number, which
1816must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1817and @code{JIT_F_NUM} respectively; the number need not be
1818constant. Of course, expressions like @code{JIT_R0} and
1819@code{JIT_R(0)} denote the same register, and likewise for
1820integer callee-saved, or floating-point, registers.
1821
@section Scratch registers
1823
For operations that @lightning{} does not support directly, such as storing
a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
acquire and release a scratch register, as in the following pattern:
1827
1828@example
1829 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1830 jit_movi (reg, immediate);
1831 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1832 jit_unget_reg (reg);
1833@end example
1834
As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
reloads but don't follow branches, the code between both must be in
the same basic block and must not contain any branches. The
following (bad) example violates this rule:
1839
1840@example
1841 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1842 jit_ldxi (reg, JIT_V0, offset);
1843 jump = jit_bnei (reg, V0);
1844 jit_movr (JIT_V1, reg);
1845 jit_patch (jump);
1846 jit_unget_reg (reg);
1847@end example
1848
@node Customizations
@chapter Customizations
1851
1852Frequently it is desirable to have more control over how code is
1853generated or how memory is used during jit generation or execution.
1854
1855@section Memory functions
To give complete control over memory allocation and deallocation,
@lightning{} provides wrappers that default to the standard @code{malloc},
@code{realloc} and @code{free}. These are loosely based on the
GNU GMP counterparts, with the difference that they use the same
prototypes as the system allocation functions, that is, no @code{size}
for @code{free} or @code{old_size} for @code{realloc}.
1862
1863@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
@lightning{} guarantees that memory is only allocated or released
using these wrapped functions, but note that if @lightning{}
was linked to GNU binutils, @code{malloc} will probably be called multiple
times from there when initializing the disassembler.

Because @code{init_jit} may call memory functions, if you need to call
@code{jit_set_memory_functions}, it must be called before @code{init_jit};
otherwise, @code{finish_jit} will pass a pointer allocated with the
previous or default wrappers to the newly installed functions.
1873@end deftypefun
1874
1875@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
Get the current memory allocation functions. Unlike the GNU GMP
counterpart, it is an error to pass @code{NULL} pointers as arguments.
1878@end deftypefun
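
As an illustration of the kind of wrappers one might install, the
sketch below counts live allocations. The names @code{live_blocks},
@code{counting_alloc}, @code{counting_realloc} and @code{counting_free}
are hypothetical; only their prototypes are taken from the
@code{jit_set_memory_functions} declaration above.

```c
#include <stdlib.h>

static size_t live_blocks;      /* allocations not yet released */

/* Same prototype as malloc. */
static void *counting_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++live_blocks;
    return p;
}

/* Same prototype as realloc: no old_size argument. */
static void *counting_realloc(void *old, size_t size)
{
    void *p = realloc(old, size);
    if (p != NULL && old == NULL)
        ++live_blocks;          /* realloc(NULL, n) behaves like malloc(n) */
    return p;
}

/* Same prototype as free: no size argument. */
static void counting_free(void *p)
{
    if (p != NULL)
        --live_blocks;
    free(p);
}
```

In a program using @lightning{}, these would be passed as
@code{jit_set_memory_functions(counting_alloc, counting_realloc,
counting_free)} before the call to @code{init_jit}, for the reason
given above.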
1879
@section Protection
Unless an alternate code buffer is used (see below), @code{jit_emit}
sets the access protections so that the code buffer's memory can be read and
executed, but not modified. One can use the following functions after
@code{jit_emit} but before @code{jit_clear} to temporarily lift the
protection:
1886
1887@deftypefun void jit_unprotect ()
Changes the access protection so that the code buffer's memory can be read and
modified. Before the emitted code can be invoked, @code{jit_protect}
has to be called to reset the change.
1891
1892This procedure has no effect when an alternate code buffer (see below) is used.
1893@end deftypefun
1894
1895@deftypefun void jit_protect ()
Changes the access protection so that the code buffer's memory can be read and
executed.
1898
1899This procedure has no effect when an alternate code buffer (see below) is used.
1900@end deftypefun
1901
@section Alternate code buffer
1903To instruct @lightning{} to use an alternate code buffer it is required
1904to call @code{jit_realize} before @code{jit_emit}, and then query states
1905and customize as appropriate.
1906
1907@deftypefun void jit_realize ()
1908Must be called once, before @code{jit_emit}, to instruct @lightning{}
1909that no other @code{jit_xyz} call will be made.
1910@end deftypefun
1911
1912@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
Returns @code{NULL} or the previous value set with @code{jit_set_code}, and
sets the @var{code_size} argument to an appropriate value.
If @code{jit_get_code} is called before @code{jit_emit}, the
@var{code_size} argument is set to the expected number of bytes
required to generate code.
If @code{jit_get_code} is called after @code{jit_emit}, the
@var{code_size} argument is set to the exact number of bytes used
by the code.
1922
@deftypefun void jit_set_code (jit_pointer_t @var{code}, jit_word_t @var{size})
Instructs @lightning{} to output to the @var{code} argument and
use @var{size} as a guard to not write to invalid memory. If during
@code{jit_emit} @lightning{} finds out that the code would not fit
in @var{size} bytes, it stops emitting code and @code{jit_emit}
returns @code{NULL}.
1928@end deftypefun
1929
1930A simple example of a loop using an alternate buffer is:
1931
@example
    jit_uint8_t   *code;
    int          (*func)(int);          @rem{/* function pointer */}
    jit_word_t     code_size;
    jit_word_t     real_code_size;
    @rem{...}
    jit_realize();                      @rem{/* ready to generate code */}
    jit_get_code(&code_size);           @rem{/* get expected code size */}
    code_size = (code_size + 4095) & -4096;
    do @{
      code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANON, -1, 0);
      jit_set_code(code, code_size);
      if ((func = jit_emit()) == NULL) @{
        munmap(code, code_size);
        code_size += 4096;
      @}
    @} while (func == NULL);
    jit_get_code(&real_code_size);      @rem{/* query exact size of the code */}
@end example
1952
The first call to @code{jit_get_code} should return @code{NULL} and set
the @code{code_size} argument to the expected number of bytes required
to emit code.
The second call to @code{jit_get_code} is after a successful call to
@code{jit_emit}, and will return the value previously set with
@code{jit_set_code} and set the @code{real_code_size} argument to the
exact number of bytes used to emit the code.
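
The expression @code{(code_size + 4095) & -4096} in the example rounds
the expected size up to a multiple of 4096 bytes, a commonly assumed
page size. A minimal sketch of the same rounding as a helper follows;
the name @code{round_up_to_page} is hypothetical and the helper assumes
that the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of page.
 * page must be a non-zero power of two. */
static size_t round_up_to_page(size_t size, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (size + page - 1) & ~(page - 1);
}
```

On POSIX systems the actual page size can be queried with
@code{sysconf(_SC_PAGESIZE)} instead of hard-coding 4096.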
1960
1961@section Alternate data buffer
Sometimes it may be desirable to customize how @lightning{} uses an
extra buffer for constants or debug annotation, or to prevent it from
using one at all. This is usually done when also using an alternate
code buffer.
1965
1966@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1967Returns @code{NULL} or the previous value set with @code{jit_set_data},
1968and sets the @var{data_size} argument to how many bytes are required
1969for the constants data buffer, and @var{note_size} to how many bytes
1970are required to store the debug note information.
Note that it always preallocates one debug note entry even if
@code{jit_name} or @code{jit_note} are never called, but will return
zero in the @var{data_size} argument if no constant is required;
constants are only used for the @code{float} and @code{double} operations
that have an immediate argument, and not in all @lightning{} ports.
1976@end deftypefun
1977
1978@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1979
@var{data} can be @code{NULL} if disabling constants and annotations; otherwise,
a valid pointer must be passed. An assertion checks that the data will
fit in @var{size} bytes (but that is a no-op if @lightning{} was built
with @code{-DNDEBUG}).

@var{size} tells the space in bytes available in @var{data}.

@var{flags} can be zero to just use the alternate data buffer,
or a combination of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}:
1989
1990@table @t
1991@item JIT_DISABLE_DATA
1992@cindex JIT_DISABLE_DATA
1993Instructs @lightning{} to not use a constant table, but to use an
1994alternate method to synthesize those, usually with a larger code
1995sequence using stack space to transfer the value from a GPR to a
1996FPR register.
1997
1998@item JIT_DISABLE_NOTE
1999@cindex JIT_DISABLE_NOTE
2000Instructs @lightning{} to not store file or function name, and
2001line numbers in the constant buffer.
2002@end table
2003@end deftypefun
2004
A simple example of preventing the use of a data buffer is:
2006
2007@example
2008 @rem{...}
2009 jit_realize(); @rem{/* ready to generate code */}
2010 jit_get_data(NULL, NULL);
2011 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
2012 @rem{...}
2013@end example
2014
2015Or to only use a data buffer, if required:
2016
2017@example
2018 jit_uint8_t *data;
2019 jit_word_t data_size;
2020 @rem{...}
2021 jit_realize(); @rem{/* ready to generate code */}
2022 jit_get_data(&data_size, NULL);
2023 if (data_size)
2024 data = malloc(data_size);
2025 else
2026 data = NULL;
2027 jit_set_data(data, data_size, JIT_DISABLE_NOTE);
2028 @rem{...}
2029 if (data)
2030 free(data);
2031 @rem{...}
2032@end example
2033
2034@node Acknowledgements
2035@chapter Acknowledgements
2036
2037As far as I know, the first general-purpose portable dynamic code
2038generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
2039Further work by Dawson R. Engler resulted in the @sc{vcode} system;
2040unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
2041directly inspired @lightning{}.
2042
Thanks go to Ian Piumarta, who kindly agreed to release his own
program @sc{ccg} under the GNU General Public License, thereby allowing
@lightning{} to use the run-time assemblers he had written for @sc{ccg}.
@sc{ccg} provides a way to dynamically assemble programs written in the
underlying architecture's assembly language. So it is not portable,
yet very interesting.
2049
2050I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
2051was first developed as a tool to be used in GNU Smalltalk's dynamic
2052translator from bytecodes to native code.