@ifnottex
@dircategory Software development
@direntry
* lightning: (lightning).       Library for dynamic code generation.
@end direntry
@end ifnottex

@ifnottex
@node Top
@top @lightning{}

@iftex
@macro comma
@verbatim{|,|}
@end macro
@end iftex

@ifnottex
@macro comma
@verb{|,|}
@end macro
@end ifnottex

This document describes @value{TOPIC} the @lightning{} library for
dynamic code generation.

@menu
* Overview::                What GNU lightning is
* Installation::            Configuring and installing GNU lightning
* The instruction set::     The RISC instruction set used in GNU lightning
* GNU lightning examples::  GNU lightning's examples
* Reentrancy::              Re-entrant usage of GNU lightning
* Registers::               Accessing the whole register file
* Customizations::          Advanced code generation customizations
* Acknowledgements::        Acknowledgements for GNU lightning
@end menu
@end ifnottex

@node Overview
@chapter Introduction to @lightning{}

@iftex
This document describes @value{TOPIC} the @lightning{} library for
dynamic code generation.
@end iftex

Dynamic code generation is the generation of machine code
at runtime. It is typically used to strip a layer of interpretation
by allowing compilation to occur at runtime. One of the most
well-known applications of dynamic code generation is perhaps that
of interpreters that compile source code to an intermediate bytecode
form, which is then recompiled to machine code at run-time: this
approach effectively combines the portability of bytecode
representations with the speed of machine code. Another common
application of dynamic code generation is in the field of hardware
simulators and binary emulators, which can use the same techniques
to translate simulated instructions to the instructions of the
underlying machine.

Yet other applications come to mind: for example, windowing
@dfn{bitblt} operations, matrix manipulations, and network packet
filters. Albeit very powerful and relatively well known within the
compiler community, dynamic code generation techniques are rarely
exploited to their full potential and, with the exception of the
two applications described above, have remained curiosities because
of their portability and functionality barriers: binary instructions
are generated, so programs using dynamic code generation must be
retargeted for each machine; in addition, coding a run-time code
generator is a tedious and error-prone task more than a difficult one.

@lightning{} provides a portable, fast and easily retargetable dynamic
code generation system.

To be portable, @lightning{} abstracts over current architectures'
quirks and unorthogonalities. The interface that it exposes is that
of a standardized RISC architecture loosely based on the SPARC and MIPS
chips. There are a few general-purpose registers (six, not including
those used to receive and pass parameters between subroutines), and
arithmetic operations involve three operands: either three registers
or two registers and an arbitrarily sized immediate value.

On one hand, this architecture is general enough that it is possible to
generate pretty efficient code even on CISC architectures such as the
Intel x86 or the Motorola 68k families. On the other hand, it matches
real architectures closely enough that, most of the time, the
compiler's constant folding pass ends up generating code which
assembles machine instructions without further tests.

@node Installation
@chapter Configuring and installing @lightning{}

Here we will assume that your system already has the dependencies
necessary to build @lightning{}. For more on dependencies, see
@lightning{}'s @file{README-hacking} file.

The first thing to do to build @lightning{} is to configure the
program, picking the set of macros to be used on the host
architecture; this configuration is automatically performed by
the @file{configure} shell script; to run it, merely type:
@example
     ./configure
@end example

The @file{configure} script accepts the @code{--enable-disassembler}
option, which enables linking to GNU binutils and optionally printing
a human-readable disassembly of the jit code. This option can be
disabled with the @code{--disable-disassembler} option.

@file{configure} also accepts the @code{--enable-devel-disassembler}
option, useful to check exactly what machine instructions were generated
for a @lightning{} instruction. It basically mixes the output of
@code{jit_print} and @code{jit_disassembly}.

The @code{--enable-assertions} option enables several consistency
checks in the run-time assemblers. These are not usually needed, so you
can decide to simply forget about it; also remember that these consistency
checks tend to slow down your code generator.

The @code{--enable-devel-strong-type-checking} option does extra type
checking using @code{assert}. This option also enables
@code{--enable-assertions} unless it is explicitly disabled.

The option @code{--enable-devel-get-jit-size} should only be used
when doing updates or maintenance to lightning. It regenerates the
@code{jit_$ARCH-sz.c} file, creating a table of maximum byte usage when
translating a @lightning{} instruction to machine code.

After you've configured @lightning{}, run @file{make} as usual.

@lightning{} has an extensive set of tests to validate that it is working
correctly on the build host. To test it, run:
@example
     make check
@end example

The next important step is:
@example
     make install
@end example

This ends the process of installing @lightning{}.

@node The instruction set
@chapter @lightning{}'s instruction set

@lightning{}'s instruction set was designed by deriving instructions
that closely match those of most existing RISC architectures, or
that can be easily synthesized if absent. Each instruction is composed
of:
@itemize @bullet
@item
an operation, like @code{sub} or @code{mul}

@item
most times, a register/immediate flag (@code{r} or @code{i})

@item
an unsigned modifier (@code{u}), a type identifier or two, when applicable.
@end itemize

Examples of legal mnemonics are @code{addr} (integer add, with three
register operands) and @code{muli} (integer multiply, with two
register operands and an immediate operand). Each instruction takes
two or three operands; in most cases, one of them can be an immediate
value instead of a register.

Most @lightning{} integer operations are signed wordsize operations,
with the exception of operations that convert types, or load or store
values to/from memory. When applicable, the types and C types are as
follows:

@example
     _c         @r{signed char}
     _uc        @r{unsigned char}
     _s         @r{short}
     _us        @r{unsigned short}
     _i         @r{int}
     _ui        @r{unsigned int}
     _l         @r{long}
     _f         @r{float}
     _d         @r{double}
@end example

Most integer operations do not need a type modifier, and when loading or
storing values to memory there is an alias to the proper operation
using wordsize operands, that is, if omitted, the type is @r{int} on
32-bit architectures and @r{long} on 64-bit architectures. Note
that lightning also expects @code{sizeof(void*)} to match the wordsize.

When an unsigned operation result differs from the equivalent signed
operation, there is the @code{_u} modifier.

There are at least seven integer registers, of which six are
general-purpose, while the last is used to contain the frame pointer
(@code{FP}). The frame pointer can be used to allocate and access local
variables on the stack, using the @code{allocai} or @code{allocar}
instruction.

Of the general-purpose registers, at least three are guaranteed to be
preserved across function calls (@code{V0}, @code{V1} and
@code{V2}) and at least three are not (@code{R0}, @code{R1} and
@code{R2}). Six registers are not very much, but this
restriction was forced by the need to target CISC architectures
which, like the x86, are short of registers; anyway, backends can
specify the actual number of available registers with the macros
@code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM}
(for callee-save registers).

There are at least six floating-point registers, named @code{F0} to
@code{F5}. These are usually caller-save and are separate from the integer
registers on the supported architectures; on Intel architectures,
in 32 bit mode if SSE2 is not available or use of X87 is forced,
the register stack is mapped to a flat register file. As for the
integer registers, the macro @code{JIT_F_NUM} yields the number of
floating-point registers.

The complete instruction set follows; as you can see, most non-memory
operations only take integers (either signed or unsigned) as operands;
this was done in order to reduce the instruction set, and because most
architectures only provide word and long word operations on registers.
There are instructions that allow operands to be extended to fit a larger
data type, both in a signed and in an unsigned way.

@table @b
@item Binary ALU operations
These accept three operands; the last one can be an immediate.
@code{addx} operations must directly follow @code{addc}, and
@code{subx} must follow @code{subc}; otherwise, results are undefined.
Most, if not all, architectures do not support @r{float} or @r{double}
immediate operands; lightning emulates those operations by moving the
immediate to a temporary register and emitting the call with only
register operands.
@example
addr         _f  _d  O1 = O2 + O3
addi         _f  _d  O1 = O2 + O3
addxr                O1 = O2 + (O3 + carry)
addxi                O1 = O2 + (O3 + carry)
addcr                O1 = O2 + O3, set carry
addci                O1 = O2 + O3, set carry
subr         _f  _d  O1 = O2 - O3
subi         _f  _d  O1 = O2 - O3
subxr                O1 = O2 - (O3 + carry)
subxi                O1 = O2 - (O3 + carry)
subcr                O1 = O2 - O3, set carry
subci                O1 = O2 - O3, set carry
rsbr         _f  _d  O1 = O3 - O2
rsbi         _f  _d  O1 = O3 - O2
mulr         _f  _d  O1 = O2 * O3
muli         _f  _d  O1 = O2 * O3
divr     _u  _f  _d  O1 = O2 / O3
divi     _u  _f  _d  O1 = O2 / O3
remr     _u          O1 = O2 % O3
remi     _u          O1 = O2 % O3
andr                 O1 = O2 & O3
andi                 O1 = O2 & O3
orr                  O1 = O2 | O3
ori                  O1 = O2 | O3
xorr                 O1 = O2 ^ O3
xori                 O1 = O2 ^ O3
lshr                 O1 = O2 << O3
lshi                 O1 = O2 << O3
rshr     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
rshi     _u          O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.}
movzr                O1 = O3 ? O1 : O2
movnr                O1 = O3 ? O2 : O1
@end example
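
As a rough illustration of why @code{addx} must directly follow
@code{addc}, the following plain C sketch models a double-word addition
built from such a pair. It is not @lightning{} code: the @code{wide_t}
type and the function names are hypothetical, and a 64-bit word is
assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word integer; not a lightning type. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} wide_t;

/* Model of an addcr/addxr pair operating on 64-bit words. */
wide_t wide_add(wide_t a, wide_t b)
{
    wide_t r;
    uint64_t carry;

    r.lo = a.lo + b.lo;            /* addcr: O1 = O2 + O3, set carry */
    carry = r.lo < a.lo;           /* unsigned wrap-around signals a carry */
    r.hi = a.hi + (b.hi + carry);  /* addxr: O1 = O2 + (O3 + carry) */
    return r;
}

/* Usage: adding 1 to 2^64 - 1 carries into the high word. */
void wide_add_example(void)
{
    wide_t a = { UINT64_MAX, 0 };
    wide_t b = { 1, 0 };
    wide_t r = wide_add(a, b);

    assert(r.lo == 0 && r.hi == 1);
}
```

Any instruction placed between the two steps could clobber the carry,
which is why the pair has to stay adjacent.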

@item Four operand binary ALU operations
These accept two result registers, and two operands; the last one can
be an immediate. The first two arguments cannot be the same register.

@code{qmul} stores the low word of the result in @code{O1} and the
high word in @code{O2}. For unsigned multiplication, a zero @code{O2}
means there was no overflow. For signed multiplication, the overflow
check depends on the sign of the result: there was no overflow if
@code{O2} is zero or minus one.

@code{qdiv} stores the quotient in @code{O1} and the remainder in
@code{O2}. It can be used as a quick way to check if a division is
exact, in which case the remainder is zero.

@example
qmulr    _u       O1 O2 = O3 * O4
qmuli    _u       O1 O2 = O3 * O4
qdivr    _u       O1 O2 = O3 / O4
qdivi    _u       O1 O2 = O3 / O4
@end example
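
The overflow and exactness checks described above can be sketched in
plain C. This is only a model, assuming a 32-bit word so that the full
product fits in a 64-bit integer; the function names are hypothetical
and are not part of @lightning{}. For the signed case the sketch uses
the slightly stricter test that the high word equals the sign extension
of the low word.

```c
#include <assert.h>
#include <stdint.h>

/* Model of signed qmulr on a 32-bit word: low word to *lo, high word to *hi.
 * The narrowing casts rely on the usual two's complement conversion. */
void model_qmulr(int32_t *lo, int32_t *hi, int32_t a, int32_t b)
{
    uint64_t full = (uint64_t)((int64_t)a * (int64_t)b);

    *lo = (int32_t)(uint32_t)full;
    *hi = (int32_t)(uint32_t)(full >> 32);
}

/* No signed overflow: the high word is the sign extension of the low word. */
int signed_product_fits(int32_t lo, int32_t hi)
{
    return hi == (lo < 0 ? -1 : 0);
}

/* Model of signed qdivr: quotient to *quot, remainder to *rem. */
void model_qdivr(int32_t *quot, int32_t *rem, int32_t a, int32_t b)
{
    assert(b != 0);                        /* division by zero is not modelled */
    assert(!(a == INT32_MIN && b == -1));  /* quotient would not fit in a word */
    *quot = a / b;
    *rem = a % b;
}
```

With this model, a division is exact when the value stored through
@code{rem} is zero, matching the use of @code{qdiv} suggested above.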

@item Unary ALU operations
These accept two operands, both of which must be registers.
@example
negr         _f  _d  O1 = -O2
comr                 O1 = ~O2
clor                 O1 = number of leading one bits in O2
clzr                 O1 = number of leading zero bits in O2
ctor                 O1 = number of trailing one bits in O2
ctzr                 O1 = number of trailing zero bits in O2
@end example

Note that @code{ctzr} is basically equivalent to the @code{C} function
@code{ffs}, but indexed from bit zero, not one.

Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
value of zero is not an error; it just returns the number of bits
in a word: 64 if @lightning{} generates 64 bit instructions, otherwise
32.

The @code{clor} and @code{ctor} instructions are just counterparts of
the versions that search for zero bits.

These unary ALU operations are only defined for float operands.
@example
absr         _f  _d  O1 = fabs(O2)
sqrtr        _f  _d  O1 = sqrt(O2)
@end example

Besides requiring the @code{r} modifier, there are no unary operations
with an immediate operand.
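
The zero-input behaviour of the bit-counting instructions described
above can be modelled in plain C as follows. This is a sketch assuming
a 32-bit word; the @code{model_} functions are hypothetical names, not
@lightning{} calls.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 32u

/* Leading zero bits; returns WORD_BITS for a zero input. */
unsigned model_clzr(uint32_t x)
{
    unsigned n = 0;
    uint32_t mask = UINT32_C(1) << (WORD_BITS - 1);

    while (n < WORD_BITS && !(x & mask)) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Trailing zero bits; returns WORD_BITS for a zero input. */
unsigned model_ctzr(uint32_t x)
{
    unsigned n = 0;

    while (n < WORD_BITS && !(x & (UINT32_C(1) << n)))
        n++;
    return n;
}

/* The "one" variants count zero bits of the complemented operand. */
unsigned model_clor(uint32_t x) { return model_clzr((uint32_t)~x); }
unsigned model_ctor(uint32_t x) { return model_ctzr((uint32_t)~x); }

void bit_count_example(void)
{
    assert(model_clzr(0) == 32 && model_ctzr(0) == 32);
    assert(model_ctzr(8) == 3);   /* ffs(8) would give 4 */
    assert(model_clor(UINT32_C(0xF0000000)) == 4);
}
```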

@item Compare instructions
These accept three operands; again, the last can be an immediate.
The last two operands are compared, and the first operand, which must be
an integer register, is set to 1 if the given condition was met and
to 0 otherwise.

The conditions given below are for the standard behavior of C,
where the ``unordered'' comparison result is mapped to false.

@example
ltr      _u  _f  _d  O1 =  (O2 <  O3)
lti      _u  _f  _d  O1 =  (O2 <  O3)
ler      _u  _f  _d  O1 =  (O2 <= O3)
lei      _u  _f  _d  O1 =  (O2 <= O3)
gtr      _u  _f  _d  O1 =  (O2 >  O3)
gti      _u  _f  _d  O1 =  (O2 >  O3)
ger      _u  _f  _d  O1 =  (O2 >= O3)
gei      _u  _f  _d  O1 =  (O2 >= O3)
eqr          _f  _d  O1 =  (O2 == O3)
eqi          _f  _d  O1 =  (O2 == O3)
ner          _f  _d  O1 =  (O2 != O3)
nei          _f  _d  O1 =  (O2 != O3)
unltr        _f  _d  O1 = !(O2 >= O3)
unler        _f  _d  O1 = !(O2 >  O3)
ungtr        _f  _d  O1 = !(O2 <= O3)
unger        _f  _d  O1 = !(O2 <  O3)
uneqr        _f  _d  O1 = !(O2 <  O3) && !(O2 >  O3)
ltgtr        _f  _d  O1 = !(O2 >= O3) || !(O2 <= O3)
ordr         _f  _d  O1 =  (O2 == O2) && (O3 == O3)
unordr       _f  _d  O1 =  (O2 != O2) || (O3 != O3)
@end example
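
To make the difference between the ordinary and the ``unordered''
comparisons concrete, here is a small plain C model of four of the
conditions listed above, applied to @r{double} operands. The
@code{model_} names are hypothetical; the expressions are copied from
the table, and the sketch assumes IEEE NaN semantics (so it should not
be built with options such as @code{-ffast-math}).

```c
#include <assert.h>
#include <math.h>

int model_ltr_d(double a, double b)    { return a < b; }             /* ltr_d    */
int model_unltr_d(double a, double b)  { return !(a >= b); }         /* unltr_d  */
int model_ordr_d(double a, double b)   { return a == a && b == b; }  /* ordr_d   */
int model_unordr_d(double a, double b) { return a != a || b != b; }  /* unordr_d */

void unordered_example(void)
{
    /* With a NaN operand the plain comparison is false ... */
    assert(model_ltr_d(NAN, 1.0) == 0);
    /* ... while the unordered variant is true. */
    assert(model_unltr_d(NAN, 1.0) == 1);
    /* On ordered operands both agree. */
    assert(model_ltr_d(1.0, 2.0) == 1 && model_unltr_d(1.0, 2.0) == 1);
    assert(model_unordr_d(NAN, 1.0) == 1 && model_ordr_d(1.0, 2.0) == 1);
}
```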

@item Transfer operations
These accept two operands; for @code{ext} both of them must be
registers, while @code{mov} accepts an immediate value as the second
operand.

Unlike @code{movr} and @code{movi}, the other instructions are used
to truncate a wordsize operand to a smaller integer data type or to
convert float data types. You can also use @code{extr} to convert an
integer to a floating point value: the usual options are @code{extr_f}
and @code{extr_d}.

@example
movr                                 _f  _d  O1 = O2
movi                                 _f  _d  O1 = O2
extr      _c  _uc  _s  _us  _i  _ui  _f  _d  O1 = O2
truncr                               _f  _d  O1 = trunc(O2)
@end example

In 64-bit architectures it may be required to use @code{truncr_f_i},
@code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match
the equivalent C code. Only the @code{_i} modifier is available in
32-bit architectures.

@example
truncr_f_i    = <int> O1 = <float> O2
truncr_f_l    = <long>O1 = <float> O2
truncr_d_i    = <int> O1 = <double>O2
truncr_d_l    = <long>O1 = <double>O2
@end example

The float conversion operations are @emph{destination first,
source second}, but the order of the types is reversed. This happens
for historical reasons.

@example
extr_f_d    = <double>O1 = <float> O2
extr_d_f    = <float> O1 = <double>O2
@end example
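
Because the type order in the conversion mnemonics is easy to misread,
the following plain C sketch spells out the casts that the table above
describes. The @code{model_} functions are hypothetical stand-ins for
the corresponding instructions, not @lightning{} calls.

```c
#include <assert.h>

/* truncr_d_i: <int>O1 = <double>O2, truncating toward zero like a C cast. */
int model_truncr_d_i(double x) { return (int)x; }

/* extr_d: integer register converted to double. */
double model_extr_d(long x) { return (double)x; }

/* extr_f_d: the source type comes first in the name, <double>O1 = <float>O2. */
double model_extr_f_d(float x) { return (double)x; }

/* extr_d_f: <float>O1 = <double>O2. */
float model_extr_d_f(double x) { return (float)x; }

void conversion_example(void)
{
    assert(model_truncr_d_i(2.7) == 2);
    assert(model_truncr_d_i(-2.7) == -2);
    assert(model_extr_d(3L) == 3.0);
    assert(model_extr_f_d(1.5f) == 1.5);
    assert(model_extr_d_f(0.25) == 0.25f);
}
```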

@item Network extensions
These accept two operands, both of which must be registers; these
two instructions actually perform the same task, yet they are
assigned to two mnemonics for the sake of convenience and
completeness. As usual, the first operand is the destination and
the second is the source.
The @code{_ul} variant is only available in 64-bit architectures.
@example
htonr    _us _ui _ul @r{Host-to-network (big endian) order}
ntohr    _us _ui _ul @r{Network-to-host order }
@end example

@code{bswapr} can be used to unconditionally byte-swap an operand.
On little-endian architectures, @code{htonr} and @code{ntohr} resolve
to this.
The @code{_ul} variant is only available in 64-bit architectures.
@example
bswapr    _us _ui _ul  O1 = byte_swap(O2)
@end example
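
The relationship between @code{bswapr} and @code{htonr} can be sketched
in plain C. The functions below are a hypothetical model rather than
@lightning{} code: the byte swap is unconditional, while the
host-to-network conversion swaps only when the host is little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* Model of bswapr_us: swap the two bytes of a 16-bit value. */
uint16_t model_bswapr_us(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Model of bswapr_ui: reverse the four bytes of a 32-bit value. */
uint32_t model_bswapr_ui(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & UINT32_C(0x0000FF00)) |
           ((x << 8) & UINT32_C(0x00FF0000)) | (x << 24);
}

/* Run-time endianness probe used only by this sketch. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;

    return *(const unsigned char *)&probe == 1;
}

/* Model of htonr_ui: a byte swap on little-endian hosts, a move otherwise. */
uint32_t model_htonr_ui(uint32_t x)
{
    return host_is_little_endian() ? model_bswapr_ui(x) : x;
}

void byte_order_example(void)
{
    uint32_t net = model_htonr_ui(UINT32_C(0x11223344));
    const unsigned char *bytes = (const unsigned char *)&net;

    /* In network order the most significant byte is stored first. */
    assert(bytes[0] == 0x11 && bytes[3] == 0x44);
}
```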

@item Load operations
@code{ld} accepts two operands while @code{ldx} accepts three;
in both cases, the last can be either a register or an immediate
value. Values are extended (with or without sign, according to
the data type specification) to fit a whole register.
The @code{_ui} and @code{_l} types are only available in 64-bit
architectures. For convenience, there is a version without a
type modifier for integer or pointer operands that uses the
appropriate wordsize call.
@example
ldr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
ldi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *O2
ldxr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
ldxi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d  O1 = *(O2+O3)
@end example
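
The effect of the type modifier on a load can be illustrated with a
plain C model, in which a @code{long} stands in for the destination
register. The @code{model_ldr_} functions are hypothetical and only
mimic the sign or zero extension described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Each model reads a narrow value and widens it to a register-sized long. */
long model_ldr_c(const void *p)  { int8_t v;   memcpy(&v, p, sizeof v); return v; }
long model_ldr_uc(const void *p) { uint8_t v;  memcpy(&v, p, sizeof v); return v; }
long model_ldr_s(const void *p)  { int16_t v;  memcpy(&v, p, sizeof v); return v; }
long model_ldr_us(const void *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

void load_extension_example(void)
{
    uint8_t byte = 0xFF;
    int16_t half = -2;

    assert(model_ldr_c(&byte) == -1);      /* sign-extended */
    assert(model_ldr_uc(&byte) == 255);    /* zero-extended */
    assert(model_ldr_s(&half) == -2);
    assert(model_ldr_us(&half) == 65534);
}
```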

@item Store operations
@code{st} accepts two operands while @code{stx} accepts three; in
both cases, the first can be either a register or an immediate
value. Values are sign-extended to fit a whole register.
@example
str     _c  _s  _i  _l  _f  _d  *O1 = O2
sti     _c  _s  _i  _l  _f  _d  *O1 = O2
stxr    _c  _s  _i  _l  _f  _d  *(O1+O2) = O3
stxi    _c  _s  _i  _l  _f  _d  *(O1+O2) = O3
@end example
Note that the unsigned type modifier is not available, as the store
only writes 1, 2, 4 or 8 bytes to the memory address.
The @code{_l} type is only available in 64-bit architectures, and for
convenience, there is a version without a type modifier for integer or
pointer operands that uses the appropriate wordsize call.
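
A plain C model may help to see why stores need no unsigned modifier:
only the low bytes of the register are written, so signed and unsigned
values with the same low bits produce the same memory contents. The
function @code{model_str_c} is a hypothetical stand-in for a one-byte
store.

```c
#include <assert.h>
#include <string.h>

/* Model of str_c: only the low byte of the register value is written. */
void model_str_c(void *p, long value)
{
    unsigned char low = (unsigned char)value;

    memcpy(p, &low, 1);
}

void store_example(void)
{
    unsigned char buf[3] = { 0x11, 0x22, 0x33 };

    model_str_c(&buf[1], -1L);
    assert(buf[0] == 0x11 && buf[1] == 0xFF && buf[2] == 0x33);

    /* 255 has the same low byte as -1, so memory ends up identical. */
    model_str_c(&buf[1], 255L);
    assert(buf[1] == 0xFF);
}
```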

@item Argument management
These are:
@example
prepare     (not specified)
va_start    (not specified)
pushargr    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
pushargi    _c  _uc  _s  _us  _i  _ui  _l  _f  _d
va_push     (not specified)
arg         _c  _uc  _s  _us  _i  _ui  _l  _f  _d
getarg      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
va_arg                                         _d
putargr     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
putargi     _c  _uc  _s  _us  _i  _ui  _l  _f  _d
ret         (not specified)
retr        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
reti        _c  _uc  _s  _us  _i  _ui  _l  _f  _d
reti                                       _f  _d
va_end      (not specified)
retval      _c  _uc  _s  _us  _i  _ui  _l  _f  _d
epilog      (not specified)
@end example
As with other operations that use a type modifier, the @code{_ui} and
@code{_l} types are only available in 64-bit architectures, but there
are operations without a type modifier that alias to the appropriate
integer operation with wordsize operands.

@code{prepare}, @code{pusharg}, and @code{retval} are used by the caller,
while @code{arg}, @code{getarg} and @code{ret} are used by the callee.
A code snippet that wants to call another procedure and has to pass
arguments must, in order: use the @code{prepare} instruction and use
the @code{pushargr} or @code{pushargi} to push the arguments @strong{in
left to right order}; and use @code{finish} or @code{call} (explained below)
to perform the actual call.

Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret}, when
handling integer types, can be used without a type modifier.
It is suggested to use matching type modifiers for @code{arg}, @code{putarg}
and @code{getarg}; otherwise, problems will happen when generating jit for
environments that require arguments to be truncated and zero or sign
extended by the caller, and/or where excess arguments might be passed packed
on the stack. Currently only Apple systems with @code{aarch64} cpus are
known to have this restriction.

@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
@code{va_push} is required when passing a @code{va_list} to another function,
because not all architectures expect it as a single pointer. A known case
is DEC Alpha, which requires it as a structure passed by value.

@code{arg}, @code{getarg} and @code{putarg} are used by the callee.
@code{arg} is different from other instructions in that it does not
actually generate any code: instead, it is a function which returns
a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return
a value'' means that @lightning{} code that compiles these
instructions returns a value when expanded.} You should call
@code{arg} as soon as possible, before any function call or, more
easily, right after the @code{prolog} instruction
(which is treated later).

@code{getarg} accepts a register argument and a value returned by
@code{arg}, and will move that argument to the register, extending
it (with or without sign, according to the data type specification)
to fit a whole register. These instructions are more intimately
related to the usage of the @lightning{} instruction set in code
that generates other code, so they will be treated more
specifically in @ref{GNU lightning examples, , Generating code at
run-time}.

@code{putarg} is a mix of @code{getarg} and @code{pusharg} in that
it accepts as first argument a register or immediate, and as
second argument a value returned by @code{arg}. It allows changing
or restoring an argument to the current function, and is a
construct required to implement tail call optimization. Note that
arguments in registers are very cheap, but may be overwritten
at any moment, including by some operations, for example division,
that on several ports are implemented as a function call.

Finally, the @code{retval} instruction fetches the return value of a
called function into a register. The @code{retval} instruction takes a
register argument and copies the return value of the previously called
function into that register. A function with a return value should use
@code{retr} or @code{reti} to put the return value in the return register
before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.

@code{epilog} is an optional call that marks the end of a function
body. It is automatically generated by @lightning{} if starting a new
function (which should be done after a @code{ret} call) or finishing
generating jit.
It is very important to note that @code{epilog} being
optional may cause a common mistake. Consider this:
@example
fun1:
    prolog
    ...
    ret
fun2:
    prolog
@end example
Since @code{epilog} is added when a new @code{prolog} is found,
the @code{fun2} label will actually end up before the
return from @code{fun1}, because @lightning{} will
understand the code as:
@example
fun1:
    prolog
    ...
    ret
fun2:
    epilog
    prolog
@end example

You should observe a few rules when using these macros. First of
all, if calling a varargs function, you should use the @code{ellipsis}
call to mark the position of the ellipsis in the C prototype.

You should not nest calls to @code{prepare} inside a
@code{prepare/finish} block. Doing this will result in undefined
behavior. Note that for functions with zero arguments you can use
just @code{call}.

@item Branch instructions
Like @code{arg}, these also return a value which, in this case,
is to be used to compile forward branches as explained in
@ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be
compared; of these, the last can be either a register or an immediate.
They are:
@example
bltr      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
blti      _u  _f  _d  @r{if }(O2 <  O3)@r{ goto }O1
bler      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
blei      _u  _f  _d  @r{if }(O2 <= O3)@r{ goto }O1
bgtr      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
bgti      _u  _f  _d  @r{if }(O2 >  O3)@r{ goto }O1
bger      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
bgei      _u  _f  _d  @r{if }(O2 >= O3)@r{ goto }O1
beqr          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
beqi          _f  _d  @r{if }(O2 == O3)@r{ goto }O1
bner          _f  _d  @r{if }(O2 != O3)@r{ goto }O1
bnei          _f  _d  @r{if }(O2 != O3)@r{ goto }O1

bunltr        _f  _d  @r{if }!(O2 >= O3)@r{ goto }O1
bunler        _f  _d  @r{if }!(O2 >  O3)@r{ goto }O1
bungtr        _f  _d  @r{if }!(O2 <= O3)@r{ goto }O1
bunger        _f  _d  @r{if }!(O2 <  O3)@r{ goto }O1
buneqr        _f  _d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
bltgtr        _f  _d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
bordr         _f  _d  @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1
bunordr       _f  _d  @r{if } (O2 != O2) || (O3 != O3)@r{ goto }O1

bmsr                  @r{if }O2 &  O3@r{ goto }O1
bmsi                  @r{if }O2 &  O3@r{ goto }O1
bmcr                  @r{if }!(O2 & O3)@r{ goto }O1
bmci                  @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
boaddr    _u          O2 += O3@r{, goto }O1@r{ if overflow}
boaddi    _u          O2 += O3@r{, goto }O1@r{ if overflow}
bxaddr    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
bxaddi    _u          O2 += O3@r{, goto }O1@r{ if no overflow}
bosubr    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
bosubi    _u          O2 -= O3@r{, goto }O1@r{ if overflow}
bxsubr    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
bxsubi    _u          O2 -= O3@r{, goto }O1@r{ if no overflow}
@end example

Note that the @code{C} code does not have an @code{O1} argument. It is
required to always use the return value as an argument to @code{patch},
@code{patch_at} or @code{patch_abs}.
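
The overflow-checking branches differ between their signed and
@code{_u} forms only in what counts as an overflow. The plain C sketch
below models @code{boaddr} and @code{boaddr_u} on a 32-bit word; the
function names are hypothetical, the return value stands for ``branch
taken'', and the sketch assumes that the wrapped sum is written back
whether or not the branch is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Model of signed boaddr: *o2 += o3, returns 1 when the branch is taken. */
int model_boaddr(int32_t *o2, int32_t o3)
{
    int64_t sum = (int64_t)*o2 + (int64_t)o3;

    /* Two's complement wrap of the stored result. */
    *o2 = (int32_t)(uint32_t)(uint64_t)sum;
    return sum > INT32_MAX || sum < INT32_MIN;
}

/* Model of boaddr_u: unsigned overflow is a carry out of the word. */
int model_boaddr_u(uint32_t *o2, uint32_t o3)
{
    uint32_t old = *o2;

    *o2 = old + o3;
    return *o2 < old;
}

void overflow_branch_example(void)
{
    int32_t s = INT32_MAX;
    uint32_t u = UINT32_MAX;

    assert(model_boaddr(&s, 1) == 1 && s == INT32_MIN);
    assert(model_boaddr_u(&u, 1) == 1 && u == 0);
}
```

The @code{bxadd} and @code{bxsub} forms would simply negate the
returned condition.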

@item Jump and return operations
These accept one argument except @code{ret} and @code{jmpi} which
have none; the difference between @code{finishi} and @code{calli}
is that the latter does not clean the stack from pushed parameters
(if any) and the former must @strong{always} follow a @code{prepare}
instruction.
@example
callr     (not specified)                @r{function call to register O1}
calli     (not specified)                @r{function call to immediate O1}
finishr   (not specified)                @r{function call to register O1}
finishi   (not specified)                @r{function call to immediate O1}
jmpr      (not specified)                @r{unconditional jump to register}
jmpi      (not specified)                @r{unconditional jump}
ret       (not specified)                @r{return from subroutine}
retr      _c _uc _s _us _i _ui _l _f _d
reti      _c _uc _s _us _i _ui _l _f _d
retval    _c _uc _s _us _i _ui _l _f _d  @r{move return value}
                                         @r{to register}
@end example

Like branch instructions, @code{jmpi} also returns a value which is to
be used to compile forward branches. @xref{Fibonacci, , Fibonacci
numbers}.

@item Labels
There are 3 @lightning{} instructions to create labels:
@example
label     (not specified)                @r{simple label}
forward   (not specified)                @r{forward label}
indirect  (not specified)                @r{special simple label}
@end example

The following instruction is used to specify a minimal alignment for
the next instruction, usually with a label:
@example
align     (not specified)                @r{align code}
@end example

Similar to @code{align} is the next instruction, also usually used with
a label:
@example
skip      (not specified)                @r{skip code}
@end example
It is used to specify a minimal number of bytes of nops to be inserted
before the next instruction.

@code{label} is normally used as the @code{patch_at} argument for
backward jumps.

@example
        jit_node_t *jump, *label;
label = jit_label();
        ...
        jump = jit_beqr(JIT_R0, JIT_R1);
        jit_patch_at(jump, label);
@end example

@code{forward} is used to patch code generation before the actual
position of the label is known.

@example
        jit_node_t *jump, *label;
label = jit_forward();
        jump = jit_beqr(JIT_R0, JIT_R1);
        jit_patch_at(jump, label);
        ...
        jit_link(label);
@end example
679
680@code{indirect} is useful when creating jump tables, and tells
681@lightning{} to not optimize out a label that is not the target of
682any jump, because an indirect jump may land where it is defined.
683
684@example
685 jit_node_t *jump, *label;
686 ...
687 jmpr(JIT_R0); @rem{/* may jump to label */}
688 ...
689label = jit_indirect();
690@end example
691
692@code{indirect} is an special case of @code{note} and @code{name}
693because it is a valid argument to @code{address}.
694
695Note that the usual idiom to write the previous example is
696@example
697 jit_node_t *addr, *jump;
698addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */}
699 ...
700 jmpr(JIT_R0);
701 ...
702 jit_patch(addr); @rem{/* implicit label added */}
703@end example
704
705that automatically binds the implicit label added by @code{patch} with
706the @code{movi}, but on some special conditions it is required to create
707an "unbound" label.
708
ba3814c1
PC
709@code{align} is useful for creating multiple entry points to a
710(trampoline) function that are all accessible through a single
711function pointer. @code{align} receives an integer argument that
712defines the minimal alignment of the address of a label directly
713following the @code{align} instruction. The integer argument must be
714a power of two and the effective alignment will be a power of two no
715less than the argument to @code{align}. If the argument to
716@code{align} is 16 or more, the effective alignment will match the
717specified minimal alignment exactly.

@example
          jit_node_t *forward, *label1, *label2, *jump;
          unsigned char *addr1, *addr2;
forward = jit_forward();
          jit_align(16);
label1  = jit_indirect();                @rem{/* first entry point */}
jump    = jit_jmpi();                    @rem{/* jump to first handler */}
          jit_patch_at(jump, forward);
          jit_align(16);
label2  = jit_indirect();                @rem{/* second entry point */}
          ...                            @rem{/* second handler */}
          jit_jmpr(...);
          jit_link(forward);
          ...                            @rem{/* first handler */}
          jit_jmpr(...);
          ...
          jit_emit();
          addr1 = jit_address(label1);
          addr2 = jit_address(label2);
          assert(addr2 - addr1 == 16);   @rem{/* only one of the addresses needs to be remembered */}
@end example

@code{skip} is useful for reserving space in the code buffer that can
later be filled (possibly with the help of the pair of functions
@code{jit_unprotect} and @code{jit_protect}).
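
The address arithmetic behind the @code{align} example above can be
sketched in plain C. The helper @code{align_up} is a hypothetical name,
the code address and the 5-byte size of the jump are made-up values for
illustration, and the sketch assumes the effective alignment equals the
requested one (as stated above for arguments of 16 or more).

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

void align_example(void)
{
    uintptr_t label1 = align_up(0x1003, 16);      /* hypothetical code address */
    uintptr_t label2 = align_up(label1 + 5, 16);  /* assumed 5-byte jump */

    assert(label1 == 0x1010);
    /* As long as the jump fits in 16 bytes, the entry points are 16 apart. */
    assert(label2 - label1 == 16);
}
```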

@item Function prolog

These macros are used to set up a function prolog. The @code{allocai}
call accepts a single integer argument and returns an offset value
for stack storage access. The @code{allocar} call accepts two register
arguments: the first is set to the offset for stack access, and the
second holds the size in bytes.

@example
prolog    (not specified)                @r{function prolog}
allocai   (not specified)                @r{reserve space on the stack}
allocar   (not specified)                @r{allocate space on the stack}
@end example
758
759@code{allocai} receives the number of bytes to allocate and returns
760the offset from the frame pointer register @code{FP} to the base of
761the area.
762
763@code{allocar} receives two register arguments. The first is where
764to store the offset from the frame pointer register @code{FP} to the
base of the area. The second argument is the size in bytes. Note
that @code{allocar} performs dynamic allocation, so special care
should be taken when using it. If it is called in a loop, every
iteration allocates more stack space. Stack space is aligned to
between 8 and 64 bytes depending on backend requirements, even when
allocating only one byte. It is advisable not to combine it with
@code{frame} or @code{tramp}: it should work with @code{frame},
provided care is taken to call it only once, but it is not supported
with @code{tramp}, even if called only once.
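
The cost of calling @code{allocar} repeatedly can be estimated with a
little arithmetic. The sketch below is plain C and does not call
@lightning{}; @code{round_up} and @code{stack_used_after} are
hypothetical helper names, and the alignment is a parameter because the
real value depends on the backend.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Stack bytes consumed when a dynamic allocation of `size' bytes is
 * executed `iterations' times and never released in between. */
size_t stack_used_after(size_t iterations, size_t size, size_t align)
{
    return iterations * round_up(size, align);
}
```

Under this model, and assuming a 16-byte alignment, a one-byte request
executed 1000 times in a loop would consume 16000 bytes of stack.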
774
As a small appetizer, here is a small function that adds 1 to the input
parameter (an @code{int}). I'm using an assembly-like syntax here which
is a bit different from the one used when writing real subroutines with
@lightning{}; the real syntax will be introduced in @ref{GNU lightning
examples, , Generating code at run-time}.
780
781@example
782incr:
783 prolog
784in = arg @rem{! We have an integer argument}
785 getarg R0, in @rem{! Move it to R0}
786 addi R0, R0, 1 @rem{! Add 1}
787 retr R0 @rem{! And return the result}
788@end example
789
790And here is another function which uses the @code{printf} function from
791the standard C library to write a number in hexadecimal notation:
792
793@example
794printhex:
795 prolog
796in = arg @rem{! Same as above}
797 getarg R0, in
798 prepare @rem{! Begin call sequence for printf}
799 pushargi "%x" @rem{! Push format string}
800 ellipsis @rem{! Varargs start here}
801 pushargr R0 @rem{! Push second argument}
802 finishi printf @rem{! Call printf}
803 ret @rem{! Return to caller}
804@end example
805
806@item Register liveness
807
808During code generation, @lightning{} occasionally needs scratch registers
809or needs to use architecture-defined registers. For that, @lightning{}
810internally maintains register liveness information.
811
In the following example, @code{qdivr} will need special registers like
@code{R0} on some architectures. As @lightning{} understands that
@code{R0} is used in the subsequent instruction, it will create
save/restore code for @code{R0} where needed.
816
817@example
818...
819qdivr V0, V1, V2, V3
820movr V3, R0
821...
822@end example
823
The same is not true in the example that follows. Here, @code{R0} is
not live after the division operation because @code{R0} is neither an
argument register nor a callee-save register. Thus, no save/restore
code for @code{R0} will be created.
828
829@example
830...
831qdivr V0, V1, V2, V3
832jmpr R1
833...
834@end example
835
The @code{live} instruction can be used to mark a register as live
after it, as in the following example. Here, @code{R0} will be
preserved across the division.
839
840@example
841...
842qdivr V0, V1, V2, V3
843live R0
844jmpr R1
845...
846@end example
847
The @code{live} instruction is useful at code entry and exit points,
such as after and before a @code{callr} instruction.
850
851@item Trampolines, continuations and tail call optimization
852
Frequently it is required to generate jit code that must jump to
code generated later, possibly from another @code{jit_context_t}.
Such jumps require compatible stack frames.

@lightning{} provides two primitives from which trampolines,
continuations and tail call optimization can be implemented.
859
860@example
861frame (not specified) @r{create stack frame}
862tramp (not specified) @r{assume stack frame}
863@end example
864
@code{frame} receives an integer argument@footnote{It is not
automatically computed because @lightning{} does not know about the
requirements of code generated later.} that defines the size in
bytes of the stack frame of the current, @code{C} callable,
jit function. A good formula for this value is the maximum
number of arguments to any called native function times
eight@footnote{Times eight so that it works for double arguments
and needs no conditionals for ports that pass arguments on
the stack.}, plus the sum of the arguments to any call to
@code{jit_allocai}. @lightning{} automatically adjusts this value
for any backend specific stack memory it may need, and for any
alignment constraint.
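
The rule of thumb above is simple enough to compute by hand, but a
helper makes the intent explicit. The following is a minimal sketch in
plain C; @code{frame_size_hint} is a hypothetical name, not part of
@lightning{}, and the caller is assumed to know the argument counts and
@code{allocai} sizes of all code that will run under the frame.

```c
#include <assert.h>
#include <stddef.h>

/* Suggested argument for frame or tramp:
 *   (largest argument count of any called native function) * 8
 *   + (sum of all sizes passed to allocai). */
size_t frame_size_hint(size_t max_call_args,
                       const size_t *allocai_sizes, size_t n_allocai)
{
    size_t total = max_call_args * 8;
    size_t i;

    assert(allocai_sizes != NULL || n_allocai == 0);
    for (i = 0; i < n_allocai; i++)
        total += allocai_sizes[i];
    return total;
}
```

For instance, code that calls native functions with at most four
arguments and reserves areas of 128 and 32 bytes would ask for
4 * 8 + 128 + 32 = 192 bytes.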
877
@code{frame} also instructs @lightning{} to save all callee
save registers in the prolog and reload them in the epilog.
880
881@example
882main: @rem{! jit entry point}
883 prolog @rem{! function prolog}
884 frame 256 @rem{! save all callee save registers and}
885 @rem{! reserve at least 256 bytes in stack}
886main_loop:
887 ...
888 jmpi handler @rem{! jumps to external code}
889 ...
890 ret @rem{! return to the caller}
891@end example
892
@code{tramp} differs from @code{frame} only in that a prolog and epilog
will not be generated. Note that @code{prolog} must still be used.
The code under @code{tramp} must be ready to be entered with a jump
at the prolog position, and instead of a return, it must end with
an unconditional jump. @code{tramp} exists solely because
it allows optimizing out prolog and epilog code that would
never be executed.
900
901@example
902handler: @rem{! handler entry point}
903 prolog @rem{! function prolog}
904 tramp 256 @rem{! assumes all callee save registers}
905 @rem{! are saved and there is at least}
906 @rem{! 256 bytes in stack}
907 ...
908 jmpi main_loop @rem{! return to the main loop}
909@end example
910
911@lightning{} only supports Tail Call Optimization using the
912@code{tramp} construct. Any other way is not guaranteed to
913work on all ports.
914
915An example of a simple (recursive) tail call optimization:
916
917@example
918factorial: @rem{! Entry point of the factorial function}
919 prolog
920in = arg @rem{! Receive an integer argument}
        getarg R0, in           @rem{! Move argument to R0}
922 prepare
923 pushargi 1 @rem{! This is the accumulator}
924 pushargr R0 @rem{! This is the argument}
925 finishi fact @rem{! Call the tail call optimized function}
926 retval R0 @rem{! Fetch the result}
927 retr R0 @rem{! Return it}
928 epilog @rem{! Epilog *before* label before prolog}
929
930fact: @rem{! Entry point of the helper function}
931 prolog
932 frame 16 @rem{! Reserve 16 bytes in the stack}
933fact_entry: @rem{! This is the tail call entry point}
934ac = arg @rem{! The accumulator is the first argument}
935in = arg @rem{! The factorial argument}
936 getarg R0, ac @rem{! Move the accumulator to R0}
937 getarg R1, in @rem{! Move the argument to R1}
938 blei fact_out, R1, 1 @rem{! Done if argument is one or less}
939 mulr R0, R0, R1 @rem{! accumulator *= argument}
940 putargr R0, ac @rem{! Update the accumulator}
941 subi R1, R1, 1 @rem{! argument -= 1}
942 putargr R1, in @rem{! Update the argument}
943 jmpi fact_entry @rem{! Tail Call Optimize it!}
944fact_out:
945 retr R0 @rem{! Return the accumulator}
946@end example
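
For readers more comfortable with C, the control flow of the two
functions above can be sketched as follows. This is only a model of
what the generated code computes: the jump back to @code{fact_entry}
becomes a loop, and @code{long} is used instead of @code{int} purely
as an assumption to delay overflow. The function names are
illustrative.

```c
#include <assert.h>

/* Counterpart of `fact': the tail call is a jump, so both arguments
 * are overwritten in place and no new stack frame is created. */
long fact_acc(long ac, long in)
{
    assert(in >= 0);
    while (in > 1) {    /* leave the loop once the argument is <= 1 */
        ac *= in;       /* accumulator *= argument */
        in -= 1;        /* argument -= 1 */
    }                   /* jump back to the tail call entry point */
    return ac;          /* return the accumulator */
}

/* Counterpart of `factorial': seeds the accumulator with 1. */
long factorial(long n)
{
    return fact_acc(1, n);
}
```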
947
948@item Predicates
949@example
950forward_p (not specified) @r{forward label predicate}
951indirect_p (not specified) @r{indirect label predicate}
952target_p (not specified) @r{used label predicate}
953arg_register_p (not specified) @r{argument kind predicate}
954callee_save_p (not specified) @r{callee save predicate}
955pointer_p (not specified) @r{pointer predicate}
956@end example
957
958@code{forward_p} expects a @code{jit_node_t*} argument, and
959returns non zero if it is a forward label reference, that is,
960a label returned by @code{forward}, that still needs a
961@code{link} call.
962
963@code{indirect_p} expects a @code{jit_node_t*} argument, and returns
964non zero if it is an indirect label reference, that is, a label that
965was returned by @code{indirect}.
966
967@code{target_p} expects a @code{jit_node_t*} argument, that is any
968kind of label, and will return non zero if there is at least one
969jump or move referencing it.
970
971@code{arg_register_p} expects a @code{jit_node_t*} argument, that must
972have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and
973will return non zero if the argument lives in a register. This call
974is useful to know the live range of register arguments, as those
975are very fast to read and write, but have volatile values.
976
@code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
@code{JIT_Fn}, and will return non zero if the register is callee
save. This call is useful because on several ports, the @code{JIT_Rn}
and @code{JIT_Fn} registers are actually callee save, so there is no
need to save and load the values when making function calls.
982
983@code{pointer_p} expects a pointer argument, and will return non
984zero if the pointer is inside the generated jit code. Must be
985called after @code{jit_emit} and before @code{jit_destroy_state}.
986
987@item Atomic operations
Only compare-and-swap is implemented. It accepts four operands;
the second can be an immediate.

The first argument is set to a boolean value telling whether the
operation succeeded.

All arguments must be different: the result register cannot also be
used to pass an argument.

The second argument is the address of a machine word.

The third argument is the old value.

The fourth argument is the new value.

@example
casr            O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
casi            O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
@end example

If the value at the address in the second argument is equal to the third
argument, the value at that address is atomically replaced by the
fourth argument and the first argument is set to a non zero value.

If the value at the address in the second argument is not equal to the
third argument, nothing is done and the first argument is set to zero.
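
The semantics described above match the usual compare-and-swap
primitive. As a point of comparison only, here is how the same
operation can be expressed with C11 @code{<stdatomic.h>}; this is not
how @lightning{} implements it, and @code{cas_word} is a hypothetical
name chosen for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the value the first operand would receive: 1 if *addr was
 * equal to old_value and has been replaced by new_value, 0 otherwise. */
int cas_word(_Atomic intptr_t *addr, intptr_t old_value, intptr_t new_value)
{
    assert(addr != NULL);
    /* On failure the library call overwrites its `expected' argument;
     * old_value is a local copy, so the caller's value is untouched. */
    return atomic_compare_exchange_strong(addr, &old_value, new_value) ? 1 : 0;
}
```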
1014@end table
1015
1016@node GNU lightning examples
1017@chapter Generating code at run-time
1018
1019To use @lightning{}, you should include the @file{lightning.h} file that
1020is put in your include directory by the @samp{make install} command.
1021
1022Each of the instructions above translates to a macro or function call.
1023All you have to do is prepend @code{jit_} (lowercase) to opcode names
1024and @code{JIT_} (uppercase) to register names. Of course, parameters
1025are to be put between parentheses.
1026
This small tutorial presents four examples:
1028
1029@iftex
1030@itemize @bullet
1031@item
1032The @code{incr} function found in @ref{The instruction set, ,
1033@lightning{}'s instruction set}:
1034
1035@item
1036A simple function call to @code{printf}
1037
1038@item
1039An RPN calculator.
1040
1041@item
1042Fibonacci numbers
1043@end itemize
1044@end iftex
1045@ifnottex
1046@menu
1047* incr:: A function which increments a number by one
1048* printf:: A simple function call to printf
1049* RPN calculator:: A more complex example, an RPN calculator
1050* Fibonacci:: Calculating Fibonacci numbers
1051@end menu
1052@end ifnottex
1053
1054@node incr
1055@section A function which increments a number by one
1056
1057Let's see how to create and use the sample @code{incr} function created
1058in @ref{The instruction set, , @lightning{}'s instruction set}:
1059
1060@example
1061#include <stdio.h>
1062#include <lightning.h>
1063
1064static jit_state_t *_jit;
1065
1066typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */}
1067
1068int main(int argc, char *argv[])
1069@{
1070 jit_node_t *in;
1071 pifi incr;
1072
1073 init_jit(argv[0]);
1074 _jit = jit_new_state();
1075
1076 jit_prolog(); @rem{/* @t{ prolog } */}
1077 in = jit_arg(); @rem{/* @t{ in = arg } */}
1078 jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */}
1079 jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */}
1080 jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */}
1081
1082 incr = jit_emit();
1083 jit_clear_state();
1084
1085 @rem{/* call the generated code@comma{} passing 5 as an argument */}
1086 printf("%d + 1 = %d\n", 5, incr(5));
1087
1088 jit_destroy_state();
1089 finish_jit();
1090 return 0;
1091@}
1092@end example
1093
1094Let's examine the code line by line (well, almost@dots{}):
1095
1096@table @t
1097@item #include <lightning.h>
1098You already know about this. It defines all of @lightning{}'s macros.
1099
1100@item static jit_state_t *_jit;
You might wonder what @code{jit_state_t} is. It is a structure
that stores jit code generation information. The name @code{_jit} is
special: because multiple jit generators can run at the same
time, you must either @code{#define _jit my_jit_state} or name it
@code{_jit}.
1106
1107@item typedef int (*pifi)(int);
1108Just a handy typedef for a pointer to a function that takes an
1109@code{int} and returns another.
1110
1111@item jit_node_t *in;
1112Declares a variable to hold an identifier for a function argument. It
1113is an opaque pointer, that will hold the return of a call to @code{arg}
1114and be used as argument to @code{getarg}.
1115
1116@item pifi incr;
1117Declares a function pointer variable to a function that receives an
1118@code{int} and returns an @code{int}.
1119
1120@item init_jit(argv[0]);
1121You must call this function before creating a @code{jit_state_t}
1122object. This function does global state initialization, and may need
1123to detect CPU or Operating System features. It receives a string
1124argument that is later used to read symbols from a shared object using
1125GNU binutils if disassembly was enabled at configure time. If no
1126disassembly will be performed a NULL pointer can be used as argument.
1127
1128@item _jit = jit_new_state();
1129This call initializes a @lightning{} jit state.
1130
1131@item jit_prolog();
1132Ok, so we start generating code for our beloved function@dots{}
1133
1134@item in = jit_arg();
1135@itemx jit_getarg(JIT_R0, in);
1136We retrieve the first (and only) argument, an integer, and store it
1137into the general-purpose register @code{R0}.
1138
1139@item jit_addi(JIT_R0, JIT_R0, 1);
1140We add one to the content of the register.
1141
1142@item jit_retr(JIT_R0);
1143This instruction generates a standard function epilog that returns
1144the contents of the @code{R0} register.
1145
1146@item incr = jit_emit();
This instruction is very important. It actually translates the
@lightning{} macros used before to machine code, flushes the generated
code area out of the processor's instruction cache and returns a
pointer to the start of the code.
1151
1152@item jit_clear_state();
This call cleans up any data not required for jit execution. Note
that it must be called after any call to @code{jit_print} or
@code{jit_address}, as this call destroys the @lightning{}
intermediate representation.
1157
@item printf("%d + 1 = %d\n", 5, incr(5));
1159Calling our function is this simple---it is not distinguishable from
1160a normal C function call, the only difference being that @code{incr}
1161is a variable.
1162
1163@item jit_destroy_state();
Releases all memory associated with the jit context. It should be
called once it is known that the jit code will no longer be called.

@item finish_jit();
This call cleans up any global state held by @lightning{}; it is
advisable to call it once jit code will no longer be generated.
1170@end table
1171
1172@lightning{} abstracts two phases of dynamic code generation: selecting
1173instructions that map the standard representation, and emitting binary
1174code for these instructions. The client program has the responsibility
1175of describing the code to be generated using the standard @lightning{}
1176instruction set.
1177
Let's examine the code generated for @code{incr} on the SPARC and x86_64
architectures (on the right is the code that an assembly-language
programmer would write):
1181
1182@table @b
1183@item SPARC
@example
      save  %sp, -112, %sp
      mov   %i0, %g2                 retl
      inc   %g2                      inc %o0
      mov   %g2, %i0
      restore
      retl
      nop
@end example
1193In this case, @lightning{} introduces overhead to create a register
1194window (not knowing that the procedure is a leaf procedure) and to
1195move the argument to the general purpose register @code{R0} (which
1196maps to @code{%g2} on the SPARC).
1197@end table
1198
1199@table @b
1200@item x86_64
@example
    mov    %rdi,%rax
    add    $0x1,%rax
    ret
@end example
In this case, for the x86_64 port, @lightning{} has simple optimizations
to understand that it is a leaf function, and that it is not required to
create a stack frame or update the stack pointer.
1209@end table
1210
1211@node printf
1212@section A simple function call to @code{printf}
1213
1214Again, here is the code for the example:
1215
1216@example
1217#include <stdio.h>
1218#include <lightning.h>
1219
1220static jit_state_t *_jit;
1221
1222typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */}
1223
1224int main(int argc, char *argv[])
1225@{
1226 pvfi myFunction; @rem{/* ptr to generated code */}
1227 jit_node_t *start, *end; @rem{/* a couple of labels */}
1228 jit_node_t *in; @rem{/* to get the argument */}
1229
1230 init_jit(argv[0]);
1231 _jit = jit_new_state();
1232
1233 start = jit_note(__FILE__, __LINE__);
1234 jit_prolog();
1235 in = jit_arg();
1236 jit_getarg(JIT_R1, in);
1237 jit_prepare();
1238 jit_pushargi((jit_word_t)"generated %d bytes\n");
1239 jit_ellipsis();
1240 jit_pushargr(JIT_R1);
1241 jit_finishi(printf);
1242 jit_ret();
1243 jit_epilog();
1244 end = jit_note(__FILE__, __LINE__);
1245
1246 myFunction = jit_emit();
1247
1248 @rem{/* call the generated code@comma{} passing its size as argument */}
1249 myFunction((char*)jit_address(end) - (char*)jit_address(start));
1250 jit_clear_state();
1251
1252 jit_disassemble();
1253
1254 jit_destroy_state();
1255 finish_jit();
1256 return 0;
1257@}
1258@end example
1259
1260The function shows how many bytes were generated. Most of the code
1261is not very interesting, as it resembles very closely the program
1262presented in @ref{incr, , A function which increments a number by one}.
1263
1264For this reason, we're going to concentrate on just a few statements.
1265
1266@table @t
1267@item start = jit_note(__FILE__, __LINE__);
1268@itemx @r{@dots{}}
1269@itemx end = jit_note(__FILE__, __LINE__);
These two instructions call the @code{jit_note} macro, which creates
a note in the jit code; arguments to @code{jit_note} usually are a
filename string and line number integer, but using NULL for the
string argument is perfectly valid if you only need to create a simple
marker in the code.
1275
1276@item jit_ellipsis();
1277@code{ellipsis} usually is only required if calling varargs functions
1278with double arguments, but it is a good practice to properly describe
1279the @r{@dots{}} in the call sequence.
1280
1281@item jit_pushargi((jit_word_t)"generated %d bytes\n");
1282Note the use of the @code{(jit_word_t)} cast, that is used only
1283to avoid a compiler warning, due to using a pointer where a
1284wordsize integer type was expected.
1285
1286@item jit_prepare();
1287@itemx @r{@dots{}}
1288@itemx jit_finishi(printf);
Once the arguments to @code{printf} have been pushed, which means
moving them to stack or register arguments, the @code{printf}
function is called and the stack cleaned. Note how @lightning{}
abstracts the differences between different architectures and
ABIs: the client program does not know how parameter passing
works on the host architecture.
1295
1296@item jit_epilog();
Usually it is not required to call @code{epilog}, because it is
implicitly called when the end of a function is noticed. Here it is
called explicitly: if the @code{end} variable were set with a
@code{note} call right after the @code{ret}, the note would not
take the function epilog into account.
1301
1302@item myFunction((char*)jit_address(end) - (char*)jit_address(start));
This calls the generated jit function, passing as argument the offset
difference between the @code{start} and @code{end} notes. The
@code{address} call must be done after the @code{emit} call; otherwise
either a fatal error will happen (if @lightning{} is built with
assertions enabled) or an undefined value will be returned.
1308
1309@item jit_clear_state();
1310Note that @code{jit_clear_state} was called after executing jit in
1311this example. It was done because it must be called after any call
1312to @code{jit_address} or @code{jit_print}.
1313
1314@item jit_disassemble();
1315@code{disassemble} will dump the generated code to standard output,
1316unless @lightning{} was built with the disassembler disabled, in which
1317case no output will be shown.
1318@end table
1319
1320@node RPN calculator
1321@section A more complex example, an RPN calculator
1322
1323We create a small stack-based RPN calculator which applies a series
1324of operators to a given parameter and to other numeric operands.
1325Unlike previous examples, the code generator is fully parameterized
1326and is able to compile different formulas to different functions.
1327Here is the code for the expression compiler; a sample usage will
1328follow.
1329
Since @lightning{} does not provide push/pop instructions, this
example uses a stack-allocated area to store the data. Such an
area can be allocated using the macro @code{allocai}, which
receives the number of bytes to allocate and returns the offset
from the frame pointer register @code{FP} to the base of the
area.

Usually, you will use the @code{ldxi} and @code{stxi} instructions
to access stack-allocated variables. However, it is possible to
use operations such as @code{add} to compute the address of the
variables, and pass the address around.
1341
@example
#include <stdio.h>
#include <stdlib.h>
#include <lightning.h>

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

static jit_state_t *_jit;

@rem{/* Store reg in the next free slot of the stack area */}
void stack_push(int reg, int *sp)
@{
  jit_stxi_i (*sp, JIT_FP, reg);
  *sp += sizeof (int);
@}

@rem{/* Load reg from the most recently used slot */}
void stack_pop(int reg, int *sp)
@{
  *sp -= sizeof (int);
  jit_ldxi_i (reg, JIT_FP, *sp);
@}

jit_node_t *compile_rpn(char *expr)
@{
  jit_node_t *in, *fn;
  int stack_base, stack_ptr;

  fn = jit_note(NULL, 0);
  jit_prolog();
  in = jit_arg();
  stack_ptr = stack_base = jit_allocai (32 * sizeof (int));

  jit_getarg(JIT_R2, in);

  while (*expr) @{
    char buf[32];
    int n;
    @rem{/* The field width keeps long literals from overflowing buf */}
    if (sscanf(expr, "%31[0-9]%n", buf, &n)) @{
      expr += n - 1;
      stack_push(JIT_R0, &stack_ptr);
      jit_movi(JIT_R0, atoi(buf));
    @} else if (*expr == 'x') @{
      stack_push(JIT_R0, &stack_ptr);
      jit_movr(JIT_R0, JIT_R2);
    @} else if (*expr == '+') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_addr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '-') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_subr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '*') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_mulr(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '/') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_divr(JIT_R0, JIT_R1, JIT_R0);
    @} else @{
      fprintf(stderr, "cannot compile: %s\n", expr);
      abort();
    @}
    ++expr;
  @}
  jit_retr(JIT_R0);
  jit_epilog();
  return fn;
@}
@end example
1407
1408The principle on which the calculator is based is easy: the stack top
1409is held in R0, while the remaining items of the stack are held in the
1410memory area that we allocate with @code{allocai}. Compiling a numeric
1411operand or the argument @code{x} pushes the old stack top onto the
1412stack and moves the operand into R0; compiling an operator pops the
1413second operand off the stack into R1, and compiles the operation so
1414that the result goes into R0, thus becoming the new stack top.
1415
1416This example allocates a fixed area for 32 @code{int}s. This is not
1417a problem when the function is a leaf like in this case; in a full-blown
1418compiler you will want to analyze the input and determine the number
1419of needed stack slots---a very simple example of register allocation.
1420The area is then managed like a stack using @code{stack_push} and
1421@code{stack_pop}.
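
The analysis mentioned above can be very small for this calculator.
The following sketch, in plain C and independent of @lightning{},
counts how many @code{int} slots @code{stack_push} would occupy at
most for a given expression. @code{rpn_max_slots} is a hypothetical
name; the count includes the slot used by the very first push, which
only saves the not yet meaningful content of R0.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the largest number of slots in use at any point while
 * compiling expr, or -1 if expr contains an unknown character or an
 * operator that does not have two operands available. */
int rpn_max_slots(const char *expr)
{
    int depth = 0, max_depth = 0;

    assert(expr != NULL);
    for (; *expr; expr++) {
        if (isdigit((unsigned char)*expr)) {
            /* A run of digits is a single operand. */
            while (isdigit((unsigned char)expr[1]))
                expr++;
            depth++;
        } else if (*expr == 'x') {
            depth++;
        } else if (*expr == '+' || *expr == '-' ||
                   *expr == '*' || *expr == '/') {
            if (depth < 2)
                return -1;
            depth--;        /* two operands in, one result out */
        } else {
            return -1;
        }
        if (depth > max_depth)
            max_depth = depth;
    }
    return max_depth;
}
```

A compiler could pass the result, multiplied by @code{sizeof (int)},
to @code{allocai} instead of the fixed 32 slots.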
1422
1423Source code for the client (which lies in the same source file) follows:
1424
1425@example
1426int main(int argc, char *argv[])
1427@{
1428 jit_node_t *nc, *nf;
1429 pifi c2f, f2c;
1430 int i;
1431
1432 init_jit(argv[0]);
1433 _jit = jit_new_state();
1434
1435 nc = compile_rpn("32x9*5/+");
1436 nf = compile_rpn("x32-5*9/");
1437 (void)jit_emit();
1438 c2f = (pifi)jit_address(nc);
1439 f2c = (pifi)jit_address(nf);
1440 jit_clear_state();
1441
1442 printf("\nC:");
1443 for (i = 0; i <= 100; i += 10) printf("%3d ", i);
1444 printf("\nF:");
1445 for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
1446 printf("\n");
1447
1448 printf("\nF:");
1449 for (i = 32; i <= 212; i += 18) printf("%3d ", i);
1450 printf("\nC:");
1451 for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i));
1452 printf("\n");
1453
1454 jit_destroy_state();
1455 finish_jit();
1456 return 0;
1457@}
1458@end example
1459
1460The client displays a conversion table between Celsius and Fahrenheit
1461degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
1462formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
1463respectively.
1464
1465Providing the formula as an argument to @code{compile_rpn} effectively
1466parameterizes code generation, making it possible to use the same code
1467to compile different functions; this is what makes dynamic code
1468generation so powerful.
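
When experimenting with new formulas it helps to have a reference
result to compare the generated code against. The following is a
minimal interpreter sketch in plain C that follows the same
conventions as @code{compile_rpn}; @code{eval_rpn} is a hypothetical
name, and the expression is assumed to be well formed, to keep at most
32 operands alive, and never to divide by zero.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Evaluates expr for the argument x: decimal literals, `x' for the
 * argument, and the binary operators + - * / in postfix order. */
int eval_rpn(const char *expr, int x)
{
    int stack[32];
    int sp = 0;

    assert(expr != NULL);
    for (; *expr; expr++) {
        if (isdigit((unsigned char)*expr)) {
            int value = 0;
            for (;;) {
                value = value * 10 + (*expr - '0');
                if (!isdigit((unsigned char)expr[1]))
                    break;
                expr++;
            }
            assert(sp < 32);
            stack[sp++] = value;
        } else if (*expr == 'x') {
            assert(sp < 32);
            stack[sp++] = x;
        } else {
            int rhs, lhs;
            assert(sp >= 2);
            rhs = stack[--sp];      /* most recent operand */
            lhs = stack[--sp];      /* operand pushed before it */
            switch (*expr) {
            case '+': stack[sp++] = lhs + rhs; break;
            case '-': stack[sp++] = lhs - rhs; break;
            case '*': stack[sp++] = lhs * rhs; break;
            case '/': stack[sp++] = lhs / rhs; break;
            default:  assert(!"unknown operator"); return 0;
            }
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

Comparing @code{eval_rpn("32x9*5/+", i)} with @code{c2f(i)} for each
@code{i} in the table is a cheap sanity check of the code generator.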
1469
1470@node Fibonacci
1471@section Fibonacci numbers
1472
1473The code in this section calculates the Fibonacci sequence. That is
1474modeled by the recurrence relation:
1475@display
1476 f(0) = 0
1477 f(1) = f(2) = 1
1478 f(n) = f(n-1) + f(n-2)
1479@end display
1480
The purpose of this example is to introduce branches. There are two
kinds of branches: backward branches and forward branches. We'll
present the calculation in a recursive and iterative form; the
former only uses forward branches, while the latter uses both.
1485
@example
#include <stdio.h>
#include <lightning.h>

static jit_state_t *_jit;

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main(int argc, char *argv[])
@{
  pifi       fib;
  jit_node_t *label;
  jit_node_t *call;
  jit_node_t *in;                 @rem{/* offset of the argument */}
  jit_node_t *ref;                @rem{/* to patch the forward reference */}
  jit_node_t *zero;               @rem{/* to patch the forward reference */}

  init_jit(argv[0]);
  _jit = jit_new_state();

  label = jit_label();
          jit_prolog   ();
  in    = jit_arg      ();
          jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
  zero  = jit_beqi     (JIT_R0, 0);
          jit_movr     (JIT_V0, JIT_R0);          @rem{/* V0 = R0 */}
          jit_movi     (JIT_R0, 1);
  ref   = jit_blei     (JIT_V0, 2);
          jit_subi     (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
          jit_subi     (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
          jit_prepare();
            jit_pushargr(JIT_V1);
  call  = jit_finishi(NULL);
          jit_patch_at(call, label);
          jit_retval(JIT_V1);                     @rem{/* V1 = fib(n-1) */}
          jit_prepare();
            jit_pushargr(JIT_V2);
  call  = jit_finishi(NULL);
          jit_patch_at(call, label);
          jit_retval(JIT_R0);                     @rem{/* R0 = fib(n-2) */}
          jit_addr(JIT_R0, JIT_R0, JIT_V1);       @rem{/* R0 = R0 + V1 */}

  jit_patch(ref);                                 @rem{/* patch jump */}
  jit_patch(zero);                                @rem{/* patch jump */}
          jit_retr(JIT_R0);

  @rem{/* call the generated code@comma{} passing 32 as an argument */}
  fib = jit_emit();
  jit_clear_state();
  printf("fib(%d) = %d\n", 32, fib(32));
  jit_destroy_state();
  finish_jit();
  return 0;
@}
@end example
1541
As said above, this is the first example of dynamically compiling
branches. Branch instructions have two operands containing the
values to be compared, and return a @code{jit_node_t *} object
to be patched.

Because the final addresses of labels are only known after calling
@code{emit}, it is required to call @code{patch} or @code{patch_at},
which tells @lightning{} that the target to patch is actually a
pointer to a @code{jit_node_t *} object; otherwise, it would assume
that it is a pointer to a C function. Note that conditional branches
do not receive a label argument, so they must be patched.

You need to call @code{patch_at} on the return value of @code{calli}
or @code{finishi} if it is actually referencing a label
in the jit code. No branch instruction receives a label
argument. Note that @code{movi} is a special case, and patching it
is usually done to get the final address of a label, usually to later
call @code{jmpr}.
1561Now, here is the iterative version:
1562
@example
#include <stdio.h>
#include <lightning.h>

static jit_state_t *_jit;

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main(int argc, char *argv[])
@{
  pifi       fib;
  jit_node_t *in;                 @rem{/* offset of the argument */}
  jit_node_t *ref;                @rem{/* to patch the forward reference */}
  jit_node_t *zero;               @rem{/* to patch the forward reference */}
  jit_node_t *jump;               @rem{/* jump to start of loop */}
  jit_node_t *loop;               @rem{/* start of the loop */}

  init_jit(argv[0]);
  _jit = jit_new_state();

          jit_prolog   ();
  in    = jit_arg      ();
          jit_getarg   (JIT_R0, in);              @rem{/* R0 = n */}
  zero  = jit_beqi     (JIT_R0, 0);
          jit_movr     (JIT_R1, JIT_R0);
          jit_movi     (JIT_R0, 1);
  ref   = jit_blei     (JIT_R1, 2);               @rem{/* fib(1) = fib(2) = 1 */}
          jit_subi     (JIT_R2, JIT_R1, 2);       @rem{/* R2 = n - 2 */}
          jit_movr     (JIT_R1, JIT_R0);

  loop  = jit_label();
          jit_subi     (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
          jit_movr     (JIT_V0, JIT_R0);          @rem{/* V0 = R0 */}
          jit_addr     (JIT_R0, JIT_R0, JIT_R1);  @rem{/* R0 = R0 + R1 */}
          jit_movr     (JIT_R1, JIT_V0);          @rem{/* R1 = V0 */}
  jump  = jit_bnei     (JIT_R2, 0);               @rem{/* if (R2) goto loop; */}
          jit_patch_at(jump, loop);

  jit_patch(ref);                                 @rem{/* patch forward jump */}
  jit_patch(zero);                                @rem{/* patch forward jump */}
          jit_retr     (JIT_R0);

  @rem{/* call the generated code@comma{} passing 36 as an argument */}
  fib = jit_emit();
  jit_clear_state();
  printf("fib(%d) = %d\n", 36, fib(36));
  jit_destroy_state();
  finish_jit();
  return 0;
@}
@end example
1614
1615This code calculates the recurrence relation using iteration (a
1616@code{for} loop in high-level languages). There are no function
1617calls anymore: instead, there is a backward jump (the @code{bnei} at
1618the end of the loop).
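
To make the loop structure easier to follow, here is a plain C sketch
of the computation the iterative example is meant to perform. The
variable names follow the registers of the @lightning{} version;
@code{fib_iterative} itself is a hypothetical name, and the function
is a model of the intended control flow, not output of the library.

```c
#include <assert.h>

int fib_iterative(int n)
{
    int r0 = n, r1, r2, v0;

    assert(n >= 0);
    if (r0 == 0)            /* forward branch: fib(0) = 0 */
        return r0;
    r1 = r0;
    r0 = 1;
    if (r1 <= 2)            /* forward branch: fib(1) = fib(2) = 1 */
        return r0;
    r2 = r1 - 2;            /* number of additions still needed */
    r1 = r0;
    do {
        r2 -= 1;            /* decrement the counter */
        v0 = r0;            /* remember the current value */
        r0 = r0 + r1;       /* next Fibonacci number */
        r1 = v0;            /* previous value for the next round */
    } while (r2 != 0);      /* backward branch to the start of the loop */
    return r0;
}
```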
1619
1620Note that the program must remember the address for backward jumps;
1621for forward jumps it is only required to remember the jump code,
1622and call @code{patch} for the implicit label.
1623
1624@node Reentrancy
1625@chapter Re-entrant usage of @lightning{}
1626
@lightning{} uses the special @code{_jit} identifier. To be able
to use multiple jit generation states at the same
time, it is required to use code similar to:

@example
    jit_state_t *lightning;
    #define _jit lightning
@end example

This will cause the symbol that @code{_jit} is defined as to be passed
as the first argument to the underlying @lightning{} implementation,
which is usually a function with an @code{_} (underscore) prefix
and with an argument named @code{_jit}, in the pattern:

@example
    static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t);
    #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v)
@end example

The reason for this is to use the same syntax as the initial lightning
implementation and to avoid requiring the user to keep adding an extra
argument to every call, as multiple jit states generating code in
parallel should be very uncommon.
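
The mechanism can be tried out without @lightning{} at all. The sketch
below uses a mock state type and a mock instruction, both hypothetical,
to show how a macro that mentions @code{_jit} picks up whichever state
is visible under that name at the call site. The implementation
function is named @code{mock_addr_impl} here rather than with a leading
underscore, only to stay clear of identifiers reserved by the C
standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for jit_state_t: it only counts emitted nodes. */
typedef struct mock_state {
    int emitted;
} mock_state_t;

/* Implementation function: the state is an explicit first argument. */
void mock_addr_impl(mock_state_t *_jit, int dst, int lhs, int rhs)
{
    assert(_jit != NULL);
    (void)dst; (void)lhs; (void)rhs;
    _jit->emitted++;
}

/* User-facing macro: expands to a call that names `_jit' implicitly. */
#define mock_addr(dst, lhs, rhs) mock_addr_impl(_jit, dst, lhs, rhs)

/* Each generator receives its own state under the name `_jit', so the
 * same macro calls can feed different states. */
void generate_pair(mock_state_t *_jit)
{
    mock_addr(0, 1, 2);
    mock_addr(0, 0, 1);
}
```

Calling @code{generate_pair} with two different states updates each
state independently, which is the property needed for re-entrant use.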
1650
@node Registers
1652@chapter Accessing the whole register file
1653
As mentioned earlier in this manual, all @lightning{} back-ends are
guaranteed to have at least six general-purpose integer registers and
six floating-point registers, but many back-ends will have more.
1657
1658To access the entire register files, you can use the
1659@code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They
1660accept a parameter that identifies the register number, which
1661must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
1662and @code{JIT_F_NUM} respectively; the number need not be
1663constant. Of course, expressions like @code{JIT_R0} and
1664@code{JIT_R(0)} denote the same register, and likewise for
1665integer callee-saved, or floating-point, registers.
1666
1667@section Scratch registers
1668
For operations that @lightning{} does not support directly, like storing
a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to
acquire and release a scratch register, as in the following pattern:
1672
1673@example
1674 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1675 jit_movi (reg, immediate);
1676 jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg);
1677 jit_unget_reg (reg);
1678@end example
1679
As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and
reloads but don't follow branches, the code between the two calls must
be in the same basic block and must not contain any branches, unlike
the following (bad) example.
1684
1685@example
1686 jit_int32_t reg = jit_get_reg (jit_class_gpr);
1687 jit_ldxi (reg, JIT_V0, offset);
1688 jump = jit_bnei (reg, V0);
1689 jit_movr (JIT_V1, reg);
1690 jit_patch (jump);
1691 jit_unget_reg (reg);
1692@end example
1693
1694@node Customizations
1695@chapter Customizations
1696
1697Frequently it is desirable to have more control over how code is
1698generated or how memory is used during jit generation or execution.
1699
1700@section Memory functions
To allow complete control of memory allocation and deallocation,
@lightning{} provides wrappers that default to standard @code{malloc},
@code{realloc} and @code{free}. These are loosely based on the
GNU GMP counterparts, with the difference that they use the same
prototypes as the system allocation functions, that is, no @code{size}
for @code{free} or @code{old_size} for @code{realloc}.
1707
1708@deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *))
@lightning{} guarantees that memory is only allocated or released
using these wrapped functions, but note that if lightning
was linked to GNU binutils, @code{malloc} will probably be called multiple
times from there when initializing the disassembler.

Because @code{init_jit} may call memory functions, if you need to call
@code{jit_set_memory_functions}, it must be called before @code{init_jit};
otherwise, @code{finish_jit} will pass a pointer allocated with the
previous or default wrappers to the newly installed functions.
1718@end deftypefun
1719
1720@deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *))
Get the current memory allocation functions. Unlike the GNU GMP
counterpart, it is an error to pass @code{NULL} pointers as arguments.
1723@end deftypefun
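
As an illustration of wrappers with the required prototypes, here is a
minimal sketch of allocation functions that count live blocks, for
example to check for leaks. The names @code{counting_alloc},
@code{counting_realloc}, @code{counting_free} and @code{live_blocks} are
hypothetical; only the prototypes come from the description above. The
sketch does not link against @lightning{}, so the installation call is
shown in a comment only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t live_blocks;   /* number of blocks currently allocated */

/* Same prototype as malloc. */
static void *counting_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL)
        live_blocks++;
    return ptr;
}

/* Same prototype as realloc: no old_size argument.
   Zero-size requests are not handled specially in this sketch. */
static void *counting_realloc(void *ptr, size_t size)
{
    void *result = realloc(ptr, size);
    /* realloc(NULL, n) behaves like malloc(n), so a new block appears. */
    if (ptr == NULL && result != NULL)
        live_blocks++;
    return result;
}

/* Same prototype as free: no size argument. */
static void counting_free(void *ptr)
{
    if (ptr != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
    }
    free(ptr);
}

/* With lightning available, these would be installed before init_jit:
 *
 *   jit_set_memory_functions(counting_alloc, counting_realloc,
 *                            counting_free);
 */
```

After @code{finish_jit}, a non-zero @code{live_blocks} would then hint at
memory that was allocated through the wrappers but never released.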
1724
1725@section Protection
Unless an alternate code buffer is used (see below), @code{jit_emit}
sets the access protections so that the code buffer's memory can be read and
executed, but not modified. One can use the following functions after
@code{jit_emit} but before @code{jit_clear} to temporarily lift the
protection:
1731
1732@deftypefun void jit_unprotect ()
Changes the access protection so that the code buffer's memory can be read and
modified. Before the emitted code can be invoked, @code{jit_protect}
has to be called to reset the change.
1736
1737This procedure has no effect when an alternate code buffer (see below) is used.
1738@end deftypefun
1739
1740@deftypefun void jit_protect ()
Changes the access protection so that the code buffer's memory can be read and
executed.
1743
1744This procedure has no effect when an alternate code buffer (see below) is used.
1745@end deftypefun
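
The idea of toggling a buffer between a writable and an executable state
can be sketched with plain POSIX calls. This is not @lightning{}'s
implementation, only an illustration of the operating system mechanism
under the assumption of a POSIX system with @code{mmap} and
@code{mprotect}; @code{code_region_t}, @code{region_map},
@code{region_protect}, @code{region_unprotect} and @code{patch_byte} are
hypothetical names.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANON on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor of a code buffer; not a lightning type. */
typedef struct {
    unsigned char *base;
    size_t         size;
} code_region_t;

/* Map one page, initially readable and writable.  Returns 0 on success. */
static int region_map(code_region_t *r)
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        return -1;
    mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    r->base = mem;
    r->size = (size_t)page;
    return 0;
}

/* Analogue of jit_protect: readable and executable, not writable. */
static int region_protect(code_region_t *r)
{
    return mprotect(r->base, r->size, PROT_READ | PROT_EXEC);
}

/* Analogue of jit_unprotect: readable and writable, not executable. */
static int region_unprotect(code_region_t *r)
{
    return mprotect(r->base, r->size, PROT_READ | PROT_WRITE);
}

/* Lift the protection, change one byte, then restore the protection. */
static int patch_byte(code_region_t *r, size_t offset, unsigned char value)
{
    assert(offset < r->size);
    if (region_unprotect(r) != 0)
        return -1;
    r->base[offset] = value;
    return region_protect(r);
}
```

The shape of @code{patch_byte} mirrors the intended use of the two
procedures above: unprotect, modify, and protect again before the code
is invoked.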
1746
1747@section Alternate code buffer
1748To instruct @lightning{} to use an alternate code buffer it is required
1749to call @code{jit_realize} before @code{jit_emit}, and then query states
1750and customize as appropriate.
1751
1752@deftypefun void jit_realize ()
1753Must be called once, before @code{jit_emit}, to instruct @lightning{}
1754that no other @code{jit_xyz} call will be made.
1755@end deftypefun
1756
1757@deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size})
Returns @code{NULL} or the previous value set with @code{jit_set_code}, and
sets the @var{code_size} argument to an appropriate value.
If @code{jit_get_code} is called before @code{jit_emit}, the
@var{code_size} argument is set to the expected number of bytes
required to generate code.
If @code{jit_get_code} is called after @code{jit_emit}, the
@var{code_size} argument is set to the exact number of bytes used
by the code.
1766@end deftypefun
1767
@deftypefun void jit_set_code (jit_pointer_t @var{code}, jit_word_t @var{size})
Instructs @lightning{} to output to the @var{code} argument and
use @var{size} as a guard to not write to invalid memory. If during
@code{jit_emit} @lightning{} finds out that the code would not fit
in @var{size} bytes, it halts code emission and @code{jit_emit}
returns @code{NULL}.
1773@end deftypefun
1774
1775A simple example of a loop using an alternate buffer is:
1776
1777@example
    jit_uint8_t   *code;
    int          (*func)(int);        @rem{/* function pointer */}
    jit_word_t     code_size;
    jit_word_t     real_code_size;
    @rem{...}
    jit_realize();                    @rem{/* ready to generate code */}
    jit_get_code(&code_size);         @rem{/* get expected code size */}
    code_size = (code_size + 4095) & -4096;
    do @{
        code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
        jit_set_code(code, code_size);
        if ((func = jit_emit()) == NULL) @{
            munmap(code, code_size);
            code_size += 4096;
        @}
    @} while (func == NULL);
    jit_get_code(&real_code_size);    @rem{/* query exact size of the code */}
1796@end example
1797
1798The first call to @code{jit_get_code} should return @code{NULL} and set
1799the @code{code_size} argument to the expected amount of bytes required
1800to emit code.
1801The second call to @code{jit_get_code} is after a successful call to
1802@code{jit_emit}, and will return the value previously set with
1803@code{jit_set_code} and set the @code{real_code_size} argument to the
1804exact amount of bytes used to emit the code.
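
The size arithmetic in the example can be checked in isolation. The
sketch below assumes a 4096-byte page, as the example does, and replaces
@code{jit_emit} with a hypothetical @code{mock_emit} that succeeds only
when the buffer is large enough; @code{round_up_page} and
@code{find_buffer_size} are likewise names invented for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE 4096   /* assumed page size; must be a power of two */

/* Round n up to the next multiple of PAGE.  Equivalent to the
   (n + 4095) & -4096 expression, written for an unsigned type. */
static size_t round_up_page(size_t n)
{
    return (n + (PAGE - 1)) & ~(size_t)(PAGE - 1);
}

/* Hypothetical stand-in for jit_emit: succeeds (returns 1) only if
   the buffer can hold `needed' bytes. */
static int mock_emit(size_t buffer_size, size_t needed)
{
    return buffer_size >= needed;
}

/* Start from the rounded estimate and grow one page at a time until
   emission succeeds, as the retry loop does. */
static size_t find_buffer_size(size_t estimate, size_t needed)
{
    size_t size = round_up_page(estimate);

    assert(estimate > 0);
    while (!mock_emit(size, needed))
        size += PAGE;
    return size;
}
```

Growing by a single page per retry keeps the buffer small when the
estimate is only slightly too low; a caller expecting large misses could
double the size instead.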
1805
1806@section Alternate data buffer
Sometimes it may be desirable to customize how @lightning{} uses an
extra buffer for constants or debug annotations, or to prevent it from
using one at all, usually when also using an alternate code buffer.
1810
1811@deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size})
1812Returns @code{NULL} or the previous value set with @code{jit_set_data},
1813and sets the @var{data_size} argument to how many bytes are required
1814for the constants data buffer, and @var{note_size} to how many bytes
1815are required to store the debug note information.
Note that one debug note entry is always preallocated, even if
@code{jit_name} or @code{jit_note} are never called, but zero is
returned in the @var{data_size} argument if no constant is required;
constants are only used for the @code{float} and @code{double} operations
that have an immediate argument, and not in all @lightning{} ports.
1821@end deftypefun
1822
1823@deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags})
1824
@var{data} can be @code{NULL} if disabling constants and annotations;
otherwise, a valid pointer must be passed. An assertion checks that the
data will fit in @var{size} bytes (but that is a no-op if @lightning{}
was built with @code{-DNDEBUG}).
1829
1830@var{size} tells the space in bytes available in @var{data}.
1831
@var{flags} can be zero to just use the alternate data buffer,
or a combination of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}:
1834
1835@table @t
1836@item JIT_DISABLE_DATA
1837@cindex JIT_DISABLE_DATA
1838Instructs @lightning{} to not use a constant table, but to use an
1839alternate method to synthesize those, usually with a larger code
sequence using stack space to transfer the value from a GPR to an
FPR.
1842
1843@item JIT_DISABLE_NOTE
1844@cindex JIT_DISABLE_NOTE
1845Instructs @lightning{} to not store file or function name, and
1846line numbers in the constant buffer.
1847@end table
1848@end deftypefun
1849
A simple example of preventing the use of a data buffer is:
1851
1852@example
1853 @rem{...}
1854 jit_realize(); @rem{/* ready to generate code */}
1855 jit_get_data(NULL, NULL);
1856 jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE);
1857 @rem{...}
1858@end example
1859
1860Or to only use a data buffer, if required:
1861
1862@example
    jit_uint8_t   *data;
    jit_word_t     data_size;
    @rem{...}
    jit_realize();                    @rem{/* ready to generate code */}
    jit_get_data(&data_size, NULL);
    if (data_size)
        data = malloc(data_size);
    else
        data = NULL;
    jit_set_data(data, data_size, JIT_DISABLE_NOTE);
    @rem{...}
    if (data)
        free(data);
    @rem{...}
1877@end example
1878
1879@node Acknowledgements
1880@chapter Acknowledgements
1881
1882As far as I know, the first general-purpose portable dynamic code
1883generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting.
1884Further work by Dawson R. Engler resulted in the @sc{vcode} system;
1885unlike @sc{dcg}, @sc{vcode} used no intermediate representation and
1886directly inspired @lightning{}.
1887
Thanks go to Ian Piumarta, who kindly agreed to release his own
program @sc{ccg} under the GNU General Public License, thereby allowing
@lightning{} to use the run-time assemblers he had written for @sc{ccg}.
@sc{ccg} provides a way to dynamically assemble programs written in the
underlying architecture's assembly language. So it is not portable,
yet very interesting.
1894
1895I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{}
1896was first developed as a tool to be used in GNU Smalltalk's dynamic
1897translator from bytecodes to native code.