Commit | Line | Data |
---|---|---|
4a71579b PC |
1 | @ifnottex |
2 | @dircategory Software development | |
3 | @direntry | |
4 | * lightning: (lightning). Library for dynamic code generation. | |
5 | @end direntry | |
6 | @end ifnottex | |
7 | ||
8 | @ifnottex | |
9 | @node Top | |
10 | @top @lightning{} | |
11 | ||
12 | @iftex | |
13 | @macro comma | |
14 | @verbatim{|,|} | |
15 | @end macro | |
16 | @end iftex | |
17 | ||
18 | @ifnottex | |
19 | @macro comma | |
20 | @verb{|,|} | |
21 | @end macro | |
22 | @end ifnottex | |
23 | ||
24 | This document describes @value{TOPIC} the @lightning{} library for | |
25 | dynamic code generation. | |
26 | ||
27 | @menu | |
28 | * Overview:: What GNU lightning is | |
29 | * Installation:: Configuring and installing GNU lightning | |
30 | * The instruction set:: The RISC instruction set used in GNU lightning | |
31 | * GNU lightning examples:: GNU lightning's examples | |
32 | * Reentrancy:: Re-entrant usage of GNU lightning | |
519a9ea1 | 33 | * Registers:: Accessing the whole register file |
4a71579b PC |
34 | * Customizations:: Advanced code generation customizations |
35 | * Acknowledgements:: Acknowledgements for GNU lightning | |
36 | @end menu | |
37 | @end ifnottex | |
38 | ||
39 | @node Overview | |
40 | @chapter Introduction to @lightning{} | |
41 | ||
42 | @iftex | |
43 | This document describes @value{TOPIC} the @lightning{} library for | |
44 | dynamic code generation. | |
45 | @end iftex | |
46 | ||
519a9ea1 PC |
47 | Dynamic code generation is the generation of machine code |
48 | at runtime. It is typically used to strip a layer of interpretation | |
4a71579b PC |
49 | by allowing compilation to occur at runtime. One of the most |
50 | well-known applications of dynamic code generation is perhaps that | |
51 | of interpreters that compile source code to an intermediate bytecode | |
52 | form, which is then recompiled to machine code at run-time: this | |
53 | approach effectively combines the portability of bytecode | |
54 | representations with the speed of machine code. Another common | |
55 | application of dynamic code generation is in the field of hardware | |
56 | simulators and binary emulators, which can use the same techniques | |
519a9ea1 | 57 | to translate simulated instructions to the instructions of the |
4a71579b PC |
58 | underlying machine. |
59 | ||
60 | Yet other applications come to mind: for example, windowing | |
61 | @dfn{bitblt} operations, matrix manipulations, and network packet | |
62 | filters. Albeit very powerful and relatively well known within the | |
63 | compiler community, dynamic code generation techniques are rarely | |
64 | exploited to their full potential and, with the exception of the | |
65 | two applications described above, have remained curiosities because | |
66 | of their portability and functionality barriers: binary instructions | |
67 | are generated, so programs using dynamic code generation must be | |
68 | retargeted for each machine; in addition, coding a run-time code | |
69 | generator is a tedious and error-prone task more than a difficult one. | |
70 | ||
71 | @lightning{} provides a portable, fast and easily retargetable dynamic | |
519a9ea1 | 72 | code generation system. |
4a71579b PC |
73 | |
74 | To be portable, @lightning{} abstracts over current architectures' | |
75 | quirks and unorthogonalities. The interface that it exposes to is that | |
76 | of a standardized RISC architecture loosely based on the SPARC and MIPS | |
77 | chips. There are a few general-purpose registers (six, not including | |
78 | those used to receive and pass parameters between subroutines), and | |
79 | arithmetic operations involve three operands---either three registers | |
80 | or two registers and an arbitrarily sized immediate value. | |
81 | ||
82 | On one hand, this architecture is general enough that it is possible to | |
83 | generate pretty efficient code even on CISC architectures such as the | |
84 | Intel x86 or the Motorola 68k families. On the other hand, it matches | |
85 | real architectures closely enough that, most of the time, the | |
86 | compiler's constant folding pass ends up generating code which | |
87 | assembles machine instructions without further tests. | |
88 | ||
89 | @node Installation | |
90 | @chapter Configuring and installing @lightning{} | |
91 | ||
40a44dcb PC |
92 | Here we will assume that your system already has the dependencies |
93 | necessary to build @lightning{}. For more on dependencies, see | |
94 | @lightning{}'s @file{README-hacking} file. | |
95 | ||
96 | The first thing to do to build @lightning{} is to configure the | |
4a71579b PC |
97 | program, picking the set of macros to be used on the host |
98 | architecture; this configuration is automatically performed by | |
99 | the @file{configure} shell script; to run it, merely type: | |
100 | @example | |
101 | ./configure | |
102 | @end example | |
103 | ||
79bfeef6 PC |
104 | The @file{configure} accepts the @code{--enable-disassembler} option, |
105 | hat enables linking to GNU binutils and optionally print human readable | |
4a71579b PC |
106 | disassembly of the jit code. This option can be disabled by the |
107 | @code{--disable-disassembler} option. | |
108 | ||
79bfeef6 PC |
109 | @file{configure} also accepts the @code{--enable-devel-disassembler}, |
110 | option useful to check exactly hat machine instructions were generated | |
111 | for a @lightning{} instrction. Basically mixing @code{jit_print} and | |
112 | @code{jit_disassembly}. | |
113 | ||
114 | The @code{--enable-assertions} option, which enables several consistency | |
115 | hecks in the run-time assemblers. These are not usually needed, so you | |
116 | can decide to simply forget about it; also remember that these consistency | |
4a71579b PC |
117 | checks tend to slow down your code generator. |
118 | ||
79bfeef6 PC |
119 | The @code{--enable-devel-strong-type-checking} option that does extra type |
120 | checking using @code{assert}. This option also enables the | |
121 | @code{--enable-assertions} unless it is explicitly disabled. | |
122 | ||
123 | The option @code{--enable-devel-get-jit-size} should only be used | |
124 | when doing updates or maintenance to lightning. It regenerates the | |
125 | @code{jit_$ARCH]-sz.c} creating a table or maximum bytes usage when | |
126 | translating a @lightning{} instruction to machine code. | |
127 | ||
4a71579b PC |
128 | After you've configured @lightning{}, run @file{make} as usual. |
129 | ||
130 | @lightning{} has an extensive set of tests to validate it is working | |
131 | correctly in the build host. To test it run: | |
132 | @example | |
133 | make check | |
134 | @end example | |
135 | ||
136 | The next important step is: | |
137 | @example | |
138 | make install | |
139 | @end example | |
140 | ||
141 | This ends the process of installing @lightning{}. | |
142 | ||
143 | @node The instruction set | |
144 | @chapter @lightning{}'s instruction set | |
145 | ||
146 | @lightning{}'s instruction set was designed by deriving instructions | |
147 | that closely match those of most existing RISC architectures, or | |
148 | that can be easily syntesized if absent. Each instruction is composed | |
149 | of: | |
150 | @itemize @bullet | |
151 | @item | |
152 | an operation, like @code{sub} or @code{mul} | |
153 | ||
154 | @item | |
155 | most times, a register/immediate flag (@code{r} or @code{i}) | |
156 | ||
157 | @item | |
158 | an unsigned modifier (@code{u}), a type identifier or two, when applicable. | |
159 | @end itemize | |
160 | ||
161 | Examples of legal mnemonics are @code{addr} (integer add, with three | |
162 | register operands) and @code{muli} (integer multiply, with two | |
163 | register operands and an immediate operand). Each instruction takes | |
164 | two or three operands; in most cases, one of them can be an immediate | |
165 | value instead of a register. | |
166 | ||
167 | Most @lightning{} integer operations are signed wordsize operations, | |
168 | with the exception of operations that convert types, or load or store | |
169 | values to/from memory. When applicable, the types and C types are as | |
170 | follow: | |
171 | ||
172 | @example | |
173 | _c @r{signed char} | |
174 | _uc @r{unsigned char} | |
175 | _s @r{short} | |
176 | _us @r{unsigned short} | |
177 | _i @r{int} | |
178 | _ui @r{unsigned int} | |
179 | _l @r{long} | |
180 | _f @r{float} | |
181 | _d @r{double} | |
182 | @end example | |
183 | ||
184 | Most integer operations do not need a type modifier, and when loading or | |
185 | storing values to memory there is an alias to the proper operation | |
186 | using wordsize operands, that is, if ommited, the type is @r{int} on | |
187 | 32-bit architectures and @r{long} on 64-bit architectures. Note | |
188 | that lightning also expects @code{sizeof(void*)} to match the wordsize. | |
189 | ||
190 | When an unsigned operation result differs from the equivalent signed | |
191 | operation, there is a the @code{_u} modifier. | |
192 | ||
193 | There are at least seven integer registers, of which six are | |
194 | general-purpose, while the last is used to contain the frame pointer | |
195 | (@code{FP}). The frame pointer can be used to allocate and access local | |
196 | variables on the stack, using the @code{allocai} or @code{allocar} | |
197 | instruction. | |
198 | ||
199 | Of the general-purpose registers, at least three are guaranteed to be | |
200 | preserved across function calls (@code{V0}, @code{V1} and | |
201 | @code{V2}) and at least three are not (@code{R0}, @code{R1} and | |
202 | @code{R2}). Six registers are not very much, but this | |
203 | restriction was forced by the need to target CISC architectures | |
204 | which, like the x86, are poor of registers; anyway, backends can | |
205 | specify the actual number of available registers with the calls | |
206 | @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM} | |
207 | (for callee-save registers). | |
208 | ||
209 | There are at least six floating-point registers, named @code{F0} to | |
210 | @code{F5}. These are usually caller-save and are separate from the integer | |
211 | registers on the supported architectures; on Intel architectures, | |
212 | in 32 bit mode if SSE2 is not available or use of X87 is forced, | |
213 | the register stack is mapped to a flat register file. As for the | |
214 | integer registers, the macro @code{JIT_F_NUM} yields the number of | |
215 | floating-point registers. | |
216 | ||
217 | The complete instruction set follows; as you can see, most non-memory | |
218 | operations only take integers (either signed or unsigned) as operands; | |
219 | this was done in order to reduce the instruction set, and because most | |
220 | architectures only provide word and long word operations on registers. | |
221 | There are instructions that allow operands to be extended to fit a larger | |
222 | data type, both in a signed and in an unsigned way. | |
223 | ||
224 | @table @b | |
225 | @item Binary ALU operations | |
226 | These accept three operands; the last one can be an immediate. | |
227 | @code{addx} operations must directly follow @code{addc}, and | |
228 | @code{subx} must follow @code{subc}; otherwise, results are undefined. | |
229 | Most, if not all, architectures do not support @r{float} or @r{double} | |
230 | immediate operands; lightning emulates those operations by moving the | |
231 | immediate to a temporary register and emiting the call with only | |
232 | register operands. | |
233 | @example | |
234 | addr _f _d O1 = O2 + O3 | |
235 | addi _f _d O1 = O2 + O3 | |
236 | addxr O1 = O2 + (O3 + carry) | |
237 | addxi O1 = O2 + (O3 + carry) | |
238 | addcr O1 = O2 + O3, set carry | |
239 | addci O1 = O2 + O3, set carry | |
240 | subr _f _d O1 = O2 - O3 | |
241 | subi _f _d O1 = O2 - O3 | |
242 | subxr O1 = O2 - (O3 + carry) | |
243 | subxi O1 = O2 - (O3 + carry) | |
244 | subcr O1 = O2 - O3, set carry | |
245 | subci O1 = O2 - O3, set carry | |
246 | rsbr _f _d O1 = O3 - O1 | |
247 | rsbi _f _d O1 = O3 - O1 | |
248 | mulr _f _d O1 = O2 * O3 | |
249 | muli _f _d O1 = O2 * O3 | |
ba86ff93 PC |
250 | hmulr _u O1 = ((O2 * O3) >> WORDSIZE) |
251 | hmuli _u O1 = ((O2 * O3) >> WORDSIZE) | |
4a71579b PC |
252 | divr _u _f _d O1 = O2 / O3 |
253 | divi _u _f _d O1 = O2 / O3 | |
254 | remr _u O1 = O2 % O3 | |
255 | remi _u O1 = O2 % O3 | |
256 | andr O1 = O2 & O3 | |
257 | andi O1 = O2 & O3 | |
258 | orr O1 = O2 | O3 | |
259 | ori O1 = O2 | O3 | |
260 | xorr O1 = O2 ^ O3 | |
261 | xori O1 = O2 ^ O3 | |
262 | lshr O1 = O2 << O3 | |
263 | lshi O1 = O2 << O3 | |
264 | rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.} | |
265 | rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.} | |
ba86ff93 PC |
266 | lrotr O1 = (O2 << O3) | (O3 >> (WORDSIZE - O3)) |
267 | lroti O1 = (O2 << O3) | (O3 >> (WORDSIZE - O3)) | |
268 | rrotr O1 = (O2 >> O3) | (O3 << (WORDSIZE - O3)) | |
269 | rroti O1 = (O2 >> O3) | (O3 << (WORDSIZE - O3)) | |
1f22b268 PC |
270 | movzr O1 = O3 ? O1 : O2 |
271 | movnr O1 = O3 ? O2 : O1 | |
4a71579b PC |
272 | @end example |
273 | ||
ba86ff93 PC |
274 | Note that @code{lrotr}, @code{lroti}, @code{rrotr} and @code{rroti} |
275 | are described as the fallback operation. These are bit shift/rotation | |
276 | operation. | |
277 | ||
4a71579b PC |
278 | @item Four operand binary ALU operations |
279 | These accept two result registers, and two operands; the last one can | |
280 | be an immediate. The first two arguments cannot be the same register. | |
281 | ||
282 | @code{qmul} stores the low word of the result in @code{O1} and the | |
283 | high word in @code{O2}. For unsigned multiplication, @code{O2} zero | |
284 | means there was no overflow. For signed multiplication, no overflow | |
285 | check is based on sign, and can be detected if @code{O2} is zero or | |
286 | minus one. | |
287 | ||
288 | @code{qdiv} stores the quotient in @code{O1} and the remainder in | |
289 | @code{O2}. It can be used as quick way to check if a division is | |
290 | exact, in which case the remainder is zero. | |
291 | ||
ba86ff93 PC |
292 | @code{qlsh} shifts from 0 to @emph{wordsize}, doing a normal left |
293 | shift for the first result register and setting the second result | |
294 | resister to the overflow bits. @code{qlsh} can be used as a quick | |
295 | way to multiply by powers of two. | |
296 | ||
297 | @code{qrsh} shifts from 0 to @emph{wordsize}, doing a normal right | |
298 | shift for the first result register and setting the second result | |
299 | register to the overflow bits. @code{qrsh} can be used as a quick | |
300 | way to divide by powers of two. | |
301 | ||
302 | Note that @code{qlsh} and @code{qrsh} are basically implemented as | |
303 | two shifts. It is undefined behavior to pass a value not in the range | |
304 | 0 to @emph{wordsize}. Most cpus will usually @code{and} the shift | |
305 | amount with @emph{wordsize} - 1, or possible use the @emph{remainder}. | |
306 | @lightning{} only generates code to specially handle 0 and @emph{wordsize} | |
307 | shifts. Since in a code generator for a @emph{safe language} should | |
308 | usually check the shift amount, these instructions usually should be | |
309 | used as a fast path to check for division without remainder or | |
310 | multiplication that does not overflow. | |
311 | ||
4a71579b PC |
312 | @example |
313 | qmulr _u O1 O2 = O3 * O4 | |
314 | qmuli _u O1 O2 = O3 * O4 | |
315 | qdivr _u O1 O2 = O3 / O4 | |
316 | qdivi _u O1 O2 = O3 / O4 | |
ba86ff93 PC |
317 | qlshr _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4) |
318 | qlshi _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4) | |
319 | qrshr _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4) | |
320 | qrshi _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4) | |
4a71579b PC |
321 | @end example |
322 | ||
ba86ff93 PC |
323 | These four operand ALU operations are only defined for float operands. |
324 | ||
325 | @example | |
326 | fmar _f _d O1 = O2 * O3 + O4 | |
327 | fmai _f _d O1 = O2 * O3 + O4 | |
328 | fmsr _f _d O1 = O2 * O3 - O4 | |
329 | fmsi _f _d O1 = O2 * O3 - O4 | |
330 | fnmar _f _d O1 = -O2 * O3 - O4 | |
331 | fnmai _f _d O1 = -O2 * O3 - O4 | |
332 | fnmsr _f _d O1 = -O2 * O3 + O4 | |
333 | fnmsi _f _d O1 = -O2 * O3 + O4 | |
334 | @end example | |
335 | ||
336 | These are a family of fused multiply-add instructions. | |
337 | Note that @lightning{} does not handle rounding modes nor math exceptions. | |
338 | Also note that not all backends provide a instruction for the equivalent | |
339 | @lightning{} instruction presented above. Some are completely implemented | |
340 | as fallbacks and some are composed of one or more instructions. For common | |
341 | input this should not cause major issues, but note that when implemented by | |
342 | the cpu, these are implemented as the multiplication calculated with infinite | |
343 | precision, and after the addition step rounding is done. Due to this, For | |
344 | specially crafted input different ports might show different output. When | |
345 | implemented by the CPU, it is also possible to have exceptions that do | |
346 | not happen if implemented as a fallback. | |
347 | ||
4a71579b | 348 | @item Unary ALU operations |
ba86ff93 PC |
349 | These accept two operands, the first must be a register and the |
350 | second is a register if the @code{r} modifier is used, otherwise, | |
351 | the @code{i} modifier is used and the second argument is a constant. | |
352 | ||
4a71579b PC |
353 | @example |
354 | negr _f _d O1 = -O2 | |
ba86ff93 | 355 | negi _f _d O1 = -O2 |
4a71579b | 356 | comr O1 = ~O2 |
ba86ff93 PC |
357 | comi O1 = ~O2 |
358 | clor O1 = number of leading one bits in O2 | |
359 | cloi O1 = number of leading one bits in O2 | |
360 | clzr O1 = number of leading zero bits in O2 | |
361 | clzi O1 = number of leading zero bits in O2 | |
362 | ctor O1 = number of trailing one bits in O2 | |
363 | ctoi O1 = number of trailing one bits in O2 | |
364 | ctzr O1 = number of trailing zero bits in O2 | |
365 | ctzi O1 = number of trailing zero bits in O2 | |
366 | rbitr O1 = bits of O2 reversed | |
367 | rbiti O1 = bits of O2 reversed | |
368 | popcntr O1 = number of bits set in O2 | |
369 | popcnti O1 = number of bits set in O2 | |
4a71579b PC |
370 | @end example |
371 | ||
79bfeef6 PC |
372 | Note that @code{ctzr} is basically equivalent of a @code{C} call |
373 | @code{ffs} but indexed at bit zero, not one. | |
374 | ||
375 | Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input | |
376 | value of zero is not an error, it just returns the number of bits | |
377 | in a word, 64 if @lightning{} generates 64 bit instructions, otherwise | |
378 | it returns 32. | |
379 | ||
380 | The @code{clor} and @code{ctor} are just counterparts of the versions | |
381 | that search for zero bits. | |
382 | ||
4a71579b | 383 | These unary ALU operations are only defined for float operands. |
ba86ff93 | 384 | |
4a71579b PC |
385 | @example |
386 | absr _f _d O1 = fabs(O2) | |
ba86ff93 | 387 | absi _f _d O1 = fabs(O2) |
79bfeef6 | 388 | sqrtr _f _d O1 = sqrt(O2) |
ba86ff93 | 389 | sqrti _f _d O1 = sqrt(O2) |
4a71579b PC |
390 | @end example |
391 | ||
ba86ff93 PC |
392 | Note that for @code{float} and @code{double} unary operations, @lightning{} |
393 | will generate code to actually execute the operation at runtime. | |
4a71579b PC |
394 | |
395 | @item Compare instructions | |
396 | These accept three operands; again, the last can be an immediate. | |
397 | The last two operands are compared, and the first operand, that must be | |
398 | an integer register, is set to either 0 or 1, according to whether the | |
399 | given condition was met or not. | |
400 | ||
401 | The conditions given below are for the standard behavior of C, | |
402 | where the ``unordered'' comparison result is mapped to false. | |
403 | ||
404 | @example | |
405 | ltr _u _f _d O1 = (O2 < O3) | |
406 | lti _u _f _d O1 = (O2 < O3) | |
407 | ler _u _f _d O1 = (O2 <= O3) | |
408 | lei _u _f _d O1 = (O2 <= O3) | |
409 | gtr _u _f _d O1 = (O2 > O3) | |
410 | gti _u _f _d O1 = (O2 > O3) | |
411 | ger _u _f _d O1 = (O2 >= O3) | |
412 | gei _u _f _d O1 = (O2 >= O3) | |
413 | eqr _f _d O1 = (O2 == O3) | |
414 | eqi _f _d O1 = (O2 == O3) | |
415 | ner _f _d O1 = (O2 != O3) | |
416 | nei _f _d O1 = (O2 != O3) | |
417 | unltr _f _d O1 = !(O2 >= O3) | |
418 | unler _f _d O1 = !(O2 > O3) | |
419 | ungtr _f _d O1 = !(O2 <= O3) | |
420 | unger _f _d O1 = !(O2 < O3) | |
421 | uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3) | |
422 | ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3) | |
423 | ordr _f _d O1 = (O2 == O2) && (O3 == O3) | |
424 | unordr _f _d O1 = (O2 != O2) || (O3 != O3) | |
425 | @end example | |
426 | ||
427 | @item Transfer operations | |
428 | These accept two operands; for @code{ext} both of them must be | |
429 | registers, while @code{mov} accepts an immediate value as the second | |
430 | operand. | |
431 | ||
432 | Unlike @code{movr} and @code{movi}, the other instructions are used | |
433 | to truncate a wordsize operand to a smaller integer data type or to | |
434 | convert float data types. You can also use @code{extr} to convert an | |
435 | integer to a floating point value: the usual options are @code{extr_f} | |
436 | and @code{extr_d}. | |
437 | ||
438 | @example | |
439 | movr _f _d O1 = O2 | |
440 | movi _f _d O1 = O2 | |
441 | extr _c _uc _s _us _i _ui _f _d O1 = O2 | |
442 | truncr _f _d O1 = trunc(O2) | |
ba86ff93 PC |
443 | extr O1 = sign_extend(O2[O3:O3+04]) |
444 | extr_u O1 = O2[O3:O3+04] | |
445 | depr O1[O3:O3+O4] = O2 | |
446 | @end example | |
447 | ||
448 | @code{extr}, @code{extr_u} and @code{depr} are useful to access @code{C} | |
449 | compatible bit fields, provided that these are contained in a machine | |
450 | word. @code{extr} is used to @emph{extract} and signed extend a value | |
451 | from a bit field. @code{extr_u} is used to @emph{extract} and zero | |
452 | extend a value from a bit field. @code{depr} is used to @emph{deposit} | |
453 | a value into a bit field. | |
454 | ||
455 | @example | |
456 | extr(result, source, offset, length) | |
457 | extr_u(result, source, offset, length) | |
458 | depr(result, source, offset, length) | |
459 | @end example | |
460 | ||
461 | A common way to declare @code{C} and @lightning{} compatible bit fields is: | |
462 | @example | |
463 | union @{ | |
464 | struct @{ | |
465 | jit_word_t signed_bits: @code{length}; | |
466 | jit_uword_t unsigned_bits: @code{length}; | |
467 | ... | |
468 | @} s; | |
469 | jit_word_t signed_value; | |
470 | jit_uword_t unsigned_value; | |
471 | @} u; | |
4a71579b PC |
472 | @end example |
473 | ||
474 | In 64-bit architectures it may be required to use @code{truncr_f_i}, | |
475 | @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match | |
476 | the equivalent C code. Only the @code{_i} modifier is available in | |
477 | 32-bit architectures. | |
478 | ||
479 | @example | |
ba86ff93 PC |
480 | truncr_f_i <int> O1 = <float> O2 |
481 | truncr_f_l <long>O1 = <float> O2 | |
482 | truncr_d_i <int> O1 = <double>O2 | |
483 | truncr_d_l <long>O1 = <double>O2 | |
4a71579b PC |
484 | @end example |
485 | ||
486 | The float conversion operations are @emph{destination first, | |
487 | source second}, but the order of the types is reversed. This happens | |
488 | for historical reasons. | |
489 | ||
490 | @example | |
ba86ff93 PC |
491 | extr_f_d <double>O1 = <float> O2 |
492 | extr_d_f <float> O1 = <double>O2 | |
493 | @end example | |
494 | ||
495 | The float to/from integer transfer operations are also @emph{destination | |
496 | first, source second}. These were added later, but follow the pattern | |
497 | of historic patterns. | |
498 | ||
499 | @example | |
500 | movr_w_f <float>O1 = <int>O2 | |
501 | movi_w_f <float>O1 = <int>O2 | |
502 | movr_f_w <int>O1 = <float>O2 | |
503 | movi_f_w <int>O1 = <float>O2 | |
504 | movr_w_d <double>O1 = <long>O2 | |
505 | movi_w_d <double>O1 = <long>O2 | |
506 | movr_d_w <long>O1 = <double>O2 | |
507 | movi_d_w <long>O1 = <double>O2 | |
508 | movr_ww_d <double>O1 = [<int>O2:<int>O3] | |
509 | movi_ww_d <double>O1 = [<int>O2:<int>O3] | |
510 | movr_d_ww [<int>O1:<int>O2] = <double>O3 | |
511 | movi_d_ww [<int>O1:<int>O2] = <double>O3 | |
4a71579b PC |
512 | @end example |
513 | ||
ba86ff93 PC |
514 | These are used to transfer bits to/from floats to/from integers, and are |
515 | useful to access bits of floating point values. | |
516 | ||
517 | @code{movr_w_d}, @code{movi_w_d}, @code{movr_d_w} and @code{movi_d_w} are | |
518 | only available in 64-bit. Conversely, @code{movr_ww_d}, @code{movi_ww_d}, | |
519 | @code{movr_d_ww} and @code{movi_d_ww} are only available in 32-bit. | |
520 | For the int pair to/from double transfers, integer arguments must respect | |
521 | endianess, to match how the cpu handles the verbatim byte values. | |
522 | ||
4a71579b PC |
523 | @item Network extensions |
524 | These accept two operands, both of which must be registers; these | |
525 | two instructions actually perform the same task, yet they are | |
526 | assigned to two mnemonics for the sake of convenience and | |
527 | completeness. As usual, the first operand is the destination and | |
528 | the second is the source. | |
529 | The @code{_ul} variant is only available in 64-bit architectures. | |
530 | @example | |
531 | htonr _us _ui _ul @r{Host-to-network (big endian) order} | |
532 | ntohr _us _ui _ul @r{Network-to-host order } | |
533 | @end example | |
534 | ||
40a44dcb PC |
535 | @code{bswapr} can be used to unconditionally byte-swap an operand. |
536 | On little-endian architectures, @code{htonr} and @code{ntohr} resolve | |
537 | to this. | |
538 | The @code{_ul} variant is only available in 64-bit architectures. | |
539 | @example | |
540 | bswapr _us _ui _ul 01 = byte_swap(02) | |
541 | @end example | |
542 | ||
4a71579b PC |
543 | @item Load operations |
544 | @code{ld} accepts two operands while @code{ldx} accepts three; | |
545 | in both cases, the last can be either a register or an immediate | |
546 | value. Values are extended (with or without sign, according to | |
547 | the data type specification) to fit a whole register. | |
548 | The @code{_ui} and @code{_l} types are only available in 64-bit | |
549 | architectures. For convenience, there is a version without a | |
550 | type modifier for integer or pointer operands that uses the | |
551 | appropriate wordsize call. | |
552 | @example | |
553 | ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2 | |
554 | ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2 | |
555 | ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3) | |
556 | ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3) | |
557 | @end example | |
558 | ||
559 | @item Store operations | |
560 | @code{st} accepts two operands while @code{stx} accepts three; in | |
561 | both cases, the first can be either a register or an immediate | |
562 | value. Values are sign-extended to fit a whole register. | |
563 | @example | |
79bfeef6 PC |
564 | str _c _s _i _l _f _d *O1 = O2 |
565 | sti _c _s _i _l _f _d *O1 = O2 | |
566 | stxr _c _s _i _l _f _d *(O1+O2) = O3 | |
567 | stxi _c _s _i _l _f _d *(O1+O2) = O3 | |
4a71579b | 568 | @end example |
79bfeef6 PC |
569 | Note that the unsigned type modifier is not available, as the store |
570 | only writes to the 1, 2, 4 or 8 sized memory address. | |
571 | The @code{_l} type is only available in 64-bit architectures, and for | |
572 | convenience, there is a version without a type modifier for integer or | |
573 | pointer operands that uses the appropriate wordsize call. | |
4a71579b | 574 | |
ba86ff93 PC |
575 | @item Unaligned memory access |
576 | These allow access to integers of size 3, in 32-bit, and extra sizes | |
577 | 5, 6 and 7 in 64-bit. | |
578 | For floating point values only support for size 4 and 8 is provided. | |
579 | @example | |
580 | unldr O1 = *(signed O3 byte integer)* = O2 | |
581 | unldi O1 = *(signed O3 byte integer)* = O2 | |
582 | unldr_u O1 = *(unsigned O3 byte integer)* = O2 | |
583 | unldi_u O1 = *(unsigned O3 byte integer)* = O2 | |
584 | unldr_x O1 = *(O3 byte float)* = O2 | |
585 | unldi_x O1 = *(O3 byte float)* = O2 | |
586 | unstr *(O3 byte integer)O1 = O2 | |
587 | unsti *(O3 byte integer)O1 = O2 | |
588 | unstr_x *(O3 byte float)O1 = O2 | |
589 | unsti_x *(O3 byte float)O1 = O2 | |
590 | @end example | |
591 | With the exception of non standard sized integers, these might be | |
592 | implemented as normal loads and stores, if the processor supports | |
593 | unaligned memory access, or, mode can be chosen at jit initialization | |
594 | time, to generate or not generate, code that does trap on unaligned | |
595 | memory access. Letting the kernel trap means smaller code generation | |
596 | as it is required to check alignment at runtime@footnote{This requires changing jit_cpu.unaligned to 0 to disable or 1 to enable unaligned code generation. Not all ports have the C jit_cpu.unaligned value.}. | |
597 | ||
4a71579b PC |
598 | @item Argument management |
599 | These are: | |
600 | @example | |
601 | prepare (not specified) | |
602 | va_start (not specified) | |
79bfeef6 PC |
603 | pushargr _c _uc _s _us _i _ui _l _f _d |
604 | pushargi _c _uc _s _us _i _ui _l _f _d | |
4a71579b | 605 | va_push (not specified) |
79bfeef6 | 606 | arg _c _uc _s _us _i _ui _l _f _d |
4a71579b PC |
607 | getarg _c _uc _s _us _i _ui _l _f _d |
608 | va_arg _d | |
79bfeef6 PC |
609 | putargr _c _uc _s _us _i _ui _l _f _d |
610 | putargi _c _uc _s _us _i _ui _l _f _d | |
4a71579b | 611 | ret (not specified) |
79bfeef6 PC |
612 | retr _c _uc _s _us _i _ui _l _f _d |
613 | reti _c _uc _s _us _i _ui _l _f _d | |
4a71579b PC |
614 | reti _f _d |
615 | va_end (not specified) | |
616 | retval _c _uc _s _us _i _ui _l _f _d | |
617 | epilog (not specified) | |
618 | @end example | |
619 | As with other operations that use a type modifier, the @code{_ui} and | |
620 | @code{_l} types are only available in 64-bit architectures, but there | |
621 | are operations without a type modifier that alias to the appropriate | |
622 | integer operation with wordsize operands. | |
623 | ||
624 | @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller, | |
625 | while @code{arg}, @code{getarg} and @code{ret} are used by the callee. | |
626 | A code snippet that wants to call another procedure and has to pass | |
627 | arguments must, in order: use the @code{prepare} instruction and use | |
628 | the @code{pushargr} or @code{pushargi} to push the arguments @strong{in | |
629 | left to right order}; and use @code{finish} or @code{call} (explained below) | |
630 | to perform the actual call. | |
631 | ||
79bfeef6 PC |
632 | Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} when |
633 | handling integer types can be used without a type modifier. | |
634 | It is suggested to use matching type modifiers to @code{arg}, @code{putarg} | |
635 | and @code{getarg} otherwise problems will happen if generating jit for | |
636 | environments that require arguments to be truncated and zero or sign | |
637 | extended by the caller and/or excess arguments might be passed packed | |
638 | in the stack. Currently only Apple systems with @code{aarch64} cpus are | |
639 | known to have this restriction. | |
640 | ||
4a71579b PC |
641 | @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch |
642 | arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles. | |
643 | @code{va_push} is required when passing a @code{va_list} to another function, | |
644 | because not all architectures expect it as a single pointer. Known case | |
645 | is DEC Alpha, that requires it as a structure passed by value. | |
646 | ||
647 | @code{arg}, @code{getarg} and @code{putarg} are used by the callee. | |
648 | @code{arg} is different from other instruction in that it does not | |
649 | actually generate any code: instead, it is a function which returns | |
650 | a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return | |
651 | a value'' means that @lightning{} code that compile these | |
652 | instructions return a value when expanded.} You should call | |
653 | @code{arg} as soon as possible, before any function call or, more | |
654 | easily, right after the @code{prolog} instructions | |
655 | (which is treated later). | |
656 | ||
657 | @code{getarg} accepts a register argument and a value returned by | |
658 | @code{arg}, and will move that argument to the register, extending | |
659 | it (with or without sign, according to the data type specification) | |
660 | to fit a whole register. These instructions are more intimately | |
661 | related to the usage of the @lightning{} instruction set in code | |
662 | that generates other code, so they will be treated more | |
663 | specifically in @ref{GNU lightning examples, , Generating code at | |
664 | run-time}. | |
665 | ||
666 | @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that | |
667 | it accepts as first argument a register or immediate, and as | |
668 | second argument a value returned by @code{arg}. It allows changing, | |
669 | or restoring an argument to the current function, and is a | |
670 | construct required to implement tail call optimization. Note that | |
671 | arguments in registers are very cheap, but will be overwritten | |
672 | at any moment, including on some operations, for example division, | |
673 | that on several ports is implemented as a function call. | |
674 | ||
675 | Finally, the @code{retval} instruction fetches the return value of a | |
676 | called function in a register. The @code{retval} instruction takes a | |
677 | register argument and copies the return value of the previously called | |
678 | function in that register. A function with a return value should use | |
679 | @code{retr} or @code{reti} to put the return value in the return register | |
680 | before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example. | |
681 | ||
682 | @code{epilog} is an optional call, that marks the end of a function | |
683 | body. It is automatically generated by @lightning{} if starting a new | |
684 | function (what should be done after a @code{ret} call) or finishing | |
685 | generating jit. | |
686 | It is very important to note that the fact that @code{epilog} being | |
687 | optional may cause a common mistake. Consider this: | |
688 | @example | |
689 | fun1: | |
690 | prolog | |
691 | ... | |
692 | ret | |
693 | fun2: | |
694 | prolog | |
695 | @end example | |
696 | Because @code{epilog} is added when finding a new @code{prolog}, | |
697 | this will cause the @code{fun2} label to actually be before the | |
698 | return from @code{fun1}. Because @lightning{} will actually | |
699 | understand it as: | |
700 | @example | |
701 | fun1: | |
702 | prolog | |
703 | ... | |
704 | ret | |
705 | fun2: | |
706 | epilog | |
707 | prolog | |
708 | @end example | |
709 | ||
710 | You should observe a few rules when using these macros. First of | |
711 | all, if calling a varargs function, you should use the @code{ellipsis} | |
712 | call to mark the position of the ellipsis in the C prototype. | |
713 | ||
714 | You should not nest calls to @code{prepare} inside a | |
715 | @code{prepare/finish} block. Doing this will result in undefined | |
716 | behavior. Note that for functions with zero arguments you can use | |
717 | just @code{call}. | |
718 | ||
719 | @item Branch instructions | |
720 | Like @code{arg}, these also return a value which, in this case, | |
721 | is to be used to compile forward branches as explained in | |
722 | @ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be | |
723 | compared; of these, the last can be either a register or an immediate. | |
724 | They are: | |
725 | @example | |
726 | bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1 | |
727 | blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1 | |
728 | bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1 | |
729 | blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1 | |
730 | bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1 | |
731 | bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1 | |
732 | bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1 | |
733 | bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1 | |
734 | beqr _f _d @r{if }(O2 == O3)@r{ goto }O1 | |
735 | beqi _f _d @r{if }(O2 == O3)@r{ goto }O1 | |
736 | bner _f _d @r{if }(O2 != O3)@r{ goto }O1 | |
737 | bnei _f _d @r{if }(O2 != O3)@r{ goto }O1 | |
738 | ||
739 | bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1 | |
740 | bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1 | |
741 | bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1 | |
742 | bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1 | |
743 | buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1 | |
744 | bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1 | |
745 | bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1 | |
746 | bunordr _f _d @r{if }!(O2 != O2) || (O3 != O3)@r{ goto }O1 | |
747 | ||
748 | bmsr @r{if }O2 & O3@r{ goto }O1 | |
749 | bmsi @r{if }O2 & O3@r{ goto }O1 | |
750 | bmcr @r{if }!(O2 & O3)@r{ goto }O1 | |
751 | bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.} | |
752 | boaddr _u O2 += O3@r{, goto }O1@r{ if overflow} | |
753 | boaddi _u O2 += O3@r{, goto }O1@r{ if overflow} | |
754 | bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow} | |
755 | bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow} | |
756 | bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow} | |
757 | bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow} | |
758 | bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow} | |
759 | bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow} | |
760 | @end example | |
761 | ||
79bfeef6 PC |
762 | Note that the @code{C} code does not have an @code{O1} argument. It is |
763 | required to always use the return value as an argument to @code{patch}, | |
764 | @code{patch_at} or @code{patch_abs}. | |
765 | ||
4a71579b PC |
766 | @item Jump and return operations |
767 | These accept one argument except @code{ret} and @code{jmpi} which | |
768 | have none; the difference between @code{finishi} and @code{calli} | |
769 | is that the latter does not clean the stack from pushed parameters | |
770 | (if any) and the former must @strong{always} follow a @code{prepare} | |
771 | instruction. | |
772 | @example | |
773 | callr (not specified) @r{function call to register O1} | |
774 | calli (not specified) @r{function call to immediate O1} | |
775 | finishr (not specified) @r{function call to register O1} | |
776 | finishi (not specified) @r{function call to immediate O1} | |
777 | jmpr (not specified) @r{unconditional jump to register} | |
778 | jmpi (not specified) @r{unconditional jump} | |
779 | ret (not specified) @r{return from subroutine} | |
780 | retr _c _uc _s _us _i _ui _l _f _d | |
781 | reti _c _uc _s _us _i _ui _l _f _d | |
782 | retval _c _uc _s _us _i _ui _l _f _d @r{move return value} | |
783 | @r{to register} | |
784 | @end example | |
785 | ||
786 | Like branch instruction, @code{jmpi} also returns a value which is to | |
787 | be used to compile forward branches. @xref{Fibonacci, , Fibonacci | |
788 | numbers}. | |
789 | ||
790 | @item Labels | |
791 | There are 3 @lightning{} instructions to create labels: | |
792 | @example | |
793 | label (not specified) @r{simple label} | |
794 | forward (not specified) @r{forward label} | |
795 | indirect (not specified) @r{special simple label} | |
796 | @end example | |
797 | ||
ba3814c1 PC |
798 | The following instruction is used to specify a minimal alignment for |
799 | the next instruction, usually with a label: | |
800 | @example | |
801 | align (not specified) @r{align code} | |
802 | @end example | |
803 | ||
79bfeef6 PC |
804 | Similar to @code{align} is the next instruction, also usually used with |
805 | a label: | |
806 | @example | |
807 | skip (not specified) @r{skip code} | |
808 | @end example | |
809 | It is used to specify a minimal number of bytes of nops to be inserted | |
810 | before the next instruction. | |
811 | ||
4a71579b PC |
812 | @code{label} is normally used as @code{patch_at} argument for backward |
813 | jumps. | |
814 | ||
815 | @example | |
816 | jit_node_t *jump, *label; | |
817 | label = jit_label(); | |
818 | ... | |
819 | jump = jit_beqr(JIT_R0, JIT_R1); | |
820 | jit_patch_at(jump, label); | |
821 | @end example | |
822 | ||
823 | @code{forward} is used to patch code generation before the actual | |
824 | position of the label is known. | |
825 | ||
826 | @example | |
827 | jit_node_t *jump, *label; | |
828 | label = jit_forward(); | |
829 | jump = jit_beqr(JIT_R0, JIT_R1); | |
830 | jit_patch_at(jump, label); | |
831 | ... | |
832 | jit_link(label); | |
833 | @end example | |
834 | ||
835 | @code{indirect} is useful when creating jump tables, and tells | |
836 | @lightning{} to not optimize out a label that is not the target of | |
837 | any jump, because an indirect jump may land where it is defined. | |
838 | ||
839 | @example | |
840 | jit_node_t *jump, *label; | |
841 | ... | |
842 | jmpr(JIT_R0); @rem{/* may jump to label */} | |
843 | ... | |
844 | label = jit_indirect(); | |
845 | @end example | |
846 | ||
847 | @code{indirect} is an special case of @code{note} and @code{name} | |
848 | because it is a valid argument to @code{address}. | |
849 | ||
850 | Note that the usual idiom to write the previous example is | |
851 | @example | |
852 | jit_node_t *addr, *jump; | |
853 | addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */} | |
854 | ... | |
855 | jmpr(JIT_R0); | |
856 | ... | |
857 | jit_patch(addr); @rem{/* implicit label added */} | |
858 | @end example | |
859 | ||
860 | that automatically binds the implicit label added by @code{patch} with | |
861 | the @code{movi}, but on some special conditions it is required to create | |
862 | an "unbound" label. | |
863 | ||
ba3814c1 PC |
864 | @code{align} is useful for creating multiple entry points to a |
865 | (trampoline) function that are all accessible through a single | |
866 | function pointer. @code{align} receives an integer argument that | |
867 | defines the minimal alignment of the address of a label directly | |
868 | following the @code{align} instruction. The integer argument must be | |
869 | a power of two and the effective alignment will be a power of two no | |
870 | less than the argument to @code{align}. If the argument to | |
871 | @code{align} is 16 or more, the effective alignment will match the | |
872 | specified minimal alignment exactly. | |
873 | ||
874 | @example | |
875 | jit_node_t *forward, *label1, *label2, *jump; | |
876 | unsigned char *addr1, *addr2; | |
877 | forward = jit_forward(); | |
878 | jit_align(16); | |
879 | label1 = jit_indirect(); @rem{/* first entry point */} | |
880 | jump = jit_jmpi(); @rem{/* jump to first handler */} | |
881 | jit_patch_at(jump, forward); | |
882 | jit_align(16); | |
883 | label2 = jit_indirect(); @rem{/* second entry point */} | |
884 | ... @rem{/* second handler */} | |
885 | jit_jmpr(...); | |
886 | jit_link(forward); | |
887 | ... @rem{/* first handler /*} | |
888 | jit_jmpr(...); | |
889 | ... | |
890 | jit_emit(); | |
891 | addr1 = jit_address(label1); | |
892 | addr2 = jit_address(label2); | |
893 | assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */} | |
894 | @end example | |
895 | ||
79bfeef6 PC |
896 | @code{skip} is useful for reserving space in the code buffer that can |
897 | later be filled (possibly with the help of the pair of functions | |
898 | @code{jit_unprotect} and @code{jit_protect}). | |
899 | ||
4a71579b PC |
900 | @item Function prolog |
901 | ||
902 | These macros are used to set up a function prolog. The @code{allocai} | |
903 | call accept a single integer argument and returns an offset value | |
904 | for stack storage access. The @code{allocar} accepts two registers | |
905 | arguments, the first is set to the offset for stack access, and the | |
906 | second is the size in bytes argument. | |
907 | ||
908 | @example | |
909 | prolog (not specified) @r{function prolog} | |
910 | allocai (not specified) @r{reserve space on the stack} | |
911 | allocar (not specified) @r{allocate space on the stack} | |
912 | @end example | |
913 | ||
914 | @code{allocai} receives the number of bytes to allocate and returns | |
915 | the offset from the frame pointer register @code{FP} to the base of | |
916 | the area. | |
917 | ||
918 | @code{allocar} receives two register arguments. The first is where | |
919 | to store the offset from the frame pointer register @code{FP} to the | |
920 | base of the area. The second argument is the size in bytes. Note | |
921 | that @code{allocar} is dynamic allocation, and special attention | |
922 | should be taken when using it. If called in a loop, every iteration | |
923 | will allocate stack space. Stack space is aligned from 8 to 64 bytes | |
924 | depending on backend requirements, even if allocating only one byte. | |
925 | It is advisable to not use it with @code{frame} and @code{tramp}; it | |
926 | should work with @code{frame} with special care to call only once, | |
927 | but is not supported if used in @code{tramp}, even if called only | |
928 | once. | |
929 | ||
930 | As a small appetizer, here is a small function that adds 1 to the input | |
931 | parameter (an @code{int}). I'm using an assembly-like syntax here which | |
932 | is a bit different from the one used when writing real subroutines with | |
933 | @lightning{}; the real syntax will be introduced in @xref{GNU lightning | |
934 | examples, , Generating code at run-time}. | |
935 | ||
936 | @example | |
937 | incr: | |
938 | prolog | |
939 | in = arg @rem{! We have an integer argument} | |
940 | getarg R0, in @rem{! Move it to R0} | |
941 | addi R0, R0, 1 @rem{! Add 1} | |
942 | retr R0 @rem{! And return the result} | |
943 | @end example | |
944 | ||
945 | And here is another function which uses the @code{printf} function from | |
946 | the standard C library to write a number in hexadecimal notation: | |
947 | ||
948 | @example | |
949 | printhex: | |
950 | prolog | |
951 | in = arg @rem{! Same as above} | |
952 | getarg R0, in | |
953 | prepare @rem{! Begin call sequence for printf} | |
954 | pushargi "%x" @rem{! Push format string} | |
955 | ellipsis @rem{! Varargs start here} | |
956 | pushargr R0 @rem{! Push second argument} | |
957 | finishi printf @rem{! Call printf} | |
958 | ret @rem{! Return to caller} | |
959 | @end example | |
960 | ||
519a9ea1 PC |
961 | @item Register liveness |
962 | ||
963 | During code generation, @lightning{} occasionally needs scratch registers | |
964 | or needs to use architecture-defined registers. For that, @lightning{} | |
965 | internally maintains register liveness information. | |
966 | ||
967 | In the following example, @code{qdivr} will need special registers like | |
968 | @code{R0} on some architectures. As @lightning{} understands that | |
969 | @code{R0} is used in the subsequent instruction, it will create | |
970 | save/restore code for @code{R0} in case. | |
971 | ||
972 | @example | |
973 | ... | |
974 | qdivr V0, V1, V2, V3 | |
975 | movr V3, R0 | |
976 | ... | |
977 | @end example | |
978 | ||
979 | The same is not true in the example that follows. Here, @code{R0} is | |
980 | not alive after the division operation because @code{R0} is neither an | |
981 | argument register nor a callee-save register. Thus, no save/restore | |
982 | code for @code{R0} will be created in case. | |
983 | ||
984 | @example | |
985 | ... | |
986 | qdivr V0, V1, V2, V3 | |
987 | jmpr R1 | |
988 | ... | |
989 | @end example | |
990 | ||
991 | The @code{live} instruction can be used to mark a register as live after | |
992 | it as in the following example. Here, @code{R0} will be preserved | |
993 | across the division. | |
994 | ||
995 | @example | |
996 | ... | |
997 | qdivr V0, V1, V2, V3 | |
998 | live R0 | |
999 | jmpr R1 | |
1000 | ... | |
1001 | @end example | |
1002 | ||
1003 | The @code{live} instruction is useful at code entry and exit points, | |
1004 | like after and before a @code{callr} instruction. | |
1005 | ||
4a71579b PC |
1006 | @item Trampolines, continuations and tail call optimization |
1007 | ||
1008 | Frequently it is required to generate jit code that must jump to | |
1009 | code generated later, possibly from another @code{jit_context_t}. | |
1010 | These require compatible stack frames. | |
1011 | ||
1012 | @lightning{} provides two primitives from where trampolines, | |
1013 | continuations and tail call optimization can be implemented. | |
1014 | ||
1015 | @example | |
1016 | frame (not specified) @r{create stack frame} | |
1017 | tramp (not specified) @r{assume stack frame} | |
1018 | @end example | |
1019 | ||
1020 | @code{frame} receives an integer argument@footnote{It is not | |
1021 | automatically computed because it does not know about the | |
1022 | requirement of later generated code.} that defines the size in | |
1023 | bytes for the stack frame of the current, @code{C} callable, | |
1024 | jit function. To calculate this value, a good formula is maximum | |
1025 | number of arguments to any called native function times | |
1026 | eight@footnote{Times eight so that it works for double arguments. | |
1027 | And would not need conditionals for ports that pass arguments in | |
1028 | the stack.}, plus the sum of the arguments to any call to | |
1029 | @code{jit_allocai}. @lightning{} automatically adjusts this value | |
1030 | for any backend specific stack memory it may need, or any | |
1031 | alignment constraint. | |
1032 | ||
1033 | @code{frame} also instructs @lightning{} to save all callee | |
1034 | save registers in the prolog and reload in the epilog. | |
1035 | ||
1036 | @example | |
1037 | main: @rem{! jit entry point} | |
1038 | prolog @rem{! function prolog} | |
1039 | frame 256 @rem{! save all callee save registers and} | |
1040 | @rem{! reserve at least 256 bytes in stack} | |
1041 | main_loop: | |
1042 | ... | |
1043 | jmpi handler @rem{! jumps to external code} | |
1044 | ... | |
1045 | ret @rem{! return to the caller} | |
1046 | @end example | |
1047 | ||
1048 | @code{tramp} differs from @code{frame} only that a prolog and epilog | |
1049 | will not be generated. Note that @code{prolog} must still be used. | |
1050 | The code under @code{tramp} must be ready to be entered with a jump | |
1051 | at the prolog position, and instead of a return, it must end with | |
1052 | a non conditional jump. @code{tramp} exists solely for the fact | |
1053 | that it allows optimizing out prolog and epilog code that would | |
1054 | never be executed. | |
1055 | ||
1056 | @example | |
1057 | handler: @rem{! handler entry point} | |
1058 | prolog @rem{! function prolog} | |
1059 | tramp 256 @rem{! assumes all callee save registers} | |
1060 | @rem{! are saved and there is at least} | |
1061 | @rem{! 256 bytes in stack} | |
1062 | ... | |
1063 | jmpi main_loop @rem{! return to the main loop} | |
1064 | @end example | |
1065 | ||
1066 | @lightning{} only supports Tail Call Optimization using the | |
1067 | @code{tramp} construct. Any other way is not guaranteed to | |
1068 | work on all ports. | |
1069 | ||
1070 | An example of a simple (recursive) tail call optimization: | |
1071 | ||
1072 | @example | |
1073 | factorial: @rem{! Entry point of the factorial function} | |
1074 | prolog | |
1075 | in = arg @rem{! Receive an integer argument} | |
1076 | getarg R0, in @rem{! Move argument to RO} | |
1077 | prepare | |
1078 | pushargi 1 @rem{! This is the accumulator} | |
1079 | pushargr R0 @rem{! This is the argument} | |
1080 | finishi fact @rem{! Call the tail call optimized function} | |
1081 | retval R0 @rem{! Fetch the result} | |
1082 | retr R0 @rem{! Return it} | |
1083 | epilog @rem{! Epilog *before* label before prolog} | |
1084 | ||
1085 | fact: @rem{! Entry point of the helper function} | |
1086 | prolog | |
1087 | frame 16 @rem{! Reserve 16 bytes in the stack} | |
1088 | fact_entry: @rem{! This is the tail call entry point} | |
1089 | ac = arg @rem{! The accumulator is the first argument} | |
1090 | in = arg @rem{! The factorial argument} | |
1091 | getarg R0, ac @rem{! Move the accumulator to R0} | |
1092 | getarg R1, in @rem{! Move the argument to R1} | |
1093 | blei fact_out, R1, 1 @rem{! Done if argument is one or less} | |
1094 | mulr R0, R0, R1 @rem{! accumulator *= argument} | |
1095 | putargr R0, ac @rem{! Update the accumulator} | |
1096 | subi R1, R1, 1 @rem{! argument -= 1} | |
1097 | putargr R1, in @rem{! Update the argument} | |
1098 | jmpi fact_entry @rem{! Tail Call Optimize it!} | |
1099 | fact_out: | |
1100 | retr R0 @rem{! Return the accumulator} | |
1101 | @end example | |
1102 | ||
1103 | @item Predicates | |
1104 | @example | |
1105 | forward_p (not specified) @r{forward label predicate} | |
1106 | indirect_p (not specified) @r{indirect label predicate} | |
1107 | target_p (not specified) @r{used label predicate} | |
1108 | arg_register_p (not specified) @r{argument kind predicate} | |
1109 | callee_save_p (not specified) @r{callee save predicate} | |
1110 | pointer_p (not specified) @r{pointer predicate} | |
1111 | @end example | |
1112 | ||
1113 | @code{forward_p} expects a @code{jit_node_t*} argument, and | |
1114 | returns non zero if it is a forward label reference, that is, | |
1115 | a label returned by @code{forward}, that still needs a | |
1116 | @code{link} call. | |
1117 | ||
1118 | @code{indirect_p} expects a @code{jit_node_t*} argument, and returns | |
1119 | non zero if it is an indirect label reference, that is, a label that | |
1120 | was returned by @code{indirect}. | |
1121 | ||
1122 | @code{target_p} expects a @code{jit_node_t*} argument, that is any | |
1123 | kind of label, and will return non zero if there is at least one | |
1124 | jump or move referencing it. | |
1125 | ||
1126 | @code{arg_register_p} expects a @code{jit_node_t*} argument, that must | |
1127 | have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and | |
1128 | will return non zero if the argument lives in a register. This call | |
1129 | is useful to know the live range of register arguments, as those | |
1130 | are very fast to read and write, but have volatile values. | |
1131 | ||
79bfeef6 | 1132 | @code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or |
4a71579b PC |
1133 | @code{JIT_Fn}, and will return non zero if the register is callee |
1134 | save. This call is useful because on several ports, the @code{JIT_Rn} | |
1135 | and @code{JIT_Fn} registers are actually callee save; no need | |
1136 | to save and load the values when making function calls. | |
1137 | ||
1138 | @code{pointer_p} expects a pointer argument, and will return non | |
1139 | zero if the pointer is inside the generated jit code. Must be | |
1140 | called after @code{jit_emit} and before @code{jit_destroy_state}. | |
ba3814c1 PC |
1141 | |
1142 | @item Atomic operations | |
1143 | Only compare-and-swap is implemented. It accepts four operands; | |
1144 | the second can be an immediate. | |
1145 | ||
1146 | The first argument is set with a boolean value telling if the operation | |
1147 | did succeed. | |
1148 | ||
1149 | Arguments must be different, cannot use the result register to also pass | |
1150 | an argument. | |
1151 | ||
1152 | The second argument is the address of a machine word. | |
1153 | ||
1154 | The third argument is the old value. | |
1155 | ||
1156 | The fourth argument is the new value. | |
1157 | ||
1158 | @example | |
1159 | casr 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0 | |
1160 | casi 01 = (*O2 == O3) ? (*O2 = O4, 1) : 0 | |
1161 | @end example | |
1162 | ||
1163 | If value at the address in the second argument is equal to the third | |
1164 | argument, the address value is atomically modified to the value of the | |
1165 | fourth argument and the first argument is set to a non zero value. | |
1166 | ||
1167 | If the value at the address in the second argument is not equal to the | |
1168 | third argument nothing is done and the first argument is set to zero. | |
4a71579b PC |
1169 | @end table |
1170 | ||
1171 | @node GNU lightning examples | |
1172 | @chapter Generating code at run-time | |
1173 | ||
1174 | To use @lightning{}, you should include the @file{lightning.h} file that | |
1175 | is put in your include directory by the @samp{make install} command. | |
1176 | ||
1177 | Each of the instructions above translates to a macro or function call. | |
1178 | All you have to do is prepend @code{jit_} (lowercase) to opcode names | |
1179 | and @code{JIT_} (uppercase) to register names. Of course, parameters | |
1180 | are to be put between parentheses. | |
1181 | ||
1182 | This small tutorial presents three examples: | |
1183 | ||
1184 | @iftex | |
1185 | @itemize @bullet | |
1186 | @item | |
1187 | The @code{incr} function found in @ref{The instruction set, , | |
1188 | @lightning{}'s instruction set}: | |
1189 | ||
1190 | @item | |
1191 | A simple function call to @code{printf} | |
1192 | ||
1193 | @item | |
1194 | An RPN calculator. | |
1195 | ||
1196 | @item | |
1197 | Fibonacci numbers | |
1198 | @end itemize | |
1199 | @end iftex | |
1200 | @ifnottex | |
1201 | @menu | |
1202 | * incr:: A function which increments a number by one | |
1203 | * printf:: A simple function call to printf | |
1204 | * RPN calculator:: A more complex example, an RPN calculator | |
1205 | * Fibonacci:: Calculating Fibonacci numbers | |
1206 | @end menu | |
1207 | @end ifnottex | |
1208 | ||
1209 | @node incr | |
1210 | @section A function which increments a number by one | |
1211 | ||
1212 | Let's see how to create and use the sample @code{incr} function created | |
1213 | in @ref{The instruction set, , @lightning{}'s instruction set}: | |
1214 | ||
1215 | @example | |
1216 | #include <stdio.h> | |
1217 | #include <lightning.h> | |
1218 | ||
1219 | static jit_state_t *_jit; | |
1220 | ||
1221 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} | |
1222 | ||
1223 | int main(int argc, char *argv[]) | |
1224 | @{ | |
1225 | jit_node_t *in; | |
1226 | pifi incr; | |
1227 | ||
1228 | init_jit(argv[0]); | |
1229 | _jit = jit_new_state(); | |
1230 | ||
1231 | jit_prolog(); @rem{/* @t{ prolog } */} | |
1232 | in = jit_arg(); @rem{/* @t{ in = arg } */} | |
1233 | jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0 } */} | |
1234 | jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */} | |
1235 | jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */} | |
1236 | ||
1237 | incr = jit_emit(); | |
1238 | jit_clear_state(); | |
1239 | ||
1240 | @rem{/* call the generated code@comma{} passing 5 as an argument */} | |
1241 | printf("%d + 1 = %d\n", 5, incr(5)); | |
1242 | ||
1243 | jit_destroy_state(); | |
1244 | finish_jit(); | |
1245 | return 0; | |
1246 | @} | |
1247 | @end example | |
1248 | ||
1249 | Let's examine the code line by line (well, almost@dots{}): | |
1250 | ||
1251 | @table @t | |
1252 | @item #include <lightning.h> | |
1253 | You already know about this. It defines all of @lightning{}'s macros. | |
1254 | ||
1255 | @item static jit_state_t *_jit; | |
1256 | You might wonder about what is @code{jit_state_t}. It is a structure | |
1257 | that stores jit code generation information. The name @code{_jit} is | |
1258 | special, because since multiple jit generators can run at the same | |
1259 | time, you must either @r{#define _jit my_jit_state} or name it | |
1260 | @code{_jit}. | |
1261 | ||
1262 | @item typedef int (*pifi)(int); | |
1263 | Just a handy typedef for a pointer to a function that takes an | |
1264 | @code{int} and returns another. | |
1265 | ||
1266 | @item jit_node_t *in; | |
1267 | Declares a variable to hold an identifier for a function argument. It | |
1268 | is an opaque pointer, that will hold the return of a call to @code{arg} | |
1269 | and be used as argument to @code{getarg}. | |
1270 | ||
1271 | @item pifi incr; | |
1272 | Declares a function pointer variable to a function that receives an | |
1273 | @code{int} and returns an @code{int}. | |
1274 | ||
1275 | @item init_jit(argv[0]); | |
1276 | You must call this function before creating a @code{jit_state_t} | |
1277 | object. This function does global state initialization, and may need | |
1278 | to detect CPU or Operating System features. It receives a string | |
1279 | argument that is later used to read symbols from a shared object using | |
1280 | GNU binutils if disassembly was enabled at configure time. If no | |
1281 | disassembly will be performed a NULL pointer can be used as argument. | |
1282 | ||
1283 | @item _jit = jit_new_state(); | |
1284 | This call initializes a @lightning{} jit state. | |
1285 | ||
1286 | @item jit_prolog(); | |
1287 | Ok, so we start generating code for our beloved function@dots{} | |
1288 | ||
1289 | @item in = jit_arg(); | |
1290 | @itemx jit_getarg(JIT_R0, in); | |
1291 | We retrieve the first (and only) argument, an integer, and store it | |
1292 | into the general-purpose register @code{R0}. | |
1293 | ||
1294 | @item jit_addi(JIT_R0, JIT_R0, 1); | |
1295 | We add one to the content of the register. | |
1296 | ||
1297 | @item jit_retr(JIT_R0); | |
1298 | This instruction generates a standard function epilog that returns | |
1299 | the contents of the @code{R0} register. | |
1300 | ||
1301 | @item incr = jit_emit(); | |
1302 | This instruction is very important. It actually translates the | |
1303 | @lightning{} macros used before to machine code, flushes the generated | |
1304 | code area out of the processor's instruction cache and return a | |
1305 | pointer to the start of the code. | |
1306 | ||
1307 | @item jit_clear_state(); | |
1308 | This call cleanups any data not required for jit execution. Note | |
1309 | that it must be called after any call to @code{jit_print} or | |
1310 | @code{jit_address}, as this call destroy the @lightning{} | |
1311 | intermediate representation. | |
1312 | ||
1313 | @item printf("%d + 1 = %d", 5, incr(5)); | |
1314 | Calling our function is this simple---it is not distinguishable from | |
1315 | a normal C function call, the only difference being that @code{incr} | |
1316 | is a variable. | |
1317 | ||
1318 | @item jit_destroy_state(); | |
1319 | Releases all memory associated with the jit context. It should be | |
1320 | called after known the jit will no longer be called. | |
1321 | ||
1322 | @item finish_jit(); | |
1323 | This call cleanups any global state hold by @lightning{}, and is | |
1324 | advisable to call it once jit code will no longer be generated. | |
1325 | @end table | |
1326 | ||
1327 | @lightning{} abstracts two phases of dynamic code generation: selecting | |
1328 | instructions that map the standard representation, and emitting binary | |
1329 | code for these instructions. The client program has the responsibility | |
1330 | of describing the code to be generated using the standard @lightning{} | |
1331 | instruction set. | |
1332 | ||
1333 | Let's examine the code generated for @code{incr} on the SPARC and x86_64 | |
1334 | architecture (on the right is the code that an assembly-language | |
1335 | programmer would write): | |
1336 | ||
1337 | @table @b | |
1338 | @item SPARC | |
1339 | @example | |
1340 | save %sp, -112, %sp | |
1341 | mov %i0, %g2 retl | |
1342 | inc %g2 inc %o0 | |
1343 | mov %g2, %i0 | |
519a9ea1 PC |
1344 | restore |
1345 | retl | |
1346 | nop | |
4a71579b PC |
1347 | @end example |
1348 | In this case, @lightning{} introduces overhead to create a register | |
1349 | window (not knowing that the procedure is a leaf procedure) and to | |
1350 | move the argument to the general purpose register @code{R0} (which | |
1351 | maps to @code{%g2} on the SPARC). | |
1352 | @end table | |
1353 | ||
1354 | @table @b | |
1355 | @item x86_64 | |
1356 | @example | |
79bfeef6 PC |
1357 | mov %rdi,%rax |
1358 | add $0x1,%rax | |
1359 | ret | |
4a71579b | 1360 | @end example |
79bfeef6 PC |
1361 | In this case, for the x86 port, @lightning{} has simple optimizations |
1362 | to understand it is a leaf function, and that it is not required to | |
1363 | create a stack frame nor update the stack pointer. | |
4a71579b PC |
1364 | @end table |
1365 | ||
1366 | @node printf | |
1367 | @section A simple function call to @code{printf} | |
1368 | ||
1369 | Again, here is the code for the example: | |
1370 | ||
1371 | @example | |
1372 | #include <stdio.h> | |
1373 | #include <lightning.h> | |
1374 | ||
1375 | static jit_state_t *_jit; | |
1376 | ||
1377 | typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */} | |
1378 | ||
1379 | int main(int argc, char *argv[]) | |
1380 | @{ | |
1381 | pvfi myFunction; @rem{/* ptr to generated code */} | |
1382 | jit_node_t *start, *end; @rem{/* a couple of labels */} | |
1383 | jit_node_t *in; @rem{/* to get the argument */} | |
1384 | ||
1385 | init_jit(argv[0]); | |
1386 | _jit = jit_new_state(); | |
1387 | ||
1388 | start = jit_note(__FILE__, __LINE__); | |
1389 | jit_prolog(); | |
1390 | in = jit_arg(); | |
1391 | jit_getarg(JIT_R1, in); | |
1392 | jit_prepare(); | |
1393 | jit_pushargi((jit_word_t)"generated %d bytes\n"); | |
1394 | jit_ellipsis(); | |
1395 | jit_pushargr(JIT_R1); | |
1396 | jit_finishi(printf); | |
1397 | jit_ret(); | |
1398 | jit_epilog(); | |
1399 | end = jit_note(__FILE__, __LINE__); | |
1400 | ||
1401 | myFunction = jit_emit(); | |
1402 | ||
1403 | @rem{/* call the generated code@comma{} passing its size as argument */} | |
1404 | myFunction((char*)jit_address(end) - (char*)jit_address(start)); | |
1405 | jit_clear_state(); | |
1406 | ||
1407 | jit_disassemble(); | |
1408 | ||
1409 | jit_destroy_state(); | |
1410 | finish_jit(); | |
1411 | return 0; | |
1412 | @} | |
1413 | @end example | |
1414 | ||
1415 | The function shows how many bytes were generated. Most of the code | |
1416 | is not very interesting, as it resembles very closely the program | |
1417 | presented in @ref{incr, , A function which increments a number by one}. | |
1418 | ||
1419 | For this reason, we're going to concentrate on just a few statements. | |
1420 | ||
1421 | @table @t | |
1422 | @item start = jit_note(__FILE__, __LINE__); | |
1423 | @itemx @r{@dots{}} | |
1424 | @itemx end = jit_note(__FILE__, __LINE__); | |
1425 | These two instruction call the @code{jit_note} macro, which creates | |
1426 | a note in the jit code; arguments to @code{jit_note} usually are a | |
1427 | filename string and line number integer, but using NULL for the | |
1428 | string argument is perfectly valid if only need to create a simple | |
1429 | marker in the code. | |
1430 | ||
1431 | @item jit_ellipsis(); | |
1432 | @code{ellipsis} usually is only required if calling varargs functions | |
1433 | with double arguments, but it is a good practice to properly describe | |
1434 | the @r{@dots{}} in the call sequence. | |
1435 | ||
1436 | @item jit_pushargi((jit_word_t)"generated %d bytes\n"); | |
1437 | Note the use of the @code{(jit_word_t)} cast, that is used only | |
1438 | to avoid a compiler warning, due to using a pointer where a | |
1439 | wordsize integer type was expected. | |
1440 | ||
1441 | @item jit_prepare(); | |
1442 | @itemx @r{@dots{}} | |
1443 | @itemx jit_finishi(printf); | |
1444 | Once the arguments to @code{printf} have been pushed, what means | |
1445 | moving them to stack or register arguments, the @code{printf} | |
1446 | function is called and the stack cleaned. Note how @lightning{} | |
1447 | abstracts the differences between different architectures and | |
1448 | ABI's -- the client program does not know how parameter passing | |
1449 | works on the host architecture. | |
1450 | ||
1451 | @item jit_epilog(); | |
1452 | Usually it is not required to call @code{epilog}, but because it | |
1453 | is implicitly called when noticing the end of a function, if the | |
1454 | @code{end} variable was set with a @code{note} call after the | |
1455 | @code{ret}, it would not consider the function epilog. | |
1456 | ||
1457 | @item myFunction((char*)jit_address(end) - (char*)jit_address(start)); | |
1458 | This calls the generate jit function passing as argument the offset | |
1459 | difference from the @code{start} and @code{end} notes. The @code{address} | |
1460 | call must be done after the @code{emit} call or either a fatal error | |
1461 | will happen (if @lightning{} is built with assertions enable) or an | |
1462 | undefined value will be returned. | |
1463 | ||
1464 | @item jit_clear_state(); | |
1465 | Note that @code{jit_clear_state} was called after executing jit in | |
1466 | this example. It was done because it must be called after any call | |
1467 | to @code{jit_address} or @code{jit_print}. | |
1468 | ||
1469 | @item jit_disassemble(); | |
1470 | @code{disassemble} will dump the generated code to standard output, | |
1471 | unless @lightning{} was built with the disassembler disabled, in which | |
1472 | case no output will be shown. | |
1473 | @end table | |
1474 | ||
1475 | @node RPN calculator | |
1476 | @section A more complex example, an RPN calculator | |
1477 | ||
1478 | We create a small stack-based RPN calculator which applies a series | |
1479 | of operators to a given parameter and to other numeric operands. | |
1480 | Unlike previous examples, the code generator is fully parameterized | |
1481 | and is able to compile different formulas to different functions. | |
1482 | Here is the code for the expression compiler; a sample usage will | |
1483 | follow. | |
1484 | ||
1485 | Since @lightning{} does not provide push/pop instruction, this | |
1486 | example uses a stack-allocated area to store the data. Such an | |
1487 | area can be allocated using the macro @code{allocai}, which | |
1488 | receives the number of bytes to allocate and returns the offset | |
1489 | from the frame pointer register @code{FP} to the base of the | |
1490 | area. | |
1491 | ||
1492 | Usually, you will use the @code{ldxi} and @code{stxi} instruction | |
1493 | to access stack-allocated variables. However, it is possible to | |
1494 | use operations such as @code{add} to compute the address of the | |
1495 | variables, and pass the address around. | |
1496 | ||
1497 | @example | |
1498 | #include <stdio.h> | |
1499 | #include <lightning.h> | |
1500 | ||
1501 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} | |
1502 | ||
1503 | static jit_state_t *_jit; | |
1504 | ||
1505 | void stack_push(int reg, int *sp) | |
1506 | @{ | |
1507 | jit_stxi_i (*sp, JIT_FP, reg); | |
1508 | *sp += sizeof (int); | |
1509 | @} | |
1510 | ||
1511 | void stack_pop(int reg, int *sp) | |
1512 | @{ | |
1513 | *sp -= sizeof (int); | |
1514 | jit_ldxi_i (reg, JIT_FP, *sp); | |
1515 | @} | |
1516 | ||
1517 | jit_node_t *compile_rpn(char *expr) | |
1518 | @{ | |
1519 | jit_node_t *in, *fn; | |
1520 | int stack_base, stack_ptr; | |
1521 | ||
1522 | fn = jit_note(NULL, 0); | |
1523 | jit_prolog(); | |
1524 | in = jit_arg(); | |
1525 | stack_ptr = stack_base = jit_allocai (32 * sizeof (int)); | |
1526 | ||
79bfeef6 | 1527 | jit_getarg(JIT_R2, in); |
4a71579b PC |
1528 | |
1529 | while (*expr) @{ | |
1530 | char buf[32]; | |
1531 | int n; | |
1532 | if (sscanf(expr, "%[0-9]%n", buf, &n)) @{ | |
1533 | expr += n - 1; | |
1534 | stack_push(JIT_R0, &stack_ptr); | |
1535 | jit_movi(JIT_R0, atoi(buf)); | |
1536 | @} else if (*expr == 'x') @{ | |
1537 | stack_push(JIT_R0, &stack_ptr); | |
1538 | jit_movr(JIT_R0, JIT_R2); | |
1539 | @} else if (*expr == '+') @{ | |
1540 | stack_pop(JIT_R1, &stack_ptr); | |
1541 | jit_addr(JIT_R0, JIT_R1, JIT_R0); | |
1542 | @} else if (*expr == '-') @{ | |
1543 | stack_pop(JIT_R1, &stack_ptr); | |
1544 | jit_subr(JIT_R0, JIT_R1, JIT_R0); | |
1545 | @} else if (*expr == '*') @{ | |
1546 | stack_pop(JIT_R1, &stack_ptr); | |
1547 | jit_mulr(JIT_R0, JIT_R1, JIT_R0); | |
1548 | @} else if (*expr == '/') @{ | |
1549 | stack_pop(JIT_R1, &stack_ptr); | |
1550 | jit_divr(JIT_R0, JIT_R1, JIT_R0); | |
1551 | @} else @{ | |
1552 | fprintf(stderr, "cannot compile: %s\n", expr); | |
1553 | abort(); | |
1554 | @} | |
1555 | ++expr; | |
1556 | @} | |
1557 | jit_retr(JIT_R0); | |
1558 | jit_epilog(); | |
1559 | return fn; | |
1560 | @} | |
1561 | @end example | |
1562 | ||
1563 | The principle on which the calculator is based is easy: the stack top | |
1564 | is held in R0, while the remaining items of the stack are held in the | |
1565 | memory area that we allocate with @code{allocai}. Compiling a numeric | |
1566 | operand or the argument @code{x} pushes the old stack top onto the | |
1567 | stack and moves the operand into R0; compiling an operator pops the | |
1568 | second operand off the stack into R1, and compiles the operation so | |
1569 | that the result goes into R0, thus becoming the new stack top. | |
1570 | ||
1571 | This example allocates a fixed area for 32 @code{int}s. This is not | |
1572 | a problem when the function is a leaf like in this case; in a full-blown | |
1573 | compiler you will want to analyze the input and determine the number | |
1574 | of needed stack slots---a very simple example of register allocation. | |
1575 | The area is then managed like a stack using @code{stack_push} and | |
1576 | @code{stack_pop}. | |
1577 | ||
1578 | Source code for the client (which lies in the same source file) follows: | |
1579 | ||
1580 | @example | |
1581 | int main(int argc, char *argv[]) | |
1582 | @{ | |
1583 | jit_node_t *nc, *nf; | |
1584 | pifi c2f, f2c; | |
1585 | int i; | |
1586 | ||
1587 | init_jit(argv[0]); | |
1588 | _jit = jit_new_state(); | |
1589 | ||
1590 | nc = compile_rpn("32x9*5/+"); | |
1591 | nf = compile_rpn("x32-5*9/"); | |
1592 | (void)jit_emit(); | |
1593 | c2f = (pifi)jit_address(nc); | |
1594 | f2c = (pifi)jit_address(nf); | |
1595 | jit_clear_state(); | |
1596 | ||
1597 | printf("\nC:"); | |
1598 | for (i = 0; i <= 100; i += 10) printf("%3d ", i); | |
1599 | printf("\nF:"); | |
1600 | for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i)); | |
1601 | printf("\n"); | |
1602 | ||
1603 | printf("\nF:"); | |
1604 | for (i = 32; i <= 212; i += 18) printf("%3d ", i); | |
1605 | printf("\nC:"); | |
1606 | for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i)); | |
1607 | printf("\n"); | |
1608 | ||
1609 | jit_destroy_state(); | |
1610 | finish_jit(); | |
1611 | return 0; | |
1612 | @} | |
1613 | @end example | |
1614 | ||
1615 | The client displays a conversion table between Celsius and Fahrenheit | |
1616 | degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The | |
1617 | formulas are, @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9}, | |
1618 | respectively. | |
1619 | ||
1620 | Providing the formula as an argument to @code{compile_rpn} effectively | |
1621 | parameterizes code generation, making it possible to use the same code | |
1622 | to compile different functions; this is what makes dynamic code | |
1623 | generation so powerful. | |
1624 | ||
1625 | @node Fibonacci | |
1626 | @section Fibonacci numbers | |
1627 | ||
1628 | The code in this section calculates the Fibonacci sequence. That is | |
1629 | modeled by the recurrence relation: | |
1630 | @display | |
1631 | f(0) = 0 | |
1632 | f(1) = f(2) = 1 | |
1633 | f(n) = f(n-1) + f(n-2) | |
1634 | @end display | |
1635 | ||
1636 | The purpose of this example is to introduce branches. There are two | |
1637 | kind of branches: backward branches and forward branches. We'll | |
1638 | present the calculation in a recursive and iterative form; the | |
1639 | former only uses forward branches, while the latter uses both. | |
1640 | ||
1641 | @example | |
1642 | #include <stdio.h> | |
1643 | #include <lightning.h> | |
1644 | ||
1645 | static jit_state_t *_jit; | |
1646 | ||
1647 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} | |
1648 | ||
1649 | int main(int argc, char *argv[]) | |
1650 | @{ | |
1651 | pifi fib; | |
1652 | jit_node_t *label; | |
1653 | jit_node_t *call; | |
1654 | jit_node_t *in; @rem{/* offset of the argument */} | |
1655 | jit_node_t *ref; @rem{/* to patch the forward reference */} | |
1656 | jit_node_t *zero; @rem{/* to patch the forward reference */} | |
1657 | ||
1658 | init_jit(argv[0]); | |
1659 | _jit = jit_new_state(); | |
1660 | ||
1661 | label = jit_label(); | |
1662 | jit_prolog (); | |
1663 | in = jit_arg (); | |
1664 | jit_getarg (JIT_V0, in); @rem{/* R0 = n */} | |
1665 | zero = jit_beqi (JIT_R0, 0); | |
1666 | jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */ | |
1667 | jit_movi (JIT_R0, 1); | |
1668 | ref = jit_blei (JIT_V0, 2); | |
1669 | jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */} | |
1670 | jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */} | |
1671 | jit_prepare(); | |
1672 | jit_pushargr(JIT_V1); | |
1673 | call = jit_finishi(NULL); | |
1674 | jit_patch_at(call, label); | |
1675 | jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */} | |
1676 | jit_prepare(); | |
1677 | jit_pushargr(JIT_V2); | |
1678 | call = jit_finishi(NULL); | |
1679 | jit_patch_at(call, label); | |
1680 | jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */} | |
1681 | jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */} | |
1682 | ||
1683 | jit_patch(ref); @rem{/* patch jump */} | |
1684 | jit_patch(zero); @rem{/* patch jump */} | |
1685 | jit_retr(JIT_R0); | |
1686 | ||
1687 | @rem{/* call the generated code@comma{} passing 32 as an argument */} | |
1688 | fib = jit_emit(); | |
1689 | jit_clear_state(); | |
1690 | printf("fib(%d) = %d\n", 32, fib(32)); | |
1691 | jit_destroy_state(); | |
1692 | finish_jit(); | |
1693 | return 0; | |
1694 | @} | |
1695 | @end example | |
1696 | ||
1697 | As said above, this is the first example of dynamically compiling | |
1698 | branches. Branch instructions have two operands containing the | |
1699 | values to be compared, and return a @code{jit_note_t *} object | |
1700 | to be patched. | |
1701 | ||
1702 | Because labels final address are only known after calling @code{emit}, | |
1703 | it is required to call @code{patch} or @code{patch_at}, what does | |
1704 | tell @lightning{} that the target to patch is actually a pointer to | |
1705 | a @code{jit_node_t *} object, otherwise, it would assume that is | |
1706 | a pointer to a C function. Note that conditional branches do not | |
1707 | receive a label argument, so they must be patched. | |
1708 | ||
1709 | You need to call @code{patch_at} on the return of value @code{calli}, | |
1710 | @code{finishi}, and @code{calli} if it is actually referencing a label | |
1711 | in the jit code. All branch instructions do not receive a label | |
1712 | argument. Note that @code{movi} is an special case, and patching it | |
1713 | is usually done to get the final address of a label, usually to later | |
1714 | call @code{jmpr}. | |
1715 | ||
1716 | Now, here is the iterative version: | |
1717 | ||
1718 | @example | |
1719 | #include <stdio.h> | |
1720 | #include <lightning.h> | |
1721 | ||
1722 | static jit_state_t *_jit; | |
1723 | ||
1724 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} | |
1725 | ||
1726 | int main(int argc, char *argv[]) | |
1727 | @{ | |
1728 | pifi fib; | |
1729 | jit_node_t *in; @rem{/* offset of the argument */} | |
1730 | jit_node_t *ref; @rem{/* to patch the forward reference */} | |
1731 | jit_node_t *zero; @rem{/* to patch the forward reference */} | |
1732 | jit_node_t *jump; @rem{/* jump to start of loop */} | |
1733 | jit_node_t *loop; @rem{/* start of the loop */} | |
1734 | ||
1735 | init_jit(argv[0]); | |
1736 | _jit = jit_new_state(); | |
1737 | ||
1738 | jit_prolog (); | |
1739 | in = jit_arg (); | |
1740 | jit_getarg (JIT_R0, in); @rem{/* R0 = n */} | |
1741 | zero = jit_beqi (JIT_R0, 0); | |
1742 | jit_movr (JIT_R1, JIT_R0); | |
1743 | jit_movi (JIT_R0, 1); | |
1744 | ref = jit_blti (JIT_R1, 2); | |
1745 | jit_subi (JIT_R2, JIT_R2, 2); | |
1746 | jit_movr (JIT_R1, JIT_R0); | |
1747 | ||
1748 | loop= jit_label(); | |
1749 | jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */} | |
1750 | jit_movr (JIT_V0, JIT_R0); /* V0 = R0 */ | |
1751 | jit_addr (JIT_R0, JIT_R0, JIT_R1); /* R0 = R0 + R1 */ | |
1752 | jit_movr (JIT_R1, JIT_V0); /* R1 = V0 */ | |
1753 | jump= jit_bnei (JIT_R2, 0); /* if (R2) goto loop; */ | |
1754 | jit_patch_at(jump, loop); | |
1755 | ||
1756 | jit_patch(ref); @rem{/* patch forward jump */} | |
1757 | jit_patch(zero); @rem{/* patch forward jump */} | |
1758 | jit_retr (JIT_R0); | |
1759 | ||
1760 | @rem{/* call the generated code@comma{} passing 36 as an argument */} | |
1761 | fib = jit_emit(); | |
1762 | jit_clear_state(); | |
1763 | printf("fib(%d) = %d\n", 36, fib(36)); | |
1764 | jit_destroy_state(); | |
1765 | finish_jit(); | |
1766 | return 0; | |
1767 | @} | |
1768 | @end example | |
1769 | ||
1770 | This code calculates the recurrence relation using iteration (a | |
1771 | @code{for} loop in high-level languages). There are no function | |
1772 | calls anymore: instead, there is a backward jump (the @code{bnei} at | |
1773 | the end of the loop). | |
1774 | ||
1775 | Note that the program must remember the address for backward jumps; | |
1776 | for forward jumps it is only required to remember the jump code, | |
1777 | and call @code{patch} for the implicit label. | |
1778 | ||
1779 | @node Reentrancy | |
1780 | @chapter Re-entrant usage of @lightning{} | |
1781 | ||
1782 | @lightning{} uses the special @code{_jit} identifier. To be able | |
1783 | to be able to use multiple jit generation states at the same | |
1784 | time, it is required to used code similar to: | |
1785 | ||
1786 | @example | |
1787 | struct jit_state lightning; | |
1788 | #define lightning _jit | |
1789 | @end example | |
1790 | ||
1791 | This will cause the symbol defined to @code{_jit} to be passed as | |
1792 | the first argument to the underlying @lightning{} implementation, | |
1793 | that is usually a function with an @code{_} (underscode) prefix | |
1794 | and with an argument named @code{_jit}, in the pattern: | |
1795 | ||
1796 | @example | |
1797 | static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t); | |
1798 | #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v); | |
1799 | @end example | |
1800 | ||
1801 | The reason for this is to use the same syntax as the initial lightning | |
1802 | implementation and to avoid needing the user to keep adding an extra | |
1803 | argument to every call, as multiple jit states generating code in | |
1804 | paralell should be very uncommon. | |
1805 | ||
519a9ea1 | 1806 | @node Registers |
4a71579b PC |
1807 | @chapter Accessing the whole register file |
1808 | ||
1809 | As mentioned earlier in this chapter, all @lightning{} back-ends are | |
1810 | guaranteed to have at least six general-purpose integer registers and | |
1811 | six floating-point registers, but many back-ends will have more. | |
1812 | ||
1813 | To access the entire register files, you can use the | |
1814 | @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They | |
1815 | accept a parameter that identifies the register number, which | |
1816 | must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM} | |
1817 | and @code{JIT_F_NUM} respectively; the number need not be | |
1818 | constant. Of course, expressions like @code{JIT_R0} and | |
1819 | @code{JIT_R(0)} denote the same register, and likewise for | |
1820 | integer callee-saved, or floating-point, registers. | |
1821 | ||
519a9ea1 PC |
1822 | @section Scratch registers |
1823 | ||
1824 | For operations, @lightning{} does not support directly, like storing | |
1825 | a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to | |
1826 | acquire and release a scratch register as in the following pattern: | |
1827 | ||
1828 | @example | |
1829 | jit_int32_t reg = jit_get_reg (jit_class_gpr); | |
1830 | jit_movi (reg, immediate); | |
1831 | jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg); | |
1832 | jit_unget_reg (reg); | |
1833 | @end example | |
1834 | ||
1835 | As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and | |
1836 | reloads but don't follow branches, the code between both must be in | |
1837 | the same basic block and must not contain any branches as in the | |
1838 | following (bad) example. | |
1839 | ||
1840 | @example | |
1841 | jit_int32_t reg = jit_get_reg (jit_class_gpr); | |
1842 | jit_ldxi (reg, JIT_V0, offset); | |
1843 | jump = jit_bnei (reg, V0); | |
1844 | jit_movr (JIT_V1, reg); | |
1845 | jit_patch (jump); | |
1846 | jit_unget_reg (reg); | |
1847 | @end example | |
1848 | ||
4a71579b PC |
1849 | @node Customizations |
1850 | @chapter Customizations | |
1851 | ||
1852 | Frequently it is desirable to have more control over how code is | |
1853 | generated or how memory is used during jit generation or execution. | |
1854 | ||
1855 | @section Memory functions | |
1856 | To aid in complete control of memory allocation and deallocation | |
1857 | @lightning{} provides wrappers that default to standard @code{malloc}, | |
1858 | @code{realloc} and @code{free}. These are loosely based on the | |
1859 | GNU GMP counterparts, with the difference that they use the same | |
1860 | prototype of the system allocation functions, that is, no @code{size} | |
1861 | for @code{free} or @code{old_size} for @code{realloc}. | |
1862 | ||
1863 | @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *)) | |
1864 | @lightning{} guarantees that memory is only allocated or released | |
1865 | using these wrapped functions, but you must note that if lightning | |
1866 | was linked to GNU binutils, malloc is probably will be called multiple | |
1867 | times from there when initializing the disassembler. | |
1868 | ||
1869 | Because @code{init_jit} may call memory functions, if you need to call | |
1870 | @code{jit_set_memory_functions}, it must be called before @code{init_jit}, | |
1871 | otherwise, when calling @code{finish_jit}, a pointer allocated with the | |
1872 | previous or default wrappers will be passed. | |
1873 | @end deftypefun | |
1874 | ||
1875 | @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *)) | |
1876 | Get the current memory allocation function. Also, unlike the GNU GMP | |
1877 | counterpart, it is an error to pass @code{NULL} pointers as arguments. | |
1878 | @end deftypefun | |
1879 | ||
79bfeef6 PC |
1880 | @section Protection |
1881 | Unless an alternate code buffer is used (see below), @code{jit_emit} | |
1882 | set the access protections that the code buffer's memory can be read and | |
1883 | executed, but not modified. One can use the following functions after | |
1884 | @code{jit_emit} but before @code{jit_clear} to temporarily lift the | |
1885 | protection: | |
1886 | ||
1887 | @deftypefun void jit_unprotect () | |
1888 | Changes the access protection that the code buffer's memory can be read and | |
1889 | modified. Before the emitted code can be invoked, @code{jit_protect} | |
1890 | has to be called to reset the change. | |
1891 | ||
1892 | This procedure has no effect when an alternate code buffer (see below) is used. | |
1893 | @end deftypefun | |
1894 | ||
1895 | @deftypefun void jit_protect () | |
1896 | Changes the access protection that the code buffer's memory can be read and | |
1897 | executed. | |
1898 | ||
1899 | This procedure has no effect when an alternate code buffer (see below) is used. | |
1900 | @end deftypefun | |
1901 | ||
4a71579b PC |
1902 | @section Alternate code buffer |
1903 | To instruct @lightning{} to use an alternate code buffer it is required | |
1904 | to call @code{jit_realize} before @code{jit_emit}, and then query states | |
1905 | and customize as appropriate. | |
1906 | ||
1907 | @deftypefun void jit_realize () | |
1908 | Must be called once, before @code{jit_emit}, to instruct @lightning{} | |
1909 | that no other @code{jit_xyz} call will be made. | |
1910 | @end deftypefun | |
1911 | ||
1912 | @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size}) | |
1913 | Returns NULL or the previous value set with @code{jit_set_code}, and | |
1914 | sets the @var{code_size} argument to an appropriate value. | |
1915 | If @code{jit_get_code} is called before @code{jit_emit}, the | |
1916 | @var{code_size} argument is set to the expected amount of bytes | |
1917 | required to generate code. | |
1918 | If @code{jit_get_code} is called after @code{jit_emit}, the | |
1919 | @var{code_size} argument is set to the exact amount of bytes used | |
1920 | by the code. | |
1921 | @end deftypefun | |
1922 | ||
1923 | @deftypefun void jit_set_code (jit_ponter_t @var{code}, jit_word_t @var{size}) | |
1924 | Instructs @lightning{} to output to the @var{code} argument and | |
1925 | use @var{size} as a guard to not write to invalid memory. If during | |
1926 | @code{jit_emit} @lightning{} finds out that the code would not fit | |
1927 | in @var{size} bytes, it halts code emit and returns @code{NULL}. | |
1928 | @end deftypefun | |
1929 | ||
1930 | A simple example of a loop using an alternate buffer is: | |
1931 | ||
1932 | @example | |
1933 | jit_uint8_t *code; | |
1934 | int *(func)(int); @rem{/* function pointer */} | |
1935 | jit_word_t code_size; | |
1936 | jit_word_t real_code_size; | |
1937 | @rem{...} | |
1938 | jit_realize(); @rem{/* ready to generate code */} | |
1939 | jit_get_code(&code_size); @rem{/* get expected code size */} | |
1940 | code_size = (code_size + 4095) & -4096; | |
1941 | do (;;) @{ | |
1942 | code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE, | |
1943 | MAP_PRIVATE | MAP_ANON, -1, 0); | |
1944 | jit_set_code(code, code_size); | |
1945 | if ((func = jit_emit()) == NULL) @{ | |
1946 | munmap(code, code_size); | |
1947 | code_size += 4096; | |
1948 | @} | |
1949 | @} while (func == NULL); | |
1950 | jit_get_code(&real_code_size); @rem{/* query exact size of the code */} | |
1951 | @end example | |
1952 | ||
1953 | The first call to @code{jit_get_code} should return @code{NULL} and set | |
1954 | the @code{code_size} argument to the expected amount of bytes required | |
1955 | to emit code. | |
1956 | The second call to @code{jit_get_code} is after a successful call to | |
1957 | @code{jit_emit}, and will return the value previously set with | |
1958 | @code{jit_set_code} and set the @code{real_code_size} argument to the | |
1959 | exact amount of bytes used to emit the code. | |
1960 | ||
1961 | @section Alternate data buffer | |
1962 | Sometimes it may be desirable to customize how, or to prevent | |
1963 | @lightning{} from using an extra buffer for constants or debug | |
1964 | annotation. Usually when also using an alternate code buffer. | |
1965 | ||
1966 | @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size}) | |
1967 | Returns @code{NULL} or the previous value set with @code{jit_set_data}, | |
1968 | and sets the @var{data_size} argument to how many bytes are required | |
1969 | for the constants data buffer, and @var{note_size} to how many bytes | |
1970 | are required to store the debug note information. | |
1971 | Note that it always preallocate one debug note entry even if | |
1972 | @code{jit_name} or @code{jit_note} are never called, but will return | |
1973 | zero in the @var{data_size} argument if no constant is required; | |
1974 | constants are only used for the @code{float} and @code{double} operations | |
1975 | that have an immediate argument, and not in all @lightning{} ports. | |
1976 | @end deftypefun | |
1977 | ||
1978 | @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags}) | |
1979 | ||
1980 | @var{data} can be NULL if disabling constants and annotations, otherwise, | |
1981 | a valid pointer must be passed. An assertion is done that the data will | |
1982 | fit in @var{size} bytes (but that is a noop if @lightning{} was built | |
1983 | with @code{-DNDEBUG}). | |
1984 | ||
1985 | @var{size} tells the space in bytes available in @var{data}. | |
1986 | ||
1987 | @var{flags} can be zero to tell to just use the alternate data buffer, | |
1988 | or a composition of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE} | |
1989 | ||
1990 | @table @t | |
1991 | @item JIT_DISABLE_DATA | |
1992 | @cindex JIT_DISABLE_DATA | |
1993 | Instructs @lightning{} to not use a constant table, but to use an | |
1994 | alternate method to synthesize those, usually with a larger code | |
1995 | sequence using stack space to transfer the value from a GPR to a | |
1996 | FPR register. | |
1997 | ||
1998 | @item JIT_DISABLE_NOTE | |
1999 | @cindex JIT_DISABLE_NOTE | |
2000 | Instructs @lightning{} to not store file or function name, and | |
2001 | line numbers in the constant buffer. | |
2002 | @end table | |
2003 | @end deftypefun | |
2004 | ||
2005 | A simple example of a preventing usage of a data buffer is: | |
2006 | ||
2007 | @example | |
2008 | @rem{...} | |
2009 | jit_realize(); @rem{/* ready to generate code */} | |
2010 | jit_get_data(NULL, NULL); | |
2011 | jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE); | |
2012 | @rem{...} | |
2013 | @end example | |
2014 | ||
2015 | Or to only use a data buffer, if required: | |
2016 | ||
2017 | @example | |
2018 | jit_uint8_t *data; | |
2019 | jit_word_t data_size; | |
2020 | @rem{...} | |
2021 | jit_realize(); @rem{/* ready to generate code */} | |
2022 | jit_get_data(&data_size, NULL); | |
2023 | if (data_size) | |
2024 | data = malloc(data_size); | |
2025 | else | |
2026 | data = NULL; | |
2027 | jit_set_data(data, data_size, JIT_DISABLE_NOTE); | |
2028 | @rem{...} | |
2029 | if (data) | |
2030 | free(data); | |
2031 | @rem{...} | |
2032 | @end example | |
2033 | ||
2034 | @node Acknowledgements | |
2035 | @chapter Acknowledgements | |
2036 | ||
2037 | As far as I know, the first general-purpose portable dynamic code | |
2038 | generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting. | |
2039 | Further work by Dawson R. Engler resulted in the @sc{vcode} system; | |
2040 | unlike @sc{dcg}, @sc{vcode} used no intermediate representation and | |
2041 | directly inspired @lightning{}. | |
2042 | ||
2043 | Thanks go to Ian Piumarta, who kindly accepted to release his own | |
2044 | program @sc{ccg} under the GNU General Public License, thereby allowing | |
2045 | @lightning{} to use the run-time assemblers he had wrote for @sc{ccg}. | |
2046 | @sc{ccg} provides a way of dynamically assemble programs written in the | |
2047 | underlying architecture's assembly language. So it is not portable, | |
2048 | yet very interesting. | |
2049 | ||
2050 | I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{} | |
2051 | was first developed as a tool to be used in GNU Smalltalk's dynamic | |
2052 | translator from bytecodes to native code. |