| 1 | @ifnottex |
| 2 | @dircategory Software development |
| 3 | @direntry |
| 4 | * lightning: (lightning). Library for dynamic code generation. |
| 5 | @end direntry |
| 6 | @end ifnottex |
| 7 | |
| 8 | @ifnottex |
| 9 | @node Top |
| 10 | @top @lightning{} |
| 11 | |
| 12 | @iftex |
| 13 | @macro comma |
| 14 | @verbatim{|,|} |
| 15 | @end macro |
| 16 | @end iftex |
| 17 | |
| 18 | @ifnottex |
| 19 | @macro comma |
| 20 | @verb{|,|} |
| 21 | @end macro |
| 22 | @end ifnottex |
| 23 | |
| 24 | This document describes @value{TOPIC} the @lightning{} library for |
| 25 | dynamic code generation. |
| 26 | |
| 27 | @menu |
| 28 | * Overview:: What GNU lightning is |
| 29 | * Installation:: Configuring and installing GNU lightning |
| 30 | * The instruction set:: The RISC instruction set used in GNU lightning |
| 31 | * GNU lightning examples:: GNU lightning's examples |
| 32 | * Reentrancy:: Re-entrant usage of GNU lightning |
| 33 | * Registers:: Accessing the whole register file |
| 34 | * Customizations:: Advanced code generation customizations |
| 35 | * Acknowledgements:: Acknowledgements for GNU lightning |
| 36 | @end menu |
| 37 | @end ifnottex |
| 38 | |
| 39 | @node Overview |
| 40 | @chapter Introduction to @lightning{} |
| 41 | |
| 42 | @iftex |
| 43 | This document describes @value{TOPIC} the @lightning{} library for |
| 44 | dynamic code generation. |
| 45 | @end iftex |
| 46 | |
| 47 | Dynamic code generation is the generation of machine code |
| 48 | at runtime. It is typically used to strip a layer of interpretation |
| 49 | by allowing compilation to occur at runtime. One of the most |
| 50 | well-known applications of dynamic code generation is perhaps that |
| 51 | of interpreters that compile source code to an intermediate bytecode |
| 52 | form, which is then recompiled to machine code at run-time: this |
| 53 | approach effectively combines the portability of bytecode |
| 54 | representations with the speed of machine code. Another common |
| 55 | application of dynamic code generation is in the field of hardware |
| 56 | simulators and binary emulators, which can use the same techniques |
| 57 | to translate simulated instructions to the instructions of the |
| 58 | underlying machine. |
| 59 | |
| 60 | Yet other applications come to mind: for example, windowing |
| 61 | @dfn{bitblt} operations, matrix manipulations, and network packet |
| 62 | filters. Albeit very powerful and relatively well known within the |
| 63 | compiler community, dynamic code generation techniques are rarely |
| 64 | exploited to their full potential and, with the exception of the |
| 65 | two applications described above, have remained curiosities because |
| 66 | of their portability and functionality barriers: binary instructions |
| 67 | are generated, so programs using dynamic code generation must be |
| 68 | retargeted for each machine; in addition, coding a run-time code |
| 69 | generator is a tedious and error-prone task more than a difficult one. |
| 70 | |
| 71 | @lightning{} provides a portable, fast and easily retargetable dynamic |
| 72 | code generation system. |
| 73 | |
| 74 | To be portable, @lightning{} abstracts over current architectures' |
| 75 | quirks and unorthogonalities. The interface that it exposes is that |
| 76 | of a standardized RISC architecture loosely based on the SPARC and MIPS |
| 77 | chips. There are a few general-purpose registers (six, not including |
| 78 | those used to receive and pass parameters between subroutines), and |
| 79 | arithmetic operations involve three operands---either three registers |
| 80 | or two registers and an arbitrarily sized immediate value. |
| 81 | |
| 82 | On one hand, this architecture is general enough that it is possible to |
| 83 | generate pretty efficient code even on CISC architectures such as the |
| 84 | Intel x86 or the Motorola 68k families. On the other hand, it matches |
| 85 | real architectures closely enough that, most of the time, the |
| 86 | compiler's constant folding pass ends up generating code which |
| 87 | assembles machine instructions without further tests. |
| 88 | |
| 89 | @node Installation |
| 90 | @chapter Configuring and installing @lightning{} |
| 91 | |
| 92 | Here we will assume that your system already has the dependencies |
| 93 | necessary to build @lightning{}. For more on dependencies, see |
| 94 | @lightning{}'s @file{README-hacking} file. |
| 95 | |
| 96 | The first thing to do to build @lightning{} is to configure the |
| 97 | program, picking the set of macros to be used on the host |
| 98 | architecture; this configuration is automatically performed by |
| 99 | the @file{configure} shell script; to run it, merely type: |
| 100 | @example |
| 101 | ./configure |
| 102 | @end example |
| 103 | |
| 104 | The @file{configure} script accepts the @code{--enable-disassembler} option, |
| 105 | that enables linking to GNU binutils and optionally prints human readable |
| 106 | disassembly of the jit code. This option can be disabled by the |
| 107 | @code{--disable-disassembler} option. |
| 108 | |
| 109 | @file{configure} also accepts the @code{--enable-devel-disassembler} |
| 110 | option, useful to check exactly what machine instructions were generated |
| 111 | for a @lightning{} instruction. It basically mixes @code{jit_print} and |
| 112 | @code{jit_disassembly}. |
| 113 | |
| 114 | The @code{--enable-assertions} option enables several consistency |
| 115 | checks in the run-time assemblers. These are not usually needed, so you |
| 116 | can decide to simply forget about it; also remember that these consistency |
| 117 | checks tend to slow down your code generator. |
| 118 | |
| 119 | The @code{--enable-devel-strong-type-checking} option does extra type |
| 120 | checking using @code{assert}. This option also enables the |
| 121 | @code{--enable-assertions} unless it is explicitly disabled. |
| 122 | |
| 123 | The option @code{--enable-devel-get-jit-size} should only be used |
| 124 | when doing updates or maintenance to @lightning{}. It regenerates the |
| 125 | @code{jit_$ARCH-sz.c} file, creating a table of maximum byte usage when |
| 126 | translating a @lightning{} instruction to machine code. |
| 127 | |
| 128 | After you've configured @lightning{}, run @file{make} as usual. |
| 129 | |
| 130 | @lightning{} has an extensive set of tests to validate that it is working |
| 131 | correctly on the build host. To test it, run: |
| 132 | @example |
| 133 | make check |
| 134 | @end example |
| 135 | |
| 136 | The next important step is: |
| 137 | @example |
| 138 | make install |
| 139 | @end example |
| 140 | |
| 141 | This ends the process of installing @lightning{}. |
| 142 | |
| 143 | @node The instruction set |
| 144 | @chapter @lightning{}'s instruction set |
| 145 | |
| 146 | @lightning{}'s instruction set was designed by deriving instructions |
| 147 | that closely match those of most existing RISC architectures, or |
| 148 | that can be easily synthesized if absent. Each instruction is composed |
| 149 | of: |
| 150 | @itemize @bullet |
| 151 | @item |
| 152 | an operation, like @code{sub} or @code{mul} |
| 153 | |
| 154 | @item |
| 155 | most times, a register/immediate flag (@code{r} or @code{i}) |
| 156 | |
| 157 | @item |
| 158 | an unsigned modifier (@code{u}), a type identifier or two, when applicable. |
| 159 | @end itemize |
| 160 | |
| 161 | Examples of legal mnemonics are @code{addr} (integer add, with three |
| 162 | register operands) and @code{muli} (integer multiply, with two |
| 163 | register operands and an immediate operand). Each instruction takes |
| 164 | two or three operands; in most cases, one of them can be an immediate |
| 165 | value instead of a register. |
| 166 | |
| 167 | Most @lightning{} integer operations are signed wordsize operations, |
| 168 | with the exception of operations that convert types, or load or store |
| 169 | values to/from memory. When applicable, the types and C types are as |
| 170 | follows: |
| 171 | |
| 172 | @example |
| 173 | _c @r{signed char} |
| 174 | _uc @r{unsigned char} |
| 175 | _s @r{short} |
| 176 | _us @r{unsigned short} |
| 177 | _i @r{int} |
| 178 | _ui @r{unsigned int} |
| 179 | _l @r{long} |
| 180 | _f @r{float} |
| 181 | _d @r{double} |
| 182 | @end example |
| 183 | |
| 184 | Most integer operations do not need a type modifier, and when loading or |
| 185 | storing values to memory there is an alias to the proper operation |
| 186 | using wordsize operands, that is, if omitted, the type is @r{int} on |
| 187 | 32-bit architectures and @r{long} on 64-bit architectures. Note |
| 188 | that lightning also expects @code{sizeof(void*)} to match the wordsize. |
| 189 | |
| 190 | When an unsigned operation result differs from the equivalent signed |
| 191 | operation, there is the @code{_u} modifier. |
| 192 | |
| 193 | There are at least seven integer registers, of which six are |
| 194 | general-purpose, while the last is used to contain the frame pointer |
| 195 | (@code{FP}). The frame pointer can be used to allocate and access local |
| 196 | variables on the stack, using the @code{allocai} or @code{allocar} |
| 197 | instruction. |
| 198 | |
| 199 | Of the general-purpose registers, at least three are guaranteed to be |
| 200 | preserved across function calls (@code{V0}, @code{V1} and |
| 201 | @code{V2}) and at least three are not (@code{R0}, @code{R1} and |
| 202 | @code{R2}). Six registers are not very much, but this |
| 203 | restriction was forced by the need to target CISC architectures |
| 204 | which, like the x86, are poor in registers; anyway, backends can |
| 205 | specify the actual number of available registers with the macros |
| 206 | @code{JIT_R_NUM} (for caller-save registers) and @code{JIT_V_NUM} |
| 207 | (for callee-save registers). |
| 208 | |
| 209 | There are at least six floating-point registers, named @code{F0} to |
| 210 | @code{F5}. These are usually caller-save and are separate from the integer |
| 211 | registers on the supported architectures; on Intel architectures, |
| 212 | in 32 bit mode if SSE2 is not available or use of X87 is forced, |
| 213 | the register stack is mapped to a flat register file. As for the |
| 214 | integer registers, the macro @code{JIT_F_NUM} yields the number of |
| 215 | floating-point registers. |
| 216 | |
| 217 | The complete instruction set follows; as you can see, most non-memory |
| 218 | operations only take integers (either signed or unsigned) as operands; |
| 219 | this was done in order to reduce the instruction set, and because most |
| 220 | architectures only provide word and long word operations on registers. |
| 221 | There are instructions that allow operands to be extended to fit a larger |
| 222 | data type, both in a signed and in an unsigned way. |
| 223 | |
| 224 | @table @b |
| 225 | @item Binary ALU operations |
| 226 | These accept three operands; the last one can be an immediate. |
| 227 | @code{addx} operations must directly follow @code{addc}, and |
| 228 | @code{subx} must follow @code{subc}; otherwise, results are undefined. |
| 229 | Most, if not all, architectures do not support @r{float} or @r{double} |
| 230 | immediate operands; lightning emulates those operations by moving the |
| 231 | immediate to a temporary register and emitting the call with only |
| 232 | register operands. |
| 233 | @example |
| 234 | addr _f _d O1 = O2 + O3 |
| 235 | addi _f _d O1 = O2 + O3 |
| 236 | addxr O1 = O2 + (O3 + carry) |
| 237 | addxi O1 = O2 + (O3 + carry) |
| 238 | addcr O1 = O2 + O3, set carry |
| 239 | addci O1 = O2 + O3, set carry |
| 240 | subr _f _d O1 = O2 - O3 |
| 241 | subi _f _d O1 = O2 - O3 |
| 242 | subxr O1 = O2 - (O3 + carry) |
| 243 | subxi O1 = O2 - (O3 + carry) |
| 244 | subcr O1 = O2 - O3, set carry |
| 245 | subci O1 = O2 - O3, set carry |
| 246 | rsbr _f _d O1 = O3 - O2 |
| 247 | rsbi _f _d O1 = O3 - O2 |
| 248 | mulr _f _d O1 = O2 * O3 |
| 249 | muli _f _d O1 = O2 * O3 |
| 250 | hmulr _u O1 = ((O2 * O3) >> WORDSIZE) |
| 251 | hmuli _u O1 = ((O2 * O3) >> WORDSIZE) |
| 252 | divr _u _f _d O1 = O2 / O3 |
| 253 | divi _u _f _d O1 = O2 / O3 |
| 254 | remr _u O1 = O2 % O3 |
| 255 | remi _u O1 = O2 % O3 |
| 256 | andr O1 = O2 & O3 |
| 257 | andi O1 = O2 & O3 |
| 258 | orr O1 = O2 | O3 |
| 259 | ori O1 = O2 | O3 |
| 260 | xorr O1 = O2 ^ O3 |
| 261 | xori O1 = O2 ^ O3 |
| 262 | lshr O1 = O2 << O3 |
| 263 | lshi O1 = O2 << O3 |
| 264 | rshr _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.} |
| 265 | rshi _u O1 = O2 >> O3@footnote{The sign bit is propagated unless using the @code{_u} modifier.} |
| 266 | lrotr O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3)) |
| 267 | lroti O1 = (O2 << O3) | (O2 >> (WORDSIZE - O3)) |
| 268 | rrotr O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3)) |
| 269 | rroti O1 = (O2 >> O3) | (O2 << (WORDSIZE - O3)) |
| 270 | movzr O1 = O3 ? O1 : O2 |
| 271 | movnr O1 = O3 ? O2 : O1 |
| 272 | @end example |
| 273 | |
| 274 | Note that @code{lrotr}, @code{lroti}, @code{rrotr} and @code{rroti} |
| 275 | are described as the fallback operation. These are bit shift/rotation |
| 276 | operations. |
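
A rotation feeds the bits shifted out of one end of the word back in at the other end. The following is a minimal sketch in portable C, not @lightning{} code, of what a rotation computes. The names @code{rotate_left}, @code{rotate_right} and @code{WORDSIZE} are hypothetical, and @code{uintptr_t} is only assumed to stand in for a word-sized operand.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: uintptr_t has the same width as a machine word. */
#define WORDSIZE ((unsigned)(sizeof(uintptr_t) * CHAR_BIT))

/* Rotate left: bits shifted out at the top re-enter at the bottom. */
static uintptr_t rotate_left(uintptr_t value, unsigned count)
{
    assert(count < WORDSIZE);
    if (count == 0)             /* a C shift by WORDSIZE would be undefined */
        return value;
    return (value << count) | (value >> (WORDSIZE - count));
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
static uintptr_t rotate_right(uintptr_t value, unsigned count)
{
    assert(count < WORDSIZE);
    if (count == 0)
        return value;
    return (value >> count) | (value << (WORDSIZE - count));
}
```

Rotating right and then left by the same count gives back the original value, which is a convenient sanity check when comparing a port against this model.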
| 277 | |
| 278 | @item Four operand binary ALU operations |
| 279 | These accept two result registers, and two operands; the last one can |
| 280 | be an immediate. The first two arguments cannot be the same register. |
| 281 | |
| 282 | @code{qmul} stores the low word of the result in @code{O1} and the |
| 283 | high word in @code{O2}. For unsigned multiplication, @code{O2} zero |
| 284 | means there was no overflow. For signed multiplication, the overflow |
| 285 | check depends on the sign: there was no overflow if @code{O2} is zero or |
| 286 | minus one. |
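
The overflow test on the high word can be modelled in plain C. This is only a sketch under stated assumptions: it uses a hypothetical 32-bit word so that the full product fits in a standard 64-bit type, and the names @code{widening_mul32} and @code{fits_in_word} are illustrative, not part of @lightning{}.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a signed widening multiply on an assumed 32-bit word:
   *low receives the low word of the product, *high the high word.
   Narrowing to int32_t relies on the usual two's complement wraparound. */
static void widening_mul32(int32_t *low, int32_t *high, int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;

    assert(low != high);        /* the two results must be distinct */
    *low  = (int32_t)(uint32_t)product;
    *high = (int32_t)(uint32_t)((uint64_t)product >> 32);
}

/* Nonzero when the signed product fits in one word, that is, when the
   high word is just the sign extension of the low word (0 or -1). */
static int fits_in_word(int32_t low, int32_t high)
{
    return high == (low < 0 ? -1 : 0);
}
```

For example, multiplying 3 by -4 leaves -1 in the high word and the product fits, while multiplying 65536 by 65536 leaves 1 in the high word and does not.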
| 287 | |
| 288 | @code{qdiv} stores the quotient in @code{O1} and the remainder in |
| 289 | @code{O2}. It can be used as a quick way to check if a division is |
| 290 | exact, in which case the remainder is zero. |
| 291 | |
| 292 | @code{qlsh} shifts from 0 to @emph{wordsize}, doing a normal left |
| 293 | shift for the first result register and setting the second result |
| 294 | register to the overflow bits. @code{qlsh} can be used as a quick |
| 295 | way to multiply by powers of two. |
| 296 | |
| 297 | @code{qrsh} shifts from 0 to @emph{wordsize}, doing a normal right |
| 298 | shift for the first result register and setting the second result |
| 299 | register to the overflow bits. @code{qrsh} can be used as a quick |
| 300 | way to divide by powers of two. |
| 301 | |
| 302 | Note that @code{qlsh} and @code{qrsh} are basically implemented as |
| 303 | two shifts. It is undefined behavior to pass a value not in the range |
| 304 | 0 to @emph{wordsize}. Most cpus will usually @code{and} the shift |
| 305 | amount with @emph{wordsize} - 1, or possibly use the @emph{remainder}. |
| 306 | @lightning{} only generates code to specially handle 0 and @emph{wordsize} |
| 307 | shifts. Since a code generator for a @emph{safe language} should |
| 308 | usually check the shift amount, these instructions should usually be |
| 309 | used as a fast path to check for division without remainder or |
| 310 | multiplication that does not overflow. |
| 311 | |
| 312 | @example |
| 313 | qmulr _u O1 O2 = O3 * O4 |
| 314 | qmuli _u O1 O2 = O3 * O4 |
| 315 | qdivr _u O1 O2 = O3 / O4 |
| 316 | qdivi _u O1 O2 = O3 / O4 |
| 317 | qlshr _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4) |
| 318 | qlshi _u O1 = O3 << O4, O2 = O3 >> (WORDSIZE - O4) |
| 319 | qrshr _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4) |
| 320 | qrshi _u O1 = O3 >> O4, O2 = O3 << (WORDSIZE - O4) |
| 321 | @end example |
| 322 | |
| 323 | These four operand ALU operations are only defined for float operands. |
| 324 | |
| 325 | @example |
| 326 | fmar _f _d O1 = O2 * O3 + O4 |
| 327 | fmai _f _d O1 = O2 * O3 + O4 |
| 328 | fmsr _f _d O1 = O2 * O3 - O4 |
| 329 | fmsi _f _d O1 = O2 * O3 - O4 |
| 330 | fnmar _f _d O1 = -O2 * O3 - O4 |
| 331 | fnmai _f _d O1 = -O2 * O3 - O4 |
| 332 | fnmsr _f _d O1 = -O2 * O3 + O4 |
| 333 | fnmsi _f _d O1 = -O2 * O3 + O4 |
| 334 | @end example |
| 335 | |
| 336 | These are a family of fused multiply-add instructions. |
| 337 | Note that @lightning{} does not handle rounding modes nor math exceptions. |
| 338 | Also note that not all backends provide an instruction for the equivalent |
| 339 | @lightning{} instruction presented above. Some are completely implemented |
| 340 | as fallbacks and some are composed of one or more instructions. For common |
| 341 | input this should not cause major issues, but note that when implemented by |
| 342 | the cpu, these are implemented as the multiplication calculated with infinite |
| 343 | precision, and after the addition step rounding is done. Due to this, for |
| 344 | specially crafted input different ports might show different output. When |
| 345 | implemented by the CPU, it is also possible to have exceptions that do |
| 346 | not happen if implemented as a fallback. |
| 347 | |
| 348 | @item Unary ALU operations |
| 349 | These accept two operands, the first must be a register and the |
| 350 | second is a register if the @code{r} modifier is used, otherwise, |
| 351 | the @code{i} modifier is used and the second argument is a constant. |
| 352 | |
| 353 | @example |
| 354 | negr _f _d O1 = -O2 |
| 355 | negi _f _d O1 = -O2 |
| 356 | comr O1 = ~O2 |
| 357 | comi O1 = ~O2 |
| 358 | clor O1 = number of leading one bits in O2 |
| 359 | cloi O1 = number of leading one bits in O2 |
| 360 | clzr O1 = number of leading zero bits in O2 |
| 361 | clzi O1 = number of leading zero bits in O2 |
| 362 | ctor O1 = number of trailing one bits in O2 |
| 363 | ctoi O1 = number of trailing one bits in O2 |
| 364 | ctzr O1 = number of trailing zero bits in O2 |
| 365 | ctzi O1 = number of trailing zero bits in O2 |
| 366 | rbitr O1 = bits of O2 reversed |
| 367 | rbiti O1 = bits of O2 reversed |
| 368 | popcntr O1 = number of bits set in O2 |
| 369 | popcnti O1 = number of bits set in O2 |
| 370 | @end example |
| 371 | |
| 372 | Note that @code{ctzr} is basically equivalent to a @code{C} call to |
| 373 | @code{ffs} but indexed at bit zero, not one. |
| 374 | |
| 375 | Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input |
| 376 | value of zero is not an error; it just returns the number of bits |
| 377 | in a word, 64 if @lightning{} generates 64 bit instructions, otherwise |
| 378 | it returns 32. |
| 379 | |
| 380 | The @code{clor} and @code{ctor} are just counterparts of the versions |
| 381 | that search for zero bits. |
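
The zero-input behaviour described above can be modelled with simple bit loops. This is a hedged sketch in portable C rather than what any backend emits; the function names are hypothetical, and the @code{word_bits} parameter is an assumption used to model both 32-bit and 64-bit code generation.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a word that is word_bits wide.
   A zero input yields word_bits instead of being an error. */
static unsigned count_leading_zeros(uint64_t value, unsigned word_bits)
{
    unsigned count = 0;

    assert(word_bits == 32 || word_bits == 64);
    while (count < word_bits &&
           ((value >> (word_bits - 1 - count)) & 1) == 0)
        count++;
    return count;
}

/* Count trailing zero bits; again a zero input yields word_bits. */
static unsigned count_trailing_zeros(uint64_t value, unsigned word_bits)
{
    unsigned count = 0;

    assert(word_bits == 32 || word_bits == 64);
    while (count < word_bits && ((value >> count) & 1) == 0)
        count++;
    return count;
}
```

For a nonzero input, the trailing zero count is the zero-based index of the lowest set bit, which is one less than what @code{ffs} reports.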
| 382 | |
| 383 | These unary ALU operations are only defined for float operands. |
| 384 | |
| 385 | @example |
| 386 | absr _f _d O1 = fabs(O2) |
| 387 | absi _f _d O1 = fabs(O2) |
| 388 | sqrtr _f _d O1 = sqrt(O2) |
| 389 | sqrti _f _d O1 = sqrt(O2) |
| 390 | @end example |
| 391 | |
| 392 | Note that for @code{float} and @code{double} unary operations, @lightning{} |
| 393 | will generate code to actually execute the operation at runtime. |
| 394 | |
| 395 | @item Compare instructions |
| 396 | These accept three operands; again, the last can be an immediate. |
| 397 | The last two operands are compared, and the first operand, that must be |
| 398 | an integer register, is set to either 0 or 1, according to whether the |
| 399 | given condition was met or not. |
| 400 | |
| 401 | The conditions given below are for the standard behavior of C, |
| 402 | where the ``unordered'' comparison result is mapped to false. |
| 403 | |
| 404 | @example |
| 405 | ltr _u _f _d O1 = (O2 < O3) |
| 406 | lti _u _f _d O1 = (O2 < O3) |
| 407 | ler _u _f _d O1 = (O2 <= O3) |
| 408 | lei _u _f _d O1 = (O2 <= O3) |
| 409 | gtr _u _f _d O1 = (O2 > O3) |
| 410 | gti _u _f _d O1 = (O2 > O3) |
| 411 | ger _u _f _d O1 = (O2 >= O3) |
| 412 | gei _u _f _d O1 = (O2 >= O3) |
| 413 | eqr _f _d O1 = (O2 == O3) |
| 414 | eqi _f _d O1 = (O2 == O3) |
| 415 | ner _f _d O1 = (O2 != O3) |
| 416 | nei _f _d O1 = (O2 != O3) |
| 417 | unltr _f _d O1 = !(O2 >= O3) |
| 418 | unler _f _d O1 = !(O2 > O3) |
| 419 | ungtr _f _d O1 = !(O2 <= O3) |
| 420 | unger _f _d O1 = !(O2 < O3) |
| 421 | uneqr _f _d O1 = !(O2 < O3) && !(O2 > O3) |
| 422 | ltgtr _f _d O1 = !(O2 >= O3) || !(O2 <= O3) |
| 423 | ordr _f _d O1 = (O2 == O2) && (O3 == O3) |
| 424 | unordr _f _d O1 = (O2 != O2) || (O3 != O3) |
| 425 | @end example |
| 426 | |
| 427 | @item Transfer operations |
| 428 | These accept two operands; for @code{ext} both of them must be |
| 429 | registers, while @code{mov} accepts an immediate value as the second |
| 430 | operand. |
| 431 | |
| 432 | Unlike @code{movr} and @code{movi}, the other instructions are used |
| 433 | to truncate a wordsize operand to a smaller integer data type or to |
| 434 | convert float data types. You can also use @code{extr} to convert an |
| 435 | integer to a floating point value: the usual options are @code{extr_f} |
| 436 | and @code{extr_d}. |
| 437 | |
| 438 | @example |
| 439 | movr _f _d O1 = O2 |
| 440 | movi _f _d O1 = O2 |
| 441 | extr _c _uc _s _us _i _ui _f _d O1 = O2 |
| 442 | truncr _f _d O1 = trunc(O2) |
| 443 | extr O1 = sign_extend(O2[O3:O3+O4]) |
| 444 | extr_u O1 = O2[O3:O3+O4] |
| 445 | depr O1[O3:O3+O4] = O2 |
| 446 | @end example |
| 447 | |
| 448 | @code{extr}, @code{extr_u} and @code{depr} are useful to access @code{C} |
| 449 | compatible bit fields, provided that these are contained in a machine |
| 450 | word. @code{extr} is used to @emph{extract} and sign extend a value |
| 451 | from a bit field. @code{extr_u} is used to @emph{extract} and zero |
| 452 | extend a value from a bit field. @code{depr} is used to @emph{deposit} |
| 453 | a value into a bit field. |
| 454 | |
| 455 | @example |
| 456 | extr(result, source, offset, length) |
| 457 | extr_u(result, source, offset, length) |
| 458 | depr(result, source, offset, length) |
| 459 | @end example |
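
What extract and deposit compute can be sketched with shifts and masks. The code below is an illustration in portable C, not the @lightning{} implementation: the function names are hypothetical, a 64-bit word is assumed, and bit offsets are assumed to count from the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64u   /* assumption: 64-bit word */

/* Mask with the low `length` bits set. */
static uint64_t field_mask(unsigned length)
{
    return length == WORD_BITS ? ~(uint64_t)0
                               : (((uint64_t)1 << length) - 1);
}

/* Extract a field and zero extend it. */
static uint64_t extract_unsigned(uint64_t source, unsigned offset,
                                 unsigned length)
{
    assert(length >= 1 && offset + length <= WORD_BITS);
    return (source >> offset) & field_mask(length);
}

/* Extract a field and sign extend it from its top bit. */
static int64_t extract_signed(uint64_t source, unsigned offset,
                              unsigned length)
{
    uint64_t field = extract_unsigned(source, offset, length);
    uint64_t sign = (uint64_t)1 << (length - 1);

    /* (field ^ sign) - sign copies the field's top bit upward. */
    return (int64_t)((field ^ sign) - sign);
}

/* Deposit the low `length` bits of value into target at offset. */
static uint64_t deposit(uint64_t target, uint64_t value, unsigned offset,
                        unsigned length)
{
    uint64_t mask;

    assert(length >= 1 && offset + length <= WORD_BITS);
    mask = field_mask(length) << offset;
    return (target & ~mask) | ((value << offset) & mask);
}
```

Depositing a value and extracting it again with the same offset and length returns the value truncated to the field width.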
| 460 | |
| 461 | A common way to declare @code{C} and @lightning{} compatible bit fields is: |
| 462 | @example |
| 463 | union @{ |
| 464 | struct @{ |
| 465 | jit_word_t signed_bits: @code{length}; |
| 466 | jit_uword_t unsigned_bits: @code{length}; |
| 467 | ... |
| 468 | @} s; |
| 469 | jit_word_t signed_value; |
| 470 | jit_uword_t unsigned_value; |
| 471 | @} u; |
| 472 | @end example |
| 473 | |
| 474 | In 64-bit architectures it may be required to use @code{truncr_f_i}, |
| 475 | @code{truncr_f_l}, @code{truncr_d_i} and @code{truncr_d_l} to match |
| 476 | the equivalent C code. Only the @code{_i} modifier is available in |
| 477 | 32-bit architectures. |
| 478 | |
| 479 | @example |
| 480 | truncr_f_i <int> O1 = <float> O2 |
| 481 | truncr_f_l <long>O1 = <float> O2 |
| 482 | truncr_d_i <int> O1 = <double>O2 |
| 483 | truncr_d_l <long>O1 = <double>O2 |
| 484 | @end example |
| 485 | |
| 486 | The float conversion operations are @emph{destination first, |
| 487 | source second}, but the order of the types is reversed. This happens |
| 488 | for historical reasons. |
| 489 | |
| 490 | @example |
| 491 | extr_f_d <double>O1 = <float> O2 |
| 492 | extr_d_f <float> O1 = <double>O2 |
| 493 | @end example |
| 494 | |
| 495 | The float to/from integer transfer operations are also @emph{destination |
| 496 | first, source second}. These were added later, but follow the |
| 497 | historic pattern. |
| 498 | |
| 499 | @example |
| 500 | movr_w_f <float>O1 = <int>O2 |
| 501 | movi_w_f <float>O1 = <int>O2 |
| 502 | movr_f_w <int>O1 = <float>O2 |
| 503 | movi_f_w <int>O1 = <float>O2 |
| 504 | movr_w_d <double>O1 = <long>O2 |
| 505 | movi_w_d <double>O1 = <long>O2 |
| 506 | movr_d_w <long>O1 = <double>O2 |
| 507 | movi_d_w <long>O1 = <double>O2 |
| 508 | movr_ww_d <double>O1 = [<int>O2:<int>O3] |
| 509 | movi_ww_d <double>O1 = [<int>O2:<int>O3] |
| 510 | movr_d_ww [<int>O1:<int>O2] = <double>O3 |
| 511 | movi_d_ww [<int>O1:<int>O2] = <double>O3 |
| 512 | @end example |
| 513 | |
| 514 | These are used to transfer bits to/from floats to/from integers, and are |
| 515 | useful to access bits of floating point values. |
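
In C, the same kind of bit transfer is usually written with @code{memcpy}, which copies the representation without converting the value. The sketch below only illustrates the idea; the helper names are hypothetical, and the sign bit position assumes IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the bits of a float into an integer of the same size. */
static uint32_t float_to_bits(float value)
{
    uint32_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Copy integer bits back into a float, again without conversion. */
static float bits_to_float(uint32_t bits)
{
    float value;

    assert(sizeof bits == sizeof value);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Example use: test the sign bit, which also distinguishes -0.0. */
static int sign_bit_set(float value)
{
    return (float_to_bits(value) >> 31) != 0;
}
```

This differs from a numeric conversion: the integer receives the encoding of the floating point value, not its truncated magnitude.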
| 516 | |
| 517 | @code{movr_w_d}, @code{movi_w_d}, @code{movr_d_w} and @code{movi_d_w} are |
| 518 | only available in 64-bit. Conversely, @code{movr_ww_d}, @code{movi_ww_d}, |
| 519 | @code{movr_d_ww} and @code{movi_d_ww} are only available in 32-bit. |
| 520 | For the int pair to/from double transfers, integer arguments must respect |
| 521 | endianness, to match how the cpu handles the verbatim byte values. |
| 522 | |
| 523 | @item Network extensions |
| 524 | These accept two operands, both of which must be registers; these |
| 525 | two instructions actually perform the same task, yet they are |
| 526 | assigned to two mnemonics for the sake of convenience and |
| 527 | completeness. As usual, the first operand is the destination and |
| 528 | the second is the source. |
| 529 | The @code{_ul} variant is only available in 64-bit architectures. |
| 530 | @example |
| 531 | htonr _us _ui _ul @r{Host-to-network (big endian) order} |
| 532 | ntohr _us _ui _ul @r{Network-to-host order } |
| 533 | @end example |
| 534 | |
| 535 | @code{bswapr} can be used to unconditionally byte-swap an operand. |
| 536 | On little-endian architectures, @code{htonr} and @code{ntohr} resolve |
| 537 | to this. |
| 538 | The @code{_ul} variant is only available in 64-bit architectures. |
| 539 | @example |
| 540 | bswapr _us _ui _ul O1 = byte_swap(O2) |
| 541 | @end example |
| 542 | |
| 543 | @item Load operations |
| 544 | @code{ld} accepts two operands while @code{ldx} accepts three; |
| 545 | in both cases, the last can be either a register or an immediate |
| 546 | value. Values are extended (with or without sign, according to |
| 547 | the data type specification) to fit a whole register. |
| 548 | The @code{_ui} and @code{_l} types are only available in 64-bit |
| 549 | architectures. For convenience, there is a version without a |
| 550 | type modifier for integer or pointer operands that uses the |
| 551 | appropriate wordsize call. |
| 552 | @example |
| 553 | ldr _c _uc _s _us _i _ui _l _f _d O1 = *O2 |
| 554 | ldi _c _uc _s _us _i _ui _l _f _d O1 = *O2 |
| 555 | ldxr _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3) |
| 556 | ldxi _c _uc _s _us _i _ui _l _f _d O1 = *(O2+O3) |
| 557 | @end example |
| 558 | |
| 559 | @item Store operations |
| 560 | @code{st} accepts two operands while @code{stx} accepts three; in |
| 561 | both cases, the first can be either a register or an immediate |
| 562 | value. Values are sign-extended to fit a whole register. |
| 563 | @example |
| 564 | str _c _s _i _l _f _d *O1 = O2 |
| 565 | sti _c _s _i _l _f _d *O1 = O2 |
| 566 | stxr _c _s _i _l _f _d *(O1+O2) = O3 |
| 567 | stxi _c _s _i _l _f _d *(O1+O2) = O3 |
| 568 | @end example |
| 569 | Note that the unsigned type modifier is not available, as the store |
| 570 | only writes 1, 2, 4 or 8 bytes to the memory address. |
| 571 | The @code{_l} type is only available in 64-bit architectures, and for |
| 572 | convenience, there is a version without a type modifier for integer or |
| 573 | pointer operands that uses the appropriate wordsize call. |
| 574 | |
| 575 | @item Unaligned memory access |
| 576 | These allow access to integers of size 3, in 32-bit, and extra sizes |
| 577 | 5, 6 and 7 in 64-bit. |
| 578 | For floating point values only support for size 4 and 8 is provided. |
| 579 | @example |
| 580 | unldr O1 = *(signed O3 byte integer)O2 |
| 581 | unldi O1 = *(signed O3 byte integer)O2 |
| 582 | unldr_u O1 = *(unsigned O3 byte integer)O2 |
| 583 | unldi_u O1 = *(unsigned O3 byte integer)O2 |
| 584 | unldr_x O1 = *(O3 byte float)O2 |
| 585 | unldi_x O1 = *(O3 byte float)O2 |
| 586 | unstr *(O3 byte integer)O1 = O2 |
| 587 | unsti *(O3 byte integer)O1 = O2 |
| 588 | unstr_x *(O3 byte float)O1 = O2 |
| 589 | unsti_x *(O3 byte float)O1 = O2 |
| 590 | @end example |
| 591 | With the exception of non standard sized integers, these might be |
| 592 | implemented as normal loads and stores if the processor supports |
| 593 | unaligned memory access. Alternatively, a mode can be chosen at jit |
| 594 | initialization time to generate, or not generate, code that traps on |
| 595 | unaligned memory access. Letting the kernel trap means smaller generated |
| 596 | code, as otherwise alignment must be checked at runtime@footnote{This requires changing jit_cpu.unaligned to 0 to disable or 1 to enable unaligned code generation. Not all ports have the C jit_cpu.unaligned value.}. |
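
A load or store of a non standard size can always be assembled one byte at a time, which never traps on alignment. The following portable C sketch illustrates the idea only; it is not how any @lightning{} port is implemented, the function names are hypothetical, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a signed integer of `size` bytes (1 to 8) from a possibly
   unaligned address, assuming little-endian byte order. */
static int64_t load_signed_le(const unsigned char *address, size_t size)
{
    uint64_t value = 0;
    uint64_t sign;
    size_t i;

    assert(size >= 1 && size <= 8);
    for (i = 0; i < size; i++)
        value |= (uint64_t)address[i] << (8 * i);
    sign = (uint64_t)1 << (8 * size - 1);
    /* Sign extend from the top bit of the loaded bytes. */
    return (int64_t)((value ^ sign) - sign);
}

/* Store the low `size` bytes of value, least significant byte first. */
static void store_le(unsigned char *address, uint64_t value, size_t size)
{
    size_t i;

    assert(size >= 1 && size <= 8);
    for (i = 0; i < size; i++)
        address[i] = (unsigned char)(value >> (8 * i));
}
```

A 3 byte store followed by a 3 byte signed load at an odd address returns the stored value, provided it fits in 24 bits.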
| 597 | |
| 598 | @item Argument management |
| 599 | These are: |
| 600 | @example |
| 601 | prepare (not specified) |
| 602 | va_start (not specified) |
| 603 | pushargr _c _uc _s _us _i _ui _l _f _d |
| 604 | pushargi _c _uc _s _us _i _ui _l _f _d |
| 605 | va_push (not specified) |
| 606 | arg _c _uc _s _us _i _ui _l _f _d |
| 607 | getarg _c _uc _s _us _i _ui _l _f _d |
| 608 | va_arg _d |
| 609 | putargr _c _uc _s _us _i _ui _l _f _d |
| 610 | putargi _c _uc _s _us _i _ui _l _f _d |
| 611 | ret (not specified) |
| 612 | retr _c _uc _s _us _i _ui _l _f _d |
| 613 | reti _c _uc _s _us _i _ui _l _f _d |
| 614 | reti _f _d |
| 615 | va_end (not specified) |
| 616 | retval _c _uc _s _us _i _ui _l _f _d |
| 617 | epilog (not specified) |
| 618 | @end example |
| 619 | As with other operations that use a type modifier, the @code{_ui} and |
| 620 | @code{_l} types are only available in 64-bit architectures, but there |
| 621 | are operations without a type modifier that alias to the appropriate |
| 622 | integer operation with wordsize operands. |
| 623 | |
| 624 | @code{prepare}, @code{pusharg}, and @code{retval} are used by the caller, |
| 625 | while @code{arg}, @code{getarg} and @code{ret} are used by the callee. |
| 626 | A code snippet that wants to call another procedure and has to pass |
| 627 | arguments must, in order: use the @code{prepare} instruction and use |
| 628 | the @code{pushargr} or @code{pushargi} to push the arguments @strong{in |
| 629 | left to right order}; and use @code{finish} or @code{call} (explained below) |
| 630 | to perform the actual call. |
| 631 | |
| 632 | Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} when |
| 633 | handling integer types can be used without a type modifier. |
| 634 | It is suggested to use matching type modifiers with @code{arg}, @code{putarg} |
| 635 | and @code{getarg}; otherwise problems will happen when generating jit for |
| 636 | environments that require arguments to be truncated and zero or sign |
| 637 | extended by the caller and/or where excess arguments might be passed packed |
| 638 | on the stack. Currently only Apple systems with @code{aarch64} cpus are |
| 639 | known to have this restriction. |
| 640 | |
| 641 | @code{va_start} returns a @code{C} compatible @code{va_list}. To fetch |
| 642 | arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles. |
| 643 | @code{va_push} is required when passing a @code{va_list} to another function, |
| 644 | because not all architectures expect it as a single pointer. A known case |
| 645 | is DEC Alpha, which requires it as a structure passed by value. |
| 646 | |
| 647 | @code{arg}, @code{getarg} and @code{putarg} are used by the callee. |
| 648 | @code{arg} is different from other instructions in that it does not |
| 649 | actually generate any code: instead, it is a function which returns |
| 650 | a value to be passed to @code{getarg} or @code{putarg}. @footnote{``Return |
| 651 | a value'' means that the @lightning{} code that compiles these |
| 652 | instructions returns a value when expanded.} You should call |
| 653 | @code{arg} as soon as possible, before any function call or, more |
| 654 | easily, right after the @code{prolog} instructions |
| 655 | (which is treated later). |
| 656 | |
| 657 | @code{getarg} accepts a register argument and a value returned by |
| 658 | @code{arg}, and will move that argument to the register, extending |
| 659 | it (with or without sign, according to the data type specification) |
| 660 | to fit a whole register. These instructions are more intimately |
| 661 | related to the usage of the @lightning{} instruction set in code |
| 662 | that generates other code, so they will be treated more |
| 663 | specifically in @ref{GNU lightning examples, , Generating code at |
| 664 | run-time}. |
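As a rough illustration of the extension performed by @code{getarg}, the following plain C sketch models what a signed and an unsigned 8-bit fetch would leave in a word-sized register. The helper names @code{extend_c} and @code{extend_uc} are hypothetical and are not part of @lightning{}; they only mirror the @code{_c} and @code{_uc} type modifiers.

```c
/* Hypothetical model of a signed 8-bit getarg: keep the low 8 bits
   of the raw argument and sign extend them to a full word. */
static long extend_c(unsigned long raw)
{
    unsigned long low = raw & 0xFFUL;
    return low >= 0x80UL ? (long)low - 0x100L : (long)low;
}

/* Hypothetical model of an unsigned 8-bit getarg: keep the low 8 bits
   of the raw argument and zero extend them to a full word. */
static long extend_uc(unsigned long raw)
{
    return (long)(raw & 0xFFUL);
}
```

The same raw bits therefore yield different register contents depending on the modifier, which is why matching modifiers between caller and callee matter.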
| 665 | |
| 666 | @code{putarg} is a mix of @code{getarg} and @code{pusharg} in that |
| 667 | it accepts as first argument a register or immediate, and as |
| 668 | second argument a value returned by @code{arg}. It allows changing, |
| 669 | or restoring an argument to the current function, and is a |
| 670 | construct required to implement tail call optimization. Note that |
| 671 | arguments in registers are very cheap, but may be overwritten |
| 672 | at any moment, including by some operations, for example division, |
| 673 | which on several ports is implemented as a function call. |
| 674 | |
| 675 | Finally, the @code{retval} instruction fetches the return value of a |
| 676 | called function into a register. The @code{retval} instruction takes a |
| 677 | register argument and copies the return value of the previously called |
| 678 | function into that register. A function with a return value should use |
| 679 | @code{retr} or @code{reti} to put the return value in the return register |
| 680 | before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example. |
| 681 | |
| 682 | @code{epilog} is an optional call that marks the end of a function |
| 683 | body. It is automatically generated by @lightning{} when starting a new |
| 684 | function (which should be done after a @code{ret} call) or when finishing |
| 685 | jit generation. |
| 686 | Note that, because @code{epilog} is optional, it is easy to make the |
| 687 | following common mistake. Consider this: |
| 688 | @example |
| 689 | fun1: |
| 690 | prolog |
| 691 | ... |
| 692 | ret |
| 693 | fun2: |
| 694 | prolog |
| 695 | @end example |
| 696 | Because @code{epilog} is added when a new @code{prolog} is found, |
| 697 | the @code{fun2} label will actually be placed before the |
| 698 | return from @code{fun1}: @lightning{} will |
| 699 | understand it as: |
| 700 | @example |
| 701 | fun1: |
| 702 | prolog |
| 703 | ... |
| 704 | ret |
| 705 | fun2: |
| 706 | epilog |
| 707 | prolog |
| 708 | @end example |
| 709 | |
| 710 | You should observe a few rules when using these macros. First of |
| 711 | all, if calling a varargs function, you should use the @code{ellipsis} |
| 712 | call to mark the position of the ellipsis in the C prototype. |
| 713 | |
| 714 | You should not nest calls to @code{prepare} inside a |
| 715 | @code{prepare/finish} block. Doing this will result in undefined |
| 716 | behavior. Note that for functions with zero arguments you can use |
| 717 | just @code{call}. |
| 718 | |
| 719 | @item Branch instructions |
| 720 | Like @code{arg}, these also return a value which, in this case, |
| 721 | is to be used to compile forward branches as explained in |
| 722 | @ref{Fibonacci, , Fibonacci numbers}. They accept two operands to be |
| 723 | compared; of these, the last can be either a register or an immediate. |
| 724 | They are: |
| 725 | @example |
| 726 | bltr _u _f _d @r{if }(O2 < O3)@r{ goto }O1 |
| 727 | blti _u _f _d @r{if }(O2 < O3)@r{ goto }O1 |
| 728 | bler _u _f _d @r{if }(O2 <= O3)@r{ goto }O1 |
| 729 | blei _u _f _d @r{if }(O2 <= O3)@r{ goto }O1 |
| 730 | bgtr _u _f _d @r{if }(O2 > O3)@r{ goto }O1 |
| 731 | bgti _u _f _d @r{if }(O2 > O3)@r{ goto }O1 |
| 732 | bger _u _f _d @r{if }(O2 >= O3)@r{ goto }O1 |
| 733 | bgei _u _f _d @r{if }(O2 >= O3)@r{ goto }O1 |
| 734 | beqr _f _d @r{if }(O2 == O3)@r{ goto }O1 |
| 735 | beqi _f _d @r{if }(O2 == O3)@r{ goto }O1 |
| 736 | bner _f _d @r{if }(O2 != O3)@r{ goto }O1 |
| 737 | bnei _f _d @r{if }(O2 != O3)@r{ goto }O1 |
| 738 | |
| 739 | bunltr _f _d @r{if }!(O2 >= O3)@r{ goto }O1 |
| 740 | bunler _f _d @r{if }!(O2 > O3)@r{ goto }O1 |
| 741 | bungtr _f _d @r{if }!(O2 <= O3)@r{ goto }O1 |
| 742 | bunger _f _d @r{if }!(O2 < O3)@r{ goto }O1 |
| 743 | buneqr _f _d @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1 |
| 744 | bltgtr _f _d @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1 |
| 745 | bordr _f _d @r{if } (O2 == O2) && (O3 == O3)@r{ goto }O1 |
| 746 | bunordr _f _d @r{if } (O2 != O2) || (O3 != O3)@r{ goto }O1 |
| 747 | |
| 748 | bmsr @r{if }O2 & O3@r{ goto }O1 |
| 749 | bmsi @r{if }O2 & O3@r{ goto }O1 |
| 750 | bmcr @r{if }!(O2 & O3)@r{ goto }O1 |
| 751 | bmci @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.} |
| 752 | boaddr _u O2 += O3@r{, goto }O1@r{ if overflow} |
| 753 | boaddi _u O2 += O3@r{, goto }O1@r{ if overflow} |
| 754 | bxaddr _u O2 += O3@r{, goto }O1@r{ if no overflow} |
| 755 | bxaddi _u O2 += O3@r{, goto }O1@r{ if no overflow} |
| 756 | bosubr _u O2 -= O3@r{, goto }O1@r{ if overflow} |
| 757 | bosubi _u O2 -= O3@r{, goto }O1@r{ if overflow} |
| 758 | bxsubr _u O2 -= O3@r{, goto }O1@r{ if no overflow} |
| 759 | bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow} |
| 760 | @end example |
| 761 | |
| 762 | Note that the @code{C} code does not have an @code{O1} argument. It is |
| 763 | required to always use the return value as an argument to @code{patch}, |
| 764 | @code{patch_at} or @code{patch_abs}. |
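The conditions in the table above can be modelled in plain C. The sketch below is not @lightning{} code: the function names are hypothetical, and each one only evaluates the condition under which the corresponding branch would be taken. For the overflow branch it assumes signed operands and assumes that the wrapped sum is stored even when the branch is taken.

```c
#include <limits.h>
#include <stdbool.h>

/* bunltr: taken when O2 < O3 or when either operand is NaN. */
static bool cond_unlt(double o2, double o3) { return !(o2 >= o3); }

/* bordr: taken only when neither operand is NaN. */
static bool cond_ord(double o2, double o3) { return (o2 == o2) && (o3 == o3); }

/* bunordr: taken when at least one operand is NaN. */
static bool cond_unord(double o2, double o3) { return (o2 != o2) || (o3 != o3); }

/* boaddr, signed case: O2 += O3; returns true when the branch is taken. */
static bool model_boadd(long *o2, long o3)
{
    bool overflow = (o3 > 0 && *o2 > LONG_MAX - o3) ||
                    (o3 < 0 && *o2 < LONG_MIN - o3);
    /* Assumption: the wrapped sum is stored even on overflow.  The
       conversion back to long is implementation-defined in C, and
       wraps on common two's complement targets. */
    *o2 = (long)((unsigned long)*o2 + (unsigned long)o3);
    return overflow;
}
```

The unordered variants exist because every ordinary comparison involving a NaN is false, so negating the opposite comparison is what makes the branch fire for NaN operands.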
| 765 | |
| 766 | @item Jump and return operations |
| 767 | These accept one argument except @code{ret} and @code{jmpi} which |
| 768 | have none; the difference between @code{finishi} and @code{calli} |
| 769 | is that the latter does not clean the stack from pushed parameters |
| 770 | (if any) and the former must @strong{always} follow a @code{prepare} |
| 771 | instruction. |
| 772 | @example |
| 773 | callr (not specified) @r{function call to register O1} |
| 774 | calli (not specified) @r{function call to immediate O1} |
| 775 | finishr (not specified) @r{function call to register O1} |
| 776 | finishi (not specified) @r{function call to immediate O1} |
| 777 | jmpr (not specified) @r{unconditional jump to register} |
| 778 | jmpi (not specified) @r{unconditional jump} |
| 779 | ret (not specified) @r{return from subroutine} |
| 780 | retr _c _uc _s _us _i _ui _l _f _d |
| 781 | reti _c _uc _s _us _i _ui _l _f _d |
| 782 | retval _c _uc _s _us _i _ui _l _f _d @r{move return value} |
| 783 | @r{to register} |
| 784 | @end example |
| 785 | |
| 786 | Like the branch instructions, @code{jmpi} also returns a value which is to |
| 787 | be used to compile forward branches. @xref{Fibonacci, , Fibonacci |
| 788 | numbers}. |
| 789 | |
| 790 | @item Labels |
| 791 | There are 3 @lightning{} instructions to create labels: |
| 792 | @example |
| 793 | label (not specified) @r{simple label} |
| 794 | forward (not specified) @r{forward label} |
| 795 | indirect (not specified) @r{special simple label} |
| 796 | @end example |
| 797 | |
| 798 | The following instruction is used to specify a minimal alignment for |
| 799 | the next instruction, usually with a label: |
| 800 | @example |
| 801 | align (not specified) @r{align code} |
| 802 | @end example |
| 803 | |
| 804 | Similar to @code{align} is the next instruction, also usually used with |
| 805 | a label: |
| 806 | @example |
| 807 | skip (not specified) @r{skip code} |
| 808 | @end example |
| 809 | It is used to specify a minimal number of bytes of nops to be inserted |
| 810 | before the next instruction. |
| 811 | |
| 812 | @code{label} is normally used as the @code{patch_at} argument for backward |
| 813 | jumps. |
| 814 | |
| 815 | @example |
| 816 | jit_node_t *jump, *label; |
| 817 | label = jit_label(); |
| 818 | ... |
| 819 | jump = jit_beqr(JIT_R0, JIT_R1); |
| 820 | jit_patch_at(jump, label); |
| 821 | @end example |
| 822 | |
| 823 | @code{forward} is used to patch code generation before the actual |
| 824 | position of the label is known. |
| 825 | |
| 826 | @example |
| 827 | jit_node_t *jump, *label; |
| 828 | label = jit_forward(); |
| 829 | jump = jit_beqr(JIT_R0, JIT_R1); |
| 830 | jit_patch_at(jump, label); |
| 831 | ... |
| 832 | jit_link(label); |
| 833 | @end example |
| 834 | |
| 835 | @code{indirect} is useful when creating jump tables, and tells |
| 836 | @lightning{} to not optimize out a label that is not the target of |
| 837 | any jump, because an indirect jump may land where it is defined. |
| 838 | |
| 839 | @example |
| 840 | jit_node_t *jump, *label; |
| 841 | ... |
| 842 | jit_jmpr(JIT_R0); @rem{/* may jump to label */} |
| 843 | ... |
| 844 | label = jit_indirect(); |
| 845 | @end example |
| 846 | |
| 847 | @code{indirect} is a special case of @code{note} and @code{name} |
| 848 | because it is a valid argument to @code{address}. |
| 849 | |
| 850 | Note that the usual idiom to write the previous example is |
| 851 | @example |
| 852 | jit_node_t *addr, *jump; |
| 853 | addr = jit_movi(JIT_R0, 0); @rem{/* immediate is ignored */} |
| 854 | ... |
| 855 | jit_jmpr(JIT_R0); |
| 856 | ... |
| 857 | jit_patch(addr); @rem{/* implicit label added */} |
| 858 | @end example |
| 859 | |
| 860 | which automatically binds the implicit label added by @code{patch} with |
| 861 | the @code{movi}, but under some special conditions it is required to create |
| 862 | an ``unbound'' label. |
| 863 | |
| 864 | @code{align} is useful for creating multiple entry points to a |
| 865 | (trampoline) function that are all accessible through a single |
| 866 | function pointer. @code{align} receives an integer argument that |
| 867 | defines the minimal alignment of the address of a label directly |
| 868 | following the @code{align} instruction. The integer argument must be |
| 869 | a power of two and the effective alignment will be a power of two no |
| 870 | less than the argument to @code{align}. If the argument to |
| 871 | @code{align} is 16 or more, the effective alignment will match the |
| 872 | specified minimal alignment exactly. |
| 873 | |
| 874 | @example |
| 875 | jit_node_t *forward, *label1, *label2, *jump; |
| 876 | unsigned char *addr1, *addr2; |
| 877 | forward = jit_forward(); |
| 878 | jit_align(16); |
| 879 | label1 = jit_indirect(); @rem{/* first entry point */} |
| 880 | jump = jit_jmpi(); @rem{/* jump to first handler */} |
| 881 | jit_patch_at(jump, forward); |
| 882 | jit_align(16); |
| 883 | label2 = jit_indirect(); @rem{/* second entry point */} |
| 884 | ... @rem{/* second handler */} |
| 885 | jit_jmpr(...); |
| 886 | jit_link(forward); |
| 887 | ... @rem{/* first handler */} |
| 888 | jit_jmpr(...); |
| 889 | ... |
| 890 | jit_emit(); |
| 891 | addr1 = jit_address(label1); |
| 892 | addr2 = jit_address(label2); |
| 893 | assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */} |
| 894 | @end example |
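The alignment arithmetic behind the final assertion can be sketched in plain C. This models only the rounding rule for power-of-two alignments, not how @lightning{} pads the code buffer; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n is a valid argument for align, that is, a power of two. */
static bool is_power_of_two(uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round addr up to the next multiple of alignment.
   alignment must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With a 16-byte alignment, an entry point at offset 0 followed by a jump shorter than 16 bytes places the next aligned label at offset 16, which is the distance the example asserts.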
| 895 | |
| 896 | @code{skip} is useful for reserving space in the code buffer that can |
| 897 | later be filled (possibly with the help of the pair of functions |
| 898 | @code{jit_unprotect} and @code{jit_protect}). |
| 899 | |
| 900 | @item Function prolog |
| 901 | |
| 902 | These macros are used to set up a function prolog. The @code{allocai} |
| 903 | call accepts a single integer argument and returns an offset value |
| 904 | for stack storage access. The @code{allocar} call accepts two register |
| 905 | arguments: the first is set to the offset for stack access, and the |
| 906 | second holds the size in bytes. |
| 907 | |
| 908 | @example |
| 909 | prolog (not specified) @r{function prolog} |
| 910 | allocai (not specified) @r{reserve space on the stack} |
| 911 | allocar (not specified) @r{allocate space on the stack} |
| 912 | @end example |
| 913 | |
| 914 | @code{allocai} receives the number of bytes to allocate and returns |
| 915 | the offset from the frame pointer register @code{FP} to the base of |
| 916 | the area. |
| 917 | |
| 918 | @code{allocar} receives two register arguments. The first is where |
| 919 | to store the offset from the frame pointer register @code{FP} to the |
| 920 | base of the area. The second argument is the size in bytes. Note |
| 921 | that @code{allocar} performs dynamic allocation, and special care |
| 922 | should be taken when using it. If called in a loop, every iteration |
| 923 | will allocate stack space. Stack space is aligned from 8 to 64 bytes |
| 924 | depending on backend requirements, even if allocating only one byte. |
| 925 | It is advisable not to use it with @code{frame} and @code{tramp}; it |
| 926 | should work with @code{frame} if care is taken to call it only once, |
| 927 | but it is not supported in @code{tramp}, even if called only |
| 928 | once. |
| 929 | |
| 930 | As a small appetizer, here is a small function that adds 1 to the input |
| 931 | parameter (an @code{int}). I'm using an assembly-like syntax here which |
| 932 | is a bit different from the one used when writing real subroutines with |
| 933 | @lightning{}; the real syntax will be introduced in @ref{GNU lightning |
| 934 | examples, , Generating code at run-time}. |
| 935 | |
| 936 | @example |
| 937 | incr: |
| 938 | prolog |
| 939 | in = arg @rem{! We have an integer argument} |
| 940 | getarg R0, in @rem{! Move it to R0} |
| 941 | addi R0, R0, 1 @rem{! Add 1} |
| 942 | retr R0 @rem{! And return the result} |
| 943 | @end example |
| 944 | |
| 945 | And here is another function which uses the @code{printf} function from |
| 946 | the standard C library to write a number in hexadecimal notation: |
| 947 | |
| 948 | @example |
| 949 | printhex: |
| 950 | prolog |
| 951 | in = arg @rem{! Same as above} |
| 952 | getarg R0, in |
| 953 | prepare @rem{! Begin call sequence for printf} |
| 954 | pushargi "%x" @rem{! Push format string} |
| 955 | ellipsis @rem{! Varargs start here} |
| 956 | pushargr R0 @rem{! Push second argument} |
| 957 | finishi printf @rem{! Call printf} |
| 958 | ret @rem{! Return to caller} |
| 959 | @end example |
| 960 | |
| 961 | @item Register liveness |
| 962 | |
| 963 | During code generation, @lightning{} occasionally needs scratch registers |
| 964 | or needs to use architecture-defined registers. For that, @lightning{} |
| 965 | internally maintains register liveness information. |
| 966 | |
| 967 | In the following example, @code{qdivr} will need special registers like |
| 968 | @code{R0} on some architectures. As @lightning{} understands that |
| 969 | @code{R0} is used in the subsequent instruction, it will create |
| 970 | save/restore code for @code{R0} if needed. |
| 971 | |
| 972 | @example |
| 973 | ... |
| 974 | qdivr V0, V1, V2, V3 |
| 975 | movr V3, R0 |
| 976 | ... |
| 977 | @end example |
| 978 | |
| 979 | The same is not true in the example that follows. Here, @code{R0} is |
| 980 | not alive after the division operation because @code{R0} is neither an |
| 981 | argument register nor a callee-save register. Thus, no save/restore |
| 982 | code for @code{R0} will be created in this case. |
| 983 | |
| 984 | @example |
| 985 | ... |
| 986 | qdivr V0, V1, V2, V3 |
| 987 | jmpr R1 |
| 988 | ... |
| 989 | @end example |
| 990 | |
| 991 | The @code{live} instruction can be used to mark a register as live after |
| 992 | it, as in the following example. Here, @code{R0} will be preserved |
| 993 | across the division. |
| 994 | |
| 995 | @example |
| 996 | ... |
| 997 | qdivr V0, V1, V2, V3 |
| 998 | live R0 |
| 999 | jmpr R1 |
| 1000 | ... |
| 1001 | @end example |
| 1002 | |
| 1003 | The @code{live} instruction is useful at code entry and exit points, |
| 1004 | like after and before a @code{callr} instruction. |
| 1005 | |
| 1006 | @item Trampolines, continuations and tail call optimization |
| 1007 | |
| 1008 | Frequently it is required to generate jit code that must jump to |
| 1009 | code generated later, possibly from another @code{jit_context_t}. |
| 1010 | These require compatible stack frames. |
| 1011 | |
| 1012 | @lightning{} provides two primitives from which trampolines, |
| 1013 | continuations and tail call optimization can be implemented. |
| 1014 | |
| 1015 | @example |
| 1016 | frame (not specified) @r{create stack frame} |
| 1017 | tramp (not specified) @r{assume stack frame} |
| 1018 | @end example |
| 1019 | |
| 1020 | @code{frame} receives an integer argument@footnote{It is not |
| 1021 | automatically computed because @lightning{} does not know about the |
| 1022 | requirements of code generated later.} that defines the size in |
| 1023 | bytes for the stack frame of the current, @code{C} callable, |
| 1024 | jit function. To calculate this value, a good formula is the maximum |
| 1025 | number of arguments to any called native function times |
| 1026 | eight@footnote{Times eight so that it works for double arguments |
| 1027 | and does not need conditionals for ports that pass arguments on |
| 1028 | the stack.}, plus the sum of the arguments to any call to |
| 1029 | @code{jit_allocai}. @lightning{} automatically adjusts this value |
| 1030 | for any backend specific stack memory it may need, or any |
| 1031 | alignment constraint. |
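The rule of thumb above can be written down as a small plain C helper. This is only a sketch of the suggested formula; @code{frame_size_estimate} is a hypothetical name, not a @lightning{} function, and the result is the value one might pass to @code{frame} before @lightning{} applies its own adjustments.

```c
#include <stddef.h>

/* Hypothetical helper: the maximum argument count of any called native
   function times eight, plus the sizes of all allocai requests. */
static size_t frame_size_estimate(size_t max_native_args,
                                  const size_t *allocai_sizes,
                                  size_t allocai_count)
{
    size_t total = max_native_args * 8;
    for (size_t i = 0; i < allocai_count; i++)
        total += allocai_sizes[i];
    return total;
}
```

For instance, code that calls native functions with at most six arguments and reserves 16 and 24 bytes with @code{allocai} would ask for 6 * 8 + 16 + 24 = 88 bytes.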
| 1032 | |
| 1033 | @code{frame} also instructs @lightning{} to save all callee |
| 1034 | save registers in the prolog and reload them in the epilog. |
| 1035 | |
| 1036 | @example |
| 1037 | main: @rem{! jit entry point} |
| 1038 | prolog @rem{! function prolog} |
| 1039 | frame 256 @rem{! save all callee save registers and} |
| 1040 | @rem{! reserve at least 256 bytes in stack} |
| 1041 | main_loop: |
| 1042 | ... |
| 1043 | jmpi handler @rem{! jumps to external code} |
| 1044 | ... |
| 1045 | ret @rem{! return to the caller} |
| 1046 | @end example |
| 1047 | |
| 1048 | @code{tramp} differs from @code{frame} only in that a prolog and epilog |
| 1049 | will not be generated. Note that @code{prolog} must still be used. |
| 1050 | The code under @code{tramp} must be ready to be entered with a jump |
| 1051 | at the prolog position, and instead of a return, it must end with |
| 1052 | an unconditional jump. @code{tramp} exists solely for the fact |
| 1053 | that it allows optimizing out prolog and epilog code that would |
| 1054 | never be executed. |
| 1055 | |
| 1056 | @example |
| 1057 | handler: @rem{! handler entry point} |
| 1058 | prolog @rem{! function prolog} |
| 1059 | tramp 256 @rem{! assumes all callee save registers} |
| 1060 | @rem{! are saved and there is at least} |
| 1061 | @rem{! 256 bytes in stack} |
| 1062 | ... |
| 1063 | jmpi main_loop @rem{! return to the main loop} |
| 1064 | @end example |
| 1065 | |
| 1066 | @lightning{} only supports Tail Call Optimization using the |
| 1067 | @code{tramp} construct. Any other way is not guaranteed to |
| 1068 | work on all ports. |
| 1069 | |
| 1070 | An example of a simple (recursive) tail call optimization: |
| 1071 | |
| 1072 | @example |
| 1073 | factorial: @rem{! Entry point of the factorial function} |
| 1074 | prolog |
| 1075 | in = arg @rem{! Receive an integer argument} |
| 1076 | getarg R0, in @rem{! Move argument to R0} |
| 1077 | prepare |
| 1078 | pushargi 1 @rem{! This is the accumulator} |
| 1079 | pushargr R0 @rem{! This is the argument} |
| 1080 | finishi fact @rem{! Call the tail call optimized function} |
| 1081 | retval R0 @rem{! Fetch the result} |
| 1082 | retr R0 @rem{! Return it} |
| 1083 | epilog @rem{! Epilog *before* label before prolog} |
| 1084 | |
| 1085 | fact: @rem{! Entry point of the helper function} |
| 1086 | prolog |
| 1087 | frame 16 @rem{! Reserve 16 bytes in the stack} |
| 1088 | fact_entry: @rem{! This is the tail call entry point} |
| 1089 | ac = arg @rem{! The accumulator is the first argument} |
| 1090 | in = arg @rem{! The factorial argument} |
| 1091 | getarg R0, ac @rem{! Move the accumulator to R0} |
| 1092 | getarg R1, in @rem{! Move the argument to R1} |
| 1093 | blei fact_out, R1, 1 @rem{! Done if argument is one or less} |
| 1094 | mulr R0, R0, R1 @rem{! accumulator *= argument} |
| 1095 | putargr R0, ac @rem{! Update the accumulator} |
| 1096 | subi R1, R1, 1 @rem{! argument -= 1} |
| 1097 | putargr R1, in @rem{! Update the argument} |
| 1098 | jmpi fact_entry @rem{! Tail Call Optimize it!} |
| 1099 | fact_out: |
| 1100 | retr R0 @rem{! Return the accumulator} |
| 1101 | @end example |
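For comparison, the control flow of the example above can be mirrored in plain C. This is only a model of the logic, with a @code{goto} standing in for the jump back to @code{fact_entry}; it does not use @lightning{}, and updating the local parameters plays the role of @code{putargr}.

```c
/* Model of the fact helper: the jump back to fact_entry replaces
   what would otherwise be a recursive call. */
static long fact(long ac, long in)
{
fact_entry:
    if (in <= 1)            /* blei fact_out, R1, 1 */
        return ac;          /* retr R0 */
    ac *= in;               /* mulr, then putargr R0, ac */
    in -= 1;                /* subi, then putargr R1, in */
    goto fact_entry;        /* jmpi fact_entry */
}

/* Model of the factorial entry point: seed the accumulator with 1. */
static long factorial(long n)
{
    return fact(1, n);
}
```

Because the arguments are updated in place before jumping, the stack depth stays constant no matter how large the argument is.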
| 1102 | |
| 1103 | @item Predicates |
| 1104 | @example |
| 1105 | forward_p (not specified) @r{forward label predicate} |
| 1106 | indirect_p (not specified) @r{indirect label predicate} |
| 1107 | target_p (not specified) @r{used label predicate} |
| 1108 | arg_register_p (not specified) @r{argument kind predicate} |
| 1109 | callee_save_p (not specified) @r{callee save predicate} |
| 1110 | pointer_p (not specified) @r{pointer predicate} |
| 1111 | @end example |
| 1112 | |
| 1113 | @code{forward_p} expects a @code{jit_node_t*} argument, and |
| 1114 | returns non zero if it is a forward label reference, that is, |
| 1115 | a label returned by @code{forward}, that still needs a |
| 1116 | @code{link} call. |
| 1117 | |
| 1118 | @code{indirect_p} expects a @code{jit_node_t*} argument, and returns |
| 1119 | non zero if it is an indirect label reference, that is, a label that |
| 1120 | was returned by @code{indirect}. |
| 1121 | |
| 1122 | @code{target_p} expects a @code{jit_node_t*} argument, that is any |
| 1123 | kind of label, and will return non zero if there is at least one |
| 1124 | jump or move referencing it. |
| 1125 | |
| 1126 | @code{arg_register_p} expects a @code{jit_node_t*} argument, that must |
| 1127 | have been returned by @code{arg}, @code{arg_f} or @code{arg_d}, and |
| 1128 | will return non zero if the argument lives in a register. This call |
| 1129 | is useful to know the live range of register arguments, as those |
| 1130 | are very fast to read and write, but have volatile values. |
| 1131 | |
| 1132 | @code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or |
| 1133 | @code{JIT_Fn}, and will return non zero if the register is callee |
| 1134 | save. This call is useful because on several ports, the @code{JIT_Rn} |
| 1135 | and @code{JIT_Fn} registers are actually callee save, so there is no need |
| 1136 | to save and load the values when making function calls. |
| 1137 | |
| 1138 | @code{pointer_p} expects a pointer argument, and will return non |
| 1139 | zero if the pointer is inside the generated jit code. It must be |
| 1140 | called after @code{jit_emit} and before @code{jit_destroy_state}. |
| 1141 | |
| 1142 | @item Atomic operations |
| 1143 | Only compare-and-swap is implemented. It accepts four operands; |
| 1144 | the second can be an immediate. |
| 1145 | |
| 1146 | The first argument is set to a boolean value indicating whether the operation |
| 1147 | succeeded. |
| 1148 | |
| 1149 | All arguments must be different: the result register cannot also be used to pass |
| 1150 | an argument. |
| 1151 | |
| 1152 | The second argument is the address of a machine word. |
| 1153 | |
| 1154 | The third argument is the old value. |
| 1155 | |
| 1156 | The fourth argument is the new value. |
| 1157 | |
| 1158 | @example |
| 1159 | casr O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0 |
| 1160 | casi O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0 |
| 1161 | @end example |
| 1162 | |
| 1163 | If the value at the address in the second argument is equal to the third |
| 1164 | argument, the address value is atomically modified to the value of the |
| 1165 | fourth argument and the first argument is set to a non zero value. |
| 1166 | |
| 1167 | If the value at the address in the second argument is not equal to the |
| 1168 | third argument nothing is done and the first argument is set to zero. |
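The semantics described above match the strong compare-and-exchange of C11. The sketch below expresses them with @code{<stdatomic.h>} rather than with @lightning{}; @code{model_cas} is a hypothetical name, and its three parameters stand for the second, third and fourth operands, while the return value stands for the first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of casr: returns true and stores new_value when
   *addr equals old_value; otherwise leaves *addr alone and returns false. */
static bool model_cas(_Atomic long *addr, long old_value, long new_value)
{
    /* On failure the C11 call writes the observed value into its local
       copy of old_value; the model discards it, as casr has no such output. */
    return atomic_compare_exchange_strong(addr, &old_value, new_value);
}
```

A caller typically retries in a loop: reload the word, compute the new value, and attempt the swap again until it succeeds.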
| 1169 | @end table |
| 1170 | |
| 1171 | @node GNU lightning examples |
| 1172 | @chapter Generating code at run-time |
| 1173 | |
| 1174 | To use @lightning{}, you should include the @file{lightning.h} file that |
| 1175 | is put in your include directory by the @samp{make install} command. |
| 1176 | |
| 1177 | Each of the instructions above translates to a macro or function call. |
| 1178 | All you have to do is prepend @code{jit_} (lowercase) to opcode names |
| 1179 | and @code{JIT_} (uppercase) to register names. Of course, parameters |
| 1180 | are to be put between parentheses. |
| 1181 | |
| 1182 | This small tutorial presents four examples: |
| 1183 | |
| 1184 | @iftex |
| 1185 | @itemize @bullet |
| 1186 | @item |
| 1187 | The @code{incr} function found in @ref{The instruction set, , |
| 1188 | @lightning{}'s instruction set} |
| 1189 | |
| 1190 | @item |
| 1191 | A simple function call to @code{printf} |
| 1192 | |
| 1193 | @item |
| 1194 | An RPN calculator. |
| 1195 | |
| 1196 | @item |
| 1197 | Fibonacci numbers |
| 1198 | @end itemize |
| 1199 | @end iftex |
| 1200 | @ifnottex |
| 1201 | @menu |
| 1202 | * incr:: A function which increments a number by one |
| 1203 | * printf:: A simple function call to printf |
| 1204 | * RPN calculator:: A more complex example, an RPN calculator |
| 1205 | * Fibonacci:: Calculating Fibonacci numbers |
| 1206 | @end menu |
| 1207 | @end ifnottex |
| 1208 | |
| 1209 | @node incr |
| 1210 | @section A function which increments a number by one |
| 1211 | |
| 1212 | Let's see how to create and use the sample @code{incr} function created |
| 1213 | in @ref{The instruction set, , @lightning{}'s instruction set}: |
| 1214 | |
| 1215 | @example |
| 1216 | #include <stdio.h> |
| 1217 | #include <lightning.h> |
| 1218 | |
| 1219 | static jit_state_t *_jit; |
| 1220 | |
| 1221 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} |
| 1222 | |
| 1223 | int main(int argc, char *argv[]) |
| 1224 | @{ |
| 1225 | jit_node_t *in; |
| 1226 | pifi incr; |
| 1227 | |
| 1228 | init_jit(argv[0]); |
| 1229 | _jit = jit_new_state(); |
| 1230 | |
| 1231 | jit_prolog(); @rem{/* @t{ prolog } */} |
| 1232 | in = jit_arg(); @rem{/* @t{ in = arg } */} |
| 1233 | jit_getarg(JIT_R0, in); @rem{/* @t{ getarg R0@comma{} in } */} |
| 1234 | jit_addi(JIT_R0, JIT_R0, 1); @rem{/* @t{ addi R0@comma{} R0@comma{} 1 } */} |
| 1235 | jit_retr(JIT_R0); @rem{/* @t{ retr R0 } */} |
| 1236 | |
| 1237 | incr = jit_emit(); |
| 1238 | jit_clear_state(); |
| 1239 | |
| 1240 | @rem{/* call the generated code@comma{} passing 5 as an argument */} |
| 1241 | printf("%d + 1 = %d\n", 5, incr(5)); |
| 1242 | |
| 1243 | jit_destroy_state(); |
| 1244 | finish_jit(); |
| 1245 | return 0; |
| 1246 | @} |
| 1247 | @end example |
| 1248 | |
| 1249 | Let's examine the code line by line (well, almost@dots{}): |
| 1250 | |
| 1251 | @table @t |
| 1252 | @item #include <lightning.h> |
| 1253 | You already know about this. It defines all of @lightning{}'s macros. |
| 1254 | |
| 1255 | @item static jit_state_t *_jit; |
| 1256 | You might wonder what @code{jit_state_t} is. It is a structure |
| 1257 | that stores jit code generation information. The name @code{_jit} is |
| 1258 | special: since multiple jit generators can run at the same |
| 1259 | time, you must either @code{#define _jit my_jit_state} or name it |
| 1260 | @code{_jit}. |
| 1261 | |
| 1262 | @item typedef int (*pifi)(int); |
| 1263 | Just a handy typedef for a pointer to a function that takes an |
| 1264 | @code{int} and returns another. |
| 1265 | |
| 1266 | @item jit_node_t *in; |
| 1267 | Declares a variable to hold an identifier for a function argument. It |
| 1268 | is an opaque pointer, that will hold the return of a call to @code{arg} |
| 1269 | and be used as argument to @code{getarg}. |
| 1270 | |
| 1271 | @item pifi incr; |
| 1272 | Declares a function pointer variable to a function that receives an |
| 1273 | @code{int} and returns an @code{int}. |
| 1274 | |
| 1275 | @item init_jit(argv[0]); |
| 1276 | You must call this function before creating a @code{jit_state_t} |
| 1277 | object. This function does global state initialization, and may need |
| 1278 | to detect CPU or Operating System features. It receives a string |
| 1279 | argument that is later used to read symbols from a shared object using |
| 1280 | GNU binutils if disassembly was enabled at configure time. If no |
| 1281 | disassembly will be performed a NULL pointer can be used as argument. |
| 1282 | |
| 1283 | @item _jit = jit_new_state(); |
| 1284 | This call initializes a @lightning{} jit state. |
| 1285 | |
| 1286 | @item jit_prolog(); |
| 1287 | Ok, so we start generating code for our beloved function@dots{} |
| 1288 | |
| 1289 | @item in = jit_arg(); |
| 1290 | @itemx jit_getarg(JIT_R0, in); |
| 1291 | We retrieve the first (and only) argument, an integer, and store it |
| 1292 | into the general-purpose register @code{R0}. |
| 1293 | |
| 1294 | @item jit_addi(JIT_R0, JIT_R0, 1); |
| 1295 | We add one to the content of the register. |
| 1296 | |
| 1297 | @item jit_retr(JIT_R0); |
| 1298 | This instruction generates a standard function epilog that returns |
| 1299 | the contents of the @code{R0} register. |
| 1300 | |
| 1301 | @item incr = jit_emit(); |
| 1302 | This instruction is very important. It actually translates the |
| 1303 | @lightning{} macros used before to machine code, flushes the generated |
| 1304 | code area out of the processor's instruction cache and returns a |
| 1305 | pointer to the start of the code. |
| 1306 | |
| 1307 | @item jit_clear_state(); |
| 1308 | This call cleans up any data not required for jit execution. Note |
| 1309 | that it must be called after any call to @code{jit_print} or |
| 1310 | @code{jit_address}, as this call destroys the @lightning{} |
| 1311 | intermediate representation. |
| 1312 | |
| 1313 | @item printf("%d + 1 = %d\n", 5, incr(5)); |
| 1314 | Calling our function is this simple---it is not distinguishable from |
| 1315 | a normal C function call, the only difference being that @code{incr} |
| 1316 | is a variable. |
| 1317 | |
| 1318 | @item jit_destroy_state(); |
| 1319 | Releases all memory associated with the jit context. It should be |
| 1320 | called once it is known that the jit code will no longer be called. |
| 1321 | |
| 1322 | @item finish_jit(); |
| 1323 | This call cleans up any global state held by @lightning{}; it is |
| 1324 | advisable to call it once jit code will no longer be generated. |
| 1325 | @end table |
| 1326 | |
| 1327 | @lightning{} abstracts two phases of dynamic code generation: selecting |
| 1328 | instructions that map the standard representation, and emitting binary |
| 1329 | code for these instructions. The client program has the responsibility |
| 1330 | of describing the code to be generated using the standard @lightning{} |
| 1331 | instruction set. |
| 1332 | |
| 1333 | Let's examine the code generated for @code{incr} on the SPARC and x86_64 |
| 1334 | architecture (on the right is the code that an assembly-language |
| 1335 | programmer would write): |
| 1336 | |
| 1337 | @table @b |
| 1338 | @item SPARC |
| 1339 | @example |
| 1340 | save  %sp, -112, %sp |
| 1341 | mov   %i0, %g2           retl |
| 1342 | inc   %g2                inc %o0 |
| 1343 | mov   %g2, %i0 |
| 1344 | restore |
| 1345 | retl |
| 1346 | nop |
| 1347 | @end example |
| 1348 | In this case, @lightning{} introduces overhead to create a register |
| 1349 | window (not knowing that the procedure is a leaf procedure) and to |
| 1350 | move the argument to the general purpose register @code{R0} (which |
| 1351 | maps to @code{%g2} on the SPARC). |
| 1352 | @end table |
| 1353 | |
| 1354 | @table @b |
| 1355 | @item x86_64 |
| 1356 | @example |
| 1357 | mov %rdi,%rax |
| 1358 | add $0x1,%rax |
| 1359 | ret |
| 1360 | @end example |
| 1361 | In this case, for the x86 port, @lightning{} has simple optimizations |
| 1362 | to understand it is a leaf function, and that it is not required to |
| 1363 | create a stack frame nor update the stack pointer. |
| 1364 | @end table |
| 1365 | |
| 1366 | @node printf |
| 1367 | @section A simple function call to @code{printf} |
| 1368 | |
| 1369 | Again, here is the code for the example: |
| 1370 | |
| 1371 | @example |
| 1372 | #include <stdio.h> |
| 1373 | #include <lightning.h> |
| 1374 | |
| 1375 | static jit_state_t *_jit; |
| 1376 | |
| 1377 | typedef void (*pvfi)(int); @rem{/* Pointer to Void Function of Int */} |
| 1378 | |
| 1379 | int main(int argc, char *argv[]) |
| 1380 | @{ |
| 1381 | pvfi myFunction; @rem{/* ptr to generated code */} |
| 1382 | jit_node_t *start, *end; @rem{/* a couple of labels */} |
| 1383 | jit_node_t *in; @rem{/* to get the argument */} |
| 1384 | |
| 1385 | init_jit(argv[0]); |
| 1386 | _jit = jit_new_state(); |
| 1387 | |
| 1388 | start = jit_note(__FILE__, __LINE__); |
| 1389 | jit_prolog(); |
| 1390 | in = jit_arg(); |
| 1391 | jit_getarg(JIT_R1, in); |
| 1392 | jit_prepare(); |
| 1393 | jit_pushargi((jit_word_t)"generated %d bytes\n"); |
| 1394 | jit_ellipsis(); |
| 1395 | jit_pushargr(JIT_R1); |
| 1396 | jit_finishi(printf); |
| 1397 | jit_ret(); |
| 1398 | jit_epilog(); |
| 1399 | end = jit_note(__FILE__, __LINE__); |
| 1400 | |
| 1401 | myFunction = jit_emit(); |
| 1402 | |
| 1403 | @rem{/* call the generated code@comma{} passing its size as argument */} |
| 1404 | myFunction((char*)jit_address(end) - (char*)jit_address(start)); |
| 1405 | jit_clear_state(); |
| 1406 | |
| 1407 | jit_disassemble(); |
| 1408 | |
| 1409 | jit_destroy_state(); |
| 1410 | finish_jit(); |
| 1411 | return 0; |
| 1412 | @} |
| 1413 | @end example |
| 1414 | |
| 1415 | The function shows how many bytes were generated. Most of the code |
| 1416 | is not very interesting, as it resembles very closely the program |
| 1417 | presented in @ref{incr, , A function which increments a number by one}. |
| 1418 | |
| 1419 | For this reason, we're going to concentrate on just a few statements. |
| 1420 | |
| 1421 | @table @t |
| 1422 | @item start = jit_note(__FILE__, __LINE__); |
| 1423 | @itemx @r{@dots{}} |
| 1424 | @itemx end = jit_note(__FILE__, __LINE__); |
| 1425 | These two instructions call the @code{jit_note} macro, which creates |
| 1426 | a note in the jit code; arguments to @code{jit_note} usually are a |
| 1427 | filename string and line number integer, but using @code{NULL} for the |
| 1428 | string argument is perfectly valid if you only need to create a simple |
| 1429 | marker in the code. |
| 1430 | |
| 1431 | @item jit_ellipsis(); |
| 1432 | @code{ellipsis} usually is only required if calling varargs functions |
| 1433 | with double arguments, but it is a good practice to properly describe |
| 1434 | the @r{@dots{}} in the call sequence. |
| 1435 | |
| 1436 | @item jit_pushargi((jit_word_t)"generated %d bytes\n"); |
| 1437 | Note the use of the @code{(jit_word_t)} cast, which is used only |
| 1438 | to avoid a compiler warning caused by passing a pointer where a |
| 1439 | word-sized integer type is expected. |
| 1440 | |
| 1441 | @item jit_prepare(); |
| 1442 | @itemx @r{@dots{}} |
| 1443 | @itemx jit_finishi(printf); |
| 1444 | Once the arguments to @code{printf} have been pushed, which means |
| 1445 | moving them to stack or register arguments, the @code{printf} |
| 1446 | function is called and the stack cleaned. Note how @lightning{} |
| 1447 | abstracts the differences between different architectures and |
| 1448 | ABIs: the client program does not need to know how parameter passing |
| 1449 | works on the host architecture. |
| 1450 | |
| 1451 | @item jit_epilog(); |
| 1452 | Usually it is not required to call @code{epilog}, because it is |
| 1453 | implicitly called when the end of a function is noticed. It is called |
| 1454 | explicitly here so that the @code{end} note follows the function epilog; |
| 1455 | a @code{note} placed right after the @code{ret} would not account for it. |
| 1456 | |
| 1457 | @item myFunction((char*)jit_address(end) - (char*)jit_address(start)); |
| 1458 | This calls the generated jit function, passing as argument the offset |
| 1459 | difference between the @code{start} and @code{end} notes. The @code{address} |
| 1460 | call must be done after the @code{emit} call; otherwise, either a fatal error |
| 1461 | will happen (if @lightning{} is built with assertions enabled) or an |
| 1462 | undefined value will be returned. |
| 1463 | |
| 1464 | @item jit_clear_state(); |
| 1465 | Note that @code{jit_clear_state} was called after executing jit in |
| 1466 | this example. It was done because it must be called after any call |
| 1467 | to @code{jit_address} or @code{jit_print}. |
| 1468 | |
| 1469 | @item jit_disassemble(); |
| 1470 | @code{disassemble} will dump the generated code to standard output, |
| 1471 | unless @lightning{} was built with the disassembler disabled, in which |
| 1472 | case no output will be shown. |
| 1473 | @end table |
| 1474 | |
| 1475 | @node RPN calculator |
| 1476 | @section A more complex example, an RPN calculator |
| 1477 | |
| 1478 | We create a small stack-based RPN calculator which applies a series |
| 1479 | of operators to a given parameter and to other numeric operands. |
| 1480 | Unlike previous examples, the code generator is fully parameterized |
| 1481 | and is able to compile different formulas to different functions. |
| 1482 | Here is the code for the expression compiler; a sample usage will |
| 1483 | follow. |
| 1484 | |
| 1485 | Since @lightning{} does not provide push/pop instructions, this |
| 1486 | example uses a stack-allocated area to store the data. Such an |
| 1487 | area can be allocated using the macro @code{allocai}, which |
| 1488 | receives the number of bytes to allocate and returns the offset |
| 1489 | from the frame pointer register @code{FP} to the base of the |
| 1490 | area. |
| 1491 | |
| 1492 | Usually, you will use the @code{ldxi} and @code{stxi} instructions |
| 1493 | to access stack-allocated variables. However, it is possible to |
| 1494 | use operations such as @code{add} to compute the address of the |
| 1495 | variables, and pass the address around. |
| 1496 | |
| 1497 | @example |
| 1498 | #include <stdio.h> |
| 1499 | #include <stdlib.h> |
| 1500 | #include <lightning.h> |
| 1501 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} |
| 1502 | |
| 1503 | static jit_state_t *_jit; |
| 1504 | |
| 1505 | void stack_push(int reg, int *sp) |
| 1506 | @{ |
| 1507 | jit_stxi_i (*sp, JIT_FP, reg); |
| 1508 | *sp += sizeof (int); |
| 1509 | @} |
| 1510 | |
| 1511 | void stack_pop(int reg, int *sp) |
| 1512 | @{ |
| 1513 | *sp -= sizeof (int); |
| 1514 | jit_ldxi_i (reg, JIT_FP, *sp); |
| 1515 | @} |
| 1516 | |
| 1517 | jit_node_t *compile_rpn(char *expr) |
| 1518 | @{ |
| 1519 | jit_node_t *in, *fn; |
| 1520 | int stack_base, stack_ptr; |
| 1521 | |
| 1522 | fn = jit_note(NULL, 0); |
| 1523 | jit_prolog(); |
| 1524 | in = jit_arg(); |
| 1525 | stack_ptr = stack_base = jit_allocai (32 * sizeof (int)); |
| 1526 | |
| 1527 | jit_getarg(JIT_R2, in); |
| 1528 | |
| 1529 | while (*expr) @{ |
| 1530 | char buf[32]; |
| 1531 | int n; |
| 1532 | if (sscanf(expr, "%[0-9]%n", buf, &n)) @{ |
| 1533 | expr += n - 1; |
| 1534 | stack_push(JIT_R0, &stack_ptr); |
| 1535 | jit_movi(JIT_R0, atoi(buf)); |
| 1536 | @} else if (*expr == 'x') @{ |
| 1537 | stack_push(JIT_R0, &stack_ptr); |
| 1538 | jit_movr(JIT_R0, JIT_R2); |
| 1539 | @} else if (*expr == '+') @{ |
| 1540 | stack_pop(JIT_R1, &stack_ptr); |
| 1541 | jit_addr(JIT_R0, JIT_R1, JIT_R0); |
| 1542 | @} else if (*expr == '-') @{ |
| 1543 | stack_pop(JIT_R1, &stack_ptr); |
| 1544 | jit_subr(JIT_R0, JIT_R1, JIT_R0); |
| 1545 | @} else if (*expr == '*') @{ |
| 1546 | stack_pop(JIT_R1, &stack_ptr); |
| 1547 | jit_mulr(JIT_R0, JIT_R1, JIT_R0); |
| 1548 | @} else if (*expr == '/') @{ |
| 1549 | stack_pop(JIT_R1, &stack_ptr); |
| 1550 | jit_divr(JIT_R0, JIT_R1, JIT_R0); |
| 1551 | @} else @{ |
| 1552 | fprintf(stderr, "cannot compile: %s\n", expr); |
| 1553 | abort(); |
| 1554 | @} |
| 1555 | ++expr; |
| 1556 | @} |
| 1557 | jit_retr(JIT_R0); |
| 1558 | jit_epilog(); |
| 1559 | return fn; |
| 1560 | @} |
| 1561 | @end example |
| 1562 | |
| 1563 | The principle on which the calculator is based is easy: the stack top |
| 1564 | is held in R0, while the remaining items of the stack are held in the |
| 1565 | memory area that we allocate with @code{allocai}. Compiling a numeric |
| 1566 | operand or the argument @code{x} pushes the old stack top onto the |
| 1567 | stack and moves the operand into R0; compiling an operator pops the |
| 1568 | second operand off the stack into R1, and compiles the operation so |
| 1569 | that the result goes into R0, thus becoming the new stack top. |
| 1570 | |
| 1571 | This example allocates a fixed area for 32 @code{int}s. This is not |
| 1572 | a problem when the function is a leaf like in this case; in a full-blown |
| 1573 | compiler you will want to analyze the input and determine the number |
| 1574 | of needed stack slots---a very simple example of register allocation. |
| 1575 | The area is then managed like a stack using @code{stack_push} and |
| 1576 | @code{stack_pop}. |
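
The stack discipline described above can be checked without generating any code. The following plain C sketch is not part of @lightning{}; @code{eval_rpn} is a hypothetical helper that interprets the same expression syntax, keeping the stack top in a local variable @code{r0} and the remaining items in an array that stands in for the @code{allocai} area.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define RPN_SLOTS 32    /* same capacity as the allocai area above */

/* Hypothetical helper: interpret an RPN expression for a given x.
   r0 caches the stack top, slots[] holds the rest of the stack. */
int eval_rpn(const char *expr, int x)
{
    int slots[RPN_SLOTS];
    int sp = 0;             /* number of slots in use */
    int r0 = 0, r1;

    while (*expr) {
        if (isdigit((unsigned char)*expr)) {
            char *end;
            long value = strtol(expr, &end, 10);
            assert(sp < RPN_SLOTS);
            slots[sp++] = r0;       /* like stack_push(JIT_R0, ...) */
            r0 = (int)value;        /* like jit_movi(JIT_R0, value) */
            expr = end;
            continue;
        }
        if (*expr == 'x') {
            assert(sp < RPN_SLOTS);
            slots[sp++] = r0;
            r0 = x;                 /* like jit_movr(JIT_R0, JIT_R2) */
        } else {
            assert(sp > 0);
            r1 = slots[--sp];       /* like stack_pop(JIT_R1, ...) */
            switch (*expr) {
            case '+': r0 = r1 + r0; break;
            case '-': r0 = r1 - r0; break;
            case '*': r0 = r1 * r0; break;
            case '/': r0 = r1 / r0; break;
            default:  abort();      /* unknown operator */
            }
        }
        ++expr;
    }
    return r0;
}
```

As in the generator, the first operand pushes a meaningless value (the initial @code{r0}) that is never popped by a well-formed expression. Division by zero is not guarded against in this sketch.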
| 1577 | |
| 1578 | Source code for the client (which lies in the same source file) follows: |
| 1579 | |
| 1580 | @example |
| 1581 | int main(int argc, char *argv[]) |
| 1582 | @{ |
| 1583 | jit_node_t *nc, *nf; |
| 1584 | pifi c2f, f2c; |
| 1585 | int i; |
| 1586 | |
| 1587 | init_jit(argv[0]); |
| 1588 | _jit = jit_new_state(); |
| 1589 | |
| 1590 | nc = compile_rpn("32x9*5/+"); |
| 1591 | nf = compile_rpn("x32-5*9/"); |
| 1592 | (void)jit_emit(); |
| 1593 | c2f = (pifi)jit_address(nc); |
| 1594 | f2c = (pifi)jit_address(nf); |
| 1595 | jit_clear_state(); |
| 1596 | |
| 1597 | printf("\nC:"); |
| 1598 | for (i = 0; i <= 100; i += 10) printf("%3d ", i); |
| 1599 | printf("\nF:"); |
| 1600 | for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i)); |
| 1601 | printf("\n"); |
| 1602 | |
| 1603 | printf("\nF:"); |
| 1604 | for (i = 32; i <= 212; i += 18) printf("%3d ", i); |
| 1605 | printf("\nC:"); |
| 1606 | for (i = 32; i <= 212; i += 18) printf("%3d ", f2c(i)); |
| 1607 | printf("\n"); |
| 1608 | |
| 1609 | jit_destroy_state(); |
| 1610 | finish_jit(); |
| 1611 | return 0; |
| 1612 | @} |
| 1613 | @end example |
| 1614 | |
| 1615 | The client displays a conversion table between Celsius and Fahrenheit |
| 1616 | degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The |
| 1617 | formulas are @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9}, |
| 1618 | respectively. |
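
Because the compiled functions work on integers, each formula is evaluated left to right with truncating division. A plain C sketch of the same arithmetic makes the table easy to cross-check; the names @code{c2f_ref} and @code{f2c_ref} are only for illustration and do not appear in the example program.

```c
/* Reference versions of the two formulas, using the same
   left-to-right integer arithmetic as the RPN strings. */
int c2f_ref(int c)
{
    return c * 9 / 5 + 32;      /* corresponds to "32x9*5/+" */
}

int f2c_ref(int f)
{
    return (f - 32) * 5 / 9;    /* corresponds to "x32-5*9/" */
}
```

Multiplying before dividing keeps the intermediate value exact for the inputs used in the table; dividing first would lose precision to truncation.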
| 1619 | |
| 1620 | Providing the formula as an argument to @code{compile_rpn} effectively |
| 1621 | parameterizes code generation, making it possible to use the same code |
| 1622 | to compile different functions; this is what makes dynamic code |
| 1623 | generation so powerful. |
| 1624 | |
| 1625 | @node Fibonacci |
| 1626 | @section Fibonacci numbers |
| 1627 | |
| 1628 | The code in this section calculates the Fibonacci sequence. That is |
| 1629 | modeled by the recurrence relation: |
| 1630 | @display |
| 1631 | f(0) = 0 |
| 1632 | f(1) = f(2) = 1 |
| 1633 | f(n) = f(n-1) + f(n-2) |
| 1634 | @end display |
| 1635 | |
| 1636 | The purpose of this example is to introduce branches. There are two |
| 1637 | kinds of branches: backward branches and forward branches. We'll |
| 1638 | present the calculation in a recursive and iterative form; the |
| 1639 | former only uses forward branches, while the latter uses both. |
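
As a reference point, here is the recurrence written directly in C. This is an ordinary recursive function, not generated code; the name @code{fib_ref} is only for illustration.

```c
/* Direct transcription of the recurrence:
   f(0) = 0, f(1) = f(2) = 1, f(n) = f(n-1) + f(n-2). */
int fib_ref(int n)
{
    if (n == 0)
        return 0;
    if (n <= 2)
        return 1;
    return fib_ref(n - 1) + fib_ref(n - 2);
}
```

The two early returns play the role of the two forward branches in the generated code, and the two recursive calls correspond to the two calls back to the function's own label.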
| 1640 | |
| 1641 | @example |
| 1642 | #include <stdio.h> |
| 1643 | #include <lightning.h> |
| 1644 | |
| 1645 | static jit_state_t *_jit; |
| 1646 | |
| 1647 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} |
| 1648 | |
| 1649 | int main(int argc, char *argv[]) |
| 1650 | @{ |
| 1651 | pifi fib; |
| 1652 | jit_node_t *label; |
| 1653 | jit_node_t *call; |
| 1654 | jit_node_t *in; @rem{/* offset of the argument */} |
| 1655 | jit_node_t *ref; @rem{/* to patch the forward reference */} |
| 1656 | jit_node_t *zero; @rem{/* to patch the forward reference */} |
| 1657 | |
| 1658 | init_jit(argv[0]); |
| 1659 | _jit = jit_new_state(); |
| 1660 | |
| 1661 | label = jit_label(); |
| 1662 | jit_prolog (); |
| 1663 | in = jit_arg (); |
| 1664 | jit_getarg (JIT_R0, in); @rem{/* R0 = n */} |
| 1665 | zero = jit_beqi (JIT_R0, 0); |
| 1666 | jit_movr (JIT_V0, JIT_R0); @rem{/* V0 = R0 */} |
| 1667 | jit_movi (JIT_R0, 1); |
| 1668 | ref = jit_blei (JIT_V0, 2); |
| 1669 | jit_subi (JIT_V1, JIT_V0, 1); @rem{/* V1 = n-1 */} |
| 1670 | jit_subi (JIT_V2, JIT_V0, 2); @rem{/* V2 = n-2 */} |
| 1671 | jit_prepare(); |
| 1672 | jit_pushargr(JIT_V1); |
| 1673 | call = jit_finishi(NULL); |
| 1674 | jit_patch_at(call, label); |
| 1675 | jit_retval(JIT_V1); @rem{/* V1 = fib(n-1) */} |
| 1676 | jit_prepare(); |
| 1677 | jit_pushargr(JIT_V2); |
| 1678 | call = jit_finishi(NULL); |
| 1679 | jit_patch_at(call, label); |
| 1680 | jit_retval(JIT_R0); @rem{/* R0 = fib(n-2) */} |
| 1681 | jit_addr(JIT_R0, JIT_R0, JIT_V1); @rem{/* R0 = R0 + V1 */} |
| 1682 | |
| 1683 | jit_patch(ref); @rem{/* patch jump */} |
| 1684 | jit_patch(zero); @rem{/* patch jump */} |
| 1685 | jit_retr(JIT_R0); |
| 1686 | |
| 1687 | @rem{/* call the generated code@comma{} passing 32 as an argument */} |
| 1688 | fib = jit_emit(); |
| 1689 | jit_clear_state(); |
| 1690 | printf("fib(%d) = %d\n", 32, fib(32)); |
| 1691 | jit_destroy_state(); |
| 1692 | finish_jit(); |
| 1693 | return 0; |
| 1694 | @} |
| 1695 | @end example |
| 1696 | |
| 1697 | As said above, this is the first example of dynamically compiling |
| 1698 | branches. Branch instructions have two operands containing the |
| 1699 | values to be compared, and return a @code{jit_node_t *} object |
| 1700 | to be patched. |
| 1701 | |
| 1702 | Because the final addresses of labels are only known after calling @code{emit}, |
| 1703 | it is required to call @code{patch} or @code{patch_at}, which tells |
| 1704 | @lightning{} that the target to patch is actually a pointer to |
| 1705 | a @code{jit_node_t *} object; otherwise, it would assume that it is |
| 1706 | a pointer to a C function. Note that conditional branches do not |
| 1707 | receive a label argument, so they must be patched. |
| 1708 | |
| 1709 | You need to call @code{patch_at} on the return value of @code{calli} |
| 1710 | or @code{finishi} if it is actually referencing a label |
| 1711 | in the jit code. Branch instructions never receive a label |
| 1712 | argument. Note that @code{movi} is a special case, and patching it |
| 1713 | is usually done to get the final address of a label, usually to later |
| 1714 | call @code{jmpr}. |
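
The difference between patching to the current position and patching to an explicit label can be illustrated with a toy instruction buffer. The sketch below does not use @lightning{}; every name in it (@code{toy_code}, @code{toy_emit}, @code{toy_label}, @code{toy_patch}, @code{toy_patch_at}, @code{toy_run}, @code{toy_demo}) is hypothetical. A jump is emitted with an unknown target and resolved later, which is the same idea as keeping the node returned by a branch and patching it.

```c
#include <assert.h>

#define TOY_MAX 16

enum toy_op { TOY_INC, TOY_JUMP, TOY_JUMP_IF_LT, TOY_HALT };

struct toy_insn { enum toy_op op; int operand; int target; };
struct toy_code { struct toy_insn insn[TOY_MAX]; int count; };

/* Append an instruction with an unresolved target; return its index. */
static int toy_emit(struct toy_code *c, enum toy_op op, int operand)
{
    assert(c->count < TOY_MAX);
    c->insn[c->count].op = op;
    c->insn[c->count].operand = operand;
    c->insn[c->count].target = -1;
    return c->count++;
}

/* A label is the index of the next instruction to be emitted. */
static int toy_label(const struct toy_code *c)
{
    return c->count;
}

/* Resolve a jump to an explicit label. */
static void toy_patch_at(struct toy_code *c, int jump, int label)
{
    c->insn[jump].target = label;
}

/* Resolve a forward jump to the current position. */
static void toy_patch(struct toy_code *c, int jump)
{
    toy_patch_at(c, jump, toy_label(c));
}

/* Execute the buffer; the result is the accumulator at TOY_HALT. */
static int toy_run(const struct toy_code *c)
{
    int pc = 0, acc = 0;
    for (;;) {
        const struct toy_insn *i;
        assert(pc >= 0 && pc < c->count);
        i = &c->insn[pc];
        switch (i->op) {
        case TOY_INC:        acc++; pc++; break;
        case TOY_JUMP:       pc = i->target; break;
        case TOY_JUMP_IF_LT: pc = acc < i->operand ? i->target : pc + 1; break;
        case TOY_HALT:       return acc;
        }
    }
}

/* One forward jump that skips an increment, then a counting loop
   closed by a backward jump. */
int toy_demo(int limit)
{
    struct toy_code c;
    int skip, loop, back;

    c.count = 0;
    skip = toy_emit(&c, TOY_JUMP, 0);           /* target not known yet */
    toy_emit(&c, TOY_INC, 0);                   /* skipped at run time */
    toy_patch(&c, skip);                        /* forward: target is here */

    loop = toy_label(&c);                       /* remembered for later */
    toy_emit(&c, TOY_INC, 0);
    back = toy_emit(&c, TOY_JUMP_IF_LT, limit);
    toy_patch_at(&c, back, loop);               /* backward: explicit label */

    toy_emit(&c, TOY_HALT, 0);
    return toy_run(&c);
}
```

Since the loop body runs before the test, @code{toy_demo} returns at least 1 even for a limit of 0.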
| 1715 | |
| 1716 | Now, here is the iterative version: |
| 1717 | |
| 1718 | @example |
| 1719 | #include <stdio.h> |
| 1720 | #include <lightning.h> |
| 1721 | |
| 1722 | static jit_state_t *_jit; |
| 1723 | |
| 1724 | typedef int (*pifi)(int); @rem{/* Pointer to Int Function of Int */} |
| 1725 | |
| 1726 | int main(int argc, char *argv[]) |
| 1727 | @{ |
| 1728 | pifi fib; |
| 1729 | jit_node_t *in; @rem{/* offset of the argument */} |
| 1730 | jit_node_t *ref; @rem{/* to patch the forward reference */} |
| 1731 | jit_node_t *zero; @rem{/* to patch the forward reference */} |
| 1732 | jit_node_t *jump; @rem{/* jump to start of loop */} |
| 1733 | jit_node_t *loop; @rem{/* start of the loop */} |
| 1734 | |
| 1735 | init_jit(argv[0]); |
| 1736 | _jit = jit_new_state(); |
| 1737 | |
| 1738 | jit_prolog (); |
| 1739 | in = jit_arg (); |
| 1740 | jit_getarg (JIT_R0, in); @rem{/* R0 = n */} |
| 1741 | zero = jit_beqi (JIT_R0, 0); |
| 1742 | jit_movr (JIT_R1, JIT_R0); |
| 1743 | jit_movi (JIT_R0, 1); |
| 1744 | ref = jit_blei (JIT_R1, 2); @rem{/* f(1) = f(2) = 1 */} |
| 1745 | jit_subi (JIT_R2, JIT_R1, 2); @rem{/* R2 = n-2@comma{} the loop counter */} |
| 1746 | jit_movr (JIT_R1, JIT_R0); |
| 1747 | |
| 1748 | loop= jit_label(); |
| 1749 | jit_subi (JIT_R2, JIT_R2, 1); @rem{/* decr. counter */} |
| 1750 | jit_movr (JIT_V0, JIT_R0); @rem{/* V0 = R0 */} |
| 1751 | jit_addr (JIT_R0, JIT_R0, JIT_R1); @rem{/* R0 = R0 + R1 */} |
| 1752 | jit_movr (JIT_R1, JIT_V0); @rem{/* R1 = V0 */} |
| 1753 | jump= jit_bnei (JIT_R2, 0); @rem{/* if (R2) goto loop; */} |
| 1754 | jit_patch_at(jump, loop); |
| 1755 | |
| 1756 | jit_patch(ref); @rem{/* patch forward jump */} |
| 1757 | jit_patch(zero); @rem{/* patch forward jump */} |
| 1758 | jit_retr (JIT_R0); |
| 1759 | |
| 1760 | @rem{/* call the generated code@comma{} passing 36 as an argument */} |
| 1761 | fib = jit_emit(); |
| 1762 | jit_clear_state(); |
| 1763 | printf("fib(%d) = %d\n", 36, fib(36)); |
| 1764 | jit_destroy_state(); |
| 1765 | finish_jit(); |
| 1766 | return 0; |
| 1767 | @} |
| 1768 | @end example |
| 1769 | |
| 1770 | This code calculates the recurrence relation using iteration (a |
| 1771 | @code{for} loop in high-level languages). There are no function |
| 1772 | calls anymore: instead, there is a backward jump (the @code{bnei} at |
| 1773 | the end of the loop). |
| 1774 | |
| 1775 | Note that the program must remember the address for backward jumps; |
| 1776 | for forward jumps it is only required to remember the jump code, |
| 1777 | and call @code{patch} for the implicit label. |
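
For comparison, the loop can be written in plain C with one local variable per register. This is only a sketch of the intended control flow, not generated code; the name @code{fib_iter} is hypothetical, and the sketch assumes that the counter starts at @code{n-2} and that arguments up to 2 take the early exit.

```c
/* Plain C rendering of the iterative generator; r0, r1, r2 and v0
   stand for JIT_R0, JIT_R1, JIT_R2 and JIT_V0. */
int fib_iter(int n)
{
    int r0 = n, r1, r2, v0;

    if (r0 == 0)            /* zero: f(0) = 0 */
        return r0;
    r1 = r0;
    r0 = 1;
    if (r1 <= 2)            /* ref: f(1) = f(2) = 1 */
        return r0;
    r2 = r1 - 2;            /* iterations left */
    r1 = r0;

    do {                    /* loop: */
        r2 = r2 - 1;        /* decrement counter */
        v0 = r0;
        r0 = r0 + r1;
        r1 = v0;
    } while (r2 != 0);      /* backward branch */

    return r0;
}
```

The counter is tested only after the body has run once, which is why the cases that need no iteration have to leave through the forward branches first.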
| 1778 | |
| 1779 | @node Reentrancy |
| 1780 | @chapter Re-entrant usage of @lightning{} |
| 1781 | |
| 1782 | @lightning{} uses the special @code{_jit} identifier. To be able |
| 1783 | to use multiple jit generation states at the same |
| 1784 | time, it is required to use code similar to: |
| 1785 | |
| 1786 | @example |
| 1787 | jit_state_t *lightning; |
| 1788 | #define _jit lightning |
| 1789 | @end example |
| 1790 | |
| 1791 | This will cause the symbol defined to @code{_jit} to be passed as |
| 1792 | the first argument to the underlying @lightning{} implementation, |
| 1793 | which is usually a function with an @code{_} (underscore) prefix |
| 1794 | and with an argument named @code{_jit}, in the pattern: |
| 1795 | |
| 1796 | @example |
| 1797 | static void _jit_mnemonic(jit_state_t *, jit_gpr_t, jit_gpr_t); |
| 1798 | #define jit_mnemonic(u, v) _jit_mnemonic(_jit, u, v) |
| 1799 | @end example |
| 1800 | |
| 1801 | The reason for this is to use the same syntax as the initial lightning |
| 1802 | implementation and to spare the user from adding an extra |
| 1803 | argument to every call, as multiple jit states generating code in |
| 1804 | parallel should be very uncommon. |
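
The mechanism is ordinary macro expansion, and can be shown with a toy that does not depend on @lightning{}. In the sketch below, @code{toy_state}, @code{toy_emit_impl}, @code{toy_emit} and @code{toy_generate} are hypothetical stand-ins for @code{jit_state_t}, an underscore-prefixed implementation function, its public macro, and client code; whichever @code{_jit} is in scope at the call site is silently passed as the first argument.

```c
struct toy_state { int emitted; };

/* Implementation function: takes the state explicitly. */
static void toy_emit_impl(struct toy_state *state, int count)
{
    state->emitted += count;
}

/* Public macro: picks up whatever _jit names at the call site. */
#define toy_emit(count) toy_emit_impl(_jit, count)

/* Client code: the parameter is deliberately named _jit, so each
   call works on the state the caller handed in. */
void toy_generate(struct toy_state *_jit, int n)
{
    int i;
    for (i = 0; i < n; i++)
        toy_emit(1);
}
```

Calling @code{toy_generate} with two different states keeps their counters independent, which is the property needed to drive several generation states at once.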
| 1805 | |
| 1806 | @node Registers |
| 1807 | @chapter Accessing the whole register file |
| 1808 | |
| 1809 | As mentioned earlier in this manual, all @lightning{} back-ends are |
| 1810 | guaranteed to have at least six general-purpose integer registers and |
| 1811 | six floating-point registers, but many back-ends will have more. |
| 1812 | |
| 1813 | To access the entire register file, you can use the |
| 1814 | @code{JIT_R}, @code{JIT_V} and @code{JIT_F} macros. They |
| 1815 | accept a parameter that identifies the register number, which |
| 1816 | must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM} |
| 1817 | and @code{JIT_F_NUM} respectively; the number need not be |
| 1818 | constant. Of course, expressions like @code{JIT_R0} and |
| 1819 | @code{JIT_R(0)} denote the same register, and likewise for |
| 1820 | integer callee-saved, or floating-point, registers. |
| 1821 | |
| 1822 | @section Scratch registers |
| 1823 | |
| 1824 | For operations that @lightning{} does not support directly, such as storing |
| 1825 | a literal in memory, @code{jit_get_reg} and @code{jit_unget_reg} can be used to |
| 1826 | acquire and release a scratch register, as in the following pattern: |
| 1827 | |
| 1828 | @example |
| 1829 | jit_int32_t reg = jit_get_reg (jit_class_gpr); |
| 1830 | jit_movi (reg, immediate); |
| 1831 | jit_stxi (offsetof (some_struct, some_field), JIT_V0, reg); |
| 1832 | jit_unget_reg (reg); |
| 1833 | @end example |
| 1834 | |
| 1835 | As @code{jit_get_reg} and @code{jit_unget_reg} may generate spills and |
| 1836 | reloads but don't follow branches, the code between both must be in |
| 1837 | the same basic block and must not contain any branches, unlike the |
| 1838 | following (bad) example. |
| 1839 | |
| 1840 | @example |
| 1841 | jit_int32_t reg = jit_get_reg (jit_class_gpr); |
| 1842 | jit_ldxi (reg, JIT_V0, offset); |
| 1843 | jump = jit_bnei (reg, V0); |
| 1844 | jit_movr (JIT_V1, reg); |
| 1845 | jit_patch (jump); |
| 1846 | jit_unget_reg (reg); |
| 1847 | @end example |
| 1848 | |
| 1849 | @node Customizations |
| 1850 | @chapter Customizations |
| 1851 | |
| 1852 | Frequently it is desirable to have more control over how code is |
| 1853 | generated or how memory is used during jit generation or execution. |
| 1854 | |
| 1855 | @section Memory functions |
| 1856 | To aid in complete control of memory allocation and deallocation, |
| 1857 | @lightning{} provides wrappers that default to standard @code{malloc}, |
| 1858 | @code{realloc} and @code{free}. These are loosely based on the |
| 1859 | GNU GMP counterparts, with the difference that they use the same |
| 1860 | prototypes as the system allocation functions, that is, no @code{size} |
| 1861 | for @code{free} or @code{old_size} for @code{realloc}. |
| 1862 | |
| 1863 | @deftypefun void jit_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t), @* void (*@var{free_func_ptr}) (void *)) |
| 1864 | @lightning{} guarantees that memory is only allocated or released |
| 1865 | using these wrapped functions, but you must note that if lightning |
| 1866 | was linked to GNU binutils, @code{malloc} will probably be called multiple |
| 1867 | times from there when initializing the disassembler. |
| 1868 | |
| 1869 | Because @code{init_jit} may call memory functions, if you need to call |
| 1870 | @code{jit_set_memory_functions}, it must be called before @code{init_jit}, |
| 1871 | otherwise, when calling @code{finish_jit}, a pointer allocated with the |
| 1872 | previous or default wrappers will be passed to the newly set functions. |
| 1873 | @end deftypefun |
| 1874 | |
| 1875 | @deftypefun void jit_get_memory_functions (@* void *(**@var{alloc_func_ptr}) (size_t), @* void *(**@var{realloc_func_ptr}) (void *, size_t), @* void (**@var{free_func_ptr}) (void *)) |
| 1876 | Get the current memory allocation functions. Unlike the GNU GMP |
| 1877 | counterpart, it is an error to pass @code{NULL} pointers as arguments. |
| 1878 | @end deftypefun |
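
A replacement set of memory functions only has to match the prototypes of @code{malloc}, @code{realloc} and @code{free}. The following sketch counts live blocks; the names @code{counting_alloc}, @code{counting_realloc}, @code{counting_free} and @code{counting_live_blocks} are hypothetical, and the registration call is shown only as a comment because it needs the @lightning{} headers.

```c
#include <stddef.h>
#include <stdlib.h>

static long live_blocks;    /* blocks allocated and not yet freed */

void *counting_alloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL)
        live_blocks++;
    return ptr;
}

void *counting_realloc(void *ptr, size_t size)
{
    void *result = realloc(ptr, size);
    /* Growing from NULL creates a new block; otherwise the count is
       unchanged, whether or not the block moved. */
    if (ptr == NULL && result != NULL)
        live_blocks++;
    return result;
}

void counting_free(void *ptr)
{
    if (ptr != NULL)
        live_blocks--;
    free(ptr);
}

long counting_live_blocks(void)
{
    return live_blocks;
}

/* Registration, to be done before init_jit():
   jit_set_memory_functions(counting_alloc, counting_realloc,
                            counting_free); */
```

A nonzero count after @code{finish_jit} would then point at a leak, or at wrappers that were installed too late. The sketch does not handle @code{realloc} with a size of zero, whose behavior is implementation-defined.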
| 1879 | |
| 1880 | @section Protection |
| 1881 | Unless an alternate code buffer is used (see below), @code{jit_emit} |
| 1882 | sets the access protections so that the code buffer's memory can be read and |
| 1883 | executed, but not modified. One can use the following functions after |
| 1884 | @code{jit_emit} but before @code{jit_clear_state} to temporarily lift the |
| 1885 | protection: |
| 1886 | |
| 1887 | @deftypefun void jit_unprotect () |
| 1888 | Changes the access protection so that the code buffer's memory can be read and |
| 1889 | modified. Before the emitted code can be invoked, @code{jit_protect} |
| 1890 | has to be called to reset the change. |
| 1891 | |
| 1892 | This procedure has no effect when an alternate code buffer (see below) is used. |
| 1893 | @end deftypefun |
| 1894 | |
| 1895 | @deftypefun void jit_protect () |
| 1896 | Changes the access protection so that the code buffer's memory can be read and |
| 1897 | executed. |
| 1898 | |
| 1899 | This procedure has no effect when an alternate code buffer (see below) is used. |
| 1900 | @end deftypefun |
| 1901 | |
| 1902 | @section Alternate code buffer |
| 1903 | To instruct @lightning{} to use an alternate code buffer it is required |
| 1904 | to call @code{jit_realize} before @code{jit_emit}, and then query states |
| 1905 | and customize as appropriate. |
| 1906 | |
| 1907 | @deftypefun void jit_realize () |
| 1908 | Must be called once, before @code{jit_emit}, to instruct @lightning{} |
| 1909 | that no other @code{jit_xyz} call will be made. |
| 1910 | @end deftypefun |
| 1911 | |
| 1912 | @deftypefun jit_pointer_t jit_get_code (jit_word_t *@var{code_size}) |
| 1913 | Returns @code{NULL} or the previous value set with @code{jit_set_code}, and |
| 1914 | sets the @var{code_size} argument to an appropriate value. |
| 1915 | If @code{jit_get_code} is called before @code{jit_emit}, the |
| 1916 | @var{code_size} argument is set to the expected amount of bytes |
| 1917 | required to generate code. |
| 1918 | If @code{jit_get_code} is called after @code{jit_emit}, the |
| 1919 | @var{code_size} argument is set to the exact amount of bytes used |
| 1920 | by the code. |
| 1921 | @end deftypefun |
| 1922 | |
| 1923 | @deftypefun void jit_set_code (jit_pointer_t @var{code}, jit_word_t @var{size}) |
| 1924 | Instructs @lightning{} to output to the @var{code} argument and |
| 1925 | use @var{size} as a guard to not write to invalid memory. If during |
| 1926 | @code{jit_emit} @lightning{} finds out that the code would not fit |
| 1927 | in @var{size} bytes, it halts code emission and returns @code{NULL}. |
| 1928 | @end deftypefun |
| 1929 | |
| 1930 | A simple example of a loop using an alternate buffer is: |
| 1931 | |
| 1932 | @example |
| 1933 | jit_uint8_t *code; |
| 1934 | int (*func)(int); @rem{/* function pointer */} |
| 1935 | jit_word_t code_size; |
| 1936 | jit_word_t real_code_size; |
| 1937 | @rem{...} |
| 1938 | jit_realize(); @rem{/* ready to generate code */} |
| 1939 | jit_get_code(&code_size); @rem{/* get expected code size */} |
| 1940 | code_size = (code_size + 4095) & -4096; |
| 1941 | do @{ |
| 1942 | code = mmap(NULL, code_size, PROT_EXEC | PROT_READ | PROT_WRITE, |
| 1943 | MAP_PRIVATE | MAP_ANON, -1, 0); |
| 1944 | jit_set_code(code, code_size); |
| 1945 | if ((func = jit_emit()) == NULL) @{ |
| 1946 | munmap(code, code_size); |
| 1947 | code_size += 4096; |
| 1948 | @} |
| 1949 | @} while (func == NULL); |
| 1950 | jit_get_code(&real_code_size); @rem{/* query exact size of the code */} |
| 1951 | @end example |
| 1952 | |
| 1953 | The first call to @code{jit_get_code} should return @code{NULL} and set |
| 1954 | the @code{code_size} argument to the expected amount of bytes required |
| 1955 | to emit code. |
| 1956 | The second call to @code{jit_get_code} is after a successful call to |
| 1957 | @code{jit_emit}, and will return the value previously set with |
| 1958 | @code{jit_set_code} and set the @code{real_code_size} argument to the |
| 1959 | exact amount of bytes used to emit the code. |
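
The expression @code{(code_size + 4095) & -4096} rounds the size up to a multiple of 4096, the page size assumed by the example. The sketch below isolates that arithmetic and the grow-by-one-page retry policy; @code{round_up_to_page} and @code{buffer_size_after_retries} are hypothetical helpers, and the emit step is replaced by a simple size comparison so that the code runs without @lightning{}.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size, as in the example above */

/* Round size up to the next multiple of TOY_PAGE_SIZE, which must be
   a power of two for the mask to work. */
size_t round_up_to_page(size_t size)
{
    return (size + TOY_PAGE_SIZE - 1) & ~(size_t)(TOY_PAGE_SIZE - 1);
}

/* Mimic the retry loop: start from the rounded estimate and add one
   page per failed attempt until code needing `actual` bytes fits.
   Returns the final buffer size. */
size_t buffer_size_after_retries(size_t estimate, size_t actual)
{
    size_t size = round_up_to_page(estimate);
    while (size < actual)       /* stands in for jit_emit() == NULL */
        size += TOY_PAGE_SIZE;
    return size;
}
```

On a real system the page size is better queried at run time, for example with @code{sysconf(_SC_PAGESIZE)} on POSIX, than hard-coded.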
| 1960 | |
| 1961 | @section Alternate data buffer |
| 1962 | Sometimes it may be desirable to customize how @lightning{} uses an |
| 1963 | extra buffer for constants or debug annotations, or to prevent it from |
| 1964 | using one, usually when an alternate code buffer is also in use. |
| 1965 | |
| 1966 | @deftypefun jit_pointer_t jit_get_data (jit_word_t *@var{data_size}, jit_word_t *@var{note_size}) |
| 1967 | Returns @code{NULL} or the previous value set with @code{jit_set_data}, |
| 1968 | and sets the @var{data_size} argument to how many bytes are required |
| 1969 | for the constants data buffer, and @var{note_size} to how many bytes |
| 1970 | are required to store the debug note information. |
| 1971 | Note that it always preallocates one debug note entry even if |
| 1972 | @code{jit_name} or @code{jit_note} are never called, but will return |
| 1973 | zero in the @var{data_size} argument if no constant is required; |
| 1974 | constants are only used for the @code{float} and @code{double} operations |
| 1975 | that have an immediate argument, and not in all @lightning{} ports. |
| 1976 | @end deftypefun |
| 1977 | |
| 1978 | @deftypefun void jit_set_data (jit_pointer_t @var{data}, jit_word_t @var{size}, jit_word_t @var{flags}) |
| 1979 | |
| 1980 | @var{data} can be @code{NULL} if disabling constants and annotations; otherwise, |
| 1981 | a valid pointer must be passed. An assertion is done that the data will |
| 1982 | fit in @var{size} bytes (but that is a no-op if @lightning{} was built |
| 1983 | with @code{-DNDEBUG}). |
| 1984 | |
| 1985 | @var{size} tells the space in bytes available in @var{data}. |
| 1986 | |
| 1987 | @var{flags} can be zero to just use the alternate data buffer, |
| 1988 | or a combination of @code{JIT_DISABLE_DATA} and @code{JIT_DISABLE_NOTE}: |
| 1989 | |
| 1990 | @table @t |
| 1991 | @item JIT_DISABLE_DATA |
| 1992 | @cindex JIT_DISABLE_DATA |
| 1993 | Instructs @lightning{} to not use a constant table, but to use an |
| 1994 | alternate method to synthesize those, usually with a larger code |
| 1995 | sequence using stack space to transfer the value from a GPR to an |
| 1996 | FPR. |
| 1997 | |
| 1998 | @item JIT_DISABLE_NOTE |
| 1999 | @cindex JIT_DISABLE_NOTE |
| 2000 | Instructs @lightning{} to not store file or function name, and |
| 2001 | line numbers in the constant buffer. |
| 2002 | @end table |
| 2003 | @end deftypefun |
| 2004 | |
| 2005 | A simple example of preventing the use of a data buffer is: |
| 2006 | |
| 2007 | @example |
| 2008 | @rem{...} |
| 2009 | jit_realize(); @rem{/* ready to generate code */} |
| 2010 | jit_get_data(NULL, NULL); |
| 2011 | jit_set_data(NULL, 0, JIT_DISABLE_DATA | JIT_DISABLE_NOTE); |
| 2012 | @rem{...} |
| 2013 | @end example |
| 2014 | |
| 2015 | Or to only use a data buffer, if required: |
| 2016 | |
| 2017 | @example |
| 2018 | jit_uint8_t *data; |
| 2019 | jit_word_t data_size; |
| 2020 | @rem{...} |
| 2021 | jit_realize(); @rem{/* ready to generate code */} |
| 2022 | jit_get_data(&data_size, NULL); |
| 2023 | if (data_size) |
| 2024 | data = malloc(data_size); |
| 2025 | else |
| 2026 | data = NULL; |
| 2027 | jit_set_data(data, data_size, JIT_DISABLE_NOTE); |
| 2028 | @rem{...} |
| 2029 | if (data) |
| 2030 | free(data); |
| 2031 | @rem{...} |
| 2032 | @end example |
| 2033 | |
| 2034 | @node Acknowledgements |
| 2035 | @chapter Acknowledgements |
| 2036 | |
| 2037 | As far as I know, the first general-purpose portable dynamic code |
| 2038 | generator is @sc{dcg}, by Dawson R.@: Engler and T.@: A.@: Proebsting. |
| 2039 | Further work by Dawson R.@: Engler resulted in the @sc{vcode} system; |
| 2040 | unlike @sc{dcg}, @sc{vcode} used no intermediate representation and |
| 2041 | directly inspired @lightning{}. |
| 2042 | |
| 2043 | Thanks go to Ian Piumarta, who kindly agreed to release his own |
| 2044 | program @sc{ccg} under the GNU General Public License, thereby allowing |
| 2045 | @lightning{} to use the run-time assemblers he had written for @sc{ccg}. |
| 2046 | @sc{ccg} provides a way to dynamically assemble programs written in the |
| 2047 | underlying architecture's assembly language. So it is not portable, |
| 2048 | yet very interesting. |
| 2049 | |
| 2050 | I also thank Steve Byrne for writing GNU Smalltalk, since @lightning{} |
| 2051 | was first developed as a tool to be used in GNU Smalltalk's dynamic |
| 2052 | translator from bytecodes to native code. |