./configure
@end example
-@lightning{} supports the @code{--enable-disassembler} option, that
-enables linking to GNU binutils and optionally print human readable
+The @file{configure} script accepts the @code{--enable-disassembler} option,
+that enables linking to GNU binutils and optionally printing human readable
disassembly of the jit code. This option can be disabled by the
@code{--disable-disassembler} option.
-Another option that @file{configure} accepts is
-@code{--enable-assertions}, which enables several consistency checks in
-the run-time assemblers. These are not usually needed, so you can
-decide to simply forget about it; also remember that these consistency
+@file{configure} also accepts the @code{--enable-devel-disassembler}
+option, useful to check exactly what machine instructions were generated
+for a @lightning{} instruction. It basically mixes the output of
+@code{jit_print} and @code{jit_disassembly}.
+
+The @code{--enable-assertions} option enables several consistency
+checks in the run-time assemblers. These are not usually needed, so you
+can decide to simply forget about it; also remember that these consistency
checks tend to slow down your code generator.
+The @code{--enable-devel-strong-type-checking} option does extra type
+checking using @code{assert}. This option also enables
+@code{--enable-assertions} unless it is explicitly disabled.
+
+The option @code{--enable-devel-get-jit-size} should only be used
+when doing updates or maintenance to @lightning{}. It regenerates the
+@code{jit_$ARCH-sz.c} file, creating a table of maximum byte usage when
+translating a @lightning{} instruction to machine code.
+
After you've configured @lightning{}, run @file{make} as usual.
@lightning{} has an extensive set of tests to validate it is working
@example
negr _f _d O1 = -O2
comr O1 = ~O2
+clor O1 = number of leading one bits
+clzr O1 = number of leading zero bits
+ctor O1 = number of trailing one bits
+ctzr O1 = number of trailing zero bits
@end example
+Note that @code{ctzr} is basically equivalent to the @code{C} function
+@code{ffs}, but indexed from bit zero, not one.
+
+Contrary to @code{__builtin_ctz} and @code{__builtin_clz}, an input
+value of zero is not an error; the result is the number of bits
+in a word: 64 if @lightning{} generates 64 bit instructions, otherwise
+32.
+
+The @code{clor} and @code{ctor} are just counterparts of the versions
+that search for zero bits.
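+
+The zero-input behavior of these four operations can be modeled in plain
+@code{C}. This is only a reference sketch of the documented semantics,
+not the code @lightning{} emits; the 64 bit word size and all
+@code{model_} names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_WORD_BITS 64  /* assumption: 64 bit word */

/* Number of leading zero bits; a zero input yields the word size. */
static inline int model_clz(uint64_t x)
{
    int n = 0;

    if (x == 0)
        return MODEL_WORD_BITS;
    while (!(x & (UINT64_C(1) << (MODEL_WORD_BITS - 1)))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Number of trailing zero bits; a zero input yields the word size. */
static inline int model_ctz(uint64_t x)
{
    int n = 0;

    if (x == 0)
        return MODEL_WORD_BITS;
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* The "one" variants count zero bits of the complemented input. */
static inline int model_clo(uint64_t x) { return model_clz(~x); }
static inline int model_cto(uint64_t x) { return model_ctz(~x); }

/* Hypothetical usage: a few values checked against the model. */
static inline void model_examples(void)
{
    assert(model_ctz(8) == 3);              /* ffs(8) would return 4 */
    assert(model_ctz(0) == MODEL_WORD_BITS);
    assert(model_clz(1) == 63);
    assert(model_clo(~UINT64_C(0)) == MODEL_WORD_BITS);
    assert(model_cto(7) == 3);
}
```

+In this model an all-ones input to the @code{clor} or @code{ctor}
+counterpart also yields the word size, since every bit is counted.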
+
These unary ALU operations are only defined for float operands.
@example
absr _f _d O1 = fabs(O2)
-sqrtr O1 = sqrt(O2)
+sqrtr _f _d O1 = sqrt(O2)
@end example
Besides requiring the @code{r} modifier, there are no unary operations
both cases, the first can be either a register or an immediate
value. Values are sign-extended to fit a whole register.
@example
-str _c _uc _s _us _i _ui _l _f _d *O1 = O2
-sti _c _uc _s _us _i _ui _l _f _d *O1 = O2
-stxr _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
-stxi _c _uc _s _us _i _ui _l _f _d *(O1+O2) = O3
+str _c _s _i _l _f _d *O1 = O2
+sti _c _s _i _l _f _d *O1 = O2
+stxr _c _s _i _l _f _d *(O1+O2) = O3
+stxi _c _s _i _l _f _d *(O1+O2) = O3
@end example
-As for the load operations, the @code{_ui} and @code{_l} types are
-only available in 64-bit architectures, and for convenience, there
-is a version without a type modifier for integer or pointer operands
-that uses the appropriate wordsize call.
+Note that the unsigned type modifiers are not available, as a store
+only writes 1, 2, 4 or 8 bytes to the memory address, regardless of sign.
+The @code{_l} type is only available in 64-bit architectures, and for
+convenience, there is a version without a type modifier for integer or
+pointer operands that uses the appropriate wordsize call.
@item Argument management
These are:
@example
prepare (not specified)
va_start (not specified)
-pushargr _f _d
-pushargi _f _d
+pushargr _c _uc _s _us _i _ui _l _f _d
+pushargi _c _uc _s _us _i _ui _l _f _d
va_push (not specified)
-arg _f _d
+arg _c _uc _s _us _i _ui _l _f _d
getarg _c _uc _s _us _i _ui _l _f _d
va_arg _d
-putargr _f _d
-putargi _f _d
+putargr _c _uc _s _us _i _ui _l _f _d
+putargi _c _uc _s _us _i _ui _l _f _d
ret (not specified)
-retr _f _d
+retr _c _uc _s _us _i _ui _l _f _d
+reti _c _uc _s _us _i _ui _l _f _d
-reti _f _d
va_end (not specified)
retval _c _uc _s _us _i _ui _l _f _d
left to right order}; and use @code{finish} or @code{call} (explained below)
to perform the actual call.
+Note that @code{arg}, @code{pusharg}, @code{putarg} and @code{ret} can be
+used without a type modifier when handling integer types.
+It is suggested to use matching type modifiers for @code{arg}, @code{putarg}
+and @code{getarg}; otherwise problems will happen when generating jit for
+environments that require arguments to be truncated and zero or sign
+extended by the caller, and/or where excess arguments might be passed packed
+on the stack. Currently only Apple systems with @code{aarch64} cpus are
+known to have this restriction.
+
@code{va_start} returns a @code{C} compatible @code{va_list}. To fetch
arguments, use @code{va_arg} for integers and @code{va_arg_d} for doubles.
@code{va_push} is required when passing a @code{va_list} to another function,
bxsubi _u O2 -= O3@r{, goto }O1@r{ if no overflow}
@end example
+Note that the @code{C} code does not have an @code{O1} argument. The
+return value must always be used as an argument to @code{patch},
+@code{patch_at} or @code{patch_abs}.
+
@item Jump and return operations
These accept one argument except @code{ret} and @code{jmpi} which
have none; the difference between @code{finishi} and @code{calli}
indirect (not specified) @r{special simple label}
@end example
+The following instruction is used to specify a minimal alignment for
+the next instruction, usually with a label:
+@example
+align (not specified) @r{align code}
+@end example
+
+Similar to @code{align} is the next instruction, also usually used with
+a label:
+@example
+skip (not specified) @r{skip code}
+@end example
+It is used to specify a minimal number of bytes of nops to be inserted
+before the next instruction.
+
@code{label} is normally used as @code{patch_at} argument for backward
jumps.
the @code{movi}, but on some special conditions it is required to create
an "unbound" label.
+@code{align} is useful for creating multiple entry points to a
+(trampoline) function that are all accessible through a single
+function pointer. @code{align} receives an integer argument that
+defines the minimal alignment of the address of a label directly
+following the @code{align} instruction. The integer argument must be
+a power of two and the effective alignment will be a power of two no
+less than the argument to @code{align}. If the argument to
+@code{align} is 16 or more, the effective alignment will match the
+specified minimal alignment exactly.
+
+@example
+ jit_node_t *forward, *label1, *label2, *jump;
+ unsigned char *addr1, *addr2;
+forward = jit_forward();
+ jit_align(16);
+label1 = jit_indirect(); @rem{/* first entry point */}
+jump = jit_jmpi(); @rem{/* jump to first handler */}
+ jit_patch_at(jump, forward);
+ jit_align(16);
+label2 = jit_indirect(); @rem{/* second entry point */}
+ ... @rem{/* second handler */}
+ jit_jmpr(...);
+ jit_link(forward);
+ ... @rem{/* first handler */}
+ jit_jmpr(...);
+ ...
+ jit_emit();
+ addr1 = jit_address(label1);
+ addr2 = jit_address(label2);
+ assert(addr2 - addr1 == 16); @rem{/* only one of the addresses needs to be remembered */}
+@end example
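+
+The padding implied by a power-of-two alignment can be sketched in plain
+@code{C}. This only illustrates the address arithmetic, it is not how
+@lightning{} implements @code{align}; the @code{model_} names are
+hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when n is a power of two. */
static inline int model_is_power_of_two(uintptr_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round addr up to the next multiple of align, which must be a power of two. */
static inline uintptr_t model_align_up(uintptr_t addr, uintptr_t align)
{
    assert(model_is_power_of_two(align));
    return (addr + align - 1) & ~(align - 1);
}

/* Number of padding bytes needed so that a label at addr becomes aligned. */
static inline uintptr_t model_padding(uintptr_t addr, uintptr_t align)
{
    return model_align_up(addr, align) - addr;
}
```

+Under this model, two labels aligned to 16 bytes end up exactly 16 bytes
+apart whenever the code between them is no longer than 16 bytes, which is
+the property the example above asserts.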
+
+@code{skip} is useful for reserving space in the code buffer that can
+later be filled (possibly with the help of the pair of functions
+@code{jit_unprotect} and @code{jit_protect}).
+
@item Function prolog
These macros are used to set up a function prolog. The @code{allocai}
is useful to know the live range of register arguments, as those
are very fast to read and write, but have volatile values.
-@code{callee_save_p} exects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
+@code{callee_save_p} expects a valid @code{JIT_Rn}, @code{JIT_Vn}, or
@code{JIT_Fn}, and will return non zero if the register is callee
save. This call is useful because on several ports, the @code{JIT_Rn}
and @code{JIT_Fn} registers are actually callee save; no need
@code{pointer_p} expects a pointer argument, and will return non
zero if the pointer is inside the generated jit code. Must be
called after @code{jit_emit} and before @code{jit_destroy_state}.
+
+@item Atomic operations
+Only compare-and-swap is implemented. It accepts four operands;
+the second can be an immediate.
+
+The first argument is set to a boolean value telling whether the
+operation succeeded.
+
+All arguments must be different registers; the result register cannot
+also be used to pass an argument.
+
+The second argument is the address of a machine word.
+
+The third argument is the old value.
+
+The fourth argument is the new value.
+
+@example
+casr O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
+casi O1 = (*O2 == O3) ? (*O2 = O4, 1) : 0
+@end example
+
+If the value at the address in the second argument is equal to the third
+argument, the value at that address is atomically modified to the value of the
+fourth argument and the first argument is set to a non zero value.
+
+If the value at the address in the second argument is not equal to the
+third argument nothing is done and the first argument is set to zero.
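+
+The same semantics can be expressed with @code{C11} atomics. This is a
+sketch for comparison only, not the code @lightning{} generates; the
+@code{model_cas} name, the use of @code{intptr_t} as the machine word and
+the sequentially consistent memory ordering are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Returns 1 and stores new_value at addr if *addr equals old_value,
 * otherwise returns 0 and leaves *addr untouched.
 */
static inline int model_cas(atomic_intptr_t *addr,
                            intptr_t old_value, intptr_t new_value)
{
    /* The C11 call overwrites "expected" on failure, so use a local copy. */
    intptr_t expected = old_value;

    return atomic_compare_exchange_strong(addr, &expected, new_value) ? 1 : 0;
}

/* Hypothetical usage: one successful and one failed swap. */
static inline void model_cas_example(void)
{
    atomic_intptr_t word = 5;

    assert(model_cas(&word, 5, 9) == 1);   /* 5 matched, word becomes 9 */
    assert(atomic_load(&word) == 9);
    assert(model_cas(&word, 5, 7) == 0);   /* 5 no longer matches */
    assert(atomic_load(&word) == 9);       /* word is unchanged */
}
```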
@end table
@node GNU lightning examples
@table @b
@item x86_64
@example
- sub $0x30,%rsp
- mov %rbp,(%rsp)
- mov %rsp,%rbp
- sub $0x18,%rsp
- mov %rdi,%rax mov %rdi, %rax
- add $0x1,%rax inc %rax
- mov %rbp,%rsp
- mov (%rsp),%rbp
- add $0x30,%rsp
- retq retq
+ mov %rdi,%rax
+ add $0x1,%rax
+ ret
@end example
-In this case, the main overhead is due to the function's prolog and
-epilog, and stack alignment after reserving stack space for word
-to/from float conversions or moving data from/to x87 to/from SSE.
-Note that besides allocating space to save callee saved registers,
-no registers are saved/restored because @lightning{} notices those
-registers are not modified. There is currently no logic to detect
-if it needs to allocate stack space for type conversions neither
-proper leaf function detection, but these are subject to change
-(FIXME).
+In this case, for the x86 port, @lightning{} has simple optimizations
+to understand it is a leaf function, and that it is not required to
+create a stack frame nor update the stack pointer.
@end table
@node printf
in = jit_arg();
stack_ptr = stack_base = jit_allocai (32 * sizeof (int));
- jit_getarg_i(JIT_R2, in);
+ jit_getarg(JIT_R2, in);
while (*expr) @{
char buf[32];
counterpart, it is an error to pass @code{NULL} pointers as arguments.
@end deftypefun
+@section Protection
+Unless an alternate code buffer is used (see below), @code{jit_emit}
+sets the access protections so that the code buffer's memory can be read and
+executed, but not modified. One can use the following functions after
+@code{jit_emit} but before @code{jit_clear} to temporarily lift the
+protection:
+
+@deftypefun void jit_unprotect ()
+Changes the access protection so that the code buffer's memory can be read and
+modified. Before the emitted code can be invoked, @code{jit_protect}
+has to be called to reset the change.
+
+This procedure has no effect when an alternate code buffer (see below) is used.
+@end deftypefun
+
+@deftypefun void jit_protect ()
+Changes the access protection so that the code buffer's memory can be read and
+executed.
+
+This procedure has no effect when an alternate code buffer (see below) is used.
+@end deftypefun
+
@section Alternate code buffer
To instruct @lightning{} to use an alternate code buffer it is required
to call @code{jit_realize} before @code{jit_emit}, and then query states