Skip to content

Commit

Permalink
Fix inefficiency/bug in x86 reg alloc of floating-point stack
Browse files Browse the repository at this point in the history
The x86 register allocator for the floating-point stack for a simple
float-to-float move tries to optimize for the common case of moving from an
allocated but to-be-removed memloc by simply reassigning the floating-point
stack slot to the destination.  However, this was mistakenly done before
performing the register-allocation "pre" work, which had a number of unfortunate
side effects.

Since the destination memloc has not been written to memory, its newly assigned
floating-point stack slot is marked "not synchronized".  The subsequent "pre"
work would redundantly save the destination to memory in order to "synchronize".
This is redundant, because the float-to-float move is typically the first step
of performing a floating-point operation (remember, in x86 assembly, a source
operand is first moved to the destination operand and then the operation is
performed (possibly with additional source operands), since x86 instructions use
one operand as both a source and destination).

Another side effect is that the "pre" work may have modified the floating-point
stack (e.g., to get the destination to the floating-point-stack top in order for
the `fst` instruction to write it to memory).  The instruction was marking the
original source floating-point-stack-slot location as "not
synchronized" (because it has been reassigned to the destination memloc), but,
due to the potential modification of the floating-point stack, this would
incorrectly mark some other memloc's floating-point-stack slot as "not
synchronized".

Typically, this would again simply be an inefficiency, since the incorrectly
marked memloc/floating-point-stack-slot would simply be redundantly written back
to memory at some latter point.

In rare circumstances, the incorrectly marked memloc corresponds to a
floating-point constant stored in the immutable static heap.  (Because it is a
pain to load `float` and `double` constants in x86/amd64 assembly, the compiler
collects up all of the floating-point constants and allocates global vectors of
constants and then just fetches the constants from those vectors. Since they are
global constants, they can be allocated in the "immutable static
heap" (`staticHeapI`).)  Writing such a value back to memory would be incorrect,
since the immutable static heap should be read-only memory.  This would manifest
as a segmentation fault at an instruction like:

	fstpL (staticHeapI+0x1E8)

However, this bug wasn't observed before, because the "immutable static heap"
was only introduced much after the switch to amd64 as the primary development
platform.  (Prior to the "immutable static heap", the floating point constants
were allocated as vectors in the "normal" heap, which is, of course, writeable.)

The fix is to simply perform the register-allocation "pre" work before the
reassignment of the floating-point stack slot to the destination.
  • Loading branch information
MatthewFluet committed Nov 11, 2023
1 parent 04d7690 commit 8c6450e
Showing 1 changed file with 10 additions and 33 deletions.
43 changes: 10 additions & 33 deletions mlton/codegen/x86-codegen/x86-allocate-registers.fun
Original file line number Diff line number Diff line change
Expand Up @@ -6599,18 +6599,18 @@ struct
registerAllocation,
src, dst, srcsize, dstsize} =
let
val {uses,defs,kills}
= Instruction.uses_defs_kills instruction
val {assembly = assembly_pre,
registerAllocation}
= RA.pre {uses = uses,
defs = defs,
kills = kills,
info = info,
registerAllocation = registerAllocation}

fun default ()
= let
val {uses,defs,kills}
= Instruction.uses_defs_kills instruction
val {assembly = assembly_pre,
registerAllocation}
= RA.pre {uses = uses,
defs = defs,
kills = kills,
info = info,
registerAllocation = registerAllocation}

val {operand = final_src,
assembly = assembly_src,
registerAllocation,
Expand Down Expand Up @@ -6680,16 +6680,6 @@ struct

fun default' ()
= let
val {uses,defs,kills}
= Instruction.uses_defs_kills instruction
val {assembly = assembly_pre,
registerAllocation}
= RA.pre {uses = uses,
defs = defs,
kills = kills,
info = info,
registerAllocation = registerAllocation}

val {operand = final_src,
assembly = assembly_src,
registerAllocation,
Expand Down Expand Up @@ -6799,19 +6789,6 @@ struct
registerAllocation
= registerAllocation}

val {uses,defs,kills}
= Instruction.uses_defs_kills
instruction
val {assembly = assembly_pre,
registerAllocation}
= RA.pre
{uses = uses,
defs = defs,
kills = kills,
info = info,
registerAllocation
= registerAllocation}

val final_uses = []
val final_defs
= [Operand.fltregister
Expand Down

0 comments on commit 8c6450e

Please sign in to comment.