Fix inefficiency/bug in x86 reg alloc of floating-point stack

The x86 register allocator for the floating-point stack for a simple float-to-float move tries to optimize for the common case of moving from an allocated but to-be-removed memloc by simply reassigning the floating-point stack slot to the destination. However, this was mistakenly done before performing the register-allocation "pre" work, which had a number of unfortunate side effects. Since the destination memloc has not been written to memory, its newly assigned floating-point stack slot is marked "not synchronized". The subsequent "pre" work would redundantly save the destination to memory in order to "synchronize". This is redundant, because the float-to-float move is typically the first step of performing a floating-point operation (remember, in x86 assembly, a source operand is first moved to the destination operand and then the operation is performed (possibly with additional source operands), since x86 instructions use one operand as both a source and destination). Another side effect is that the "pre" work may have modified the floating-point stack (e.g., to get the destination to the floating-point-stack top in order for the `fst` instruction to write it to memory). The instruction was marking the original source floating-point-stack-slot location as "not synchronized" (because it has been reassigned to the destination memloc), but, due to the potential modification of the floating-point stack, this would incorrectly mark some other memloc's floating-point-stack slot as "not synchronized". Typically, this would again simply be an inefficiency, since the incorrectly marked memloc/floating-point-stack-slot would simply be redundantly written back to memory at some latter point. In rare circumstances, the incorrectly marked memloc corresponds to a floating-point constant stored in the immutable static heap. (Because it is a pain to load `float` and `double` constants in x86/amd64 assembly, the compiler collects up all of the floating-point constants and allocates global vectors of constants and then just fetches the constants from those vectors. Since they are global constants, they can be allocated in the "immutable static heap" (`staticHeapI`).) Writing such a value back to memory would be incorrect, since the immutable static heap should be read-only memory. This would manifest as a segmentation fault at an instruction like: fstpL (staticHeapI+0x1E8) However, this bug wasn't observed before, because the "immutable static heap" was only introduced much after the switch to amd64 as the primary development platform. (Prior to the "immutable static heap", the floating point constants were allocated as vectors in the "normal" heap, which is, of course, writeable.) The fix is to simply perform the register-allocation "pre" work before the reassignment of the floating-point stack slot to the destination.
MLton · Nov 11, 2023 · 8c6450e · 8c6450e
1 parent 04d7690
commit 8c6450e
Showing 1 changed file with 10 additions and 33 deletions.
diff --git a/mlton/codegen/x86-codegen/x86-allocate-registers.fun b/mlton/codegen/x86-codegen/x86-allocate-registers.fun
@@ -6599,18 +6599,18 @@ struct
                  registerAllocation,
                  src, dst, srcsize, dstsize} =
          let
+            val {uses,defs,kills}
+               = Instruction.uses_defs_kills instruction
+            val {assembly = assembly_pre,
+                 registerAllocation}
+               = RA.pre {uses = uses,
+                         defs = defs,
+                         kills = kills,
+                         info = info,
+                         registerAllocation = registerAllocation}
+
             fun default ()
                = let
-                    val {uses,defs,kills} 
-                       = Instruction.uses_defs_kills instruction
-                    val {assembly = assembly_pre,
-                         registerAllocation}
-                       = RA.pre {uses = uses,
-                                 defs = defs,
-                                 kills = kills,
-                                 info = info,
-                                 registerAllocation = registerAllocation}
-
                     val {operand = final_src,
                          assembly = assembly_src,
                          registerAllocation,
@@ -6680,16 +6680,6 @@ struct
 
             fun default' ()
                = let
-                    val {uses,defs,kills} 
-                       = Instruction.uses_defs_kills instruction
-                    val {assembly = assembly_pre,
-                         registerAllocation}
-                       = RA.pre {uses = uses,
-                                 defs = defs,
-                                 kills = kills,
-                                 info = info,
-                                 registerAllocation = registerAllocation}
-
                     val {operand = final_src,
                          assembly = assembly_src,
                          registerAllocation,
@@ -6799,19 +6789,6 @@ struct
                                                         registerAllocation 
                                                         = registerAllocation}
 
-                                                  val {uses,defs,kills} 
-                                                     = Instruction.uses_defs_kills
-                                                       instruction
-                                                  val {assembly = assembly_pre,
-                                                       registerAllocation}
-                                                     = RA.pre 
-                                                       {uses = uses,
-                                                        defs = defs,
-                                                        kills = kills,
-                                                        info = info,
-                                                        registerAllocation 
-                                                        = registerAllocation}
-
                                                   val final_uses = []
                                                   val final_defs 
                                                      = [Operand.fltregister