Automatic data types fusion when deriving generator for constructor #55

buzden · 2023-03-20T13:05:13Z

This can be seen as generation-time optimisation by growing the size of derived code. @AlgebraicWolf tested fusion in an ad-hoc example and it gave better results. We can generalise this.

The idea is to generate a data type for each data constructor (which is of a (complex?) dependent type) which would contain all constraints of data types of all constructor parameters. When we generate arguments for this particular constructor, instead of a sequence of subgeneration calls and then assembling it to the constructor's call, we can simply generate that specialised data structure and use it to generate the constructor call.

For example, consider couple of GADTs:

data X : Nat -> Type where
  X1 : Double -> X1
  Xn : (n : Nat) -> X (S n)

data Y : Nat -> Type where
  Y0 : String -> Y 0
  Y1 : Char -> Y 1
  Yn : Y n

If we derive a generator for data type Z defined like

data Z : Type where
  Zx : Z
  Zn : (n : Nat) -> X n -> Y n -> Z

we now make generator for Z to be alternatives of generator for Zx and generator for Zn, where generator of Zn is an alternative between approximately

do (n ** xn) <- genAnyX
   yn <- genY n
   pure $ Zn n xn yn

and

do (n ** yn) <- genAnyY
   xn <- genX n
   pure $ Zn n xn yn

What is suggested here is to create a special data type

data ForZn : Type where
  ForZn_X1_Y1 : Double -> Char -> ForZn
  ForZn_X1_Yn : Double -> ForZn
  ForZn_Xn_Y1 : Char -> ForZn
  ForZn_Xn_Yn : (n : Nat) -> ForZn

which contains all possible data from all possible arguments of the constructor Zn in a fused way. So, there exists a function
zn : ForZn -> Z which produces all possible calls to constructor Zn:

zn : ForZn -> Z
zn $ ForZn_X1_Y1 d k = Zn $ 1 (X1 d) (Y1 k)
zn $ ForZn_X1_Yn d   = Zn $ 1 (X1 d) Yn
zn $ ForZn_Xn_Y1 c   = Zn $ 1 (Xn 0) (Y1 k)
zn $ ForZn_Xn_Yn n   = Zn $ (S n) (Xn n) Yn

So, generator for Zn can simply look like zn <$> genForZn, and generator for ForZn can be simply derived as usual (it does not contain dependent types in its constructors, so this mechanism should not be applied to it recursively).

The text was updated successfully, but these errors were encountered:

AlgebraicWolf · 2024-02-29T09:48:02Z

A simple example demonstrating the importance of fusion would be the following set of data types:

data X : Nat -> Nat -> Type where
  MkX : X 42 m

data Y : Nat -> Nat -> Type where
  MkY : Y n 1337

data Z : Type where
  MkZ : (n : Nat) -> (m : Nat) -> X n m -> Y n m -> Z

Currently, the derived generator for Z would turn out to be grossly inefficient. There are two possible orderings of argument generation for constructor MkZ: X[], Y[0, 1] and Y[], X[0, 1], and either one is impractical to use.

However, this specific case could be solved with even the simplest fusion algorithm that works only on non-recursive data types. It would result in the following type:

XY : Nat -> Nat -> Type where
  MkXMkY : XY 42 1337

Which can be efficiently generated.

buzden · 2024-03-05T13:14:18Z

Just for a record, when you have the following data type definitions (based on the comment above)

data X : Nat -> Nat -> Type where
  MkX : X 42 m
  MkX2 : X n m
  MkX3 : X 42 24

data Y : Nat -> Nat -> Type where
  MkY : Y n 1337

data Z : Type where
  MkZ : (n : Nat) -> (m : Nat) -> X n m -> Y n m -> Z

having available the following (effective) generators

genX : Fuel -> Gen MaybeEmpty (n ** m ** X n m)
genXm : Fuel -> (m : Nat) -> Gen MaybeEmpty (n ** X n m)
genXnm : Fuel -> (n, m : Nat) -> Gen MaybeEmpty (X n m)

genY : Fuel -> Gen MaybeEmpty (n ** m ** Y n m)
genYn : Fuel -> (n : Nat) -> Gen MaybeEmpty (m ** Y n m)
genYnm : Fuel -> (n, m : Nat) -> Gen MaybeEmpty (Y n m)

currently we derive generator for Z which is equivalent to the following variant (using simplicifaction hack):

genZ : Fuel -> Gen MaybeEmpty Z
genZ fl = do
  (n ** m ** x) <- genX fl
  y <- genYnm fl n m
  pure $ MkZ n m x y

or, alternatively, this one

genZ' : Fuel -> Gen MaybeEmpty Z
genZ' fl = do
  (n ** m ** y) <- genY fl
  x <- genXnm fl n m
  pure $ MkZ n m x y

or alternative of both is simplification-hack option is turned off. Notice, that technically derived code is not the same, but it is equivalent for this particular example.

If we tries to use most effective subgenerators for both X and Y, we would run into a chicken and egg problem:

genZ'' : Fuel -> Gen MaybeEmpty Z
genZ'' fl = do
  (m ** y) <- genYn fl ?generated_below
  (n ** x) <- genXm fl m
  pure $ MkZ n m x ?y's_n_does_not_unify

So, what we could do is to employ the fusion. For this particular case, we can see than in the MkZ when using X and Y types, only two free variables are used, n and m, thus we can derive a fused data type indexed by them:

data XY : (n : Nat) -> (m : Nat) -> Type where
  MkX_MkY : XY 42 1337
  MkX2_MkY : XY n 1337

It contains a cartesian product of all the constructors that match each other (i.e. where actual indices unify with each other).

Also, we can derive the splitting function

splitXY : XY n m -> (X n m, Y n m)
splitXY MkX_MkY        = (MkX {m=1337}     , MkY {n=42})
splitXY (MkX2_MkY {n}) = (MkX2 {n} {m=1337}, MkY {n})

Thus, we can call usual derivation for the fused type:

genXY : Fuel -> Gen MaybeEmpty (n ** m ** XY n m)

and derive the following generator for Z:

genZ_ultimate : Fuel -> Gen MaybeEmpty Z
genZ_ultimate fl = do
  (n ** m ** xy) <- genXY fl
  let (x, y) = splitXY xy
  pure $ MkZ n m x y

buzden added the code: heuristics Involves solution that applied only to a specific case label Mar 25, 2024

SimonTsirikov linked a pull request Apr 23, 2024 that will close this issue

[ derive ] Implements data types fusion #144

Draft

buzden linked a pull request Apr 23, 2024 that will close this issue

[ derive ] Implements data types fusion #144

Draft

buzden mentioned this issue Apr 23, 2024

Pre-calculate and filter out combinations of unfortunate orderings, especially given indices #145

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automatic data types fusion when deriving generator for constructor #55

Automatic data types fusion when deriving generator for constructor #55

buzden commented Mar 20, 2023

AlgebraicWolf commented Feb 29, 2024 •

edited

Loading

buzden commented Mar 5, 2024 •

edited

Loading

Automatic data types fusion when deriving generator for constructor #55

Automatic data types fusion when deriving generator for constructor #55

Comments

buzden commented Mar 20, 2023

AlgebraicWolf commented Feb 29, 2024 • edited Loading

buzden commented Mar 5, 2024 • edited Loading

AlgebraicWolf commented Feb 29, 2024 •

edited

Loading

buzden commented Mar 5, 2024 •

edited

Loading