Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic data types fusion when deriving generator for constructor #55

Open
buzden opened this issue Mar 20, 2023 · 2 comments · May be fixed by #144
Open

Automatic data types fusion when deriving generator for constructor #55

buzden opened this issue Mar 20, 2023 · 2 comments · May be fixed by #144
Labels
code: enhancement New feature or improvement code: heuristics Involves solution that applied only to a specific case derive: least-effort Relates to the `LeastEffort` derivation algorithm issue: performance When work takes too much resources part: derivation Related to automated derivation of generators status: feature request Request for new functionality or improvement

Comments

@buzden
Copy link
Owner

buzden commented Mar 20, 2023

This can be seen as generation-time optimisation by growing the size of derived code. @AlgebraicWolf tested fusion in an ad-hoc example and it gave better results. We can generalise this.

The idea is to generate a data type for each data constructor (which is of a (complex?) dependent type) which would contain all constraints of data types of all constructor parameters. When we generate arguments for this particular constructor, instead of a sequence of subgeneration calls and then assembling it to the constructor's call, we can simply generate that specialised data structure and use it to generate the constructor call.


For example, consider couple of GADTs:

data X : Nat -> Type where
  X1 : Double -> X1
  Xn : (n : Nat) -> X (S n)

data Y : Nat -> Type where
  Y0 : String -> Y 0
  Y1 : Char -> Y 1
  Yn : Y n

If we derive a generator for data type Z defined like

data Z : Type where
  Zx : Z
  Zn : (n : Nat) -> X n -> Y n -> Z

we now make generator for Z to be alternatives of generator for Zx and generator for Zn, where generator of Zn is an alternative between approximately

do (n ** xn) <- genAnyX
   yn <- genY n
   pure $ Zn n xn yn

and

do (n ** yn) <- genAnyY
   xn <- genX n
   pure $ Zn n xn yn

What is suggested here is to create a special data type

data ForZn : Type where
  ForZn_X1_Y1 : Double -> Char -> ForZn
  ForZn_X1_Yn : Double -> ForZn
  ForZn_Xn_Y1 : Char -> ForZn
  ForZn_Xn_Yn : (n : Nat) -> ForZn

which contains all possible data from all possible arguments of the constructor Zn in a fused way. So, there exists a function
zn : ForZn -> Z which produces all possible calls to constructor Zn:

zn : ForZn -> Z
zn $ ForZn_X1_Y1 d k = Zn $ 1 (X1 d) (Y1 k)
zn $ ForZn_X1_Yn d   = Zn $ 1 (X1 d) Yn
zn $ ForZn_Xn_Y1 c   = Zn $ 1 (Xn 0) (Y1 k)
zn $ ForZn_Xn_Yn n   = Zn $ (S n) (Xn n) Yn

So, generator for Zn can simply look like zn <$> genForZn, and generator for ForZn can be simply derived as usual (it does not contain dependent types in its constructors, so this mechanism should not be applied to it recursively).

@buzden buzden added code: enhancement New feature or improvement part: derivation Related to automated derivation of generators status: feature request Request for new functionality or improvement issue: performance When work takes too much resources derive: least-effort Relates to the `LeastEffort` derivation algorithm labels Mar 20, 2023
@AlgebraicWolf
Copy link
Contributor

AlgebraicWolf commented Feb 29, 2024

A simple example demonstrating the importance of fusion would be the following set of data types:

data X : Nat -> Nat -> Type where
  MkX : X 42 m

data Y : Nat -> Nat -> Type where
  MkY : Y n 1337

data Z : Type where
  MkZ : (n : Nat) -> (m : Nat) -> X n m -> Y n m -> Z

Currently, the derived generator for Z would turn out to be grossly inefficient. There are two possible orderings of argument generation for constructor MkZ: X[], Y[0, 1] and Y[], X[0, 1], and either one is impractical to use.

However, this specific case could be solved with even the simplest fusion algorithm that works only on non-recursive data types. It would result in the following type:

XY : Nat -> Nat -> Type where
  MkXMkY : XY 42 1337

Which can be efficiently generated.

@buzden
Copy link
Owner Author

buzden commented Mar 5, 2024

Just for a record, when you have the following data type definitions (based on the comment above)

data X : Nat -> Nat -> Type where
  MkX : X 42 m
  MkX2 : X n m
  MkX3 : X 42 24

data Y : Nat -> Nat -> Type where
  MkY : Y n 1337

data Z : Type where
  MkZ : (n : Nat) -> (m : Nat) -> X n m -> Y n m -> Z

having available the following (effective) generators

genX : Fuel -> Gen MaybeEmpty (n ** m ** X n m)
genXm : Fuel -> (m : Nat) -> Gen MaybeEmpty (n ** X n m)
genXnm : Fuel -> (n, m : Nat) -> Gen MaybeEmpty (X n m)

genY : Fuel -> Gen MaybeEmpty (n ** m ** Y n m)
genYn : Fuel -> (n : Nat) -> Gen MaybeEmpty (m ** Y n m)
genYnm : Fuel -> (n, m : Nat) -> Gen MaybeEmpty (Y n m)

currently we derive generator for Z which is equivalent to the following variant (using simplicifaction hack):

genZ : Fuel -> Gen MaybeEmpty Z
genZ fl = do
  (n ** m ** x) <- genX fl
  y <- genYnm fl n m
  pure $ MkZ n m x y

or, alternatively, this one

genZ' : Fuel -> Gen MaybeEmpty Z
genZ' fl = do
  (n ** m ** y) <- genY fl
  x <- genXnm fl n m
  pure $ MkZ n m x y

or alternative of both is simplification-hack option is turned off. Notice, that technically derived code is not the same, but it is equivalent for this particular example.

If we tries to use most effective subgenerators for both X and Y, we would run into a chicken and egg problem:

genZ'' : Fuel -> Gen MaybeEmpty Z
genZ'' fl = do
  (m ** y) <- genYn fl ?generated_below
  (n ** x) <- genXm fl m
  pure $ MkZ n m x ?y's_n_does_not_unify

So, what we could do is to employ the fusion. For this particular case, we can see than in the MkZ when using X and Y types, only two free variables are used, n and m, thus we can derive a fused data type indexed by them:

data XY : (n : Nat) -> (m : Nat) -> Type where
  MkX_MkY : XY 42 1337
  MkX2_MkY : XY n 1337

It contains a cartesian product of all the constructors that match each other (i.e. where actual indices unify with each other).

Also, we can derive the splitting function

splitXY : XY n m -> (X n m, Y n m)
splitXY MkX_MkY        = (MkX {m=1337}     , MkY {n=42})
splitXY (MkX2_MkY {n}) = (MkX2 {n} {m=1337}, MkY {n})

Thus, we can call usual derivation for the fused type:

genXY : Fuel -> Gen MaybeEmpty (n ** m ** XY n m)

and derive the following generator for Z:

genZ_ultimate : Fuel -> Gen MaybeEmpty Z
genZ_ultimate fl = do
  (n ** m ** xy) <- genXY fl
  let (x, y) = splitXY xy
  pure $ MkZ n m x y

@buzden buzden added the code: heuristics Involves solution that applied only to a specific case label Mar 25, 2024
@SimonTsirikov SimonTsirikov linked a pull request Apr 23, 2024 that will close this issue
@buzden buzden linked a pull request Apr 23, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code: enhancement New feature or improvement code: heuristics Involves solution that applied only to a specific case derive: least-effort Relates to the `LeastEffort` derivation algorithm issue: performance When work takes too much resources part: derivation Related to automated derivation of generators status: feature request Request for new functionality or improvement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants