Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate missing change proposals to RFC 00014 #595

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 48 additions & 39 deletions rfc/0014-attribute-based-valuetypes/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,13 @@ SPDX-License-Identifier: AGPL-3.0-only

# RFC 0014: Attributed-based syntax for defining value types

| | |
|---|---|
| Feature Tag | `attribute-based value types` | <!-- TODO: choose a unique and declarative feature name -->
| Status | `DRAFT` | <!-- Possible values: DRAFT, DISCUSSION, ACCEPTED, REJECTED -->
| Responsible | `dirkriehle` | <!-- TODO: assign yourself as main driver of this RFC -->
<!--
| | |
| ----------- | ----------------------------- | --------------------------------------------------------------- |
| Feature Tag | `attribute-based value types` | <!-- TODO: choose a unique and declarative feature name --> |
| Status | `DISCUSSION` | <!-- Possible values: DRAFT, DISCUSSION, ACCEPTED, REJECTED --> |
| Responsible | `dirkriehle` | <!-- TODO: assign yourself as main driver of this RFC --> |

<!--
Status Overview:
- DRAFT: The RFC is not ready for a review and currently under change. Feel free to already ask for feedback on the structure and contents at this stage.
- DISCUSSION: The RFC is open for discussion. Usually, we open a PR to trigger discussions.
Expand All @@ -21,17 +22,11 @@ SPDX-License-Identifier: AGPL-3.0-only

## Summary

I'd like to allow the definition of user-defined single-attribute value types. In addition to the current syntax, in which an underlying value type is referenced through 'oftype', it does not use instantiation/inheritance but rather composition.
Value types are a core concept in Jayvee for data validation.
This RFC reworks the syntax to an attribute-based syntax.

## Motivation

The purpose of using composition is that it is

1. more similar to traditional approaches and
2. is needed anyway for multi-attribute value types.

## Explanation

To define a value type like CorrelationCoefficient and its range of -1 to +1, we have to write:

```jayvee
Expand All @@ -42,47 +37,61 @@ valuetype CorrelationCoefficient oftype decimal {
}
```

This syntax is smart in that you don't have to list and name an attribute but rather rely on an implicit 'value' attribute. Still, I propose to make that attribute explicit. New syntax would be:
This syntax is smart in that you don't have to list and name an attribute but rather rely on an implicit 'value' attribute.

The current syntax, in which an underlying value type is referenced through 'oftype', it does not use instantiation/inheritance but rather composition.
The purpose of using composition is that it is

1. more similar to traditional programming and
2. is needed anyway for multi-attribute value types.

Thus, this RFC removes reference to a parent value type in a value type definition and proposes an explicit attribute-based syntax instead that is extendable to a future multi-attribute syntax.

## Explanation

I propose to make that attribute explicit. The new syntax would be:

```jayvee
valuetype CorrelationCoefficient {
attributes: [
value oftype decimal;
];
constraints: [
MinusOneToPlusOneRange; // Don't know how to attach this to value attribute
];
property correlation oftype decimal;

constraint minusOneToPlusOneRange: MinusOneToPlusOneRange on correlation;
}

constraint MinusOneToPlusOneRange on decimal:
value >= -1 and value <= 1;
```

While more verbose, it prepares the way for
The syntax is kept flat rather than using arrays in alignment with the [Jayvee Design Principles](https://jvalue.github.io/jayvee/docs/0.4.0/dev/design-principles#jayvee-manifesto).

Despite referencing reusable constraints, the syntax also allows in-place constraint definition:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this out of the RFC to make it smaller. We should probably have in-place constraint definitions but imho can do so in a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind adding it here. It's not huge anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for a view on the syntax, if we go with my suggestion, that would be:

constraints: [ value >= 1 and value <=1 on correlation ]

equivalent to

constraints: [ MinusOneToPlusOneRange on correlation ]

My issue with that was that reading the in-place constraint definition with the following "on" is kind of confusing... But honestly, seeing how it looks here I am kind of happy with it. So I'll revert what I said and say we should have in place constraints as well, yeah.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn't we use

constraints: [ correlation >= 1 and correlation <=1 ]

?

And combined:

constraints: [
    correlation >= 1 and correlation <=1,
    MinusOneToPlusOneRange on correlation
]

I don't see why we need to use on for in-place constraints as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you model two different attributes in that case? The reason I put the on correlation at the end of the reference is to allow for more than one attribute, e.g. correlation and unit and you could have

constraints: [
    value >= 1 and value <=1 on correlation,
    SomethingElse on unit
]

I'd semi-strongly feel about continuing to use value for inside the constraint and a clear reference to which attribute it is on to allow for future multi-attribute value types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use correlation instead of value.
This allows for example

constraints: [
    correlation >= 1 and otherUnit <=1
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough. I dislike that because it introduces different syntax for inline constraints vs. references and I like the ability to always just create a constraint reference by c&p the inline code. But that is something that reasonable people can disagree on :D.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point @rhazn, and I see value in keeping the syntax consistent independently if defined in-line or as stand-alone element.

What do you think about changing the constraint definition syntax in a similar way to keep it consistent? Instead of the value keyword, constraint definitions would refer to the value type's properties they are based on:

constraint CorrelationCoefficient on decimal:
  value >= 1 and value <= 1; // "value" is the property name of builtin value types (so nothing changes for them)
  
constraint PostiveCorrelationConstraint on CorrelationCoefficient:
  correlation > 0; // "correlation" is a property of the value type CorrelationCoefficient
  
// Future, for multi-dimensional value types
constraint Coordinate2DInLeftUpperQuadrantConstraint on Coordinate2D:
  x > 0 and y > 0;

An obvious downside is the circular dependency of value type and constraint that could be incentivized (or should be prevented by hinting to in-line constraints in such cases).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this comment three times, not sure I can come to a strong opinion so in the end I would probably accept majority opinion, but:

I think that is unfortunate because it again introduces magic with the value name for builtin value types. It also still means it is weird to reuse constraints between inline definitions and general definitions. I would prefer if we keep value in general for one-dimensional value types.

I can not really think of a good way to model constrains on multi-dimensional value types that do not need to know the names of properties... For consistencies sake I would go with value.x. That does not resolve the issue of course, but it keeps the rule intact that value always refers to the underlying value of the value type.

Copy link
Member Author

@georg-schwarz georg-schwarz Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see two alternatives:

Alternative 1: Inline constraints use property names

Inline constraints use property names, while referenceable constraints use the value keyword for uni-dimensional (built-in) value type.

This leads to different syntax for inline and referenceable constraints.
However, users can call their property of single-attribute value types "value," and we can promote that through our documentation. Then, there is no syntactic difference when extracting an in-line constraint to a referenceable constraint.

// Referenceable constraint
valuetype CorrelationCoefficient {
  property correlation oftype decimal; 
  constraint minusOneToPlusOneRange: CorrelationCoefficient on correlation;
}
constraint CorrelationCoefficient on decimal:
  value >= -1 and value <= 1;

// Inline constraint (not following convention)
valuetype CorrelationCoefficient {
  property correlation oftype decimal; 

  constraint minusOneToPlusOneRange:
    correlation >= -1 and correlation <=1;
}

// Inline constraint (following convention)
valuetype CorrelationCoefficient {
  property value oftype decimal; // convention: should be called value for one-dim value types

  constraint minusOneToPlusOneRange:
    value >= -1 and value <=1;
}

Alternative 2: Constraints always use value

This alternative leads to similar syntax for inline and referenceable constraints.

// Referenceable constraint 
valuetype CorrelationCoefficient {
  property correlation oftype decimal; 
  constraint minusOneToPlusOneRange: CorrelationCoefficient on correlation;
}
constraint CorrelationCoefficient on decimal:
  value >= -1 and value <= 1;

// Inline constraint (not following convention)
valuetype CorrelationCoefficient {
  property correlation oftype decimal; 

  constraint minusOneToPlusOneRange on correlation:
    value >= -1 and value <=1;
}

The question remaining for this alternative is how to model multi-dimensional constraints, like @rhazn already mentioned... I tried a few times, but didn't come to a good solution.


```jayvee
valuetype Money {
attributes: [
amount oftype decimal;
currency oftype Currency;
];
valuetype CorrelationCoefficient {
property correlation oftype decimal;

constraint minusOneToPlusOneRange:
correlation >= -1 and correlation <=1;
}
```
which we'll need anyway. Conceivably, the step to multi-attribute value types could be merged with this one, but I simply wanted to try the RFC process rather than keep sending email ;-)

<!--
TODO: Explain the details of the RFC.
If the RFC contains more than a single cohesive aspect, structure this section accordingly.
Make sure to provide realistic modelling examples on the example data set introduced above.
-->

## Drawbacks

1. Introduces a redundant syntax,
2. Creates extra work/may be too difficult, and
3. May be disruptive to how the language currently works.
1. No backward compatible syntax (breaking change).
2. Removes the inheritance mechanism.

## Alternatives

Conceivably, we could use multiple inheritance for multi-attribute value types... just joking.
1. Keep the current syntax.
2. Use arrays instead of the flat syntax.
georg-schwarz marked this conversation as resolved.
Show resolved Hide resolved

## Possible Future Changes/Enhancements

This proposal is to prepare the way for multi-attribute value types.
1. Multi-attribute value types.

```jayvee
valuetype Coordinate2D {
property x oftype decimal;
property y oftype decimal;
}
```
Loading