Data shape based minimalistic modeling and validation #215

Open
amiika opened this issue Jan 23, 2025 · 5 comments

@amiika

amiika commented Jan 23, 2025

It would be cool to be able to define and validate data using YAML shapes and YAMLScript. Sure, one can define data using JSON Schema structures and use that to validate YAML, but something more minimalistic and data-driven, like malli schemas in YAML, would be more concise and easier to maintain.

Something like:

foo: 
  bar: boolean # Fields mandatory by default
  baz ?: string # Optional
  qux +: number? # 1..n 
  quux *: integer? # 0..n
  quuux 0..4: pos-int? # 0..4
  foo ?: foo # Ref to self

This type of syntax could be parsed to build for example malli schema registry and utilize existing built-in schemas as much as possible:

[:schema
 {:registry {"foo" [:map
                    {:closed true}
                    [:bar :boolean]
                    [:baz {:optional true} :string]
                    [:qux [:+ number?]]
                    [:quux [:* integer?]]
                    [:quuux [:repeat {:min 0, :max 4} pos-int?]]
                    [:foo {:optional true} [:ref "foo"]]]}}
             
 "foo"]
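
The key syntax above can be split into a field name and a cardinality with very little code. The following is a minimal Python sketch, assuming a YAML loader hands over each key (for example `baz ?` or `quuux 0..4`) as one plain string. The helper name `parse_key` and the `(name, min, max)` tuple are hypothetical and not part of malli or YAMLScript.

```python
import re

# Hypothetical helper: split a shape key such as "baz ?" or "quuux 0..4"
# into (name, min_count, max_count). A max_count of None means unbounded.
_KEY = re.compile(r"^(?P<name>\S+)(?:\s+(?P<card>\?|\+|\*|\d+\.\.\d+))?$")


def parse_key(key):
    match = _KEY.match(key.strip())
    if match is None:
        raise ValueError(f"unrecognised shape key: {key!r}")
    name, card = match.group("name"), match.group("card")
    if card is None:
        return name, 1, 1        # mandatory by default
    if card == "?":
        return name, 0, 1        # optional
    if card == "+":
        return name, 1, None     # 1..n
    if card == "*":
        return name, 0, None     # 0..n
    low, high = (int(part) for part in card.split(".."))
    if low > high:
        raise ValueError(f"empty range in shape key: {key!r}")
    return name, low, high


# Keys taken from the example shape above.
parsed = [parse_key(k) for k in ["bar", "baz ?", "qux +", "quux *", "quuux 0..4"]]
```

The resulting tuples carry enough information to emit the `{:optional true}`, `:+`, `:*`, and `:repeat` forms shown in the malli schema.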
@ingydotnet
Member

Hi @amiika !

This is good timing. @gugod and I have been working on a YS based schema language for the past few weeks. We call it SchemaYS. And it is indeed very compact and very powerful.

https://github.com/ingydotnet/schemays-test/ is a demo repo.

Schemas and types are just functions.

https://github.com/ingydotnet/schemays-test/blob/main/classes.yaml#L5

The !:classes tag here is a function call that validates the entire document.

https://github.com/ingydotnet/schemays-test/blob/main/classes.yaml#L3

loads a library that defines the classes type (function).

https://github.com/ingydotnet/schemays-test/blob/main/schema/class.ys

is the library (schema) file where classes is defined.

etc.
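
To illustrate the general idea of "schemas and types are just functions", here is a rough Python sketch. It is not SchemaYS and does not mirror class.ys. The names `string`, `pos_int`, `mapping`, and `point` are hypothetical. Each type is a function that returns the value or raises, and a schema is a function built from other such functions.

```python
# Illustrative only: these names are hypothetical, not SchemaYS functions.

def string(value):
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {value!r}")
    return value


def pos_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"expected a positive integer, got {value!r}")
    return value


def mapping(**fields):
    """Build a validator for a dict from per-field validator functions."""
    def check(value):
        if not isinstance(value, dict):
            raise TypeError(f"expected a mapping, got {value!r}")
        missing = fields.keys() - value.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        # Only the declared fields are returned; extra keys are dropped.
        return {name: check_field(value[name]) for name, check_field in fields.items()}
    return check


# A schema is just another function, composed from the ones above.
point = mapping(label=string, count=pos_int)
validated = point({"label": "a", "count": 3})
```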

I'll put this stuff into a proper schemays repo today or tomorrow.

If you are interested I'd love your help bringing this together.

Also I'll take a look at malli. TIL

@amiika
Author

amiika commented Jan 24, 2025

Looks great. There are a lot of approaches and different use cases for data modeling and validation. My example focused on defining and validating the structure of the object, whereas the class.ys example focuses on validating the value domain.

Ideally, a minimal schema language would include both:

  • Structure of the objects (Classes and cardinalities on Attributes and References)
  • Constraints on the value domain (minLength/maxLength, Regexp, Logical constraints not/and/or)
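
As a rough sketch of how the two layers could combine, the Python below treats value-domain constraints as composable predicates and checks cardinality separately. All names (`min_length`, `all_of`, `check_field`, `nickname`) and the example rule are assumptions made for illustration and are not taken from any existing library.

```python
import re

# Value-domain constraints as small predicate factories (hypothetical names).
def min_length(n):
    return lambda v: isinstance(v, str) and len(v) >= n

def max_length(n):
    return lambda v: isinstance(v, str) and len(v) <= n

def matches(pattern):
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None

# Logical constraints: and / or / not.
def all_of(*predicates):
    return lambda v: all(p(v) for p in predicates)

def any_of(*predicates):
    return lambda v: any(p(v) for p in predicates)

def negate(predicate):
    return lambda v: not predicate(v)


def check_field(values, min_count, max_count, predicate):
    """Structural check (cardinality) first, then the value constraint on
    every occurrence. A max_count of None means unbounded."""
    if len(values) < min_count:
        return False
    if max_count is not None and len(values) > max_count:
        return False
    return all(predicate(v) for v in values)


# Example rule: 0..2 nicknames, each 2 to 10 characters and not purely digits.
nickname = all_of(min_length(2), max_length(10), negate(matches(r"[0-9]+")))
ok = check_field(["ami", "k2"], 0, 2, nickname)
```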

I'm new to YAMLScript and Clojure, but got inspired by how you "bent" the rules of YAML to define operators for YAMLScript. Squeezing more semantics into property names while still being valid YAML makes it relatively easy to parse the added semantics without writing a totally custom parser.

I've been using YAML a lot, for example for defining OpenAPI specifications. One thing to take from that realm is the idea of multi-file definitions, as used by Redocly for example. Maintaining large models gets much easier when the model can be split into multiple files that are tracked in a version control system.

For example, a JSON Schema based model split into two YAML files:

# Person.yaml
title: Person
type: object
description: An individual human being who may be dead or alive, but not imaginary.
properties:
  fullName:
    title: Full name
    description: The complete name of the Person as one string.
    type: string
  dateOfBirth:
    title: Date of birth
    description: The point in time on which the Person was born.
    type: string
    format: date
  identifier:
    $ref: './Identifier.yaml'
required:
  - fullName
  - dateOfBirth
---
# Identifier.yaml
title: Identifier
type: object
description: A unique set of characters used to identify the legal entity.
properties:
  issuingAuthorityName:
    title: issuingAuthorityName
    description: The name of the public authority responsible for issuing the identifier.
    type: string
    minLength: 2
    example: 'SEBOLREG'
  notation:
    title: notation
    description: A string of characters to uniquely identify a legal entity
    type: string
    pattern: ^[0-9][-0-9]{0,10}$
    example: '552345-123'
required:
  - issuingAuthorityName
  - notation
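
A minimal Python sketch of how such a multi-file model could be stitched together and checked follows. It makes several simplifying assumptions: the two files are represented as already-parsed dicts in a hypothetical `FILES` table instead of being read from disk, `$ref` values are looked up verbatim rather than resolved relative to the referencing file, circular references are not handled, and only `required`, `minLength`, and `pattern` are enforced (`format: date` is ignored). The function names `resolve` and `errors` are made up for this example.

```python
import re

# Stand-in for files on disk: path -> parsed YAML content (trimmed to the
# keywords this sketch understands).
FILES = {
    "./Person.yaml": {
        "type": "object",
        "properties": {
            "fullName": {"type": "string"},
            "dateOfBirth": {"type": "string", "format": "date"},
            "identifier": {"$ref": "./Identifier.yaml"},
        },
        "required": ["fullName", "dateOfBirth"],
    },
    "./Identifier.yaml": {
        "type": "object",
        "properties": {
            "issuingAuthorityName": {"type": "string", "minLength": 2},
            "notation": {"type": "string", "pattern": "^[0-9][-0-9]{0,10}$"},
        },
        "required": ["issuingAuthorityName", "notation"],
    },
}


def resolve(schema, files):
    """Return a copy of schema with every {"$ref": path} node replaced by
    the referenced document."""
    if isinstance(schema, dict):
        if "$ref" in schema:
            return resolve(files[schema["$ref"]], files)
        return {key: resolve(value, files) for key, value in schema.items()}
    if isinstance(schema, list):
        return [resolve(item, files) for item in schema]
    return schema


def errors(schema, value, path="$"):
    """Collect human-readable violations for a small JSON Schema subset."""
    found = []
    if schema.get("type") == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for name in schema.get("required", []):
            if name not in value:
                found.append(f"{path}.{name}: required")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                found.extend(errors(sub, value[name], f"{path}.{name}"))
    elif schema.get("type") == "string":
        if not isinstance(value, str):
            return [f"{path}: expected string"]
        if len(value) < schema.get("minLength", 0):
            found.append(f"{path}: shorter than minLength")
        # JSON Schema patterns are not implicitly anchored, hence re.search.
        if "pattern" in schema and re.search(schema["pattern"], value) is None:
            found.append(f"{path}: does not match pattern")
    return found


person = resolve(FILES["./Person.yaml"], FILES)
problems = errors(person, {"fullName": "A", "identifier": {"issuingAuthorityName": "S", "notation": "x"}})
```

Here `problems` lists the missing `dateOfBirth` along with the two violations inside the referenced Identifier schema, so errors from both files surface through one validation pass.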

I'll post another example later of how this same data could be represented in a more "data shape" or "data-driven" way.

ingydotnet added a commit to ingydotnet/schemays-test that referenced this issue Jan 24, 2025
@ingydotnet
Member

Hi again @amiika !

I saw this issue and responded late last night. There's so much to say here and this runs deep. Around 2016 I spent many months working on a similar ambition, which I dubbed SchemaType. I've been thinking about this for a long time. It was only recently that I realized (like you) that YS can make this easy and powerful. SchemaType was heavily influenced by OpenAPI, with the realization that validation was just the tip of the iceberg for well-defined schemas and types.

JSON schema imho offers so little for so much code.

I converted your examples above here: https://github.com/ingydotnet/schemays-test/tree/main/215-a

These are very weak schemas and I left them weak in my conversion.

For example https://github.com/ingydotnet/schemays-test/blob/main/215-a/identifier.yaml#L9-L13
A string of 2 or more chars could be the entire text of War and Peace or the binary string of the ys interpreter CLI.
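
A tiny Python sketch of that weakness: a check that only enforces a minimum length accepts arbitrarily large input, while adding an upper bound rejects it. The function names and the upper bound of 64 are assumptions chosen purely for illustration.

```python
def min_length_only(value, min_length=2):
    # Mirrors a schema that only says "string, minLength: 2".
    return isinstance(value, str) and len(value) >= min_length


def bounded(value, min_length=2, max_length=64):
    # Same check with a hypothetical upper bound added.
    return isinstance(value, str) and min_length <= len(value) <= max_length


huge = "x" * 3_000_000  # stand-in for a novel-sized string

weak_accepts_huge = min_length_only(huge)
bounded_accepts_huge = bounded(huge)
```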

There's a lot to unpack here.

I think the best course is to make a schemays repo and start writing tests of what it should do.
I started it here: https://github.com/pkgys/schemays

We should also write a jsonschema-to-schemays converter. That would guide us pretty far. I'm quite sure that json schema can be a full subset of schemays.

I took a look at malli and was excited to see it was written in clojure. That can only be helpful.

I hope you have a lot of questions. Don't hold back!

@ingydotnet
Member

Let's move continued discussion over to https://github.com/pkgys/schemays/discussions

I'll leave this issue open for a while to possibly draw more attention to SchemaYS.

@yaml yaml locked and limited conversation to collaborators Jan 24, 2025
@yaml yaml unlocked this conversation Jan 24, 2025
@ingydotnet
Member

@amiika Let me know if you can comment on pkgys/schemays#2

I'm new to using github discussions.

I unlocked this issue so you could comment here if you are having problems.
