beacon-biosignals · jrevels · Oct 26, 2022 · Oct 18, 2022 · Oct 20, 2022 · Oct 20, 2022
diff --git a/Project.toml b/Project.toml
@@ -6,6 +6,7 @@ version = "0.5.0"
 [deps]
 Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
 
 [compat]
 Arrow = "2"

diff --git a/docs/src/arrow-concepts.md b/docs/src/arrow-concepts.md
@@ -12,8 +12,8 @@ Legolas defines a special field `legolas_schema_qualified` that Legolas-aware Ar
 
 Arrow tables which include this field are considered to "support Legolas schema discovery" and are referred to as "Legolas-discoverable", since Legolas consumers may employ this field to automatically match the table against available application-layer Legolas schema definitions.
 
-If present, the `legolas_schema_qualified` field's value must be a [fully qualified schema identifier](@ref schema_identifier_specification).
+If present, the `legolas_schema_qualified` field's value must be a [fully qualified schema version identifier](@ref schema_version_identifier_specification).
 
 ## Arrow File Naming Conventions
 
-When writing a Legolas-discoverable Arrow table to a file, prefer using the file extension `*.<unqualified schema name>.arrow`. For example, if the file's table's Legolas schema is `baz.supercar@1>bar.automobile@1`, use the file extension `*.baz.supercar.arrow`.
+When writing a Legolas-discoverable Arrow table to a file, prefer using the file extension `*.<schema name>.arrow`. For example, if the file's table's full Legolas schema version identifier is `baz.supercar@1>bar.automobile@1`, use the file extension `*.baz.supercar.arrow`.
diff --git a/docs/src/faq.md b/docs/src/faq.md
@@ -8,6 +8,10 @@ The package originated from code developed internally at Beacon to wrangling het
 
 ## Why does Legolas.jl support Arrow as a (de)serialization target, but not, say, JSON?
 
-Technically, Legolas.jl's core `row`/`Schema` functionality is totally agnostic to (de)serialization and could be useful for anybody who wants to wrangle Tables.jl-compliant values.
+Technically, Legolas.jl's core `@schema`/`@version` functionality is agnostic to (de)serialization and could be useful for anybody who wants to wrangle Tables.jl-compliant values.
 
 Otherwise, with regards to (de)serialization-specific functionality, Beacon has put effort into ensuring Legolas.jl works well with [Arrow.jl](https://github.com/JuliaData/Arrow.jl) "by default" simply because we're heavy users of the Arrow format. There's nothing stopping users from composing the package with [JSON3.jl](https://github.com/quinnj/JSON3.jl) or other packages.
+
+## Why are Legolas.jl's generated record types defined the way that they are? For example, why is the version number hardcoded?
+
+Many of Legolas' current choices on this front stem from refactoring efforts undertaken as part of [this pull request](https://github.com/beacon-biosignals/Legolas.jl/pull/54), and directly resulted from a [design mini-investigation](https://gist.github.com/jrevels/fdfe939109bee23566d425440b7c759e) associated with those efforts.
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -11,24 +11,27 @@ CurrentModule = Legolas
 ## Legolas `Schema`s
 
 ```@docs
-Legolas.Schema
+Legolas.SchemaVersion
 Legolas.@schema
+Legolas.@version
 Legolas.is_valid_schema_name
-Legolas.parse_schema_identifier
-Legolas.schema_name
-Legolas.schema_version
-Legolas.schema_identifier
-Legolas.schema_parent
-Legolas.schema_fields
-Legolas.schema_declaration
-Legolas.schema_declared
-Legolas.row
+Legolas.parse_identifier
+Legolas.name
+Legolas.version
+Legolas.identifier
+Legolas.parent
+Legolas.required_fields
+Legolas.declaration
+Legolas.declared
+Legolas.find_violation
+Legolas.complies_with
+Legolas.validate
 ```
 
 ## Validating/Writing/Reading Legolas Tables
 
 ```@docs
-Legolas.extract_legolas_schema
+Legolas.extract_schema_version
 Legolas.write
 Legolas.read
 ```
@@ -38,7 +41,6 @@ Legolas.read
 ```@docs
 Legolas.lift
 Legolas.construct
-Legolas.guess_schema
 Legolas.assign_to_table_metadata!
 Legolas.gather
 Legolas.locations

diff --git a/docs/src/schema-concepts.md b/docs/src/schema-concepts.md
@@ -4,59 +4,49 @@
 
     If you're a newcomer to Legolas.jl, please familiarize yourself with the [tour](https://github.com/beacon-biosignals/Legolas.jl/blob/main/examples/tour.jl) before diving into this documentation.
 
-## [Schema Identifiers](@id schema_identifier_specification)
+## [Schema Version Identifiers](@id schema_version_identifier_specification)
 
-Legolas defines "schema identifiers" as strings of the form:
+Legolas defines "schema version identifiers" as strings of the form:
 
 - `name@version` where:
     - `name` is a lowercase alphanumeric string and may include the special characters `.` and `-`.
     - `version` is a non-negative integer.
-- or, `x>y` where `x` and `y` are valid schema identifiers and `>` denotes "extends from".
+- or, `x>y` where `x` and `y` are valid schema version identifiers and `>` denotes "extends from".
 
-A schema identifier is said to be *fully qualified* if it includes the identifiers of all known ancestors of the particular schema that it directly identifies.
+A schema version identifier is said to be *fully qualified* if it includes the identifiers of all known ancestors of the particular schema version that it directly identifies.
 
-Schema authors should follow the below conventions when choosing the `name` part of a new schema's identifier:
+Schema authors should follow the below conventions when choosing the name of a new schema:
 
 1. Include a namespace. For example, assuming the schema is defined in a package Foo.jl, `foo.automobile` is good, `automobile` is bad.
 2. Prefer singular over plural. For example, `foo.automobile` is good, `foo.automobiles` is bad.
 3. Don't "overqualify" the schema name with ancestor-derived information. For example, `bar.automobile@1>foo.automobile@1` is good, `baz.supercar@1>bar.automobile@1` is good, `bar.foo.automobile@1>foo.automobile@1` is bad, `baz.automobile.supercar@1>bar.automobile@1` is bad.
 
 ## Schema Versioning: You Break It, You Bump It
 
-While it is fairly established practice to [semantically version source code](https://semver.org/), the world of data/artifact versioning is a bit more varied. As presented in the tour, each `Legolas.Schema` has a single version integer. The central rule that governs Legolas' schema versioning approach is:
+While it is fairly established practice to [semantically version source code](https://semver.org/), the world of data/artifact versioning is a bit more varied. As presented in the tour, each `Legolas.SchemaVersion` carries a single version integer. The central rule that governs Legolas' schema versioning approach is:
 
-**If an update is made to a schema that potentially requires existing data to be rewritten in order to comply with the updated schema, then the version integer associated with that schema should be incremented.**
+**Do not introduce a change to an existing schema version that might cause existing compliant data to become non-compliant; instead, incorporate the intended change in a new schema version whose version number is one greater than the previous version number.**
 
-In other words: you break it, you bump it!
+For example, a schema author must introduce a new schema version for any of the following changes:
 
-For example, a schema author must increment their existing schema's version integer if any of the following changes are made:
-
-- A new non-`>:Missing` required field is added to the schema.
+- A new type-restricted required field is added to the schema.
 - An existing required field's type restriction is tightened.
 - An existing required field is renamed.
 
-One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@schema("my-schema@1", ...)` and `@schema("my-schema@2", ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments.
+One benefit of Legolas' approach is that multiple schema versions may be defined in the same codebase, e.g. there's nothing that prevents `@version("my-schema@1", ...)` and `@version("my-schema@2", ...)` from being defined and utilized simultaneously. The source code that defines any given Legolas schema version and/or consumes/produces Legolas tables is presumably already semantically versioned, such that consumer/producer packages can determine their compatibility with each other in the usual manner via interpreting major/minor/patch increments.
 
-## Important Expectations Regarding Custom Field Assignments
+Note that it is preferable to avoid introducing new versions of an existing schema, if possible, in order to minimize code/data churn for downstream producers/consumers. Thus, authors should prefer conservative field type restrictions from the get-go. Remember: loosening a field type restriction is not a breaking change, but tightening one is.
 
-Schema authors should ensure that their schema declarations meet two important expectations so that Legolas' `row` function behaves as intended and inter-schema composability is preserved.
+## Important Expectations Regarding Custom Field Assignments
 
-First, a schema's custom field assignments should preserve the [idempotency](https://en.wikipedia.org/wiki/Idempotence) of `row` invocations, such that the following holds for all valid values of `fields`:
+Schema authors should ensure that their `@version` declarations meet two important expectations so that generated record types behave as intended:
 
-```jl
-row(schema, row(schema, fields)) == row(schema, fields)
-```
+1. Custom field assignments should preserve the [idempotency](https://en.wikipedia.org/wiki/Idempotence) of record type constructors.
+2. Custom field assignments should not observe mutable non-local state.
 
-Second, a schema's custom field assignments should not observe mutable non-local state, such that the following holds for all valid values of `fields`:
+Thus, given a Legolas-generated record type `R`, the following should hold for all valid values of `fields`:
 
 ```jl
-row(schema, fields) == row(schema, fields)
+R(R(fields)) == R(fields)
+R(fields) == R(fields)
 ```
-
-## How to Avoid Breaking Schema Changes
-
-It is preferable to avoid incrementing a schema's version integer ("making a breaking change") whenever possible to avoid code/data churn for consumers. Following the below guidelines should help make breaking changes less likely:
-
-1. Allow required fields to be `Missing` whenever reasonable.
-2. Prefer conservative field type restrictions from the get-go, to avoid needing to tighten them later.
-3. Handle/enforce "potential deprecation paths" in a required field's RHS definition when possible. For example, imagine a schema that contains a required field `id::Union{UUID,String} = id` where `id` is either a `UUID`, or a `String` that may be parsed as a `UUID`. Now, let's imagine we decided we wanted to update the schema such that new tables ALWAYS normalize `id` to a proper `UUID`. In this case, it is preferable to simply update this required field to `id::Union{UUID,String} = UUID(id)` instead of `id::UUID = id`. The latter is a breaking change that requires incrementing the schema's version integer, while the former achieves the same practical result without breaking consumers of old data.
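
Note on the "Schema Version Identifiers" section touched above: the grammar it describes (`name@version` segments chained with `>`) can be sketched as a small standalone check. This is only an illustration of the documented rules, not Legolas.jl's implementation; the helper names `is_valid_segment` and `is_valid_identifier` are hypothetical and do not exist in the package, and the sketch does not check that ancestors are actually declared.

```julia
# Hypothetical helpers illustrating the documented identifier grammar.
# These are NOT part of Legolas.jl's API.

# `name`: lowercase alphanumeric, may also include `.` and `-`.
const NAME_PATTERN = r"^[a-z0-9.-]+$"

# `version`: a non-negative integer, written with digits only.
const VERSION_PATTERN = r"^[0-9]+$"

# Check a single `name@version` segment.
function is_valid_segment(segment::AbstractString)
    parts = split(segment, '@')
    length(parts) == 2 || return false
    name, version = parts
    return occursin(NAME_PATTERN, name) && occursin(VERSION_PATTERN, version)
end

# An identifier is one or more segments joined by `>` ("extends from").
is_valid_identifier(id::AbstractString) = all(is_valid_segment, split(id, '>'))

is_valid_identifier("baz.supercar@1>bar.automobile@1")  # true
is_valid_identifier("Foo.automobile@1")                 # false: uppercase in name
is_valid_identifier("foo.automobile@-1")                # false: negative version
```

A purely syntactic check like this says nothing about whether an identifier is fully qualified; that depends on which ancestor schema versions are known.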