FEEDBACK WANTED - API proposal - supporting multiple JSON schema versions, breaking changes to Schema definition #242

Closed
GREsau opened this issue Sep 2, 2023 · 8 comments
Labels
1.0 An issue that must be solved in order to release schemars 1.0 feedback wanted

Comments

@GREsau
Owner

GREsau commented Sep 2, 2023

In Project Status and Road to 1.0, one of the requirements for schemars 1.0 was "handling of different JSON schema versions/dialects (and how to handle future JSON schema versions)". Handling future versions of JSON schema in a non-breaking way would have been difficult-to-impossible a few years ago, but should be achievable now that the latest version of JSON Schema is expected to be stable - see https://json-schema.org/blog/posts/future-of-json-schema and https://json-schema.org/blog/posts/the-last-breaking-change.

This API proposal shows how the way a JSON Schema is modelled within schemars may be changed to support any arbitrary version of JSON schema without future breaking changes (although this change itself would be breaking).

The World Today

Currently, schemars has a few types that define a schema which are mostly based on the draft-07 version of JSON Schema - most prominently (simplified for brevity):

pub enum Schema {
    Bool(bool),
    Object(SchemaObject),
}

pub struct RootSchema {
    #[serde(rename = "$schema")]
    pub meta_schema: Option<String>,
    #[serde(flatten)]
    pub schema: SchemaObject,
    #[serde(alias = "$defs")]
    pub definitions: Map<String, Schema>,
}

pub struct SchemaObject {
    #[serde(flatten)]
    pub metadata: Option<Box<Metadata>>,
    #[serde(rename = "type")]
    pub instance_type: Option<SingleOrVec<InstanceType>>,
    pub format: Option<String>,
    #[serde(rename = "enum")]
    pub enum_values: Option<Vec<Value>>,
    #[serde(rename = "const")]
    pub const_value: Option<Value>,
    #[serde(flatten)]
    pub subschemas: Option<Box<SubschemaValidation>>,
    #[serde(flatten)]
    pub number: Option<Box<NumberValidation>>,
    #[serde(flatten)]
    pub string: Option<Box<StringValidation>>,
    #[serde(flatten)]
    pub array: Option<Box<ArrayValidation>>,
    #[serde(flatten)]
    pub object: Option<Box<ObjectValidation>>,
    #[serde(rename = "$ref")]
    pub reference: Option<String>,
    #[serde(flatten)]
    pub extensions: Map<String, Value>,
}

Fun fact: the multiple #[serde(flatten)]'d fields were an optimisation to try to save a little memory for the common case of simple schemas with only one or two properties. This was probably a premature optimisation that needlessly complicated everything!

Strongly typing the schema like this makes it easier for both schemars and its consumers to keep schemas in a valid state. However, it also causes some problems, particularly around supporting multiple versions of JSON Schema, e.g.

  • reusable schemas are always serialised under definitions instead of $defs which has been preferred since 2019-09
  • the items keyword is defined as a SingleOrVec<Schema> (i.e. it can be a single schema or an array of schemas), but since 2020-12 it can only be a single schema
  • $schema is only defined in the top-most schema (RootSchema), but can legally appear in subschemas e.g. when bundling subschemas with different $schemas
  • InstanceType is defined as an exhaustive enum, even though vocabularies may define their own types (like integer, which schemars supports despite it not being defined by JSON Schema)
  • schemars does not define fields for some lesser-used JSON Schema keywords

Some of these problems could be solved by a one-time breaking change to schemars that completely drops support for older versions of JSON schema (and swagger/openapi). Alternatively, we could change these structs to match JSON Schema 2020-12, while semi-supporting old versions by using the extensions map to set a value if it does not conform to the 2020-12 types. But even then, supporting future versions of JSON Schema may introduce new problems: non-breaking changes like adding a new keyword may be difficult to support in schemars in a non-breaking fashion, unless pretty much every struct is annotated with #[non_exhaustive], which would make constructing schemas much more difficult - and that still wouldn't be sufficient to support other potential non-breaking changes (e.g. if an existing keyword was updated to allow additional data types).

Proposed Change

As anyone who knows my feelings on JS vs TS can attest, I am a vehement proponent of strict typing - but I think that going forward, schemars should no longer define a Schema type with a list of fields corresponding to JSON Schema keywords. More concretely, I propose that schemars define Schema as simply:

#[repr(transparent)]
pub struct Schema(serde_json::Value);

where the inner Value is either a Value::Object or Value::Bool. Then, properties of the schema (assuming it's an object, not a bool) can be any arbitrary JSON-compatible value. The inner Value is not pub, so is not actually part of the public API.

Note that this would be conceptually similar to:

pub enum Schema {
    Bool(bool),
    Object(Map<String, Value>),
}

An advantage of the enum instead of the newtype struct would be that it makes invalid states (e.g. trying to use a number as a schema) unrepresentable in the type system. The main reason I'm proposing a newtype struct instead is to allow converting a &Value/&mut Value to a &Schema/&mut Schema (probably via ref-cast), which would be useful in a number of scenarios including implementing visitors - this is why the struct has #[repr(transparent)]. It should be impossible to construct a Schema from a Value that is neither a bool nor an object (hence the inner value field not being pub), and any functions exposed by schemars that construct a Schema must uphold this invariant - so e.g. Schema would implement TryFrom<Value> rather than From<Value>.
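The invariant described above can be illustrated with a minimal sketch. To keep it dependency-free, it uses a small stand-in `Value` enum rather than `serde_json::Value`, and the choice of returning the rejected value as the error type is an assumption made for illustration, not something the proposal specifies.

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;

/// Stand-in for `serde_json::Value` so the sketch compiles without dependencies.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// The inner value is private, so a `Schema` can only be built through
/// constructors that check the "bool or object" invariant.
#[derive(Debug, Clone, PartialEq)]
#[repr(transparent)]
pub struct Schema(Value);

impl TryFrom<Value> for Schema {
    // Assumption: hand the rejected value back to the caller as the error.
    type Error = Value;

    fn try_from(value: Value) -> Result<Self, Self::Error> {
        match value {
            Value::Bool(_) | Value::Object(_) => Ok(Schema(value)),
            other => Err(other),
        }
    }
}

// Conversions that cannot violate the invariant can stay infallible.
impl From<bool> for Schema {
    fn from(b: bool) -> Self {
        Schema(Value::Bool(b))
    }
}

impl From<Schema> for Value {
    fn from(schema: Schema) -> Self {
        schema.0
    }
}
```

With this shape, `Schema::try_from(Value::String(..))` fails at runtime, while `Schema::from(true)` needs no check at all.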

While this would be a fairly major breaking change for any consumers of schemars who construct and/or manipulate schemas, the vast majority of consumers who just #[derive(JsonSchema)], generate a schema for their types and serialise it to JSON would not be affected by this proposed change. And conveniently for me, the vast majority of schemars's tests can be left largely as they are!

Notable traits and functions on a Schema would include:

impl TryFrom<Value> for Schema { ... }
impl TryFrom<&Value> for &Schema { ... }
impl TryFrom<&mut Value> for &mut Schema { ... }

impl From<bool> for Schema { ... }
impl From<Map<String, Value>> for Schema { ... }

impl From<Schema> for Value { ... }
impl From<Schema> for Map<String, Value> { ... }

impl Schema {
    pub fn as_bool(&self) -> Option<bool> { ... }
    pub fn as_object(&self) -> Option<&Map<String, Value>> { ... }
    pub fn as_object_mut(&mut self) -> Option<&mut Map<String, Value>> { ... }
    // alternatively, as_* could return Err(_) for non-matching schemas:
    //     pub fn as_bool(&self) -> Result<bool, &Map<String, Value>> { ... }
    //     pub fn as_object(&self) -> Result<&Map<String, Value>, bool> { ... }
    // ...but then what about as_object_mut?

    // converts bool schemas to objects, so infallible
    pub fn ensure_object(&mut self) -> &mut Map<String, Value> { ... }

    pub fn get(&self, key: impl Borrow<str>) -> Option<&Value> { ... }
    pub fn get_mut(&mut self, key: impl Borrow<str>) -> Option<&mut Value> { ... }
    // converts bool schemas to objects
    pub fn set(&mut self, key: String, value: Value) -> Option<Value> { ... }
}
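As a rough sketch of how `ensure_object`, `get` and `set` might behave, the following uses a reduced stand-in `Value` enum in place of `serde_json::Value`. The mapping of `true` to `{}` and `false` to `{"not": {}}` is an assumption here (the proposal only says bool schemas are converted to objects), and `from_bool` is a hypothetical helper that stands in for `From<bool>`.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

pub type Map = BTreeMap<String, Value>;

#[derive(Debug, Clone, PartialEq)]
pub struct Schema(Value);

impl Schema {
    /// Hypothetical constructor used in place of `From<bool>`.
    pub fn from_bool(b: bool) -> Self {
        Schema(Value::Bool(b))
    }

    pub fn as_bool(&self) -> Option<bool> {
        match self.0 {
            Value::Bool(b) => Some(b),
            _ => None,
        }
    }

    /// Converts a bool schema into an equivalent object schema, then
    /// returns the underlying map. Assumed mapping: `true` becomes `{}`
    /// and `false` becomes `{"not": {}}`.
    pub fn ensure_object(&mut self) -> &mut Map {
        if let Value::Bool(b) = self.0 {
            let mut map = Map::new();
            if !b {
                map.insert("not".to_string(), Value::Object(Map::new()));
            }
            self.0 = Value::Object(map);
        }
        match &mut self.0 {
            Value::Object(map) => map,
            _ => unreachable!("invariant: a Schema is a bool or an object"),
        }
    }

    pub fn get(&self, key: &str) -> Option<&Value> {
        match &self.0 {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }

    /// Sets a property, converting a bool schema to an object first.
    /// Returns the previous value for the key, if any.
    pub fn set(&mut self, key: String, value: Value) -> Option<Value> {
        self.ensure_object().insert(key, value)
    }
}
```

One consequence of this design is that calling `set` on a `true` schema silently changes what `as_bool` returns afterwards, which may be worth documenting on the real method.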

For convenience, schemars could also export a macro similar to serde_json's json!() that constructs a Schema while ensuring it's passed an object or bool:

let schema: Schema = json_schema!({}); // OK
let schema: Schema = json_schema!({ "type": "string" }); // OK
let schema: Schema = json_schema!(true); // OK
let schema: Schema = json_schema!("uh oh!"); // compile-time error

Note that such a macro would probably not validate that all properties are well-defined JSON Schema keywords, e.g. json_schema!({ "foobar": 123 }) would be allowed. Bear in mind an equivalent schema can be already constructed today due to the existing extensions field.
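To show how a macro can reject non-schema input at compile time, here is a deliberately reduced sketch. It is not the proposed `json_schema!` implementation: it only accepts `true`, `false`, or a flat object whose keys and values are string literals, and it builds a simplified stand-in `Schema` enum rather than wrapping a `serde_json::Value`.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the real `Schema` type (string values only).
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Bool(bool),
    Object(BTreeMap<String, String>),
}

/// Only three input shapes have a matching rule, so anything else
/// (e.g. a bare string literal) fails to compile with "no rules expected".
macro_rules! json_schema {
    (true) => {
        Schema::Bool(true)
    };
    (false) => {
        Schema::Bool(false)
    };
    ({ $($key:literal : $value:literal),* $(,)? }) => {{
        #[allow(unused_mut)]
        let mut map = ::std::collections::BTreeMap::new();
        $( map.insert($key.to_string(), $value.to_string()); )*
        Schema::Object(map)
    }};
}
```

As in the proposal, nothing here checks that keys are real JSON Schema keywords; the macro only constrains the top-level shape.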

Further possibilities

In lieu of fields, Schema could also have getter/setter/mutator functions to aid processing and manipulating schemas. Then if new keywords are added to JSON Schema, corresponding functions could be added to schemars as a non-breaking change. Defining these would be fairly straightforward for "simple" properties like strings or numbers:

impl Schema {
    pub fn format(&self) -> Option<&str> { ... }
    pub fn set_format(&mut self, format: String) { ... }
}

But for more complex properties that may require in-place mutation, this may require functions like xyz_mut(&mut self) -> Option<&mut XYZ> which would require schemars to define new types to wrap the underlying &mut Value. It may also be useful to define an entry-like API instead of (or as well as) the xyz_mut functions. Either way, such methods are not part of this proposal, but could be added later as a non-breaking change. Until/unless that happens, the main way to manipulate schema properties would be with either the get/get_mut/set methods proposed above, or getting a mut reference to the schema's underlying Map.

How different JSON Schema versions would be supported

When implementing JsonSchema on a type, it's currently not clear which version of JSON schema should be produced - schemars currently assumes that the generated schema is compatible with draft 2019-09 (which the current Schema/SchemaObject definition is mostly-compatible with), but this isn't documented anywhere. So I propose these high-level guidelines for determining which version of JSON schema to produce:

  • the implementation of the json_schema() function may check the requested meta_schema (available on the settings of the SchemaGenerator passed in as an argument) to determine which type/version of JSON schema has been requested, and generate the schema according to that version
  • if the implementation doesn't recognise the meta schema URI, or (probably more likely) the implementor doesn't want to deal with the complexity of supporting multiple versions of JSON schema, it should generate a schema that's valid under draft 2020-12. Then, if the originally requested version is older than 2020-12 (and supported by schemars), the SchemaGenerator will transform the schema to the originally requested version using something like the Visitors that exist today. If the requested version is newer than 2020-12 (i.e. a future version) then there should be no work required, assuming that all future versions are indeed backward-compatible
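The second guideline can be sketched as a post-processing step that runs after an implementation has produced a 2020-12 schema. Everything below is a hypothetical illustration: `downgrade_to_requested` is an invented name, the stand-in `Value` enum replaces `serde_json::Value`, and only one difference between versions (`$defs` versus `definitions`) is handled. A real transform would also need to rewrite `$ref` pointers and handle keywords such as `items`/`prefixItems`.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

pub const DRAFT_07: &str = "http://json-schema.org/draft-07/schema#";
pub const DRAFT_2020_12: &str = "https://json-schema.org/draft/2020-12/schema";

/// Hypothetical generator step: takes a root schema written for 2020-12
/// and adapts it to the requested meta-schema.
pub fn downgrade_to_requested(schema: &mut BTreeMap<String, Value>, meta_schema: &str) {
    if meta_schema == DRAFT_07 {
        // draft-07 predates `$defs`, so move reusable schemas to `definitions`.
        if let Some(defs) = schema.remove("$defs") {
            schema.insert("definitions".to_string(), defs);
        }
    }
    // For 2020-12 or an unrecognised (possibly future) meta-schema, the
    // schema is left as generated; only `$schema` is recorded.
    schema.insert(
        "$schema".to_string(),
        Value::String(meta_schema.to_string()),
    );
}
```

The benefit of this split is that a hand-written `JsonSchema` implementation only ever needs to know one version, while version-specific knowledge lives in the generator.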

Open questions:

  • is meta_schema sufficient, or should SchemaSettings also/instead have some sort of "compatibility mode", e.g. to support custom meta schemas that are based on a specific version of JSON schema?
@GREsau added the "1.0" and "feedback wanted" labels Sep 2, 2023
@GREsau GREsau pinned this issue Sep 2, 2023
@ahl
Contributor

ahl commented Dec 28, 2023

I like this idea of (roughly) JsonSchema::json_schema(..) -> serde_json::Value a lot. At first, I thought it was weird and lame, but the more I thought it over, the more it appeals to me. Consider my own use case: we're using schemars for a bunch of OpenAPI related stuff. We have to specify that we're interested in the OpenAPI format and fish around the extensions for nullable. It would be much crisper to just get back a json Value and use that as is.

Your insight that this divorces the output type from the weird differences of various JSON schema revisions is extremely compelling. This even opens the door for a generic "schema" representation that can have a JsonSchema impl (e.g. https://crates.io/crates/schema and there could be an impl<T: schema::Schema> schemars::JsonSchema for T {}).

One could imagine schema descriptions even unrelated to JSON schema... though perhaps that's a bridge too far.

In addition, I think it makes a ton of sense to have the structural use of JSON schema types live in a different crate -- the design goals are distinct and not necessarily well-aligned.

One question: do you definitely want to use serde_json::Value? On one hand, consumers of schemars almost certainly also depend on serde_json. On the other hand, I've often wished there were a distinct Value object so I didn't need to pull in all of serde_json.


This does seem to complicate hand-written JsonSchema implementations in that one might need to pay closer attention to the generation settings. It might be useful to provide some mechanism that's effectively "here's my output in 2019-09 format, could you please transform this according to what the caller is asking for?"


Cool stuff; would be happy to contribute; I've already been planning a 2020-12 JSON Schema crate for an OpenAPI v3.1 compatible version of https://crates.io/crates/openapiv3

@gagbo

gagbo commented Feb 23, 2024

Hello,

The changes all make sense. My main (only) use case for the library is to programmatically produce the #/components/schemas part of an OpenAPI schema from a collection of Rust structs, so I’m very happy to see it’s going to provide some extra flexibility in usage.

For the ability to produce extra schema outputs, I feel that having the option to specify in the json_schema call which variant to target is going to be the simplest thing to do. For all people that don’t know/care about the version, we could have both json_schema and json_schema_with_spec or something.

I’d like the ability to also output the data to yaml format if possible, it would be useful at least for my use case I think. But the more I think about it, the more I think that I might just want a macro that creates an openapiv3::Schema from arbitrary structs, so maybe all of this is irrelevant and I should just look into the code that respects serde attributes here.

I think all the specific getters/setters like format() and set_format() would fit better if they were added to extension traits like trait JsonSchema202012 which you’d implement on Schema. This way consumers that want to use these would need to bring the trait in scope, and it would make the completion/method list smaller. Also it would allow external crates to implement their own extension traits if they want to, I think.
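A minimal sketch of what such an extension trait might look like follows. The trait name is taken from the suggestion above. Everything else is an assumption: the stand-in `Value` enum, a `Schema` restricted to object schemas for brevity, and the `new_object` helper.

```rust
use std::collections::BTreeMap;

/// Reduced stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Simplification: this sketch only models object schemas.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Schema(BTreeMap<String, Value>);

impl Schema {
    /// Hypothetical helper that creates an empty object schema.
    pub fn new_object() -> Self {
        Schema(BTreeMap::new())
    }

    pub fn get(&self, key: &str) -> Option<&Value> {
        self.0.get(key)
    }

    pub fn set(&mut self, key: String, value: Value) -> Option<Value> {
        self.0.insert(key, value)
    }
}

/// Version-specific accessors live in a trait, so they are only visible
/// to code that brings the trait into scope.
pub trait JsonSchema202012 {
    fn format(&self) -> Option<&str>;
    fn set_format(&mut self, format: String);
}

impl JsonSchema202012 for Schema {
    fn format(&self) -> Option<&str> {
        // A non-string `format` is treated the same as a missing one.
        match self.get("format") {
            Some(Value::String(s)) => Some(s.as_str()),
            _ => None,
        }
    }

    fn set_format(&mut self, format: String) {
        self.set("format".to_string(), Value::String(format));
    }
}
```

Because the trait is built only on the generic `get`/`set` methods, an external crate could define a similar trait for its own vocabulary without any changes to schemars.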

@GREsau
Owner Author

GREsau commented May 28, 2024

One question: do you definitely want to use serde_json::Value?

If schemars doesn't use serde_json::Value, then it would have to define its own type that would match the JSON data structure, which would be practically identical to serde_json::Value (plus its various trait implementations). I think that the extra simplicity, reduction in API surface area and ease of maintenance outweigh the cost of referencing serde_json for projects that otherwise wouldn't need it.

This does seem to complicate hand-written JsonSchema implementations in that one might need to pay closer attention to the generation settings. It might be useful to provide some mechanism that's effectively "here's my output in 2019-09 format, could you please transform this according to what the caller is asking for?"

That mechanism you suggest is almost exactly the same as in the proposal, except that the proposal standardises on 2020-12 instead of 2019-09!

For the ability to produce extra schema outputs, I feel that having the option to specify in the json_schema call which variant to target is going to be the simplest thing to do. For all people that don’t know/care about the version, we could have both json_schema and json_schema_with_spec or something.

As above, JsonSchema implementations that don't know/care about the version can just return a 2020-12 schema. Otherwise, they may check the meta_schema for a known version.

I’d like the ability to also output the data to yaml format if possible, it would be useful at least for my use case I think.

I believe that's already possible, both in schemars 0.8 and with the proposed change. As long as Schema implements serde::Serialize, you can serialize it to YAML using serde_yaml.

But the more I think about it, the more I think that I might just want a macro that creates an openapiv3::Schema from arbitrary structs

Another handy thing you would be able to do with this proposal is convert a Schema into any compatible struct using something like serde_json::from_value, e.g.

let s1: schemars::Schema = schema_for!(SomeStruct);
let s2: openapiv3::Schema = serde_json::from_value(s1.into()).unwrap();

@GREsau
Owner Author

GREsau commented May 28, 2024

This proposal has now been implemented on the v1 branch, and released to crates.io as 1.0.0-alpha.1 - please try it out and let me know what you think!

@sbarral

sbarral commented Jun 18, 2024

Disclaimer: I haven't used schemars yet as I am waiting for version 1.0.0 to make JsonSchema a required trait in the public API of one of our libraries; in other words, my comment is likely to reflect my lack of experience with schemars, but here it goes...

First off, schemars looks like an impressive and incredibly useful library, I am cheering for 1.0.0!

Is my understanding correct that with this change, schemars would effectively become independent of the meta-scheme and might be one day able to output other schemes (e.g. TypeSchema, CDDL, etc.) without change to the API, provided of course that these can be serialized with serde?
If true, then maybe 1.0.0 might be an opportunity to change some API names to more generic versions, such as Schema instead of JsonSchema, possibly keeping the old names as aliases to provide a smooth upgrade path?

@GREsau
Owner Author

GREsau commented Aug 12, 2024

Is my understanding correct that with this change, schemars would effectively become independent of the meta-scheme and might be one day able to output other schemes (e.g. TypeSchema, CDDL, etc.) without change to the API, provided of course that these can be serialized with serde?

I have no plans to widen the scope of the project to support any schemas beyond JSON Schema. The only way that would be possible with schemars would be to use schemars to generate a JSON schema, and then convert that JSON schema to the desired schema type.

@GREsau
Owner Author

GREsau commented Aug 12, 2024

Closing as this is implemented in 1.0.0-alpha.1 (and beyond)

@GREsau GREsau closed this as completed Aug 12, 2024
@GREsau GREsau unpinned this issue Aug 17, 2024
@wiiznokes

wiiznokes commented Oct 13, 2024

Sorry for the late reply on a closed issue, I've only started to use 0.8 recently.

I'm using schemars to build a tree of valid nodes that I can use to produce a UI in my app.
The strong typing of schemars has been incredibly helpful, as has the documentation pointing to the JSON Schema docs for every type.

I must be the only one who actually uses this part of schemars, but the recent changes will be a disaster for me 😭 (though now that I have a working function that maps the schema, it should be easier than starting directly from 1.0)

5 participants