Proposal for serializing namespaces in serde addressing #218 #466

Closed
wants to merge 6 commits into from


@JOSEPHGILBY JOSEPHGILBY commented Aug 26, 2022

I'm interested in moving the design forward and implementing a solution to the problem of serializing namespaces in Serde.

Proposal from issue #218

The current proposal is to implement a 'namespace' attribute macro where the first argument is the prefix and the second argument is the namespace. Additionally, the proposal indicates that any child field will automatically take the defined prefix unless the field is shadowed by an additional namespace macro + serde flatten helper attribute.

Input:

#[derive(Serialize, Deserialize)]
#[namespace(F,foourn)]
struct Foo {
  id: String,
  #[serde(flatten)]
  #[namespace(B,barurn)]
  bar: Bar
}

#[derive(Serialize, Deserialize)]
struct Bar {
  name: String,
  desc: String
}

output:

<F:foo xmlns:F="foourn" xmlns:B="barurn">
      <F:id>123</F:id>
      <B:name>asdf</B:name>
      <B:desc>foobar </B:desc>
</F:foo>

Potential downside

One downside I see with #218 is that the second xmlns declaration will have to be 'hoisted' up into the foo tag. Additionally, multiple namespaces will always have to be combined with the #[serde(flatten)] attribute. The downside of using flatten is that quick-xml (as currently implemented) doesn't allow attributes to be defined directly in the parent container. As seen in the example above, the "id" field gets serialized into <F:id>123</F:id> and not id="123". Thus, it doesn't look like it would be possible for id to be an XML attribute, which could be limiting for serializing XML documents.
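To make the 'hoisting' concrete, here is a minimal sketch of what a serializer would have to do under the #218 design: walk the whole value first, gather every prefix binding, and only then write the root start tag. The `Element` type and the `leaf`, `collect` and `root_start_tag` helpers are hypothetical names made up for this illustration; none of them is quick-xml API, and the sketch assumes a prefix is never bound to two different namespaces.

```rust
use std::collections::BTreeMap;

/// A tiny element tree; every element carries its own prefix and namespace.
/// Hypothetical type for illustration only.
struct Element {
    prefix: &'static str,
    namespace: &'static str,
    name: &'static str,
    children: Vec<Element>,
}

fn leaf(prefix: &'static str, namespace: &'static str, name: &'static str) -> Element {
    Element { prefix, namespace, name, children: Vec::new() }
}

/// Gather every prefix binding used anywhere in the tree.
/// Assumption: a prefix is never bound to two different namespaces.
fn collect(el: &Element, out: &mut BTreeMap<&'static str, &'static str>) {
    out.insert(el.prefix, el.namespace);
    for child in &el.children {
        collect(child, out);
    }
}

/// Write the root start tag with all bindings hoisted onto it.
fn root_start_tag(root: &Element) -> String {
    let mut decls = BTreeMap::new();
    collect(root, &mut decls);
    let mut tag = format!("<{}:{}", root.prefix, root.name);
    // BTreeMap iterates in key order, so the declarations come out sorted by prefix.
    for (prefix, ns) in &decls {
        tag.push_str(&format!(" xmlns:{}=\"{}\"", prefix, ns));
    }
    tag.push('>');
    tag
}

fn main() {
    let foo = Element {
        prefix: "F",
        namespace: "foourn",
        name: "foo",
        children: vec![
            leaf("F", "foourn", "id"),
            leaf("B", "barurn", "name"),
            leaf("B", "barurn", "desc"),
        ],
    };
    assert_eq!(
        root_start_tag(&foo),
        "<F:foo xmlns:B=\"barurn\" xmlns:F=\"foourn\">"
    );
}
```

The point of the sketch is the extra pass: the root tag cannot be written until the flattened children have been inspected, which is awkward for a streaming serializer.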

Alternative idea

Despite the proposal above, I want to explore a little bit on some other design ideas and get feedback on it. Playing around a little bit with attribute macros, I was able to get the following to process using syn::parse2. I haven't gotten it to fully serialize into xml quite yet, but I can indicate the expected xml.

The proposal

Create a new derive macro called QuickXml with two helper attributes xmlns and xmlpre. What Serde does is that the Serialize and Deserialize macros are derive macros, but they list serde as a helper attribute. Thus, anytime you see #[serde(...)], it isn't actually a separate macro which expands. Instead, the Serialize and Deserialize macros will be fed the location and parameters to the #[serde(...)] statements and use that for their expansion.

In a similar vein, we could have QuickXml be the derive macro which runs, but is fed xmlns and xmlpre helper attributes to expand-out namespace and prefix information for the struct. In contrast to the above proposal, there won't be a need for 'hoisting up' additional namespaces into the containing tag. Additionally, without being forced to use serde(flatten), it's possible to both have multiple namespaces for a single tag, and, define attributes and elements at the same time for the tag. This would give a good amount of flexibility to the programmer for representing xml files as serde structs.

Another point is that xmlns and xmlpre would be one-to-one with the XML document, in contrast to the above proposal. That is, as the programmer reads through an example XML output which they would like to represent in Rust using quick-xml and serde, every time they see an xmlns:prefix="..." or prefix:attr-or-element-name in the XML document, they would put #[xmlns:prefix="..."] and #[xmlpre:prefix] attr_or_element_name: type into the Rust serde struct. The downside is of course typing 'xmlpre' for each field and struct name which has a prefix on it. However, this one-to-one correspondence makes it easier to visually check an example XML document against the Rust serde struct the programmer is trying to write.

Finally, the format xmlns:prefix="namespace" or xmlns="default-namespace" used inside the attribute helper might be somewhat non-standard. However, this format allows the programmer to directly copy-and-paste xmlns statements from an example xml document, which provides a usability advantage.

Example

Input:

#[derive(Serialize, Deserialize, QuickXml)]
#[xmlns:F="foourn"]
#[xmlns:B="barurn"]
#[xmlns="http://this-is-a-default-namespace"]
#[xmlpre:B]
#[serde(rename = "foo")]
struct Foo {

    // attributes

    #[xmlpre:F]
    id: String,

    mar: String,

    // elements

    #[xmlpre:F]
    #[serde(rename = "$unflatten=element")]
    element: String,

    #[xmlpre:B]
    bar: Bar,
}

#[derive(Serialize, Deserialize, QuickXml)]
#[xmlns="http://this-is-another-default-namespace-shadowing-the-previous"]
#[serde(rename = "bar")]
struct Bar {

    // attributes

    #[xmlpre:F]
    name: String,

    #[xmlpre:F]
    desc: String,

    #[serde(rename = "default-namespace-field")]
    default_namespace_field: String,

    // elements

    #[serde(rename = "$value")]
    pub body: String,
}

Intended output:

<B:foo xmlns:F="foourn" xmlns:B="barurn" xmlns="http://this-is-a-default-namespace" F:id="123" mar="asdf">
    <F:element>this is an element</F:element>
    <B:bar xmlns="http://this-is-another-default-namespace-shadowing-the-previous" F:name="a name" F:desc="asdf" default-namespace-field="another filled field">this is a value</B:bar>
</B:foo>
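The shadowing of the default namespace in this example (the xmlns on bar overriding the one on foo) is the usual XML scoping rule. A minimal sketch of how a reader or writer could track it is a stack of scopes, pushed at each start tag and popped at each end tag. `NsScopes` and its methods are hypothetical names for this illustration and not part of quick-xml.

```rust
use std::collections::HashMap;

/// A stack of namespace scopes. Each start tag pushes the `xmlns`
/// declarations found on that element; each end tag pops them.
/// Hypothetical helper for illustration, not a quick-xml type.
#[derive(Default)]
struct NsScopes {
    // The key "" stands for the default namespace (`xmlns="..."`).
    scopes: Vec<HashMap<String, String>>,
}

impl NsScopes {
    fn push(&mut self, decls: &[(&str, &str)]) {
        let scope: HashMap<String, String> = decls
            .iter()
            .map(|(prefix, ns)| (prefix.to_string(), ns.to_string()))
            .collect();
        self.scopes.push(scope);
    }

    fn pop(&mut self) {
        self.scopes.pop();
    }

    /// Search from the innermost scope outwards, so an inner declaration
    /// shadows an outer one that uses the same prefix.
    fn lookup(&self, prefix: &str) -> Option<&str> {
        self.scopes
            .iter()
            .rev()
            .find_map(|scope| scope.get(prefix).map(String::as_str))
    }
}

fn main() {
    let mut ns = NsScopes::default();

    // <B:foo xmlns:F="foourn" xmlns:B="barurn" xmlns="http://this-is-a-default-namespace">
    ns.push(&[
        ("F", "foourn"),
        ("B", "barurn"),
        ("", "http://this-is-a-default-namespace"),
    ]);
    assert_eq!(ns.lookup(""), Some("http://this-is-a-default-namespace"));

    // <B:bar xmlns="http://this-is-another-default-namespace-shadowing-the-previous">
    ns.push(&[("", "http://this-is-another-default-namespace-shadowing-the-previous")]);
    assert_eq!(
        ns.lookup(""),
        Some("http://this-is-another-default-namespace-shadowing-the-previous")
    );
    // Prefixes declared on the parent are still visible inside the child.
    assert_eq!(ns.lookup("F"), Some("foourn"));

    // </B:bar>: the outer default namespace is back in effect.
    ns.pop();
    assert_eq!(ns.lookup(""), Some("http://this-is-a-default-namespace"));
}
```

One caveat for the design: under the Namespaces in XML recommendation a default namespace declaration applies to unprefixed element names but not to unprefixed attribute names, so a lookup like this would apply to an unprefixed element but not to an attribute such as mar="asdf".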

Remaining questions

  1. One possible issue from this proposal is multiply-defined prefixes for fields. That is, if a parent struct Parent has a field #[xmlpre:A]child: Child, but, Child is defined by #[xmlpre:B] struct Child, what should the outcome be?

@JOSEPHGILBY JOSEPHGILBY changed the title placeholder commit to trigger draft PR Propsal for serializing namespaces in serde addressing #218 Aug 26, 2022
@JOSEPHGILBY JOSEPHGILBY changed the title Propsal for serializing namespaces in serde addressing #218 Proposal for serializing namespaces in serde addressing #218 Aug 26, 2022
@Mingun Mingun added enhancement serde Issues related to mapping from Rust types to XML namespaces Issues related to namespaces support labels Aug 26, 2022
@codecov-commenter

Codecov Report

Merging #466 (fe85679) into master (be8138f) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #466   +/-   ##
=======================================
  Coverage   53.28%   53.28%           
=======================================
  Files          29       29           
  Lines       13527    13527           
=======================================
  Hits         7208     7208           
  Misses       6319     6319           
Flag Coverage Δ
unittests 53.28% <ø> (ø)


Impacted Files Coverage Δ
src/se/mod.rs 93.81% <ø> (ø)


@Mingun
Collaborator

Mingun commented Oct 26, 2022

Actually, there already exists a library on top of quick-xml that introduces custom traits and a derive macro: xmlserde. After a quick look, it seems that it does not support namespaces at the moment. So maybe it would be better to contribute to that library.


About the overall design: it should not be necessary to define prefixes in the type. A prefix is a property of a concrete XML instance, not of the XML Schema, which is what the Rust types define. You can have an XML document with multiple prefixes mapped to one namespace, and you should be able to deserialize a type from any of them (even when some fields use one prefix and others use another).

I think it would be better to leave the special trait and macro to a third-party library (i.e. the second part of your proposal), while quick-xml itself provides a serde serializer and deserializer as best it can.


Your opening example then could be rewritten as:

#[derive(Serialize, Deserialize)]
#[xml(namespace = "foourn")]
struct Foo {
  id: String,   // QName: {foourn}id
  name: String, // QName: {foourn}name
  #[serde(flatten)]
  bar: Bar,     // QName: {foourn}bar
}

#[derive(Serialize, Deserialize)]
#[xml(namespace = "barurn")]
struct Bar {
  name: String, // QName: {barurn}name
  desc: String, // QName: {barurn}desc
}

Note how the different prefixes B1 and B2 are mapped to the same namespace and still form valid XML, and how this also works for the default namespace:

<F:foo xmlns="foourn" xmlns:F="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
      <F:id>123</F:id>
      <name>asdf</name>
      <B1:name>asdf</B1:name>
      <B2:desc>foobar</B2:desc>
</F:foo>
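To illustrate why the prefix does not matter, here is a minimal sketch that resolves lexical element names from the document above into (namespace, local name) pairs. The `resolve` function is a hypothetical helper written for this comment; quick-xml's real namespace API looks different.

```rust
use std::collections::HashMap;

/// Resolve a lexical element name such as `B1:name` into a
/// `(namespace, local_name)` pair using the prefix bindings in scope.
/// The key "" holds the default namespace.
/// Returns `None` when the prefix (or the default namespace) is undeclared;
/// a real resolver would treat an unprefixed name without a default
/// namespace as "no namespace" instead.
/// Hypothetical helper for illustration only.
fn resolve<'a>(
    bindings: &HashMap<&'a str, &'a str>,
    qname: &'a str,
) -> Option<(&'a str, &'a str)> {
    let (prefix, local) = match qname.split_once(':') {
        Some((prefix, local)) => (prefix, local),
        None => ("", qname),
    };
    bindings.get(prefix).map(|ns| (*ns, local))
}

fn main() {
    // xmlns="foourn" xmlns:F="foourn" xmlns:B1="barurn" xmlns:B2="barurn"
    let bindings = HashMap::from([
        ("", "foourn"),
        ("F", "foourn"),
        ("B1", "barurn"),
        ("B2", "barurn"),
    ]);

    // Prefixed and unprefixed names land in the same namespace.
    assert_eq!(resolve(&bindings, "F:id"), Some(("foourn", "id")));
    assert_eq!(resolve(&bindings, "name"), Some(("foourn", "name")));

    // Different prefixes, same expanded name.
    assert_eq!(resolve(&bindings, "B1:name"), resolve(&bindings, "B2:name"));
    assert_eq!(resolve(&bindings, "B2:desc"), Some(("barurn", "desc")));

    // An undeclared prefix cannot be resolved.
    assert_eq!(resolve(&bindings, "X:name"), None);
}
```

A deserializer that compares these resolved pairs against the names expected by the type accepts the document regardless of which prefixes its author picked.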

Here the #[xml(namespace = "...")] attribute is applied to a type, as in ordinary XSD, where you define types in some namespace and all their elements are automatically defined in that namespace.

Strictly speaking, the root element <foo> does not necessarily have to be defined in the "foourn" namespace; in XSD the namespace of a root element is defined explicitly by an <xs:element> under <xs:schema>. In quick-xml we assume that when you call from_str (or another deserializer method), you are already working in whatever namespace is defined. So this will also be parsed:

<any-ns:any-tag xmlns="foourn" xmlns:F="foourn" xmlns:any-ns="some-other-ns" xmlns:B1="barurn" xmlns:B2="barurn">
      <F:id>123</F:id>
      <name>asdf</name>
      <B1:name>asdf</B1:name>
      <B2:desc>foobar</B2:desc>
</any-ns:any-tag>

Right now the deserializer deserializes a type, but the root element is part of the document (any mature XML library in any language follows that concept).


Using only vanilla serde, we cannot use the #[xml(namespace = "...")] attribute directly: serde does not let us pass any type attributes from the type to the (de)serializer. It can only tell us the name of a struct, and because we do not use that name anyway, we could encode the namespace information in it:

#[derive(Serialize, Deserialize)]
//#[xml(namespace = "barurn")]
#[serde(rename = "{barurn}Bar")] // form a special name with a namespace information
struct Bar {
  name: String, // QName: {barurn}name
  desc: String, // QName: {barurn}desc
}
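On the (de)serializer side, the name received from serde would then have to be split back into its parts. Here is a minimal sketch of that round trip for the {namespace}local form (often called James Clark's notation). `encode` and `decode` are hypothetical helper names, and the sketch assumes the namespace itself contains no `}`.

```rust
/// Build a `{namespace}local` name, the form used in
/// `#[serde(rename = "{barurn}Bar")]`.
/// Hypothetical helper for illustration only.
fn encode(namespace: &str, local: &str) -> String {
    // `{{` and `}}` are escaped braces in a format string.
    format!("{{{}}}{}", namespace, local)
}

/// Split a serde name back into `(namespace, local)`.
/// A name without a leading `{...}` is treated as having no namespace.
/// Assumption: the namespace does not contain `}`.
fn decode(name: &str) -> (Option<&str>, &str) {
    if let Some(rest) = name.strip_prefix('{') {
        if let Some((namespace, local)) = rest.split_once('}') {
            return (Some(namespace), local);
        }
    }
    (None, name)
}

fn main() {
    assert_eq!(encode("barurn", "Bar"), "{barurn}Bar");
    assert_eq!(decode("{barurn}Bar"), (Some("barurn"), "Bar"));

    // Plain names pass through unchanged.
    assert_eq!(decode("Bar"), (None, "Bar"));

    // Round trip.
    let name = encode("foourn", "id");
    assert_eq!(decode(&name), (Some("foourn"), "id"));
}
```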

We can, however, try one trick: write a macro that takes our type definition and rewrites it into the serde form. So, given a macro quick_xml::xml, we could transform

// We should put the macro before the derive, so it runs first... I hope
#[xml(namespace = "barurn")] // or in full form: #[quick_xml::xml(namespace = "barurn")]
#[derive(Serialize, Deserialize)]
struct Bar {
  name: String, // QName: {barurn}name
  desc: String, // QName: {barurn}desc
}

to

#[derive(Serialize, Deserialize)]
#[serde(rename = "{barurn}Bar")] // form a special name with a namespace information
struct Bar {
  name: String, // QName: {barurn}name
  desc: String, // QName: {barurn}desc
}

I have already started investigating this option and it is probably conceptually possible. I have not explored that direction deeply yet, but I plan to. One thing I can note here: it is better not to write the macro by hand. The excellent darling library can handle all the boilerplate.

@JOSEPHGILBY
Author

@Mingun

I'll try to summarize your points before responding to them. Let me know if I misunderstand anything you said.

  1. Somebody already made a custom macro library built on top of quick-xml: xmlserde. It does not support namespaces yet, so why not contribute there?
  2. quick-xml's deserializer takes an XML document (an instance) and maps the document's elements to the struct/enum/etc.'s fields based on the resolved QName of each field. There may be many valid XML documents which satisfy this mapping. As an example:
<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
    <A:id>123</A:id>
    <name>asdf</name>
    <B1:name>asdf</B1:name>
    <B2:desc>foobar</B2:desc>
</foo>

and

<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
    <id>123</id>
    <A:name>asdf</A:name>
    <B2:name>asdf</B2:name>
    <B1:desc>foobar</B1:desc>
</foo>

and

<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
    <id>123</id>
    <name>asdf</name>
    <B1:name>asdf</B1:name>
    <B1:desc>foobar</B1:desc>
</foo>

should all be able to map into a struct/enum/etc. where a QName is associated with each field:

#[derive(Serialize, Deserialize)]
struct Foo {
    id: String,    // QName: {foourn}id     |-----------------------------------------------------|
    name1: String, // QName: {foourn}name <- name conflict which is resolved through QName in XML |
    name2: String, // QName: {barurn}name <- name conflict in rust can be resolved through        |
                   //                       | serde(rename) or separate structs/enums or          |
                   //                       | whatever else we decide in design                   |
                   //                       |-----------------------------------------------------|
    desc: String   // QName: {barurn}desc
}
  3. A conceptual solution to point 2 would be for the namespace to be associated with the struct/enum/etc. declaration and to use serde(flatten) whenever there is a mixture of namespaces. The following example is what you provide:
#[derive(Serialize, Deserialize)]
#[xml(namespace = "foourn")]
struct Foo {
  id: String,   // QName: {foourn}id
  name: String, // QName: {foourn}name
  #[serde(flatten)]
  bar: Bar,     // QName: {foourn}bar
}

#[derive(Serialize, Deserialize)]
#[xml(namespace = "barurn")]
struct Bar {
  name: String, // QName: {barurn}name
  desc: String, // QName: {barurn}desc
}

The main point is that the {foourn} namespace is associated with all fields in Foo, but {foourn}bar ceases to exist and instead pulls in {barurn}name and {barurn}desc when #[serde(flatten)] is applied.
4. Root elements are a special case for the deserializer. That is, the QName of the element does not have to match the struct/enum/etc.'s name+namespace association. As long as the QNames inside the xml element match the inside of the struct/enum/etc's fields, it will be a valid deserialization. Given the example in point 3, if we consider struct Foo to associate with the root of an xml-document, the following should deserialize:

<this-can-be-any-namespace:this-can-be-any-tag xmlns:this-can-be-any-namespace="some-other-ns" xmlns="foourn" xmlns:F="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
      <F:id>123</F:id>
      <name>asdf</name>
      <B1:name>asdf</B1:name>
      <B2:desc>foobar</B2:desc>
</this-can-be-any-namespace:this-can-be-any-tag>

This is a convention used by most mature xml libraries so we should follow it as well.
5. Sending extra metadata into the serde hooks (fn serialize_struct and fn deserialize_struct) like the expected QName is difficult because the function signatures take in the "name" which is the name after serde(rename) is applied. Sending metadata into the serde hooks would require either the user manually putting the metadata in serde(rename) which is something already done in quick-xml like serde(rename = "@attribute"), or, having a macro execute which could modify / add serde(rename) to these fields. Hopefully, this macro executes before Serialize/Deserialize.

@Mingun
Collaborator

Mingun commented Nov 21, 2022

@JOSEPHGILBY, sorry, missed your answer. Yes, you understood me perfectly.

One note about point 3: I realize that, unfortunately, assigning namespace information to structs (rather than to their fields) will not work with #[serde(flatten)], because in that case the struct is serialized using serialize_map and we lose the struct name together with the namespace info. Also, #[serde(flatten)] in general works poorly with formats that are not self-describing (i.e. formats that cannot say what type the content has in deserialize_any, and XML cannot). So it is better to avoid using it at all.

@riverar

riverar commented Mar 3, 2023

Just throwing in a random comment, apologies if this is out of place.

This proposal makes use of xml namespace prefixes. I'd like to caution that these namespace prefixes are not enforceable via xml schema, are an xml reader implementation detail, and can be completely random.

In other words, the following will not work:

#[derive(Deserialize)]
#[namespace(prefixA, "https://schemas.example.com/foo/v1")]
struct Foo {
  id: String,
}

<root xmlns:prefixB="https://schemas.example.com/foo/v1">
  <prefixB:foo>
    <id>123</id>
  </prefixB:foo>
</root>

Instead, I propose the namespace prefixes be mapped/handled internally, letting users instead specify a namespace:

#[derive(Deserialize)]
#[namespace("https://schemas.example.com/foo/v1")]
struct Foo {
  id: String,
}

@dralley
Collaborator

dralley commented Jul 10, 2023

@JOSEPHGILBY Do you intend to continue this work?

@JOSEPHGILBY
Author

Unfortunately, I don't have the time.

Labels
enhancement namespaces Issues related to namespaces support serde Issues related to mapping from Rust types to XML

5 participants