Proposal for serializing namespaces in serde addressing #218 #466
Codecov Report
@@ Coverage Diff @@
## master #466 +/- ##
=======================================
Coverage 53.28% 53.28%
=======================================
Files 29 29
Lines 13527 13527
=======================================
Hits 7208 7208
Misses 6319 6319
… so that namespace declarations and prefixes are included in serialization
Actually, there already exists a library built on top of quick-xml that introduces custom traits and a derive macro: xmlserde. After a quick look, it seems that it does not currently support namespaces, so maybe it would be better to contribute to that library.
About the overall design: it should not be necessary to define prefixes in the type. A prefix belongs to a concrete XML instance, not to the XML Schema, which is what the Rust types define. You can have an XML document with multiple prefixes mapped to one namespace, and you should be able to deserialize the type from any of them (even when some fields use one prefix and others use another).
I think it would be better to leave the special trait and macro to a third-party library (i.e. the second part of your proposal), while quick-xml itself provides a serde serializer and deserializer as best as it can. Your opening example could then be rewritten as:
#[derive(Serialize, Deserialize)]
#[xml(namespace = "foourn")]
struct Foo {
id: String, // QName: {foourn}id
name: String, // QName: {foourn}name
#[serde(flatten)]
bar: Bar, // QName: {foourn}bar
}
#[derive(Serialize, Deserialize)]
#[xml(namespace = "barurn")]
struct Bar {
name: String, // QName: {barurn}name
desc: String, // QName: {barurn}desc
}
Note, how different prefixes
<F:foo xmlns="foourn" xmlns:F="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
<F:id>123</F:id>
<name>asdf</name>
<B1:name>asdf</B1:name>
<B2:desc>foobar</B2:desc>
</F:foo>
Here Strictly speaking, root element
<any-ns:any-tag xmlns="foourn" xmlns:F="foourn" xmlns:any-ns="some-other-ns" xmlns:B1="barurn" xmlns:B2="barurn">
<F:id>123</F:id>
<name>asdf</name>
<B1:name>asdf</B1:name>
<B2:desc>foobar</B2:desc>
</any-ns:any-tag>
Right now the deserializer deserializes a type, but the root element is part of the document (take any mature XML library in any language and you will find that concept everywhere). Using only vanilla serde, we cannot use
#[derive(Serialize, Deserialize)]
//#[xml(namespace = "barurn")]
#[serde(rename = "{barurn}Bar")] // form a special name with a namespace information
struct Bar {
name: String, // QName: {barurn}name
desc: String, // QName: {barurn}desc
}
We can, however, try one trick: write a macro that takes our type definition and rewrites it into a form serde understands. So, we would have a macro that rewrites
// We should put the macro before the derive, so it runs first... I hope
#[xml(namespace = "barurn")] // or in full form: #[quick_xml::xml(namespace = "barurn")]
#[derive(Serialize, Deserialize)]
struct Bar {
name: String, // QName: {barurn}name
desc: String, // QName: {barurn}desc
}
to
#[derive(Serialize, Deserialize)]
#[serde(rename = "{barurn}Bar")] // form a special name with a namespace information
struct Bar {
name: String, // QName: {barurn}name
desc: String, // QName: {barurn}desc
}
I had already started investigating this option, and it is probably conceptually possible. I have not gone deep in that direction yet, but I plan to. One thing I can note here: it is better not to write the macro by hand. There is an excellent darling library that can handle all the boilerplate. |
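A minimal sketch of the "special name" idea above, assuming the `{namespace}local` form shown in the `#[serde(rename = "{barurn}Bar")]` example. The helper names `encode_qname` and `decode_qname` are hypothetical and are not part of quick-xml or serde; they only show how such a name could be built by a macro and split apart again by a serializer or deserializer.

```rust
/// Builds a serde-visible name of the form `{namespace}local`.
/// NOTE: hypothetical helper, not part of quick-xml or serde.
fn encode_qname(namespace: &str, local: &str) -> String {
    // `{{` and `}}` are escaped braces in a format string.
    format!("{{{}}}{}", namespace, local)
}

/// Splits `{namespace}local` back into its parts.
/// Names without a well-formed `{...}` prefix are treated as having no namespace.
fn decode_qname(name: &str) -> (Option<&str>, &str) {
    match name.strip_prefix('{').and_then(|rest| rest.split_once('}')) {
        Some((namespace, local)) => (Some(namespace), local),
        None => (None, name),
    }
}
```

Under these assumptions, `encode_qname("barurn", "Bar")` yields `{barurn}Bar`, and `decode_qname` recovers `("barurn", "Bar")` from it, while a plain name such as `desc` comes back without a namespace.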
I'll try to summarize your points before responding to them. Let me know if I misunderstand anything you said.
<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
<A:id>123</A:id>
<name>asdf</name>
<B1:name>asdf</B1:name>
<B2:desc>foobar</B2:desc>
</foo>
and
<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
<id>123</id>
<A:name>asdf</A:name>
<B2:name>asdf</B2:name>
<B1:desc>foobar</B1:desc>
</foo>
and
<foo xmlns="foourn" xmlns:A="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
<id>123</id>
<name>asdf</name>
<B1:name>asdf</B1:name>
<B1:desc>foobar</B1:desc>
</foo>
should all be able to map into a struct/enum/etc. where the QName is associated for each field
#[derive(Serialize, Deserialize)]
struct Foo {
id: String, // QName: {foourn}id
name1: String, // QName: {foourn}name <- name conflict which is resolved through QName in XML
name2: String, // QName: {barurn}name <- name conflict in rust can be resolved through
// serde(rename) or separate structs/enums or
// whatever else we decide in design
desc: String // QName: {barurn}desc
}
#[derive(Serialize, Deserialize)]
#[xml(namespace = "foourn")]
struct Foo {
id: String, // QName: {foourn}id
name: String, // QName: {foourn}name
#[serde(flatten)]
bar: Bar, // QName: {foourn}bar
}
#[derive(Serialize, Deserialize)]
#[xml(namespace = "barurn")]
struct Bar {
name: String, // QName: {barurn}name
desc: String, // QName: {barurn}desc
}
The main point is that the
<this-can-be-any-namespace:this-can-be-any-tag xmlns:this-can-be-any-namespace="some-other-ns" xmlns="foourn" xmlns:F="foourn" xmlns:B1="barurn" xmlns:B2="barurn">
<F:id>123</F:id>
<name>asdf</name>
<B1:name>asdf</B1:name>
<B2:desc>foobar</B2:desc>
</this-can-be-any-namespace:this-can-be-any-tag>
This is a convention used by most mature xml libraries so we should follow it as well. |
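To make the prefix-independence point concrete, here is a minimal sketch of how prefixed element names resolve to (namespace, local name) pairs. It uses only the standard library. `Bindings`, `resolve_element`, and `foo_bindings` are hypothetical names chosen for illustration, not quick-xml API, and the sketch covers element names only (the default namespace does not apply to attributes).

```rust
use std::collections::HashMap;

/// In-scope namespace bindings of one element: prefix -> namespace URI.
/// The empty prefix "" stands for the default namespace (`xmlns="..."`).
type Bindings<'a> = HashMap<&'a str, &'a str>;

/// Resolves an element name such as `B1:name` or `name` to `(namespace, local)`.
/// Returns `None` when the prefix has no binding.
fn resolve_element<'a>(qname: &'a str, bindings: &Bindings<'a>) -> Option<(&'a str, &'a str)> {
    let (prefix, local) = qname.split_once(':').unwrap_or(("", qname));
    match bindings.get(prefix) {
        Some(namespace) => Some((*namespace, local)),
        // An unprefixed element without a default namespace is in no namespace.
        None if prefix.is_empty() => Some(("", local)),
        None => None,
    }
}

/// Bindings declared on the `<foo>` element in the three documents above.
fn foo_bindings() -> Bindings<'static> {
    HashMap::from([
        ("", "foourn"),
        ("A", "foourn"),
        ("B1", "barurn"),
        ("B2", "barurn"),
    ])
}
```

With these bindings, `A:id` and `id` resolve to the same pair, as do `B1:name` and `B2:name`, which is why all three documents can map onto one set of QName-annotated fields.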
@JOSEPHGILBY, sorry, I missed your answer. Yes, you understood me perfectly. One note about point 3: I realize that, unfortunately, assigning namespace information to structs (rather than to their fields) will not work with |
Just throwing in a random comment, apologies if this is out of place. This proposal makes use of xml namespace prefixes. I'd like to caution that these namespace prefixes are not enforceable via xml schema, are an xml reader implementation detail, and can be completely random. In other words, the following will not work:
#[derive(Deserialize)]
#[namespace(prefixA, "https://schemas.example.com/foo/v1")]
struct Foo {
id: String,
}
<root xmlns:prefixB="https://schemas.example.com/foo/v1">
<prefixB:foo>
<id>123</id>
</prefixB:foo>
</root>
Instead, I propose the namespace prefixes be mapped/handled internally, letting users instead specify a namespace:
#[derive(Deserialize)]
#[namespace("https://schemas.example.com/foo/v1")]
struct Foo {
id: String,
} |
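One way prefixes could be "handled internally" on the serializing side is to allocate them on demand from the namespace URI. The following is only a sketch under that assumption: `PrefixAllocator` is a hypothetical type, and the `ns0`, `ns1`, ... naming scheme is invented for illustration rather than taken from quick-xml.

```rust
use std::collections::HashMap;

/// Hands out prefixes for namespace URIs in order of first use.
/// NOTE: hypothetical type; the `ns0`, `ns1`, ... scheme is an assumption.
#[derive(Default)]
struct PrefixAllocator {
    by_namespace: HashMap<String, String>,
}

impl PrefixAllocator {
    /// Returns the prefix bound to `namespace`, creating one on first use.
    fn prefix_for(&mut self, namespace: &str) -> &str {
        // Only used if `namespace` has not been seen before.
        let next = format!("ns{}", self.by_namespace.len());
        self.by_namespace
            .entry(namespace.to_string())
            .or_insert(next)
            .as_str()
    }

    /// Renders `prefix:local` for an element in `namespace`.
    fn qualified(&mut self, namespace: &str, local: &str) -> String {
        format!("{}:{}", self.prefix_for(namespace), local)
    }
}
```

The user-facing type then only names the namespace URI; the prefix that ends up in the document is an output detail, and a reader that resolves prefixes to URIs will accept whichever prefix was chosen.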
@JOSEPHGILBY Do you intend to continue this work? |
Unfortunately, I don't have the time. |
I'm interested in forwarding the design / implementing a solution to the problem of serializing namespaces in Serde.
Proposal from issue #218
The current proposal is to implement a 'namespace' attribute macro where the first argument is the prefix and the second argument is the namespace. Additionally, the proposal indicates that any child field will automatically take the defined prefix unless the field is shadowed by an additional namespace macro + serde flatten helper attribute.
Input:
output:
Potential downside
One downside I see with #218 is that the second xmlns statement will have to be 'hoisted' up into the foo tag. Additionally, multiple namespaces will always have to be used with the serde-flatten macro. The downside of using the flatten-macro is that quick-xml doesn't allow attributes to be directly defined (as currently implemented) in the parent container. As seen in the example above, the "id" field gets serialized into <id>123</id> and not id="123". Thus, it doesn't look like it would be possible for id to be an xml attribute, which could be limiting for serializing xml documents.
Alternative idea
Despite the proposal above, I want to explore a little bit on some other design ideas and get feedback on it. Playing around a little bit with attribute macros, I was able to get the following to process using syn::parse2. I haven't gotten it to fully serialize into xml quite yet, but I can indicate the expected xml.
The proposal
Create a new derive macro called QuickXml with two helper attributes, xmlns and xmlpre. In Serde, Serialize and Deserialize are derive macros that list serde as a helper attribute. Thus, anytime you see #[serde(...)], it isn't actually a separate macro which expands. Instead, the Serialize and Deserialize macros are fed the location and parameters of the #[serde(...)] statements and use them for their expansion.
In a similar vein, we could have QuickXml be the derive macro which runs, but is fed xmlns and xmlpre helper attributes to expand out namespace and prefix information for the struct. In contrast to the above proposal, there won't be a need for 'hoisting up' additional namespaces into the containing tag. Additionally, without being forced to use serde(flatten), it's possible both to have multiple namespaces for a single tag and to define attributes and elements at the same time for the tag. This would give a good amount of flexibility to the programmer for representing xml files as serde structs.
Another point is that xmlns and xmlpre would be one-to-one with the xml document, in contrast to the above proposal. That is, as the programmer is reading through an example xml output which they would like to represent in rust using quick-xml and serde, every time they see an xmlns:prefix="..." or prefix:attr-or-element-name statement in the xml document, they would put an #[xmlns:prefix="..."] and #[xmlpre:prefix]attr_or_element_name: type into the rust serde struct. The downside is of course typing 'xmlpre' for each field in the struct and struct name which has a prefix on it. However, one-to-oneness makes it easier to visually verify the correctness between an example xml document and the rust serde struct the programmer is trying to make.
Finally, the format xmlns:prefix="namespace" or xmlns="default-namespace" used inside the attribute helper might be somewhat non-standard. However, this format allows the programmer to directly copy-and-paste xmlns statements from an example xml document, which provides a usability advantage.
Example
Input:
Intended output:
Remaining questions
#[xmlpre:A]child: Child, but, Child is defined by #[xmlpre:B] struct Child, what should the outcome be?
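As a side note on the copy-and-paste format discussed in the proposal (xmlns:prefix="namespace" or xmlns="default-namespace"), here is a rough sketch of how such a declaration string could be split into a prefix and a namespace. It is plain string handling with the standard library, not a proc-macro implementation; `NamespaceDecl` and `parse_xmlns` are hypothetical names, and only double-quoted values are handled.

```rust
/// A namespace declaration as written in an XML document.
#[derive(Debug, PartialEq)]
struct NamespaceDecl<'a> {
    /// `None` for the default namespace (`xmlns="..."`).
    prefix: Option<&'a str>,
    namespace: &'a str,
}

/// Parses `xmlns:prefix="namespace"` or `xmlns="default-namespace"`.
/// NOTE: hypothetical helper; returns `None` for anything else.
fn parse_xmlns(decl: &str) -> Option<NamespaceDecl<'_>> {
    // Split at the first `=`; the namespace value itself may contain `=`.
    let (name, value) = decl.trim().split_once('=')?;
    let namespace = value.trim().strip_prefix('"')?.strip_suffix('"')?;
    let prefix = match name.trim().strip_prefix("xmlns")? {
        "" => None,
        // Anything after `xmlns` must be `:` followed by a non-empty prefix.
        rest => Some(rest.strip_prefix(':').filter(|p| !p.is_empty())?),
    };
    Some(NamespaceDecl { prefix, namespace })
}
```

A real derive macro would receive tokens rather than a string, so this only illustrates the shape of the information the xmlns helper attribute would carry.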