Implement `write_serializable_content` on element writer #508

calops · 2022-11-13T02:42:55Z

Resolves #499

This introduces a new write_serializable_content method in order to serialize arbitrary types through serde within manual manipulation of an XML writer.

I feel it's a bit rough, especially how indentation is handled, but I'm not sure how to improve this without a bigger overhaul of some of the deeper stuff.

codecov-commenter · 2022-11-13T11:24:45Z

Codecov Report

Merging #508 (12fee25) into master (9f8ec44) will increase coverage by 0.17%.
The diff coverage is 97.18%.

@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
+ Coverage   61.33%   61.51%   +0.17%     
==========================================
  Files          33       33              
  Lines       15393    15456      +63     
==========================================
+ Hits         9441     9507      +66     
+ Misses       5952     5949       -3

Flag	Coverage Δ
unittests	`61.51% <97.18%> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/errors.rs	`11.00% <0.00%> (-0.11%)`	⬇️
src/writer.rs	`92.59% <98.24%> (+3.17%)`	⬆️
src/se/content.rs	`97.20% <100.00%> (ø)`
src/se/element.rs	`97.70% <100.00%> (ø)`
src/se/mod.rs	`88.54% <100.00%> (+0.49%)`	⬆️
src/reader/buffered_reader.rs	`85.37% <0.00%> (-0.48%)`	⬇️
src/de/map.rs	`70.95% <0.00%> (-0.42%)`	⬇️
src/lib.rs	`13.17% <0.00%> (-0.08%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Mingun

Great work, but I'm not confident of how the result of serde serialization is merged into Writer. It would be much more useful if the type name won't be produce an inner element. I also suggest to move test to the doctest comment and include to it an attribute in the struct definition that will be merged with manually added attributes.

Also pay attention on failed checks.

src/se/mod.rs

Mingun · 2022-11-13T12:40:57Z

src/writer.rs

@@ -330,6 +333,48 @@ impl<'a, W: Write> ElementWriter<'a, W> {
            .write_event(Event::End(self.start_tag.to_end()))?;
        Ok(self.writer)
    }
+
+    /// Serialize an arbitrary value inside the current element


I think, it would be valuable to add a test as doc example here. Pay particular attention to which tag name the type will be serialized with

src/writer.rs

Mingun · 2022-11-13T12:50:53Z

src/writer.rs

+            r#"<paired attr1="value1" attr2="value2">
+    <Foo>
+        <bar>
+            <baz>42</baz>
+            <bat>43</bat>
+        </bar>
+        <val>foo</val>
+    </Foo>
+</paired>"#
+        );


I would expect

<paired attr1="value1" attr2="value2"> <bar> <baz>42</baz> <bat>43</bat> </bar> <val>foo</val> </paired>

here. This also means that you need to test that the attribute lists will be merged correctly. It also would be valuable to move the entire test to doctests for write_serializable_content

@Mingun I would expect it to produce the same XML it would if you were to serialize it in a different context, but I don't think that would be the case if it produced the XML you suggest?

If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

Right, I asked myself this question and I can see both use cases being valid. Maybe we should leave this as an option for that function? Although I'm not sure how to tell the serializer to not produce the <foo> tags if we don't want them.

Assuming it works as it does in your test case and in my description, it just feels extremely niche. The ElementWriter construct only allows you to write one inner value wrapped in a pair of tags, and I can't think of many XML structures where you'd want 2 levels of nesting like that for only one possible value inner value. The construct is basically just too limiting to make it broadly useful.

Wheras with the other approach of only supporting it on the writer, you can still accomplish the same thing, and it would be more flexible - if slightly more verbose for that one strange case.

writer .create_element("paired") .with_attribute(("attr1", "value1")) .with_attribute(("attr2", "value2")) .write_inner_content(|writer| { writer.write_serializable(&content1).unwrap(); writer.write_serializable(&content2).unwrap(); writer.write_serializable(&content3).unwrap(); })?;

If it were to work the way @Mingun describes then none of this necessarily applies. I'm just not sure I understand why the outer tags wouldn't be included when on the deserialization side you would expect some struct fields to potentially be sourced from attributes. Shouldn't it work the same in the opposite direction?

@dralley, I proceed from the following considerations:

We expect that in <foo>...</foo> foo could be mapped to a field of a struct. This is a basic expectation.

We also expect to get the inner content of <foo>...</foo>, so we could in the result a type which content is mapped to the ... in our XML. This is a second basic expectation

Ok, so we expect that
<...>  <foo>...</foo> </...>
is mapped to
struct { foo: ???, }

To understand what is ??? let's expand our example:
... <foo> <bar>...</bar> </foo> ...
To access bar content we can use path foo.bar. That can be naturally expressed with nested structs:
struct { foo: FooType, } struct FooType { bar: ???, }

Because we want to build a mapping, which can be applied to XML with any nesting depth, obviously, bar should have a BarType similar to FooType. Our rules should be recursive.

Now take a closer look at BarType and how it is presented in XML:
<bar>...</bar>

struct BarType { /*...*/, }
We may notice that struct BarType is a <bar> element. Remember, that this XML can be described with following XSD:
<xs:schema> <xs:complexType name="BarType"> <xs:sequence>  </xs:sequence> </xs:complexType> <xs:element name="bar" type="BarType"/> </xs:schema>
Quite obvious, that rust's BarType can be mapped to XSD's BarType.

The example above gives an interesting question -- where is the bar? XML responds by saying that the bar is part of the document. For example, XmlBeans implements this concept by generation a class BarDocument, which is different from class BarType that represents type of <bar> element and contains actual data.

serde is very bond to json format, and json has no dedicated conception of a document -- each piece of JSON is JSON. XML is opposite -- not all pieces of XML is a document (but each document is XML piece). XmlBeans names that pieces is a Fragment. So JSON in terms of XML is a fragment.

Returning to Rust, we have to admit that the Rust types will model fragments. We could introduce a special methods / types to work with documents (now we can say, that this is an object, that holds the root element name and a technical XML stuff -- namespace definitions, DTD and so on), but they will outside serde API.

I think, now it is obvious why I think, that serialization of a Rust type should not generate an element name -- because that name is not part of an xml-fragment (a type). There is, however, one significant semi-exception to this rule. enum at the top level is convenient to represent as a document. But note that even in that case the (root) element name mapped to the variant name, not the enum's type name.

I'll plan to write a doc about that in #369.

Although I must say that there are several subtle moments that may not be quite obvious. For example, serialization of sequences and enums. It is best to create tests and imagine what result we expect in each case? Will the selected mapping option be consistent?

If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

This depends on whether Foo is represent the root element type or not. In the latter case, you most likely expect that the name of the element will have a name of struct field of type Foo as I shown in the beginning of that comment.

The Deserializer and Serializer are worked as I described. For convenience, the Foo type can simultaneously represent both the data type and the document type -- the root tag can be derived from the type name, this is the only place in the serializer where the type name is used. The deserializer never uses a type name.

@Mingun So let's that Foo had some fields mapped to attributes, how does that work? If the Foo element is erased and only its fields are serialized, any fields mapped to attributes will have nowhere to be serialized to.

I can appreciate that there are limitations with the serde model which may limit our ability to achieve certain outcomes, but I would rather not do something than to do it in a way that is incorrect or illogical.

#[derive(Serialize)] struct Foo { #[serde(rename = "@attribute")] attribute: &'static str, element: Bar, list: Vec<&'static str>, #[serde(rename = "$text")] text: &'static str, } let content = Foo { attribute: "attribute", element: Bar { baz: 42, bat: 43 }, list: vec!["first element", "second element"], text: "text", }; writer .create_element("paired") .with_attribute(("attr1", "value1")) .with_attribute(("attr2", "value2")) .write_serializable_content(content) .expect("failure");

I would expect (indentation depends on settings):

<paired attr1="value1" attr2="value2" attribute="attribute"> <element> <baz>42</baz> <bat>43</bat> </element> <list>first element</list> <list>second element</list> text </paired>

Mingun · 2022-11-13T12:54:58Z

src/writer.rs

+
+    /// Serialize an arbitrary value inside the current element
+    #[cfg(feature = "serialize")]
+    pub fn write_serializable_content<T: Serialize>(self, content: T) -> Result<&'a mut Writer<W>> {


According to the usage example in tests I think that the name with_content could be better, because:

writer.create_element("foo") .with_attribute(("a", "b")) .with_content(&bar)

It is also better to take a reference instead of value, because Serialize is not automatically implemented for the references.

Summing up all the suggestions:

Suggested change

pub fn write_serializable_content<T: Serialize>(self, content: T) -> Result<&'a mut Writer<W>> {

pub fn with_content<T: Serialize>(self, content: &T) -> DeResult<&'a mut Writer<W>> {

where

// in crate::error::serialize type Result<T> = std::result::Result<T, DeError>; use crate::error::serialize::Result as DeResult;

@Mingun I'm not sure which usage you are referring to, but the scheme you suggest is inconsistent with all of the existing methods such as write_text_content(), write_empty(), etc.

https://github.com/tafia/quick-xml/blob/master/src/writer.rs#L285-L332

They're named write_ because they perform the write, they don't push into a separate buffer. with_ implies that you can chain another method onto the end, which shouldn't be the case. Attributes are different because there's already a separate buffer for storing them which you can append to.

Likewise using DeResult would be inconsistent with the other methods.

I think I disagree slightly with the overall approach here, we should support this directly on the writer as opposed to on ElementWriter. If you want to use it in conjunction with ElementWriter it would be easy to do with write_inner_content()

https://github.com/tafia/quick-xml/blob/master/src/writer.rs#L285-L332

This specific use case just feels too niche to support as an independent thing as - if it worked like the other ElementWriter methods - it would be writing a single object inside a custom outer pair of tags.

e.g.

writer .create_element("paired") .with_attribute(("attr1", "value1")) .with_attribute(("attr2", "value2")) .write_inner_content(|writer| { writer.write_serializable(&content) })?;

@Mingun I'm not sure which usage you are referring to, but the scheme you suggest is inconsistent with all of the existing methods such as write_text_content(), write_empty(), etc.

I mean, that we have with_attribute, so another with_* looks familiar. But you're right -- with_* methods does not end the chain. Ok, let it be write_serializable_content.

Likewise using DeResult would be inconsistent with the other methods.

That looks not so inconsistent, if we took into account that this is another serialization level which have its own error type. I do not like hiding real errors in library code -- the caller should decide how to handle them. We cannot include DeError into Error because DeError already includes Error. So the logical step is just return DeError here.

I think I disagree slightly with the overall approach here, we should support this directly on the writer as opposed to on ElementWriter.

I would see this as an auxiliary method for a frequent use case. We can have methods both on the Writer and ElementWriter.

Mingun · 2022-11-13T18:42:32Z

Also add an entry to changelog

Co-authored-by: Mingun <[email protected]>

dralley · 2023-01-04T15:49:13Z

@calops Is this ready for re-review?

calops · 2023-01-04T16:18:09Z

My apologies, I didn't have any time lately to finish this. There isn't much to do to address all the feedback here though, it should be ready in the next few days.

dralley · 2023-06-02T14:18:55Z

@calops Do you have a moment to finish in the next few days or can I do so?

dralley · 2023-06-04T14:36:48Z

I'm refiling it here: #609

Will try to address the feedback before un-drafting

implement write_serializable_content on element writer

12fee25

Mingun added enhancement serde Issues related to mapping from Rust types to XML labels Nov 13, 2022

Mingun requested changes Nov 13, 2022

View reviewed changes

Rémi Labeyrie and others added 3 commits November 15, 2022 05:26

change return type of write_serializable_content

8d833d8

Update src/writer.rs

0178623

Co-authored-by: Mingun <[email protected]>

move struct definitions into test function

541aae3

Mingun mentioned this pull request Jun 2, 2023

Release 0.29.0 #607

Closed

9 tasks

dralley closed this Jun 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `write_serializable_content` on element writer #508

Implement `write_serializable_content` on element writer #508

calops commented Nov 13, 2022 •

edited

Loading

codecov-commenter commented Nov 13, 2022

Mingun left a comment

Mingun Nov 13, 2022

Mingun Nov 13, 2022

dralley Nov 15, 2022 •

edited

Loading

calops Nov 15, 2022

dralley Nov 15, 2022 •

edited

Loading

Mingun Nov 15, 2022

dralley Jun 6, 2023 •

edited

Loading

Mingun Jun 7, 2023 •

edited

Loading

Mingun Nov 13, 2022

dralley Nov 14, 2022 •

edited

Loading

dralley Nov 14, 2022 •

edited

Loading

dralley Nov 14, 2022

Mingun Nov 14, 2022

Mingun commented Nov 13, 2022

dralley commented Jan 4, 2023

calops commented Jan 4, 2023

dralley commented Jun 2, 2023

dralley commented Jun 4, 2023

	pub fn write_serializable_content<T: Serialize>(self, content: T) -> Result<&'a mut Writer<W>> {
	pub fn with_content<T: Serialize>(self, content: &T) -> DeResult<&'a mut Writer<W>> {

Implement write_serializable_content on element writer #508

Implement write_serializable_content on element writer #508

Conversation

calops commented Nov 13, 2022 • edited Loading

codecov-commenter commented Nov 13, 2022

Codecov Report

Mingun left a comment

Choose a reason for hiding this comment

Mingun Nov 13, 2022

Choose a reason for hiding this comment

Mingun Nov 13, 2022

Choose a reason for hiding this comment

dralley Nov 15, 2022 • edited Loading

Choose a reason for hiding this comment

calops Nov 15, 2022

Choose a reason for hiding this comment

dralley Nov 15, 2022 • edited Loading

Choose a reason for hiding this comment

Mingun Nov 15, 2022

Choose a reason for hiding this comment

dralley Jun 6, 2023 • edited Loading

Choose a reason for hiding this comment

Mingun Jun 7, 2023 • edited Loading

Choose a reason for hiding this comment

Mingun Nov 13, 2022

Choose a reason for hiding this comment

dralley Nov 14, 2022 • edited Loading

Choose a reason for hiding this comment

dralley Nov 14, 2022 • edited Loading

Choose a reason for hiding this comment

dralley Nov 14, 2022

Choose a reason for hiding this comment

Mingun Nov 14, 2022

Choose a reason for hiding this comment

Mingun commented Nov 13, 2022

dralley commented Jan 4, 2023

calops commented Jan 4, 2023

dralley commented Jun 2, 2023

dralley commented Jun 4, 2023

Implement `write_serializable_content` on element writer #508

Implement `write_serializable_content` on element writer #508

calops commented Nov 13, 2022 •

edited

Loading

dralley Nov 15, 2022 •

edited

Loading

dralley Nov 15, 2022 •

edited

Loading

dralley Jun 6, 2023 •

edited

Loading

Mingun Jun 7, 2023 •

edited

Loading

dralley Nov 14, 2022 •

edited

Loading

dralley Nov 14, 2022 •

edited

Loading