Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement write_serializable_content on element writer #508

Closed
wants to merge 4 commits into from

Conversation

calops
Copy link
Contributor

@calops calops commented Nov 13, 2022

Resolves #499

This introduces a new write_serializable_content method in order to serialize arbitrary types through serde within manual manipulation of an XML writer.

I feel it's a bit rough, especially how indentation is handled, but I'm not sure how to improve this without a bigger overhaul of some of the deeper stuff.

@codecov-commenter
Copy link

Codecov Report

Merging #508 (12fee25) into master (9f8ec44) will increase coverage by 0.17%.
The diff coverage is 97.18%.

@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
+ Coverage   61.33%   61.51%   +0.17%     
==========================================
  Files          33       33              
  Lines       15393    15456      +63     
==========================================
+ Hits         9441     9507      +66     
+ Misses       5952     5949       -3     
Flag Coverage Δ
unittests 61.51% <97.18%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/errors.rs 11.00% <0.00%> (-0.11%) ⬇️
src/writer.rs 92.59% <98.24%> (+3.17%) ⬆️
src/se/content.rs 97.20% <100.00%> (ø)
src/se/element.rs 97.70% <100.00%> (ø)
src/se/mod.rs 88.54% <100.00%> (+0.49%) ⬆️
src/reader/buffered_reader.rs 85.37% <0.00%> (-0.48%) ⬇️
src/de/map.rs 70.95% <0.00%> (-0.42%) ⬇️
src/lib.rs 13.17% <0.00%> (-0.08%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Mingun Mingun added enhancement serde Issues related to mapping from Rust types to XML labels Nov 13, 2022
Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, but I'm not confident of how the result of serde serialization is merged into Writer. It would be much more useful if the type name won't be produce an inner element. I also suggest to move test to the doctest comment and include to it an attribute in the struct definition that will be merged with manually added attributes.

Also pay attention on failed checks.

@@ -330,6 +333,48 @@ impl<'a, W: Write> ElementWriter<'a, W> {
.write_event(Event::End(self.start_tag.to_end()))?;
Ok(self.writer)
}

/// Serialize an arbitrary value inside the current element
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it would be valuable to add a test as doc example here. Pay particular attention to which tag name the type will be serialized with

Comment on lines +695 to +704
r#"<paired attr1="value1" attr2="value2">
<Foo>
<bar>
<baz>42</baz>
<bat>43</bat>
</bar>
<val>foo</val>
</Foo>
</paired>"#
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect

<paired attr1="value1" attr2="value2">
    <bar>
        <baz>42</baz>
        <bat>43</bat>
    </bar>
    <val>foo</val>
</paired>

here. This also means that you need to test that the attribute lists will be merged correctly. It also would be valuable to move the entire test to doctests for write_serializable_content

Copy link
Collaborator

@dralley dralley Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun I would expect it to produce the same XML it would if you were to serialize it in a different context, but I don't think that would be the case if it produced the XML you suggest?

If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I asked myself this question and I can see both use cases being valid. Maybe we should leave this as an option for that function? Although I'm not sure how to tell the serializer to not produce the <foo> tags if we don't want them.

Copy link
Collaborator

@dralley dralley Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming it works as it does in your test case and in my description, it just feels extremely niche. The ElementWriter construct only allows you to write one inner value wrapped in a pair of tags, and I can't think of many XML structures where you'd want 2 levels of nesting like that for only one possible value inner value. The construct is basically just too limiting to make it broadly useful.

Wheras with the other approach of only supporting it on the writer, you can still accomplish the same thing, and it would be more flexible - if slightly more verbose for that one strange case.

        writer
            .create_element("paired")
            .with_attribute(("attr1", "value1"))
            .with_attribute(("attr2", "value2"))
            .write_inner_content(|writer| {
                writer.write_serializable(&content1).unwrap();
                writer.write_serializable(&content2).unwrap();
                writer.write_serializable(&content3).unwrap();
            })?;

If it were to work the way @Mingun describes then none of this necessarily applies. I'm just not sure I understand why the outer tags wouldn't be included when on the deserialization side you would expect some struct fields to potentially be sourced from attributes. Shouldn't it work the same in the opposite direction?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dralley, I proceed from the following considerations:

  1. We expect that in <foo>...</foo> foo could be mapped to a field of a struct. This is a basic expectation.
  2. We also expect to get the inner content of <foo>...</foo>, so we could in the result a type which content is mapped to the ... in our XML. This is a second basic expectation
  3. Ok, so we expect that
    <...>
      <!-- a far-far inside some xml... -->
      <foo>...</foo>
    </...>
    is mapped to
    struct {
      foo: ???,
    }
  4. To understand what is ??? let's expand our example:
    ...
      <foo>
        <bar>...</bar>
      </foo>
    ...
    To access bar content we can use path foo.bar. That can be naturally expressed with nested structs:
    struct {
      foo: FooType,
    }
    struct FooType {
      bar: ???,
    }
  5. Because we want to build a mapping, which can be applied to XML with any nesting depth, obviously, bar should have a BarType similar to FooType. Our rules should be recursive.
  6. Now take a closer look at BarType and how it is presented in XML:
    <bar>...</bar>
    struct BarType {
      /*...*/,
    }
    We may notice that struct BarType is a <bar> element. Remember, that this XML can be described with following XSD:
    <xs:schema>
      <xs:complexType name="BarType">
        <xs:sequence>
           <!--...-->
        </xs:sequence>
      </xs:complexType>
    
      <xs:element name="bar" type="BarType"/>
    </xs:schema>
    Quite obvious, that rust's BarType can be mapped to XSD's BarType.
  7. The example above gives an interesting question -- where is the bar? XML responds by saying that the bar is part of the document. For example, XmlBeans implements this concept by generation a class BarDocument, which is different from class BarType that represents type of <bar> element and contains actual data.
  8. serde is very bond to json format, and json has no dedicated conception of a document -- each piece of JSON is JSON. XML is opposite -- not all pieces of XML is a document (but each document is XML piece). XmlBeans names that pieces is a Fragment. So JSON in terms of XML is a fragment.
  9. Returning to Rust, we have to admit that the Rust types will model fragments. We could introduce a special methods / types to work with documents (now we can say, that this is an object, that holds the root element name and a technical XML stuff -- namespace definitions, DTD and so on), but they will outside serde API.

I think, now it is obvious why I think, that serialization of a Rust type should not generate an element name -- because that name is not part of an xml-fragment (a type). There is, however, one significant semi-exception to this rule. enum at the top level is convenient to represent as a document. But note that even in that case the (root) element name mapped to the variant name, not the enum's type name.

I'll plan to write a doc about that in #369.


Although I must say that there are several subtle moments that may not be quite obvious. For example, serialization of sequences and enums. It is best to create tests and imagine what result we expect in each case? Will the selected mapping option be consistent?


If you were to just serialize a Foo, then I would expect <foo>...inner_contents...</foo>

This depends on whether Foo is represent the root element type or not. In the latter case, you most likely expect that the name of the element will have a name of struct field of type Foo as I shown in the beginning of that comment.

The Deserializer and Serializer are worked as I described. For convenience, the Foo type can simultaneously represent both the data type and the document type -- the root tag can be derived from the type name, this is the only place in the serializer where the type name is used. The deserializer never uses a type name.

Copy link
Collaborator

@dralley dralley Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun So let's that Foo had some fields mapped to attributes, how does that work? If the Foo element is erased and only its fields are serialized, any fields mapped to attributes will have nowhere to be serialized to.

I can appreciate that there are limitations with the serde model which may limit our ability to achieve certain outcomes, but I would rather not do something than to do it in a way that is incorrect or illogical.

Copy link
Collaborator

@Mingun Mingun Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[derive(Serialize)]
struct Foo {
    #[serde(rename = "@attribute")]
    attribute: &'static str,

    element: Bar,
    list: Vec<&'static str>,

    #[serde(rename = "$text")]
    text: &'static str,
}
let content = Foo {
    attribute: "attribute",
    element: Bar { baz: 42, bat: 43 },
    list: vec!["first element", "second element"],
    text: "text",
};

writer
    .create_element("paired")
    .with_attribute(("attr1", "value1"))
    .with_attribute(("attr2", "value2"))
    .write_serializable_content(content)
    .expect("failure");

I would expect (indentation depends on settings):

<paired attr1="value1" attr2="value2" attribute="attribute">
  <element>
    <baz>42</baz>
    <bat>43</bat>
  </element>
  <list>first element</list>
  <list>second element</list>
  text
</paired>

src/writer.rs Outdated

/// Serialize an arbitrary value inside the current element
#[cfg(feature = "serialize")]
pub fn write_serializable_content<T: Serialize>(self, content: T) -> Result<&'a mut Writer<W>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the usage example in tests I think that the name with_content could be better, because:

writer.create_element("foo")
    .with_attribute(("a", "b"))
    .with_content(&bar)

It is also better to take a reference instead of value, because Serialize is not automatically implemented for the references.

Summing up all the suggestions:

Suggested change
pub fn write_serializable_content<T: Serialize>(self, content: T) -> Result<&'a mut Writer<W>> {
pub fn with_content<T: Serialize>(self, content: &T) -> DeResult<&'a mut Writer<W>> {

where

// in crate::error::serialize
type Result<T> = std::result::Result<T, DeError>;
use crate::error::serialize::Result as DeResult;

Copy link
Collaborator

@dralley dralley Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun I'm not sure which usage you are referring to, but the scheme you suggest is inconsistent with all of the existing methods such as write_text_content(), write_empty(), etc.

https://github.com/tafia/quick-xml/blob/master/src/writer.rs#L285-L332

They're named write_ because they perform the write, they don't push into a separate buffer. with_ implies that you can chain another method onto the end, which shouldn't be the case. Attributes are different because there's already a separate buffer for storing them which you can append to.

Likewise using DeResult would be inconsistent with the other methods.

Copy link
Collaborator

@dralley dralley Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree slightly with the overall approach here, we should support this directly on the writer as opposed to on ElementWriter. If you want to use it in conjunction with ElementWriter it would be easy to do with write_inner_content()

https://github.com/tafia/quick-xml/blob/master/src/writer.rs#L285-L332

This specific use case just feels too niche to support as an independent thing as - if it worked like the other ElementWriter methods - it would be writing a single object inside a custom outer pair of tags.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

        writer
            .create_element("paired")
            .with_attribute(("attr1", "value1"))
            .with_attribute(("attr2", "value2"))
            .write_inner_content(|writer| {
                writer.write_serializable(&content)
            })?;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mingun I'm not sure which usage you are referring to, but the scheme you suggest is inconsistent with all of the existing methods such as write_text_content(), write_empty(), etc.

I mean, that we have with_attribute, so another with_* looks familiar. But you're right -- with_* methods does not end the chain. Ok, let it be write_serializable_content.

Likewise using DeResult would be inconsistent with the other methods.

That looks not so inconsistent, if we took into account that this is another serialization level which have its own error type. I do not like hiding real errors in library code -- the caller should decide how to handle them. We cannot include DeError into Error because DeError already includes Error. So the logical step is just return DeError here.

I think I disagree slightly with the overall approach here, we should support this directly on the writer as opposed to on ElementWriter.

I would see this as an auxiliary method for a frequent use case. We can have methods both on the Writer and ElementWriter.

@Mingun
Copy link
Collaborator

Mingun commented Nov 13, 2022

Also add an entry to changelog

@dralley
Copy link
Collaborator

dralley commented Jan 4, 2023

@calops Is this ready for re-review?

@calops
Copy link
Contributor Author

calops commented Jan 4, 2023

My apologies, I didn't have any time lately to finish this. There isn't much to do to address all the feedback here though, it should be ready in the next few days.

@dralley
Copy link
Collaborator

dralley commented Jun 2, 2023

@calops Do you have a moment to finish in the next few days or can I do so?

@Mingun Mingun mentioned this pull request Jun 2, 2023
9 tasks
@dralley dralley closed this Jun 4, 2023
@dralley
Copy link
Collaborator

dralley commented Jun 4, 2023

I'm refiling it here: #609

Will try to address the feedback before un-drafting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The se::to_writer function now requires std::fmt::Write instead of std::io::Write
4 participants