Help deserialize mixed tags and string in body $value (html text formatting) #257
Whoever picks this up, consider starting from #511 |
Has anybody found a workaround for this? I am having the same issue. |
You can close this. I don't know when it was fixed, but the original example works now with minor edits:
#[derive(Debug, Deserialize, PartialEq)]
struct DivDefinition {
    #[serde(rename = "@style")]
    style: String,
    #[serde(rename = "$value")]
    definition: Vec<MyEnum>,
}
#[derive(Debug, Deserialize, PartialEq)]
enum MyEnum {
    b(String),
    #[serde(rename = "$text")]
    String,
    i(String),
} |
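Once a list of such variants has been populated, consuming it is ordinary pattern matching, and the bold/italic information stays available. A minimal std-only sketch with hand-built values follows; the `Inline` enum and `to_markdown` are hypothetical stand-ins that mirror the shape of `MyEnum`, and no serde or quick-xml is involved:

```rust
/// Hypothetical stand-in for the deserialized mixed content:
/// each piece is either formatted text or plain text.
#[derive(Debug, PartialEq)]
enum Inline {
    Bold(String),
    Italic(String),
    Text(String),
}

/// Render the pieces in document order, keeping the formatting
/// that a plain list of strings would lose.
fn to_markdown(parts: &[Inline]) -> String {
    parts
        .iter()
        .map(|part| match part {
            Inline::Bold(s) => format!("**{}**", s),
            Inline::Italic(s) => format!("*{}*", s),
            Inline::Text(s) => s.clone(),
        })
        .collect()
}
```

Because every variant carries its own text, the order of the pieces is preserved and a caller can decide per variant how to present it.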
Thoughts on this idea? enricozb@7b4b3f8 Specifically, I'm adding a new special field name (`$raw` in the example below). It lets you do stuff like this:
const xml: &str = r#"
<who-cares>
<foo property="value">
test
<bar><bii/><int>1</int></bar>
test
<baz/>
</foo>
</who-cares>
"#;
#[derive(Deserialize, Debug)]
struct Root {
    #[serde(rename = "$raw")]
    value: String,
}
let root = quick_xml::de::from_str::<Root>(&xml).unwrap();
println!("parsed: {root:?}");
This prints
One of the problems with this approach is that it doesn't save exactly what was in the XML file. Saving it exactly would be ideal because we could likely avoid any allocations, like
Another issue is that empty tags
It's possible my initial idea could be fixed up to temporarily disable trimming in the reader during |
Deserialization of
The deserializer then returns data from its own buffer or directly from the input string, depending on what type is deserialized ( |
Got it. I saw that private newtype name, but wasn't sure why it mattered. I see now that the json deserializer looks for this tag. I'll take a stab at this. |
Additionally, I'm not sure if we should capture the surrounding tags or not. What should this print:
struct AnyName {
    root: RawValue,
}
const xml: &str = "
<root>
<some/><inner/><tags/>
</root>
";
let x: AnyName = from_str(xml)?;
println!("{}", x.root);
Should this print
or
|
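To make the question concrete: the two candidates are presumably the element including its own `<root>` tags, or only the markup between them. A minimal std-only sketch that produces both as borrowed slices of the input, so nothing is allocated for the result. `outer_and_inner` is a hypothetical helper, not a quick-xml function, and it is deliberately naive: it assumes the start tag has no attributes and that no element of the same name is nested inside.

```rust
/// Return `(outer, inner)` for the first `<name>...</name>` element in `xml`.
/// `outer` includes the surrounding tags, `inner` is only what lies between them.
/// Both are slices of `xml`, so whitespace is kept exactly as written.
fn outer_and_inner<'a>(xml: &'a str, name: &str) -> Option<(&'a str, &'a str)> {
    let open = format!("<{}>", name);
    let close = format!("</{}>", name);

    let start = xml.find(&open)?;
    let inner_start = start + open.len();
    // Search for the end tag only after the start tag.
    let inner_end = inner_start + xml[inner_start..].find(&close)?;
    let outer_end = inner_end + close.len();

    Some((&xml[start..outer_end], &xml[inner_start..inner_end]))
}
```

Either slice could back a raw-value type; the choice only changes whether a consumer sees the element name and has to strip it again.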
Hi, I'm trying to track down a way to deserialize unknown/arbitrary data under a specific tag and found my way here. Is this currently possible in any form? I have something like this:
<root>
  <someTag> <!-- I am only aware of this tag -->
    <arbitraryTag1>
      <arbitraryTag2>...stuff...</arbitraryTag2>
      <anotherArbitraryTag>foo</anotherArbitraryTag>
    </arbitraryTag1>
  </someTag>
</root>
I simply need everything under |
If |
@Mingun Thanks for the quick reply! I updated my example, it was missing some data. Basically, under |
So in your example you expect
?
// type of `someTag` field
struct SomeTagType {
    // filled with "arbitraryTag1"
    name: String,
    // filled with
    // - ("arbitraryTag2", "...stuff...")
    // - ("anotherArbitraryTag", "foo")
    // - ...
    fields: HashMap<String, String>,
}
? Both are impossible right now. The first because we cannot capture markup to the |
@Mingun thanks, the 2nd example is what I'm after. Can you think of any workarounds? |
@Mingun Apologies for the "bump", I'm trying to determine where this stands exactly. #778 mentions something works, but I can't find it. Ideally, I'm after the ability to capture arbitrary nested XML, similar to what a
I'm not 100% clear if this is the correct ticket, #778, or something else. Thanks again! |
In the initial post of #383, @alex-semov gave code that looks like what you need. Try experimenting with it. If you don't have to extract the attributes from |
Unfortunately, in our case we need to extract/convert arbitrary XML into a JSON representation. Something like:
<xml>
  <foo><bar>123</bar></foo>
  <foobar someattr="thing"/>
  <bazfoo anotherattr="stuff">bazzle</bazfoo>
</xml>
to
{
  "foo": {
    "bar": 123
  },
  "foobar": {
    "@someattr": "thing"
  },
  "bazfoo": {
    "@anotherattr": "stuff",
    "@value": "bazzle"
  }
}
The JSON structure is just an example; we just need some way to do it. |
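A rough sketch of that mapping, assuming the XML has already been parsed into some owned tree. Everything here is hypothetical and std-only: `Element`, `elem`, `to_json`, `scalar`, and `quote` are illustration helpers, not quick-xml types, and the tree is built by hand rather than deserialized. Known gaps: repeated child names would produce duplicate keys, and only `"`, `\`, and newline are escaped.

```rust
/// Hypothetical owned XML element (not a quick-xml type).
struct Element {
    name: String,
    attrs: Vec<(String, String)>,
    text: Option<String>,
    children: Vec<Element>,
}

/// Convenience constructor so that trees can be written inline.
fn elem(name: &str, attrs: &[(&str, &str)], text: Option<&str>, children: Vec<Element>) -> Element {
    Element {
        name: name.to_string(),
        attrs: attrs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
        text: text.map(str::to_string),
        children,
    }
}

/// Quote a string as a JSON string literal (minimal escaping only).
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            _ => out.push(ch),
        }
    }
    out.push('"');
    out
}

/// Emit integers unquoted and everything else as a JSON string.
fn scalar(s: &str) -> String {
    match s.trim().parse::<i64>() {
        Ok(n) => n.to_string(),
        Err(_) => quote(s),
    }
}

/// Render the *content* of `e`: a bare scalar for text-only elements,
/// otherwise an object with "@attr" keys, "@value" for text, and child names.
fn to_json(e: &Element) -> String {
    if e.attrs.is_empty() && e.children.is_empty() {
        return match &e.text {
            Some(t) => scalar(t),
            None => "null".to_string(),
        };
    }
    let mut members = Vec::new();
    for (k, v) in &e.attrs {
        members.push(format!("{}:{}", quote(&format!("@{}", k)), scalar(v)));
    }
    if let Some(t) = &e.text {
        members.push(format!("{}:{}", quote("@value"), scalar(t)));
    }
    for child in &e.children {
        members.push(format!("{}:{}", quote(&child.name), to_json(child)));
    }
    format!("{{{}}}", members.join(","))
}
```

Calling `to_json` on the root renders the root's content, so the `<xml>` wrapper itself does not appear as a key, matching the example above.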
The best way is to represent some kind of |
I have already noted in some related issues that I have a very WIP dom branch in my repository. Feel free to finish it; I do not think that I will work on it in the near future. |
Do we have any recommended workaround for this issue? From the discussion I conclude that the |
@Mingun @enricozb sorry to bother, but I'm very interested in using
fn raw_de<'de, D>(deserializer: D) -> Result<String, D::Error>
where
    D: Deserializer<'de>,
{
    // RawElement is the type being wished for here, not an existing quick-xml API
    let element: quick_xml::de::RawElement = Deserialize::deserialize(deserializer)?;
    Ok(element.read_text())
}
It would be of great value and flexibility (is this what you refer to as a DOM?) for parsing an XML document like
<title>
  text <sub>1-<i>y</i></sub>
</title>
Currently I have to mimic an HTML-like representation by using many structs and a custom deserializer like:
#[derive(Deserialize, Debug)]
#[serde(rename_all(deserialize = "snake_case"))]
enum ItalicBoldString {
    Sup,
    Sub,
    I(String),
    B(ItalicBoldStringWrapper),
    #[serde(rename = "$text")]
    String(String),
}
#[derive(Deserialize, Debug)]
struct ItalicBoldStringWrapper {
    #[serde(rename(deserialize = "$value"), default)]
    field: Vec<ItalicBoldString>,
}
#[derive(Deserialize, Debug)]
#[serde(rename_all(deserialize = "snake_case"))]
enum CouldBeString {
    Sup(ItalicBoldStringWrapper),
    Sub(ItalicBoldStringWrapper),
    I(ItalicBoldStringWrapper),
    B(ItalicBoldStringWrapper),
    Math,
    #[serde(rename = "$text")]
    String(String),
}
#[derive(Deserialize, Debug)]
struct SegmentedString {
    #[serde(rename(deserialize = "$value"), default)]
    field: Vec<CouldBeString>,
}
fn traverse_ibs_wrapper(ibs: &ItalicBoldStringWrapper) -> String {
    ibs.field
        .iter()
        .map(|e| match &e {
            ItalicBoldString::I(str) => str.clone(),
            ItalicBoldString::B(str) => format!("{:?}", str),
            ItalicBoldString::String(str) => str.clone(),
            _ => "".to_string(),
        })
        .collect()
}
    // Ok(SegmentedString::deserialize(deserializer)?.field.join(" "))
    Ok(SegmentedString::deserialize(deserializer)?
        .field
        .iter()
        .map(|e| match e {
            CouldBeString::I(str) => traverse_ibs_wrapper(&str),
            CouldBeString::B(str) => traverse_ibs_wrapper(&str),
            CouldBeString::Sup(str) => traverse_ibs_wrapper(&str),
            CouldBeString::Sub(str) => traverse_ibs_wrapper(&str),
            CouldBeString::String(str) => str.clone(),
            &CouldBeString::Math => "".to_string(),
        })
        .map(|e| e.trim().to_string())
        .collect::<Vec<_>>()
        .join(" "))
}
in order to extract raw text or plain text from an XML node, which is very tedious and ad hoc (it can only be applied to a small subset of all possible combinations of the HTML tree). If I use something like |
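If only the plain text of such a node is needed, one workaround that avoids modelling every tag combination is to strip the markup from the raw fragment directly. The following is a minimal std-only sketch, not a quick-xml feature; `plain_text` is a hypothetical helper, and it assumes the fragment is already available as a string. It is naive on purpose: comments, CDATA sections, entity references, and `>` inside attribute values are not handled.

```rust
/// Collect the character data of an XML fragment, dropping every tag,
/// then collapse whitespace so indentation does not leak into the result.
fn plain_text(fragment: &str) -> String {
    let mut out = String::new();
    let mut in_tag = false;
    for ch in fragment.chars() {
        match ch {
            '<' => in_tag = true,            // a tag starts: stop copying
            '>' if in_tag => in_tag = false, // the tag ends: resume copying
            _ if !in_tag => out.push(ch),    // character data outside tags
            _ => {}                          // anything inside a tag is skipped
        }
    }
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

For the `<title>` fragment quoted in this comment, the function keeps `text` and `1-y` and drops `<sub>` and `<i>`, regardless of how deeply the formatting tags are nested.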
No, currently there is no way to do that. If you want this feature, consider contributing to the implementation. I think that something like what the
That is #819, which I accidentally discovered a couple of hours ago. Use a struct variant with |
I'm trying to deserialize some dictionary definitions and came across this one, which contains multiple tags mixed with normal strings (HTML text formatting).
I looked around in the serde-xml-rs tests and tried this solution, which seems to be close, but it doesn't quite work.
The error I'm getting is:
I can make it work for now by not using MyEnum and just using definition: Vec<String>, but then I wouldn't know which text is bold and which is italic.
How can I properly deserialize this?