Further serialization options #101
Replies: 3 comments
-
in a proposal for the documentation that covers the current features, i had initially included a hack that uses the existing facilities to produce serializations where the header part is formatted without wrapping and the text part with it:

from typing import NamedTuple, Optional

from delb import (
    DefaultStringOptions,
    Document,
    FormatOptions,
    ParserOptions,
    TagNode,
    TextNode,
    is_tag_node,
)
from delb.transform import Transformation


class MixedFormattingOptions(NamedTuple):
    indentation: str = " "
    text_width: int = 79


class MixedFormatting(Transformation):
    options: MixedFormattingOptions

    def mind_the_gaps(self, node: TagNode):
        # make sure that whitespace separates adjacent tag siblings
        for child in tuple(node.iterate_children(is_tag_node))[:-1]:
            if not isinstance(child.fetch_following_sibling(), TextNode):
                child.add_following_siblings(" ")
            self.mind_the_gaps(child)

    def transform(self):
        # a newline before the header
        if isinstance((first_child := self.root.first_child), TextNode):
            first_child.content = "\n"
        else:
            self.root.insert_children(0, "\n")

        # the header part: indented, but without text wrapping
        self.mind_the_gaps(self.root[1])
        self.replace_with_formatted(
            self.root[1],
            FormatOptions(
                align_attributes=False,
                indentation=self.options.indentation,
                width=0,
            ),
        )

        # a newline between header and text part
        if isinstance((separator := self.root[2]), TextNode):
            separator.content = "\n"
        else:
            # index 2 (was 1) so that the separator lands after the header
            self.root.insert_children(2, "\n")

        # the text part: indented and wrapped
        self.replace_with_formatted(
            self.root[3],
            FormatOptions(
                align_attributes=False,
                indentation=self.options.indentation,
                width=self.options.text_width,
            ),
        )

        # a newline after the text part
        if len(self.root) > 4:
            self.root.last_child.content = "\n"
        else:
            self.root.append_children("\n")

    @staticmethod
    def replace_with_formatted(node: TagNode, format_options: FormatOptions):
        # serialize the node with the given options and parse the result back
        # without whitespace reduction, so the formatting is kept as content
        node.replace_with(
            Document(
                node.serialize(format_options=format_options),
                parser_options=ParserOptions(reduce_whitespace=False, unplugged=True),
            ).root
        )


def mixed_format(document: Document, options: Optional[MixedFormattingOptions]) -> str:
    document = document.clone()
    MixedFormatting(options)(document.root)
    default_formatting_options = DefaultStringOptions.format_options
    DefaultStringOptions.format_options = None
    try:
        result = str(document)
    finally:
        # restore the global default even if the serialization fails
        DefaultStringOptions.format_options = default_formatting_options
    # put a newline after the first ">", i.e. after the XML declaration
    return result[: (i := result.find(">") + 1)] + "\n" + result[i:]

which produces (from the input given in the doc's serialization chapter):

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns:pi="https://pirates.code/">
<head>
<title>
Über suum venire vetuit.
</title>
<identifiers>
<id>
5dcebaa4-8760-4286-be7a-6b25fd6ae0f0
</id>
<id>
15b0c526-585f-4daf-a45f-411929ffbd61
</id>
</identifiers>
<locations>
<shelf>
A0
</shelf>
<shelf>
B1c
</shelf>
</locations>
<contributors>
<contributor height="~5&quot;" pi:greeting="Ay'e!">
Ed Teach
</contributor>
</contributors>
</head>
<body>
<text>
<lb/>Liquidae voluptatis et liberae potest. Atqui
pugnantibus et <hi>contra</hi>riis studiis consiliisque
semper utens nihil <lb/>quieti videre, nihil tranquilli
potest. <lb/>Quodsi vitam omnem continent, neglegentur?
<lb/>Nam, ut sint illa vendibiliora, haec uberiora certe
sunt. Quamquam id quidem licebit iis existimare, qui
legerint. Nos autem hanc omnem quaestionem de finibus
bonorum et malorum, <lb/>id est voluptatem. Homines optimi
non intellegunt totam rationem everti, <lb/>si ita res se
habeat. Nam si ea sola voluptas esset,
<choice><sic>que</sic><corr>quae</corr></choice>
quasi delapsa <lb/>de caelo est ad quiete vivendum,
caritatem, praesertim cum omnino nulla sit causa peccandi.
Quae enim cupiditates a natura proficiscuntur, <lb/>facile
explentur sine ulla iniuria, quae autem inanes sunt, iis
parendum non est. <lb/>Nihil enim desiderabile
concupiscunt, plusque in ipsa iniuria detrimenti est quam
in.
</text>
</body>
</document>

maybe an option that toggles text wrapping is needed:
would be the same as:
practically, though, that might be overkill. in most cases it would suffice to address only nodes that are children of the root.
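
to illustrate the "children of the root only" idea independently of delb, here is a minimal sketch that uses nothing but the standard library. the function name `serialize_root_children` and the `wrap_widths` mapping are made up for this illustration and are not an existing or proposed delb API. the sketch also assumes that whitespace in the wrapped parts is not significant.

```python
import textwrap
import xml.etree.ElementTree as ET


def serialize_root_children(root: ET.Element, wrap_widths: dict[str, int]) -> str:
    """Serialize each child of the root on its own; wrap only the listed ones.

    `wrap_widths` (hypothetical option) maps a child's tag name to a text
    width; children that are not listed, or mapped to 0, stay on one line.
    Attributes of the root are ignored to keep the sketch short.
    """
    lines = [f"<{root.tag}>"]
    for child in root:
        # collapse whitespace first, assuming it is not significant here
        markup = " ".join(ET.tostring(child, encoding="unicode").split())
        width = wrap_widths.get(child.tag, 0)
        if width > 0:
            lines.extend(
                textwrap.wrap(
                    markup,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
        else:
            lines.append(markup)
    lines.append(f"</{root.tag}>")
    return "\n".join(lines)


document = ET.fromstring(
    "<document><head><title>A short title</title></head>"
    "<body><p>one two three four five six seven eight nine ten</p></body>"
    "</document>"
)
print(serialize_root_children(document, {"body": 20}))
```

here the `head` stays on a single line while the `body` is wrapped at 20 characters. a per-child mapping like this keeps the option surface small compared to a toggle on arbitrary nodes.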
-
referring to the example above, it would be nice to define tags that prefer a newline and possibly indentation before (and also after?) themselves over plain spaces, in order to achieve this:

<body>
<text>
<lb/>Liquidae voluptatis et liberae potest. Atqui
pugnantibus et <hi>contra</hi>riis studiis consiliisque
semper utens nihil
<lb/>quieti videre, nihil tranquilli potest.
<lb/>Quodsi vitam omnem continent, neglegentur?
<lb/>Nam, ut sint illa vendibiliora, haec uberiora certe
sunt. Quamquam id quidem licebit iis existimare, qui
legerint. Nos autem hanc omnem quaestionem de finibus
bonorum et malorum,
<lb/>id est voluptatem. Homines optimi […]
</text>
</body>

maybe by providing:
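
independent of whatever the option would end up looking like, the intended behaviour can be sketched with the standard library alone. `wrap_with_preferred_breaks` and its `break_before` parameter are hypothetical names for this illustration, not delb API. the sketch only recognizes attribute-less empty tags such as `<lb/>` and assumes that whitespace before them is not significant.

```python
import re
import textwrap


def wrap_with_preferred_breaks(
    markup: str, break_before: tuple[str, ...], width: int
) -> list[str]:
    """Wrap inline markup and force a line break before each listed empty tag."""
    if break_before:
        tags = "|".join(re.escape(f"<{name}/>") for name in break_before)
        # the zero-width lookahead splits right before a tag and keeps the tag
        segments = re.split(rf"(?=(?:{tags}))", markup)
    else:
        segments = [markup]

    lines = []
    for segment in segments:
        segment = " ".join(segment.split())  # collapse whitespace
        if segment:
            lines.extend(
                textwrap.wrap(
                    segment,
                    width=width,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
    return lines


markup = (
    "<lb/>Liquidae voluptatis et liberae potest. Atqui pugnantibus et "
    "<hi>contra</hi>riis studiis consiliisque semper utens nihil "
    "<lb/>quieti videre, nihil tranquilli potest. "
    "<lb/>Quodsi vitam omnem continent, neglegentur?"
)
print("\n".join(wrap_with_preferred_breaks(markup, ("lb",), 58)))
```

every `<lb/>` then starts a line of its own, and the text in between is wrapped as usual.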
-
for both of the above options, the idea of using language-agnostic querying interfaces is relevant. e.g.:
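
as a rough sketch of the general idea, and not of a concrete interface: query expressions could select the nodes that a formatting option applies to. the standard library's ElementTree with its limited XPath support stands in for the document model here, and `resolve_widths` as well as the `rules` mapping are made-up names.

```python
import xml.etree.ElementTree as ET


def resolve_widths(root: ET.Element, rules: dict[str, int]) -> dict[ET.Element, int]:
    """Map each element matched by a query expression to its text width.

    `rules` (hypothetical) maps an expression to a width, 0 meaning "do not
    wrap". Later rules override earlier ones for elements matched twice.
    """
    widths: dict[ET.Element, int] = {}
    for expression, width in rules.items():
        for element in root.findall(expression):
            widths[element] = width
    return widths


document = ET.fromstring(
    "<document><head><title>t</title></head>"
    "<body><text>x</text></body></document>"
)
widths = resolve_widths(document, {"./head": 0, "./body/text": 79})
for element, width in widths.items():
    print(element.tag, width)
```

a serializer could consult such a mapping while it walks the tree. the same shape would work for a set of tags that prefer a newline before themselves.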
-
the re-implementation of the serialization logic is not scheduled yet. until then there is a good opportunity to come up with ideas for features that further the actual fulfilment of the boomers' promises that were dubbed human-readability.
it will probably be best to implement these additional options on top of an extension interface, as each one adds overhead that should be avoided in the simpler cases.
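
what such an extension interface could look like is entirely open. the following is only a minimal sketch of the principle that the plain case pays nothing for features it does not use. `Serializer`, `Extension` and `strip_trailing_whitespace` are hypothetical names, and the "serialization" is reduced to joining lines.

```python
from typing import Callable, Iterable

# a hypothetical extension takes the serialized lines and returns adjusted ones
Extension = Callable[[list[str]], list[str]]


class Serializer:
    def __init__(self, extensions: Iterable[Extension] = ()):
        self.extensions = tuple(extensions)

    def serialize(self, lines: list[str]) -> str:
        # without registered extensions this loop does not run at all
        for extension in self.extensions:
            lines = extension(lines)
        return "\n".join(lines)


def strip_trailing_whitespace(lines: list[str]) -> list[str]:
    return [line.rstrip() for line in lines]


print(repr(Serializer().serialize(["<a> ", "</a>"])))
print(repr(Serializer([strip_trailing_whitespace]).serialize(["<a> ", "</a>"])))
```

each additional option would then be a separate callable that is only involved when it has been requested.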