Evaluation of template macros #674

zslayton · 2023-10-18T21:03:04Z

Adds:

* a TemplateExpansion macro kind that can evaluate invocations of a user-specified macro definition.
* a TemplateCompiler that reads a macro definition expression and emits a TemplateMacro.

Fixes #658.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


codecov · 2023-10-18T21:05:48Z

Codecov Report

Attention: 612 lines in your changes are missing coverage. Please review.

Files	Coverage Δ
src/binary/non_blocking/raw_binary_reader.rs	`80.15% <ø> (ø)`
src/element/mod.rs	`80.20% <ø> (-0.70%)`	⬇️
src/ion_reader.rs	`100.00% <ø> (ø)`
src/lazy/binary/raw/value.rs	`89.26% <100.00%> (-0.31%)`	⬇️
src/lazy/encoding.rs	`45.45% <100.00%> (ø)`
src/lazy/raw_stream_item.rs	`48.00% <ø> (ø)`
src/lazy/raw_value_ref.rs	`72.98% <ø> (ø)`
src/lazy/reader.rs	`77.16% <100.00%> (+1.53%)`	⬆️
src/lazy/struct.rs	`70.00% <100.00%> (ø)`
src/lazy/system_stream_item.rs	`0.00% <ø> (ø)`
... and 27 more

... and 4 files with indirect coverage changes


zslayton

🗺️ PR tour

zslayton · 2023-10-20T19:19:35Z

src/binary/non_blocking/raw_binary_reader.rs

🗺️ These first few changes address unnecessarily explicit doc link paths, which are flagged by a recently introduced clippy lint.

zslayton · 2023-10-20T19:22:27Z

src/lazy/any_encoding.rs

@@ -15,7 +15,7 @@ use crate::lazy::decoder::{
    LazyRawValueExpr, RawFieldExpr, RawValueExpr,
 };
 use crate::lazy::encoding::{BinaryEncoding_1_0, TextEncoding_1_0, TextEncoding_1_1};
-use crate::lazy::expanded::macro_evaluator::MacroInvocation;
+use crate::lazy::expanded::macro_evaluator::RawEExpression;


🗺️ As you'll see later, this PR required a firmer division between E-expressions (`(:foo)`) and template macro invocations (`(foo)` in the body of a template). For now, I'll just point out that several types previously named `*MacroInvocation` are now named `*EExpression`, to highlight that they are syntactic elements from the data stream.
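Here is a minimal sketch of the distinction. It uses hypothetical names (`Invocation`, `EExpression`, `TemplateInvocation`) rather than ion-rust's actual types. Keeping the two kinds of invocation as separate variants lets code ask whether an invocation was read from the data stream.

```rust
/// Hypothetical model of where a macro invocation came from (not ion-rust's types).
#[derive(Debug, Clone, PartialEq)]
enum Invocation {
    /// `(:foo ...)`: an E-expression, a syntactic element of the data stream.
    EExpression { name: String, args: Vec<String> },
    /// `(foo ...)`: an invocation that appears in the body of a template.
    TemplateInvocation { name: String, args: Vec<String> },
}

impl Invocation {
    /// The name of the macro being invoked, regardless of where the invocation came from.
    fn name(&self) -> &str {
        match self {
            Invocation::EExpression { name, .. } | Invocation::TemplateInvocation { name, .. } => {
                name.as_str()
            }
        }
    }

    /// The number of arguments supplied to the invocation.
    fn arg_count(&self) -> usize {
        match self {
            Invocation::EExpression { args, .. } | Invocation::TemplateInvocation { args, .. } => {
                args.len()
            }
        }
    }

    /// Only E-expressions are read from the data stream.
    fn is_from_data_stream(&self) -> bool {
        matches!(self, Invocation::EExpression { .. })
    }
}
```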

zslayton · 2023-10-20T19:24:07Z

src/lazy/any_encoding.rs

+#[derive(Debug, Clone, Copy)]
 pub struct AnyEncoding;


🗺️ Implementations of LazyDecoder are now Copy + 'static (in addition to their other prerequisite traits) because otherwise types that are generic over <D: LazyDecoder> would have to have lots of boilerplate where clauses.
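The sketch below shows why the supertrait bounds remove that boilerplate. `Decoder`, `TextDecoder`, `Reader`, and `duplicate` are hypothetical stand-ins, not ion-rust items. Because `Decoder` requires `Copy`, a generic function that copies a `Reader<D>` needs only a `D: Decoder` bound. Without the supertrait, it would also need `where D: Copy`.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for `LazyDecoder`: every implementor must already be
/// 'static + Sized + Debug + Clone + Copy.
trait Decoder: 'static + Sized + Debug + Clone + Copy {
    fn name(&self) -> &'static str;
}

/// A zero-sized marker type, similar in spirit to `AnyEncoding`.
#[derive(Debug, Clone, Copy)]
struct TextDecoder;

impl Decoder for TextDecoder {
    fn name(&self) -> &'static str {
        "text"
    }
}

/// `#[derive]` adds `D: Debug`, `D: Clone`, and `D: Copy` bounds to the generated
/// impls. `D: Decoder` already implies all three, so `Reader<D>` gets these traits
/// for every decoder without hand-written `where` clauses.
#[derive(Debug, Clone, Copy)]
struct Reader<D: Decoder> {
    decoder: D,
    position: usize,
}

/// Compiles with only a `D: Decoder` bound because `Decoder` implies `Copy`.
fn duplicate<D: Decoder>(reader: Reader<D>) -> (Reader<D>, Reader<D>) {
    (reader, reader)
}
```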

zslayton · 2023-10-20T19:25:25Z

src/lazy/any_encoding.rs

@@ -48,61 +48,59 @@ impl<'data> LazyDecoder<'data> for AnyEncoding {
    type List = LazyRawAnyList<'data>;
    type Struct = LazyRawAnyStruct<'data>;
    type AnnotationsIterator = RawAnyAnnotationsIterator<'data>;
-    type MacroInvocation = LazyRawAnyMacroInvocation<'data>;
+    type EExpression = LazyRawAnyEExpression<'data>;


🗺️ All of the remaining changes in this file came from renaming *MacroInvocation to *EExpression.

zslayton · 2023-10-20T19:26:43Z

src/lazy/decoder.rs

+// However, many types are generic over some `D: LazyDecoder<'_>`, and having this trait
+// extend 'static, Sized, Debug, Clone and Copy means that those types can #[derive(...)]
+// those traits themselves without boilerplate `where` clauses.
+pub trait LazyDecoder<'data>: 'static + Sized + Debug + Clone + Copy {


🗺️ This is the only change in this file not caused by the rename.

zslayton · 2023-10-20T20:27:44Z

src/lazy/expanded/mod.rs

-        (
-            // A collection of bump-allocated annotation strings
-            BumpVec<'top, &'top str>,
-            ExpandedValueRef<'top, 'data, D>,
-        ),
+        // Constructed data stored in the bump allocator. Holding references instead of the data
+        // itself allows this type (and those that contain it) to impl `Copy`.
+        &'top [&'top str],
+        &'top ExpandedValueRef<'top, 'data, D>,


🗺️ These values all live in the bump allocator. Switching them to references allowed LazyExpandedValue to implement Copy.
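Here is a minimal sketch of the same idea, with hypothetical names. A struct that holds only shared references into storage that outlives it can derive `Copy`. In the test, a local `Vec` and `i64` stand in for the bump allocator. A struct that owned a `Vec` of annotations could not derive `Copy`.

```rust
/// Hypothetical value type that refers to data stored elsewhere (in the PR, the
/// bump allocator). Shared references are `Copy`, so the whole struct can be too.
#[derive(Debug, Clone, Copy)]
struct ExpandedValue<'top> {
    annotations: &'top [&'top str],
    value: &'top i64,
}

/// Takes the value by value; because it is `Copy`, the caller keeps its own copy.
fn annotation_count(value: ExpandedValue<'_>) -> usize {
    value.annotations.len()
}
```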


zslayton · 2023-10-20T20:29:51Z

src/lazy/expanded/stack.rs

🗺️ Now that the MacroEvaluator is not generic over Vec/BumpVec, this isn't necessary.

zslayton · 2023-10-20T20:30:41Z

src/lazy/expanded/tdl_macro.rs

🗺️ Everything living in this file became obsolete as a result of the refactor.

zslayton · 2023-10-20T20:36:08Z

src/lazy/system_reader.rs

 pub struct LazySystemReader<'data, D: LazyDecoder<'data>> {
-    // TODO: Remove this RefCell when the Polonius borrow checker is available.
-    //       See: https://github.com/rust-lang/rust/issues/70255
-    expanding_reader: RefCell<LazyExpandingReader<'data, D>>,
-    // TODO: Make the symbol and macro tables traits on `D` such that they can be configured
-    //       statically. Then 1.0 types can use `Never` for the macro table.
-    symbol_table: SymbolTable,
-    macro_table: MacroTable,
-    allocator: BumpAllocator,
-    pending_lst: PendingLst,
+    pub(crate) expanding_reader: LazyExpandingReader<'data, D>,


🗺️ The LazyExpandingReader can only apply changes to the EncodingContext between top-level expressions, which made it much easier for it to own all of these resources. Once they had all moved to the LazyExpandingReader, it became clear that the LazySystemReader had nothing left to do. I'll merge the two types in a later PR; this is tracked in #675.
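Here is a sketch of that ownership arrangement. It assumes a simplified reader whose only context is a list of symbols, and all names are hypothetical. Changes found while reading a top-level value are queued. The single method that mutates the context runs between top-level expressions.

```rust
/// Hypothetical reader that owns its encoding context. Changes discovered while a
/// top-level value is being read are queued, and they are applied only between
/// top-level expressions, so the context never changes while a value is in use.
#[derive(Debug, Default)]
struct ExpandingReader {
    symbol_table: Vec<String>,
    pending_symbols: Vec<String>,
}

impl ExpandingReader {
    /// Queues a symbol; it is not visible until the current top-level expression ends.
    fn queue_symbol(&mut self, text: &str) {
        self.pending_symbols.push(text.to_string());
    }

    /// Applies every queued change. This is the only method that mutates the table.
    fn between_top_level_expressions(&mut self) {
        // `append` moves every pending symbol into the table and leaves the queue empty.
        self.symbol_table.append(&mut self.pending_symbols);
    }

    fn symbol_count(&self) -> usize {
        self.symbol_table.len()
    }
}
```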


popematt · 2023-10-24T21:44:01Z

src/lazy/expanded/template.rs

+                    match self.evaluator.push(self.context, invocation) {
+                        Ok(_) => continue,
+                        Err(e) => return Some(Err(e)),
+                    };


Can push() return not-a-Result?

This method is where an e-expression's arguments are evaluated for the first time. If an argument is missing, malformed, or invokes a non-existent macro, those errors will surface here.

I'm open to renaming this if you can think of a name that preserves what push() communicates (i.e. we're adding to the stack of evaluations in progress) while also indicating that some validation is happening.
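Here is a minimal sketch of why `push` is fallible. Hypothetical types stand in for ion-rust's `MacroEvaluator` and macro table. Pushing an expansion is the first time the invocation is checked against the macro definition. Unknown macros and arity mismatches are therefore reported there, and a failed push leaves the stack unchanged.

```rust
use std::collections::HashMap;

/// Errors that can surface the first time an invocation's arguments are examined.
#[derive(Debug, PartialEq)]
enum EvalError {
    UnknownMacro(String),
    WrongArgCount { name: String, expected: usize, found: usize },
}

/// Hypothetical evaluator: a stack of expansions in progress.
struct Evaluator {
    /// Macro name -> number of parameters its definition declares.
    macro_table: HashMap<String, usize>,
    /// Expansions in progress: (macro name, arguments).
    stack: Vec<(String, Vec<i64>)>,
}

impl Evaluator {
    /// Adds an expansion to the stack. Because this is where the invocation is
    /// first validated, it returns a `Result` rather than `()`.
    fn push(&mut self, name: &str, args: Vec<i64>) -> Result<(), EvalError> {
        let expected = *self
            .macro_table
            .get(name)
            .ok_or_else(|| EvalError::UnknownMacro(name.to_string()))?;
        if args.len() != expected {
            return Err(EvalError::WrongArgCount {
                name: name.to_string(),
                expected,
                found: args.len(),
            });
        }
        self.stack.push((name.to_string(), args));
        Ok(())
    }

    fn depth(&self) -> usize {
        self.stack.len()
    }
}
```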


popematt · 2023-10-25T17:49:22Z

src/lazy/expanded/mod.rs

+        // SAFETY: The only time that the macro table, symbol table, and allocator can be modified
+        // is in the body of the method `between_top_level_expressions`. As long as nothing holds
+        // a reference to the `EncodingContext` we create here when that method is running,
+        // this is safe.


As long as nothing holds a reference to the EncodingContext we create here when that method is running, this is safe.

I know it would be annoying to mark this function as unsafe, but you are placing conditions on when this function can safely be called, which implies that this function should be unsafe, and the safety should be ensured at call sites for this function.

Or am I understanding this wrongly?


You're understanding it correctly. However, this is a private method and essentially every private method in this type has safety constraints. Do you think they should all be unsafe given that they're not accessible?

I guess it depends on how "correct" we want things to be. Is it going to be easier to understand and maintain this later if we mark them as unsafe? I will defer to your judgment on the matter.
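If the private methods were marked `unsafe`, the pattern might look like the sketch below. The types are hypothetical: a `Vec<String>` inside an `UnsafeCell` stands in for the macro table, symbol table, and allocator. The method with a usage condition is `unsafe` and documents its contract in a `# Safety` section. Each call site then explains why it upholds that contract.

```rust
use std::cell::UnsafeCell;

/// Hypothetical reader; the `Vec<String>` stands in for the resources that the
/// PR's `EncodingContext` refers to.
struct Reader {
    symbols: UnsafeCell<Vec<String>>,
}

impl Reader {
    fn new() -> Self {
        Reader { symbols: UnsafeCell::new(Vec::new()) }
    }

    /// Returns a view of the current context.
    ///
    /// # Safety
    ///
    /// The returned reference must not be alive while
    /// `between_top_level_expressions` is running.
    unsafe fn context(&self) -> &Vec<String> {
        // SAFETY: the caller promises that no mutation overlaps this borrow.
        unsafe { &*self.symbols.get() }
    }

    /// The only method that modifies the context.
    fn between_top_level_expressions(&self, new_symbol: &str) {
        // SAFETY: by the contract of `context`, no reference it returned is alive now.
        unsafe { (*self.symbols.get()).push(new_symbol.to_string()) }
    }
}
```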


popematt · 2023-10-25T18:00:17Z

src/lazy/expanded/mod.rs

+        };
+        let evaluator = Self::ptr_to_evaluator(evaluator_ptr);
+
+        match evaluator.next(self.context(), 0) {


What does the 0 represent here? (Maybe add a comment so that it doesn't look like a magic number.)

Addressed in this commit.


The text reader performs parsing in two phases: first, it matches values and expressions. Upon request, it can then read the expression it matched.

Prior to this PR, structural information detected during the matching phase would be discarded. For example, when the reader would match a list, it would also match all of the list's child expressions in the process. However, the child expressions would not be stored; when the application began reading the list, each one would need to be matched again. This redundant matching effort would also happen for the fields of a struct or the arguments of an e-expression. The `TextEncoding_1_1` decoder now caches child and argument expressions in the bump allocator, which makes iterating over the container or e-expression "free" during the reading phase.

As part of this change, the lifetime associated with `TextBufferView` was reduced from `'data` to `'top`. Readers now hold a `&[u8]` directly and construct a transient `TextBufferView` at read time. This allows the `TextBufferView` to hold a reference to the bump allocator, making the allocator available in all of the parsing methods without having to manually plumb it into each method.
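Here is a simplified sketch of the caching technique. It is not ion-rust's `TextBufferView` or span finders, and it only understands lists of unsigned integers. Matching a list already finds every child, so the matcher records each child's byte range in a `Vec`, which stands in for the bump allocator. The reading phase then iterates over those cached ranges instead of matching the input again.

```rust
use std::num::ParseIntError;
use std::ops::Range;

/// A matched list: its overall span plus the cached spans of its children.
#[derive(Debug)]
struct MatchedList {
    span: Range<usize>,
    children: Vec<Range<usize>>,
}

/// Matching phase: matches `[a, b, ...]` at the start of `input`, where each child
/// is an unsigned integer, and caches each child's byte range instead of discarding
/// it. Separators are treated loosely (any run of spaces and commas).
fn match_list(input: &str) -> Option<MatchedList> {
    let bytes = input.as_bytes();
    if bytes.first() != Some(&b'[') {
        return None;
    }
    let mut children = Vec::new();
    let mut i = 1;
    loop {
        while i < bytes.len() && (bytes[i] == b' ' || bytes[i] == b',') {
            i += 1;
        }
        if i >= bytes.len() {
            return None; // unterminated list
        }
        if bytes[i] == b']' {
            return Some(MatchedList { span: 0..i + 1, children });
        }
        let start = i;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
        if i == start {
            return None; // child is not an unsigned integer
        }
        children.push(start..i);
    }
}

/// Reading phase: walks the cached child spans without re-matching the input.
fn read_children<'a>(
    input: &'a str,
    list: &'a MatchedList,
) -> impl Iterator<Item = Result<u64, ParseIntError>> + 'a {
    list.children.iter().map(move |span| input[span.clone()].parse())
}
```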

zslayton

🗺️ PR tour for most recent commits

zslayton · 2023-10-31T16:24:10Z

benches/read_many_structs.rs

+
+pub fn criterion_benchmark(c: &mut Criterion) {
+    const NUM_VALUES: usize = 10_000;
+    let data_1_0 = concat!("{",


🗺️ Slightly awkward syntax here to produce a compact struct instead of a pretty-printed one.

If you feel like it, you could probably use something like

r"
{
  foo:1,
  bar:2,
}
".split('\n').map(|it| it.trim()).collect::<Vec<_>>().join("");

It's still kind of awkward, but it moves the awkwardness somewhere else. I don't have a preference.

I'm adding another commit that writes out the struct in full (as you've done) and then just roundtrips it to a String of compact text.

The same commit also makes the IonEq trait externally visible so the benchmark can confirm that the 1.0 and 1.1 test data are equivalent before measurement begins.
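For reference, here is a runnable version of the line-trimming approach suggested above, written as a hypothetical `compact` helper. It assumes that no value in the data spans a line break or depends on leading or trailing whitespace. That assumption is one reason roundtripping through a writer is the more robust option.

```rust
/// Hypothetical helper: collapses pretty-printed text into one line by trimming
/// each line and concatenating the results.
fn compact(pretty: &str) -> String {
    pretty.lines().map(str::trim).collect()
}
```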

zslayton · 2023-10-31T16:27:41Z

benches/read_many_structs.rs

+    c.bench_function("text 1.0: scan all", |b| {
+        b.iter(|| {
+            let mut reader =
+                LazyApplicationReader::<'_, TextEncoding_1_1>::new(data_1_0.as_bytes()).unwrap();


🗺️ The 1.0 benchmark tests are using the 1.1 reader because child expression caching is hugely impactful and has not yet been added to the 1.0 decoders. Because Ion 1.1 text is a superset of 1.0 text, the 1.1 reader can process it fine. The measurement should be basically the same as using the 1.0 reader since there are just a few more branches that never get taken, but we'll get this changed over to a proper 1.0 reader once that optimization is backported.

I'll add a comment for this.

zslayton · 2023-10-31T16:33:09Z

src/lazy/text/buffer.rs

    offset: usize,
+    allocator: &'top BumpAllocator,


🗺️ TextBufferView now holds a reference to the bump allocator, allowing it to cheaply store structural information about containers and e-expressions. Data living in the bump allocator has the 'top lifetime.

Passing an allocator into each parsing method as an argument would cause those methods to violate the Parser contract, which says they must accept only a single parameter of some type I: Input. Storing the allocator as a field makes it available everywhere a TextBufferView is already accepted.
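Here is a minimal sketch of that design. It does not use nom, and all names are hypothetical. The parser input carries a shared reference to scratch storage, with a `RefCell<Vec<usize>>` standing in for the bump allocator. Every parser keeps a single-input signature, and no allocator argument has to be threaded through the calls.

```rust
use std::cell::RefCell;

/// Hypothetical parser input: the bytes being parsed plus a shared reference to
/// scratch storage (a stand-in for the bump allocator).
#[derive(Clone, Copy)]
struct Input<'top> {
    data: &'top [u8],
    offset: usize,
    scratch: &'top RefCell<Vec<usize>>,
}

impl<'top> Input<'top> {
    /// Returns the input with its first `n` bytes consumed.
    fn advance(self, n: usize) -> Input<'top> {
        Input { data: &self.data[n..], offset: self.offset + n, ..self }
    }
}

/// Matches a run of ASCII digits. It takes exactly one argument, yet it can still
/// record the match's start offset in the scratch space.
fn match_digits<'top>(input: Input<'top>) -> Option<(Input<'top>, &'top [u8])> {
    let len = input.data.iter().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    input.scratch.borrow_mut().push(input.offset);
    Some((input.advance(len), &input.data[..len]))
}

/// Matches space-separated runs of digits; each call reaches the scratch space
/// through its input rather than through an extra parameter.
fn match_numbers<'top>(mut input: Input<'top>) -> Vec<&'top [u8]> {
    let mut found = Vec::new();
    loop {
        let spaces = input.data.iter().take_while(|b| **b == b' ').count();
        input = input.advance(spaces);
        match match_digits(input) {
            Some((rest, digits)) => {
                found.push(digits);
                input = rest;
            }
            None => return found,
        }
    }
}
```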

zslayton · 2023-10-31T16:33:53Z

src/lazy/text/buffer.rs

+    fn eq(&self, other: &Self) -> bool {
+        self.offset == other.offset && self.data == other.data
+    }
+}


🗺️ Adding a BumpAllocator to TextBufferView meant that we could no longer derive an implementation of PartialEq.
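Here is a minimal sketch of the hand-written impl, with hypothetical `BumpAllocator` and `BufferView` types. The allocator does not implement `PartialEq`, so deriving is impossible. The impl therefore compares only the data and the offset, as the `eq` above does.

```rust
/// Stand-in for the bump allocator; it deliberately does not implement `PartialEq`.
struct BumpAllocator;

/// Hypothetical view type. `#[derive(PartialEq)]` would fail to compile here
/// because it would require `BumpAllocator: PartialEq`.
struct BufferView<'top> {
    data: &'top [u8],
    offset: usize,
    allocator: &'top BumpAllocator,
}

impl PartialEq for BufferView<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Compare only the fields that describe the viewed bytes; the allocator
        // is intentionally ignored.
        self.offset == other.offset && self.data == other.data
    }
}
```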

zslayton · 2023-10-31T16:36:12Z

src/lazy/text/buffer.rs

@@ -383,38 +404,42 @@ impl<'data> TextBufferView<'data> {
            // If the next thing in the input is a `}`, return `None`.
            value(None, Self::match_struct_end),
            // Otherwise, match a name/value pair and turn it into a `LazyRawTextField`.
-            Self::match_struct_field_name_and_value.map(


🗺️ As you'll see below, clippy (rightfully) complained about the complexity of the type returned by match_struct_field_name_and_value among others. I've introduced a MatchedFieldName type that simplifies the signatures quite a bit.
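Here is a sketch of that kind of refactoring, using hypothetical names; it is not the PR's `MatchedFieldName`, and it handles only unquoted field names. Clippy's `type_complexity` lint flags nested tuple return types as signatures grow. Rather than returning such a tuple, the parser returns a small named struct.

```rust
use std::ops::Range;

/// Hypothetical named result type replacing a nested tuple.
#[derive(Debug, PartialEq)]
struct FieldNameMatch<'a> {
    /// The field name's text, without surrounding whitespace.
    text: &'a str,
    /// Byte range of the name within the original input.
    span: Range<usize>,
}

/// Matches `name:` at the start of `input`; returns the name and the remaining input.
fn match_field_name(input: &str) -> Option<(FieldNameMatch<'_>, &str)> {
    let colon = input.find(':')?;
    let raw = &input[..colon];
    let text = raw.trim();
    if text.is_empty() {
        return None;
    }
    // Leading whitespace length gives the name's starting byte offset.
    let start = raw.len() - raw.trim_start().len();
    Some((FieldNameMatch { text, span: start..start + text.len() }, &input[colon + 1..]))
}
```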

zslayton · 2023-10-31T16:43:53Z

src/lazy/text/buffer.rs

+            TextBufferView<'top>,
+            &'top [LazyRawValueExpr<'top, TextEncoding_1_1>],
+        ),
+    > {


🗺️ Matching a list now returns both the matched text buffer and a bump-allocated collection of child value expressions.

zslayton · 2023-10-31T16:44:59Z

src/lazy/text/buffer.rs

-                        .with_description(format!("{}", e));
-                    Err(nom::Err::Failure(IonParseError::Invalid(error)))
+        let (span, child_exprs) =
+            match TextListSpanFinder_1_1::new(self.allocator, sequence_iter).find_span() {


🗺️ The *SpanFinder types (such as TextListSpanFinder_1_1) wrap a parsing iterator and return the matched span along with any nested expressions.

zslayton · 2023-10-31T16:47:04Z

src/lazy/text/matched.rs

+    Struct(&'top [LazyRawFieldExpr<'top, D>]),
+}
+
+impl<'top, D: LazyDecoder> PartialEq for MatchedValue<'top, D> {


🗺️ Storing slices of LazyRawFieldExpr meant that we could no longer derive an implementation of PartialEq.

zslayton · 2023-10-31T16:49:54Z

src/lazy/text/raw/reader.rs

+    struct TestReader<'data> {
+        allocator: BumpAllocator,
+        reader: LazyRawTextReader_1_0<'data>,
+    }


🗺️ A consequence of expression caching is that using the raw reader requires the application to supply a BumpAllocator to each call to next(). Since few people will interact with this API (most won't need to, and it will be behind a feature flag), I'm OK with that tradeoff.

This is fine, but is there any particular reason it can't be stored in the reader?

LazyRawReader::next() mutably borrows the raw reader for 'top, and the expanding reader needs access to the allocator after calling LazyRawReader::next(), which would require a second borrow.

An alternative to passing the allocator into next() is having next() return a reference to the allocator with each value. That's possible (and maybe preferable?) but I'd like to defer implementing that until after this PR.


popematt · 2023-10-31T20:24:03Z

src/lazy/text/raw/reader.rs


popematt · 2023-11-02T17:43:58Z

src/ion_data/mod.rs

@@ -1,4 +1,4 @@
-mod ion_eq;
+pub(crate) mod ion_eq;


If you wanted, we could leave this mod the way it is, and you could use IonData::eq(a, b) to compare two values.

Good call, I've done this in 158348e.


zslayton added 25 commits September 25, 2023 11:29

Initial implementation of 1.1 text reader

2185cc3

Fixed private doc links

222bcb2

Recursive expansion of TDL containers

fb37931

Relocate From<LazyExpandedValue> for LazyValue impls

22ca798

Incorporates feedback from PR #645

f992f86

Merge remote-tracking branch 'origin/main' into pr_645_feedback

77fd243

Expanded doc comments

28f6d93

Make MacroEvaluator a trait instead of a struct.

60166cf

Merge remote-tracking branch 'origin/main' into invoking-templates

435da4c

Introduces a TemplateCompiler

0de6c1c

wip

2147a6e

raw argument iteration

c2a700d

wip environment plumbing

002faf1

wip transient evaluators

a176e81

Working expansion w/o variables

2675d43

Variable substitution

bdc89b3

Merge remote-tracking branch 'origin/main' into invoking-templates

9c2a079

cleanup/clippy

9949390

removed empty module

7901a39

Moved SystemReader functionality into ExpandingReader

9150196

Removed MacroInvocation trait, renamed RawMacroInvocation to RawEExpression

368cfb4

Doc comments, moved TemplateExpansion

3943f49

Renamed LazyDecoder::RawMacroInvocation to _::EExpression, added default impl of `resolve`

789a325

Adds a ValueExpr to represent expressions after variable resolution

b02b18c

More tests, comment cleanup

d169fa8

zslayton added 3 commits October 19, 2023 13:52

works, but with miri error

3247455

Simpler expanding reader impl using UnsafeCell

528cac1

doc link fixes

1b5de55

zslayton mentioned this pull request Oct 20, 2023

Merge the LazySystemReader and the LazyExpandingReader #675

Open

zslayton commented Oct 20, 2023


zslayton marked this pull request as ready for review October 20, 2023 20:37

zslayton requested a review from popematt October 20, 2023 20:37

popematt reviewed Oct 25, 2023


zslayton added 2 commits October 26, 2023 20:18

Removed 'data lifetime from <'top, 'data, D> signatures

647571f

zslayton commented Oct 31, 2023


popematt reviewed Oct 31, 2023


zslayton and others added 6 commits November 1, 2023 13:05

More doc comments, better TemplateMacro debug formatting

39df701

Merge branch 'main' into template-macros

f9a0073

Adds correctness test to the benchmark

04aff35

Configures codecov to use the repo token for uploads

56b7994

additional code coverage

95e885d

Simplifies MacroEvaluator::next usage in the common case

1db5b30

popematt reviewed Nov 2, 2023


Uses IonData instead of making IonEq pub

158348e

popematt approved these changes Nov 2, 2023


zslayton merged commit 18ba6ec into main Nov 2, 2023
20 checks passed

zslayton deleted the template-macros branch November 2, 2023 20:36
