Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation of template macros #674

Merged
merged 37 commits into from
Nov 2, 2023
Merged

Evaluation of template macros #674

merged 37 commits into from
Nov 2, 2023

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Oct 18, 2023

Adds:

  • a TemplateExpansion macro kind that can evaluate invocations of a user-specified macro definition.
  • a TemplateCompiler that reads a macro definition expression and emits a TemplateMacro.

Fixes #658.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Oct 18, 2023

Codecov Report

Attention: 612 lines in your changes are missing coverage. Please review.

Files Coverage Δ
src/binary/non_blocking/raw_binary_reader.rs 80.15% <ø> (ø)
src/element/mod.rs 80.20% <ø> (-0.70%) ⬇️
src/ion_reader.rs 100.00% <ø> (ø)
src/lazy/binary/raw/value.rs 89.26% <100.00%> (-0.31%) ⬇️
src/lazy/encoding.rs 45.45% <100.00%> (ø)
src/lazy/raw_stream_item.rs 48.00% <ø> (ø)
src/lazy/raw_value_ref.rs 72.98% <ø> (ø)
src/lazy/reader.rs 77.16% <100.00%> (+1.53%) ⬆️
src/lazy/struct.rs 70.00% <100.00%> (ø)
src/lazy/system_stream_item.rs 0.00% <ø> (ø)
... and 27 more

... and 4 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ These first few changes are addressing unnecessarily explicit doc link paths, a recently introduced clippy lint.

@@ -15,7 +15,7 @@ use crate::lazy::decoder::{
LazyRawValueExpr, RawFieldExpr, RawValueExpr,
};
use crate::lazy::encoding::{BinaryEncoding_1_0, TextEncoding_1_0, TextEncoding_1_1};
use crate::lazy::expanded::macro_evaluator::MacroInvocation;
use crate::lazy::expanded::macro_evaluator::RawEExpression;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ As you'll see later, this PR required a firmer division between E-expressions ((:foo)) and template macro invocations ((foo) in the body of a template). For now, I'll just point out that several things named ______MacroInvocation are now _____EExpression to highlight the fact that they are syntactic elements from the data stream.

Comment on lines +38 to 39
#[derive(Debug, Clone, Copy)]
pub struct AnyEncoding;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Implementations of LazyDecoder are now Copy + 'static (in addition to their other prerequisite traits) because otherwise types that are generic over <D: LazyDecoder> would have to have lots of boilerplate where clauses.

@@ -48,61 +48,59 @@ impl<'data> LazyDecoder<'data> for AnyEncoding {
type List = LazyRawAnyList<'data>;
type Struct = LazyRawAnyStruct<'data>;
type AnnotationsIterator = RawAnyAnnotationsIterator<'data>;
type MacroInvocation = LazyRawAnyMacroInvocation<'data>;
type EExpression = LazyRawAnyEExpression<'data>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All of the remaining changes in this file came from renaming *MacroInvocacation to *EExpression.

// However, many types are generic over some `D: LazyDecoder<'_>`, and having this trait
// extend 'static, Sized, Debug, Clone and Copy means that those types can #[derive(...)]
// those traits themselves without boilerplate `where` clauses.
pub trait LazyDecoder<'data>: 'static + Sized + Debug + Clone + Copy {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This is the only change in this file not caused by the rename.

Comment on lines 204 to 449
(
// A collection of bump-allocated annotation strings
BumpVec<'top, &'top str>,
ExpandedValueRef<'top, 'data, D>,
),
// Constructed data stored in the bump allocator. Holding references instead of the data
// itself allows this type (and those that contain it) to impl `Copy`.
&'top [&'top str],
&'top ExpandedValueRef<'top, 'data, D>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ These values all live in the bump allocator. Switching them to references allowed LazyExpandedValue to implement Copy.

src/lazy/expanded/sequence.rs Outdated Show resolved Hide resolved
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Now that the MacroEvaluator is not generic over Vec/BumpVec, this isn't necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Everything living in this file became obsolete as a result of the refactor.

Comment on lines 72 to 73
pub struct LazySystemReader<'data, D: LazyDecoder<'data>> {
// TODO: Remove this RefCell when the Polonius borrow checker is available.
// See: https://github.com/rust-lang/rust/issues/70255
expanding_reader: RefCell<LazyExpandingReader<'data, D>>,
// TODO: Make the symbol and macro tables traits on `D` such that they can be configured
// statically. Then 1.0 types can use `Never` for the macro table.
symbol_table: SymbolTable,
macro_table: MacroTable,
allocator: BumpAllocator,
pending_lst: PendingLst,
pub(crate) expanding_reader: LazyExpandingReader<'data, D>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The LazyExpandingReader can only apply changes to the EncodingContext between top-level expressions, which meant it was much easier for it to own all of these resources. Having moved them all over to the LazyExpandingReader, it became clear that there wasn't anything left for the LazySystemReader to do. I'll merge them in another PR later. Tracking this in #675.

@zslayton zslayton marked this pull request as ready for review October 20, 2023 20:37
@zslayton zslayton requested a review from popematt October 20, 2023 20:37
src/lazy/expanded/compiler.rs Outdated Show resolved Hide resolved
src/lazy/expanded/compiler.rs Outdated Show resolved Hide resolved
Comment on lines 199 to 202
match self.evaluator.push(self.context, invocation) {
Ok(_) => continue,
Err(e) => return Some(Err(e)),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can push() return not-a-Result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is where an e-expression's arguments are evaluated for the first time. If an argument is missing, malformed, or invokes a non-existent macro, those errors will surface here.

I'm open to renaming this if you can think of a name that preserves what push() communicates (i.e. we're adding to the stack of evaluations in progress) while also indicating that some validation is happening.

src/lazy/expanded/template.rs Outdated Show resolved Hide resolved
src/lazy/expanded/template.rs Outdated Show resolved Hide resolved
Comment on lines +220 to +223
// SAFETY: The only time that the macro table, symbol table, and allocator can be modified
// is in the body of the method `between_top_level_expressions`. As long as nothing holds
// a reference to the `EncodingContext` we create here when that method is running,
// this is safe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as nothing holds a reference to the EncodingContext we create here when that method is running, this is safe.

I know it would be annoying to mark this function as unsafe, but you are placing conditions on when this function can safely be called, which implies that this function should be unsafe, and the safety should be ensured at call sites for this function.

Or am I understanding this wrongly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it would be annoying to mark this function as unsafe, but you are placing conditions on when this function can safely be called, which implies that this function should be unsafe, and the safety should be ensured at call sites for this function.

Or am I understanding this wrongly?

You're understanding it correctly. However, this is a private method and essentially every private method in this type has safety constraints. Do you think they should all be unsafe given that they're not accessible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on how "correct" we want things to be. Is it going to be easier to understand and maintain this later if we mark them as unsafe? I will defer to your judgment on the matter.

src/lazy/expanded/mod.rs Show resolved Hide resolved
src/lazy/expanded/mod.rs Outdated Show resolved Hide resolved
};
let evaluator = Self::ptr_to_evaluator(evaluator_ptr);

match evaluator.next(self.context(), 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the 0 represent here? (Maybe add a comment so that it doesn't look like a magic number.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in this commit.

src/lazy/expanded/mod.rs Outdated Show resolved Hide resolved
The text reader performs parsing in two phases: first, it matches
values and expressions. Upon request, it can then read the expression
it matched.

Prior to this PR, structural information detected during the matching
phase would be discarded. For example, when the reader would match a list,
it would also match all of the list's child expressions in the process.
However, the child expressions would not be stored; when the application
began reading the list, each one would need to be matched again. This
redundant matching effort would also happen for the fields of a struct
or the arguments of an e-expression.

The `TextEncoding_1_1` decoder now caches child and argument expressions
in the bump allocator, which makes iterating over the container or
e-expression "free" during the reading phase.

As part of this change, the lifetime associated with `TextBufferView`
was reduced from `'data` to `'top`. Readers now hold a `&[u8]` directly
and construct a transient `TextBufferView` at read time. This allows
the `TextBufferView` to hold a reference to the bump allocator, making
the allocator available in all of the parsing methods without having
to manually plumb it into each method.
Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour for most recent commits


pub fn criterion_benchmark(c: &mut Criterion) {
const NUM_VALUES: usize = 10_000;
let data_1_0 = concat!("{",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Slightly awkward syntax here to produce a compact struct instead of a pretty-printed one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel like it, you could probably use something like

r"
{
  foo:1,
  bar:2,
}
".split("\n".into()).map(|it| it.trim()).collect::<Vec<_>>().join("");

It's still kind of awkward, but it moves the awkwardness somewhere else. I don't have a preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding another commit that writes out the struct in full (as you've done) and then just roundtrips it to a String of compact text.

The same commit also makes the IonEq trait externally visible so the benchmark can confirm that the 1.0 and 1.1 test data are equivalent before measurement begins.

c.bench_function("text 1.0: scan all", |b| {
b.iter(|| {
let mut reader =
LazyApplicationReader::<'_, TextEncoding_1_1>::new(data_1_0.as_bytes()).unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The 1.0 benchmark tests are using the 1.1 reader because child expression caching is hugely impactful and has not yet been added to the 1.0 decoders. Because Ion 1.1 text is a superset of 1.0 text, the 1.1 reader can process it fine. The measurement should be basically the same as using the 1.0 reader since there are just a few more branches that never get taken, but we'll get this changed over to a proper 1.0 reader once that optimization is backported.

I'll add a comment for this.

offset: usize,
allocator: &'top BumpAllocator,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ TextBufferView now holds a reference to the bump allocator, allowing it to cheaply store structural information about containers and e-expressions. Data living in the bump allocator has the 'top lifetime.

Passing an allocator into each parsing method as an argument would cause those methods to violate the Parser contract, which says they must only accept a single parameter of some type I: Input. Adding that field allows us to use it everywhere that a TBV is already accepted.

fn eq(&self, other: &Self) -> bool {
self.offset == other.offset && self.data == other.data
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Adding a BumpAllocator to TextBufferView meant that we could no longer derive an implementation of PartialEq.

@@ -383,38 +404,42 @@ impl<'data> TextBufferView<'data> {
// If the next thing in the input is a `}`, return `None`.
value(None, Self::match_struct_end),
// Otherwise, match a name/value pair and turn it into a `LazyRawTextField`.
Self::match_struct_field_name_and_value.map(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ As you'll see below, clippy (rightfully) complained about the complexity of the type returned by match_struct_field_name_and_value among others. I've introduced a MatchedFieldName type that simplifies the signatures quite a bit.

TextBufferView<'top>,
&'top [LazyRawValueExpr<'top, TextEncoding_1_1>],
),
> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Matching a list now returns both the matched text buffer and a bump-allocated collection of child value expressions.

.with_description(format!("{}", e));
Err(nom::Err::Failure(IonParseError::Invalid(error)))
let (span, child_exprs) =
match TextListSpanFinder_1_1::new(self.allocator, sequence_iter).find_span() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The _______SpanFinder types wrap a parsing iterator and return the matched span along with any nested expressions.

Struct(&'top [LazyRawFieldExpr<'top, D>]),
}

impl<'top, D: LazyDecoder> PartialEq for MatchedValue<'top, D> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Storing slices of LazyRawFieldExpr meant that we could no longer derive an implementation of PartialEq.

Comment on lines +101 to +104
struct TestReader<'data> {
allocator: BumpAllocator,
reader: LazyRawTextReader_1_0<'data>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ A consequence of expression caching is that using the raw reader requires the application to supply a BumpAllocator to each call to next(). Since this is not an API many people will interact with--most won't need to and it will be behind a feature flag--I'm ok with that tradeoff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but is there any particular reason it can't be stored in the reader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LazyRawReader::next() mutably borrows the raw reader for 'top, and the expanding reader needs access to the allocator after calling LazyRawReader::next() which would require another borrow.

An alternative to passing the allocator into next() is having next() return a reference to the allocator with each value. That's possible (and maybe preferable?) but I'd like to defer implementing that until after this PR.

src/lazy/text/raw/v1_1/reader.rs Outdated Show resolved Hide resolved
Comment on lines +101 to +104
struct TestReader<'data> {
allocator: BumpAllocator,
reader: LazyRawTextReader_1_0<'data>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but is there any particular reason it can't be stored in the reader?

src/lazy/text/raw/v1_1/reader.rs Show resolved Hide resolved
src/lazy/text/raw/v1_1/reader.rs Show resolved Hide resolved
src/lazy/text/raw/v1_1/reader.rs Show resolved Hide resolved
@@ -1,4 +1,4 @@
mod ion_eq;
pub(crate) mod ion_eq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted, we could leave this mod the way it is, and you can use IonData::eq(a, b) to compare to values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've done this in 158348e.

src/lib.rs Outdated Show resolved Hide resolved
@zslayton zslayton merged commit 18ba6ec into main Nov 2, 2023
20 checks passed
@zslayton zslayton deleted the template-macros branch November 2, 2023 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evaluating user-defined template macros
2 participants