-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluation of template macros #674
Conversation
…ult impl of `resolve`
Codecov ReportAttention:
... and 4 files with indirect coverage changes 📢 Thoughts on this report? Let us know!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ These first few changes are addressing unnecessarily explicit doc link paths, a recently introduced clippy
lint.
@@ -15,7 +15,7 @@ use crate::lazy::decoder::{ | |||
LazyRawValueExpr, RawFieldExpr, RawValueExpr, | |||
}; | |||
use crate::lazy::encoding::{BinaryEncoding_1_0, TextEncoding_1_0, TextEncoding_1_1}; | |||
use crate::lazy::expanded::macro_evaluator::MacroInvocation; | |||
use crate::lazy::expanded::macro_evaluator::RawEExpression; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ As you'll see later, this PR required a firmer division between E-expressions ((:foo)
) and template macro invocations ((foo)
in the body of a template). For now, I'll just point out that several things named ______MacroInvocation
are now _____EExpression
to highlight the fact that they are syntactic elements from the data stream.
#[derive(Debug, Clone, Copy)] | ||
pub struct AnyEncoding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Implementations of LazyDecoder
are now Copy + 'static
(in addition to their other prerequisite traits) because otherwise types that are generic over <D: LazyDecoder>
would have to have lots of boilerplate where
clauses.
src/lazy/any_encoding.rs
Outdated
@@ -48,61 +48,59 @@ impl<'data> LazyDecoder<'data> for AnyEncoding { | |||
type List = LazyRawAnyList<'data>; | |||
type Struct = LazyRawAnyStruct<'data>; | |||
type AnnotationsIterator = RawAnyAnnotationsIterator<'data>; | |||
type MacroInvocation = LazyRawAnyMacroInvocation<'data>; | |||
type EExpression = LazyRawAnyEExpression<'data>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ All of the remaining changes in this file came from renaming *MacroInvocacation
to *EExpression
.
src/lazy/decoder.rs
Outdated
// However, many types are generic over some `D: LazyDecoder<'_>`, and having this trait | ||
// extend 'static, Sized, Debug, Clone and Copy means that those types can #[derive(...)] | ||
// those traits themselves without boilerplate `where` clauses. | ||
pub trait LazyDecoder<'data>: 'static + Sized + Debug + Clone + Copy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This is the only change in this file not caused by the rename.
src/lazy/expanded/mod.rs
Outdated
( | ||
// A collection of bump-allocated annotation strings | ||
BumpVec<'top, &'top str>, | ||
ExpandedValueRef<'top, 'data, D>, | ||
), | ||
// Constructed data stored in the bump allocator. Holding references instead of the data | ||
// itself allows this type (and those that contain it) to impl `Copy`. | ||
&'top [&'top str], | ||
&'top ExpandedValueRef<'top, 'data, D>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ These values all live in the bump allocator. Switching them to references allowed LazyExpandedValue
to implement Copy
.
src/lazy/expanded/stack.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Now that the MacroEvaluator
is not generic over Vec
/BumpVec
, this isn't necessary.
src/lazy/expanded/tdl_macro.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Everything living in this file became obsolete as a result of the refactor.
src/lazy/system_reader.rs
Outdated
pub struct LazySystemReader<'data, D: LazyDecoder<'data>> { | ||
// TODO: Remove this RefCell when the Polonius borrow checker is available. | ||
// See: https://github.com/rust-lang/rust/issues/70255 | ||
expanding_reader: RefCell<LazyExpandingReader<'data, D>>, | ||
// TODO: Make the symbol and macro tables traits on `D` such that they can be configured | ||
// statically. Then 1.0 types can use `Never` for the macro table. | ||
symbol_table: SymbolTable, | ||
macro_table: MacroTable, | ||
allocator: BumpAllocator, | ||
pending_lst: PendingLst, | ||
pub(crate) expanding_reader: LazyExpandingReader<'data, D>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The LazyExpandingReader
can only apply changes to the EncodingContext
between top-level expressions, which meant it was much easier for it to own all of these resources. Having moved them all over to the LazyExpandingReader
, it became clear that there wasn't anything left for the LazySystemReader
to do. I'll merge them in another PR later. Tracking this in #675.
src/lazy/expanded/template.rs
Outdated
match self.evaluator.push(self.context, invocation) { | ||
Ok(_) => continue, | ||
Err(e) => return Some(Err(e)), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can push()
return not-a-Result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is where an e-expression's arguments are evaluated for the first time. If an argument is missing, malformed, or invokes a non-existent macro, those errors will surface here.
I'm open to renaming this if you can think of a name that preserves what push()
communicates (i.e. we're adding to the stack of evaluations in progress) while also indicating that some validation is happening.
// SAFETY: The only time that the macro table, symbol table, and allocator can be modified | ||
// is in the body of the method `between_top_level_expressions`. As long as nothing holds | ||
// a reference to the `EncodingContext` we create here when that method is running, | ||
// this is safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as nothing holds a reference to the
EncodingContext
we create here when that method is running, this is safe.
I know it would be annoying to mark this function as unsafe
, but you are placing conditions on when this function can safely be called, which implies that this function should be unsafe
, and the safety should be ensured at call sites for this function.
Or am I understanding this wrongly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it would be annoying to mark this function as unsafe, but you are placing conditions on when this function can safely be called, which implies that this function should be unsafe, and the safety should be ensured at call sites for this function.
Or am I understanding this wrongly?
You're understanding it correctly. However, this is a private method and essentially every private method in this type has safety constraints. Do you think they should all be unsafe
given that they're not accessible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on how "correct" we want things to be. Is it going to be easier to understand and maintain this later if we mark them as unsafe? I will defer to your judgment on the matter.
src/lazy/expanded/mod.rs
Outdated
}; | ||
let evaluator = Self::ptr_to_evaluator(evaluator_ptr); | ||
|
||
match evaluator.next(self.context(), 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the 0
represent here? (Maybe add a comment so that it doesn't look like a magic number.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in this commit.
The text reader performs parsing in two phases: first, it matches values and expressions. Upon request, it can then read the expression it matched. Prior to this PR, structural information detected during the matching phase would be discarded. For example, when the reader would match a list, it would also match all of the list's child expressions in the process. However, the child expressions would not be stored; when the application began reading the list, each one would need to be matched again. This redundant matching effort would also happen for the fields of a struct or the arguments of an e-expression. The `TextEncoding_1_1` decoder now caches child and argument expressions in the bump allocator, which makes iterating over the container or e-expression "free" during the reading phase. As part of this change, the lifetime associated with `TextBufferView` was reduced from `'data` to `'top`. Readers now hold a `&[u8]` directly and construct a transient `TextBufferView` at read time. This allows the `TextBufferView` to hold a reference to the bump allocator, making the allocator available in all of the parsing methods without having to manually plumb it into each method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour for most recent commits
benches/read_many_structs.rs
Outdated
|
||
pub fn criterion_benchmark(c: &mut Criterion) { | ||
const NUM_VALUES: usize = 10_000; | ||
let data_1_0 = concat!("{", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Slightly awkward syntax here to produce a compact struct instead of a pretty-printed one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you feel like it, you could probably use something like
r"
{
foo:1,
bar:2,
}
".split("\n".into()).map(|it| it.trim()).collect::<Vec<_>>().join("");
It's still kind of awkward, but it moves the awkwardness somewhere else. I don't have a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding another commit that writes out the struct in full (as you've done) and then just roundtrips it to a String
of compact text.
The same commit also makes the IonEq
trait externally visible so the benchmark can confirm that the 1.0 and 1.1 test data are equivalent before measurement begins.
c.bench_function("text 1.0: scan all", |b| { | ||
b.iter(|| { | ||
let mut reader = | ||
LazyApplicationReader::<'_, TextEncoding_1_1>::new(data_1_0.as_bytes()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The 1.0 benchmark tests are using the 1.1 reader because child expression caching is hugely impactful and has not yet been added to the 1.0 decoders. Because Ion 1.1 text is a superset of 1.0 text, the 1.1 reader can process it fine. The measurement should be basically the same as using the 1.0 reader since there are just a few more branches that never get taken, but we'll get this changed over to a proper 1.0 reader once that optimization is backported.
I'll add a comment for this.
offset: usize, | ||
allocator: &'top BumpAllocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ TextBufferView
now holds a reference to the bump allocator, allowing it to cheaply store structural information about containers and e-expressions. Data living in the bump allocator has the 'top
lifetime.
Passing an allocator into each parsing method as an argument would cause those methods to violate the Parser
contract, which says they must only accept a single parameter of some type I: Input
. Adding that field allows us to use it everywhere that a TBV
is already accepted.
fn eq(&self, other: &Self) -> bool { | ||
self.offset == other.offset && self.data == other.data | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Adding a BumpAllocator
to TextBufferView
meant that we could no longer derive an implementation of PartialEq
.
@@ -383,38 +404,42 @@ impl<'data> TextBufferView<'data> { | |||
// If the next thing in the input is a `}`, return `None`. | |||
value(None, Self::match_struct_end), | |||
// Otherwise, match a name/value pair and turn it into a `LazyRawTextField`. | |||
Self::match_struct_field_name_and_value.map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ As you'll see below, clippy
(rightfully) complained about the complexity of the type returned by match_struct_field_name_and_value
among others. I've introduced a MatchedFieldName
type that simplifies the signatures quite a bit.
TextBufferView<'top>, | ||
&'top [LazyRawValueExpr<'top, TextEncoding_1_1>], | ||
), | ||
> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Matching a list now returns both the matched text buffer and a bump-allocated collection of child value expressions.
.with_description(format!("{}", e)); | ||
Err(nom::Err::Failure(IonParseError::Invalid(error))) | ||
let (span, child_exprs) = | ||
match TextListSpanFinder_1_1::new(self.allocator, sequence_iter).find_span() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The _______SpanFinder
types wrap a parsing iterator and return the matched span along with any nested expressions.
Struct(&'top [LazyRawFieldExpr<'top, D>]), | ||
} | ||
|
||
impl<'top, D: LazyDecoder> PartialEq for MatchedValue<'top, D> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Storing slices of LazyRawFieldExpr
meant that we could no longer derive an implementation of PartialEq
.
struct TestReader<'data> { | ||
allocator: BumpAllocator, | ||
reader: LazyRawTextReader_1_0<'data>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ A consequence of expression caching is that using the raw reader requires the application to supply a BumpAllocator
to each call to next()
. Since this is not an API many people will interact with--most won't need to and it will be behind a feature flag--I'm ok with that tradeoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but is there any particular reason it can't be stored in the reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LazyRawReader::next()
mutably borrows the raw reader for 'top
, and the expanding reader needs access to the allocator after calling LazyRawReader::next()
which would require another borrow.
An alternative to passing the allocator into next()
is having next()
return a reference to the allocator with each value. That's possible (and maybe preferable?) but I'd like to defer implementing that until after this PR.
struct TestReader<'data> { | ||
allocator: BumpAllocator, | ||
reader: LazyRawTextReader_1_0<'data>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but is there any particular reason it can't be stored in the reader?
src/ion_data/mod.rs
Outdated
@@ -1,4 +1,4 @@ | |||
mod ion_eq; | |||
pub(crate) mod ion_eq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted, we could leave this mod the way it is, and you can use IonData::eq(a, b)
to compare to values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I've done this in 158348e.
Adds:
TemplateExpansion
macro kind that can evaluate invocations of a user-specified macro definition.TemplateCompiler
that reads a macro definition expression and emits aTemplateMacro
.Fixes #658.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.