Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation of template macros #674

Merged
merged 37 commits into from
Nov 2, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
2185cc3
Initial implementation of 1.1 text reader
zslayton Sep 9, 2023
222bcb2
Fixed private doc links
zslayton Sep 25, 2023
fb37931
Recursive expansion of TDL containers
zslayton Sep 27, 2023
22ca798
Relocate `From<LazyExpandedValue> for LazyValue` impls
zslayton Sep 28, 2023
f992f86
Incorporates feedback from PR #645
zslayton Sep 28, 2023
77fd243
Merge remote-tracking branch 'origin/main' into pr_645_feedback
zslayton Sep 29, 2023
28f6d93
Expanded doc comments
zslayton Sep 29, 2023
60166cf
Make MacroEvaluator a trait instead of a struct.
zslayton Oct 2, 2023
435da4c
Merge remote-tracking branch 'origin/main' into invoking-templates
zslayton Oct 2, 2023
0de6c1c
Introduces a TemplateCompiler
zslayton Oct 3, 2023
2147a6e
wip
zslayton Oct 6, 2023
c2a700d
raw argument iteration
zslayton Oct 8, 2023
002faf1
wip environment plumbing
zslayton Oct 9, 2023
a176e81
wip transient evaluators
zslayton Oct 10, 2023
2675d43
Working expansion w/o variables
zslayton Oct 13, 2023
bdc89b3
Variable substitution
zslayton Oct 13, 2023
9c2a079
Merge remote-tracking branch 'origin/main' into invoking-templates
zslayton Oct 17, 2023
9949390
cleanup/clippy
zslayton Oct 17, 2023
7901a39
removed empty module
zslayton Oct 17, 2023
9150196
Moved SystemReader functionality into ExpandingReader
zslayton Oct 17, 2023
368cfb4
Removed MacroInvocation trait, renamed RawMacroInvocation to RawEExpr…
zslayton Oct 17, 2023
3943f49
Doc comments, moved TemplateExpansion
zslayton Oct 17, 2023
789a325
Renamed LazyDecoder::RawMacroInvocation to _::EExpression, added defa…
zslayton Oct 18, 2023
b02b18c
Adds a ValueExpr to represent expressions after variable resolution
zslayton Oct 18, 2023
d169fa8
More tests, comment cleanup
zslayton Oct 18, 2023
3247455
works, but with miri error
zslayton Oct 19, 2023
528cac1
Simpler expanding reader impl using UnsafeCell
zslayton Oct 20, 2023
1b5de55
doc link fixes
zslayton Oct 20, 2023
647571f
Removed 'data lifetime from <'top, 'data, D> signatures
zslayton Oct 27, 2023
01a0265
Caches nested expressions during parsing
zslayton Oct 31, 2023
39df701
More doc comments, better TemplateMacro debug formatting
zslayton Nov 1, 2023
f9a0073
Merge branch 'main' into template-macros
zslayton Nov 1, 2023
04aff35
Adds correctness test to the benchmark
zslayton Nov 2, 2023
56b7994
Configures codecov to use the repo token for uploads
zslayton Nov 2, 2023
95e885d
additional code coverage
zslayton Nov 2, 2023
1db5b30
Simplifies MacroEvaluator::next usage in the common case
zslayton Nov 2, 2023
158348e
Uses IonData instead of making IonEq pub
zslayton Nov 2, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ memmap = "0.7.0"
criterion = "0.5.1"

[[bench]]
name = "read_text_single_struct"
name = "read_many_structs"
harness = false

[profile.release]
Expand Down
120 changes: 120 additions & 0 deletions benches/read_many_structs.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ion_rs::lazy::decoder::LazyDecoder;
use ion_rs::lazy::encoding::TextEncoding_1_1;
use ion_rs::lazy::r#struct::LazyStruct;
use ion_rs::lazy::reader::{LazyApplicationReader, LazyTextReader_1_0, LazyTextReader_1_1};
use ion_rs::lazy::value::LazyValue;
use ion_rs::lazy::value_ref::ValueRef;
use ion_rs::IonResult;
use nom::AsBytes;

pub fn criterion_benchmark(c: &mut Criterion) {
const NUM_VALUES: usize = 10_000;
let data_1_0 = concat!("{",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Slightly awkward syntax here to produce a compact struct instead of a pretty-printed one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel like it, you could probably use something like

r"
{
  foo:1,
  bar:2,
}
".split("\n".into()).map(|it| it.trim()).collect::<Vec<_>>().join("");

It's still kind of awkward, but it moves the awkwardness somewhere else. I don't have a preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding another commit that writes out the struct in full (as you've done) and then just roundtrips it to a String of compact text.

The same commit also makes the IonEq trait externally visible so the benchmark can confirm that the 1.0 and 1.1 test data are equivalent before measurement begins.

"'timestamp': 1670446800245,",
"'threadId': 418,",
r#"'threadName': "scheduler-thread-6","#,
r#"'loggerName': "com.example.organization.product.component.ClassName","#,
"'logLevel': INFO,",
r#"'format': "Request status: {} Client ID: {} Client Host: {} Client Region: {} Timestamp: {}","#,
r#"'parameters': ["SUCCESS","example-client-1","aws-us-east-5f-18b4fa","region 4","2022-12-07T20:59:59.744000Z",],"#,
"}"
).repeat(NUM_VALUES);
let template_text = r#"
(macro event (timestamp thread_id thread_name client_num host_id parameters)
{
'timestamp': timestamp,
'threadId': thread_id,
'threadName': (make_string "scheduler-thread-" thread_name),
'loggerName': "com.example.organization.product.component.ClassName",
'logLevel': (quote INFO),
'format': "Request status: {} Client ID: {} Client Host: {} Client Region: {} Timestamp: {}",
'parameters': [
"SUCCESS",
(make_string "example-client-" client_num),
(make_string "aws-us-east-5f-" host_id),
parameters
]
}
)
"#;

let data_1_1 = r#"(:event 1670446800245 418 "6" "1" "18b4fa" (:values "region 4" "2022-12-07T20:59:59.744000Z"))"#.repeat(NUM_VALUES);

println!("Ion 1.0 data size: {} bytes", data_1_0.len());
println!("Ion 1.1 data size: {} bytes", data_1_1.len());

fn count_value_and_children<D: LazyDecoder>(lazy_value: &LazyValue<'_, D>) -> IonResult<usize> {
use ValueRef::*;
let child_count = match lazy_value.read()? {
List(s) => count_sequence_children(s.iter())?,
SExp(s) => count_sequence_children(s.iter())?,
Struct(s) => count_struct_children(&s)?,
scalar => {
let _ = black_box(scalar);
0
}
};
Ok(1 + child_count)
}

fn count_sequence_children<'a, D: LazyDecoder>(
lazy_sequence: impl Iterator<Item = IonResult<LazyValue<'a, D>>>,
) -> IonResult<usize> {
let mut count = 0;
for value in lazy_sequence {
count += count_value_and_children(&value?)?;
}
Ok(count)
}

fn count_struct_children<D: LazyDecoder>(lazy_struct: &LazyStruct<'_, D>) -> IonResult<usize> {
let mut count = 0;
for field in lazy_struct {
count += count_value_and_children(&field?.value())?;
}
Ok(count)
}

c.bench_function("text 1.0: scan all", |b| {
b.iter(|| {
let mut reader =
LazyApplicationReader::<'_, TextEncoding_1_1>::new(data_1_0.as_bytes()).unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The 1.0 benchmark tests are using the 1.1 reader because child expression caching is hugely impactful and has not yet been added to the 1.0 decoders. Because Ion 1.1 text is a superset of 1.0 text, the 1.1 reader can process it fine. The measurement should be basically the same as using the 1.0 reader since there are just a few more branches that never get taken, but we'll get this changed over to a proper 1.0 reader once that optimization is backported.

I'll add a comment for this.

while let Some(item) = reader.next().unwrap() {
black_box(item);
}
})
});
c.bench_function("text 1.0: read all", |b| {
b.iter(|| {
let mut reader =
LazyApplicationReader::<'_, TextEncoding_1_1>::new(data_1_0.as_bytes()).unwrap();
let mut num_values = 0usize;
while let Some(item) = reader.next().unwrap() {
num_values += count_value_and_children(&item).unwrap();
}
})
});
c.bench_function("text 1.1: scan all", |b| {
b.iter(|| {
let mut reader = LazyTextReader_1_1::new(data_1_1.as_bytes()).unwrap();
reader.register_template(template_text).unwrap();
while let Some(item) = reader.next().unwrap() {
black_box(item);
}
})
});
c.bench_function("text 1.1: read all", |b| {
b.iter(|| {
let mut reader = LazyTextReader_1_1::new(data_1_1.as_bytes()).unwrap();
reader.register_template(template_text).unwrap();
let mut num_values = 0usize;
while let Some(item) = reader.next().unwrap() {
num_values += count_value_and_children(&item).unwrap();
}
})
});
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
81 changes: 0 additions & 81 deletions benches/read_text_single_struct.rs

This file was deleted.

21 changes: 10 additions & 11 deletions src/lazy/any_encoding.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

use std::fmt::Debug;

use bumpalo::Bump as BumpAllocator;

use crate::lazy::binary::raw::annotations_iterator::RawBinaryAnnotationsIterator;
use crate::lazy::binary::raw::r#struct::{LazyRawBinaryStruct, RawBinaryStructIterator};
use crate::lazy::binary::raw::reader::LazyRawBinaryReader;
Expand All @@ -26,16 +28,13 @@ use crate::lazy::text::raw::sequence::{
};
use crate::lazy::text::raw::v1_1::reader::{
LazyRawTextList_1_1, LazyRawTextSExp_1_1, LazyRawTextStruct_1_1, MacroIdRef,
RawTextEExpression_1_1, RawTextListIterator_1_1, RawTextSExpIterator_1_1,
RawTextStructIterator_1_1,
RawTextEExpression_1_1, RawTextSequenceCacheIterator_1_1, RawTextStructCacheIterator_1_1,
};
use crate::lazy::text::value::{
LazyRawTextValue_1_0, LazyRawTextValue_1_1, RawTextAnnotationsIterator,
};
use crate::{IonResult, IonType, RawSymbolTokenRef};

use bumpalo::Bump as BumpAllocator;

/// An implementation of the `LazyDecoder` trait that can read any encoding of Ion.
#[derive(Debug, Clone, Copy)]
pub struct AnyEncoding;
Expand Down Expand Up @@ -169,14 +168,13 @@ impl<'data> LazyRawReader<'data, AnyEncoding> for LazyRawAnyReader<'data> {

fn next<'top>(
&'top mut self,
// TODO: This will be used once the 1.1 readers are added to the `self.encoding` enum
_allocator: &'top BumpAllocator,
allocator: &'top BumpAllocator,
) -> IonResult<LazyRawStreamItem<'top, AnyEncoding>>
where
'data: 'top,
{
match &mut self.encoding {
RawReaderKind::Text_1_0(r) => Ok(r.next()?.into()),
RawReaderKind::Text_1_0(r) => Ok(r.next(allocator)?.into()),
RawReaderKind::Binary_1_0(r) => Ok(r.next()?.into()),
}
}
Expand Down Expand Up @@ -503,7 +501,7 @@ pub struct RawAnyListIterator<'data> {
pub enum RawAnyListIteratorKind<'data> {
Text_1_0(RawTextListIterator_1_0<'data>),
Binary_1_0(RawBinarySequenceIterator<'data>),
Text_1_1(RawTextListIterator_1_1<'data>),
Text_1_1(RawTextSequenceCacheIterator_1_1<'data>),
}

impl<'data> Iterator for RawAnyListIterator<'data> {
Expand Down Expand Up @@ -623,7 +621,7 @@ pub struct RawAnySExpIterator<'data> {
pub enum RawAnySExpIteratorKind<'data> {
Text_1_0(RawTextSExpIterator_1_0<'data>),
Binary_1_0(RawBinarySequenceIterator<'data>),
Text_1_1(RawTextSExpIterator_1_1<'data>),
Text_1_1(RawTextSequenceCacheIterator_1_1<'data>),
}

impl<'data> Iterator for RawAnySExpIterator<'data> {
Expand Down Expand Up @@ -727,7 +725,7 @@ pub struct RawAnyStructIterator<'data> {
pub enum RawAnyStructIteratorKind<'data> {
Text_1_0(RawTextStructIterator_1_0<'data>),
Binary_1_0(RawBinaryStructIterator<'data>),
Text_1_1(RawTextStructIterator_1_1<'data>),
Text_1_1(RawTextStructCacheIterator_1_1<'data>),
}

impl<'data> Iterator for RawAnyStructIterator<'data> {
Expand Down Expand Up @@ -879,14 +877,15 @@ impl<'data> IntoIterator for LazyRawAnyStruct<'data> {

#[cfg(test)]
mod tests {
use super::*;
use crate::lazy::any_encoding::LazyRawAnyReader;
use crate::lazy::binary::test_utilities::to_binary_ion;
use crate::lazy::decoder::{LazyRawReader, LazyRawSequence, LazyRawValue};
use crate::lazy::raw_stream_item::LazyRawStreamItem;
use crate::lazy::raw_value_ref::RawValueRef;
use crate::{IonResult, RawSymbolTokenRef, Timestamp};

use super::*;

#[test]
fn any_encoding() -> IonResult<()> {
fn test_input(data: &[u8]) -> IonResult<()> {
Expand Down
14 changes: 7 additions & 7 deletions src/lazy/decoder.rs
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
use std::fmt::Debug;

use bumpalo::Bump as BumpAllocator;

use crate::lazy::encoding::TextEncoding_1_1;
use crate::lazy::expanded::macro_evaluator::RawEExpression;
use crate::lazy::raw_stream_item::LazyRawStreamItem;
use crate::lazy::raw_value_ref::RawValueRef;
use crate::lazy::text::raw::v1_1::reader::{
MacroIdRef, RawTextEExpression_1_1, RawTextSExpIterator_1_1,
MacroIdRef, RawTextEExpression_1_1, RawTextSequenceCacheIterator_1_1,
};
use crate::result::IonFailure;
use crate::{IonResult, IonType, RawSymbolTokenRef};

use bumpalo::Bump as BumpAllocator;

/// A family of types that collectively comprise the lazy reader API for an Ion serialization
/// format. These types operate at the 'raw' level; they do not attempt to resolve symbols
/// using the active symbol table.
Expand Down Expand Up @@ -44,7 +44,7 @@ pub trait LazyDecoder: 'static + Sized + Debug + Clone + Copy {
///
/// When working with `RawValueExpr`s that always use a given decoder's `Value` and
/// `MacroInvocation` associated types, consider using [`LazyRawValueExpr`] instead.
#[derive(Copy, Clone, Debug)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum RawValueExpr<V, M> {
/// A value literal. For example: `5`, `foo`, or `"hello"` in text.
ValueLiteral(V),
Expand Down Expand Up @@ -92,7 +92,7 @@ impl<V: Debug, M: Debug> RawValueExpr<V, M> {
/// * a name/value pair (as it is in Ion 1.0)
/// * a name/e-expression pair
/// * an e-expression
#[derive(Debug)]
#[derive(Clone, Debug)]
pub enum RawFieldExpr<'top, V, M> {
NameValuePair(RawSymbolTokenRef<'top>, RawValueExpr<V, M>),
MacroInvocation(M),
Expand Down Expand Up @@ -179,14 +179,14 @@ pub(crate) mod private {
}

impl<'top> RawEExpression<'top, TextEncoding_1_1> for RawTextEExpression_1_1<'top> {
type RawArgumentsIterator<'a> = RawTextSExpIterator_1_1<'top> where Self: 'a;
type RawArgumentsIterator<'a> = RawTextSequenceCacheIterator_1_1<'top> where Self: 'a;

fn id(&self) -> MacroIdRef<'top> {
self.id
}

fn raw_arguments(&self) -> Self::RawArgumentsIterator<'_> {
RawTextSExpIterator_1_1::new(self.arguments_bytes())
RawTextSequenceCacheIterator_1_1::new(self.arg_expr_cache)
}
}

Expand Down
14 changes: 10 additions & 4 deletions src/lazy/encoding.rs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All changes in this file are from the rename.

Original file line number Diff line number Diff line change
Expand Up @@ -66,15 +66,21 @@ pub trait BinaryEncoding: Encoding {}
pub trait TextEncoding<'top>:
Encoding + LazyDecoder<AnnotationsIterator<'top> = RawTextAnnotationsIterator<'top>>
{
fn value_from_matched(matched: MatchedRawTextValue) -> <Self as LazyDecoder>::Value<'_>;
fn value_from_matched(
matched: MatchedRawTextValue<'top, Self>,
) -> <Self as LazyDecoder>::Value<'top>;
}
impl<'top> TextEncoding<'top> for TextEncoding_1_0 {
fn value_from_matched(matched: MatchedRawTextValue) -> <Self as LazyDecoder>::Value<'_> {
fn value_from_matched(
matched: MatchedRawTextValue<'_, Self>,
) -> <Self as LazyDecoder>::Value<'_> {
LazyRawTextValue_1_0::from(matched)
}
}
impl<'top> TextEncoding<'top> for TextEncoding_1_1 {
fn value_from_matched(matched: MatchedRawTextValue) -> <Self as LazyDecoder>::Value<'_> {
fn value_from_matched(
matched: MatchedRawTextValue<'_, Self>,
) -> <Self as LazyDecoder>::Value<'_> {
LazyRawTextValue_1_1::from(matched)
}
}
Expand Down Expand Up @@ -127,7 +133,7 @@ impl LazyDecoder for TextEncoding_1_1 {
// the implementation will conflict with the core `impl<T> From<T> for T` implementation.
pub trait RawValueLiteral {}

impl<'top> RawValueLiteral for MatchedRawTextValue<'top> {}
impl<'top, E: TextEncoding<'top>> RawValueLiteral for MatchedRawTextValue<'top, E> {}
impl<'top, E: TextEncoding<'top>> RawValueLiteral for LazyRawTextValue<'top, E> {}
impl<'top> RawValueLiteral for LazyRawBinaryValue<'top> {}
impl<'top> RawValueLiteral for LazyRawAnyValue<'top> {}
2 changes: 1 addition & 1 deletion src/lazy/expanded/macro_table.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ impl Default for MacroTable {
impl MacroTable {
pub fn new() -> Self {
let macros_by_id = vec![MacroKind::Void, MacroKind::Values, MacroKind::MakeString];
let mut macros_by_name = HashMap::new();
let mut macros_by_name = HashMap::default();
for (id, kind) in macros_by_id.iter().enumerate() {
macros_by_name.insert(kind.name().to_string(), id);
}
Expand Down
Loading
Loading