
fix: solve deep recursion issues on some ZDag functions #1183

Merged: 1 commit merged into main from ap/fix-z-dag-deep-recursions on Mar 6, 2024

Conversation

arthurpaulino (Contributor, author):

Now that we're sending Lurk data through the wire, it's important that we don't hit recursion depth limits on arbitrarily deep data.

This PR closes #1021

@arthurpaulino requested review from a team as code owners on February 28, 2024 22:20
@arthurpaulino enabled auto-merge on March 2, 2024 14:17
Comment on lines 141 to 151
```rust
macro_rules! feed_loop {
    ($x:expr) => {
        if !cache.contains_key(&$x) {
            if ptrs.insert($x) {
                stack.push($x);
            }
        }
    };
}
```
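
(For context on how such a macro is typically driven, below is a minimal, self-contained sketch of the collection phase as an iterative worklist: pop a pointer, feed its children back in, and never recurse natively. The `u32` pointers, the `children` map, and the name `collect_reachable` are hypothetical stand-ins rather than the PR's actual types; `IndexSet` is from the `indexmap` crate, as in the diff.)

```rust
use indexmap::IndexSet;
use std::collections::HashMap;

/// Collect every pointer reachable from `root` that isn't already cached,
/// without native recursion. Toy types; illustrative only.
fn collect_reachable(
    root: u32,
    children: &HashMap<u32, Vec<u32>>,
    cache: &HashMap<u32, u64>,
) -> IndexSet<u32> {
    let mut ptrs: IndexSet<u32> = IndexSet::default();
    let mut stack: Vec<u32> = Vec::new();

    macro_rules! feed_loop {
        ($x:expr) => {
            // schedule a pointer only if it is neither cached nor already collected
            if !cache.contains_key(&$x) {
                if ptrs.insert($x) {
                    stack.push($x);
                }
            }
        };
    }

    feed_loop!(root);
    while let Some(ptr) = stack.pop() {
        for &child in children.get(&ptr).into_iter().flatten() {
            feed_loop!(child);
        }
    }
    ptrs
}
```

Because the guard skips anything already cached or already queued, each pointer is pushed at most once, so the loop terminates even on shared, DAG-shaped data.
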
huitseeker (Contributor):

Is it wise to collect a transitive depth-first traversal from `ptr` and only recurse afterwards? That is, I expect the `ptrs` structure could get really huge.

Another way to remove the recursion limit would indeed be to use a stack, but to change the signature of `recurse` from `recurse(ptr, cache)` to `recurse(stack, cache)`. In each call you'd pop an item off the stack, do what you already do in `recurse(ptr, cache)`, and populate the stack with the same `feed_loop` logic used below. I think this might lead to the same order of processing, but it would not require materializing the whole pointer graph in a stack ahead of time, leading to nicer memory usage.

Is there a reason to do this full initial graph traversal?
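
(To make this concrete, here is a rough, self-contained sketch of the interleaved `recurse(stack, cache)` shape being suggested, with hypothetical `u32` pointers, an explicit `children` map, and a placeholder for the per-node work. As the reply below explains, this shape does not directly fit the hashing use case, where a node's result depends on its children's already-computed results.)

```rust
use std::collections::{HashMap, HashSet};

type Ptr = u32;

/// Interleaved worklist: process a node as soon as it is popped, then
/// feed its children. Illustrative only; not the PR's code.
fn recurse(
    stack: &mut Vec<Ptr>,
    cache: &mut HashMap<Ptr, u64>,
    children: &HashMap<Ptr, Vec<Ptr>>,
) {
    let mut queued: HashSet<Ptr> = stack.iter().copied().collect();
    while let Some(ptr) = stack.pop() {
        if cache.contains_key(&ptr) {
            continue;
        }
        // ... do the per-node work the old recurse(ptr, cache) did ...
        cache.insert(ptr, u64::from(ptr)); // placeholder result
        // then feed the stack with this node's children (the feed_loop logic)
        for &child in children.get(&ptr).into_iter().flatten() {
            if !cache.contains_key(&child) && queued.insert(child) {
                stack.push(child);
            }
        }
    }
}
```

Here each node is handled the moment it is popped, so the worklist only ever holds a frontier rather than the whole reachable set.
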

arthurpaulino (Contributor, author):

The reason is that we need to go from the leaves to the root. This is similar to what happens during hydration: in order to hash a `Ptr`, we need the hashes of its children.

> Then in each call, you'd pop an item off the stack, do what you already do in recurse(ptr, cache)

That's not possible, because of what I said above. The reason to do the initial graph traversal is to populate the cache with the hashes of all the children that are required.
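
(One standard way to get this leaves-to-root, recursion-free processing is an explicit stack of two kinds of tasks: a "visit" task schedules a "hash" task for a node after visit tasks for all of its children, so hashing only ever does cache lookups. The sketch below is illustrative only and not the PR's implementation: it uses toy `usize` nodes, an adjacency-list `dag`, and a dummy hash combiner, and it assumes the graph is acyclic, as ZDags are.)

```rust
use std::collections::{HashMap, HashSet};

type Node = usize;

enum Task {
    Visit(Node),
    Hash(Node),
}

/// Hash every node reachable from `root`, children before parents,
/// without native recursion. `dag[n]` lists the children of node `n`.
fn hash_from_leaves(root: Node, dag: &[Vec<Node>], cache: &mut HashMap<Node, u64>) -> u64 {
    let mut seen: HashSet<Node> = HashSet::new();
    let mut stack = vec![Task::Visit(root)];
    while let Some(task) = stack.pop() {
        match task {
            Task::Visit(n) => {
                if cache.contains_key(&n) || !seen.insert(n) {
                    continue; // already hashed or already scheduled
                }
                // schedule hashing `n` only after all of its children
                stack.push(Task::Hash(n));
                stack.extend(dag[n].iter().map(|&c| Task::Visit(c)));
            }
            Task::Hash(n) => {
                // by now every child of `n` is guaranteed to be in the cache
                let h = dag[n]
                    .iter()
                    .map(|c| cache[c])
                    .fold(n as u64 + 1, |acc, ch| acc.rotate_left(7) ^ ch);
                cache.insert(n, h);
            }
        }
    }
    cache[&root]
}
```

The explicit task stack plays the role of the native call stack in a recursive version, so arbitrarily deep data only grows heap usage instead of hitting the recursion depth limit.
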

huitseeker (Contributor):

I have no problem with the depth-first search. My point is that for each node with two children (which you'll deal with one after the other), before processing them you will materialize in a data structure all the descendants of both children, rather than only the descendants of the one you will process first.

I believe that if you alternate between stacking more elements and processing those elements, you can get better memory utilization, more cache hits, and potentially marginally less code.

arthurpaulino (Contributor, author):

`recurse` doesn't call `recurse`. I don't think the change you want is that simple; it would require rethinking the whole implementation.

arthurpaulino (Contributor, author), Mar 6, 2024:

@huitseeker let me try to give some perspective on this.

We can't solve this recursively; otherwise we run into the original problem of recursion depth. The solution has to feed the cache iteratively, making every call to `recurse` trivial (every child will already be cached).

On the memory issue, I think it's really minor. The memory consumed by populating those `IndexMap`s is probably negligible compared with the bigger problem we're solving, which is proving. Also, the entirety of the DAG will have to end up inhabiting the `Store` (or the `ZStore`, depending on the direction we're going) anyway.

Nevertheless, I opened #1197 so we can deal with that later. For now, I don't think this is worth more cycles. Please let me know what you think.

```
@@ -191,10 +236,33 @@ impl<F: LurkField> ZDag<F> {
                Ok(ptr)
            }
        };
        recurse(z_ptr)
        let mut z_ptrs: IndexSet<&ZPtr<F>> = IndexSet::default();
```
huitseeker (Contributor):

Roughly the same comments as above apply.

@arthurpaulino force-pushed the ap/fix-z-dag-deep-recursions branch from 057aa42 to bc052d9 on March 4, 2024 16:57
arthurpaulino (Contributor, author):

@huitseeker I've updated the PR to use a better name for what I had previously called `ptrs`.

@arthurpaulino force-pushed the ap/fix-z-dag-deep-recursions branch 2 times, most recently from 7e13ee5 to c63f828 on March 5, 2024 16:02
@arthurpaulino force-pushed the ap/fix-z-dag-deep-recursions branch from c63f828 to c72f63d on March 6, 2024 14:07
huitseeker (Contributor) left a review comment:

Agreed

@arthurpaulino added this pull request to the merge queue on Mar 6, 2024
Merged via the queue into main with commit 013410f Mar 6, 2024
11 checks passed
@arthurpaulino deleted the ap/fix-z-dag-deep-recursions branch on March 6, 2024 23:22
arthurpaulino added a commit that referenced this pull request Mar 7, 2024
PR #1183 didn't include the logic for environments when collecting
the children of pointers. We didn't see the error because our tests
weren't checking for environment roundtrip through Z data.

Here we fix that bug and also enhance the tests to avoid future
regressions.
github-merge-queue bot pushed a commit that referenced this pull request on Mar 7, 2024, with the same message.
arthurpaulino added commits that referenced this pull request on Mar 7 and Mar 8, 2024:

It's safer to use a new variant `RawPtr::Env` to signal raw pointers
for environments than relying solely on tag checks. This would
probably have avoided the bug introduced in #1183, whose fix was
implemented in #1200
Linked issue: Potential recursion depth limit in ZDag