
Provide a way for sets to zipper with more things #26595

Open · wants to merge 25 commits into base: main
Conversation

@lydia-duncan (Member) commented Jan 23, 2025

Prior to this work, sets could only zipper with identical other sets (essentially, only with themselves). Zippering with other sets of the same length, or with arrays, would result in errors about unequal zippering lengths.
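
For concreteness, a small example of the kind of code this change targets (the sets and values here are hypothetical):

    use Set;

    var s1 = new set(int, [1, 2, 3]);
    var s2 = new set(int, [4, 5, 6]);

    // zippering two distinct same-length sets; prior to this work this
    // could halt with an unequal-lengths error. Pairings and output
    // order are arbitrary, since sets yield in no particular order.
    forall (a, b) in zip(s1, s2) do
      writeln((a, b));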

Resolves #15824

Adds a leader/follower pair of iterators to ChapelHashtable to support this effort. These iterators and their support functions allow the hashtable to evenly divide the elements stored in it, instead of evenly dividing the entire hashtable's storage capacity.
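
For readers unfamiliar with the pattern, a leader/follower pair in Chapel has roughly this shape (a minimal, self-contained sketch in the style of the parIters primer, not the actual ChapelHashtable code; `numElems`, `elemAt`, and `chunkOfTask` are hypothetical stand-ins):

    // toy storage: pretend element i is just i, over 16 elements
    proc numElems() { return 16; }
    proc elemAt(i: int) { return i; }

    // front-loaded chunk of task tid out of k tasks over n elements
    proc chunkOfTask(tid: int, k: int, n: int) {
      const per = n / k, rem = n % k;
      const lo = tid * per + min(tid, rem);
      return lo..#(per + (if tid < rem then 1 else 0));
    }

    iter elements() {
      // serial overload: yield every element
      for i in 0..<numElems() do yield elemAt(i);
    }

    iter elements(param tag: iterKind) where tag == iterKind.leader {
      // divide the element positions (not the raw slots) among tasks
      coforall tid in 0..<here.maxTaskPar do
        yield (chunkOfTask(tid, here.maxTaskPar, numElems()),);
    }

    iter elements(param tag: iterKind, followThis)
        where tag == iterKind.follower {
      // yield the elements whose positions fall in the leader's chunk
      for i in followThis(0) do yield elemAt(i);
    }

    writeln(+ reduce elements()); // 120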

While here, I fixed a comment that was missing a closing parenthesis.

New tests:

  • Added a version of setZipDifferentSize.chpl that reversed the order, to show that it still worked
  • Added a future test of zippering with an array leader that is the same length as the set
  • Added a future test of zippering with an array follower that is the same length as the set
  • Added a test of zippering with an array leader that is shorter than the set
  • Added a future test of zippering with an array follower that is shorter than the set
  • Added a test of zippering two sets with different contents (basically, David Longnecker's original test from #15824, "Zippering of two sets can fail even when they are the same length")
  • Added a test of zippering an array with the set's toArray result (user code, but not expected to have changed as a result of this effort)

Places for improvement:

  • It currently iterates over the whole backing hashtable for each follower. Ideally we'd iterate over it once and store the location of all filled elements somewhere helpful, but that's a problem for another time

Passed a full paratest with futures

Adds a pair of leader/follower iterators to ChapelHashtable and a helper method,
and calls them from the Set `these` iterators.

The implementation does frequent recomputation of how to divide up the iteration
space to find full slots.  This can probably be improved on, but I wanted to
take a snapshot of what worked before attempting it.

It also does not yet help when zippering with arrays. That is also a next step.

----
Signed-off-by: Lydia Duncan <[email protected]>
Test comes from issue chapel-lang#15824 and checks zippering two sets of the same length
(but different contents).  Sorts the output so that it is consistent instead of
potentially racy.

----
Signed-off-by: Lydia Duncan <[email protected]>
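
A sketch of that sorting pattern for deterministic test output (the set contents here are hypothetical; toArray is the same method mentioned elsewhere in this PR):

    use Set, Sort;

    var s = new set(int, [3, 1, 2]);

    // a set yields its elements in an arbitrary order, so sort before
    // printing to keep comparisons against the expected output stable
    var arr = s.toArray();
    sort(arr);
    writeln(arr); // 1 2 3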
This is a first step towards being able to zip sets with other types instead of
just other sets

----
Signed-off-by: Lydia Duncan <[email protected]>
This meant always computing the number of chunks and which chunk we ourselves were, since the array iterator expects the other components of the tuple to be ranges as well (so we can't use followThis to send that information), and the follower's argument list is supposed to match the leader's aside from followThis. This is unfortunate, but for the sizes I checked it seemed to be okay.

----
Signed-off-by: Lydia Duncan <[email protected]>
Covers:
- converting the set to an array and then zippering
- zippering with the array leading
- zippering with an array when the set leads

With the prior commit, all three work (though toArray already worked prior to
this effort).

----
Signed-off-by: Lydia Duncan <[email protected]>
The original is only failing on one of the two machines I checked, so I'm curious whether this also fails there or behaves as expected (it passes on the machine where the original still works).

----
Signed-off-by: Lydia Duncan <[email protected]>
I'm not sure if this will actually allow us to check for unequal lengths, but let's try it and see.

----
Signed-off-by: Lydia Duncan <[email protected]>
This argument indicates the size of the leader in the iteration, so that we can validate whether the iteration space is uneven. It has a default value so that the non-zippered case is not as annoying to write.

Update various tests to use this iterator instead

----
Signed-off-by: Lydia Duncan <[email protected]>
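
As a sketch of the shape this describes (a toy stand-in, not the PR's actual code; `MiniSet` and the -1 sentinel are illustrative):

    record MiniSet {
      var elems: domain(int); // associative domain as toy storage

      // leaderSize defaults to a sentinel so the non-zippered case can
      // omit it; when supplied, it lets us flag uneven iteration spaces
      iter contents(leaderSize: int = -1) {
        if leaderSize >= 0 && leaderSize != elems.size then
          halt("cannot zip a set with something of a different size");
        for x in elems do yield x;
      }
    }

    var s = new MiniSet();
    s.elems += 1;
    s.elems += 2;
    for x in s.contents() do writeln(x); // no size check when unzippered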
- document new iterator
- remove commented out writelns
- restore old behavior to Set's these iterators, since we're using a different
  iterator
- comment an if statement in _determineEvenChunks
- close a parenthetical in an older comment in ChapelHashtable

----
Signed-off-by: Lydia Duncan <[email protected]>
It'll present a cleaner interface to users.

----
Signed-off-by: Lydia Duncan <[email protected]>
Covers:
- Zippering with a shorter array as leader
- Zippering with a shorter array as follower
- Serial version of the new iterator
- Standalone version of the new iterator

----
Signed-off-by: Lydia Duncan <[email protected]>
…f it

Would have added a warning to all the overloads, except for the bug reported in
all the right contexts, whether the bug or the unstable warning gets resolved
first.

----
Signed-off-by: Lydia Duncan <[email protected]>
Use a halt wrapper instead of `__primitive("chpl_error", ...)`, especially because this is a new error rather than one trying to match errors that the compiler can insert.

While doing that, I made the error messages clearer about what was wrong and what should be done instead (hopefully heading off confusion in the follower case), and added a test locking in the error message for the leader case.

----
Signed-off-by: Lydia Duncan <[email protected]>
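
A hedged sketch of what such a wrapper might look like (the name and message are illustrative, not the PR's actual code):

    // halt with a purpose-built message rather than mimicking a
    // compiler-inserted error via __primitive("chpl_error", ...)
    proc unequalZipHalt(setSize: int, otherSize: int) {
      halt("zippered iteration requires iterands of equal size, but ",
           "the set has ", setSize, " elements and its counterpart has ",
           otherSize);
    }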
@bradcray (Member) commented:

Capturing my thoughts after seeing a demo of this today:

  • I think it'd be preferable to have the set.these() iterator support this kind of zippering rather than to rely on a new iterator that takes the size explicitly
  • The case where a leader set is smaller than the follower seems like an instance of #11428 ("zippered forall loops with size mismatches can silently drop iterations on the floor") and a general problem with our leader-follower framework, rather than something we should worry about solving specifically for sets
    • this would cause a change in behavior for test/library/standard/Set/these/setZipDifferentSize.chpl, but I think that's acceptable since (a) it's a case that doesn't "work" today anyway (in the sense that it causes a halt) and (b) cases that ought to work today, like the one in #15824 ("Zippering of two sets can fail even when they are the same length"), don't. That is, I think the bug fix trumps the halt.
    • if a user wanted to zip two sets safely/correctly, I'd advise them to assert that the two set sizes are equal before doing the zippering, rather than using the .contents() iterator
  • I'd propose having the set's leader simply invoke the range's leader on 0..<size rather than recreating that logic (see the sketch below)
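
For example (a standalone sketch of that delegation idea, not Set's actual code; `delegated` and `n` are hypothetical stand-ins for the set's iterator and size):

    config const n = 10;

    // serial overload over the element positions
    iter delegated() {
      for i in 0..<n do yield i;
    }

    // leader: hand chunking off to the range's leader over 0..<n
    // instead of recreating its division logic
    iter delegated(param tag: iterKind) where tag == iterKind.leader {
      for followThis in (0..<n).these(iterKind.leader) do
        yield followThis;
    }

    // follower: likewise delegate to the range's follower
    iter delegated(param tag: iterKind, followThis)
        where tag == iterKind.follower {
      for i in (0..<n).these(iterKind.follower, followThis) do
        yield i;
    }

    writeln(+ reduce delegated()); // 45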

Some have argued that since a set yields its elements in an arbitrary order, sets shouldn't support zippering at all. I see that point (and started with that mindset myself), but I think the existence of #15824 suggests that people may want it anyway, and it also supports things like copying a set into an array when you don't care about ordering (or haven't cared until wanting to put it into the array).

@lydia-duncan (Member, Author) commented:

  • I'd propose having the set's leader simply invoke the range's leader on 0..<size rather than recreating that logic

Interestingly, it looks as though ranges distribute uneven chunks differently than I was expecting, and this can result in the current implementation missing values from the set.

For a range of 10 elements with 6 tasks, the chunks will be [0..1, 2..3, 4..4, 5..6, 7..8, 9..9] instead of [0..1, 2..3, 4..5, 6..7, 8..8, 9..9] as I was expecting, meaning one element of the set gets yielded twice and another gets dropped entirely. I need to double-check that this doesn't mean arrays/domains with the same size/tasks combination will encounter the same issue, and if so, adjust the follower accordingly.
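
To make the expected scheme concrete, the front-loaded division can be computed like this (a standalone snippet; the function name is illustrative):

    // front-loaded division of n elements over k tasks: the first
    // (n % k) tasks each take one extra element
    proc frontLoadedChunks(n: int, k: int) {
      const per = n / k, rem = n % k;
      var lo = 0;
      for tid in 0..<k {
        const size = per + (if tid < rem then 1 else 0);
        write(lo..#size, " ");
        lo += size;
      }
      writeln();
    }

    frontLoadedChunks(10, 6); // 0..1 2..3 4..5 6..7 8..8 9..9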

@bradcray (Member) commented:

Could you summarize the follower's approach using pseudocode?

@lydia-duncan (Member, Author) commented:

I think I need to limit the scope back down to only "sets can zip with sets". I did encounter the same issue with arrays of that same size/number of tasks and the attempts to solve it so far have been running into walls.

While I can use the range leader to get the chunk divisions when in the set/hashtable leader, trying to use it during the follower to determine how to divide up the slots gives a different answer (I'm pretty sure because when we're in the follower, the iterator understandably thinks there are a different number of tasks available to do parallel iteration). I'm not sure how to solve that without sending more information between the leader and the follower.

Additionally, while I was trying to do that, I encountered several issues when dealing with arrays of ranges. I'm going to write them up separately, but with the scope scaled back down, they won't be as relevant.

@lydia-duncan (Member, Author) commented Jan 29, 2025

Could you summarize the follower's approach using pseudocode?

Here's the word-problem description of it; I'll write up better pseudocode once I de-ice my driveway.

// determine the number of elements to return from the followThis argument (size of the range)

// extrapolate from that number the number of expected chunks (basically, solve for x in totalElements = elements in this chunk * x)
// based on where the followThis range starts, guess which chunk we are referring to in order to determine where to start looking for elements to return (divide the starting location by the chunk size to see where we start)

// then go through and determine the range of slots needed to find each chunk of elements to return. The setup when I made this PR basically assumed a known threshold point where the number of elements in a chunk would change (the idea for this was based on the parIters primer). So for 10 elements and 6 tasks, we know that 6 divides into 10 once, with a remainder of 4, so four tasks would take 2 elements and the remaining 2 would take 1 element.

// So iterate over the slots in the hashtable, storing the point at which each threshold would be matched (2, 4, 6, 8, 9), except for the last one which is known to be "end of the table". Front-loading the tasks that take more to the beginning allowed us to mathematically compute when to stop looking for a second element - once we have filled "remainder" number of slots, we will look for the lower number of elements.

// Finishing that iteration gives us the upper bound for each task's range of elements in an array, except for the last task (which again, will be the end of the table). E.g., for a hashtable with 25 slots, this could look like [2, 6, 8, 13, 17], which would give us ranges of 0..2, 3..6, 7..8, 9..13, 14..17, 18..24 to find the full slots.

// To determine which of those ranges this task should actually use, go back to our guess at the chunk based on the lower bound for the element number to return. If we are task 3 (when the tasks are 0 based), for instance, the followThis would be 6..7, the chunk size is 2, and 6 divided by 2 is 3, so we have determined which task we are and will return all elements in 9..13

(Edit: note that once we get to actual tasks 4 and 5, the chunk size would be 1. But since our starting location was expected to be mathematically computable and have a specific threshold where it switches, this won't pose a problem. A chunk size of 1 will lead to every element having their own subtask, and our starting element number would be 8 and 9, so we'd act as though we were task 8 and task 9, which are the last two chunks anyways).
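
The chunk-guessing arithmetic from that walkthrough, as a standalone snippet:

    // task 3's followThis in the 10-elements/6-tasks example
    const followThis = 6..7;
    const chunkSize = 2; // elements per (larger, front-loaded) chunk
    // dividing the chunk's low bound by the chunk size recovers which
    // chunk we are
    writeln(followThis.low / chunkSize); // 3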

@lydia-duncan (Member, Author) commented:

// So iterate over the slots in the hashtable, storing the point at which each threshold would be matched (2, 4, 6, 8, 9), except for the last one which is known to be "end of the table". Front-loading the tasks that take more to the beginning allowed us to mathematically compute when to stop looking for a second element - once we have filled "remainder" number of slots, we will look for the lower number of elements.

This is the part that needs better pseudocode.

// save the minimum number of elements for each task, and the number
// of tasks that will have one additional element
   const minPerTask = numElems / numTasks,
         numWithExtra = numElems % numTasks;

// make an array to store the upper bounds of all tasks except the
// last one (which will use the end of the hashtable as its upper bound)
   var upperBounds: [0..<numTasks-1] int;

   var boundPos = 0; // our position in the upper bounds array
   var numSeen = 0;  // the number of elements we've actually seen

   for i in `the whole hashtable` {
      // increment our number of seen elements, if we found one
      // (slotIsFull is a stand-in for the real fullness check)
      if slotIsFull(i) then numSeen += 1;

      // early exit the loop if we've filled the upper bounds array or
      // found the last element
      if boundPos >= upperBounds.size || numSeen == numElems then break;

      // tasks with an additional element are front-loaded, so the
      // threshold depends on whether this chunk is one of them
      const threshold =
        if boundPos < numWithExtra
          then (minPerTask + 1) * (boundPos + 1)
          else minPerTask * (boundPos + 1) + numWithExtra;

      // store the location where we met that threshold, and move our
      // position in the upper bounds array forward
      if numSeen == threshold {
        upperBounds[boundPos] = i;
        boundPos += 1;
      }
   }

We now have all the start and end points for tasks to find each element evenly: 0, each element in the upper bounds array, and the end of the hashtable's storage. E.g., for a hashtable with 25 slots, the upper bounds array could look like [2, 6, 8, 13, 17], which would give us ranges of 0..2, 3..6, 7..8, 9..13, 14..17, and 18..24 to find the full slots.

This reverts commit 09a4bca.

Partial change, but seemed like the natural way to do that part

----
Signed-off-by: Lydia Duncan <[email protected]>
- Replace uses of the now-defunct contents iterator with the these iterator
- Drop the tests that were explicitly only for contents (serial, standalone, and
  the error message test on the argument)
- Futurize the tests of zippering with arrays
- Add a future for zippering with a longer array
- Add two futures of zippering with ranges

Ranges and arrays don't work with the restored strategy (but they didn't always
work with the previous one, as demonstrated by the range and longer array tests)

----
Signed-off-by: Lydia Duncan <[email protected]>
withRange1's behavior is inconsistent

----
Signed-off-by: Lydia Duncan <[email protected]>
----
Signed-off-by: Lydia Duncan <[email protected]>
@lydia-duncan lydia-duncan marked this pull request as ready for review January 30, 2025 17:42
@lydia-duncan lydia-duncan changed the title from "Provide a way for sets to zipper with more things" to "Provide a way for sets to zipper with other sets" Jan 30, 2025
@bradcray (Member) commented:

To avoid making any sort of assumptions about what the leader is passing us / what we're zippering with, I'd suggest going for simplicity and correctness over performance in a first implementation for the set follower iterator. For example, if I'm asked to follow lo..hi, I'd start from the beginning of the set, iterate through "full" slots, counting them, until I hit slot # lo, then yield and continue until I hit slot # hi, then exit (if the leader's range is strided, this would also need to skip every 's' elements). This is obviously very serial, and worse, O(size) for the last follower, but since sets aren't distributed (so will only be following #cores tasks at most, typically), it may not be the end of the world for many cases. I may also be positing that many sets may not be super-huge.
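
As a toy, self-contained version of that counting approach (`slots` stands in for the hashtable's backing storage, and a non-strided followThis range is assumed):

    const slots = [true, false, true, true, false, true, true];

    // walk the occupied slots serially, counting them, and yield only
    // the indices whose ordinal position falls within r (= lo..hi)
    iter followFullSlots(r: range) {
      var count = 0;
      for i in slots.domain {
        if !slots[i] then continue;
        if count > r.high then break;   // past our chunk: early exit
        if count >= r.low then yield i; // within lo..hi: yield
        count += 1;
      }
    }

    for i in followFullSlots(1..2) do writeln(i); // 2, then 3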

For more scalable approaches, my mind goes to things like having some sort of directory structure within the set / hash table that (lazily?) maintains a rough sense of where items live, for example through a computed-on-demand map, a scan operation, or the like. This problem also seems quite similar to the kind of problem Jeremiah was wrestling with in the ParallelIO module: I want to read a file in parallel, but don't know what offset task # i of n should start at, since the "rows" of a file (say) aren't all of equal size. I can't recall what approach he took there, and I felt similarly that it was very challenging to parallelize, but I remember being convinced that what he ended up with was sound (I think? Though it worries me that I can't remember it).
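
On the scan idea, one hedged sketch: a prefix sum over slot fullness tells each slot how many full slots precede or include it, which a follower could search rather than counting from slot zero:

    const slots = [true, false, true, true, false, true];
    const fullness = [s in slots] (s: int);
    // counts[i] = number of full slots among slots[0..i]
    const counts = + scan fullness;
    writeln(counts); // 1 1 2 3 3 4
    // a follower starting at element position lo could search for the
    // first i with counts[i] == lo + 1 instead of walking from slot 0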

But again, I'd prefer to have it working correctly and flexibly first, and then worry about performance in a follow-on effort.

@benharsh (Member) left a comment:

The code as-written looks good to me, but I'm inclined to agree with Brad about focusing on simplicity and correctness.

@lydia-duncan (Member, Author) commented:

To avoid making any sort of assumptions about what the leader is passing us / what we're zippering with, I'd suggest going for simplicity and correctness over performance in a first implementation for the set follower iterator. [...]

Hmm, to be fair, that's both simpler and probably better performance than what I had implemented. I'll give it a shot and see where it gets me, thanks!

Brad pointed out that we could just count through the filled elements of the
hashtable until we reach the count specified by the `low` value, and then start
yielding.  This is more performant than what I had been doing, and it allows more
flexibility, enabling it to actually work with ranges and arrays.

The one drawback is that it does mean we don't detect when the leader is shorter
than the follower, but that's already true for arrays so we'll at least be
consistent with them.

----
Signed-off-by: Lydia Duncan <[email protected]>
- Updates the expected behavior for setZipDifferentSize.chpl and
arrayShorter.chpl
- Removes the .bad and .future files for the tests that now work
- Adds a .prediff I missed for the second range test because the test wasn't
working before
- Updates the expected output for withArrayLonger1.chpl, again because the test
wasn't working before

----
Signed-off-by: Lydia Duncan <[email protected]>
@lydia-duncan (Member, Author) commented:

Okay, it looks like that does result in us failing to detect when the leader is shorter than the follower, but we already said that was okay because it was consistent with what arrays do. Running a paratest now, but otherwise I think this is ready for another look (sorry Ben XD)

@lydia-duncan lydia-duncan changed the title from "Provide a way for sets to zipper with other sets" to "Provide a way for sets to zipper with more things" Jan 30, 2025
@lydia-duncan lydia-duncan requested a review from benharsh January 31, 2025 16:01