Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing setUnion in C++. #3970

Open
Xinlu-Y opened this issue Sep 20, 2024 · 4 comments
Open

Implementing setUnion in C++. #3970

Xinlu-Y opened this issue Sep 20, 2024 · 4 comments
Labels

Comments

@Xinlu-Y
Copy link
Collaborator

Xinlu-Y commented Sep 20, 2024

instance Set CppSrcCode where
contains = CP.containsInt cppIndex cppIterEnd
setAdd = CP.setMethodCall cppListAdd
setRemove = CP.setMethodCall cppListRemove
setUnion = error "not done yet"

At the moment, setUnion is not implemented in C++ yet. The standard way to compute a set union is via std::set_union, but this function does not modify sets in-place. Instead, it computes the union of two sets and stores the result in a third set.

Is there a recommended way in C++ to perform an in-place set union similar to Julia's union! or Java's addAll?
I found that using the insert method (e.g., a.insert(b.begin(), b.end());) can achieve an in-place union by adding elements from one set to another directly. To implement this, I need to iterate over the second set and insert each element into the first set. How can I achieve this iteration and insertion using a loop, such as ForEach, to go from b.begin() to b.end() and add the elements into set a?

@JacquesCarette
Copy link
Owner

I don't think it is a strong requirement to do set union in-place. I think std::set_union is just fine, and can be replaced at a later time with more imperative code if the need truly arises.

@balacij
Copy link
Collaborator

balacij commented Sep 27, 2024

Aside: out of curiosity, why does the definition of contains refer to something called containsInt? Are we currently only able to handle membership checking for integers?

C++ std::set_union has an interesting definition. Some notes:

  1. it expects that we give it two iterators for each set: one iterator starting from the first element (i.e., begin), and another that starts just past the last element (i.e., end),
  2. the return value is an output iterator that points an imaginary element past the last element in the new set,
  3. it expects an 'output iterator' (i.e., the data writer). I assume we'll need to use the same one for a set, specifically a std::inserter because that one preserves order of the set (note: sets in C++ are ordered). However, in order for us build an inserter, we need references to the set that we're inserting into. So, it looks like we need to have a variable pre-defined for each set union, so that we can insert into it. (Am I getting that right, @Xinlu-Y ?)

Some comments on each:

  1. The syntax is going to be noisy.
  2. We will struggle to generate code for expressions like $e \in (P \cup Q)$ because the hypothetical 'return' of $P \cup Q$ (as translated to set_union) will not be the set itself, but an iterator that points to one past the last element in the set.
  3. Again, the syntax is going to be noisy.

I think that (2) is a fundamental conflict with how we generate code. To generate code for $e \in (P \cup Q)$, we would need to move the entirety of $P \cup Q$ into an earlier expression that creates a temporary result set, performs the set union, and then we can continue to $e \in s$. It would mean that for each set union operation we have in an expression, we need to create a variable above wherever that expression is. That does not quite seem human-readable/traceable anymore. So, maybe we can write a little function that helps us polyfill the function with the right syntax we want? e.g.,

template <typename T>
std::set<T> union_sets(const std::set<T>& s1, const std::set<T>& s2) {
    std::set<T> result;
    std::set_union(s1.begin(), s1.end(), s2.begin(), s2.end(),
                   std::inserter(result, result.begin()));
    return result;
}

This function will allow us to get rid of some of the syntactic noise of the generated code. Furthermore, I noticed that we we use iterators for set containment as well (i.e., not very human-readable for our purposes), I think we can do the same for that as well, e.g.,:

template <typename T>
bool set_contains(const std::set<T>& s, T e) {
    return s.find(e) != s.end();
}

Wrapping everything together into a complete snippet:

#include <iostream>
#include <set>
#include <algorithm>
#include <iterator>

template <typename T>
std::set<T> union_sets(const std::set<T>& s1, const std::set<T>& s2) {
    std::set<T> result;
    std::set_union(s1.begin(), s1.end(), s2.begin(), s2.end(),
                   std::inserter(result, result.begin()));
    return result;
}

template <typename T>
bool set_contains(const std::set<T>& s, T e) {
    return s.find(e) != s.end();
}

int main() {
    std::set<int> s1 = {1, 2, 3, 4, 5};
    std::set<int> s2 = {4, 5, 6, 7, 8};

    int e = 5;

    if (set_contains(union_sets(s1, s2), e)) {
        std::cout << "Yes" << std::endl;
    } else {
        std::cout << "No" << std::endl;
    }

    return 0;
}

I think that this is more representative of the kind of code we want to generate. That being said, I'm taking a very biased stance in what kind of semantics I'm assuming we want out of the set API in GOOL. So, my question is: what do we want out of the set API in GOOL? What should these functions translate to in each of the languages, semantically?

For example, "set addition" is a rather uncommon thing (at least in my experience...) -- I'm not quite sure how we would even go about writing that, we usually would see something like $S \cup \set{e}$ (i.e., set union). Similarly "set removal" is uncommon too -- one would normally see $S \ \set{e}$ (i.e., set difference). For Expr and ModelExpr, I'm not sure if we should even have these operators. "Set addition" does make sense on the side of GOOL, however, because that usually relates to an imperative-statement, likely a 'void' or 'boolean' function/method, unlike in mathematics where the result of the operation would be another set. For example, in Java, set add returns a boolean, C# similarly, while in Python, s.add(x) returns nothing (None, technically).

We have a pipeline of expressions: Expr -> CodeExpr -> GOOL (/GProc now?). Now we need to figure out what 'set' means to each of these stages and what kinds of operations belong to each (and what those operations do).

@Xinlu-Y
Copy link
Collaborator Author

Xinlu-Y commented Sep 30, 2024

The function containsInt does indeed implement the contains functionality for sets in C++ using the find and end methods. It supports not only the Integer type but also other types that GOOL supports. Therefore, I believe it is just a naming typo.

containsInt :: (OORenderSym r) => Label -> Label -> SValue r -> SValue r -> SValue r
containsInt f fn s v = contains f s v ?!= IG.objAccess s (IG.func fn IC.bool [])

Currently, our basic set tests do not include functionality to test union operations. However, to ensure consistency across different languages, the union function in C++ should be translated in a way that aligns with its implementation in other languages.

  • In C#, map to the UnionWith method, which directly merges the elements of another set into the current set.
  • In Java, map to the addAll method, which adds all elements from another collection to the current set.
  • In Julia, map to the union! function, which modifies the first set in place, merging it with elements from the second set.
  • In Python and Swift, map to the union method, which returns a new set containing all elements from both sets.

Given this, to maintain semantic consistency across different languages, I think we need to implement a similar imperative-style union function for C++ as well. This function should behave like its counterparts in other languages, allowing direct modification rather than requiring the use of iterators or creating additional intermediate sets.

If we implement custom functions for set operations in C++, we align the behavior with that of other languages like C#, Java, Python, etc. This means that when we generate code for C++, these tests will work correctly just as they do in other languages, providing reliable and readable code.

  setDecDef (var "s" (setType int)) mainFn (litSet int [litInt 4, litInt 7, litInt 5]),
  valStmt (setAdd (valueOf (var "s" (setType int))) (litInt 10)),
  assert (contains (valueOf (var "s" (setType int))) (litInt 10))
    (litString "Set s should contain 10 after setAdd")
  setDecDef (var "t" (setType int)) mainFn (litSet int [litInt 3, litInt 5]),
  setDecDef (var "unionSet" (setType int)) mainFn (setUnion (valueOf (var "s" (setType int))) (valueOf (var "t" (setType int)))),
  assert (contains (valueOf (var "unionSet" (setType int))) (litInt 10))
    (litString "Set unionSet should contain 10 from set s"),

@balacij
Copy link
Collaborator

balacij commented Sep 30, 2024

Great!

Thank you! This clarifies a lot for me!

In that case, I think that Java deviates from the rest because addAll is a boolean-returning method, while the others return a reference to a new set (right?).

I think that generating these polyfill functions is the right way to go to keep the code looking the same across generated code bases. @JacquesCarette @smiths Do you have any thoughts on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants