Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added pipeline reduction by value #62

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

godefv
Copy link

@godefv godefv commented Oct 14, 2020

According to issue #33, here is a proof of concept of pipeline reduction by value.

Only pipes::fork and the new pipes::to_<Container> are currently returning pipelines, but it is very easy to implement new ones or adapt existing ones.

For now, I have done the simplest implementation of pipes::fork, which returns a std::tuple in which none values must be std::ignored. As discussed, this interface is not the only one possible, and I can adapt the code if another interface is chosen.

@joboccara
Copy link
Owner

This is an interesting implementation of your idea. For the moment I understand this is a POC.
Note that the build is failing on clang (example of one job: https://travis-ci.org/github/joboccara/pipes/jobs/735858359)

@godefv
Copy link
Author

godefv commented Oct 15, 2020

clang fails because void_t is c++17. I will fix it.

It is a POC because it is minimal (I haven't implemented the other requested reduce operations), and because the API for pipes::fork may change. Also, more test cases could be added.

I think that the discussion can progress further by having a working implementation at hand.

@fried-water
Copy link
Contributor

I was working on something similar (https://github.com/fried-water/pipes/commit/ce3181e1e84b02c605b06404a8d75265cf651d28). Seems like we arrived at the same basic idea. I spent a bit more time trying to minimize the moves and copies that occurred throughout. Additionally it seemed nice to have an extra overload to use the accumulator as an lvalue like so.

const std::vector<int> v = {1, 2, 3};
    
auto accumulator = pipes::accumulate(0, [](int acc, int ele) { return acc + ele; });

REQUIRE((v >>= accumulator) == 6);
REQUIRE((v >>= accumulator) == 12);

@godefv
Copy link
Author

godefv commented Oct 18, 2020

It is indeed very close. I can see that you have had the same problem as me with std::copy in operator.hpp.

Here are my comments on the few differences we have :

  1. The name sink() is less clear to me that the more explicit move_reduced_value_from().
  2. Returning the reduced value by copy when your pipeline is an lvalue is a design choice, which is not the behaviour I would expect. I would assign a pipeline to an lvalue if I want to use the same function several times, not to aggregate results of several calls. Moreover, the pipe style is usually for functional programming, so in my opinion it should be as stateless as possible.
  3. For some reason, I didn't want to put a default sink() in the base class, but it seems ok to do so.
  4. Using a custom type instead of std::ignore is good, but I would not name it void_t because this name is used for a different puprose in the standard library.
  5. to_<Container> is more generic than to_vector<T>
  6. Implementing to_<Container> with a call to accumulate looks like a good idea but, at least with your implementation, it causes two unnecessary moves of the Container per insertion.

About the additional moves, it is a separate topic. But I don't expect it to change anything in almost all use cases due to copy elision and the fact that most (all ?) rvalue pipelines don't have a useful move constructor. It doesn't seem to hurt either.

@fried-water
Copy link
Contributor

  1. Returning the reduced value by copy when your pipeline is an lvalue is a design choice, which is not the behaviour I would expect. I would assign a pipeline to an lvalue if I want to use the same function several times, not to aggregate results of several calls. Moreover, the pipe style is usually for functional programming, so in my opinion it should be as stateless as possible.

I would expect the move version to only be overloaded for rvalues, as moving an lvalue would be very unexpected as a user. At that point the only options with lvalues is either not compile or copy. Seems like there is no downside to offer the option.

  1. Using a custom type instead of std::ignore is good, but I would not name it void_t because this name is used for a different puprose in the standard library.

Yeah I need a different name.

  1. to_ is more generic than to_vector

Yeah I hadn't gotten around to properly sfinaeing based on insert().

  1. Implementing to_ with a call to accumulate looks like a good idea but, at least with your implementation, it causes two unnecessary moves of the Container per insertion.

Yeah that's a good point.

Overall I think its important to note that accumulator pipelines are no longer output iterators. Accumulators that return move-only values will no longer be copyable, and passing accumulators with types that aren't cheap to copy (eg. containers) to functions that expect cheap to copy iterators seems like a bad idea. That's why the std::copy() in operators had to be replaced.

@godefv
Copy link
Author

godefv commented Oct 18, 2020

Ok, the only point where our views differ is the move behaviour of returning pipelines.

You do

const std::vector<int> v = {1, 2, 3};
    
auto accumulator = pipes::accumulate(0, [](int acc, int ele) { return acc + ele; });

REQUIRE((v >>= accumulator) == 6);
REQUIRE((v >>= accumulator) == 12);

whereas I would do

const std::vector<int> v = {1, 2, 3};
    
auto accumulator = pipes::accumulate(0, [](int acc, int ele) { return acc + ele; });

REQUIRE((v >>= accumulator) == 6);
REQUIRE((v >>= accumulator) == 6);

because I view accumulator as a pipe version of a pure (stateless) function, whereas you view it as a stateful object.

In my view, I don't move from an lvalue, I return from a function.

They are actually two different end pipes (we could have both).

@fried-water
Copy link
Contributor

Right so in the more common rvalue case, it won't make a difference.

REQUIRE((v >>= pipes::accumulate(0, [](int acc, int ele) { return acc + ele; })) == 6);

I can see your point about the accumulator appearing stateless from the users perspective. But internally the accumulator has to store some state, hiding that would result in surprises when trying to use a const accumulator. It either won't compile or the internal state needs to marked mutable, which might lead to surprises if attempted to use in a multithreaded context.

const std::vector<int> v = {1, 2, 3};
const auto accumulator = pipes::to_<std::vector<int>>();

// This would be assumed to work if accumulator is indeed stateless
REQUIRE((v >> accumulator ) == v);

@godefv
Copy link
Author

godefv commented Oct 18, 2020

What you say is true. However, as a user, I don't see it as a big issue to do the following without the const.

const std::vector<int> v = {1, 2, 3};
auto functional_pipeline = pipes::transform(f) >>= pipes::to_<std::vector<int>>();

REQUIRE((v >>= functional_pipeline ) == v);

Like that, I don't hide that there is a state internally, and there is no surprise. But I can use it as a function.

If you want the const, you can use a lambda, which actually uses to the rvalue case.

const std::vector<int> v = {1, 2, 3};
auto const functional_pipeline = [](){retrun pipes::transform(f) >>= pipes::to_<std::vector<int>>()};

// This would be assumed to work if accumulator is indeed stateless
REQUIRE((v >>= functional_pipeline() ) == v);

This lead me to a new idea : making pipes::push_back return its reference to the container

const std::vector<int> v = {1, 2, 3};
REQUIRE((v >>= pipes::push_back(std::vector<int>{}) ) == v); // same as using pipes::to_<std::vector<int>>

auto const functional_pipeline = [f](std::vector<int>&& container){
    static auto const fixed_pipeline=pipes::transform(f);
    return  fixed_pipeline >>= pipes::push_back(std::move(container)); 
}; 
REQUIRE((v >>= functional_pipeline({}) ) == v); 

// same pipeline can be used to accumulate results like your accumulate
const std::vector<int> v2 = {4, 5, 6};
std::vector<int> results = {};
v  >>= functional_pipeline(results);
v2 >>= functional_pipeline(results);

auto stateful_pipeline = pipes::push_back(std::vector<int>{}); // forbidden, would cause a dangling reference

In general, you can always explicitly accumulate to the same result like above.
For me, implicitly accumulating where I don't expect it is where I will be surprised.

@fried-water
Copy link
Contributor

fried-water commented Oct 18, 2020

because I view accumulator as a pipe version of a pure (stateless) function

Fundamentally my problem is that this is not true in the current implementation. Otherwise pipes::to_<std::vector<std::unique_ptr<int>>>() would be copyable. Having the interface pretend otherwise seems like it will lead to problems, the inability to use const being one of them.

I wonder if it could be implemented in a stateless way somehow by having the accumulated value on the stack in the operator>>= function instead of in the accumulator pipe itself. I haven't thought through how that would work or if its even possible. But in that case the pipeline would be stateless (outside of storing the functions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants