Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework completely PE and lower2cff #95

Open
madmann91 opened this issue Aug 7, 2018 · 3 comments
Open

Rework completely PE and lower2cff #95

madmann91 opened this issue Aug 7, 2018 · 3 comments
Assignees

Comments

@madmann91
Copy link
Contributor

Right now, there is the PE phase that does partial evaluation and lower2cff at the same time. This has some issues, as pointed out before with issue #93. However, even with a fix for that, there is a more fundamental problem here. Consider the following code:

extern "C" { fn rand() -> i32; }

fn range(a: i32, b: i32, body: fn (i32) -> ()) -> () {
    if a < b {
        body(a);
        range(a + 1, b, body)
    }
}

fn something(pred: fn (i32) -> bool) -> bool {
    pred(rand()) && (something(pred) || something(pred))
}

extern fn boom(n: i32, res: &mut [bool]) -> () {
    for i in range(0, 8) {
        res(i) = something(@ |i| i < n);
    }
}

The function something uses a predicate function pred, which at the call site in function boom happens to be an anonymous function capturing n, a parameter from boom. Now the problem is that something can be specialized to that predicate without any issue (with the fix for issue #93), but it then makes it depend on the parameter n (because the predicate refers to n). No amount of specialization can solve that problem: What we need to do is to lift the function something from its scope after specialization, which means adding a parameter that takes a value for n. In order to do this, it makes sense to decouple PE from lower2cff (even though some code can be shared), since lower2cff requires more work.

Right now, this example hangs on both master and pe_fix. Note that this example is in no way contrived or rare. Such a pattern happens for every divide and conquer algorithm out there (it was encountered while coding a version of the quick sort algorithm).

@madmann91
Copy link
Contributor Author

Right now, people looking for a quick fix can implement such algorithms by managing the stack manually (creating a local mutable array for it). That's however clearly not a long term solution.

@leissa
Copy link
Member

leissa commented Sep 27, 2018

The eurollvm branch has a phase where this lifting is implemented. However, it was never mature enough to get merged into master. I will have a look at this again in the next couple of days. I'm currently in bug-fixing mode :)

@leissa leissa self-assigned this Sep 27, 2018
@madmann91
Copy link
Contributor Author

madmann91 commented Nov 2, 2018

There is also a subtle problem with the partial evaluator as it is now, which I thought might be of interest if you are working on this. The criteria of the partial evaluator that determine when to perform inlining/specialization are incorrect. The problem lies in ConEval::eval():

bool eval(size_t i, bool lower2cff) {
    // the only higher order parameter that is allowed is a single 1st-order fn-parameter of a top-level continuation
    // all other parameters need specialization (lower2cff)
    auto order = callee_->param(i)->order();
    if (lower2cff)
        if(order >= 2 || (order == 1
                    && (!callee_->param(i)->type()->isa<FnType>()
                        || (!callee_->is_returning() || (!is_top_level(callee_)))))) {
        DLOG("bad param({}) {} of continuation {}", i, callee_->param(i), callee_);
        return true;
    }

    return (!callee_->is_external() && callee_->num_uses() == 1) || is_one(instantiate(filter(i)));
}

The condition !callee_->is_external() && callee_->num_uses() == 1 is the culprit. Consider the following code:

fn @f(i: i32) -> () {
    fn g() -> () {
        println(i)
    }
    ($g)()
}
// ...
for i in unroll(0, 10) {
    f(i)
}

The resulting code should generate 10 different g functions and not inline g 10 times, which is what is happening here because g is only called only once and the criterion is hence verified. This issue is also in eta_conversion() (in the file transform/cleanup_world.cpp).

In general, the condition should be:
callee_->num_uses() == 1 && callee_->order() == 1

(i.e. this optimization is only valid for basic blocks)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants