Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelizing step3 #17

Open
matbesancon opened this issue Oct 16, 2021 · 8 comments
Open

Parallelizing step3 #17

matbesancon opened this issue Oct 16, 2021 · 8 comments

Comments

@matbesancon
Copy link

We're using Hungarian.jl in the https://github.com/ZIB-IOL/FrankWolfe.jl package to solve LPs over the Birkhoff polytope.

It scales well in terms of memory footprint but badly in terms of time, mostly because of step3!, where most of the time seems to be spent in this block:

    @inbounds for j in colUncoveredIdx, i in rowUncoveredIdx
        cost = costMat[i, j] + Δcol[j] + Δrow[i]
        if cost <= h
            if cost != h
                h = cost
                empty!(minLocations)
            end
            push!(minLocations, (i, j))
        end
    end

this would be non-trivial to parallelize because of the shared bound h and array minLocations but could be worth it.

@Gnimuc
Copy link
Owner

Gnimuc commented Oct 16, 2021

It scales well in terms of memory footprint but badly in terms of time, mostly because of step3!, where most of the time seems to be spent in this block:

I think the hot-pot depends on the problem size. Yes, for small matrices, step3 is the bottleneck, but I'm not sure how much performance gain we can benefit from multi-threading.

@Gnimuc
Copy link
Owner

Gnimuc commented Oct 16, 2021

Looks like I tried something similar years ago:

@threads for j in colUncoveredIdx

@matbesancon
Copy link
Author

matbesancon commented Oct 16, 2021

Do you get other crititcal points when matrix sizes grow larger? This was for a 1000x1000 dense matrix

@Gnimuc
Copy link
Owner

Gnimuc commented Oct 16, 2021

for ii in CartesianIndices(size(costMat))
# "consider a zero Z of the matrix;"
r, c = ii.I
cost = costMat[r, c] + Δrow[r] + Δcol[c]
if cost == 0
Zs[r, c] = Z
# "if there is no starred zero in its row and none in its column, star Z.
# repeat, considering each zero in the matrix in turn;"
if !colSTAR[c] && !rowSTAR[r]
Zs[r, c] = STAR
rowSTAR[r] = true
colSTAR[c] = true
row2colSTAR[r] = c
# "then cover every column containing a starred zero."
colCovered[c] = true
end
end
end

This marking step is critical for large matrices like 4000x4000 or 8000x8000. You can find some benchmark reports in #15.

@Gnimuc
Copy link
Owner

Gnimuc commented Oct 16, 2021

What's the problem size of your use case?

@matbesancon
Copy link
Author

the problem go up to npixels * npixels for image problems so as big as you want, 150528 * 150528 in some cases but we couldn't run Hungarian.jl for these sizes

@Gnimuc
Copy link
Owner

Gnimuc commented Oct 16, 2021

That's really large. Hungarian is an O(n^3) algorithm and may not suite for this kinda problem.

Would you like to make a PR with above-mentioned multi-threading support? I'm happy to merge if the benchmark reports are green.

BTW, this is released under public domain, so you could also make the code tailored to your use case and embed it in your package.

@matbesancon
Copy link
Author

That's really large. Hungarian is an O(n^3) algorithm and may not suite for this kinda problem.

Yes this is the kind of extreme limit we'd like to solve, it's okay to have the method work for smaller sizes of course.

I'll probably try to reimplement the multithreaded version and compare the two + a LP solver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants