Correctness issues and errors after sorting #110

Closed
tecosaur opened this issue Dec 11, 2024 · 2 comments · Fixed by #111

Hello,

I was recently surprised to find that when I sorted a column read from a CSV file, the numbers changed.

Using Julia 1.11 and the latest releases of SentinelArrays, CSV, and DataFrames, I executed:

CSV.read("my.csv", DataFrame).column |> unique! |> sort

This produced some really surprising values, and turned up some incredibly buggy behavior:

julia> cv = ChainedVector([rand(1:100, 20) for _ in 1:10])
[...]

julia> unique!(cv)
[...]

julia> sum(cv)
5073

julia> sum(sort(cv))
2236548265497175

If I keep running sum(sort(cv)), the result bounces around a bit, so it seems like something stochastic is happening?

julia> sum(sort(cv))
4399

julia> sum(sort(cv))
4378

julia> sum(sort(cv))
4400

Sometimes this produces an error instead of an incorrect result:

julia> sum(cv)
4414

julia> sum(sort(cv))
ERROR: ArgumentError: out of range arguments to copyto! on ChainedVector
Stacktrace:
 [1] copyto!(dest::ChainedVector{…}, doffs::Int64, src::ChainedVector{…}, soffs::Int64, n::Int64)
   @ SentinelArrays ~/.julia/packages/SentinelArrays/ob2QK/src/chainedvector.jl:475
 [2] copyto!
   @ ~/.julia/packages/SentinelArrays/ob2QK/src/chainedvector.jl:465 [inlined]
 [3] copymutable
   @ ./abstractarray.jl:1192 [inlined]
 [4] sort(v::ChainedVector{Int64, Vector{Int64}})
   @ Base.Sort ./sort.jl:1720
 [5] top-level scope
   @ REPL[39]:1
Some type information was truncated. Use `show(err)` to see complete types.
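
The session above relies on an implicit expectation: `sort` returns a sorted copy, so neither the length nor the sum may change, and the input must be left untouched. A minimal sketch of that check follows, using only Base functions. The helper name `sort_preserves_contents` is hypothetical (it is not part of SentinelArrays or Base), and it is demonstrated on a plain `Vector`; passing a `ChainedVector` instead is assumed to work because the helper only requires an `AbstractVector`.

```julia
# Hypothetical helper (not from SentinelArrays): checks properties that any
# correct out-of-place `sort` must preserve for a vector of numbers.
function sort_preserves_contents(v::AbstractVector)
    n, total = length(v), sum(v)   # snapshot before sorting
    s = sort(v)                    # `sort` works on a copy of `v`
    return length(s) == n &&       # no elements gained or lost
           sum(s) == total &&      # same values overall
           issorted(s) &&          # result is actually ordered
           length(v) == n &&       # input length unchanged
           sum(v) == total         # input contents unchanged
end

# Demonstration on a plain Vector, mirroring the unique!-then-sort steps above.
v = rand(1:100, 200)
unique!(v)
println(sort_preserves_contents(v))
```

Running the same helper on the `cv` from the report would be expected to return `false` or throw while the bug is present.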

laborg commented Dec 11, 2024

resize! does not work correctly on a ChainedVector. Shrinking a three-chunk vector to length 2 leaves an #undef element:

julia> resize!(ChainedVector([[1],[2],[3]]),2)
2-element ChainedVector{Int64, Vector{Int64}}:
   1
 #undef


laborg commented Dec 11, 2024

This line isn't correct. The last chunk of a chained vector shouldn't be resized to zero (probably an off-by-one error).
https://github.com/JuliaData/SentinelArrays.jl/blob/a2976cb2c63ecb8b2f1c2dff50975835abfd4b0c/src/chainedvector.jl#L549C5-L549C54
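
To make the suspected off-by-one concrete, here is a minimal sketch of shrinking a chunked vector, modeled as a plain `Vector` of chunks. This is not the SentinelArrays implementation and not the fix from the linked pull request; `shrink_chunks!` is a hypothetical helper that only illustrates the intended behavior: chunks that still hold kept elements stay intact, at most one chunk is truncated, and only chunks entirely past the new end are dropped.

```julia
# Hypothetical model: a "chained" vector stored as a Vector of chunks.
# NOT SentinelArrays' code, only a sketch of the shrinking logic.
function shrink_chunks!(chunks::Vector{Vector{T}}, len::Integer) where {T}
    len >= 0 || throw(ArgumentError("new length must be non-negative"))
    total = sum(length, chunks; init=0)
    len <= total || throw(ArgumentError("this sketch only shrinks"))
    remaining = len    # elements still to keep
    keep = 0           # number of chunks that survive
    for chunk in chunks
        remaining == 0 && break
        keep += 1
        if length(chunk) > remaining
            resize!(chunk, remaining)   # truncate the chunk holding the new end
            remaining = 0
        else
            remaining -= length(chunk)  # chunk is kept whole
        end
    end
    resize!(chunks, keep)   # drop only the chunks past the new end
    return chunks
end

chunks = [[1], [2], [3]]
shrink_chunks!(chunks, 2)
println(reduce(vcat, chunks))
```

With the three single-element chunks from the example above, the second chunk is kept whole and only the third is dropped, so no element is left undefined.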
