Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add PaddedDiskArray and pad #219

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

add PaddedDiskArray and pad #219

wants to merge 1 commit into from

Conversation

rafaqz
Copy link
Collaborator

@rafaqz rafaqz commented Jan 27, 2025

Ive had this use case a few times that you want to pad a disk array for windowing or fixing chunk alignment, and you don't want to make a copy of the file.

This PR adds a PaddedDiskArray wrapper and a DiskArrays.pad function that pads any disk array by custom amounts above and below the original array on each axis. The fill value for the padded area can be set with the fill keyword, which is zero by default.

Chunk offsets are updated to match the padding

@rafaqz rafaqz requested a review from meggart January 27, 2025 19:10
Copy link
Collaborator

@meggart meggart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this, this would be useful indeed.


function _pad_offset(c::RegularChunks, (low, high), s)
chunksize = c.cs
offset = low > 0 ? c.offset - low + chunksize : c.offset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if low > chunksize so we pad more than a chunk. I think then we would arrive at negative offsets.

size = c.s + low + high
return RegularChunks(chunksize, offset, size)
end
function _pad_offset(c::IrregularChunks, (low, high), s)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not really obvious to me what the code should do here. Currently this adds an extra chunk at the beginning for the padded values. Should the values padded at the end be in a new chunk as well? Or should padded values simply be lumped into the first and last chunk. However, something is still wrong with the implementation, see this example:

c2 = IrregularChunks(chunksizes = [10, 10, 20, 30, 40])
_pad_offset(c2, (5,5), 100)

which returns chunks only up to 105 while the resulting array should have length 120.

end

readblock!(A::PaddedDiskArray, data, I...) = _readblock_padded!(A, data, I...)
readblock!(A::PaddedDiskArray, data, I::AbstractVector...) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why this is necessary, are the fallbacks too slow here?

return if all(map(<=, fs, ls))
Idata = map(:, fs, ls)
Iparent = map(getindex, Ipadded, Idata)
@show Idata Iparent
Copy link
Collaborator

@meggart meggart Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@show should be removed

i .- low
end
fs = map(axes(parent(A)), Ipadded) do a, ip
searchsortedfirst(ip, first(a))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes indices I are sorted, so this readblock implementation should really only be called for I::AbstractUnitRange and not for AbstractVector

testdata = copy(regions) .= 999
testdata[1:10, 1:10, 1:1] .= ch[91:100, 111:120, 4:4]
testdata
@test regions == testdata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the following tests for _pad_offset:

c1 = RegularChunks(10, 0, 100)
_pad_offset(c1, (5,2), 100) == RegularChunks(10,5,107)
_pad_offset(c1, (30,2), 100) == RegularChunks(10,0,132)

c2 = IrregularChunks(chunksizes = [10, 10, 20, 30, 40])
#The following test would assume padding ends up in a separate chunk:
_pad_offset(c2, (5,5), 100) == IrregularChunks(chunksizes = [5,10, 10, 20, 30, 40,5])

@meggart
Copy link
Collaborator

meggart commented Feb 12, 2025

@rafaqz do you need support for fixing some of the chunk issues?

@rafaqz
Copy link
Collaborator Author

rafaqz commented Feb 12, 2025

Should be ok, just haven't had time to get back to it yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants