-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add PaddedDiskArray
and pad
#219
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this, this would be useful indeed.
|
||
function _pad_offset(c::RegularChunks, (low, high), s) | ||
chunksize = c.cs | ||
offset = low > 0 ? c.offset - low + chunksize : c.offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if low > chunksize
so we pad more than a chunk. I think then we would arrive at negative offsets.
size = c.s + low + high | ||
return RegularChunks(chunksize, offset, size) | ||
end | ||
function _pad_offset(c::IrregularChunks, (low, high), s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not really obvious to me what the code should do here. Currently this adds an extra chunk at the beginning for the padded values. Should the values padded at the end be in a new chunk as well? Or should padded values simply be lumped into the first and last chunk. However, something is still wrong with the implementation, see this example:
c2 = IrregularChunks(chunksizes = [10, 10, 20, 30, 40])
_pad_offset(c2, (5,5), 100)
which returns chunks only up to 105 while the resulting array should have length 120.
end | ||
|
||
readblock!(A::PaddedDiskArray, data, I...) = _readblock_padded!(A, data, I...) | ||
readblock!(A::PaddedDiskArray, data, I::AbstractVector...) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why this is necessary, are the fallbacks too slow here?
return if all(map(<=, fs, ls)) | ||
Idata = map(:, fs, ls) | ||
Iparent = map(getindex, Ipadded, Idata) | ||
@show Idata Iparent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@show
should be removed
i .- low | ||
end | ||
fs = map(axes(parent(A)), Ipadded) do a, ip | ||
searchsortedfirst(ip, first(a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes indices I
are sorted, so this readblock implementation should really only be called for I::AbstractUnitRange and not for AbstractVector
testdata = copy(regions) .= 999 | ||
testdata[1:10, 1:10, 1:1] .= ch[91:100, 111:120, 4:4] | ||
testdata | ||
@test regions == testdata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the following tests for _pad_offset:
c1 = RegularChunks(10, 0, 100)
_pad_offset(c1, (5,2), 100) == RegularChunks(10,5,107)
_pad_offset(c1, (30,2), 100) == RegularChunks(10,0,132)
c2 = IrregularChunks(chunksizes = [10, 10, 20, 30, 40])
#The following test would assume padding ends up in a separate chunk:
_pad_offset(c2, (5,5), 100) == IrregularChunks(chunksizes = [5,10, 10, 20, 30, 40,5])
@rafaqz do you need support for fixing some of the chunk issues? |
Should be ok, just haven't had time to get back to it yet |
Ive had this use case a few times that you want to pad a disk array for windowing or fixing chunk alignment, and you don't want to make a copy of the file.
This PR adds a
PaddedDiskArray
wrapper and aDiskArrays.pad
function that pads any disk array by custom amounts above and below the original array on each axis. The fill value for the padded area can be set with thefill
keyword, which is zero by default.Chunk offsets are updated to match the padding