You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
11bad6d
The above commit introduced skip logic in internal/block_reader.go. The logic has 2 issues.
Issue 1
The skip logic manipulates the data stream without updating the offset of the stream. This causes the stream and the offset to deviate from each other in specific cases, leading to incorrect bytes being served from the hdfs data read stream.
Repro
Reproing the issue requires a pathological read case. It requires 3 reads in a specific sequence to trigger the issue.
t0: read 10 bytes from file offset 0. File offset: 10, br.offset: 10, br.stream: 10
t1: read 10 bytes from file offset 8192. File offset: 8202, br.offset: 20, br.stream: 8202 ← stream and offset have diverged.
t2: read 10 bytes from file offset 4096. File offset: 4106, br.offset: 30, br.stream: 8212 ← Incorrect Read
If the skip is greater than 64K the skip logic is not invoked.
Fix Proposal:
update the stream offset as well.
refactor the block_reader.go stream to stream struct / interface so that the stream read and offset are atomic.
Issue 2:
amountToSkip is double decremented. Both by caller and in the skip function.
// Skip attempts to discard bytes in the stream in order to skip forward. This
// is an optimization for the case that the amount to skip is very small. It
// returns an error if skip was not attempted at all (because the BlockReader
// isn't connected, or the offset is out of bounds or too far ahead) or the seek
// failed for some other reason.
func (br *BlockReader) Skip(off int64) error {
blockSize := int64(br.Block.GetB().GetNumBytes())
amountToSkip := off - br.Offset --> here the code is double decrementing the
amountToskip
if br.stream == nil || off < 0 || off >= blockSize ||
amountToSkip < 0 || amountToSkip > maxSkip { // maxSkip : 65536
return errors.New("unable to skip")
}
///// -----
// Code is reading from br.stream but NOT updating br.Offset.
///// -----
_, err := io.CopyN(io.Discard, br.stream, amountToSkip)
if err != nil {
if err == io.EOF {
err = io.ErrUnexpectedEOF
}
br.stream = nil
br.datanodes.recordFailure(err)
}
return err
}
The text was updated successfully, but these errors were encountered:
Hi @mpaliwal3, thanks for the report. Do you think you could write a test that reproduces at least the first issue? That would make debugging much easier.
11bad6d
The above commit introduced skip logic in internal/block_reader.go. The logic has 2 issues.
Issue 1
The skip logic manipulates the data stream without updating the offset of the stream. This causes the stream and the offset to deviate from each other in specific cases, leading to incorrect bytes being served from the hdfs data read stream.
Repro
Reproing the issue requires a pathological read case. It requires 3 reads in a specific sequence to trigger the issue.
t0: read 10 bytes from file offset 0. File offset: 10, br.offset: 10, br.stream: 10
t1: read 10 bytes from file offset 8192. File offset: 8202, br.offset: 20, br.stream: 8202 ← stream and offset have diverged.
t2: read 10 bytes from file offset 4096. File offset: 4106, br.offset: 30, br.stream: 8212 ← Incorrect Read
If the skip is greater than 64K the skip logic is not invoked.
Fix Proposal:
Issue 2:
amountToSkip is double decremented. Both by caller and in the skip function.
The text was updated successfully, but these errors were encountered: