-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51110][CORE][SQL] Proper error handling for file status when reading files #49833
base: master
Are you sure you want to change the base?
Conversation
} catch { | ||
case e: Exception => | ||
logError(s"Failed to filter out path names from ${path.toString}", e) | ||
throw SparkException.internalError(s"Unexpected statuses for path ${path.toString}", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can users face to the error? If so, we shouldn't use internalError
. How do you repro the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can users face to the error? If so, we shouldn't use
internalError
. How do you repro the issue?
No, users cannot face this error because it is an internal error at the spark internal layer, hence the use of "internalError". It is difficult to repro, as multiple attempts have failed to repro the issue. What would be a suitable recommendation to address this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be a suitable recommendation to address this issue?
Could you try a fake filesystem, see
spark/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
Lines 1279 to 1288 in f5f7c36
class FakeFileSystemSetPermission extends LocalFileSystem { | |
override def setPermission(src: Path, permission: FsPermission): Unit = { | |
throw new IOException(s"fake fileSystem failed to set permission: $permission") | |
} | |
} | |
class FakeFileSystemNeverExists extends DebugFilesystem { | |
override def exists(f: Path): Boolean = false | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done~
What changes were proposed in this pull request?
Add clear error messages for reading a file
Why are the changes needed?
For these errors, we report ambiguous errors now. For example, sometimes Spark job will report NPE (Null Pointer Exception) as follows:
but users do not know which specific file caused it.
Does this PR introduce any user-facing change?
Yes, a meaningful error is given when visiting an unknown error file.
How was this patch tested?
manually test
Was this patch authored or co-authored using generative AI tooling?
No