Out of range error #120
Comments
It would help if you could share a sample file that can be used to replicate the issue.
I wouldn't be surprised if it is some sort of weird character in one of the strings.
Thanks. I see what the issue is. Will work out a fix shortly.
There may be trailing bytes in bit-packed encoded data. The reader should be able to skip those while reading bit-packed runs. Fixes #120.
For some reason, one extra byte was present in the encoding of one of the columns, which was causing the reading to go out of whack. Should be fixed by #123.
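To illustrate the kind of problem described above, here is a minimal Python sketch of decoding Parquet's RLE/bit-packed hybrid encoding while tolerating padding and trailing bytes. It is not the code from #123 and not this package's implementation; the function names `read_uleb128` and `decode_hybrid` are hypothetical, and it assumes the caller already knows the bit width and the number of values expected. The idea is to stop after the expected number of values and to always advance past the full byte length of a bit-packed run, so leftover bytes never get misread as a new run header.

```python
def read_uleb128(buf, pos):
    """Read an unsigned LEB128 varint from buf starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_hybrid(buf, bit_width, num_values):
    """Decode up to num_values from RLE/bit-packed hybrid data.

    Hypothetical helper for illustration. Returns (values, pos), where pos
    is the offset just past the last run consumed. Bytes after that offset
    are treated as trailing data and ignored.
    """
    if bit_width < 1:
        raise ValueError("this sketch assumes bit_width >= 1")
    values = []
    pos = 0
    mask = (1 << bit_width) - 1
    while len(values) < num_values and pos < len(buf):
        header, pos = read_uleb128(buf, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values each,
            # packed least-significant bit first.
            groups = header >> 1
            run_bytes = groups * bit_width
            chunk = buf[pos:pos + run_bytes]
            bits = int.from_bytes(chunk, "little")
            available = (len(chunk) * 8) // bit_width
            wanted = min(groups * 8, available, num_values - len(values))
            for i in range(wanted):
                values.append((bits >> (i * bit_width)) & mask)
            # Skip the whole run, including padding values we did not need.
            pos += run_bytes
        else:
            # RLE run: one value repeated (header >> 1) times,
            # stored in ceil(bit_width / 8) little-endian bytes.
            count = header >> 1
            width_bytes = (bit_width + 7) // 8
            value = int.from_bytes(buf[pos:pos + width_bytes], "little")
            pos += width_bytes
            values.extend([value] * min(count, num_values - len(values)))
    return values, pos
```

For example, `decode_hybrid(bytes([0x03, 0x88, 0xC6, 0xFA, 0x00]), 3, 8)` returns `([0, 1, 2, 3, 4, 5, 6, 7], 4)`: the eight values come from the single bit-packed group, and the extra fifth byte is left unread instead of being parsed as another run header.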
Okay, that's probably what's causing issue #122 as well. Can you tell me when it gets released so I can test it on some other files? By the way, I want to thank you guys for all the work you're doing on this project. Working with Parquet files has been a real pain point for me in Julia; I currently rely on PyCall to call fastparquet, which sometimes segfaults if the versions have LLVM conflicts and always segfaults if I try any kind of parallelism. I've noticed huge improvements in the package's usability over the past 9 months.
👍
I couldn't open a parquet file, so I read it in Python with fastparquet and split it up into smaller files, and I can read every file except the second one.
I could open the parquet file:
But I couldn't even start a cursor on it, and the error message doesn't even tell me where the error in the file is. I suspect it's within one of the columns that uses strings.