[BUGFIX] .txt file with commas in URLs #342

Open
wants to merge 1 commit into
base: main


@bkj bkj commented Aug 23, 2023

The current implementation seems to fail when the URLs in a .txt input file have commas in them. This modification seems to fix the bug.

(Disclaimer: I am not 100% sure I am passing data the way that's intended ... if not, my mistake and please correct me!)

@clairej12

I am also still having this issue. I get a CSV parse error like the one below if the URL contains one or more commas:
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, got 2: http://1.bp.blogspot.com/-xf8FZNbm-O4/UVbWL6XBdOI/AAAAAAAAFYs/1IPtSSmYZiI/s640/Big+sage,+Lantana ...
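
For illustration, here is a minimal sketch of why a comma in a URL trips a comma-delimited parser. It uses Python's standard `csv` module rather than pyarrow, and the URL is a made-up placeholder, so treat it as an analogy to the error above and not as the code path in the library itself:

```python
import csv
import io

# Hypothetical URL containing a comma (not taken from the report above).
line = "http://example.com/images/Big+sage,+Lantana.jpg"

# With the default "," delimiter the single URL is split into two fields,
# which is what "Expected 1 columns, got 2" is complaining about.
comma_fields = next(csv.reader(io.StringIO(line)))

# With a delimiter that does not occur in the URL, the line stays whole.
tab_fields = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(comma_fields))
print(len(tab_fields))
```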

@rom1504
Owner

rom1504 commented Oct 8, 2023

let's instead make the separator be something that never occurs

@MaxyLee

MaxyLee commented Nov 4, 2024

Got the same issue. I changed the delimiter from ',' to '\t' and it works (line 100 of reader.py):

df = csv_pa.read_csv(file, read_options=csv_pa.ReadOptions(column_names=["url"]), parse_options=csv_pa.ParseOptions(delimiter="\t"))

Projects
Status: Waiting for user input
4 participants