json and jsonb support? #4

Open
sayre1000 opened this issue Apr 3, 2024 · 3 comments
Comments

@sayre1000

Hello, this gem is fantastic and cuts out much of the work that needs to be done! My only concern is that, for our use case, we have a significant number of json and jsonb columns that we'd like to convert. We're fine with them being converted to strings, since Parquet has no real native support for json or jsonb anyway.

Is there any plan to add this functionality in the future?

@kou
Member

kou commented Apr 4, 2024

Could you share an example of this?
I'd like to know what the expected behavior is.

@sayre1000
Author

sayre1000 commented Apr 4, 2024

Sure. I have an Active Record model with a column of type jsonb; let's call this column data.

The column on the records Rails returns is therefore of type jsonb. With the current implementation, converting a collection of these records to an Arrow table fails with an error about jsonb being an unsupported data type.

On a practical level, these jsonb entries can be represented as serialized JSON strings, so I believe it would make sense to treat them as strings when converting from Active Record to an Arrow table. (The other option would be to treat them as structs, which would require passing in a schema to define the struct.)

To this end, I believe the only change needed is in the extract_arrow_data_type method in arrowable.rb:

remove line 56:
# when :json
and alter line 57:

when :string, :text, :json, :jsonb
  :string
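
For illustration, the proposed branch could look roughly like the sketch below. This is a standalone stand-in, not the gem's actual source: the method name sketch_arrow_data_type and the error raised in the else branch are hypothetical, and only the when line comes from the proposal above.

```ruby
# Hypothetical stand-in for the gem's type mapping; the real
# extract_arrow_data_type method is not reproduced here.
def sketch_arrow_data_type(column_type)
  case column_type
  when :string, :text, :json, :jsonb
    # json/jsonb fall through to the same Arrow type as plain strings.
    :string
  else
    # Assumption: unmapped types raise; the gem's real error may differ.
    raise ArgumentError, "unsupported data type: #{column_type.inspect}"
  end
end

sketch_arrow_data_type(:jsonb)
```

With this mapping, a jsonb column is treated exactly like a text column, and any other unmapped type still fails loudly.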

If my understanding of this gem is right, it will then be able to rely on Rails' underlying functionality to serialize the column into strings when
records = relation.pluck(*target_column_names)
runs.

I would expect this to create an Arrow table in which the data column is present as an ArrowString type, instead of raising an unsupported data type error.
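
One point the thread does not confirm is whether the plucked values arrive as strings or as already-parsed Ruby objects (Hash or Array). A minimal sketch of a defensive conversion, assuming either can occur, might look like this. The helper name sketch_json_cell_to_string and the sample rows are hypothetical.

```ruby
require "json"

# Hypothetical helper: turn one plucked json/jsonb value into a String,
# leaving nil (SQL NULL) and existing strings untouched.
def sketch_json_cell_to_string(value)
  case value
  when nil then nil
  when String then value
  else JSON.generate(value) # Hash, Array, numbers, booleans
  end
end

# Made-up sample values standing in for one plucked column.
rows = [{ "a" => 1 }, '{"b":2}', nil, [1, 2]]
strings = rows.map { |value| sketch_json_cell_to_string(value) }
```

Every non-nil element of strings is then a String, which is the shape a string-typed Arrow column would need.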

@kou
Member

kou commented Apr 8, 2024

Thanks.

Do you want to open a PR for the approach?
We may add more metadata for json/jsonb cases later (e.g. Apache Arrow supports extension types https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types and field-level custom metadata https://arrow.apache.org/docs/format/Columnar.html#schema-message ), but we can just map json/jsonb to :string for now.
