ENH: json_normalize should work with JSON #61006

Open
2 of 3 tasks
vdwees opened this issue Feb 25, 2025 · 0 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish pd.json_normalize accepted JSON documents (as str or bytes), and not just already-parsed dicts.

Or, as a joke, there could be a pd.dict_normalize that only accepts JSON ;)

Feature Description

Given a Series with JSON as str or bytes:

>>> df["data"]
0                  {"value":0.0}
1          {"value":0.005787037}
2         {"value":0.0115740741}
3         {"value":0.0173611111}

It should be possible to parse the JSON with pd.json_normalize, e.g.

>>> pd.json_normalize(df["data"])
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361

Pandas already has good JSON integration, so I don't see why it can't be done.
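To make the request concrete, here is a minimal sketch of the behavior as a user-side wrapper. `normalize_json_strings` is a hypothetical name, not a pandas function, and this is not a proposal for how pandas should implement it internally. It parses any str/bytes elements with the standard-library `json.loads` and passes already-parsed dicts through unchanged.

```python
import json

import pandas as pd


def normalize_json_strings(data, **kwargs):
    """Hypothetical helper: decode str/bytes JSON documents, then normalize.

    Elements that are not str/bytes (e.g. dicts) are passed through as-is.
    Extra keyword arguments are forwarded to pd.json_normalize.
    """
    records = [
        json.loads(item) if isinstance(item, (str, bytes, bytearray)) else item
        for item in data
    ]
    return pd.json_normalize(records, **kwargs)


# Mixed str and bytes input, mirroring the Series in the example above.
df = pd.DataFrame({"data": ['{"value":0.0}', b'{"value":0.005787037}']})
result = normalize_json_strings(df["data"])
```

Because the sketch builds a plain list before normalizing, the result gets a fresh default index rather than the index of the input Series.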

Alternative Solutions

From what I understand, right now the JSON must first be parsed with some other library (e.g. json.loads via Series.apply) before calling pd.json_normalize.

>>> import json
>>> pd.json_normalize(df["data"].apply(json.loads))
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361
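Another workaround, sketched here under assumptions, is to join the column into a single JSON array so the parser is called once instead of once per row. This assumes every element is a non-null str holding one valid JSON document. Whether it is actually faster than the apply version has not been measured here.

```python
import json

import pandas as pd

data = pd.Series(['{"value":0.0}', '{"value":0.005787037}'])

# Wrap the documents in "[...]" so the whole column is one JSON array,
# then parse it with a single json.loads call.
records = json.loads("[" + ",".join(data) + "]")
batched = pd.json_normalize(records)

# The per-row version from the issue, for comparison.
per_row = pd.json_normalize(data.apply(json.loads).tolist())
```

Both frames hold the same values. The batched form fails on missing values or bytes elements, so it only suits columns that are known to be clean.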

Additional Context

With better JSON/JSONB support in databases like PostgreSQL and SQLite, encountering this sort of data is becoming more common, and the intermediate apply step is both a performance and a usability issue:

>>> import json
>>> df = pd.read_sql(sql=query, con=conn)
>>> pd.json_normalize(df["data"].apply(json.loads))
            value
0        0.000000
1        0.005787
2        0.011574
3        0.017361
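For anyone who wants to reproduce the database case, here is a self-contained sketch using an in-memory SQLite database. The table name `readings` and its columns are made up for illustration, and the JSON is stored as plain TEXT.

```python
import json
import sqlite3

import pandas as pd

# Hypothetical table holding one JSON document per row, stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO readings (data) VALUES (?)",
    [('{"value":0.0}',), ('{"value":0.005787037}',)],
)

df = pd.read_sql(sql="SELECT id, data FROM readings ORDER BY id", con=conn)

# The intermediate step this issue would like to avoid.
values = pd.json_normalize(df["data"].apply(json.loads).tolist())

# Put the flattened columns back next to the row ids.
result = df[["id"]].join(values)
conn.close()
```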