Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Expr.to_dicts() #10697

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Jan 21, 2025

fixes #9185

I hope that the name to_dicts won't lead people to think this returns a dict[column_name, list[values]. I don't think it will. This is why I included the Column.to_list() method in the "See Also" section. I can be more explicit there if you think this is good. I considered the name to_records, but I like to_dicts more. Open ot other ideas.

I'm not sure if we want to sneak this in for 10.0, the implementation doesn't look that hard to review, but the increased API is really what to watch out for here I think.

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jan 21, 2025
@NickCrews NickCrews added the docs-preview Add this label to trigger a docs preview label Jan 21, 2025
@ibis-docs-bot ibis-docs-bot bot removed the docs-preview Add this label to trigger a docs preview label Jan 21, 2025
@NickCrews NickCrews added feature Features or general enhancements format Issues related to output formats labels Jan 21, 2025
@ibis-docs-bot
Copy link

ibis-docs-bot bot commented Jan 21, 2025

@NickCrews NickCrews force-pushed the to-dicts branch 2 times, most recently from f375858 to 777153a Compare January 21, 2025 21:16
@@ -586,6 +586,44 @@ def to_delta(
with expr.to_pyarrow_batches(params=params) as batch_reader:
write_deltalake(path, batch_reader, **kwargs)

@util.experimental
def to_dicts(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but maybe call it to_records()?

@cpcloud
Copy link
Member

cpcloud commented Jan 22, 2025

One of the issues with methods like these (to_list, to_dicts, to_records) is that their API tends to add support for different shapes by stuffing in a bunch of non-mutually exclusive arguments.

Can we a pick a name for this that is both concise and mostly-unambiguously indicates the shape of the dictionary/list-of-dictionaries?

@cpcloud
Copy link
Member

cpcloud commented Jan 22, 2025

Here are my current thoughts:

  • to_dicts: implies a sequence because of the pluralization, pigeonholes the output type in perpetuity to literal python dictionaries
  • to_records: implies a mapping or something like named tuple (a thing with fields), leaves the individual record type open.

@NickCrews
Copy link
Contributor Author

I went through the same train of thought as Phillip and that is why I chose to_dicts after considering to_records. Phillip are you saying you are happy with to_dicts or do you see a problem with it?

The two basic shapes I see are Iterable[mapping] (this PR) and Iterable[Iterable] such as an Iterable of tuples or named tuples. Is there another basic shape we might want to support?

I DEFINITELY want different methods for the different shapes. Eg avoid pandas chaos where their to_records() gives you totally different things depending on what you pass to orient. But I'm not sure if I think we need a single method for every concrete type. Eg do we want to_tuples() and to_namedtuples(), or to_rows(record_type=tuple | namedtuple)?
I think I am more of a fan of splitting them into separate methods. One reason is that namedtuples blue the line between being an Iterable and being a mapping.

@cpcloud
Copy link
Member

cpcloud commented Jan 24, 2025

Phillip are you saying you are happy with to_dicts or do you see a problem with it?

The only problem I can see with to_dicts() is that if we ever want to return something other than a list of dicts then we have to implement a new method.

to_records slightly hedges against that possibility by using a more abstract noun.

On the other hand that may never be an issue.

@NickCrews
Copy link
Contributor Author

I could see how in the future we might want to return a Iterable[tuple]. However, I would NEVER want us to have one function that returns either Iterable[Mapping] or Iterable[tuple], they should be two different functions. So what should they be called? I worry that to_records could be understood as returning EITHER of these, so I think we should be more explicit, and avoid that name. I think my favorites at this point are the explicit to_dicts() and to_tuples(), and make them typed as returning the concrete type. If we add options for eg other Mapping-likes, then we @typing.overload them with those other concrete options.

We COULD call it .to_mappings(), but that just feels not idiomatic, I think that would be more confusing than helpful.

@cpcloud
Copy link
Member

cpcloud commented Feb 20, 2025

If we add options for eg other Mapping-likes, then we @typing.overload them with those other concrete options.

I'm not sure that overloading on return type makes sense. How would a type system be able to determine which overload was meant?

@NickCrews
Copy link
Contributor Author

Sorry, I think I over-explained that, I'm just thinking of normal overloading

# or could use the type dict instead of the string literal "dict"
@typing.overload
def to_dicts(self, *, cls: Literal["dict"]) -> Iterable[dict[str, Any]]: ...

@typing.overload
def to_dicts(self, *, cls: Literal["mydict"]) -> Iterable[MyDict[str, Any]]: ...

def to_dicts(self, *, cls = "dict"): ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements format Issues related to output formats tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Table.to_dicts()
3 participants