
March 4, 2025: This week(s) in DataFusion #15005

alamb opened this issue Mar 4, 2025 · 0 comments




Introduction

This ticket is my weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things I may have missed or that you think deserve wider attention from the community.

Community Highlights

Releases!

Performance

DataFusion's core value proposition is great performance without having to re-implement it yourself.

Quality

Testing

Bug Fixes

DataFusion is in the "we are finding all the corner case bugs now" phase of its life, and people are now bashing them down.

Docs

Build time

Cleanups 🧹

Features

Features under way

Better Out of Core Support

In general, DataFusion is getting better at handling datasets that are larger than can fit in memory.

We can have nice things! (Explain plans)


> explain select * from t1 inner join t2 on t1.i=t2.i;

+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| logical_plan  | Inner Join: t1.i = t2.i                                    |
|               |   TableScan: t1 projection=[i]                             |
|               |   TableScan: t2 projection=[i]                             |
| physical_plan | ┌───────────────────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       ├──────────────┐               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │    partition_sizes: [0]   ││       partitions: 1       │ |
|               | │       partitions: 1       ││    partition_sizes: [0]   │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
2 row(s) fetched.

Better Error Messages

@eliaperantoni is working with various contributors to make the error messages better. This work is tracked in

Misc

Looking to get more involved? Please help review code! 🎣

DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community, and grow your own knowledge. Researching and understanding the code well enough to review PRs also often inspires ideas for further improvements.

We have docs about reviews. TL;DR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR is ready to merge, try @-mentioning one of the committers.

Help wanted

  • I would love to see the community offer additional help with performance testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems.

Please feel free to leave your own comments on this ticket if you are looking for help.

Community

Upcoming meetups:

  • Help schedule some!