Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial subquery support #37

Merged
merged 10 commits into from
May 22, 2024
Merged

feat: initial subquery support #37

merged 10 commits into from
May 22, 2024

Conversation

backkem
Copy link
Collaborator

@backkem backkem commented May 11, 2024

Ok, let's try this again.

This is a continuation of #33. cc @sardination

The PR includes initial support for subqueries. I also tried to re-write the analyzer optimizer logic in an attempt to make it more readable.

PS: It's also becoming clear we'll need to look into a good way to get tests setup. For example by federating to DataFusion itself.

@backkem backkem changed the title feat: Initial subquery support feat: initial subquery support May 11, 2024

// NopFederationProvider is used to represent tables that are not federated, but
// are resolved by DataFusion. This simplifies the logic of the optimizer rule.
struct NopFederationProvider {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really represents TableScans handled by DataFusion - so in some sense it "federates" back to DF itself? 🤔 I wonder if we should call it something like TableFederationProvider ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although Nop has the connotation that it isn't doing anything, so I think its also fine to leave as is.

mod table_provider;
pub use table_provider::*;

mod plan_node;
pub use plan_node::*;

pub fn default_session_state() -> SessionState {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

panic!("Could not locate ScalarSubqueryToJoin");
};

// TODO: check if we should allow other optimizers to run before the federation rule.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScalarSubqueryToJoin is fairly early on in the optimizer list. I wonder if we could just add this to the end? PushDownLimit and PushDownFilter seem beneficial for the federation case.
Screenshot 2024-05-15 at 8 45 36 PM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that we take the time to think through the implications of having the optimizers run first and land it in separate PRs. Aside from that there are also federation-specific optimizations we can do (hinted at in #23). We may want to make different choices in those than in the local execution case.

@backkem backkem merged commit 0567c02 into main May 22, 2024
17 of 18 checks passed
@backkem backkem deleted the outer-reference branch May 22, 2024 06:54
hozan23 referenced this pull request in hozan23/datafusion-federation Aug 29, 2024
Co-authored-by: Suriya Kandaswamy <[email protected]>
Co-authored-by: Suriya Kandaswamy <[email protected]>
phillipleblanc pushed a commit that referenced this pull request Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants