feat: add read array support #1456

Open
wants to merge 7 commits into
base: main
Conversation

comphead
Contributor

Which issue does this PR close?

Closes #1454.

Rationale for this change

What changes are included in this PR?

How are these changes tested?


codecov-commenter commented Mar 1, 2025

Codecov Report

Attention: Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Project coverage is 58.73%. Comparing base (f09f8af) to head (0699d15).
Report is 57 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../scala/org/apache/comet/serde/QueryPlanSerde.scala | 87.50% | 0 Missing and 1 partial ⚠️ |
| ...ala/org/apache/spark/sql/comet/CometScanExec.scala | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1456      +/-   ##
============================================
+ Coverage     56.12%   58.73%   +2.60%     
- Complexity      976     1018      +42     
============================================
  Files           119      122       +3     
  Lines         11743    12255     +512     
  Branches       2251     2308      +57     
============================================
+ Hits           6591     7198     +607     
+ Misses         4012     3899     -113     
- Partials       1140     1158      +18     


comphead requested a review from andygrove March 3, 2025 16:14
@andygrove
Member

Some tests are failing due to #1289.

I think the root cause is that we are trying to shuffle data that contains arrays, and Comet shuffle does not support arrays yet. We need to fall back to Spark for these shuffles.

@andygrove
Member

In CometExecRule, we check whether we support the partitioning types for the shuffle, but we do not check whether we support the types of the other columns.

@comphead Do you want to update these checks as part of this PR and see if it resolves the issue?

  case class CometExecRule(session: SparkSession) extends Rule[SparkPlan] {
    private def applyCometShuffle(plan: SparkPlan): SparkPlan = {
      plan.transformUp {
        // Native shuffle mode: only the partitioning is checked here
        case s: ShuffleExchangeExec
            if isCometPlan(s.child) && isCometNativeShuffleMode(conf) &&
              QueryPlanSerde.supportPartitioning(s.child.output, s.outputPartitioning)._1 =>
          ...

        // JVM shuffle mode: only the partitioning types are checked here
        case s: ShuffleExchangeExec
            if (!s.child.supportsColumnar || isCometPlan(s.child)) && isCometJVMShuffleMode(
              conf) &&
              QueryPlanSerde.supportPartitioningTypes(s.child.output, s.outputPartitioning)._1 &&
              !isShuffleOperator(s.child) =>
          ...
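
To illustrate the kind of extra check suggested above, here is a minimal, self-contained sketch. It is not Comet's code: `DataType`, `Column`, `ShuffleTypeCheck`, `supportedShuffleType`, and `supportsOutputTypes` are hypothetical stand-ins for Spark's types and for whatever helper the PR ends up adding. The only things taken from the conversation are that arrays are not yet supported by Comet shuffle and that the existing checks return something whose first element (`._1`) is a Boolean; the `String` reason in the second element is an assumption.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical stand-in for an output attribute of the shuffle's child plan.
final case class Column(name: String, dataType: DataType)

object ShuffleTypeCheck {
  // Assumption for this sketch: primitive types can be shuffled, arrays cannot.
  def supportedShuffleType(dt: DataType): Boolean = dt match {
    case IntegerType | StringType => true
    case ArrayType(_)             => false
  }

  // Checks every output column, not only the partitioning columns.
  // Returns (supported, reason) so that callers can use `._1` in a guard.
  def supportsOutputTypes(output: Seq[Column]): (Boolean, String) = {
    val unsupported = output.filterNot(c => supportedShuffleType(c.dataType))
    if (unsupported.isEmpty) (true, "")
    else (false, "unsupported shuffle column types: " + unsupported.map(_.name).mkString(", "))
  }
}
```

Under these assumptions, a guard such as `ShuffleTypeCheck.supportsOutputTypes(columns)._1` could be added alongside the partitioning check in each `case`, so that a plan whose output contains an array column does not match and is left for Spark to shuffle.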

@comphead
Contributor Author

comphead commented Mar 4, 2025

Thanks @andygrove, I'll check that. You're saving me hours of debugging.

Development

Successfully merging this pull request may close these issues.

feat: Support read array type using native reader
3 participants