[BugFix] fix stale histogram lead to unexpected stats (backport #45614) #45646
Signed-off-by: packy92 <[email protected]>
(cherry picked from commit 1e5626e)
# Conflicts:
#   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/BinaryPredicateStatisticCalculator.java
#   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/Histogram.java
#   fe/fe-core/src/test/resources/sql/tpch-histogram-cost/q19.sql
Cherry-pick of 1e5626e has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR
Why I'm doing:
A column histogram may not be updated after its first collection.
Using a stale histogram to estimate the row count risks a divide-by-zero exception, because the histogram-based estimate may return a row count of 0 (an empty bucket and an empty MCV list).
What I'm doing:
If the predicate range does not overlap with the histogram, we use the column's min/max values instead of the histogram to estimate statistics.
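To illustrate the idea, here is a minimal, self-contained sketch of the fallback. It is not the code changed in this PR: the class `StaleHistogramSketch`, the `Bucket` type, and every method name are hypothetical, and the uniform-distribution estimates are simplifying assumptions.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the StarRocks code base.
final class StaleHistogramSketch {

    /** Hypothetical bucket: closed range [lower, upper] holding `count` rows. */
    static final class Bucket {
        final double lower;
        final double upper;
        final long count;

        Bucket(double lower, double upper, long count) {
            this.lower = lower;
            this.upper = upper;
            this.count = count;
        }
    }

    /** True when the predicate range [lo, hi] intersects at least one bucket. */
    static boolean overlaps(List<Bucket> buckets, double lo, double hi) {
        for (Bucket b : buckets) {
            if (lo <= b.upper && hi >= b.lower) {
                return true;
            }
        }
        return false;
    }

    /** Histogram estimate, assuming rows are spread uniformly inside each bucket. */
    static double estimateFromHistogram(List<Bucket> buckets, double lo, double hi) {
        double rows = 0;
        for (Bucket b : buckets) {
            double l = Math.max(lo, b.lower);
            double h = Math.min(hi, b.upper);
            if (l > h) {
                continue; // predicate range misses this bucket
            }
            double width = b.upper - b.lower;
            rows += (width == 0) ? b.count : b.count * (h - l) / width;
        }
        return rows;
    }

    /** Fallback estimate, assuming rows are spread uniformly between min and max. */
    static double estimateFromMinMax(double min, double max, long totalRows,
                                     double lo, double hi) {
        double l = Math.max(lo, min);
        double h = Math.min(hi, max);
        if (l > h) {
            return 0;
        }
        if (max == min) {
            return totalRows;
        }
        return totalRows * (h - l) / (max - min);
    }

    /** Use the histogram only when the predicate range overlaps it. */
    static double estimateRows(List<Bucket> buckets, double min, double max,
                               long totalRows, double lo, double hi) {
        if (buckets.isEmpty() || !overlaps(buckets, lo, hi)) {
            return estimateFromMinMax(min, max, totalRows, lo, hi);
        }
        return estimateFromHistogram(buckets, lo, hi);
    }
}
```

In this sketch, a histogram collected while the column spanned [0, 100] yields 0 rows for a predicate on [150, 200], even if the column has since grown to a max of 200. The min/max fallback returns a non-zero estimate instead, so later calculations do not divide by a zero row count.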
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #45614 done by [Mergify](https://mergify.com).