Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add system.bucket function to Iceberg #24200

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

posulliv
Copy link
Contributor

@posulliv posulliv commented Nov 20, 2024

Description

Add a UDF to expose the iceberg bucket transform via SQL. This can be used to help determine what bucket a value would be placed in. This could be used with the optimize procedure to only optimize specific buckets.

Spark has a similar UDF - https://iceberg.apache.org/javadoc/1.5.1/org/apache/iceberg/spark/functions/BucketFunction.html

Release notes

## Iceberg
* Add `system.bucket` function.

@cla-bot cla-bot bot added the cla-signed label Nov 20, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Nov 20, 2024
@ebyhr
Copy link
Member

ebyhr commented Nov 21, 2024

@martint Could you please review the syntax?

@posulliv posulliv force-pushed the add-iceberg-bucket-udf branch from 3f098db to f5d7a1a Compare November 21, 2024 15:38
@posulliv posulliv marked this pull request as ready for review November 21, 2024 16:27
@ebyhr ebyhr added the needs-docs This pull request requires changes to the documentation label Nov 22, 2024
@@ -33,6 +34,6 @@ public Iterable<ConnectorFactory> getConnectorFactories()
@Override
public Set<Class<?>> getFunctions()
{
return ImmutableSet.of(IcebergThetaSketchForStats.class);
return ImmutableSet.of(IcebergThetaSketchForStats.class, IcebergBucketFunction.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be exposed as a catalog function, not a global function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martint thank you for reviewing! I'm not sure how to do this however? Is there an example of a function exposed as a catalog function in another connector I could look at to understand how to do this?

Copy link
Member

@ebyhr ebyhr Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can check random_string function in Faker. Also, we could rename to bucket once it is implemented as a catalog function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent #24703 so we can avoid depending on trino-main in Iceberg connector.

@posulliv posulliv force-pushed the add-iceberg-bucket-udf branch from f5d7a1a to f96e88a Compare December 10, 2024 19:05
@posulliv
Copy link
Contributor Author

@ebyhr thank you for reviewing! I think I addressed all your comments if you want to have another look when you get a chance.

Copy link

github-actions bot commented Jan 1, 2025

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Jan 1, 2025
@ebyhr ebyhr added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jan 10, 2025
@ebyhr ebyhr force-pushed the add-iceberg-bucket-udf branch from f96e88a to 8265b27 Compare January 15, 2025 01:25
@ebyhr ebyhr changed the title Add iceberg_bucket UDF to iceberg plugin Add system.bucket function to Iceberg Jan 15, 2025
@ebyhr ebyhr removed the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Jan 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector needs-docs This pull request requires changes to the documentation syntax-needs-review
Development

Successfully merging this pull request may close these issues.

3 participants