Add example to spark-expr crate #1365

Open
andygrove opened this issue Jan 31, 2025 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@andygrove
Member

andygrove commented Jan 31, 2025

What is the problem the feature request solves?

It would be nice to have an example in datafusion-comet-spark-expr that demonstrates how to register Spark-compatible expressions with a DataFusion context and then invoke them via SQL (and/or DataFrame API).

Describe the potential solution

No response

Additional context

No response

@andygrove andygrove added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Jan 31, 2025
@viczsaurav

I will give it a try

@viczsaurav

viczsaurav commented Feb 6, 2025

Can we use the following approach? Where would this example reside?

  1. Initialize DataFusion Context

  2. Define Custom Expression
  • Implement a Scalar Function using make_scalar_function.
  • Simple string conversion: StringArray => uppercase strings

  3. Register the Expression
  • Wrap the function in a ScalarUDF.
  • Specify input/output data types and function volatility.
  • Add it to SessionContext using register_udf().

  4. Use in SQL Query
  • Read data into a data frame / create it from in-memory Vec
  • Execute an SQL query that invokes the registered function.

  5. Display Results
  • Show the transformed output using .show().await?.
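To make the shape of these steps concrete, here is a dependency-free sketch of the register-then-invoke flow. It is only a model and does not use DataFusion: `Registry`, `ScalarFn`, `to_upper`, and `run_example` are hypothetical names invented for illustration, and a real example would presumably use `SessionContext` and `ScalarUDF` instead.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a scalar UDF: takes a column of nullable strings
// and returns a column of the same length.
type ScalarFn = fn(&[Option<String>]) -> Vec<Option<String>>;

// Hypothetical stand-in for a session context's function registry.
struct Registry {
    functions: HashMap<String, ScalarFn>,
}

impl Registry {
    // Step 1: create an empty context.
    fn new() -> Self {
        Registry { functions: HashMap::new() }
    }

    // Step 3: register the expression under a name.
    fn register_udf(&mut self, name: &str, f: ScalarFn) {
        self.functions.insert(name.to_string(), f);
    }

    // Step 4: look the function up by name and apply it to a column.
    // Returns None if no function with that name was registered.
    fn call(&self, name: &str, column: &[Option<String>]) -> Option<Vec<Option<String>>> {
        self.functions.get(name).map(|f| f(column))
    }
}

// Step 2: uppercase each non-null string, preserving nulls.
fn to_upper(column: &[Option<String>]) -> Vec<Option<String>> {
    column
        .iter()
        .map(|value| value.as_ref().map(|s| s.to_uppercase()))
        .collect()
}

fn run_example() -> Vec<Option<String>> {
    let mut registry = Registry::new();
    registry.register_udf("to_upper", to_upper);

    // In-memory input column, including one null.
    let column = vec![Some("hello".to_string()), None, Some("comet".to_string())];
    registry
        .call("to_upper", &column)
        .expect("to_upper should be registered")
}

fn main() {
    // Step 5: display the transformed values.
    for value in run_example() {
        println!("{:?}", value);
    }
}
```

The point of the sketch is the ordering: the function is defined once, registered by name, and then resolved by that name at query time, which is the same sequence the SQL-based example would follow.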

@andygrove
Member Author

Thanks for looking at this @viczsaurav.

Here is some sample code that may be useful while exploring this.

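    // Imports below are a best guess at the paths this snippet needs.
    use datafusion::arrow::datatypes::DataType;
    use datafusion::execution::session_state::SessionStateBuilder;
    use datafusion::execution::FunctionRegistry; // provides register_udf
    use datafusion::prelude::SessionContext;
    use datafusion_comet_spark_expr::create_comet_physical_fun;

    #[tokio::test]
    async fn test() {
        // Build a session state and register the Spark-compatible xxhash64 on it.
        let mut session_state = SessionStateBuilder::new().build();
        session_state
            .register_udf(
                create_comet_physical_fun("xxhash64", DataType::Int64, &session_state).unwrap(),
            )
            .unwrap();
        // Wrap the state in a context and invoke the function via SQL.
        let ctx = SessionContext::new_with_state(session_state);
        let df = ctx.sql("select xxhash64('hello', 42)").await.unwrap();
        df.show().await.unwrap();
    }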
