Unified OpenSearch PPL Data Type #3345

Open
wants to merge 6 commits into
base: main

Collaborator

@penghuo penghuo commented Feb 24, 2025

Description

This PR introduces a language specification abstraction to support SQL and PPL query processing. The main changes include:

  • A new interface, LangSpec, is added with a default SQL implementation and a custom PPLLangSpec that maps specific expression types (e.g., mapping BYTE to “tinyint”).
  • Updates in index describe requests and query results ensure that the correct type names are used based on the active language specification (SQL or PPL).
  • Updated tests verify the behavior of the language specification mappings and system index utilities.
  • Updated OpenSearch PPL data type doc.
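As a rough illustration of the abstraction described above, here is a minimal sketch. It assumes a `LangSpec` interface with a single default method that returns the engine's own type name (the SQL behaviour) and a `PPLLangSpec` that overrides only the four integer types shown in this PR. `CoreType`, `SQL_SPEC`, and the `typeName` method are hypothetical stand-ins for illustration, not the actual signatures in the change.

```java
import java.util.EnumMap;
import java.util.Map;

public class LangSpecSketch {
  // Hypothetical stand-in for the engine's core types; only a few members are shown.
  enum CoreType { BYTE, SHORT, INTEGER, LONG, STRING }

  // Assumed shape of the abstraction: by default, expose the engine's own type name.
  interface LangSpec {
    default String typeName(CoreType type) {
      return type.name();
    }
  }

  // SQL keeps the default names, so JDBC, ODBC, and CLI clients see no change.
  static final LangSpec SQL_SPEC = new LangSpec() {};

  // PPL overrides only the integer types; everything else falls through to the default.
  static final class PPLLangSpec implements LangSpec {
    private static final Map<CoreType, String> exprTypeToPPLType = new EnumMap<>(CoreType.class);

    static {
      exprTypeToPPLType.put(CoreType.BYTE, "tinyint");
      exprTypeToPPLType.put(CoreType.SHORT, "smallint");
      exprTypeToPPLType.put(CoreType.INTEGER, "int");
      exprTypeToPPLType.put(CoreType.LONG, "bigint");
    }

    @Override
    public String typeName(CoreType type) {
      return exprTypeToPPLType.getOrDefault(type, type.name());
    }
  }

  public static void main(String[] args) {
    LangSpec ppl = new PPLLangSpec();
    for (CoreType type : CoreType.values()) {
      System.out.println(type + ": sql=" + SQL_SPEC.typeName(type) + ", ppl=" + ppl.typeName(type));
    }
  }
}
```

In this sketch, a describe request or query result would hold a `LangSpec` reference and call `typeName` when building its schema, so the same engine type can surface under different names depending on the active language.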

To Reviewer

  • Ideally, the query engine should use well-defined data types, with LangSpec serving as the protocol for translating these engine types to language-specific types. Currently, however, the describe implementation relies on OpenSearchDataType (an extension of ExprCoreDataType). After the Calcite migration, all data types should be unified as CalciteDataType, and the special handling in describe will no longer be necessary.

Related Issues

#3339

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@penghuo penghuo self-assigned this Feb 24, 2025
@penghuo penghuo added the v3.0.0 label Feb 24, 2025
Signed-off-by: Peng Huo <[email protected]>
Comment on lines +25 to +30
static {
  exprTypeToPPLType.put(ExprCoreType.BYTE, "tinyint");
  exprTypeToPPLType.put(ExprCoreType.SHORT, "smallint");
  exprTypeToPPLType.put(ExprCoreType.INTEGER, "int");
  exprTypeToPPLType.put(ExprCoreType.LONG, "bigint");
}
Member

Since this is a breaking change, why don't we directly change the old type to the new type, instead of introducing LangSpec? Isn't this unnecessarily adding complexity?

Collaborator Author

I agree. The core type was not upgraded because I intended for the data type changes to affect only PPL and not SQL. This PR focuses on unifying PPL data types, while SQL data types can be addressed in a separate issue, as changes there would impact JDBC, ODBC, and CLI.

Ideally, the query engine should use well-defined data types, with LangSpec serving as the protocol for translating these engine types to language-specific types. Once the Calcite implementation is complete, CalciteDataType will translate to ExprDataType, and LangSpec will translate from ExprDataType to the PPL response data type.
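The two-stage translation described here could be sketched roughly as follows. This is only an illustration under stated assumptions: `EngineType` stands in for whatever the Calcite-side type will be, `ExprType` for the expression type, and both lookup tables are hypothetical. Only the four PPL names come from this PR; none of the identifiers are taken from the actual code.

```java
import java.util.Map;

public class TypeTranslationChain {
  // Hypothetical engine-side types (stand-in for the future CalciteDataType).
  enum EngineType { TINYINT, SMALLINT, INTEGER, BIGINT }

  // Hypothetical expression-side types (stand-in for ExprDataType).
  enum ExprType { BYTE, SHORT, INTEGER, LONG }

  // Stage 1 (assumed): the engine type is translated to an expression type.
  static final Map<EngineType, ExprType> ENGINE_TO_EXPR = Map.of(
      EngineType.TINYINT, ExprType.BYTE,
      EngineType.SMALLINT, ExprType.SHORT,
      EngineType.INTEGER, ExprType.INTEGER,
      EngineType.BIGINT, ExprType.LONG);

  // Stage 2: the language spec translates the expression type to the PPL response name.
  static final Map<ExprType, String> EXPR_TO_PPL = Map.of(
      ExprType.BYTE, "tinyint",
      ExprType.SHORT, "smallint",
      ExprType.INTEGER, "int",
      ExprType.LONG, "bigint");

  // Composes both stages: engine type -> expression type -> PPL type name.
  static String pplTypeName(EngineType type) {
    return EXPR_TO_PPL.get(ENGINE_TO_EXPR.get(type));
  }

  public static void main(String[] args) {
    for (EngineType type : EngineType.values()) {
      System.out.println(type + " -> " + pplTypeName(type));
    }
  }
}
```

Keeping the second stage separate is what lets SQL retain its existing names while PPL adopts the new ones.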

Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>
Signed-off-by: Peng Huo <[email protected]>