#957 Implement earliest and latest functions #1018

Merged 6 commits on Jan 17, 2025

Conversation

currantw
Contributor

Signed-off-by: currantw [email protected]

Description

Implements earliest and latest relative date time functions in PPL.

Related Issues

Resolves #957

Check List

  • Updated documentation:
    • PPL-Example-Commands.md
    • ppl-datetime.md
    • ppl-where-command.md
  • Implemented unit tests
    • TimeUtilsTest
    • SerializableTimeUdfTest
  • Implemented tests for combination with other commands
    • FlintSparkPPLBuiltInDateTimeFunctionITSuite
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: currantw <[email protected]>
docs/ppl-lang/functions/ppl-datetime.md (outdated, resolved)
Comment on lines -250 to -253
EARLIEST: 'EARLIEST';
EARLIEST_TIME: 'EARLIEST_TIME';
LATEST: 'LATEST';
LATEST_TIME: 'LATEST_TIME';
Contributor Author

These aggregation commands don't appear to have ever been implemented. They are not used, referenced, or documented anywhere outside the auto-generated Antlr modules, that I can find. Moreover, I think that their functionality (as best as I can guess from the name, since they aren't documented) can just be accomplished using the existing min and max functions?

Contributor

Are there Spark issues that refer to this syntax? If so, we should mention that they are no longer relevant.

Or does SQL have these implemented? If so, we might want to create future issues to implement them, as we are striving for feature parity.

Contributor Author

> Are there Spark issues that refer to this syntax? If so, we should mention that they are no longer relevant.

Good idea. I searched and didn't find anything.

> Or does SQL have these implemented? If so, we might want to create future issues to implement them, as we are striving for feature parity.

OpenSearch SQL does not appear to have these implemented either, and I similarly wasn't able to find any GitHub issues that mention them.

@YANG-DB I assume it also makes sense to remove this syntax from OpenSearch SQL for parity? Should I raise an issue and create a (tiny) PR for this?

Contributor Author

@YANG-DB I have raised a cleanup issue on OpenSearch SQL here. Would you be able to triage it? I will then create a (trivial) pull request to resolve it.

Contributor Author

Update: here is the pull request.

Signed-off-by: currantw <[email protected]>

#### **earliest**
[See additional function details](functions/ppl-datetime#earliest)
- `source = table | where earliest("-1wk", timestamp)`
Collaborator

In the case where earliest is interpreted as `timestamp >= now() - 1s`, could it produce inconsistent results depending on when now() is resolved?

Contributor Author

@currantw currantw Jan 17, 2025

I don't believe so. Relative timestamps are based on calls to CurrentTimestamp (see org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala). If you look at the documentation for CurrentTimestamp, it says that:

> Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value.

So now(), relative_timestamp("now"), earliest("now", timestamp), latest("now", timestamp) and current_timestamp should all return consistent results within the same query. I tested this out manually as well to verify (i.e. repeated calls to these methods within the same query return the same now timestamp, down to the millisecond).
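
To illustrate the point above, here is a minimal Python sketch of the idea, not the code in this PR. It assumes a single "now" snapshot is taken once per query and that every relative expression is resolved against it. The helper names (`resolve_relative`, `earliest`, `latest`), the explicit `now` parameter, and the small set of units are all hypothetical; the `>=` comparison follows the interpretation in the question above, and the `<=` comparison for `latest` is my assumption by symmetry.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical subset of units, for illustration only.
UNIT_SECONDS = {"s": 1, "h": 3600, "d": 86400, "wk": 604800}


def resolve_relative(expr, now):
    """Resolve a relative expression such as "-1wk" or "now" against a fixed snapshot."""
    if expr == "now":
        return now
    match = re.fullmatch(r"([+-])(\d+)(s|h|d|wk)", expr)
    if match is None:
        raise ValueError(f"unsupported relative expression: {expr!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(seconds=int(amount) * UNIT_SECONDS[unit])
    return now - delta if sign == "-" else now + delta


def earliest(expr, ts, now):
    # True when ts is at or after the resolved relative time.
    return ts >= resolve_relative(expr, now)


def latest(expr, ts, now):
    # Assumed symmetric counterpart: True when ts is at or before the resolved time.
    return ts <= resolve_relative(expr, now)


# One snapshot per "query"; every predicate below sees the same value.
query_now = datetime(2025, 1, 17, 12, 0, 0, tzinfo=timezone.utc)
rows = [
    query_now - timedelta(days=10),
    query_now - timedelta(days=3),
    query_now,
]
recent = [ts for ts in rows if earliest("-1wk", ts, query_now)]
```

Because `query_now` is captured once and passed to every call, the row ten days old is filtered out and the other two are kept, regardless of how long the filter takes to run.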

@penghuo penghuo merged commit 54248bb into opensearch-project:main Jan 17, 2025
4 checks passed
Development

Successfully merging this pull request may close these issues.

[PPL-Lang]support earliest/latest date-time functions
4 participants