Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix time-series filtering #203

Merged
merged 6 commits into from
Jul 26, 2023
Merged

Conversation

osopardo1
Copy link
Member

@osopardo1 osopardo1 commented Jul 25, 2023

Description

Fixes #202 (and #201 😃 )

Type of change

It is a Bug Fix for the reading part, it does not include breaking changes on the algorithm or the writing of the files.

Now TimestampType and DateType are pre-processed on QueryFilterUtils to return a Millisecond representation that can be filtered properly by the index.

Checklist:

Here is the list of things you should do before submitting this pull request:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is updated to the main branch (dependent changes have been merged).

How Has This Been Tested? (Optional)

I've added a test on src/test/scala/io/qbeast/spark/index/query called TimeSeriesQueryTest. In this class, I've tested four different ways of creating an index on a Date / Timestamp column. Each one were failing differently, and should fail if executed with previous commits.

Test Configuration:
TBD.

@osopardo1
Copy link
Member Author

osopardo1 commented Jul 25, 2023

To have a fixed version for 0.3.x for compatibility with Spark 3.2.x, the commits of this PR should be merged into a new 0.3.5 version release, and updated accordingly.

We can start thinking about making a separate branch for version 0.3.x and 0.4.x / further versions.

@codecov
Copy link

codecov bot commented Jul 25, 2023

Codecov Report

Merging #203 (a18d8dc) into main (6262048) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head a18d8dc differs from pull request most recent head 3b722d4. Consider uploading reports for the commit 3b722d4 to get more accurate results

@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   93.81%   93.83%   +0.01%     
==========================================
  Files          85       85              
  Lines        2087     2091       +4     
  Branches      175      172       -3     
==========================================
+ Hits         1958     1962       +4     
  Misses        129      129              
Files Changed Coverage Δ
...io/qbeast/spark/index/query/QuerySpecBuilder.scala 100.00% <ø> (ø)
...o/qbeast/spark/index/query/QueryFiltersUtils.scala 91.30% <100.00%> (+1.83%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@osopardo1 osopardo1 marked this pull request as ready for review July 26, 2023 06:48
@osopardo1
Copy link
Member Author

Hello @alexeiakimov ! Could you review this quick fix? Thank you 🙌

@osopardo1 osopardo1 merged commit 35bc1f0 into Qbeast-io:main Jul 26, 2023
1 check passed
@osopardo1 osopardo1 mentioned this pull request Jul 27, 2023
@osopardo1 osopardo1 deleted the 202-timestamp-filter branch August 2, 2023 05:51
osopardo1 added a commit to osopardo1/qbeast-spark that referenced this pull request Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Timestamp filtering returns empty DataFrame
2 participants