Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/230 get partitioning checkpoints #265

Merged
merged 28 commits into from
Sep 26, 2024

Conversation

salamonpavel
Copy link
Collaborator

@salamonpavel salamonpavel commented Sep 5, 2024

Introduces GET partitioning checkpoints endpoint with pagination
Closes #230

Release notes:

  • Introduces GET partitioning checkpoints endpoint with pagination

@salamonpavel salamonpavel added the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 5, 2024
@salamonpavel salamonpavel self-assigned this Sep 5, 2024
@salamonpavel
Copy link
Collaborator Author

Release notes:

  • Introduces GET partitioning checkpoints endpoint with pagination

@salamonpavel salamonpavel marked this pull request as ready for review September 6, 2024 10:30
…kpoints

# Conflicts:
#	server/src/main/scala/za/co/absa/atum/server/api/repository/PartitioningRepositoryImpl.scala
@salamonpavel salamonpavel removed the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 6, 2024
Copy link

github-actions bot commented Sep 6, 2024

JaCoCo model module code coverage report - scala 2.13.11

Overall Project 63.64% 🍏

There is no coverage information present for the Files changed

Copy link

github-actions bot commented Sep 6, 2024

JaCoCo agent module code coverage report - scala 2.13.11

Overall Project 84.63% 🍏

There is no coverage information present for the Files changed

Copy link

github-actions bot commented Sep 6, 2024

JaCoCo reader module code coverage report - scala 2.13.11

Overall Project 100% 🍏

There is no coverage information present for the Files changed

Copy link

github-actions bot commented Sep 6, 2024

JaCoCo server module code coverage report - scala 2.13.11

Overall Project 75.41% -8.05% 🍏
Files changed 71.79%

File Coverage
CheckpointRepositoryImpl.scala 100% 🍏
PartitioningRepositoryImpl.scala 100% 🍏
CheckpointServiceImpl.scala 100% 🍏
PartitioningServiceImpl.scala 100% 🍏
BaseController.scala 100% 🍏
CheckpointControllerImpl.scala 100% -2.9% 🍏
PartitioningControllerImpl.scala 87.36% 🍏
CheckpointItemFromDB.scala 68.5% -19.69%
GetPartitioningCheckpoints.scala 58.1% -81.33%

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I like it, I just found a few tiny improvement ideas

lsulak
lsulak previously approved these changes Sep 17, 2024
# Conflicts:
#	server/src/main/scala/za/co/absa/atum/server/api/controller/PartitioningControllerImpl.scala
#	server/src/main/scala/za/co/absa/atum/server/api/repository/PartitioningRepositoryImpl.scala
lsulak
lsulak previously approved these changes Sep 17, 2024
status := 41;
status_text := 'Partitioning not found';
RETURN NEXT;
RETURN;
END IF;

IF i_checkpoints_limit IS NOT NULL THEN
SELECT count(*) > i_checkpoints_limit
Copy link
Contributor

@benedeki benedeki Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate solution, no need to change the current one, just for inspiration:

        PERFORM 1
        FROM runs.checkpoints C
        WHERE C.fk_partitioning = i_partitioning_id
          AND (i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
        LIMIT 1 
        OFFSET coalesce(i_offset) + i_checkpoints_limit;

        _has_more := found;

FROM runs.checkpoints C
WHERE C.fk_partitioning = i_partitioning_id
AND (i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
LIMIT i_checkpoints_limit + 1 OFFSET i_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complicated
First I thought this was a mistake to have ORDER BY omitted from the query.
Then I thought: wait we don't care about the order of we just want to see if there are more records.
Then I thought: but having the ORDER BY will cause to have the same data pages cached in memory for the query bellow.
And then I thought: but the ordering might be more expensive then the gain the cache gives.
So I don't know 🤣

Suggested change
LIMIT i_checkpoints_limit + 1 OFFSET i_offset
ORDER BY C.process_start_time DESC, C.id_checkpoint
LIMIT i_checkpoints_limit + 1 OFFSET i_offset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the code as is then for now :)

runs.measurements M ON C.id_checkpoint = M.fk_checkpoint
JOIN
limited_checkpoints LC
JOIN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal quirk, no need to change here:
I do prefer INNER JOIN to JOIN to make it obvious. So if in future you would consider using that 👼

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will add the 'INNER' and do so going forward always

C.process_start_time,
C.id_checkpoint
LIMIT nullif(i_limit, 0);
LC.process_start_time, LC.id_checkpoint;
Copy link
Contributor

@benedeki benedeki Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major:
We want the measurements of one checkpoint go after each other followed by the next checkpoint etc.
While the current way how the measurements come in (all at once) ensure this gives the correct order, this is not guaranteed by the data itself, just by actual processes right now.

Suggested change
LC.process_start_time, LC.id_checkpoint;
LC.id_checkpoint, LC.process_start_time;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok switched the order


IF NOT FOUND THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a problem?
I think it's valid situation, if no checkpoint for an existing partitioning exist so far.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, so far in queries returning detail records of a master one, of there were no details one, we ust returned the empty set. Would prefer we stay consistent, and if for some reason this is considered better, changer there too.
Tag @TebaleloS @lsulak

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. How about status 12 with text 'OK with no checkpoints found'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the OK solution might cause even more problems, as the data returned are NULL. So fa-db would need to be able to handle that and recognize that the particular record is OK but return empty sequence.
In the other cases we just returned no records at all.

.in(GetPartitioningCheckpoints)
.in(jsonBody[CheckpointQueryDTO])
.in(V2Paths.Partitionings / path[Long]("partitioningId") / V2Paths.Checkpoints)
.in(query[Option[Int]]("limit").default(Some(10)).validateOption(Validator.inRange(1, 1000)))
Copy link
Contributor

@benedeki benedeki Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor:
Don't we want to put the (upper) limit to constants? Or we might consider different limit for each paginated endpoint?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we could maybe have some base endpoint for all/most paginated endpoints so we don't have to repeat this for every single one of them. Didn't want to optimize it yet given it's the first paginated endpoint ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do later sure. As hinted, this is minor thing, aboslutely not blocking.

@benedeki
Copy link
Contributor

Any chance to increase the coverage %, @salamonpavel ?

@benedeki
Copy link
Contributor

Any chance to increase the coverage %, @salamonpavel ?

Coincidentally @miroslavpojer today mentioned that jacoco-report had release version 1.7.1 with number of bugifxes. Can you please increase the version in the script to test and see how the new version works.

@salamonpavel
Copy link
Collaborator Author

Any chance to increase the coverage %, @salamonpavel ?

Coincidentally @miroslavpojer today mentioned that jacoco-report had release version 1.7.1 with number of bugifxes. Can you please increase the version in the script to test and see how the new version works.

Those coverage numbers are incorrect.

@miroslavpojer
Copy link
Collaborator

miroslavpojer commented Sep 19, 2024

Any chance to increase the coverage %, @salamonpavel ?

Coincidentally @miroslavpojer today mentioned that jacoco-report had release version 1.7.1 with number of bugifxes. Can you please increase the version in the script to test and see how the new version works.

Those coverage numbers are incorrect.

The new jacoco-report version explains the meaning of the negative percentage numbers.

The "delta" (the negative value next to the coverage) represents the percentage of newly added or modified lines of code that are not covered by unit tests. It calculates the difference in test coverage based solely on the changes made in the current commit or pull request. For example, if 10 lines of code are modified or added, and 8 of those lines are covered by unit tests, the "delta" would be -20%, indicating 20% of the newly changed code is untested. However, the "delta" has limitations. It can never be positive, meaning if you add more unit tests to cover existing, unmodified code, this additional coverage is not reflected in the delta. The metric only considers lines directly changed in the current set of modifications, as there is no mechanism to track improvements in coverage for pre-existing code that hasn't been altered.

Do the numbers make more sense with this explanation?
I still need the action control to switch this feature off.

@salamonpavel
Copy link
Collaborator Author

Any chance to increase the coverage %, @salamonpavel ?

Coincidentally @miroslavpojer today mentioned that jacoco-report had release version 1.7.1 with number of bugifxes. Can you please increase the version in the script to test and see how the new version works.

Those coverage numbers are incorrect.

The new jacoco-report version explains the meaning of the negative percentage numbers.

The "delta" (the negative value next to the coverage) represents the percentage of newly added or modified lines of code that are not covered by unit tests. It calculates the difference in test coverage based solely on the changes made in the current commit or pull request. For example, if 10 lines of code are modified or added, and 8 of those lines are covered by unit tests, the "delta" would be -20%, indicating 20% of the newly changed code is untested. However, the "delta" has limitations. It can never be positive, meaning if you add more unit tests to cover existing, unmodified code, this additional coverage is not reflected in the delta. The metric only considers lines directly changed in the current set of modifications, as there is no mechanism to track improvements in coverage for pre-existing code that hasn't been altered.

Do the numbers make more sense with this explanation? I still need the action control to switch this feature off.

Every single line in GetPartitioningCheckpoints.scala is invoked when testing the code yet it reports coverage of 58.1 %.

@miroslavpojer
Copy link
Collaborator

miroslavpojer commented Sep 19, 2024

Any chance to increase the coverage %, @salamonpavel ?

Coincidentally @miroslavpojer today mentioned that jacoco-report had release version 1.7.1 with number of bugifxes. Can you please increase the version in the script to test and see how the new version works.

Those coverage numbers are incorrect.

The new jacoco-report version explains the meaning of the negative percentage numbers.
The "delta" (the negative value next to the coverage) represents the percentage of newly added or modified lines of code that are not covered by unit tests. It calculates the difference in test coverage based solely on the changes made in the current commit or pull request. For example, if 10 lines of code are modified or added, and 8 of those lines are covered by unit tests, the "delta" would be -20%, indicating 20% of the newly changed code is untested. However, the "delta" has limitations. It can never be positive, meaning if you add more unit tests to cover existing, unmodified code, this additional coverage is not reflected in the delta. The metric only considers lines directly changed in the current set of modifications, as there is no mechanism to track improvements in coverage for pre-existing code that hasn't been altered.
Do the numbers make more sense with this explanation? I still need the action control to switch this feature off.

Every single line in GetPartitioningCheckpoints.scala is invoked when testing the code yet it reports coverage of 58.1 %.

The reason for 58,1% is in jacoco.xml
I have retested it on my local, and there is 100% coverage after our scala filtering. It looks like scala filtering was not fully applied for XML file. This is the next big task for the next QA Squad - move the Jacoco solution into the origin repository.

Comment on lines 130 to 134
IF NOT FOUND THEN
status := 12;
status_text := 'OK with no checkpoints found';
RETURN NEXT;
END IF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IF NOT FOUND THEN
status := 12;
status_text := 'OK with no checkpoints found';
RETURN NEXT;
END IF;

@salamonpavel salamonpavel merged commit 43ec036 into master Sep 26, 2024
9 of 10 checks passed
@salamonpavel salamonpavel deleted the feature/230-getPartitioningCheckpoints branch September 26, 2024 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GET /partitionings/{partId}/checkpoints
4 participants