Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#48: Review of the db model and functions #82

Merged
merged 16 commits into from
Nov 11, 2023

Conversation

benedeki
Copy link
Contributor

@benedeki benedeki commented Sep 28, 2023

  • replaced HSTORE with JSONB
  • segmentation expression changed to partitioning
  • major refactoring of functions and tables
  • root files renamed to adhere to fa-db deploy_db.py script
  • functions not (fully) implemented moved to subfolders future

NB!
Please ignore the content of the future folders. They contain eventually useful code that has been developed before, but right now out of scope.

Closes #48

* replaced HSTORE with JSONB
* segmentation expression changed to partitioning
* major refactoring of functions and tables
* root files renamed to adhere to fa-db `deploy_db.py` script
* functions not (fully) implemented moved to subfolders `future`
@benedeki benedeki added the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 28, 2023
@benedeki benedeki self-assigned this Sep 28, 2023
@github-actions
Copy link

github-actions bot commented Sep 28, 2023

JaCoCo agent module code coverage report - spark:2 - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage 83.26% 🍏

@github-actions
Copy link

github-actions bot commented Sep 28, 2023

JaCoCo server module code coverage report - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage 17.14%


RETURN;
END;
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

GRANT EXECUTE ON FUNCTION runs._get_key_segmentation(hstore) TO atum_user;
ALTER FUNCTION runs._get_id_partitioning(JSONB) OWNER TO atum_owner;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not GRANT EXECUTE and why owner and not the user? I thought that only DB, schemas, and tables are owned by owner, and that the functions are supposed to be executed as the user, not owner

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or rather, is the idea here that these 'private' functions shouldn't be called by the user at all? Would it work if the user executed a function that calls this function?

Copy link
Contributor Author

@benedeki benedeki Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • yes the _ at the beginning of the function name signs a "private" function.
  • private function is not to be called outside, therefore no grants; thanks to that it can be lenient on input sanitation
  • ownership automatically means execution right
  • the magic of the function being (indirectly) usable by outside user is in the SECURITY DEFINER specification in the functions - it means that all the code within the function will be executed with right of the ROLE that owns the function not the ROLE that called the function. This prevents the needs to maintain cascade of access rights and opening private functions to outside use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, thanks a lot for such detailed explanation!

@benedeki benedeki marked this pull request as ready for review October 22, 2023 07:59
@benedeki
Copy link
Contributor Author

benedeki commented Oct 22, 2023

Still need to test, but generally should be ready.
It is ready now 😉

@benedeki benedeki removed the work in progress Work on this item is not yet finished (mainly intended for PRs) label Oct 24, 2023
ALTER TABLE runs.segmentations
ADD CONSTRAINT segmentations_unq UNIQUE (segmentation);
ALTER TABLE runs.partitionings
ADD CONSTRAINT segmentations_unq UNIQUE (partitioning);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constraint might also be renamed to partitioning_unq.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider indexing the partitioning JSONB column as it's being used in _get_id_partitioning function when looking up partitioning id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unique constraint automatically means an index created.

@benedeki benedeki mentioned this pull request Nov 3, 2023
Copy link

JaCoCo agent module code coverage report - spark:2 - scala 2.12.18

There is no coverage information present for the Files changed

Total Project Coverage 83.32% 🍏

Copy link

JaCoCo server module code coverage report - scala 2.12.18

There is no coverage information present for the Files changed

Total Project Coverage 17.34%

id_measurement BIGINT NOT NULL DEFAULT global_id(),
fk_measure_definition BIGINT NOT NULL,
fk_checkpoint UUID NOT NULL,
measurement_value JSONB NOT NULL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we could even create a composite type, because we already know what the structure should be. What do you think? More flexibility can be good in some cases, but can bring unexpected situations/bad values in others.

Not now, I just wanted to understand your reasoning

CREATE TABLE runs.additional_data
(
id_additional_data BIGINT NOT NULL DEFAULT global_id(),
key_segmentation BIGINT NOT NULL,
fk_partitioning BIGINT NOT NULL,
ad_name TEXT NOT NULL,
ad_value TEXT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that this will be a JSONB, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the id of the partitioning, the surrogate key of the partitionings table. All other tables reference it, instead of the bulky JSONB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understood you to be honest, apologies. I was referring to ad_name and ad_value - so this means that a sinle record in additional_data table will hold only 1 metadata record that is 'flat' like this. I thought that a single additional_data record should be a JSON structure-check our data model:

case class AdditionalDataDTO(
  additionalData: Map[String, Option[String]]
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's decomposed for better handling. Indexing, searching, patterning and merging.
If the DB is to understand the data (do operations on them), I really prefer if they are not hidden in complex types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay then. Let's use 'flat' data types whenever possible. In case of partitioning and measurements - those could also be decomposed, but let's leave it all as is, since we'll need some flexibility given the current state of the project (and already existing implementation)

@lsulak
Copy link
Collaborator

lsulak commented Nov 11, 2023

File list_flows.sql from the future/ dir still has Copyright 2022 ABSA Group Limited, it's the only one left, please if you can also change it so that the whole repo has it right

Copy link
Collaborator

@lsulak lsulak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benedeki benedeki merged commit 355feb7 into master Nov 11, 2023
6 of 7 checks passed
@benedeki benedeki deleted the feaure/48-review-of-the-db-model-and-functions branch November 11, 2023 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review of the db model and functions
3 participants