#48: Review of the db model and functions #82

benedeki · 2023-09-28T07:59:18Z

replaced HSTORE with JSONB
segmentation expression changed to partitioning
major refactoring of functions and tables
root files renamed to adhere to the fa-db deploy_db.py script
functions not (fully) implemented moved to future subfolders

NB!
Please ignore the content of the future folders. They contain potentially useful code that was developed earlier but is out of scope right now.

Closes #48

github-actions · 2023-09-28T08:01:48Z

JaCoCo agent module code coverage report - spark:2 - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage	83.26%	🍏

github-actions · 2023-09-28T08:01:49Z

JaCoCo server module code coverage report - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage	17.14%	❌

database/src/main/01_users.ddl

lsulak · 2023-10-17T13:44:16Z

database/src/main/runs/_get_id_partitioning.sql


    RETURN;
 END;
 $$
 LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

-GRANT EXECUTE ON FUNCTION runs._get_key_segmentation(hstore) TO atum_user;
+ALTER FUNCTION runs._get_id_partitioning(JSONB) OWNER TO atum_owner;


Why not GRANT EXECUTE, and why the owner and not the user? I thought that only the DB, schemas, and tables are owned by the owner, and that functions are supposed to be executed as the user, not the owner.

Or rather, is the idea here that these 'private' functions shouldn't be called by the user at all? Would it work if the user executed a function that calls this function?

Yes, the _ at the beginning of the function name marks a "private" function.

A private function is not to be called from outside, therefore no grants; thanks to that, it can be lenient on input sanitization.

Ownership automatically implies the right to execute.

The magic that makes the function (indirectly) usable by an outside user is the SECURITY DEFINER specification in the functions. It means that all the code within the function is executed with the rights of the ROLE that owns the function, not the ROLE that called it. This removes the need to maintain a cascade of access rights and to open private functions to outside use.

Awesome, thanks a lot for such a detailed explanation!
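
For readers following the thread, the pattern described above can be sketched roughly as follows. This is only an illustration under assumptions: the `demo` schema, the `demo_owner` and `demo_user` roles, and both functions are hypothetical names that do not come from the PR, and the script is assumed to run as a superuser. One detail worth keeping in mind is that PostgreSQL grants EXECUTE on newly created functions to PUBLIC by default, so the sketch revokes that explicitly to keep the helper private.

```sql
-- Hypothetical roles and schema, for illustration only.
CREATE ROLE demo_owner NOLOGIN;
CREATE ROLE demo_user NOLOGIN;
CREATE SCHEMA demo AUTHORIZATION demo_owner;
GRANT USAGE ON SCHEMA demo TO demo_user;

-- "Private" helper: leading underscore, no grant to the user.
CREATE FUNCTION demo._double_it(i_value INTEGER) RETURNS INTEGER AS
$$
BEGIN
    RETURN i_value * 2;
END;
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

ALTER FUNCTION demo._double_it(INTEGER) OWNER TO demo_owner;
-- Remove the default EXECUTE privilege that PUBLIC gets on new functions.
REVOKE ALL ON FUNCTION demo._double_it(INTEGER) FROM PUBLIC;

-- Public entry point: runs with the owner's rights, so it may call the helper.
CREATE FUNCTION demo.double_it(i_value INTEGER) RETURNS INTEGER AS
$$
BEGIN
    RETURN demo._double_it(i_value);
END;
$$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

ALTER FUNCTION demo.double_it(INTEGER) OWNER TO demo_owner;
REVOKE ALL ON FUNCTION demo.double_it(INTEGER) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION demo.double_it(INTEGER) TO demo_user;

-- Try it as the unprivileged role.
SET ROLE demo_user;
SELECT demo.double_it(21);   -- allowed; the body executes as demo_owner
SELECT demo._double_it(21);  -- expected to fail with a permission-denied error
RESET ROLE;
```

The user never needs a grant on the helper, because the call to it happens inside a function that already runs as the owning role.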

database/src/main/runs/_write_measurement.sql

database/src/main/runs/checkpoints.ddl

database/src/main/runs/future/write_measurement.sql

database/src/main/runs/checkpoint_measure_definitions.ddl

database/src/main/runs/write_checkpoint.sql

database/src/main/runs/checkpoints.ddl

database/src/main/runs/write_checkpoint.sql

Co-authored-by: Ladislav Sulak <[email protected]>

benedeki · 2023-10-22T07:59:33Z

~~Still need to test, but generally should be ready.~~
It is ready now 😉

salamonpavel · 2023-10-31T11:50:29Z

database/src/main/postgres/runs/partitionings.ddl

-ALTER TABLE runs.segmentations
-    ADD CONSTRAINT segmentations_unq UNIQUE (segmentation);
+ALTER TABLE runs.partitionings
+    ADD CONSTRAINT segmentations_unq UNIQUE (partitioning);


The constraint might also be renamed to partitioning_unq.

We might want to consider indexing the partitioning JSONB column, as it's used in the _get_id_partitioning function when looking up the partitioning id.

A unique constraint automatically creates an index.
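
A quick way to check this on a scratch database is sketched below. The table `partitionings_demo` and its `id_partitioning` column are hypothetical stand-ins for illustration, not the project's actual DDL; only the `partitioning` JSONB column and the suggested constraint name are taken from the discussion above.

```sql
-- Hypothetical scratch table mirroring only the column discussed above.
CREATE TEMP TABLE partitionings_demo
(
    id_partitioning BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    partitioning    JSONB NOT NULL
);

ALTER TABLE partitionings_demo
    ADD CONSTRAINT partitioning_unq UNIQUE (partitioning);

-- List the indexes on the table; the unique constraint should appear
-- as an index carrying the constraint's name.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'partitionings_demo';

-- An equality lookup of the kind a "get id" function would perform.
-- On a tiny table the planner may still prefer a sequential scan.
EXPLAIN
SELECT id_partitioning
FROM partitionings_demo
WHERE partitioning = '{"country": "ZA"}'::jsonb;
```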

github-actions · 2023-11-10T14:03:56Z

JaCoCo agent module code coverage report - spark:2 - scala 2.12.18

There is no coverage information present for the Files changed

Total Project Coverage	83.32%	🍏

github-actions · 2023-11-10T14:03:57Z

JaCoCo server module code coverage report - scala 2.12.18

There is no coverage information present for the Files changed

Total Project Coverage	17.34%	❌

database/src/main/postgres/flows/_create_flow.sql

database/src/main/postgres/flows/_add_to_parent_flows.sql

database/src/main/postgres/runs/partitionings.ddl

database/src/main/postgres/runs/_get_id_partitioning.sql

database/src/main/postgres/runs/write_checkpoint.sql

lsulak · 2023-11-10T14:35:25Z

database/src/main/postgres/runs/measurements.ddl

+    id_measurement                      BIGINT NOT NULL DEFAULT global_id(),
+    fk_measure_definition               BIGINT NOT NULL,
+    fk_checkpoint                       UUID NOT NULL,
+    measurement_value                   JSONB NOT NULL,


I think here we could even create a composite type, because we already know what the structure should be. What do you think? More flexibility can be good in some cases, but can bring unexpected situations/bad values in others.

Not now; I just wanted to understand your reasoning.

database/src/main/postgres/runs/checkpoints.ddl

lsulak · 2023-11-10T14:38:09Z

database/src/main/postgres/runs/additional_data.ddl

 CREATE TABLE runs.additional_data
 (
    id_additional_data      BIGINT NOT NULL DEFAULT global_id(),
-    key_segmentation        BIGINT NOT NULL,
+    fk_partitioning         BIGINT NOT NULL,
    ad_name                 TEXT NOT NULL,
    ad_value                TEXT,


I thought this would be a JSONB; why not?

It's the id of the partitioning, the surrogate key of the partitionings table. All other tables reference it instead of the bulky JSONB.

I'm not sure I understood you, to be honest, apologies. I was referring to ad_name and ad_value. So this means that a single record in the additional_data table will hold only one 'flat' metadata record like this. I thought that a single additional_data record should be a JSON structure; check our data model:

case class AdditionalDataDTO( additionalData: Map[String, Option[String]] )

It's decomposed for better handling: indexing, searching, pattern matching, and merging.
If the DB is to understand the data (perform operations on it), I really prefer that it is not hidden in complex types.

Okay then. Let's use 'flat' data types whenever possible. In the case of partitioning and measurements, those could also be decomposed, but let's leave it all as is, since we'll need some flexibility given the current state of the project (and the already existing implementation).
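
To make the decomposition concrete, here is a minimal sketch of how a map such as the one in AdditionalDataDTO could land in flat rows. It is an assumption-laden illustration: `additional_data_demo` is a hypothetical scratch table that borrows the column names from the DDL excerpt above, an identity column stands in for the project's global_id() default, and the sample keys and values are made up.

```sql
-- Hypothetical scratch table; column names follow the DDL excerpt above,
-- but an identity column replaces the project's global_id() default.
CREATE TEMP TABLE additional_data_demo
(
    id_additional_data BIGINT GENERATED ALWAYS AS IDENTITY,
    fk_partitioning    BIGINT NOT NULL,
    ad_name            TEXT NOT NULL,
    ad_value           TEXT
);

-- One JSON object in, one flat row per key out.
-- A JSON null value ends up as SQL NULL in ad_value.
INSERT INTO additional_data_demo (fk_partitioning, ad_name, ad_value)
SELECT 1, kv.key, kv.value
FROM jsonb_each_text('{"owner": "team-a", "note": null}'::jsonb) AS kv;

-- Flat columns can be filtered and pattern-matched directly.
SELECT ad_name, ad_value
FROM additional_data_demo
WHERE fk_partitioning = 1
  AND ad_name LIKE 'own%';

-- If a caller needs the map shape back, it can be reassembled on read.
SELECT jsonb_object_agg(ad_name, ad_value) AS additional_data
FROM additional_data_demo
WHERE fk_partitioning = 1;
```

The trade-off is the one discussed in the thread: flat rows are easier for the database to index and search, while the JSON shape can still be rebuilt for clients that expect a map.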

database/src/main/postgres/runs/create_partitioning_if_not_exists.sql

database/src/main/postgres/runs/_write_measurement.sql

Co-authored-by: Ladislav Sulak <[email protected]>

lsulak · 2023-11-11T11:33:42Z

lsulak

code reviewed
pulled
built
ran against PR - I'll do it once I finish "Create the FA-DB entities for the DB functions" #23; I don't want to test similar/same things multiple times at the moment

benedeki added the "work in progress" label (Work on this item is not yet finished; mainly intended for PRs) Sep 28, 2023

benedeki self-assigned this Sep 28, 2023

benedeki mentioned this pull request Oct 9, 2023

Replace hstore usage with the composite type and the new internal storage type #70

Closed

lsulak reviewed Oct 10, 2023

database/src/main/01_users.ddl Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/_write_measurement.sql Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/checkpoints.ddl

lsulak reviewed Oct 17, 2023

database/src/main/runs/future/write_measurement.sql Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/checkpoint_measure_definitions.ddl Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/write_checkpoint.sql Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/write_checkpoint.sql Outdated

lsulak reviewed Oct 17, 2023

database/src/main/runs/checkpoints.ddl

benedeki commented Oct 19, 2023

database/src/main/runs/write_checkpoint.sql Outdated

benedeki and others added 2 commits October 19, 2023 11:30

Apply suggestions from code review

9716751

Co-authored-by: Ladislav Sulak <[email protected]>

* finished review

66c2c06

benedeki marked this pull request as ready for review October 22, 2023 07:59

benedeki requested review from TebaleloS, Zejnilovic, dk1844 and salamonpavel as code owners October 22, 2023 07:59

Merge branch 'master' into feaure/48-review-of-the-db-model-and-functions

bfaed44

benedeki removed the "work in progress" label (Work on this item is not yet finished; mainly intended for PRs) Oct 24, 2023

benedeki mentioned this pull request Oct 26, 2023

#50: Create or adjust endpoint for accepting and saving checkpoint data #74

Merged

benedeki added 2 commits October 31, 2023 00:49

* new directory structure

d1bad1a

Merge branch 'feaure/48-review-of-the-db-model-and-functions' of https://github.com/AbsaOSS/atum-service into feaure/48-review-of-the-db-model-and-functions

50d33d7

salamonpavel reviewed Oct 31, 2023

benedeki added 3 commits November 2, 2023 08:14

* switched parameters order in runs.create_partitioning_if_not_exists

8896d26

* fixes

586bd19

* comment fix

566e71b

benedeki mentioned this pull request Nov 3, 2023

#103: Add DB tests #110

Merged

benedeki and others added 3 commits November 10, 2023 09:27

* renames based on chat agreements

49b91c2

* fixes identified by tests

Merge branch 'master' into feaure/48-review-of-the-db-model-and-functions

8b0c51b

* Fixed unique constraint name to avoid the old "segmentation" expression

f212690