-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#48: Review of the db model and functions #82
#48: Review of the db model and functions #82
Conversation
* replaced HSTORE with JSONB * segmentation expression changed to partitioning * major refactoring of functions and tables * root files renamed to adhere to fa-db `deploy_db.py` script * functions not (fully) implemented moved to subfolders `future`
JaCoCo agent module code coverage report - spark:2 - scala 2.12.12
|
JaCoCo server module code coverage report - scala 2.12.12
|
|
||
RETURN; | ||
END; | ||
$$ | ||
LANGUAGE plpgsql VOLATILE SECURITY DEFINER; | ||
|
||
GRANT EXECUTE ON FUNCTION runs._get_key_segmentation(hstore) TO atum_user; | ||
ALTER FUNCTION runs._get_id_partitioning(JSONB) OWNER TO atum_owner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not GRANT EXECUTE and why owner and not the user? I thought that only DB, schemas, and tables are owned by owner, and that the functions are supposed to be executed as the user, not owner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or rather, is the idea here that these 'private' functions shouldn't be called by the user at all? Would it work if the user executed a function that calls this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes the
_
at the beginning of the function name signs a "private" function. - private function is not to be called outside, therefore no grants; thanks to that it can be lenient on input sanitation
- ownership automatically means execution right
- the magic of the function being (indirectly) usable by outside user is in the
SECURITY DEFINER
specification in the functions - it means that all the code within the function will be executed with right of the ROLE that owns the function not the ROLE that called the function. This prevents the needs to maintain cascade of access rights and opening private functions to outside use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, thanks a lot for such detailed explanation!
Co-authored-by: Ladislav Sulak <[email protected]>
|
…s://github.com/AbsaOSS/atum-service into feaure/48-review-of-the-db-model-and-functions
ALTER TABLE runs.segmentations | ||
ADD CONSTRAINT segmentations_unq UNIQUE (segmentation); | ||
ALTER TABLE runs.partitionings | ||
ADD CONSTRAINT segmentations_unq UNIQUE (partitioning); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constraint might also be renamed to partitioning_unq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to consider indexing the partitioning JSONB column as it's being used in _get_id_partitioning function when looking up partitioning id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unique constraint automatically means an index created.
JaCoCo agent module code coverage report - spark:2 - scala 2.12.18
|
JaCoCo server module code coverage report - scala 2.12.18
|
id_measurement BIGINT NOT NULL DEFAULT global_id(), | ||
fk_measure_definition BIGINT NOT NULL, | ||
fk_checkpoint UUID NOT NULL, | ||
measurement_value JSONB NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we could even create a composite type, because we already know what the structure should be. What do you think? More flexibility can be good in some cases, but can bring unexpected situations/bad values in others.
Not now, I just wanted to understand your reasoning
CREATE TABLE runs.additional_data | ||
( | ||
id_additional_data BIGINT NOT NULL DEFAULT global_id(), | ||
key_segmentation BIGINT NOT NULL, | ||
fk_partitioning BIGINT NOT NULL, | ||
ad_name TEXT NOT NULL, | ||
ad_value TEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that this will be a JSONB, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the id of the partitioning, the surrogate key of the partitionings table. All other tables reference it, instead of the bulky JSONB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understood you to be honest, apologies. I was referring to ad_name
and ad_value
- so this means that a sinle record in additional_data
table will hold only 1 metadata record that is 'flat' like this. I thought that a single additional_data record should be a JSON structure-check our data model:
case class AdditionalDataDTO(
additionalData: Map[String, Option[String]]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's decomposed for better handling. Indexing, searching, patterning and merging.
If the DB is to understand the data (do operations on them), I really prefer if they are not hidden in complex types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay then. Let's use 'flat' data types whenever possible. In case of partitioning and measurements - those could also be decomposed, but let's leave it all as is, since we'll need some flexibility given the current state of the project (and already existing implementation)
database/src/main/postgres/runs/create_partitioning_if_not_exists.sql
Outdated
Show resolved
Hide resolved
database/src/main/postgres/runs/create_partitioning_if_not_exists.sql
Outdated
Show resolved
Hide resolved
Co-authored-by: Ladislav Sulak <[email protected]>
File |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- code reviewed
- pulled
- built
- ran against PR - I'll do it once I'll finish Create the FA-DB entities for the DB functions #23, I don't want to test similar/same things multiple times at the moment
deploy_db.py
scriptfuture
NB!
Please ignore the content of the
future
folders. They contain eventually useful code that has been developed before, but right now out of scope.Closes #48