#69: partitioning check function on DB #109
Conversation
JaCoCo agent module code coverage report - spark:2 - scala 2.12.12
@salamonpavel
Ok. Let's make sure though that the application-level validations and DB-level validations are always in sync. Especially if we expose the JSON schema to third parties, it shouldn't accept data that the DB would reject.
Given that we plan to perform data validation within our application, I believe it would be beneficial to adopt a 'fail-fast' approach at the database level. This would allow us to quickly identify and reject any invalid data, thereby enhancing the efficiency of our operations.

I absolutely understand and appreciate your commitment to ensuring the integrity of our data. However, I've noticed that our SQL code currently includes tasks such as beautifying JSONs and composing messages. While these tasks are undoubtedly important, I feel they would be better suited to our Scala codebase. SQL excels at data manipulation and retrieval, but it might not be the best tool for tasks that involve complex string manipulation or error handling. By moving these tasks to our Scala code, we can make our SQL code simpler and more focused on its core responsibilities.

Therefore, I propose that we simplify our SQL code to perform validation checks and return a boolean value. If the validation fails, we can immediately halt the operation ('fail-fast'). Any additional handling, such as generating error messages or beautifying JSONs, can be managed in our Scala application code. I believe this approach will allow us to leverage the strengths of both SQL and Scala, leading to more maintainable and efficient code.
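A minimal sketch of what such a boolean-only, fail-fast check could look like. This is purely illustrative: the function name, schema, and JSON field names (`keys`, `keysToValues`) are assumptions, not the actual implementation from this PR.

```sql
-- Hypothetical, simplified validation: returns TRUE/FALSE only;
-- message composition and JSON pretty-printing stay in Scala.
CREATE OR REPLACE FUNCTION runs.is_partitioning_valid(
    i_partitioning JSONB
) RETURNS BOOLEAN AS
$$
BEGIN
    -- 'keys' must be a non-empty array
    IF jsonb_typeof(i_partitioning->'keys') IS DISTINCT FROM 'array'
       OR jsonb_array_length(i_partitioning->'keys') = 0 THEN
        RETURN FALSE;
    END IF;

    -- for a non-pattern partitioning, every key must map to a non-null,
    -- non-empty value
    RETURN NOT EXISTS (
        SELECT 1
        FROM jsonb_each_text(i_partitioning->'keysToValues') AS kv(key, value)
        WHERE kv.value IS NULL OR kv.value = ''
    );
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```

With a signature like this, the Scala side can call the function, halt the operation on FALSE, and build any user-facing error message itself.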
Hi @salamonpavel and @benedeki, apologies for the late response. My take on your thoughts and comments:
From @salamonpavel:
This is no longer the case.
I think this might perhaps be dropped and the SQL function can be simplified to return only BOOL, perhaps with status codes if needed. I'll think about the future use cases of the main validation function; if I don't come up with any, I'll drop it.
…ning-validation-db-functions
 * limitations under the License.
 */

ALTER TABLE runs.partitionings
This is why I don't like the migration scripts. Unclear end state. 😞
Closes #69
This PR implements Partitioning validation on the DB side. The following validations were implemented:
The following were not implemented:
- create_partitioning_if_not_exist DB function does not do any normalization, so the check also shouldn't; if we decide to do it, then in both places. (cc @benedeki)
- create_partitioning_if_not_exist is not aware of our plans to be able to have partitioning patterns - I still don't know exactly how we are going to create and use them, so I didn't introduce such a parameter / logic there yet. However, since we know that we'll have it, and we already talked about the distinction between a pattern and an actual (non-pattern) partitioning (i.e. in patterns values can be NULLs/empty strings, and in non-patterns they can't), I already implemented everything needed in the partitioning validation checks. (cc @benedeki)

If you want to test this, just deploy it to your localhost PG DB and run some of this:
and
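The sample statements themselves are not preserved in this excerpt. A hypothetical invocation, for illustration only (the function name and the JSON shape are assumed, not taken from this PR), might look like:

```sql
-- Hypothetical call; adjust the function name and JSON structure
-- to match what the PR actually introduced.
SELECT runs.is_partitioning_valid(
    '{"version": 1,
      "keys": ["key1", "key2"],
      "keysToValues": {"key1": "valueX", "key2": "valueY"}}'::JSONB
);
```

Running the same call with a NULL or empty-string value in `keysToValues` should then exercise the non-pattern rejection path discussed above.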