-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
splitSql too aggressive in Redshift for stored procedure definition #381
Comments
In order to handle
then add an
This has not been tested, and I don't know if the BEGIN/CASE and END logic would need to be altered. |
my workaround for now is to use a modified version of N3C's parse_sql function to do the splitting of one query from the next, rather than using DatabaseConnector::executeSQL. Then I overrode executeSQL by adding a noSplit with default value FALSE to the function parameters and added a block between lines 416 and 417
|
The main purpose of SqlRender is to translate from one SQL dialect (OHDSISql) to others. |
The code that I'm trying to wrap into a procedure/function is from N3C's phenotype scripts, which I believe was the result of code translated from T-SQL. Further, N3C uses the OHDSI DatabaseConnector package to execute the queries by way of executeSql. executeSql calls SqlRender::splitSql, which is why I posted here. I will see if I can bypass the call out to splitSql by calling lowLevelExecuteSql instead of executeSql, seeing as I've already done the SQL splitting. I will also open a pull request over at DatabaseConnector to allow override of the call out to splitSql within (the non-low-level) executeSql and will reference this issue. |
I'm still trying to understand the actual problem you are trying to solve. So far we have done just fine in OHDSI without having to define custom procedures, especially on RedShift which is IMHO the most well-behaved database platform. I don't see the custom procedure in the sQL on the N3C website you referenced. What do you need the custom procedure for that requires either this fundamental change in SqlRender or a fundamental change in DatabaseConnector? |
I am taking data from multiple institutional sources from different schemas of my database and combining for N3C's ingestion. In order to determine a unified cohort, I can't follow the N3C dataflow that removes individuals from my cohort if not found in the data, as some individuals are in only one of the two data sources (whereas most are in both). Nor can I maintain separate cohorts for each data set, because some of the negative controls as determined for one dataset may be present in the covid positive group on the other database, so are inappropriate to use as negative controls and new negative controls would have to be selected. As I have to do some more complicated logic where I have to run the same base query for each of the data sources, it is "simpler" to put that query in a stored procedure and call with slightly modified input parameters for each run than to maintain two copies of the same query (that could lead to code drift over time if one query is changed but not the other). So, to make a long story short, I have a use case where I want to augment the N3C dataflow and the OHDSI database connector functionality in my own local copy. But generally, I think it would be handy to allow for SQLRender to properly handle stored procedures, because SQLRender serves as a good starting library for doing any work in R that interacts with databases in a range of SQL dialects, OHDSI-related project or not. |
Thanks for explaining! Have you thought about using SqlRender's parameterized SQL to address the problem of query modification? That would be in line with the SqlRender and DatabaseConnector flow, and would work across different database platforms without issue. |
splitSql does not consider
$$
or$LABEL$
to constitute valid code block toggles that are used in plpgsql (redshift and postgress).e.g.
is a valid single SQL statement
however splitSQL attempts to break the stored procedure into three separate statements
splitSql("CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1;\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND; $$ LANGUAGE plpgsql;\n")
[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1"
[2] "BEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND;"
[3] "$$ LANGUAGE plpgsql"
If not using a
DECLARE
section, you get 2 total statements[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(1,$1);\nEND;"
[2] "$$ LANGUAGE plpgsql"
Then executing these leads to
To test that it the original statement at top is valid, you can execute (using DBeaver, OmniDB, etc) that statement followed by
id "data"
1 abc
id "data"
1 abc
2 mno
1 def
7 [NULL]
possibly relevant refs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_PROCEDURE.html
https://www.postgresql.org/docs/current/sql-createprocedure.html
The text was updated successfully, but these errors were encountered: