splitSql too aggressive in Redshift for stored procedure definition #381

MPagel · 2024-12-17T17:24:09Z

splitSql does not consider $$ or $LABEL$ to constitute valid code block toggles that are used in plpgsql (redshift and postgress).

e.g.

CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$
DECLARE
	i int DEFAULT 1;
BEGIN
	CREATE TABLE IF NOT EXISTS testtab (
			id INT NOT NULL DEFAULT i
			,data varchar(50) DEFAULT NULL
			)
	DISTSTYLE ALL;
	INSERT INTO testtab(id, data) VALUES(i,$1);
END; $$ LANGUAGE plpgsql;

is a valid single SQL statement

however splitSQL attempts to break the stored procedure into three separate statements
splitSql("CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1;\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND; $$ LANGUAGE plpgsql;\n")

[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1"
[2] "BEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND;"
[3] "$$ LANGUAGE plpgsql"

If not using a DECLARE section, you get 2 total statements
[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(1,$1);\nEND;"
[2] "$$ LANGUAGE plpgsql"

Then executing these leads to

com.amazon.redshift.util.RedshiftException: Unterminated dollar quote started at position nn in SQL CREATE OR REPLACE PROCEDURE ... i int DEFAULT 1;. Expected terminating $$

To test that it the original statement at top is valid, you can execute (using DBeaver, OmniDB, etc) that statement followed by

truncate testtab;
CALL testproc('abc');
select * from testtab;

id "data"
1 abc

insert into testtab values (2,'mno');
insert into testtab(data) values ('def');
insert into testtab(id) values (7);
select * from testtab;

id "data"
1 abc
2 mno
1 def
7 [NULL]

possibly relevant refs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_PROCEDURE.html
https://www.postgresql.org/docs/current/sql-createprocedure.html

The text was updated successfully, but these errors were encountered:

MPagel · 2024-12-17T17:47:07Z

In order to handle $$ (but not $LABEL$ ) properly, I think boolean doubleDollar = false; String doubleDollarText = "$$"; needs to be inserted before

SqlRender/java/org/ohdsi/sql/SqlSplit.java

Line 40 in a0dad78

for (cursor = start; cursor < tokens.size(); cursor++) {

then add an else block into the if statement that follows (L42)

else if (token.text.equals(doubleDollarText)) {
  if (doubleDollar) { // closing an open block
    lastPop = nestStack.pop();
  } else { // open a $$ block
    nestStack.push(token.text);
  }
  doubleDollar = !(doubleDollar); //toggle
}

This has not been tested, and I don't know if the BEGIN/CASE and END logic would need to be altered.

MPagel · 2024-12-18T19:04:29Z

my workaround for now is to use a modified version of N3C's parse_sql function to do the splitting of one query from the next, rather than using DatabaseConnector::executeSQL. Then I overrode executeSQL by adding a noSplit with default value FALSE to the function parameters and added a block between lines 416 and 417
https://github.com/OHDSI/DatabaseConnector/blob/46eb774beae80714efa7a6ad39558c478ceb597c/R/Sql.R#L416

  if (noSplit) { 
    sqlStatements <- sql 
  } else {
    sqlStatements <- SqlRender::splitSql(sql)
  }

schuemie · 2024-12-19T15:40:37Z

The main purpose of SqlRender is to translate from one SQL dialect (OHDSISql) to others. splitSql() is mostly a convenience function for afterwards. I don't think the RedShift SQL you're referring to is the result of a translation?

MPagel · 2024-12-19T16:48:06Z

The code that I'm trying to wrap into a procedure/function is from N3C's phenotype scripts, which I believe was the result of code translated from T-SQL.

Further, N3C uses the OHDSI DatabaseConnector package to execute the queries by way of executeSql. executeSql calls SqlRender::splitSql, which is why I posted here. I will see if I can bypass the call out to splitSql by calling lowLevelExecuteSql instead of executeSql, seeing as I've already done the SQL splitting.

I will also open a pull request over at DatabaseConnector to allow override of the call out to splitSql within (the non-low-level) executeSql and will reference this issue.

schuemie · 2025-01-30T07:41:59Z

I'm still trying to understand the actual problem you are trying to solve. So far we have done just fine in OHDSI without having to define custom procedures, especially on RedShift which is IMHO the most well-behaved database platform. I don't see the custom procedure in the sQL on the N3C website you referenced.

What do you need the custom procedure for that requires either this fundamental change in SqlRender or a fundamental change in DatabaseConnector?

MPagel · 2025-01-31T16:31:48Z

I am taking data from multiple institutional sources from different schemas of my database and combining for N3C's ingestion. In order to determine a unified cohort, I can't follow the N3C dataflow that removes individuals from my cohort if not found in the data, as some individuals are in only one of the two data sources (whereas most are in both). Nor can I maintain separate cohorts for each data set, because some of the negative controls as determined for one dataset may be present in the covid positive group on the other database, so are inappropriate to use as negative controls and new negative controls would have to be selected.

As I have to do some more complicated logic where I have to run the same base query for each of the data sources, it is "simpler" to put that query in a stored procedure and call with slightly modified input parameters for each run than to maintain two copies of the same query (that could lead to code drift over time if one query is changed but not the other).

So, to make a long story short, I have a use case where I want to augment the N3C dataflow and the OHDSI database connector functionality in my own local copy. But generally, I think it would be handy to allow for SQLRender to properly handle stored procedures, because SQLRender serves as a good starting library for doing any work in R that interacts with databases in a range of SQL dialects, OHDSI-related project or not.

schuemie · 2025-02-03T11:04:05Z

Thanks for explaining! Have you thought about using SqlRender's parameterized SQL to address the problem of query modification? That would be in line with the SqlRender and DatabaseConnector flow, and would work across different database platforms without issue.

MPagel mentioned this issue Dec 19, 2024

add noSplit option to executeSql OHDSI/DatabaseConnector#298

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

splitSql too aggressive in Redshift for stored procedure definition #381

splitSql too aggressive in Redshift for stored procedure definition #381

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 18, 2024 •

edited

Loading

schuemie commented Dec 19, 2024

MPagel commented Dec 19, 2024 •

edited

Loading

schuemie commented Jan 30, 2025

MPagel commented Jan 31, 2025 •

edited

Loading

schuemie commented Feb 3, 2025

splitSql too aggressive in Redshift for stored procedure definition #381

splitSql too aggressive in Redshift for stored procedure definition #381

Comments

MPagel commented Dec 17, 2024 • edited Loading

MPagel commented Dec 17, 2024 • edited Loading

MPagel commented Dec 18, 2024 • edited Loading

schuemie commented Dec 19, 2024

MPagel commented Dec 19, 2024 • edited Loading

schuemie commented Jan 30, 2025

MPagel commented Jan 31, 2025 • edited Loading

schuemie commented Feb 3, 2025

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 18, 2024 •

edited

Loading

MPagel commented Dec 19, 2024 •

edited

Loading

MPagel commented Jan 31, 2025 •

edited

Loading