[SPARK-48530][SQL] Support for local variables in SQL Scripting #49445

Open
wants to merge 56 commits into base: master

Commits (56):
73cb01b
first commit
dusantism-db Dec 24, 2024
813d282
POC works
dusantism-db Dec 24, 2024
1c08f57
make column res helper more functional
dusantism-db Dec 25, 2024
18da02f
move variables map to SqlScriptingScope
dusantism-db Dec 25, 2024
cee5f1a
implement proper namespace (scope label name) for local variables
dusantism-db Dec 27, 2024
47934ab
qualified names
dusantism-db Dec 30, 2024
399d4e8
update todos
dusantism-db Jan 3, 2025
6efe764
resolve catalogs + check for duplicates
dusantism-db Jan 3, 2025
769607d
set variable and normalized identifiers
dusantism-db Jan 3, 2025
6225956
resolve fully qualified session vars in tempvarManager only and updat…
dusantism-db Jan 6, 2025
241fc05
tests first batch
dusantism-db Jan 6, 2025
068e1ec
add more tests
dusantism-db Jan 8, 2025
60335db
add error messages, more tests and some comments
dusantism-db Jan 8, 2025
65b69d3
rename TempVariableManager.scala and add more tests
dusantism-db Jan 8, 2025
fe5dc7b
remove old logic for dropping variables, update tests and add more tests
dusantism-db Jan 8, 2025
4f8d2c1
add cleanup for scripting execution, separate drop and create variabl…
dusantism-db Jan 9, 2025
ba5b8d2
fix resolvecatalogs and add more tests
dusantism-db Jan 9, 2025
33f0aac
refactor to support properly setting variables
dusantism-db Jan 9, 2025
be6052f
add error message for system and session label names
dusantism-db Jan 9, 2025
4b1e8e1
small fixes and cleanup
dusantism-db Jan 10, 2025
90b106b
Fix duplicate detection for set variablwe
dusantism-db Jan 10, 2025
7ba0923
Add test for DECLARE OR REPLACE but ignore it until FOR is fixed
dusantism-db Jan 10, 2025
cd4e932
execute immediate don't resolve vars from scripts. Problem remains wi…
dusantism-db Jan 10, 2025
fdf3c5a
cleanup
dusantism-db Jan 10, 2025
52cbd17
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Jan 13, 2025
c134fd4
fix merge mistake
dusantism-db Jan 13, 2025
3ea762d
fix merge mistake 2
dusantism-db Jan 13, 2025
8e9352a
fix comments
dusantism-db Jan 15, 2025
78042e3
Update CreateVar, SetVar and lookupVariable to work with Execute Imme…
dusantism-db Jan 16, 2025
40ffa83
add enum for lookup variable mode
dusantism-db Jan 17, 2025
4a546a4
convert scripting variable manager to threadlocal
dusantism-db Jan 17, 2025
15d5554
fix e2e test
dusantism-db Jan 17, 2025
a2b20c5
add comment
dusantism-db Jan 17, 2025
e3077a4
add comment and regenerate golden files
dusantism-db Jan 21, 2025
6ce8f9c
fix failing test
dusantism-db Jan 21, 2025
ccab52c
refactor SqlScriptingVariableManager to be LexicalThreadLocal singlet…
dusantism-db Jan 22, 2025
370bf65
renames
dusantism-db Jan 23, 2025
0cea838
tagging approach
dusantism-db Jan 24, 2025
9895c69
Revert "tagging approach"
dusantism-db Jan 24, 2025
cd888dd
analysiscontext withExecuteImmediate
dusantism-db Jan 24, 2025
4fe7ab5
remove into clause flag
dusantism-db Jan 25, 2025
8a6b536
address comments
dusantism-db Jan 27, 2025
db573c1
remove parameter from lookupVariable
dusantism-db Jan 27, 2025
dadd517
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Jan 27, 2025
680e5d7
resolve comments 1
dusantism-db Feb 6, 2025
7d3008e
Merge remote-tracking branch 'upstream/master' into scripting-local-v…
dusantism-db Feb 6, 2025
901aa6c
improve logic to work with exception handlers
dusantism-db Feb 7, 2025
34677c7
add check for existing variable in set, and add comments to findVariable
dusantism-db Feb 7, 2025
b814b97
Introduce FakeLocalCatalog, remove sessionVariablesOnly flags and upd…
dusantism-db Feb 7, 2025
8074c63
resolve comments
dusantism-db Feb 7, 2025
220aeae
throw error if var not found in setvarexec
dusantism-db Feb 7, 2025
45ca867
resolve comments again
dusantism-db Feb 10, 2025
e1f1098
update resolvecatalogs according to wenchens comments, and forbid dro…
dusantism-db Feb 12, 2025
61a753f
change variableManager api to use only nameParts and VariableDefiniti…
dusantism-db Feb 12, 2025
f29a8fc
add test for drop session var
dusantism-db Feb 12, 2025
e184f8c
forbid session, builtin and sys* label names
dusantism-db Feb 12, 2025
21 changes: 21 additions & 0 deletions common/utils/src/main/resources/error/error-conditions.json
@@ -3592,6 +3592,16 @@
"message" : [
"Variable <varName> can only be declared at the beginning of the compound."
]
},
"QUALIFIED_LOCAL_VARIABLE" : {
"message" : [
"The variable <varName> must be declared without a qualifier, as qualifiers are not allowed for local variable declarations."
]
},
"REPLACE_LOCAL_VARIABLE" : {
"message" : [
"The variable <varName> does not support DECLARE OR REPLACE, as local variables cannot be replaced."
]
}
},
"sqlState" : "42K0M"
@@ -3738,6 +3748,12 @@
],
"sqlState" : "42K0L"
},
"LABEL_NAME_FORBIDDEN" : {
"message" : [
"The label name <label> is forbidden."
],
"sqlState" : "42K0L"
},
"LOAD_DATA_PATH_NOT_EXISTS" : {
"message" : [
"LOAD DATA input path does not exist: <path>."
@@ -5803,6 +5819,11 @@
"SQL Scripting is under development and not all features are supported. SQL Scripting enables users to write procedural SQL including control flow and error handling. To enable existing features set <sqlScriptingEnabled> to `true`."
]
},
"SQL_SCRIPTING_DROP_TEMPORARY_VARIABLE" : {
"message" : [
"DROP TEMPORARY VARIABLE is not supported within SQL scripts. To bypass this, use `EXECUTE IMMEDIATE 'DROP TEMPORARY VARIABLE ...'` ."
]
},
"SQL_SCRIPTING_WITH_POSITIONAL_PARAMETERS" : {
"message" : [
"Positional parameters are not supported with SQL Scripting."
@@ -0,0 +1,66 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.util

/**
* Helper trait for defining thread locals with lexical scoping. With this helper, the thread local
* is private and can only be set by the [[Handle]]. The [[Handle]] only exposes the thread local
* value to functions passed into its [[runWith]] method. This pattern allows for the lifetime of
* the thread local value to be strictly controlled.
*
* Rather than calling `tl.set(...)` and `tl.remove()` you would get a handle and execute your code
* in `handle.runWith { ... }`.
*
* Example:
* {{{
* object Credentials extends LexicalThreadLocal[Int] {
* def create(creds: Map[String, String]) = new Handle(Some(creds))
* }
* ...
* val handle = Credentials.create(Map("key" -> "value"))
* assert(Credentials.get() == None)
* handle.runWith {
* assert(Credentials.get() == Some(Map("key" -> "value")))
* }
* }}}
*/
trait LexicalThreadLocal[T] {
private val tl = new ThreadLocal[T]

private def set(opt: Option[T]): Unit = {
opt match {
case Some(x) => tl.set(x)
case None => tl.remove()
}
}

protected def createHandle(opt: Option[T]): Handle = new Handle(opt)

def get(): Option[T] = Option(tl.get)

/** Final class representing a handle to a thread local value. */
final class Handle private[LexicalThreadLocal] (private val opt: Option[T]) {
def runWith[R](f: => R): R = {
val old = get()
set(opt)
try f finally {
set(old)
}
}
}
}
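
To make the lifetime guarantee concrete, here is a small hypothetical usage sketch (the `RequestId` object and its values are illustrative, not part of the PR); it relies only on the `LexicalThreadLocal` contract shown above:

```scala
import org.apache.spark.util.LexicalThreadLocal

// Hypothetical subclass: a thread-local request id scoped to runWith blocks.
object RequestId extends LexicalThreadLocal[String] {
  def create(id: String): Handle = createHandle(Some(id))
}

object LexicalThreadLocalDemo {
  def main(args: Array[String]): Unit = {
    assert(RequestId.get().isEmpty) // unset outside any runWith block
    RequestId.create("outer").runWith {
      assert(RequestId.get().contains("outer"))
      RequestId.create("inner").runWith {
        assert(RequestId.get().contains("inner")) // inner handle shadows outer
      }
      assert(RequestId.get().contains("outer")) // previous value restored on exit
    }
    assert(RequestId.get().isEmpty) // removed once the outer block finishes
  }
}
```

Because `runWith` saves the old value and restores it in a `finally` block, handles nest safely and the thread local can never leak past the block that set it.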
@@ -0,0 +1,25 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst

import org.apache.spark.sql.catalyst.catalog.VariableManager
import org.apache.spark.util.LexicalThreadLocal

object SqlScriptingLocalVariableManager extends LexicalThreadLocal[VariableManager] {
def create(variableManager: VariableManager): Handle = createHandle(Option(variableManager))
}
@@ -139,6 +139,9 @@ object FakeV2SessionCatalog extends TableCatalog with FunctionCatalog with Suppo
* even if a temp view `t` has been created.
* @param outerPlan The query plan from the outer query that can be used to resolve star
* expressions in a subquery.
* @param isExecuteImmediate Whether the current plan is created by EXECUTE IMMEDIATE. Used when
* resolving variables, as SQL Scripting local variables should not be
* visible from EXECUTE IMMEDIATE.
*/
case class AnalysisContext(
catalogAndNamespace: Seq[String] = Nil,
@@ -154,6 +157,7 @@ case class AnalysisContext(
referredTempFunctionNames: mutable.Set[String] = mutable.Set.empty,
referredTempVariableNames: Seq[Seq[String]] = Seq.empty,
outerPlan: Option[LogicalPlan] = None,
isExecuteImmediate: Boolean = false,

/**
* This is a bridge state between this fixed-point [[Analyzer]] and a single-pass [[Resolver]].
@@ -208,7 +212,16 @@ object AnalysisContext {
originContext.relationCache,
viewDesc.viewReferredTempViewNames,
mutable.Set(viewDesc.viewReferredTempFunctionNames: _*),
viewDesc.viewReferredTempVariableNames)
viewDesc.viewReferredTempVariableNames,
isExecuteImmediate = originContext.isExecuteImmediate)
set(context)
try f finally { set(originContext) }
}

def withExecuteImmediateContext[A](f: => A): A = {
val originContext = value.get()
val context = originContext.copy(isExecuteImmediate = true)

set(context)
try f finally { set(originContext) }
}
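
The save-and-restore pattern used by `withExecuteImmediateContext` can be sketched in isolation (the `Ctx`/`CtxHolder` names below are simplified stand-ins for `AnalysisContext`, not the PR's types):

```scala
// Simplified stand-in for AnalysisContext: flip a flag for the duration of a
// block, restoring the original context even if the block throws.
case class Ctx(isExecuteImmediate: Boolean = false)

object CtxHolder {
  private val tl = new ThreadLocal[Ctx] { override def initialValue(): Ctx = Ctx() }
  def get: Ctx = tl.get()

  def withExecuteImmediateContext[A](f: => A): A = {
    val origin = tl.get()
    tl.set(origin.copy(isExecuteImmediate = true))
    try f finally tl.set(origin) // restore on both success and failure
  }
}

object CtxDemo {
  def main(args: Array[String]): Unit = {
    assert(!CtxHolder.get.isExecuteImmediate)
    CtxHolder.withExecuteImmediateContext {
      assert(CtxHolder.get.isExecuteImmediate) // flag is visible inside the block
    }
    assert(!CtxHolder.get.isExecuteImmediate) // original context restored
  }
}
```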
@@ -325,7 +338,10 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor

override def batches: Seq[Batch] = Seq(
Batch("Substitution", fixedPoint,
new SubstituteExecuteImmediate(catalogManager),
new SubstituteExecuteImmediate(
catalogManager,
resolveChild = executeSameContext,
checkAnalysis = checkAnalysis),
// This rule optimizes `UpdateFields` expression chains so looks more like optimization rule.
// However, when manipulating deeply nested schema, `UpdateFields` expression tree could be
// very complex and make analysis impossible. Thus we need to optimize `UpdateFields` early
@@ -23,6 +23,7 @@ import scala.collection.mutable

import org.apache.spark.internal.Logging
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.SqlScriptingLocalVariableManager
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.SubExprUtils.wrapOuterReference
import org.apache.spark.sql.catalyst.plans.logical._
@@ -251,6 +252,14 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
}
}

/**
* Looks up a variable by its name parts.
* Inside a SQL script, local variables are checked first, unless resolution happens in
* EXECUTE IMMEDIATE (queries generated by EXECUTE IMMEDIATE cannot access local variables).
* If no local variable matches, the lookup falls back to session variables.
* @param nameParts Name parts of the variable.
* @return Reference to the variable, if found.
*/
def lookupVariable(nameParts: Seq[String]): Option[VariableReference] = {
// The temp variables live in `SYSTEM.SESSION`, and the name can be qualified or not.
def maybeTempVariableName(nameParts: Seq[String]): Boolean = {
@@ -266,22 +275,41 @@
}
}

if (maybeTempVariableName(nameParts)) {
val variableName = if (conf.caseSensitiveAnalysis) {
nameParts.last
} else {
nameParts.last.toLowerCase(Locale.ROOT)
}
catalogManager.tempVariableManager.get(variableName).map { varDef =>
val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) {
nameParts
} else {
nameParts.map(_.toLowerCase(Locale.ROOT))
}

SqlScriptingLocalVariableManager.get()
// If we are in EXECUTE IMMEDIATE lookup only session variables.
.filterNot(_ => AnalysisContext.get.isExecuteImmediate)
// If variable name is qualified with session.<varName> treat it as a session variable.
.filter(_ =>
nameParts.length <= 2 && nameParts.init.map(_.toLowerCase(Locale.ROOT)) != Seq("session"))
[Review thread]
cloud-fan (Contributor): can we reuse the function that validates the label name? We can do something like
.filterNot(nameParts => nameParts.length > 2 || (nameParts.length == 2 && !isValidLabelName(nameParts.head)))

dusantism-db (Author): The validSessionVariableName function will return different results depending on nameParts length. In your case we only call it when length is 2, which is correct behavior, but a little confusing in my opinion. The way it is now is easier to understand IMO.

cloud-fan (Contributor): It's OK to have different tastes about the code style, but this is about functionality. The intention here is that we skip looking up local variables if the name can never be a local variable. We should check all the invalid label name patterns.
.flatMap(_.get(namePartsCaseAdjusted))
.map { varDef =>
VariableReference(
nameParts,
FakeSystemCatalog,
Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), variableName),
FakeLocalCatalog,
Identifier.of(Array(varDef.identifier.namespace().last), namePartsCaseAdjusted.last),
varDef)
}
} else {
None
}
.orElse(
if (maybeTempVariableName(nameParts)) {
catalogManager.tempVariableManager
.get(namePartsCaseAdjusted)
.map { varDef =>
VariableReference(
nameParts,
FakeSystemCatalog,
Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), namePartsCaseAdjusted.last),
varDef
)}
} else {
None
}
)
}
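
The precedence the method implements can be restated as a self-contained sketch (the `lookup` helper, its parameters, and `VarDef` are illustrative stand-ins, not the PR's API):

```scala
object LookupPrecedenceDemo {
  final case class VarDef(value: String)

  // Standalone sketch of lookupVariable's precedence:
  //  1. inside EXECUTE IMMEDIATE, skip local variables entirely;
  //  2. skip them when the name is qualified as session.<name> or has > 2 parts;
  //  3. otherwise try local variables first, then fall back to session variables.
  def lookup(
      nameParts: Seq[String],
      locals: Map[Seq[String], VarDef],
      session: Map[String, VarDef],
      inExecuteImmediate: Boolean): Option[VarDef] = {
    val lower = nameParts.map(_.toLowerCase)
    Some(locals)
      .filterNot(_ => inExecuteImmediate)                              // rule 1
      .filter(_ => lower.length <= 2 && lower.init != Seq("session"))  // rule 2
      .flatMap(_.get(lower))                                           // rule 3: local first
      .orElse(session.get(lower.last))                                 // rule 3: fallback
  }

  def main(args: Array[String]): Unit = {
    val locals = Map(Seq("y") -> VarDef("local-y"))
    val session = Map("x" -> VarDef("session-x"), "y" -> VarDef("session-y"))
    // Local wins inside a script, session wins inside EXECUTE IMMEDIATE:
    assert(lookup(Seq("y"), locals, session, inExecuteImmediate = false).contains(VarDef("local-y")))
    assert(lookup(Seq("y"), locals, session, inExecuteImmediate = true).contains(VarDef("session-y")))
    // A session-qualified name bypasses locals entirely:
    assert(lookup(Seq("session", "x"), locals, session, inExecuteImmediate = false).contains(VarDef("session-x")))
  }
}
```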

// Resolves `UnresolvedAttribute` to its value.
@@ -19,9 +19,13 @@ package org.apache.spark.sql.catalyst.analysis

import scala.jdk.CollectionConverters._

import org.apache.spark.SparkException
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.SqlScriptingLocalVariableManager
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.connector.catalog.{CatalogManager, CatalogPlugin, Identifier, LookupCatalog, SupportsNamespaces}
import org.apache.spark.sql.errors.DataTypeErrors.toSQLId
import org.apache.spark.sql.errors.QueryCompilationErrors
import org.apache.spark.util.ArrayImplicits._

@@ -35,10 +39,42 @@
// We only support temp variables for now and the system catalog is not properly implemented
// yet. We need to resolve `UnresolvedIdentifier` for variable commands specially.
case c @ CreateVariable(UnresolvedIdentifier(nameParts, _), _, _) =>
val resolved = resolveVariableName(nameParts)
// From scripts we can only create local variables, which must be unqualified,
// and must not be DECLARE OR REPLACE.
val resolved = if (withinSqlScript) {
// TODO [SPARK-50785]: Uncomment this when For Statement starts properly using local vars.
// if (c.replace) {
// throw new AnalysisException(
// "INVALID_VARIABLE_DECLARATION.REPLACE_LOCAL_VARIABLE",
// Map("varName" -> toSQLId(nameParts))
// )
// }

if (nameParts.length != 1) {
throw new AnalysisException(
"INVALID_VARIABLE_DECLARATION.QUALIFIED_LOCAL_VARIABLE",
Map("varName" -> toSQLId(nameParts)))
}

SqlScriptingLocalVariableManager.get()
.getOrElse(throw SparkException.internalError(
"Scripting local variable manager should be present in SQL script."))
.qualify(nameParts.last)
} else {
val resolvedIdentifier = catalogManager.tempVariableManager.qualify(nameParts.last)

assertValidSessionVariableNameParts(nameParts, resolvedIdentifier)
resolvedIdentifier
}

c.copy(name = resolved)
case d @ DropVariable(UnresolvedIdentifier(nameParts, _), _) =>
val resolved = resolveVariableName(nameParts)
if (withinSqlScript) {
throw new AnalysisException(
"UNSUPPORTED_FEATURE.SQL_SCRIPTING_DROP_TEMPORARY_VARIABLE", Map.empty)
}
val resolved = catalogManager.tempVariableManager.qualify(nameParts.last)
assertValidSessionVariableNameParts(nameParts, resolved)
d.copy(name = resolved)

case UnresolvedIdentifier(nameParts, allowTemp) =>
@@ -73,28 +109,34 @@
}
}

private def resolveVariableName(nameParts: Seq[String]): ResolvedIdentifier = {
def ident: Identifier = Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), nameParts.last)
if (nameParts.length == 1) {
ResolvedIdentifier(FakeSystemCatalog, ident)
} else if (nameParts.length == 2) {
if (nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) {
ResolvedIdentifier(FakeSystemCatalog, ident)
} else {
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
}
} else if (nameParts.length == 3) {
if (nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) &&
nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) {
ResolvedIdentifier(FakeSystemCatalog, ident)
} else {
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
}
} else {
private def withinSqlScript: Boolean =
SqlScriptingLocalVariableManager.get().isDefined && !AnalysisContext.get.isExecuteImmediate

private def assertValidSessionVariableNameParts(
nameParts: Seq[String],
resolvedIdentifier: ResolvedIdentifier): Unit = {
if (!validSessionVariableName(nameParts)) {
throw QueryCompilationErrors.unresolvedVariableError(
nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE))
nameParts,
Seq(
resolvedIdentifier.catalog.name(),
resolvedIdentifier.identifier.namespace().head)
)
}

def validSessionVariableName(nameParts: Seq[String]): Boolean = nameParts.length match {
case 1 => true

// On declare variable, local variables support only unqualified names.
// On drop variable, local variables are not supported at all.
case 2 if nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE) => true

// When there are 3 nameParts the variable must be a fully qualified session variable
// i.e. "system.session.<varName>"
case 3 if nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) &&
nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE) => true

case _ => false
}
}
}
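
For reference, the accepted name shapes can be restated as a self-contained truth table (this mirrors the PR's `validSessionVariableName` logic rather than calling Spark code):

```scala
object SessionVariableNameDemo {
  // Mirrors the PR's validSessionVariableName: the only accepted shapes are
  // <name>, session.<name>, and system.session.<name> (all case-insensitive).
  def validSessionVariableName(nameParts: Seq[String]): Boolean = nameParts.length match {
    case 1 => true
    case 2 => nameParts.head.equalsIgnoreCase("session")
    case 3 => nameParts(0).equalsIgnoreCase("system") && nameParts(1).equalsIgnoreCase("session")
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    assert(validSessionVariableName(Seq("v")))                        // v
    assert(validSessionVariableName(Seq("SESSION", "v")))             // session.v
    assert(validSessionVariableName(Seq("system", "session", "v")))   // system.session.v
    assert(!validSessionVariableName(Seq("db", "v")))                 // wrong namespace
    assert(!validSessionVariableName(Seq("system", "db", "v")))       // wrong middle part
    assert(!validSessionVariableName(Seq("a", "b", "c", "d")))        // too many parts
  }
}
```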
@@ -53,11 +53,12 @@ class ResolveSetVariable(val catalogManager: CatalogManager) extends Rule[Logica
// Names are normalized when the variables are created.
// No need for case insensitive comparison here.
// TODO: we need to group by the qualified variable name once other catalogs support it.
val dups = resolvedVars.groupBy(_.identifier.name).filter(kv => kv._2.length > 1)
val dups = resolvedVars.groupBy(_.identifier).filter(kv => kv._2.length > 1)
if (dups.nonEmpty) {
throw new AnalysisException(
errorClass = "DUPLICATE_ASSIGNMENTS",
messageParameters = Map("nameList" -> dups.keys.map(toSQLId).mkString(", ")))
messageParameters = Map("nameList" ->
dups.keys.map(key => toSQLId(key.name())).mkString(", ")))
}

setVariable.copy(targetVariables = resolvedVars)
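
The change from grouping duplicates by bare name to grouping by the full identifier can be illustrated standalone (the `Ident` type here is a stand-in for Spark's `Identifier`, not the PR's code):

```scala
object DuplicateAssignmentDemo {
  // Stand-in for Spark's Identifier: a namespace plus a name.
  final case class Ident(namespace: Seq[String], name: String)

  // Group by the whole identifier, not just the bare name, so variables with
  // the same name in different scopes are no longer flagged as duplicates.
  def duplicateNames(targets: Seq[Ident]): Seq[String] =
    targets.groupBy(identity).collect { case (id, vs) if vs.length > 1 => id.name }.toSeq

  def main(args: Array[String]): Unit = {
    val differentScopes = Seq(Ident(Seq("lbl1"), "x"), Ident(Seq("lbl2"), "x"))
    assert(duplicateNames(differentScopes).isEmpty) // same name, different scopes: allowed

    val sameScope = Seq(Ident(Seq("lbl1"), "x"), Ident(Seq("lbl1"), "x"))
    assert(duplicateNames(sameScope) == Seq("x"))   // genuine duplicate assignment
  }
}
```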