-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spline Support of Statment Create table as select #784
Comments
It should be supported. |
Hi from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.functions import col
# Assuming SparkSession is already available as 'spark' in Databricks
# spark = SparkSession.builder.appName("example").getOrCreate()
# Define the schema for the table
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True),
StructField("age", IntegerType(), True)
])
# Create a DataFrame with the schema
data = [(1, "Alice", 30), (2, "Bob", 35), (3, "Charlie", 40)]
df = spark.createDataFrame(data, schema)
# Define the table name and path (replace with your desired table name and path)
silver_table_name = "silver_table_name1"
path = "/mnt/delta/" + silver_table_name
# Write the DataFrame to the table
df.write.format("delta").mode("overwrite").save(path)
# Register the table in the metastore
spark.sql(f"CREATE TABLE {silver_table_name} USING DELTA LOCATION '{path}'")
# Create a materialized view from the silver layer table
materialized_view_name = "your_materialized_view"
create_mv_sql = f"""
CREATE VIEW {materialized_view_name}
AS SELECT * FROM {silver_table_name}
"""
spark.sql(create_mv_sql)
# Optionally, verify the contents of the table and materialized view
spark.sql(f"SELECT * FROM {silver_table_name}").show()
spark.sql(f"SELECT * FROM {materialized_view_name}").show() and according to the execution plan i got in spline |
@wajda -- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.V_Accounts1;
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE lineage_data.lineagedemo.Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating or replacing the table 'V_Accounts1' based on the 'Accounts1' table, using Delta format
CREATE TABLE lineage_data.lineagedemo.V_Accounts1
USING DELTA AS
SELECT *
FROM lineage_data.lineagedemo.Accounts1; |
@wajda if i add Using delta it works also in create as select -- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS V_Accounts1;
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating or replacing the table 'V_Accounts1' based on the 'Accounts1' table, using Delta format
CREATE TABLE V_Accounts1
USING DELTA AS
SELECT *
FROM Accounts1; |
It might be possible that
or Another small thing to try is to enable non persistent actions capturing like this: spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true If you still do no see any mentioning of your "VIEW" anywhere - in logs, errors, or lineage - then it's very likely that Spark simply do not provide the information about that action at the first place. Please try and let us know what you've found. |
I will try |
I have added spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true |
By the way,Unity Catalog of Databricks support create view as select * from table |
spark.sparkContext.setLogLevel("DEBUG") |
I put on the head of my Databricks notebook %python
spark.sparkContext.setLogLevel("debug") %python
sc._jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(spark._jsparkSession) %SQL
-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;
-- Drop the table 'V_Accounts1' if it exists
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating or replacing the table 'V_Accounts1' based on the 'Accounts1' table, using Delta format
CREATE OR REPLACE view V_Accounts1
AS
SELECT *
FROM Accounts1; But dont see any error on the driver logs |
@wajda I faced the similar issue and was able to see the plan. Although I don't have details right now to see what spline is generating for CTAS, I identified that because of below change in Spark caused this issue. Although this change is applicable starting Spark 3.3, usually databricks ports back changes to older version as well. This was true for ListQuery expression as well I fixed earlier in Spline. Need to extract the logical plan class correctly for Spline to work apache/spark@aa1b16b#diff-594addaa7ed3dd43fd0dad5fa81ade5ec2e570adf7c982c0570e3358a08a8e0a |
I have a statment in a Notebook in Databricks that is
the lineage is not shown
Only on
insert into
- it shows?Is it by design
I use version
spark_3_3_spline_agent_bundle_2_12_2_1_0_SNAPSHOT.jar
The text was updated successfully, but these errors were encountered: