Spline Agent on AWS Glue #414
Hi, I am trying to run the Spline Agent on AWS Glue (which runs Spark). It seems to partially work, but I am not getting any lineage out in the logs, on the console, or through the HTTP dispatcher. I know that the Spline Agent does communicate with the REST server, as I can see it ping the server with a HEAD request on job startup, but no lineage comes through. My Glue setup is as follows:
When running a job, I can see that the HTTP dispatcher connects to the Spline REST server, but it never produces lineage in the CloudWatch logs and never connects to the Spline REST server again (the server logs show no further calls). Is there anything that I am doing wrong, or does the Spline Agent not support AWS Glue Spark? If it does not support Glue, how would one go about extending the Spline Agent to support it? Would one need to extend the agent to support the GlueContext? I have attached my sample Python code in Listing 3. Any ideas, suggestions or similar efforts?
Table 1. Spline Agent JARs which seem to work with different Glue versions
Listing 1. Contents of the spline.properties file
Listing 2. Extract of the AWS Glue log showing Spline startup
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node xneelo_aws_test
xneelo_aws_test_node1646313615566 = glueContext.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options={
        "tableName": "film",
        "dbTable": "film",
        "connectionName": "xneelo2",
    },
    transformation_ctx="xneelo_aws_test_node1646313615566",
)

# Script generated for node ApplyMapping
ApplyMapping_node1631618598220 = ApplyMapping.apply(
    frame=xneelo_aws_test_node1646313615566,
    mappings=[],
    transformation_ctx="ApplyMapping_node1631618598220",
)

# Script generated for node Amazon S3
AmazonS3_node1631618601508 = glueContext.getSink(
    path="s3://dirk-ct-bucket/adv_works/",
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    enableUpdateCatalog=True,
    transformation_ctx="AmazonS3_node1631618601508",
)
AmazonS3_node1631618601508.setCatalogInfo(
    catalogDatabase="xneelo", catalogTableName="skalia_test"
)
AmazonS3_node1631618601508.setFormat("json")
AmazonS3_node1631618601508.writeFrame(ApplyMapping_node1631618598220)
job.commit()
Listing 3. AWS Glue Python code
Replies: 1 comment
It could be caused by the internal implementation of the writeFrame method. If it uses RDD then lineage won't be captured. See #33. Also see #394 for a troubleshooting scenario for similar issues.
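One way to test this hypothesis, as a minimal sketch: convert the DynamicFrame to a Spark DataFrame with toDF() and write it through the standard DataFrameWriter instead of writeFrame, so the write goes through the DataFrame API. The helper name write_via_dataframe and its default arguments are hypothetical, not part of Glue or Spline. Whether the agent then captures lineage on Glue is not confirmed in this thread, and this path skips the Data Catalog update that the getSink call in Listing 3 performs with enableUpdateCatalog.

```python
def write_via_dataframe(dynamic_frame, path, fmt="json", mode="append"):
    """Write a Glue DynamicFrame through the Spark DataFrame API.

    Hypothetical helper for troubleshooting: it only relies on the
    frame exposing toDF(), as Glue DynamicFrames do.
    """
    # Convert the Glue DynamicFrame into a regular Spark DataFrame.
    df = dynamic_frame.toDF()
    # Use the standard DataFrameWriter instead of the Glue sink.
    df.write.format(fmt).mode(mode).save(path)
    return df


# Possible usage in place of the writeFrame call in Listing 3
# (assumption: appending JSON files to the same S3 path is acceptable):
# write_via_dataframe(ApplyMapping_node1631618598220,
#                     "s3://dirk-ct-bucket/adv_works/")
```

If lineage shows up with this variant but not with writeFrame, that would point to the sink implementation rather than to the agent setup.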