Replies: 1 comment 2 replies
-
Let me make sure I understand your question: you run Spark jobs on multiple EMR clusters and want to track lineage centrally for all of them, is that correct?
2 replies
-
Well, I am planning a design that could be generalized.
I am creating a framework that performs multiple transformations along with auditing/lineage tracking. The idea is for the transformations to run entirely in one module/EMR cluster (maybe using PySpark or Scala Spark) and for the lineage tracking to run in a different EMR cluster. However, to ensure that the auditing happens, I would need to pass the SparkSessions between the two Spark jobs. Is there a better way to do this?