This is one of the most frequently asked questions about the Spline Spark Agent. See below for the answer.
The Spline agent listens to events generated by Spark when a Spark job is executed. The type of events and the metadata they carry vary depending on which Spark action is called; whether you are using the SQL, DataFrame, or RDD API; the type of data source you read from or write to; and so on. The Spline agent supports many of them, but not all, as there is a potentially infinite number of different data sources, formats, and technologies that Spark can integrate with via 3rd-party extensions. Sometimes the Spline agent receives an event that it doesn't know how to process, or it intentionally ignores the event because it "thinks" there is no useful information (from the lineage-tracking perspective) associated with it. In such cases the lineage, or a part of it, might not be captured as expected. Whether that is the right or wrong behavior depends on the particular situation. So, if the Spline agent initialized correctly and your job completed without errors, but the lineage or one of its parts is missing, I recommend the following troubleshooting steps:
Support for unrecognized write commands can be added via a custom plugin. Knowing the internal structure of the command class is crucial. If the class comes from a closed-source library or a Spark distribution, Spline can try to inspect it reflectively and dump the result into the logs. Enable the TRACE log level, and Spline will automatically reflect the unrecognized command classes. Find the phrase "OBJECT DUMP BEGIN" in the logs, create an issue here on GitHub, and paste the relevant text block there.
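As a rough sketch of that step, if your Spark distribution uses a Log4j 1.x `log4j.properties` file, the TRACE level could be enabled for the agent's loggers like this. Treat the logger name as an assumption: the Spline agent code lives under the `za.co.absa.spline` package, but the exact logger names may differ between agent versions, so adjust as needed.

```properties
# Assumption: Spline agent classes log under the za.co.absa.spline package;
# verify the logger name against your agent version.
log4j.logger.za.co.absa.spline=TRACE

# Keep the rest of Spark's logging at its usual level to reduce noise.
log4j.rootCategory=INFO, console
```

After rerunning the job, search the driver logs for the "OBJECT DUMP BEGIN" marker mentioned above.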