Merge SQL failing with ParseException #70
@srinikvv how was this created: /sandbox/spark-acid-assembly-0.5.0.jar ?
I built this assembly jar from the latest code on the master branch.
@srinikvv Can you check with this jar once: https://drive.google.com/file/d/1sqsFzUtyrWvXfE7g_Q8brNHqMnb14Gvv/view?usp=sharing ? And please also share your jar if the one I provided works.
@amoghmargoor I tried the jar you provided and still see the same issue.
@srinikvv this is working fine, I rechecked. This would happen only if the SQL extension is not getting added on your end. It is difficult for me to figure out why it would not get added, but can you check that angle?
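One way to check that angle, as a sketch: launch spark-shell with the SQL extension set explicitly and confirm it was picked up. The extension class name below is the one documented in the spark-acid README; the jar path is an assumption, so adjust it for your environment.

```shell
# Hypothetical jar path; pass the SQL extension explicitly so that
# MERGE/UPDATE/DELETE statements get routed through spark-acid's parser:
spark-shell \
  --jars /sandbox/spark-acid-assembly-0.5.0.jar \
  --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension

# Inside the shell, verify the setting took effect:
#   spark.conf.get("spark.sql.extensions")
# which should return com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
```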
@amoghmargoor I checked; UPDATE is working fine. Line 56 in a36f56d
Hence I believe this is not an issue with the SQL extension not getting added. Please check and advise.
@srinikvv the stack trace you printed does not correspond to the current code either: line 56 doesn't have a function call to parsePlan. Please recheck your jars.
@amoghmargoor the below is from the master branch; SparkAcidSqlParser.scala:56 is part of the parsePlan function. Am I missing anything?
@srinikvv SparkSession.sql(SparkSession.scala:642) calls SparkAcidSqlParser.parsePlan, i.e., SparkAcidSqlParser.parsePlan(SparkAcidSqlParser.scala:56). That in turn calls AbstractSqlParser.parsePlan, which calls SparkSqlParser.parse, and so on. So according to the stack trace, line 56 of SparkAcidSqlParser.scala should contain a function call to parsePlan, but instead it has a throw statement, which means you are still running old code.
Hey @srinikvv, were you able to fix your issue?
@amoghmargoor your suspicion is correct: spark-shell was using a cached/previous version of spark-acid-assembly.jar. I was able to test the latest version using PySpark with the conf spark.driver.userClassPathFirst=true. However, I see the below error while trying to activate the extension:
Oh... something like this just works for me:
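The snippet itself did not survive in this thread; a minimal sketch of the kind of invocation being described, with the jar filename and extension class taken from the discussion above (treat both as assumptions for your setup):

```shell
# Launch PySpark with the spark-acid jar and SQL extension,
# forcing the driver to prefer the user-supplied jar over cached copies:
pyspark \
  --jars spark-acid-assembly-0.5.0.jar \
  --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
  --conf spark.driver.userClassPathFirst=true
```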
@amoghmargoor I was finally able to get this working after downloading the Hadoop 2.8.2 binaries and setting SPARK_DIST_CLASSPATH to reference those libraries. However, the MERGE syntax only works with ACID tables that are not bucketed, and I see the below exception when it is used against bucketed ACID tables:
As per the documentation on the Apache Hive Confluence, all ACID tables must be bucketed; hence a MERGE statement in spark-acid without support for bucketed tables is not practically usable. Do you have any plans to support MERGE on bucketed tables in the near future?
Hi @srinikvv. Btw, regarding bucketed tables: Hive ACID does not require them anymore. That was a restriction of the earlier ACID implementation which has since been lifted. We create non-bucketed Hive ACID tables all the time internally at Qubole. If you are using Hive 3.1 onwards, you should be good. We did not add bucketed-table support because the bucketing hashes are not the same across engines.
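For reference, a non-bucketed Hive ACID table needs only a `transactional` table property on Hive 3.x; no CLUSTERED BY clause is required. A sketch of the DDL, with a hypothetical connection string, table, and columns:

```shell
# Create a non-bucketed transactional (ACID) table via beeline;
# only the ORC format and the 'transactional' property are required:
beeline -u "jdbc:hive2://localhost:10000" -e "
CREATE TABLE acid_demo (
  id INT,
  name STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
"
```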
@amoghmargoor Yes, this can be added to an FAQ or Troubleshooting section. I believe the issue occurs when using Spark binaries compiled against Hadoop libraries older than 2.8.2 (I was using spark-2.4.3-bin-hadoop2.7). As a workaround we downloaded the Hadoop 2.8.2 libraries and set SPARK_DIST_CLASSPATH to refer to the new Hadoop libraries, as below, before running the spark-submit command. A better approach may be to build Spark 2.4.3 against the Hadoop 2.8.2 binaries; I am currently trying this and will let you know the result. Also, I tried to perform MERGE on non-bucketed ACID tables and was facing org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException for one specific table; below is the stack trace:
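The exact commands were not captured in the thread; a sketch of the workaround being described, assuming the Hadoop 2.8.2 binaries were unpacked to /opt/hadoop-2.8.2 (path, jar name, and job script are hypothetical):

```shell
# Point Spark at the newer Hadoop client libraries before submitting:
export HADOOP_HOME=/opt/hadoop-2.8.2
export SPARK_DIST_CLASSPATH=$("${HADOOP_HOME}/bin/hadoop" classpath)

# Then submit as usual with the spark-acid assembly jar on the classpath:
spark-submit --jars spark-acid-assembly-0.5.0.jar merge_job.py
```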
@amoghmargoor is there a plan to modify the plugin to work on Spark 3?
I also wanted to report another error I observed while trying to perform a MERGE using a Spark table (an RDD-based table created using df.createOrReplaceTempView) as the source.
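For context, a sketch of the pattern being described, using hypothetical table and view names; the MERGE statement follows the standard Hive ACID syntax that spark-acid's SQL extension parses:

```scala
// Register an in-memory DataFrame as the MERGE source
// (sourceDf, acid_db.target_tbl, and the columns are hypothetical):
sourceDf.createOrReplaceTempView("source_view")

spark.sql("""
  MERGE INTO acid_db.target_tbl t
  USING source_view s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET name = s.name
  WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name)
""")
```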
@srinikvv W.r.t. org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: is this being thrown on task retries, similar to issue #43? That issue has been fixed now. Regarding Spark 3, we are yet to start work on it; my guess is we might start looking at it around the end of July. Regarding the type issue: yes, data transfer from non-nullable to nullable should be allowed, and we will take a look at it. You can get unblocked by explicitly assigning your source DataFrame a schema that matches the target schema.
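The suggested unblocking step can be sketched like this (table, view, and DataFrame names are hypothetical): rebuild the source DataFrame against the target table's schema so the nullability flags match exactly:

```scala
// Take the target table's schema, including its nullability flags:
val targetSchema = spark.table("acid_db.target_tbl").schema

// Re-create the source DataFrame with exactly that schema, so a
// non-nullable source column no longer trips the nullability check:
val alignedSource = spark.createDataFrame(sourceDf.rdd, targetSchema)
alignedSource.createOrReplaceTempView("source_view")
```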
Hey @srinikvv, how are things? Were you able to get MERGE working?
@amoghmargoor I built the latest jar from the master branch on 14-Jul and retested the failing MERGE statement; I still get this error, and only for one specific table. Below are the steps I follow.
Please check this and let me know if I can do anything to fix this issue.
@amoghmargoor When we create ACID tables without bucketing, we see a lot of unevenly sized files underneath the HDFS storage. Please let me know if we can do a Zoom session to show you this issue.
@srinikvv that would be great. @sourabh912 and I are in the PST timezone and would like to join the call. Send some time slots that would work for you; also feel free to join the group [email protected].
@amoghmargoor we can meet on 16 July, 9:00 PM to 10:00 PM IST, if that works for you guys.
Hey @srinikvv, I missed the message above. This timing works for me on Friday (though I guess it would be Friday night for you). Otherwise I would also be available for a call on Monday; let me know if that would be fine.
@amoghmargoor let's meet today; please use the Zoom link below.
sure ... see you at 8:30 am PST.
On 17-Jul-2020, at 5:18 AM, srinikvv wrote:
Topic: Veera Venkata Rao's Zoom Meeting
Time: Jul 17, 2020 09:00 PM Mumbai, Kolkata, New Delhi
Join Zoom Meeting
https://VMware.zoom.us/j/98812494211?pwd=cjhqRktESTJ0L0l3elJxaVJ1YVMxUT09
Meeting ID: 988 1249 4211
Password: 494932
@amoghmargoor Appreciate you guys taking the time to understand and debug the issue.
@srinikvv I think the existing update API may not be able to support your use case. Can you try running with jars compiled from here: https://github.com/amoghmargoor/spark-acid/pull/new/issue-70 ? I have added a few log statements. Run it with just 2 executors and provide me the logs for the driver and both executors after the failure; I may follow up with a few more such iterations. You can mail me the logs at [email protected] if you don't want to attach them here. Another question we had: was speculative execution also enabled? Thanks.
@amoghmargoor I shared the logs via email. Regarding speculative execution, we are executing jobs with the default value of spark.speculation (which is false in Spark 2.4.3).
@srinikvv We have figured out why this could be happening, and I have added the fix here: https://github.com/amoghmargoor/spark-acid/pull/new/issue-70. Can you recreate the jar from that branch and check whether it fixes your issue?
Hi Team,
I am trying to perform MERGE on Hive ACID tables using Qubole spark-acid, but was facing the below errors.
I created an assembly jar from the latest code on master and tried to execute a MERGE statement using spark.sql from spark-shell: