module not found: graphframes#graphframes;0.5.0-spark2.1-s_2.11 #16

Open
lgeistlinger opened this issue Aug 25, 2021 · 1 comment

lgeistlinger commented Aug 25, 2021

Hi,

I am trying to follow the instructions at https://spark.rstudio.com/graphframes/ for running graphframes with Spark version 2.1.0.
However, I am running into an issue similar to the one described in #7.

After running:

sparklyr::spark_install(version = "2.1.0")

I can connect to Spark in a fresh R session via:

library(sparklyr)
sc <- spark_connect(master = "local", version = "2.1.0")

However, when I also load graphframes, I run into the following error:

> library(sparklyr)
> library(graphframes)
> sc <- spark_connect(master = "local", version = "2.1.0", config = conf)
Ivy Default Cache set to: /Users/ludwig/.ivy2/cache
The jars for the packages stored in: /Users/ludwig/.ivy2/jars
:: loading settings :: url = jar:file:/Users/ludwig/spark/spark-2.1.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
:: resolution report :: resolve 1226ms :: artifacts dl 0ms
	:: modules in use:
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
	---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
		module not found: graphframes#graphframes;0.5.0-spark2.1-s_2.11

	==== local-m2-cache: tried

	  file:/Users/ludwig/.m2/repository/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom

	  -- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:

	  file:/Users/ludwig/.m2/repository/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar

	==== local-ivy-cache: tried

	  /Users/ludwig/.ivy2/local/graphframes/graphframes/0.5.0-spark2.1-s_2.11/ivys/ivy.xml

	  -- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:

	  /Users/ludwig/.ivy2/local/graphframes/graphframes/0.5.0-spark2.1-s_2.11/jars/graphframes.jar

	==== central: tried

	  https://repo1.maven.org/maven2/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom

	  -- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:

	  https://repo1.maven.org/maven2/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar

	==== spark-packages: tried

	  http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom

	  -- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:

	  http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar

		::::::::::::::::::::::::::::::::::::::::::::::

		::          UNRESOLVED DEPENDENCIES         ::

		::::::::::::::::::::::::::::::::::::::::::::::

		:: graphframes#graphframes;0.5.0-spark2.1-s_2.11: not found

		::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: graphframes#graphframes;0.5.0-spark2.1-s_2.11: not found]
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
	at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId,  : 
  Gateway in localhost:8880 did not respond.

I have also tried a more recent Spark version (2.4.3), as well as copying the apparently missing graphframes jars directly into the jars directory, without success.

Any advice on how to resolve this would be greatly appreciated. Thanks!

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.7       graphframes_0.1.2 sparklyr_1.7.0   

loaded via a namespace (and not attached):
 [1] pillar_1.6.2        compiler_4.1.0      BiocManager_1.30.16
 [4] dbplyr_2.1.1        prettyunits_1.1.1   remotes_2.4.0      
 [7] r2d3_0.2.5          base64enc_0.1-3     tools_4.1.0        
[10] pkgbuild_1.2.0      digest_0.6.27       jsonlite_1.7.2     
[13] lifecycle_1.0.0     tibble_3.1.3        pkgconfig_2.0.3    
[16] rlang_0.4.11        cli_3.0.1           DBI_1.1.1          
[19] rstudioapi_0.13     curl_4.3.2          yaml_2.2.1         
[22] parallel_4.1.0      withr_2.4.2         httr_1.4.2         
[25] generics_0.1.0      vctrs_0.3.8         htmlwidgets_1.5.3  
[28] askpass_1.1         rappdirs_0.3.3      rprojroot_2.0.2    
[31] tidyselect_1.1.1    glue_1.4.2          forge_0.2.0        
[34] R6_2.5.0            processx_3.5.2      fansi_0.5.0        
[37] callr_3.7.0         purrr_0.3.4         tidyr_1.1.3        
[40] magrittr_2.0.1      ps_1.6.0            ellipsis_0.3.2     
[43] htmltools_0.5.1.1   assertthat_0.2.1    config_0.3.1       
[46] utf8_1.2.2          openssl_1.4.4       crayon_1.4.1
lgeistlinger (author) commented:

The problem seems to be that the default repositories sparklyr tries to install graphframes from (https://repo1.maven.org and http://dl.bintray.com) no longer host the graphframes jars.

This can be fixed by adding "https://repos.spark-packages.org" to the list of repositories, as done here.
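A minimal sketch of applying the repository fix from the R side, without patching the graphframes package. It assumes that sparklyr forwards `sparklyr.shell.<flag>` config entries to `spark-submit` as `--<flag>` command-line arguments, so the entry below becomes `--repositories`, which tells Spark's Ivy resolution to search an extra repository in addition to the defaults seen in the log. It also shows one possible way to build the `conf` object passed to `spark_connect()` in the transcript above; it is untested against graphframes 0.1.2:

```r
library(sparklyr)
library(graphframes)

# Start from sparklyr's default configuration
conf <- spark_config()

# Assumption: "sparklyr.shell.<flag>" entries are passed to spark-submit as
# "--<flag> <value>", so this adds repos.spark-packages.org as an extra
# resolver next to Maven Central and the (retired) Bintray spark-packages repo
conf[["sparklyr.shell.repositories"]] <- "https://repos.spark-packages.org/"

# graphframes registers its own package coordinate as a dependency;
# spark-submit should now also look for it in the repository added above
sc <- spark_connect(master = "local", version = "2.1.0", config = conf)
```

If the connection succeeds, call `spark_disconnect(sc)` when done.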

The code can also be updated to pull the latest graphframes release (v0.8.1, Sep 2020), which works with Spark 2.4 and higher, as done here.
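The unresolved coordinate in the log above follows a `<graphframes version>-spark<major.minor>-s_<scala version>` naming scheme. The helper below is hypothetical (it is not part of the graphframes or sparklyr API) and only illustrates how bumping the graphframes version changes the coordinate that gets requested. It assumes Scala 2.11, as in the log, and that the 0.8.1 release is published under the resulting coordinate:

```r
# Hypothetical helper, not part of graphframes or sparklyr: builds a Maven
# coordinate string in the same naming scheme as the failing one in the log
graphframes_coordinate <- function(spark_version,
                                   scala_version = "2.11",
                                   gf_version = "0.8.1") {
  # Keep only major.minor of the Spark version, e.g. "2.4.3" -> "2.4"
  parts <- strsplit(spark_version, ".", fixed = TRUE)[[1]]
  spark_short <- paste(parts[1:2], collapse = ".")
  sprintf("graphframes:graphframes:%s-spark%s-s_%s",
          gf_version, spark_short, scala_version)
}

graphframes_coordinate("2.4.3")
#> [1] "graphframes:graphframes:0.8.1-spark2.4-s_2.11"

# The coordinate that fails in the log corresponds to:
graphframes_coordinate("2.1.0", gf_version = "0.5.0")
#> [1] "graphframes:graphframes:0.5.0-spark2.1-s_2.11"
```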

I can provide a pull request if it seems worth incorporating these updates.
