Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtim and results of dodgr_centrality #5

Closed
Robinlovelace opened this issue Oct 22, 2019 · 6 comments
Closed

Runtim and results of dodgr_centrality #5

Robinlovelace opened this issue Oct 22, 2019 · 6 comments

Comments

@Robinlovelace
Copy link
Member

Heads-up @mpadge, I tried dodgr_centrality() on the example data in this repo and it seemed to run substantially slower than igraph, ~20s vs ~1s. See here for result for igraph:

79ea950#diff-04c6e90faac2675aa89e2176d2eec7d8R186-R188

and dodgr:

79ea950#diff-04c6e90faac2675aa89e2176d2eec7d8R161

Also I couldn't see how to extract the centrality component in the sf object, as per UrbanAnalyst/dodgr#90 (comment)

I imagine I'm doing something wrong but thought the heads-up on these preliminary findings would be useful. Let me know if you know of a quick fix/error in my code. Cheers!

@mpadge
Copy link
Collaborator

mpadge commented Oct 22, 2019

Oh nah, that would likely be right - there is some non-negligible overhead associated with setting up the parallel job, so for jobs likely to run in less than probably a few minutes, igraph may likely be faster. There'll soon be a vignette on dodgr_centrality(), which will pretty clearly say that it really is only intended to be used for very large jobs, and that igraph will likely be more efficient for any other use cases.

I'll fix the missing centrality in sf tomorrow ...

@Robinlovelace
Copy link
Member Author

Do you have a reproducible example of it running on a larger network to compare with igraph? 20s is not too bad in any case, would like to see how it copes with the network of a larger city of Leeds, London or New York.

@mpadge
Copy link
Collaborator

mpadge commented Oct 23, 2019

Those cities would be entirely possible, but would take ages to benchmark. A network only needs to be big enough to take any finite time:

setwd ("/data/mega/code/repos/atfutures/dodgr")
devtools::load_all (".", export_all = FALSE)
#> Loading dodgr
f <- "selby.Rds"
net <- dodgr_streetnet_sc ("selby uk", expand = 0)
graph <- weight_streetnet (net, wt_profile = "bicycle") %>%
    dodgr_contract_graph ()
#> Loading required namespace: geodist
#> Loading required namespace: dplyr
message ("graph has ", format (nrow (graph), big.mark = ","), " edges")
#> graph has 5,806 edges

igr <- dodgr_to_igraph (graph)
#> Loading required namespace: igraph

f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE, edges = TRUE)
f_igraph <- function (graph) igraph::edge_betweenness (igr)
knitr::kable (rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_i <- f_igraph (igr),
             replications = 1) [, 1:4])
test replications elapsed relative
res_d <- f_dodgr(graph) 1 0.310 1.000
res_i <- f_igraph(igr) 1 1.011 3.261

Created on 2019-10-23 by the reprex package (v0.3.0)

Note that the more cores you have, the greater may be the overhead associated with establishing the paralllel threads, so you may not see advantages on larger machines until your graph size increases quite a bit. Try York if you want to see that - it has around 60,000 edges, so is around 10 times the size of Selby.


Also note that the two do not always give absolutely identical results, because igraph rounds to integers for routing with a precision of 1e-10, but this can be enough to modify routing compared with dodgr which uses floating-point precision throughout. I sometimes see in the above example one or two edges from the 5,806 which have marginally different centrality values.

@mpadge
Copy link
Collaborator

mpadge commented Oct 23, 2019

dodgr_to_sf() looks all okay with centrality:

devtools::load_all (".", export_all = FALSE)
#> Loading dodgr
f <- "selby.Rds"
net <- dodgr_streetnet_sc ("selby uk", expand = 0)
graph0 <- weight_streetnet (net, wt_profile = "bicycle") %>%
    dodgr_centrality (edges = TRUE)
#> Loading required namespace: geodist
#> Loading required namespace: dplyr
graph1 <- weight_streetnet (net, wt_profile = "bicycle") %>%
    dodgr_contract_graph () %>%
    dodgr_centrality (contract = FALSE, edges = TRUE) %>%
    dodgr_uncontract_graph ()
identical (graph0$centrality, graph1$centrality)
#> [1] TRUE
# The graphs themselves include hash values, so are never identical
graph_sf <- dodgr_to_sf (graph0)
#> Loading required namespace: sf
names (graph_sf) # centrality is there
#>  [1] "from_id"        "to_id"          "edge_id"        "from_lon"      
#>  [5] "from_lat"       "to_lon"         "to_lat"         "d"             
#>  [9] "object_"        "highway"        "oneway.bicycle" "oneway:bicycle"
#> [13] "d_weighted"     "time"           "time_weighted"  "centrality"    
#> [17] "component"      "geometry"

Created on 2019-10-23 by the reprex package (v0.3.0)
... and that's pretty cool that the whole piped flow from grabbing streetnet through to uncontracting with centrality just works like that.

@Robinlovelace
Copy link
Member Author

Very nice example, thanks @mpadge, reproduced 👍

Now looking at comparison with igraph results. All fuel for the 🔥 and making me look forward to the NY results. Can you share code you're using for that? Cheers!

@Robinlovelace
Copy link
Member Author

Update, I've converted that into a reprex with runtimes. Seems dodgr and igraph approaches are now very similar, and that the results look similar:

devtools::install_github("atfutures/dodgr")
#> Skipping install of 'dodgr' from a github remote, the SHA1 (2b42e612) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(tictoc)
library(dodgr)
library(pins)
tic("getting network")
# net <- dodgr_streetnet_sc ("selby uk", expand = 0)
# saveRDS(net, "net.Rds")
# piggyback::pb_upload("net.Rds", "ATFutures/who3")
# piggyback::pb_download_url("net.Rds", "ATFutures/who3")
net = readRDS(url("https://github.com/ATFutures/who3/releases/download/0.0.1/net.Rds"))
toc()
#> getting network: 1.694 sec elapsed
tic("weighting, uncontracted")
graph0 <- weight_streetnet (net, wt_profile = "bicycle") %>%
  dodgr_centrality (edges = TRUE)
#> Loading required namespace: geodist
#> Loading required namespace: dplyr
toc()
#> weighting, uncontracted: 2.038 sec elapsed
tic("weighting, contracted")
graph1 <- weight_streetnet (net, wt_profile = "bicycle") %>%
  dodgr_contract_graph () %>%
  dodgr_centrality (contract = FALSE, edges = TRUE) %>%
  dodgr_uncontract_graph ()
toc()
#> weighting, contracted: 2.028 sec elapsed
identical (graph0$centrality, graph1$centrality)
#> [1] TRUE
graph_sf <- dodgr_to_sf (graph0)
#> Loading required namespace: sf
names (graph_sf) # centrality is there
#>  [1] "from_id"        "to_id"          "edge_id"        "from_lon"      
#>  [5] "from_lat"       "to_lon"         "to_lat"         "d"             
#>  [9] "object_"        "highway"        "oneway.bicycle" "oneway:bicycle"
#> [13] "d_weighted"     "time"           "time_weighted"  "centrality"    
#> [17] "component"      "geometry"
plot(graph_sf["centrality"])

library(stplanr)
tic("getting network sf")
# net_sf = dodgr_streetnet("selby uk", expand = 0)
# saveRDS(net_sf, "net_sf.Rds")
# piggyback::pb_upload("net_sf.Rds", "ATFutures/who3")
# piggyback::pb_download_url("net_sf.Rds", "ATFutures/who3")
net_sf = readRDS(url("https://github.com/ATFutures/who3/releases/download/0.0.1/net_sf.Rds"))
toc()
#> getting network sf: 1.88 sec elapsed
tic("preprocessing sf network")
net_sf_cleaned = rnet_breakup_vertices(net_sf)
#> Splitting rnet object at the intersection points between nodes and internal vertexes
#> Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
#> Splitting rnet object at the duplicated internal vertexes
sln = SpatialLinesNetwork(net_sf_cleaned)
#> Warning in SpatialLinesNetwork.sf(net_sf_cleaned): Graph composed of
#> multiple subgraphs, consider cleaning it with sln_clean_graph().
sln = sln_clean_graph(sln)
#> Input sln composed of 6 graphs. Selecting the largest.
toc()
#> preprocessing sf network: 1.396 sec elapsed
tic("igraph betweenness")
sln@sl$centrality = igraph::edge_betweenness(sln@g)
toc()
#> igraph betweenness: 1.247 sec elapsed
plot(sln@sl["centrality"])

Created on 2019-10-23 by the reprex package (v0.3.0)

Heads-up @mpadge and, @agila5, may be worth reproducing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants