Mike TODO: Write up example of using enhancer-promoter overlaps to compute correlation #5

Open
mikelove opened this issue Jul 21, 2023 · 3 comments
Labels: documentation (Improvements or additions to documentation)

@mikelove
Member

mikelove commented Jul 21, 2023

a note to myself

Write up a chapter showing how to compute these correlations in a way that doesn't involve copying/modifying the SE data.

https://gist.github.com/mikelove/2e899346d92908e6cbe3448705e4b5de
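
Roughly the shape of what I have in mind (a sketch only, with hypothetical se_enh / se_prom SummarizedExperiments standing in for the objects in the gist, not the gist's actual code):

library(SummarizedExperiment)
library(plyranges)

# keep a row index on the ranges rather than copying or subsetting the SEs
enh <- rowRanges(se_enh)
enh$id.x <- seq_along(enh)
prom <- rowRanges(se_prom)
prom$id.y <- seq_along(prom)

# the overlap join carries both row indices along
ov <- join_overlap_inner(enh, prom, maxgap = 100)

# pull the assay matrices once and index them by the overlap's row ids,
# so the correlations never require modifying the SE objects themselves
dat_x <- assay(se_enh)
dat_y <- assay(se_prom)
ov$rho <- vapply(seq_along(ov), function(i) {
  cor(dat_x[ov$id.x[i], ], dat_y[ov$id.y[i], ])
}, numeric(1))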

@mikelove mikelove added the documentation Improvements or additions to documentation label Jul 21, 2023
@mikelove mikelove self-assigned this Jul 26, 2023
@FedericoVann

Hello!
I tried to speed up the computation.

By using vapply():

# compute rho for each overlapping pair, using vapply() in place of map2_dbl()
x_overlaps <- x_overlaps %>%
  mutate(rho = vapply(seq_along(id.x), function(i) {
    cor(dat_x[id.x[i], ], dat_y[id.y[i], ])
  }, numeric(1)))

Regardless of the method, one idea could be to convert the input into a data.table and then convert it back into a GRanges object afterwards (if that step is required).

This seems to speed everything up drastically.
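
For example, the round trip itself is just a few lines (a minimal sketch reusing x_overlaps, dat_x, and dat_y from above, with gUtils providing gr2dt()/dt2gr()):

library(data.table)
library(gUtils) # gr2dt() / dt2gr() for the GRanges <-> data.table round trip

ov_dt <- gr2dt(x_overlaps)  # GRanges -> data.table
ov_dt[, rho := vapply(seq_len(.N), function(i) {
  cor(dat_x[id.x[i], ], dat_y[id.y[i], ])
}, numeric(1))]             # compute rho row by row
x_overlaps <- dt2gr(ov_dt, key = NULL, seqlengths = NULL,
                    seqinfo = Seqinfo())  # back to GRanges, if needed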

I microbenchmarked the map2_dbl() and vapply() methods with the input in both formats:

library(plyranges) # join_overlap_inner(), filter(), select(), mutate() on GRanges
library(purrr) # map2_dbl()
library(microbenchmark) # to benchmark the different methods
library(data.table) # to convert GRanges into data.tables
library(gUtils) # dt2gr() to convert data.tables back into the GRanges class
library(ggplot2) # to make the autoplot

# input as GRanges object
x_overlaps_GRanges <- x %>% 
  join_overlap_inner(y, maxgap=100) %>%
  filter(tile_id.x == tile_id.y) %>%
  select(tile_id = tile_id.x, id.x, id.y) 

# input as data.table object
x_overlaps_data_table <- x %>% 
  join_overlap_inner(y, maxgap=100) %>%
  filter(tile_id.x == tile_id.y) %>%
  select(tile_id = tile_id.x, id.x, id.y) %>%
  as.data.table()

methods_performance <- microbenchmark(
  setup = set.seed(12),

  # Input as GRanges formal class
  # Correlations using map2_dbl() and mutate()
  x_overlaps_map2_dbl_GR = x_overlaps_GRanges %>%
    mutate(rho = map2_dbl(id.x, id.y, function(.x, .y) {
      cor(dat_x[.x, ], dat_y[.y, ])
    })),

  # Correlations using vapply() and mutate()
  x_overlaps_vapply_GR = x_overlaps_GRanges %>%
    mutate(rho = vapply(seq_along(id.x), function(i) {
      cor(dat_x[id.x[i], ], dat_y[id.y[i], ])
    }, numeric(1))),

  # Input converted into data.table format
  # Correlations using map2_dbl() and mutate()
  x_overlaps_map2_dbl_DT = x_overlaps_data_table %>%
    mutate(rho = map2_dbl(id.x, id.y, function(.x, .y) {
      cor(dat_x[.x, ], dat_y[.y, ])
    })) %>%
    dt2gr(key = NULL, seqlengths = NULL, seqinfo = Seqinfo()),

  # Correlations using vapply() and mutate()
  x_overlaps_vapply_DT = x_overlaps_data_table %>%
    mutate(rho = vapply(seq_along(id.x), function(i) {
      cor(dat_x[id.x[i], ], dat_y[id.y[i], ])
    }, numeric(1))) %>%
    dt2gr(key = NULL, seqlengths = NULL, seqinfo = Seqinfo()),

  times = 100
)

# look at the benchmark results
methods_performance

# plot the timings
autoplot(methods_performance) + theme_bw()

An additional tip could be to parallelize the code with mclapply():

library(parallel)

# input as data.table object
x_overlaps <- x %>% 
  join_overlap_inner(y, maxgap=100) %>%
  filter(tile_id.x == tile_id.y) %>%
  select(tile_id = tile_id.x, id.x, id.y) %>%
  as.data.table()

# Calculate correlations in parallel with mclapply()
# (note: forking is not available on Windows, where mc.cores must be 1)
x_overlaps$rho <- unlist(mclapply(seq_along(x_overlaps$id.x), function(i) {
  cor(dat_x[x_overlaps$id.x[i], ], dat_y[x_overlaps$id.y[i], ])
}, mc.cores = detectCores()))

All of the code above has been tested.

I hope all this has been helpful.

Cheers!

@FedericoVann

Performance_plot.pdf

@mikelove
Member Author

mikelove commented Jul 30, 2023

Sorry, I should have explained this one better.

Yes, we can speed up many operations by converting to data.table.

I meant that I have already derived a faster solution to a previous problem that doesn’t involve modifying the original S4 objects, but I haven’t written it up, so I assigned this to myself as a todo.

But I will take a look at your report, thank you!

@mikelove mikelove changed the title Write up example of using enhancer-promoter overlaps to compute correlation Mike TODO: Write up example of using enhancer-promoter overlaps to compute correlation Jul 30, 2023