Integrating ATAC-seq data from multiple species with a consensus peak set #1756

jenellewallace · 2024-07-30T19:20:12Z

Hello, I am trying to integrate ATAC-seq data from human, chimp, and macaque. I have defined a consensus peak set across all three species using my own pipeline, so each peak has a shared name (Peak_1, Peak_2, etc.) and a location given in each species' own coordinates. I would appreciate help setting this up in Signac, since the integration vignettes only seem to cover datasets from a single species. Is there a way to name the peaks by their consensus names? I assume this is necessary in order to merge the objects and use the integration pipeline. When I tried this, I got an error (the Peak_1, etc. names are stored in all_peaks$name):

cells <- colnames(multi3)
macs2_counts <- FeatureMatrix(fragments = frags_list, features = all_peaks, cells = cells)
rownames(macs2_counts) <- all_peaks$name
multi3[["peaks_consensus"]] <- CreateChromatinAssay(counts = macs2_counts, fragments = frags_list[1], annotation = annotation, min.cells = -1, min.features = -1, cells = cells)
Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) : 
  some values in the "start" column cannot be turned into numeric values

I was able to create the ChromatinAssay when I left the rownames as is (chromosome coordinates), but this means the same peak will have a different name in each species and then I'm not sure how to do the integration. Any advice would be greatly appreciated!
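
For context on the error above: the message suggests that, when no ranges are supplied, the row names are parsed as chromosome-start-end strings, so a name like Peak_1 has no numeric start. The base-R sketch below only mimics that kind of parsing. `parse_region` is a hypothetical helper written for illustration, not a Signac function, and the real internals may differ.

```r
# Hypothetical helper (not part of Signac): split "chr-start-end" names
# into fields and try to coerce start/end to numbers.
parse_region <- function(region_names, sep = "-") {
  parts <- strsplit(region_names, split = sep, fixed = TRUE)
  # Pull the i-th field of every name, or NA when the name has too few fields
  field <- function(i) {
    vapply(parts, function(p) if (length(p) >= i) p[i] else NA_character_, character(1))
  }
  data.frame(
    name  = region_names,
    chr   = field(1),
    start = suppressWarnings(as.numeric(field(2))),
    end   = suppressWarnings(as.numeric(field(3))),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_region(c("chr1-100-600", "chr2-2500-3000", "Peak_1"))
# Rows whose start or end is NA could not be interpreted as coordinates
parsed$is_coordinate <- !is.na(parsed$start) & !is.na(parsed$end)
print(parsed)
```

Under this reading, coordinate-style row names parse cleanly while the consensus names do not, which matches the observation that the assay could be created once the row names were left as coordinates.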

jenellewallace · 2024-08-05T21:23:31Z

I found a workaround that allows integration across species. I would still love feedback on whether this is the recommended approach, but here is what I did in case it helps anyone else (only the modified steps are shown):

# Make objects for each species and store them in a list called multi_species
# Subset to consensus peaks
multi_species_con <- list()
for (sp in species) {
  multi_species_con[[sp]] <- multi_species[[sp]]
  DefaultAssay(multi_species_con[[sp]]) <- 'peaks_consensus'
  multi_species_con[[sp]][["peaks_celltypes"]] <- NULL # remove to use less memory
  multi_species_con[[sp]] <- subset(multi_species_con[[sp]], features = rownames(multi_species_con[[sp]])[1:num_con_peaks])
}
# Rename peaks to human coords and process each species
for (sp in species) {
  Annotation(multi_species_con[[sp]]) <- Annotation(multi_species_con$human)
  # rename counts
  rownames(multi_species_con[[sp]]@assays$peaks_consensus@counts)[1:num_con_peaks] <- rownames(multi_species_con$human@assays$peaks_consensus@counts)[1:num_con_peaks]
  # rename data
  rownames(multi_species_con[[sp]]@assays$peaks_consensus@data)[1:num_con_peaks] <- rownames(multi_species_con$human@assays$peaks_consensus@data)[1:num_con_peaks]
  # rename meta.features
  rownames(multi_species_con[[sp]]@assays$peaks_consensus@meta.features)[1:num_con_peaks] <- rownames(multi_species_con$human@assays$peaks_consensus@meta.features)[1:num_con_peaks]
  multi_species_con[[sp]] <- RunTFIDF(multi_species_con[[sp]])
  # need to use all features, or else they won't be the same across species
  multi_species_con[[sp]] <- FindTopFeatures(multi_species_con[[sp]], min.cutoff = 'q0')
  multi_species_con[[sp]] <- RunSVD(multi_species_con[[sp]])
  multi_species_con[[sp]] <- RunUMAP(object = multi_species_con[[sp]], reduction = 'lsi', dims = 2:30)
  multi_species_con[[sp]] <- FindNeighbors(object = multi_species_con[[sp]], reduction = 'lsi', dims = 2:30)
}
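
The positional renaming above assumes that every species' matrix lists the consensus peaks in the same order. A sketch of the same idea that matches on the consensus ID instead, using toy base-R matrices, is below. `peak_map`, `counts_chimp`, and `rename_to_reference` are hypothetical names made up for illustration and are not objects from the pipeline above.

```r
# Toy lookup table: one row per consensus peak, one coordinate column per species
peak_map <- data.frame(
  name  = c("Peak_1", "Peak_2", "Peak_3"),
  human = c("chr1-100-600", "chr1-900-1400", "chr2-50-550"),
  chimp = c("chr1-120-620", "chr1-950-1450", "chr2a-70-570"),
  stringsAsFactors = FALSE
)

# Toy chimp count matrix whose rows are deliberately NOT in peak_map order
counts_chimp <- matrix(
  1:6, nrow = 3,
  dimnames = list(
    c("chr2a-70-570", "chr1-120-620", "chr1-950-1450"),
    c("cell_a", "cell_b")
  )
)

# Hypothetical helper: rename rows from one species' coordinates to the
# reference species' coordinates by looking each row up in the map
rename_to_reference <- function(counts, map, from, to = "human") {
  idx <- match(rownames(counts), map[[from]])
  if (anyNA(idx)) stop("some rows are not in the consensus peak map")
  rownames(counts) <- map[[to]][idx]
  # Return rows in the map's order so all species line up
  ref_order <- map[[to]][map[[to]] %in% rownames(counts)]
  counts[ref_order, , drop = FALSE]
}

counts_chimp_h <- rename_to_reference(counts_chimp, peak_map, from = "chimp")
print(rownames(counts_chimp_h))
```

Matching by ID fails loudly when a peak is missing from the map, whereas positional renaming would silently mislabel rows if the order differed between species.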
# Merge and integrate following the ATAC integration vignette. In FindIntegrationAnchors, I increased k.anchor from the default of 5 to 20 because the integration did not look good with the default.
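
A rough sketch of what the merge and integration step might look like, modeled on the Signac ATAC integration vignette. It assumes Seurat and Signac are installed, and that every object in the list has the renamed consensus assay as its default assay along with an "lsi" reduction. `integrate_species` and its argument names are hypothetical; only the function is defined here, and the exact calls may need adjusting for your package versions.

```r
# Sketch only: wraps the vignette-style steps in a hypothetical helper.
# Nothing runs until the function is called with a list of Seurat objects.
integrate_species <- function(obj_list, k_anchor = 20, dims = 2:30) {
  stopifnot(
    requireNamespace("Seurat", quietly = TRUE),
    requireNamespace("Signac", quietly = TRUE),
    length(obj_list) >= 2
  )
  # Features shared by every object (all of them, if the renaming worked)
  shared <- Reduce(intersect, lapply(obj_list, rownames))

  # Merge, then recompute LSI on the merged object
  combined <- merge(x = obj_list[[1]], y = obj_list[-1], add.cell.ids = names(obj_list))
  combined <- Signac::FindTopFeatures(combined, min.cutoff = "q0")
  combined <- Signac::RunTFIDF(combined)
  combined <- Signac::RunSVD(combined)

  # Anchors via reciprocal LSI projection; k.anchor raised from the default of 5
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = obj_list,
    anchor.features = shared,
    reduction = "rlsi",
    dims = dims,
    k.anchor = k_anchor
  )

  # Integrate the low-dimensional embeddings rather than the raw counts
  integrated <- Seurat::IntegrateEmbeddings(
    anchorset = anchors,
    reductions = combined[["lsi"]],
    new.reduction.name = "integrated_lsi",
    dims.to.integrate = 1:max(dims)
  )
  Seurat::RunUMAP(integrated, reduction = "integrated_lsi", dims = dims)
}
```

With the list built above, this would be called as integrate_species(multi_species_con), optionally with a different k_anchor to compare how well the species mix.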
