Finding out about the type of interaction #118

Open
vertesy opened this issue Jan 17, 2025 · 2 comments
Labels
question A user question or anything not obviously bug

vertesy commented Jan 17, 2025

Hello,

I am using LIANA to infer cell-to-cell communication.
It does not seem to deal with the type of protein-protein interaction (please correct me if I am wrong), e.g. soluble paracrine signaling versus the (pairwise) interaction of membrane-bound proteins, such as cell adhesion molecules.
However, it leverages OmnipathDB via OmnipathR. Here again, I did not find a detailed annotation:

> OmnipathR::interaction_types()
[2025-01-17 13:56:41] [SUCCESS] [OmnipathR] Downloaded 26 records.
[1] "lncrna_post_transcriptional" "mirna_transcriptional"      
[3] "post_transcriptional"        "post_translational"         
[5] "small_molecule_protein"      "transcriptional"   
* [ ] I guess "post_translational" refers to protein-protein interaction?

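I am trying to find out about two levels of annotation:

* [ ] a rough annotation of the type of protein-protein interaction
* [ ] gene family level annotation using something like https://www.genenames.org/data/genegroup/#!/group/20 (ideally for interacting pairs); using gene symbol patterns can be misleading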

Thank you for your time!
Abel

deeenes self-assigned this Jan 17, 2025

deeenes (Member) commented Jan 17, 2025

Hello Abel,

> I guess "post_translational" refers to protein-protein interaction?

Correct. But you're looking for something else: a functional or topological classification within PPI. This information is part of the OmniPath Intercell database. In the Intercell database, we collected a number of properties relevant to cell-cell communication from ~18 resources. We also curated and combined these, merging synonymous categories and deriving further categories by combining existing ones. Finally, we introduced a simple automatic quality filter, based mostly on consensus across resources: this was necessary because we realized our data is very noisy, with many false positives.

I recommend compiling your own annotations from OmniPath Intercell by selecting and manually checking the categories and resources you need, and then applying these annotations to the PPI network by joining the data frames. For a complete list of categories, see intercell_categories() and intercell_summary(). You can see how these categories are built here. A minimal explanation is available here, but please don't try to build the complete database at home :) Rather, to clarify the meaning of the categories and properties, refer to our 2021 paper, especially Table EV10.

The transmitter, receiver, secreted, plasma_membrane_transmembrane and plasma_membrane_peripheral columns show a conclusion based on all available data, but data by resource and by original category can be queried from OmniPath Intercell. For example, to see everything about the transmembrane category:

library(OmnipathR)
library(dplyr)

tm <- intercell(parent = 'transmembrane')

tm
# A tibble: 42,785 × 15
   category      parent    database scope aspect source uniprot genesymbol entity_type consensus_score transmitter receiver secreted
   <chr>         <chr>     <chr>    <chr> <chr>  <chr>  <chr>   <chr>      <chr>                 <dbl> <lgl>       <lgl>    <lgl>   
 1 transmembrane transmem… UniProt… gene… locat… resou… Q8N661  TMEM86B    protein                   4 FALSE       FALSE    FALSE   
 2 transmembrane transmem… UniProt… gene… locat… resou… Q8IWU2  LMTK2      protein                   7 FALSE       FALSE    FALSE   
 3 transmembrane transmem… UniProt… gene… locat… resou… P41273  TNFSF9     protein                   7 FALSE       FALSE    TRUE    
 4 transmembrane transmem… UniProt… gene… locat… resou… Q9Y661  HS3ST4     protein                   4 FALSE       FALSE    FALSE   
 5 transmembrane transmem… UniProt… gene… locat… resou… Q9UPX0  IGSF9B     protein                   5 FALSE       FALSE    FALSE   
 6 transmembrane transmem… UniProt… gene… locat… resou… Q9NYV7  TAS2R16    protein                   5 FALSE       FALSE    FALSE   
 7 transmembrane transmem… UniProt… gene… locat… resou… P01911  HLA-DRB1   protein                   8 FALSE       FALSE    FALSE   
 8 transmembrane transmem… UniProt… gene… locat… resou… Q6P9B9  INTS5      protein                   4 FALSE       FALSE    FALSE   
 9 transmembrane transmem… UniProt… gene… locat… resou… P05496  ATP5MC1    protein                   5 FALSE       FALSE    FALSE   
10 transmembrane transmem… UniProt… gene… locat… resou… P55344  LIM2       protein                   4 FALSE       FALSE    FALSE   
# ℹ 42,775 more rows
# ℹ 2 more variables: plasma_membrane_transmembrane <lgl>, plasma_membrane_peripheral <lgl>
# ℹ Use `print(n = ...)` to see more rows

We also see that 11 resources provide this category:

tm %>% pull(database) %>% unique
 [1] "UniProt_location"    "UniProt_topology"    "UniProt_keyword"     "OmniPath"            "GO_Intercell"       
 [6] "CellPhoneDB"         "OPM"                 "TopDB"               "LOCATE"              "Ramilowski_location"
[11] "HGNC"

To define plasma_membrane_transmembrane, this category is then combined with the plasma_membrane locational category.
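As a rough illustration of such a combination, here is a minimal sketch in base R that intersects two category tables by UniProt ID. It assumes the combination behaves like a simple AND of the two sets; the toy tables and the `combine_categories` helper are hypothetical placeholders, not real OmniPath records or functions, and the actual build may apply further rules (e.g. the consensus-based quality filter).

```r
# Hypothetical toy category tables: IDs and gene symbols are placeholders,
# not real OmniPath records. Real tables would come from intercell().
transmembrane <- data.frame(
  uniprot    = c("P1", "P2", "P3", "P4"),
  genesymbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)
plasma_membrane <- data.frame(
  uniprot    = c("P1", "P3", "P4", "P5"),
  genesymbol = c("GENE_A", "GENE_C", "GENE_D", "GENE_E")
)

# Hypothetical helper: an AND combination of two categories, i.e. keep
# only the proteins present in both tables, matched on UniProt ID.
combine_categories <- function(a, b, by = "uniprot") {
  merged <- merge(a, b, by = by, suffixes = c("", ".b"))
  merged[, c(by, "genesymbol"), drop = FALSE]
}

pm_transmembrane <- combine_categories(transmembrane, plasma_membrane)
print(pm_transmembrane)
```

Only P1, P3 and P4 appear in both toy tables, so only those rows are kept.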

Meanwhile, the original data from each resource is available in the Annotations database. For example, to see the resource OPM from the list above:

library(OmnipathR)

opm <- annotations(resources = 'OPM', wide = TRUE)
[2025-01-17 15:59:38] [SUCCESS] [OmnipathR] Downloaded 1648 annotation records.

opm
# A tibble: 760 × 6
   uniprot genesymbol entity_type membrane             family                                    transmembrane
   <chr>   <chr>      <chr>       <chr>                <chr>                                     <lgl>        
 1 Q8WWT9  SLC13A3    protein     Eykaryo. plasma      FXYD regulators                           TRUE         
 2 P08842  STS        protein     Endoplasm. reticulum Sulfatase                                 TRUE         
 3 P59666  DEFA3      protein     Secreted             Vertebrate defensin                       FALSE        
 4 P10415  BCL2       protein     Mitochon. outer      Bcl-2 inhibitors of programmed cell death TRUE         
 5 Q02127  DHODH      protein     Mitochon. inner      FMN-dependent dehydrogenase               TRUE         
 6 O14684  PTGES      protein     Endoplasm. reticulum MAPEG family                              TRUE         
 7 P47712  PLA2G4A    protein     Vesicle              Lysophospholipase catalytic domain        FALSE        
 8 P47712  PLA2G4A    protein     Eykaryo. plasma      C2 domain                                 FALSE        
 9 Q15080  NCF4       protein     Endosome             PX domain                                 FALSE        
10 P29972  AQP1       protein     Eykaryo. plasma      Major intrinsic protein (MIP) family      TRUE         
# ℹ 750 more rows
# ℹ Use `print(n = ...)` to see more rows

> I am trying to find out about two levels of annotation:
>
> * [ ] a rough annotation of the type of protein-protein interaction
> * [ ] gene family level annotation using sth like: https://www.genenames.org/data/genegroup/#!/group/20 (ideally for interacting pairs).
>   (Using gene symbol patterns can be misleading)

The HGNC classification is part of the OmniPath Annotations database, while the curated categories derived from it that are related to cell-cell communication are part of the OmniPath Intercell database:

library(OmnipathR)
library(dplyr)

hgnc <- intercell(resources = 'HGNC', scope = 'specific')

hgnc
# A tibble: 4,023 × 15
   category    parent      database scope aspect source uniprot genesymbol entity_type consensus_score transmitter receiver secreted
   <chr>       <chr>       <chr>    <chr> <chr>  <chr>  <chr>   <chr>      <chr>                 <dbl> <lgl>       <lgl>    <lgl>   
 1 lhfpl       transmembr… HGNC     spec… locat… resou… Q86UP9  LHFPL3     protein                   0 FALSE       FALSE    FALSE   
 2 lhfpl       transmembr… HGNC     spec… locat… resou… Q6ZUX7  LHFPL2     protein                   0 FALSE       FALSE    FALSE   
 3 lhfpl       transmembr… HGNC     spec… locat… resou… Q86WI0  LHFPL1     protein                   0 FALSE       FALSE    FALSE   
 4 lhfpl       transmembr… HGNC     spec… locat… resou… Q6ICI0  LHFPL7     protein                   0 FALSE       FALSE    FALSE   
 5 lhfpl       transmembr… HGNC     spec… locat… resou… Q7Z7J7  LHFPL4     protein                   0 FALSE       FALSE    FALSE   
 6 lhfpl       transmembr… HGNC     spec… locat… resou… Q8TAF8  LHFPL5     protein                   0 FALSE       FALSE    FALSE   
 7 lhfpl       transmembr… HGNC     spec… locat… resou… Q9Y693  LHFPL6     protein                   0 FALSE       FALSE    FALSE   
 8 ifn_induced plasma_mem… HGNC     spec… locat… resou… A6NNB3  IFITM5     protein                   0 FALSE       FALSE    FALSE   
 9 ifn_induced plasma_mem… HGNC     spec… locat… resou… Q01629  IFITM2     protein                   0 FALSE       FALSE    FALSE   
10 ifn_induced plasma_mem… HGNC     spec… locat… resou… P13164  IFITM1     protein                   0 FALSE       FALSE    FALSE   
# ℹ 4,013 more rows
# ℹ 2 more variables: plasma_membrane_transmembrane <lgl>, plasma_membrane_peripheral <lgl>
# ℹ Use `print(n = ...)` to see more rows

Protocadherins are one of the categories defined by HGNC; their parent in our classification is cell_adhesion:

hgnc %>% filter(category == 'protocadherin')
# A tibble: 70 × 15
   category parent database scope aspect source uniprot genesymbol entity_type consensus_score transmitter receiver secreted plasma_membrane_tran…¹ plasma_membrane_peri…²
   <chr>    <chr>  <chr>    <chr> <chr>  <chr>  <chr>   <chr>      <chr>                 <dbl> <lgl>       <lgl>    <lgl>    <lgl>                  <lgl>                 
 1 protoca… cell_… HGNC     spec… funct… resou… Q9Y5G7  PCDHGA6    protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 2 protoca… cell_… HGNC     spec… funct… resou… Q9Y5H2  PCDHGA11   protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 3 protoca… cell_… HGNC     spec… funct… resou… Q9Y5E5  PCDHB4     protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 4 protoca… cell_… HGNC     spec… funct… resou… Q9Y5E6  PCDHB3     protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 5 protoca… cell_… HGNC     spec… funct… resou… Q9Y5G1  PCDHGB3    protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 6 protoca… cell_… HGNC     spec… funct… resou… Q9Y5E2  PCDHB7     protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 7 protoca… cell_… HGNC     spec… funct… resou… Q9Y5F8  PCDHGB7    protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 8 protoca… cell_… HGNC     spec… funct… resou… COMPLE… COMPLEX:P… complex                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
 9 protoca… cell_… HGNC     spec… funct… resou… Q9Y5G4  PCDHGA9    protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
10 protoca… cell_… HGNC     spec… funct… resou… Q9Y5H5  PCDHA9     protein                   0 TRUE        TRUE     FALSE    TRUE                   FALSE                 
# ℹ 60 more rows
# ℹ abbreviated names: ¹​plasma_membrane_transmembrane, ²​plasma_membrane_peripheral
# ℹ Use `print(n = ...)` to see more rows

A few protein complexes are also annotated; almost all of these are inferred in silico, based on all components of the complex carrying the same annotation.
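A minimal sketch of that inference rule in base R, assuming a complex simply inherits an annotation when every one of its components carries it. The complex identifiers, component lists and the `annotate_complexes` helper are hypothetical illustrations, and the actual OmniPath pipeline may differ in details.

```r
# Hypothetical toy data: proteins carrying one annotation (e.g. a single
# HGNC-derived category) and complexes listed with their components.
annotated <- c("P1", "P2", "P3")
complexes <- list(
  "COMPLEX:P1_P2" = c("P1", "P2"),
  "COMPLEX:P2_P4" = c("P2", "P4"),
  "COMPLEX:P1_P3" = c("P1", "P3")
)

# A complex inherits the annotation only if all of its components carry it.
annotate_complexes <- function(complexes, annotated) {
  keep <- vapply(complexes,
                 function(members) all(members %in% annotated),
                 logical(1))
  names(complexes)[keep]
}

inferred <- annotate_complexes(complexes, annotated)
print(inferred)
```

Here the second complex is left out because its component P4 lacks the annotation.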

Original HGNC data is available in the Annotations database:

library(OmnipathR)
library(dplyr)

hgnc_original <- annotations(resources = 'HGNC', wide = TRUE)
[2025-01-17 16:02:43] [SUCCESS] [OmnipathR] Downloaded 28373 annotation records.

hgnc_original
# A tibble: 28,373 × 4
   uniprot genesymbol entity_type mainclass                            
   <chr>   <chr>      <chr>       <chr>                                
 1 P04217  A1BG       protein     Immunoglobulin like domain containing
 2 Q9NQ94  A1CF       protein     RNA binding motif containing         
 3 P01023  A2M        protein     Alpha-2-macroglobulin family         
 4 A8K2U0  A2ML1      protein     Alpha-2-macroglobulin family         
 5 U3KPV4  A3GALT2    protein     Glycosyltransferase family 6         
 6 Q9NPC4  A4GALT     protein     Alpha 1,4-glycosyltransferases       
 7 Q9NPC4  A4GALT     protein     Blood group antigens                 
 8 Q9UNA3  A4GNT      protein     Alpha 1,4-glycosyltransferases       
 9 Q9NRG9  AAAS       protein     Nucleoporins                         
10 Q9NRG9  AAAS       protein     WD repeat domain containing          
# ℹ 28,363 more rows
# ℹ Use `print(n = ...)` to see more rows

hgnc_original %>% filter(mainclass == 'Clustered protocadherins')

# A tibble: 70 × 4
   uniprot genesymbol entity_type mainclass               
   <chr>   <chr>      <chr>       <chr>                   
 1 Q9Y5I3  PCDHA1     protein     Clustered protocadherins
 2 Q9Y5H9  PCDHA2     protein     Clustered protocadherins
 3 Q9Y5H8  PCDHA3     protein     Clustered protocadherins
 4 Q9UN74  PCDHA4     protein     Clustered protocadherins
 5 Q9Y5H7  PCDHA5     protein     Clustered protocadherins
 6 Q9UN73  PCDHA6     protein     Clustered protocadherins
 7 Q9UN72  PCDHA7     protein     Clustered protocadherins
 8 Q9Y5H6  PCDHA8     protein     Clustered protocadherins
 9 Q9Y5H5  PCDHA9     protein     Clustered protocadherins
10 Q9Y5I2  PCDHA10    protein     Clustered protocadherins
# ℹ 60 more rows
# ℹ Use `print(n = ...)` to see more rows
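To get a family-level annotation for interacting pairs, the annotation table can be joined onto both partners of each interaction, as suggested above. This is a minimal base R sketch on toy data: the network and family tables and the `annotate_pairs` helper are hypothetical, and the column names `source_genesymbol` and `target_genesymbol` are an assumption about how the interaction table is laid out.

```r
# Hypothetical toy PPI network; column names are assumed, not guaranteed
# to match any particular OmniPath query result.
network <- data.frame(
  source_genesymbol = c("GENE_A", "GENE_B", "GENE_C"),
  target_genesymbol = c("GENE_B", "GENE_D", "GENE_A")
)

# Hypothetical family-level annotation, shaped like hgnc_original
# (one row per gene and family).
families <- data.frame(
  genesymbol = c("GENE_A", "GENE_B", "GENE_C"),
  mainclass  = c("Family X", "Family X", "Family Y")
)

# Attach the family of each partner: one join for the source side and one
# for the target side. all.x = TRUE keeps unannotated partners as NA.
annotate_pairs <- function(network, families) {
  src <- setNames(families, c("source_genesymbol", "source_family"))
  tgt <- setNames(families, c("target_genesymbol", "target_family"))
  out <- merge(network, src, by = "source_genesymbol", all.x = TRUE)
  merge(out, tgt, by = "target_genesymbol", all.x = TRUE)
}

pairs <- annotate_pairs(network, families)
print(pairs[, c("source_genesymbol", "target_genesymbol",
                "source_family", "target_family")])
```

Note that a gene belonging to several families (like A4GALT or AAAS above) yields one row per family combination, so you may want to collapse or select families per gene before joining.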

Let me know if you have any further questions. Feedback on how we could improve these annotations and the API for accessing them is very welcome.

Best,

Denes


deeenes added the question label Jan 17, 2025

vertesy (Author) commented Jan 17, 2025

Thank you, Denes, for the detailed and clear explanation! I will explore and get back to you if I have further questions.

best
Abel
