PanGraph-DB : A Graph Database Framework for Complex Multi-Pangenome Analyses

About the project

PanGraph-DB is a data-centric pipeline that operates on a unified graph dataset built from multiple pangenome graphs computed by the PPanGGOLiN framework, and uses the Neo4j graph database to perform complex analyses on it. These analyses are expressed as queries in Neo4j's Cypher query language. The methodology is nevertheless system-agnostic and can be reproduced with any other graph database system.
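To give a feel for what such an analysis could look like when issued from Python, here is a minimal sketch using py2neo (one of the listed dependencies). The node labels `Family` and `Pangenome`, the relationship type `IN_PANGENOME`, and the `name` property are hypothetical placeholders chosen for illustration; they are not taken from the schema the notebook actually builds, so adapt them to your database.

```python
# NOTE: the labels `Pangenome` and `Family`, the relationship type
# `IN_PANGENOME`, and the `name` property are hypothetical placeholders,
# not the schema actually created by the notebook.
FAMILIES_PER_PANGENOME = """
MATCH (f:Family)-[:IN_PANGENOME]->(p:Pangenome)
WITH p.name AS pangenome, count(f) AS families
WHERE families >= $min_families
RETURN pangenome, families
ORDER BY families DESC
"""


def build_query(min_families=0):
    """Return the Cypher text and its parameters.

    The threshold is passed as a query parameter rather than spliced
    into the query string.
    """
    return FAMILIES_PER_PANGENOME.strip(), {"min_families": int(min_families)}


def run_query(uri, user, password, min_families=0):
    """Execute the query with py2neo and return the rows as a list of dicts."""
    from py2neo import Graph  # lazy import: only needed when a DBMS is available

    cypher, params = build_query(min_families)
    return Graph(uri, auth=(user, password)).run(cypher, params).data()
```

Calling `run_query("bolt://localhost:7687", "neo4j", "<your password>", min_families=10000)` would then return one row per matching pangenome, assuming a DBMS is listening on that address and the placeholder names have been replaced with real ones.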

This repository includes a Jupyter notebook describing performance and scalability experiments on datasets of up to 10 pangenomes, with sizes ranging from about 200 to 1800 MB, using a workload of 10 queries (available here).

Pangenome	# of genes	# of genomes	# of families	# of raw edges	# of RGPs	# of spots	# of modules	HDF5 size (MB)
Acinetobacter baumannii	1 044 515	285	14 400	30 147	9 764	364	609	616
Enterobacter bugandensis	526 062	118	18 143	23 734	3 424	326	250	212
Enterobacter cloacae	651 827	137	22 953	32 270	6 083	292	526	358
Enterobacter hormaechei	739 490	159	18 166	29 798	5 744	280	742	415
Enterobacter kobei	705 811	150	20 836	29 311	5 740	181	535	386
Enterobacter roggenkampii	978 031	210	26 080	40 459	8 807	319	712	537
Enterococcus faecium	570 257	207	7 889	18 627	6 195	189	318	301
Klebsiella pneumoniae	3 100 409	600	29 139	61 865	25 014	529	1 167	1 800
Pseudomonas aeruginosa	1 892 646	313	23 699	42 084	10 706	543	909	1 200
Staphylococcus aureus	1 686 977	638	7 017	18 047	11 869	268	203	991
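For a quick sense of the overall scale of the dataset, the figures in the table can be aggregated with a few lines of Python. This is only a convenience sketch: the dictionary below copies three columns from the table (genomes, genes, HDF5 size), and the function names are illustrative.

```python
# Values copied from the table above: (genomes, genes, HDF5 size in MB).
PANGENOMES = {
    "Acinetobacter baumannii": (285, 1_044_515, 616),
    "Enterobacter bugandensis": (118, 526_062, 212),
    "Enterobacter cloacae": (137, 651_827, 358),
    "Enterobacter hormaechei": (159, 739_490, 415),
    "Enterobacter kobei": (150, 705_811, 386),
    "Enterobacter roggenkampii": (210, 978_031, 537),
    "Enterococcus faecium": (207, 570_257, 301),
    "Klebsiella pneumoniae": (600, 3_100_409, 1_800),
    "Pseudomonas aeruginosa": (313, 1_892_646, 1_200),
    "Staphylococcus aureus": (638, 1_686_977, 991),
}


def summarize(table):
    """Total the genome, gene and HDF5-size columns over all pangenomes."""
    return {
        "pangenomes": len(table),
        "genomes": sum(genomes for genomes, _, _ in table.values()),
        "genes": sum(genes for _, genes, _ in table.values()),
        "hdf5_mb": sum(size for _, _, size in table.values()),
    }


def genes_per_genome(table):
    """Average number of genes per genome for each pangenome."""
    return {name: genes / genomes for name, (genomes, genes, _) in table.items()}


if __name__ == "__main__":
    print(summarize(PANGENOMES))
    for name, average in sorted(genes_per_genome(PANGENOMES).items()):
        print(f"{name}: {average:.0f} genes per genome")
```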

Authors

Jérôme Arnoux, Genoscope/LABGeM - CEA, CNRS, Paris Saclay University
Angela Bonifati, Liris CNRS, Lyon 1 University
Alexandra Calteau, Genoscope/LABGeM - CEA, CNRS, Paris Saclay University
Stefania Dumbrava, SAMOVAR/Inst. Polytechnique de Paris, ENSIIE
Guillaume Gautreau, MetaGenoPolis, University Paris-Saclay, INRAE, MGP

Dependencies

We list all required dependencies below.

Use pip to install:

dict2graph==2.0.0
graphio==0.4.0

Use conda to install:

ppanggolin==1.2.74
pyhmmer==0.6.3
py2neo==2021.2.3
rgi==6.0.1
genome_updater==0.5.1
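Once the packages are installed, the pinned versions can be compared against the active environment. The sketch below uses the standard-library `importlib.metadata` module; it assumes each distribution is registered under the name shown in the lists above. `rgi` and `genome_updater` are left out because they may not expose Python package metadata (an assumption, not something verified here).

```python
from importlib import metadata

# Pins copied from the pip and conda lists above.
PINNED = {
    "dict2graph": "2.0.0",
    "graphio": "0.4.0",
    "ppanggolin": "1.2.74",
    "pyhmmer": "0.6.3",
    "py2neo": "2021.2.3",
}


def check_pins(pins, lookup=metadata.version):
    """Return {name: (status, installed_version_or_None)} for each pin.

    `status` is "ok", "mismatch" or "missing". `lookup` can be swapped
    for a stub when testing.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        report[name] = ("ok" if found == wanted else "mismatch", found)
    return report


if __name__ == "__main__":
    for name, (status, found) in check_pins(PINNED).items():
        print(f"{name:12s} {status:9s} installed={found} pinned={PINNED[name]}")
```

Note that `ppanggolin` may be reported as a mismatch if you follow the steps under "Running the project", since those install PPanGGOLiN from a development branch rather than the pinned release.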

Neo4j:

Add a local DBMS running Neo4j 4.4.11
APOC 4.4.0.10 or later
Optional: Neo4j Desktop 1.5.0

Dataset

The original dataset is available here.

Running the project

Before you begin, make sure an empty Neo4j DBMS (version 4.4.11) is running and reachable, with the APOC plugin (version 4.4.0.10) installed.
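These preconditions can be checked from Python before launching the notebook. The following is a minimal sketch, not part of the project: the function names, the default bolt URI and the credentials are assumptions to adapt. It relies on `dbms.components()` and `apoc.version()`, which are available in Neo4j 4.4 with APOC installed, and on py2neo's `Graph.evaluate`.

```python
def parse_version(text):
    """Turn '4.4.0.10' into (4, 4, 0, 10) so versions compare numerically.

    Plain string comparison would wrongly rank '4.4.0.10' below '4.4.0.9'.
    """
    return tuple(int(part) for part in text.split("."))


def check_dbms(evaluate, neo4j_required="4.4.11", apoc_minimum="4.4.0.10"):
    """Return a list of problems found; an empty list means all checks passed.

    `evaluate` is any callable that takes a Cypher string and returns the
    first value of the first row (for example py2neo's Graph.evaluate).
    """
    problems = []

    neo4j = evaluate("CALL dbms.components() YIELD versions RETURN versions[0]")
    if neo4j != neo4j_required:
        problems.append(f"Neo4j {neo4j} found, {neo4j_required} expected")

    # This query raises an error on the server side if APOC is not installed.
    apoc = evaluate("RETURN apoc.version()")
    if parse_version(apoc) < parse_version(apoc_minimum):
        problems.append(f"APOC {apoc} found, {apoc_minimum} or later expected")

    nodes = evaluate("MATCH (n) RETURN count(n)")
    if nodes != 0:
        problems.append(f"database is not empty ({nodes} nodes)")

    return problems


def check_local_dbms(uri="bolt://localhost:7687", user="neo4j", password="password"):
    """Run the checks against a live DBMS; URI and credentials are placeholders."""
    from py2neo import Graph  # lazy import: only needed when a DBMS is available

    return check_dbms(Graph(uri, auth=(user, password)).evaluate)
```

Passing the query runner in as a callable keeps the checks themselves independent of the driver, so they can be exercised with a stub when no database is running.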

To run the PangenomeGraph.ipynb notebook, you first need to install some packages.

These are listed in the conda environment file conda-env.yml.

The in-development version of PPanGGOLiN is required for some features and for pangenome compatibility.

To create the conda environment and register it as a Jupyter kernel, copy and paste the following commands into your terminal:

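# Clone the repository and create the conda environment it ships with
git clone https://github.com/labgem/PanGraph-DB.git
conda update -n base -c defaults conda -y
conda env create --file PanGraph-DB/conda-env.yml
# If conda was not yet initialised for bash, restart the shell after this step
conda init bash
conda activate pangraph
# Install the development branch of PPanGGOLiN
git clone -b release1.3 https://github.com/labgem/PPanGGOLiN.git
pip install PPanGGOLiN/.
# Register the environment as a Jupyter kernel and start the notebook server
pip install --user ipykernel
python -m ipykernel install --user --name=pangraph
jupyter notebook --notebook-dir=./PanGraph-DB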

Next, run all the cells to obtain the corresponding results. Remember to change the data path first.
