Home
Biological Network Graph Analysis and Learning (`bngal`) is a package written in R to create high-quality, complex correlation networks from biological data.
`bngal` creates separate correlation networks at every level of taxonomic classification (phylum-ASV) from an ASV/OTU count table to visualize complex co-occurrence substructures in the data via edge betweenness clustering. Numeric variables from a corresponding metadata file can optionally be included to explore environmental-taxonomic correlations. "Subcommunity networks" can be created in parallel to explore different correlation patterns within a dataset in addition to a global comparison. For example, one may want to examine separate networks for the human skin, oral, and gut microbiomes from the same dataset, while also examining microbial co-occurrence patterns across the whole body. Another user may want to do the same for subsurface environments that span distinct geological contexts. As such, microbial ecologists from a wide range of backgrounds may be interested in applying `bngal` to model microbial niche space in the habitats they study!
Although `bngal` is released as a standalone R package and can be used interactively in an IDE such as RStudio, I strongly recommend running the command-line utility wrapper (`bngal-cli`) to simplify its use, especially for first-time users. You can quickly install both the `bngal` R package and its command-line utility wrapper via the following instructions:
- `bngal-cli` requires Anaconda to install successfully. Install the appropriate Anaconda version for your operating system if you don't have it already.
- Clone the `bngal-cli` GitHub repository into your directory of choice (`my-directory`) and run the setup script in a bash or zsh shell session. This will install the `bngal` R package within a new conda environment called "bngal":
cd my-directory
git clone https://github.com/mselensky/bngal-cli
cd bngal-cli
bash bngal-setup.sh
And that's it! Sit tight and grab a coffee while `bngal-cli` installs. It may take a few minutes.
Once you successfully install and activate the `bngal` environment, you can remove the `bngal-cli` folder. When the `bngal` environment is active, you will have access to two `bngal` functions:
Function | Application |
---|---|
`bngal-build-nets` | Build network model(s) according to defined cutoffs |
`bngal-summarize-nets` | Summarize and visualize network statistics from `bngal-build-nets` |
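As a quick sanity check after activating the environment, the sketch below reports whether the two commands from the table are on your `PATH`. The helper name `check_cmds` is hypothetical and not part of `bngal-cli`; it only wraps the standard `command -v` lookup.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if any of them is missing.
check_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd: found"
    else
      echo "$cmd: not found"
      missing=1
    fi
  done
  return "$missing"
}

# The two command names come from the table above.
check_cmds bngal-build-nets bngal-summarize-nets \
  || echo "Is the bngal conda environment active?"
```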
If you only want to use the `bngal` R package interactively, you can install it and its dependencies within an active R session via:
suppressMessages(if (!require("pacman")) install.packages("pacman", repos="https://cran.r-project.org/"))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(dplyr))
pacman::p_load(parallel, tidyr, plyr, Hmisc, RColorBrewer, igraph,
visNetwork, ggpubr, grid, gridExtra, plotly,
purrr, viridis)
if (!require("bngal")) pacman::p_install_gh("mselensky/bngal")
`bngal-build-nets` creates co-occurrence networks at every level of taxonomic classification (phylum-ASV). Critically, the first column of the ASV/OTU table must be named "sample-id". One of the metadata file's columns must also be named "sample-id" (position does not matter). Both files must be in CSV format and contain unique rows.
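The input requirements above can be checked before a run with a small preflight script. This is only a sketch and not part of `bngal`: the function names are hypothetical, it assumes simple CSV files (no quoted fields containing commas or newlines), and it interprets "unique rows" as "no line repeated verbatim".

```shell
# Succeeds if the first header field of a CSV is exactly "sample-id".
first_col_is_sample_id() {
  [ "$(head -n 1 "$1" | tr -d '\r"' | cut -d, -f1)" = "sample-id" ]
}

# Succeeds if any header field of a CSV is exactly "sample-id".
has_sample_id_col() {
  head -n 1 "$1" | tr -d '\r"' | tr ',' '\n' | grep -qx 'sample-id'
}

# Succeeds if no line of the file is repeated verbatim.
rows_are_unique() {
  [ -z "$(sort "$1" | uniq -d)" ]
}

# Hypothetical driver. Usage: check_inputs ASV_TABLE METADATA
check_inputs() {
  first_col_is_sample_id "$1" || { echo "ASV table: first column must be sample-id" >&2; return 1; }
  has_sample_id_col "$2"      || { echo "metadata: no sample-id column" >&2; return 1; }
  rows_are_unique "$1"        || { echo "ASV table: duplicate rows" >&2; return 1; }
  rows_are_unique "$2"        || { echo "metadata: duplicate rows" >&2; return 1; }
  echo "inputs look OK"
}
```

For example, `check_inputs rarefied-asv-table.csv sample_metadata.csv` prints `inputs look OK` when all four checks pass, and otherwise names the first failed check on stderr.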
There are only three required options for `bngal-build-nets`: `--asv_table`, a rarefied ASV/OTU table; `--metadata`, sample metadata corresponding to `asv_table`; and `--output`, a directory path that must exist. By default, `bngal` will only create networks from pairwise associations that have at least 5 observations across the dataset and an absolute correlation coefficient of at least 0.6 (p <= 0.05). Users may tweak these and many other `bngal` parameters to their liking; run `bngal-build-nets --help` for more details:
Usage: bngal-build-nets [options]
Options:
-a ASV_TABLE, --asv_table=ASV_TABLE
(Required) ASV count table named by Silva-138 L7 taxonomies. Ideally rarefied and filtered as necessary.
* First column must be named 'sample-id' and must contain unique identifiers.
* Must be an absolute abundance ASV table.
-m METADATA, --metadata=METADATA
(Required) Sample metadata corresponding to asv_table. Must be a .CSV file with sample identifiers in a column named `sample-id.`
-o OUTPUT, --output=OUTPUT
(Required) Output directory for network graphs and data.
-c CORRELATION, --correlation=CORRELATION
Metric for pairwise comparisons. Can be one of 'pearson' or 'spearman'.
* Default = spearman
-r CORR_COLUMNS, --corr_columns=CORR_COLUMNS
Metadata columns to include in pairwise correlation networks.
* Multiple columns may be provided given the following syntax: 'col1,col2'
* Default = NULL
-k CORR_CUTOFF, --corr_cutoff=CORR_CUTOFF
Absolute correlation coefficient cutoff for pairwise comparisons.
* Default = 0.6
-p P_VALUE, --p_value=P_VALUE
Maximum cutoff for p-values calculated from pairwise relationships.
* Default = 0.05
-f ABUN_CUTOFF, --abun_cutoff=ABUN_CUTOFF
Relative abundance cutoff for taxa (values 0-1 accepted). Anything lower than this value is removed before network construction.
* Default = 0
-x CORES, --cores=CORES
Number of CPUs to use. Can only parallelize on Mac or Linux OS. Currently, bngal can only run on multiple cores when sub_comm_col is provided.
* Default = 1
-n SUBNETWORKS, --subnetworks=SUBNETWORKS
Metadata column by which to split data in order to create separate networks.
* If not provided, bngal will create a single network from the input ASV table.
* Default = NULL
-t TRANSFORMATION, --transformation=TRANSFORMATION
Numeric transformation to apply to input data. Can be one of 'log10'.
* Default = NULL
-d DIRECTION, --direction=DIRECTION
Direction for --abun-cutoff. Can be one of 'greaterThan' or 'lessThan'.
* Default = 'greaterThan'
-s SIGN, --sign=SIGN
Type of pairwise relationship for network construction. Can be one of 'positive', 'negative', or 'all'.
* Default = 'all'
-b OBS_THRESHOLD, --obs_threshold=OBS_THRESHOLD
('Observational threshold') Minimum number of unique observations required for a given pairwise relationship to be included in the network.
* Default = 5
-g GRAPH_LAYOUT, --graph_layout=GRAPH_LAYOUT
Type of igraph layout for output network plots.
* Refer to the igraph documentation for the full list of options: https://igraph.org/r/html/latest/layout_.html
* Default = 'layout_nicely'
-h, --help
Show this help message and exit
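To make the three default cutoffs concrete, here is a sketch of how they combine as a filter on a table of pairwise results. It illustrates the logic described in the help text only; it is not `bngal`'s implementation, and the file layout, column names, and values are invented for the example.

```shell
# Hypothetical pairwise table; columns and values are made up for illustration.
pairs=$(mktemp)
cat > "$pairs" <<'EOF'
node1,node2,rho,p,n_obs
taxA,taxB,0.82,0.001,12
taxA,taxC,-0.71,0.030,8
taxB,taxC,0.45,0.010,20
taxC,taxD,0.90,0.200,9
taxD,taxE,0.95,0.001,3
EOF

# Keep rows with |rho| >= corr_cutoff, p <= p_value, and n_obs >= obs_threshold.
# Usage: filter_pairs FILE [CORR_CUTOFF] [P_VALUE] [OBS_THRESHOLD]
filter_pairs() {
  awk -F, -v k="${2:-0.6}" -v p="${3:-0.05}" -v b="${4:-5}" '
    NR == 1 { print; next }            # keep the header row
    { r = ($3 < 0) ? -$3 : $3 }        # absolute correlation coefficient
    r >= k && $4 <= p && $5 >= b { print }
  ' "$1"
}

filter_pairs "$pairs"                  # defaults: 0.6, 0.05, 5
```

With the defaults, only the taxA-taxB and taxA-taxC rows survive: the third row fails the correlation cutoff, the fourth fails the p-value cutoff, and the fifth has too few observations. Calling `filter_pairs "$pairs" 0.6 0.05 3` lowers the observation threshold and lets the fifth row through as well.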
The simplest use case is to create a global network of the entire input ASV table without including any metadata variables:
conda activate bngal
cd data-directory
OUT_DR=output-directory
mkdir -p $OUT_DR # output directory must exist
bngal-build-nets \
--asv_table="rarefied-asv-table.csv" \
--metadata="sample_metadata.csv" \
--output=$OUT_DR
An example output from this command is below:
![Screen Shot 2022-08-08 at 8 44 00 PM](https://user-images.githubusercontent.com/48727421/183544645-b2ff4b3e-9789-4f37-b617-b68b47ee8cea.png)
![Screen Shot 2022-08-08 at 8 44 34 PM](https://user-images.githubusercontent.com/48727421/183544715-773614fc-002b-44b1-bcd6-6bb4983a0258.png)
If you want to split your input data into separate networks based on the metadata column "region", run them in parallel across 4 CPUs, reduce the minimum number of observations required for a pairwise association to 3, and include 5 numeric metadata variables in the network ("metacol[1-5]"), the command would be:
conda activate bngal
cd data-directory
OUT_DR=output-directory
mkdir -p $OUT_DR # output directory must exist
bngal-build-nets \
--asv_table="rarefied-asv-table.csv" \
--metadata="sample_metadata.csv" \
--output=$OUT_DR \
--obs_threshold=3 \
--subnetworks="region" \
--cores=4 \
--corr_columns='metacol1,metacol2,metacol3,metacol4,metacol5'
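Before launching a split run like this, it can help to preview which values the `--subnetworks` column contains and to build the comma-separated `--corr_columns` string programmatically. The sketch below assumes a simple CSV (no quoted commas); the helper name `unique_values` and the demo file contents are hypothetical.

```shell
# List the unique values found in a named column of a simple CSV.
# Usage: unique_values FILE COLUMN
unique_values() {
  awk -F, -v col="$2" '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) exit 1                       # column not found: print nothing
      next
    }
    { print $idx }
  ' "$1" | sort -u
}

# Tiny made-up metadata file standing in for sample_metadata.csv.
meta=$(mktemp)
cat > "$meta" <<'EOF'
sample-id,region,metacol1
s1,region 1,0.2
s2,region 6,1.4
s3,region 6,0.9
EOF

unique_values "$meta" region                 # one line per distinct region

# Build 'metacol1,...,metacol5' without typing it by hand.
CORR_COLS=$(printf 'metacol%s,' 1 2 3 4 5)
CORR_COLS=${CORR_COLS%,}                     # drop the trailing comma
echo "$CORR_COLS"
```

The resulting string could then be passed as `--corr_columns="$CORR_COLS"`.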
The `bngal-build-nets` command above creates interactive network plots via plotly for each unique value within the metadata column "region". Taxa nodes are represented as filled circles, while metadata variables are squares. The width of each edge corresponds to the strength of the correlation coefficient, and the color indicates its sign (red=negative, blue=positive). As an example, genus-level associations from the "region 6" value can be visualized by their edge betweenness clusters:
![Screen Shot 2022-08-08 at 5 09 34 PM](https://user-images.githubusercontent.com/48727421/183523201-0aab1948-9899-43b3-9a37-83a68c40325e.png)
They can also be selected and colored by phylum:
![Screen Shot 2022-08-08 at 5 18 31 PM](https://user-images.githubusercontent.com/48727421/183524179-4ddcc278-0f07-4868-80fe-430b41dca98a.png)
Plots can also be colored by "functional grouping" from a curated list of family-level functions defined in the literature. Note: be very careful with any conclusions you might draw from this! Remember that phylogeny != function. Functional categories are based on the nearest cultured relative. When multiple major biogeochemical functions are represented within a given family, the grouping is marked as "multiple".
![Screen Shot 2022-08-08 at 5 11 39 PM](https://user-images.githubusercontent.com/48727421/183523451-b4f64262-0110-4dd5-8ab8-10db44138396.png)
`bngal` will automatically produce these three plots for each unique value within the "region" column at every level of taxonomic classification. These plots can be found in a subfolder named after the value of the `--graph_layout` option (`layout_nicely` by default). The underlying data for each network are saved into the `network-data` subfolder for downstream functions (coming soon!), grouped by taxonomic level, and pairwise correlation statistics for each constructed network are exported to the `pairwise-summaries` subfolder.
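To confirm that a run produced the folders described above, a sketch like the following can be used. It assumes the three subfolders sit directly under the `--output` directory, which the text implies but does not state outright; the function name `list_bngal_outputs` is hypothetical.

```shell
# Report which of the documented subfolders exist and how many files each holds.
# Usage: list_bngal_outputs OUT_DIR [GRAPH_LAYOUT]
list_bngal_outputs() {
  local out="$1" layout="${2:-layout_nicely}" sub
  for sub in "$layout" network-data pairwise-summaries; do
    if [ -d "$out/$sub" ]; then
      echo "$sub: $(find "$out/$sub" -type f | wc -l | tr -d ' ') file(s)"
    else
      echo "$sub: missing"
    fi
  done
}

# OUT_DR is the variable used in the examples above; falls back to a placeholder.
list_bngal_outputs "${OUT_DR:-output-directory}"
```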