Biological Network Graph Analysis and Learning (bngal) is an R package for creating high-quality, complex correlation networks from biological data.
bngal creates separate correlation networks at every level of taxonomic classification (phylum to ASV) from an ASV/OTU count table and uses edge betweenness clustering to visualize complex co-occurrence substructures in the data. Numeric variables from the corresponding sample metadata can optionally be included to explore environmental-taxonomic correlations. "Subcommunity networks" can be created in parallel to explore different correlation patterns within a dataset in addition to a global comparison. For example, one may want to examine separate networks for the human skin, oral, and gut microbiomes from the same dataset while also examining microbial co-occurrence patterns across the whole body. Another may want to do the same for subsurface environments that span distinct geological contexts. As such, microbial ecologists from a wide range of backgrounds may be interested in applying bngal to model microbial niche space in the habitats they study!
Although bngal is released as a standalone R package and can be used interactively in an IDE such as RStudio, I strongly recommend running the command-line utility wrapper (bngal-cli) to simplify its use, especially for first-time users. You can quickly install both the bngal R package and its command-line utility wrapper via the following instructions:
- bngal-cli requires Anaconda to install successfully. Install the appropriate Anaconda version for your operating system if you don't have it already.
- Clone the bngal-cli GitHub repository into your directory of choice (my-directory) and run the setup script in a bash or zsh shell session. This will install the bngal R package within a new conda environment called "bngal":
cd my-directory
git clone https://github.com/mselensky/bngal-cli
cd bngal-cli
bash bngal-setup.sh
And that's it! Sit tight and grab a coffee while bngal-cli installs; it may take a few minutes.
Once you have successfully installed and activated the bngal environment, you can remove the bngal-cli folder. When the bngal environment is active, you will have access to two bngal functions:
Function | Application |
---|---|
bngal-build-nets | Build network model(s) according to defined cutoffs |
bngal-summarize-nets | Summarize and visualize network statistics from bngal-build-nets |
If you only want to use the bngal R package interactively, you can install it and its dependencies from within an active R session via:
source("https://raw.githubusercontent.com/mselensky/bngal-cli/main/R/install-R-pkgs.R")
The first step in the bngal pipeline, bngal-build-nets, creates co-occurrence networks at every level of taxonomic classification (phylum to ASV) and exports the output data for downstream processing. Critically, the first column of the ASV/OTU table must be named "sample-id". One of the metadata file's columns must also be named "sample-id" (its position does not matter). Both files must be in CSV format, and their "sample-id" values must be unique.
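Because a bngal-build-nets run can take a while, it may help to check these input requirements before launching one. The following is a minimal bash sketch, assuming simple CSV files (one header row, no quoted fields that contain commas). check_inputs is a hypothetical helper written for illustration; it is not part of bngal-cli.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for bngal-build-nets inputs (not part of bngal-cli).
# Assumes plain CSV: one header row, no quoted fields that contain commas.
check_inputs() {
  local asv_table="$1" metadata="$2" status=0 first_col meta_col dups

  # The first column of the ASV/OTU table must be named "sample-id".
  first_col=$(head -n 1 "$asv_table" | cut -d, -f1 | tr -d '\r"')
  if [ "$first_col" != "sample-id" ]; then
    echo "ERROR: first column of $asv_table is '$first_col', not 'sample-id'" >&2
    status=1
  fi

  # The metadata must have a "sample-id" column; its position does not matter.
  meta_col=$(head -n 1 "$metadata" | tr -d '\r"' | tr ',' '\n' \
    | grep -nx 'sample-id' | head -n 1 | cut -d: -f1)
  if [ -z "$meta_col" ]; then
    echo "ERROR: no 'sample-id' column in $metadata" >&2
    return 1
  fi

  # sample-id values must be unique within each file.
  dups=$(tail -n +2 "$asv_table" | cut -d, -f1 | tr -d '\r"' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "ERROR: duplicate sample-id values in $asv_table:" $dups >&2
    status=1
  fi
  dups=$(tail -n +2 "$metadata" | cut -d, -f"$meta_col" | tr -d '\r"' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "ERROR: duplicate sample-id values in $metadata:" $dups >&2
    status=1
  fi

  if [ "$status" -eq 0 ]; then echo "inputs OK"; fi
  return "$status"
}
```

For example, check_inputs example-asv-table.csv example-metadata.csv prints "inputs OK" when all three requirements hold and returns a non-zero status otherwise.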
There are only three required options for bngal-build-nets: --asv_table, a rarefied ASV/OTU table; --metadata, sample metadata corresponding to the ASV/OTU table; and --output, the path to an existing directory. By default, bngal only creates networks from pairwise associations that have at least 5 observations across the dataset and an absolute correlation coefficient of at least 0.6 (p <= 0.05). Users may tweak these and many other bngal parameters to their liking; see the bngal-build-nets Wiki page for more details.
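To make the default cutoffs concrete, here is a sketch of the filtering rule applied to a table of pairwise associations. The file layout (columns node1,node2,n_obs,r,p) and the filter_edges helper are hypothetical; bngal computes and filters correlations internally, and its own tables may be structured differently.

```shell
# Hypothetical illustration of bngal's documented default edge cutoffs:
# keep a pair only if n_obs >= 5, |r| >= 0.6, and p <= 0.05.
# Assumed input layout (with a header row): node1,node2,n_obs,r,p
filter_edges() {
  local pairs_csv="$1" min_obs="${2:-5}" min_r="${3:-0.6}" max_p="${4:-0.05}"
  awk -F, -v min_obs="$min_obs" -v min_r="$min_r" -v max_p="$max_p" '
    NR == 1 { print; next }                   # keep the header row
    {
      abs_r = ($4 < 0) ? -$4 : $4             # absolute correlation coefficient
      if ($3 >= min_obs && abs_r >= min_r && $5 <= max_p) print
    }' "$pairs_csv"
}
```

Calling filter_edges pairs.csv applies the defaults, while filter_edges pairs.csv 10 0.7 0.01 would mimic stricter cutoffs. Note that a strong negative correlation (for example r = -0.65) passes because the threshold is on the absolute value.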
The simplest use case is to create a global network of the entire input ASV table without including any metadata variables. By default, the "observational threshold", or the number of unique observations required for a pairwise relationship to be included in the network, is set to 5. Building such a network looks like:
conda activate bngal
cd data-directory
OUT_DR="$(pwd)/all-communities"
mkdir -p "$OUT_DR"
bngal-build-nets \
--asv_table="example-asv-table.csv" \
--metadata="example-metadata.csv" \
--output="$OUT_DR"
The above command produces several output subfolders. The network-plots/pdfs subfolder contains publication-ready network visualizations with nodes colored by phylum and by edge betweenness cluster (EBC):
Nodes colored by phylum:
Nodes colored by EBC:
Nodes can also be colored by "functional groupings" from a curated list of family-level functions defined in the literature. Note: be very careful with any conclusions you draw from this! Remember that phylogeny != function: functional categories are based on the nearest cultured relative. When multiple major biogeochemical functions are represented within a given family, the grouping is marked as "multiple". Refer to this key for Grouping legend names. This feature is only available at the family level or below:
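The "multiple" rule can be illustrated with a small sketch that collapses a family-to-function table into one grouping per family. The two-column input layout and the collapse_groupings helper are assumptions made for illustration; they are not bngal's curated list or its implementation.

```shell
# Hypothetical sketch of the "multiple" rule for family-level groupings.
# Assumed input (with a header row): family,function, one row per annotation.
# Families annotated with more than one distinct function become "multiple".
collapse_groupings() {
  awk -F, '
    NR == 1 { next }                          # skip the header row
    !seen[$1 FS $2]++ {                       # first time this family/function pair appears
      if (!($1 in nfun)) fam[++k] = $1        # remember order of first appearance
      nfun[$1]++                              # number of distinct functions per family
      fun[$1] = $2
    }
    END {
      print "family,grouping"
      for (i = 1; i <= k; i++) {
        f = fam[i]
        grouping = (nfun[f] > 1) ? "multiple" : fun[f]
        print f "," grouping
      }
    }' "$1"
}
```

Duplicate annotations of the same function do not count twice, so a family only becomes "multiple" when genuinely different functions are listed for it.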
To facilitate exploration of network structure, network-plots/html contains the same plots as interactive HTML figures that users can manipulate manually and re-save as PDFs:
The pairwise-summaries output subfolder contains a list of pairwise node statistics for each sample included in the network analysis.
The second step in the bngal pipeline, bngal-summarize-nets, outputs additional network summary data and plots. bngal-summarize-nets takes the output directory of bngal-build-nets as its input; in bash, this is passed as --network_dir=$OUT_DR. While bngal-build-nets constructs the networks and identifies edge betweenness clusters (EBCs) in the data, bngal-summarize-nets calculates the relative abundance of each EBC per sample in the dataset. These summary data, alongside the distribution of each EBC and taxon in the dataset, are exported to the network-summary-tables subfolder. Notably, the "*_tax_spread.csv" output file reports the EBC assigned to each taxon along with its abundance distribution in the dataset.
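As a rough illustration of a per-sample EBC relative abundance calculation, the sketch below sums counts by sample and EBC and divides by each sample's total count. The long-format input (sample-id,taxon,ebc,count) and the ebc_rel_abundance helper are hypothetical, and using the total across all taxa (including taxa without an EBC) as the denominator is an assumption; bngal's own calculation may differ.

```shell
# Hypothetical sketch of a per-sample EBC relative abundance table.
# Assumed long-format input (with a header row): sample-id,taxon,ebc,count
# Rows with an empty ebc field count toward the sample total but are not reported.
ebc_rel_abundance() {
  echo "sample-id,ebc,rel_abundance"
  awk -F, '
    NR == 1 { next }                          # skip the header row
    {
      total[$1] += $4                         # all counts in the sample
      if ($3 != "") ebc_sum[$1 FS $3] += $4   # counts assigned to this EBC
    }
    END {
      for (key in ebc_sum) {
        split(key, parts, FS)
        if (total[parts[1]] > 0)              # avoid dividing by zero for empty samples
          printf "%s,%s,%.4f\n", parts[1], parts[2], ebc_sum[key] / total[parts[1]]
      }
    }' "$1" | sort -t, -k1,1 -k2,2n
}
```

The output is sorted by sample and then numerically by EBC, so it can be read directly or plotted as a stacked barplot per sample.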
bngal-summarize-nets is also useful for visualizing biogeographic patterns in taxonomic and EBC distributions. For example, imagine that your samples are categorized by the metadata column region and you want to examine whether certain EBCs are associated with certain regions. By including the --fill_ebc_by option below, bngal-summarize-nets will produce "EBC composition" plots that summarize the region from which most of the taxa in each EBC originate:
bngal-summarize-nets \
--asv_table="example-asv-table.csv" \
--metadata="example-metadata.csv" \
--network_dir=$OUT_DR \
--fill_ebc_by="region"
Examining the contents of ebc-composition-plots at the ASV level shows that EBCs 1 and 10 are both highly central clusters in the network but tend to be most abundant in region4 and region1, respectively:
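One way to think about an EBC composition summary is to find, for each EBC, the metadata category that contributes the largest share of its abundance. The sketch below does this for an assumed ebc,region,abundance table; the input layout, the abundance weighting, and the ebc_majority_region helper are all assumptions, not a description of how bngal draws its plots. Ties are broken arbitrarily.

```shell
# Hypothetical sketch: for each EBC, find the region contributing the largest
# share of its abundance (abundance-weighted; ties are broken arbitrarily).
# Assumed input (with a header row): ebc,region,abundance
ebc_majority_region() {
  echo "ebc,majority_region,share"
  awk -F, '
    NR == 1 { next }                          # skip the header row
    {
      sum[$1 FS $2] += $3                     # abundance of EBC $1 in region $2
      ebc_total[$1] += $3                     # total abundance of EBC $1
    }
    END {
      for (key in sum) {
        split(key, p, FS)
        if (!(p[1] in best) || sum[key] > best[p[1]]) {
          best[p[1]] = sum[key]               # largest per-region abundance so far
          region[p[1]] = p[2]
        }
      }
      for (e in best)
        if (ebc_total[e] > 0)
          printf "%s,%s,%.2f\n", e, region[e], best[e] / ebc_total[e]
    }' "$1" | sort -t, -k1,1n
}
```

A share close to 1 suggests an EBC dominated by a single region, whereas a share near 1/(number of regions) suggests a well-distributed cluster.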
To visualize the distribution of EBCs across samples, bngal-summarize-nets also produces clustered taxa barplots. The contents of taxa-barplots/ebc reveal more biogeographic patterns: EBC 1 is almost exclusive to region4, EBC 10 is most abundant in a subset of region1 communities, and EBC 3 appears to be fairly well distributed throughout the dataset:
Similar clustered barplots filled by phylum and by family-level functional grouping are exported to taxa-barplots/phylum and taxa-barplots/groupings, respectively. For example, this is the same clustered barplot filled by phylum: