Home

Welcome to the `bngal` wiki!

What is `bngal`?

Biological Network Graph Analysis and Learning (bngal) is a package written in R to create high-quality, complex correlation networks from biological data.

bngal creates a separate correlation network at every level of taxonomic classification (phylum through ASV) from an ASV/OTU count table and uses edge betweenness clustering to visualize complex co-occurrence substructures in the data. Numeric variables from a corresponding metadata file can optionally be included to explore environmental-taxonomic correlations. "Subcommunity networks" can be created in parallel to explore different correlation patterns within a dataset alongside a global comparison. For example, one may want to examine separate networks for the human skin, oral, and gut microbiomes from the same dataset while also examining microbial co-occurrence patterns across the whole body. Another may want to do the same for subsurface environments that span distinct geological contexts. As such, microbial ecologists from a wide range of backgrounds may be interested in applying bngal to model microbial niche space in the habitats they study!

Installation

Although bngal is released as a standalone R package and can be interactively used in an IDE such as RStudio, I strongly recommend running the command-line utility wrapper (bngal-cli) to simplify its use, especially for first-time users. You can quickly install both the bngal R package and its command-line utility wrapper via the following instructions:

Command line utility (recommended)

bngal-cli requires Anaconda to install. If you don't already have it, install the Anaconda version appropriate for your operating system.
Clone the bngal-cli GitHub repository into your directory of choice (my-directory) and run the setup script in a bash or zsh shell session. This will install the bngal R package within a new conda environment called "bngal":

cd my-directory
git clone https://github.com/mselensky/bngal-cli
cd bngal-cli
bash bngal-setup.sh

And that's it! Sit tight and grab a coffee while bngal-cli installs. It may take a few minutes.

Once you successfully install and activate the bngal environment, you can remove the bngal-cli folder. When the bngal environment is active, you will have access to two bngal functions:

Function	Application
`bngal-build-nets`	Build network model(s) according to defined cutoffs
`bngal-summarize-nets`	Summarize and visualize network statistics from `bngal-build-nets`
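
To confirm that the environment is active and both commands are reachable, a quick check along these lines may help. This is a minimal sketch: the helper name `check_bngal_cli` is made up for illustration and is not part of bngal-cli; it only looks for the two command names listed in the table above on the current PATH.

```shell
# Hypothetical helper (not shipped with bngal-cli): report whether each
# expected command is on the current PATH. Returns non-zero if any is missing.
check_bngal_cli() {
  local missing=0 cmd
  for cmd in bngal-build-nets bngal-summarize-nets; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd (is the bngal environment active?)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After defining the function, it could be run as `conda activate bngal && check_bngal_cli`.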

R package only

If you only want to use the bngal R package interactively, you can install it and its dependencies within an active R session via:

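# Install pacman (used below to install and load packages) if it is missing
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman", repos = "https://cran.r-project.org/")
# Install any missing dependencies, then load them all
# (tidyverse and dplyr are included here so they are installed before being loaded)
suppressPackageStartupMessages(
  pacman::p_load(tidyverse, dplyr, parallel, tidyr, plyr, Hmisc, RColorBrewer,
                 igraph, visNetwork, ggpubr, grid, gridExtra, plotly,
                 purrr, viridis)
)
# Install bngal from GitHub if it is missing
if (!require("bngal")) pacman::p_install_gh("mselensky/bngal")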

Example use case

bngal-build-nets creates co-occurrence networks at every level of taxonomic classification (phylum through ASV). Critically, the first column of the ASV/OTU table must be named "sample-id". One of the metadata file's columns must also be named "sample-id" (its position does not matter). Both files must be in CSV format and contain unique rows.
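
The input requirements above can be checked before a run with a few lines of shell. The following is only a sketch: the function names are hypothetical and not part of bngal, and it assumes simple CSV files in which neither the header nor the first column contains embedded commas.

```shell
# Hypothetical pre-flight checks (not part of bngal).

# Succeeds if the first header column of an ASV/OTU table is "sample-id".
asv_header_ok() {
  local first
  first=$(head -n 1 "$1" | cut -d, -f1 | tr -d '"\r')
  [ "$first" = "sample-id" ]
}

# Succeeds if any header column of a metadata CSV is "sample-id".
metadata_header_ok() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -qx 'sample-id'
}

# Prints identifiers that occur more than once in the first column
# (header excluded); no output means every identifier is unique.
duplicate_ids() {
  tail -n +2 "$1" | cut -d, -f1 | sort | uniq -d
}
```

For example, `asv_header_ok rarefied-asv-table.csv && metadata_header_ok sample_metadata.csv` would succeed only if both headers meet the naming requirement.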

There are only three required options for bngal-build-nets: --asv_table, a rarefied ASV/OTU table; --metadata, sample metadata corresponding to asv_table; and --output, a directory path that must already exist. By default, bngal only builds networks from pairwise associations that have at least 5 observations across the dataset and an absolute correlation coefficient of at least 0.6 (p <= 0.05). Users may adjust these and many other bngal parameters; run bngal-build-nets --help for more details:
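
To make the default cutoffs concrete, the sketch below applies the same three conditions to a toy table of pairwise statistics. It is an illustration of the filtering logic only, not bngal's implementation: the function name `filter_pairs` and the headerless column layout (`node1,node2,coefficient,p_value,n_obs`) are assumptions made for this example and do not describe any file bngal reads or writes.

```shell
# Illustrative only: keep rows of a headerless CSV laid out as
#   node1,node2,coefficient,p_value,n_obs   (hypothetical layout)
# that pass the cutoffs. Defaults mirror the values described above.
filter_pairs() {
  local file=$1 corr=${2:-0.6} p=${3:-0.05} obs=${4:-5}
  awk -F, -v corr="$corr" -v p="$p" -v obs="$obs" '
    {
      abs_r = ($3 < 0) ? -$3 : $3   # absolute correlation coefficient
      if (abs_r >= corr && $4 <= p && $5 >= obs) print
    }' "$file"
}
```

Calling `filter_pairs pairs.csv` uses the defaults, while `filter_pairs pairs.csv 0.6 0.05 3` relaxes the observation requirement in the same spirit as `--obs_threshold=3`.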

Usage: bngal-build-nets [options]

Options:
	-a ASV_TABLE, --asv_table=ASV_TABLE
		(Required) ASV count table named by Silva-138 L7 taxonomies. Ideally rarefied and filtered as necessary.
                        * First column must be named 'sample-id' and must contain unique identifiers.
                        * Must be an absolute abundance ASV table.

	-m METADATA, --metadata=METADATA
		(Required) Sample metadata corresponding to asv_table. Must be a .CSV file with sample identifiers in a column named `sample-id.`

	-o OUTPUT, --output=OUTPUT
		(Required) Output directory for network graphs and data.

	-c CORRELATION, --correlation=CORRELATION
		Metric for pairwise comparisons. Can be one of 'pearson' or 'spearman'.
                        * Default = spearman

	-r CORR_COLUMNS, --corr_columns=CORR_COLUMNS
		Metadata columns to include in pairwise correlation networks.
                        * Multiple columns may be provided given the following syntax: 'col1,col2'
                        * Default = NULL

	-k CORR_CUTOFF, --corr_cutoff=CORR_CUTOFF
		Absolute correlation coefficient cutoff for pairwise comparisons.
                        * Default = 0.6

	-p P_VALUE, --p_value=P_VALUE
		Maximum cutoff for p-values calculated from pairwise relationships.
                        * Default = 0.05

	-f ABUN_CUTOFF, --abun_cutoff=ABUN_CUTOFF
		Relative abundance cutoff for taxa (values 0-1 accepted). Anything lower than this value is removed before network construction.
                        * Default = 0

	-x CORES, --cores=CORES
		Number of CPUs to use. Can only parallelize on Mac or Linux OS. Currently, bngal can only run on multiple cores when sub_comm_col is provided.
                        * Default = 1

	-n SUBNETWORKS, --subnetworks=SUBNETWORKS
		Metadata column by which to split data in order to create separate networks.
                        * If not provided, bngal will create a single network from the input ASV table.
                        * Default = NULL

	-t TRANSFORMATION, --transformation=TRANSFORMATION
		Numeric transformation to apply to input data. Can be one of 'log10'.
                        * Default = NULL

	-d DIRECTION, --direction=DIRECTION
		Direction for --abun-cutoff. Can be one of 'greaterThan' or 'lessThan'.
                        * Default = 'greaterThan'

	-s SIGN, --sign=SIGN
		Type of pairwise relationship for network construction. Can be one of 'positive', 'negative', or 'all'.
                        * Default = 'all'

	-b OBS_THRESHOLD, --obs_threshold=OBS_THRESHOLD
		('Observational threshold') Minimum number of unique observations required for a given pairwise relationship to be included in the network.
                        * Default = 5

	-g GRAPH_LAYOUT, --graph_layout=GRAPH_LAYOUT
		Type of igraph layout for output network plots.
                        * Refer to the igraph documentation for the full list of options: https://igraph.org/r/html/latest/layout_.html
                        * Default = 'layout_nicely'

	-h, --help
		Show this help message and exit

The simplest use case is to create a global network of the entire input ASV table without including any metadata variables:

conda activate bngal

cd data-directory
OUT_DR=output-directory
mkdir -p $OUT_DR # output directory must exist

bngal-build-nets \
  --asv_table="rarefied-asv-table.csv" \
  --metadata="sample_metadata.csv" \
  --output=$OUT_DR

An example output from this command is below:

If you want to split your input data into separate networks based on the metadata column "region", run them in parallel across 4 CPUs, lower the observational threshold so that a pairwise association needs only 3 observations in the dataset, and include 5 numeric metadata variables in the network ("metacol[1-5]"), the command would be:

conda activate bngal

cd data-directory
OUT_DR=output-directory
mkdir -p $OUT_DR # output directory must exist

bngal-build-nets \
  --asv_table="rarefied-asv-table.csv" \
  --metadata="sample_metadata.csv" \
  --output=$OUT_DR \
  --obs_threshold=3 \
  --subnetworks="region" \
  --cores=4 \
  --corr_columns='metacol1,metacol2,metacol3,metacol4,metacol5'
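
When many metadata columns are involved, typing the comma-separated value for --corr_columns by hand is error-prone. One possible approach is to join the names in the shell first. The helper `join_by_comma` below is a hypothetical convenience function, not something bngal-cli provides.

```shell
# Hypothetical helper: join all arguments with commas, e.g. a b c -> a,b,c
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Example: build the value used in the command above.
CORR_COLS=$(join_by_comma metacol1 metacol2 metacol3 metacol4 metacol5)

# The result could then be passed along as:
#   bngal-build-nets ... --corr_columns="$CORR_COLS"
```

Setting `IFS` locally makes `"$*"` expand with commas between arguments without changing word splitting elsewhere in the session.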

This creates interactive network plots via plotly for each unique value within the metadata column "region". Taxa nodes are represented as filled circles, while metadata variables are squares. The width of each edge corresponds to the strength of the correlation coefficient, and the color indicates its sign (red=negative, blue=positive). As an example, genus-level associations from the "region 6" group can be visualized by their edge betweenness clusters:

They can also be selected and colored by phylum:

Plots can also be colored by "functional grouping" from a curated list of family-level functions defined in the literature. Note: be very careful with any conclusions you might draw from this! Remember that phylogeny != function. Functional categories are based on the nearest cultured relative. When multiple major biogeochemical functions are represented within a given family, the grouping is marked as "multiple".

bngal automatically produces these three plots for each unique value in the "region" column at every level of taxonomic classification. The plots are saved in a subfolder named after the value of the "--graph_layout" option (layout_nicely by default). The underlying data for each network are saved to the network-data subfolder, grouped by taxonomic level, for downstream functions (coming soon!). Pairwise correlation statistics for each constructed network are exported to the pairwise-summaries subfolder.
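
Because one set of plots is produced per unique value in the splitting column, it can be useful to preview those values and their sample counts before a run. The sketch below does this from the metadata file alone. The function name `subnetwork_groups` is hypothetical, the code assumes a simple CSV with an unquoted header and no embedded commas, and it says nothing about how bngal itself treats small groups.

```shell
# Hypothetical helper: print "value,count" for each unique value of a
# named column in a simple CSV, sorted by value.
subnetwork_groups() {
  local file=$1 col=$2
  tr -d '\r' < "$file" | awk -F, -v col="$col" '
    NR == 1 {
      # Locate the requested column in the header row
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    { n[$idx]++ }                          # tally samples per value
    END { for (v in n) print v "," n[v] }
  ' | sort
}
```

For the example above, `subnetwork_groups sample_metadata.csv region` would list each region alongside the number of samples assigned to it.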
