Lucas Czech edited this page Feb 18, 2025 · 63 revisions

gappa is a collection of commands for working with phylogenetic data. Its main focus is the evolutionary placement of short environmental sequences on a reference phylogenetic tree. See Phylogenetic Placement for an introduction describing a typical pipeline.

Many commands in gappa are implementations of our novel methods. It also offers some commands that are implemented in the excellent guppy tool as well. However, being written in C++, gappa is much faster and needs less memory for most of these tasks.

Command Line Interface

gappa is used via its command line interface, with subcommands for each task. The commands have the general structure:

gappa <module> <subcommand> <options>

The modules are simply a way of organizing the commands:

  • Module analyze: Analyze and compare different jplace files, that is, find differences and patterns between different samples.
  • Module edit: Edit, manipulate, and transform files in different formats.
  • Module examine: Examine, visualize, and tabulate information in files.
  • Module prepare: Prepare and generate data and files needed to run typical pipelines and analyses.
  • Module simulate: Randomly generate phylogenetic and placement data.
  • Module tools: Auxiliary commands of gappa.
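The command structure above can be sketched as a small Python helper for scripted pipelines. This is only an illustration, not part of gappa: the names `GAPPA_COMMANDS` and `build_command` are hypothetical, and the table simply mirrors the modules and subcommands listed on this page. No option names are assumed; any options are passed through unchanged.

```python
import shutil
import subprocess

# Hypothetical lookup table, copied from the module tables on this page.
GAPPA_COMMANDS = {
    "analyze": ["correlation", "dispersion", "edgepca", "imbalance-kmeans", "krd",
                "phylogenetic-kmeans", "placement-factorization", "squash"],
    "edit": ["accumulate", "extract", "filter", "merge", "multiplicity", "split"],
    "examine": ["assign", "edpl", "graft", "heat-tree", "info",
                "lwr-distribution", "lwr-histogram", "lwr-list"],
    "prepare": ["chunkify", "clean-tree", "phat", "taxonomy-tree", "unchunkify"],
    "simulate": ["random-alignment", "random-placements", "random-tree"],
    "tools": ["citation", "license", "version"],
}


def build_command(module, subcommand, *options):
    """Return the argument list for `gappa <module> <subcommand> <options>`."""
    if module not in GAPPA_COMMANDS:
        raise ValueError(f"unknown module: {module}")
    if subcommand not in GAPPA_COMMANDS[module]:
        raise ValueError(f"unknown subcommand for module {module}: {subcommand}")
    # Options are forwarded as given; this sketch does not validate them.
    return ["gappa", module, subcommand, *options]


if __name__ == "__main__":
    cmd = build_command("tools", "version")
    print(" ".join(cmd))
    # Only try to run the command if a gappa binary is found on the PATH.
    if shutil.which("gappa"):
        subprocess.run(cmd, check=False)
```

Checking the module and subcommand before building the argument list catches typos early, before a long-running pipeline step is started.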

Module analyze

Commands for analyzing and comparing placement data, that is, finding differences and patterns.

Subcommand Description
correlation Calculate the Edge Correlation of samples and metadata features.
dispersion Calculate the Edge Dispersion between samples.
edgepca Perform Edge PCA (Principal Component Analysis) for a set of samples.
imbalance-kmeans Run Imbalance k-means clustering on a set of samples.
krd Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples.
phylogenetic-kmeans Run Phylogenetic k-means clustering on a set of samples.
placement-factorization Perform Placement-Factorization on a set of samples.
squash Perform Squash Clustering for a set of samples.

Module edit

Commands for editing and manipulating files like jplace, fasta or newick.

Subcommand Description
accumulate Accumulate the masses of each query in jplace files into basal branches so that they exceed a given mass threshold.
extract Extract placements from clades of the tree and write per-clade jplace files.
filter Filter jplace files according to some criteria, that is, remove all queries and/or placement locations that do not pass the provided filter(s).
merge Merge jplace files by combining their pqueries into one file.
multiplicity Edit the multiplicities of queries in jplace files.
split Split the queries in jplace files into multiple files, for example, according to an OTU table.

Module examine

Commands for examining, visualizing, and tabulating information in placement data.

Subcommand Description
assign Taxonomically assign placed query sequences and output a tabulated summary.
edpl Calculate the Expected Distance between Placement Locations (EDPL) for all pqueries.
graft Make a tree with each of the query sequences represented as a pendant edge.
heat-tree Make a tree with edges colored according to the placement mass of the samples.
info Print basic information about placement files.
lwr-distribution Print a summary table that represents the distribution of the likelihood weight ratios (LWRs) of all pqueries.
lwr-histogram Print a table with histograms of the likelihood weight ratios (LWRs) of all pqueries.
lwr-list Print a list of all pqueries with their likelihood weight ratios (LWRs).

Module prepare

Commands for preparing and preprocessing phylogenetic and placement data.

Subcommand Description
chunkify Chunkify a set of fasta files and create abundance maps.
clean-tree Clean a tree in Newick format by removing parts that other parsers have difficulties with.
phat Generate consensus sequences from a sequence database according to the PhAT method.
taxonomy-tree Turn a taxonomy into a tree that can be used as a constraint for tree inference.
unchunkify Unchunkify a set of jplace files using abundance map files and create per-sample jplace files.

Module simulate

Commands for random generation of phylogenetic and placement data.

Subcommand Description
random-alignment Create a random alignment with a given number of sequences of a given length.
random-placements Create a set of random phylogenetic placements on a given reference tree.
random-tree Create a random tree with a given number of leaf nodes.

Module tools

Auxiliary commands of gappa.

Subcommand Description
citation Print references to be cited when using gappa.
license Show the license of gappa.
version Extended version information about gappa.