The Secure Bioinformatics Reuse effort is a subproject of the Secure Code Reuse Project focusing on, well, bioinformatics.
The goal of the subproject is to assess security vulnerabilities in open source bioinformatics software packages.
Our approach is informed by the work of Duan, R. et al. "Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages". arXiv:2002.01139 [cs] (2020) which involved analysis of metadata, Abstract Syntax Trees (ASTs), dataflow, and dynamic execution of JavaScript, Python, and Ruby from their respective package managers.
In this subproject we
- Implement a simple metadata search of Bioconda recipes or BioContainers Dockerfiles
- Trace Bioconda package installs and BioContainers Dockerfile builds
- Trace execution of nf-core Nextflow pipelines
- Scan recent bioinformatics Python repositories using Aura, a Python tool developed by Martin Čarnogurský to analyze the AST of Python code
Bioconda is a channel for the conda package manager specializing in bioinformatics software. Each package added to Bioconda also has a corresponding Docker BioContainer automatically created and uploaded to Quay.io. nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow, software which enables scalable and reproducible scientific workflows using software containers.
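For example, a package released through Bioconda can be installed with conda, and the corresponding container image is published under the quay.io/biocontainers organization on Quay.io (the package name below is purely illustrative):
# Install a Bioconda package; samtools is used only as an illustration
conda install -c bioconda samtools
# The matching BioContainer image is published as quay.io/biocontainers/samtools:<version>--<build>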
We seek to answer the following questions:
- What Bioconda and BioContainer metadata exists? Is the data useful? If so, how can the data be used, and what does the data show?
- Can useful dynamic analysis be performed during conda installs and docker builds? If so, how can this dynamic analysis be done, and what does the analysis show?
- Can useful dynamic analysis be performed during nextflow pipeline runs? If so, how can this dynamic analysis be done, and what does the analysis show?
- What security vulnerabilities does Aura identify in recent bioinformatics repositories? Do the characteristics of vulnerabilities in bioinformatics Python packages differ from those of Python packages in general?
The code in this repository requires Conda, Docker, Nextflow, nf-core, Python (and associated requirements), and the AWS command line interface (for configuration). Dask, an open source library for parallel computing written in Python, is used to distribute processing over a cluster. As a result, after creating an instance with all requirements installed, an Amazon Machine Image is created from it and used to launch the instances of the cluster. The following sections describe installation details.
All images are based on Ubuntu 20.04 LTS.
Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux.
See: https://conda.io/projects/conda/en/latest/user-guide/install/linux.html
diff Miniconda3-latest-Linux-x86_64.sh.sha256sum.actual Miniconda3-latest-Linux-x86_64.sh.sha256sum.expected
bash Miniconda3-latest-Linux-x86_64.sh
conda config --set auto_activate_base false
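The checksum files compared above are not produced by the installer; one way to create them, assuming the expected hash is copied from the Miniconda download page into the .expected file using the same "hash  filename" format, is:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sha256sum Miniconda3-latest-Linux-x86_64.sh > Miniconda3-latest-Linux-x86_64.sh.sha256sum.actual
# Copy the published SHA256 hash for this installer into Miniconda3-latest-Linux-x86_64.sh.sha256sum.expected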
Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
See: https://docs.docker.com/engine/install/ubuntu/, and https://docs.docker.com/engine/install/linux-postinstall/
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
| sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo docker run hello-world
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
Nextflow is a reactive workflow framework and a programming DSL that eases the writing of data-intensive computational pipelines.
curl -s https://get.nextflow.io | bash
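The installer drops a nextflow launcher in the current directory; moving it onto the PATH and running the built-in hello pipeline is a quick sanity check (the install location is a convenience, not a requirement):
chmod +x nextflow
sudo mv nextflow /usr/local/bin/
nextflow run hello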
nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow.
See: https://nf-co.re/
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create --name nf-core python=3.7 nf-core nextflow
conda activate nf-core
nf-core list --json
Python is an interpreted high-level general-purpose programming language.
sudo apt-get install python3-venv
python3 -m venv .venv  # the environment name and location are a choice, not a requirement
source .venv/bin/activate
pip install -r requirements.txt
GreyNoise tells security analysts what not to worry about. They collect, analyze and label data on IPs that saturate security tools with noise. This unique perspective helps analysts confidently ignore irrelevant or harmless activity, creating more time to uncover and investigate true threats.
See: https://www.greynoise.io/, https://developer.greynoise.io/reference/community-api, and https://developer.greynoise.io/docs/libraries-sample-code
greynoise setup -k UserAPIKey
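After registering the API key, a single IP address can be looked up from the command line, for example (the address is arbitrary):
greynoise ip 8.8.8.8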
Amazon Web Services (AWS) is a subsidiary of Amazon providing on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services.
See: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
sudo apt install awscli
aws configure
An alternative to strace. Not currently used.
See: http://manpages.ubuntu.com/manpages/focal/man1/dtrace.1.html
sudo apt install systemtap-sdt-dev
An alternative to strace that produces JSON output. Not currently used.
See: https://github.com/draios/sysdig/wiki/How-to-Install-Sysdig-for-Linux
sudo apt install sysdig
The one true editor. For the old school, or old school at heart.
See: https://www.gnu.org/software/emacs/
sudo snap install emacs --classic
The simple tools in this repository are written in bash and distributed using Python. This section contains a summary of usage.
Use Aura to scan a Python path or Git repository.
aura-scan [-R] [-H target-host] [-P] python-src scan-home
Uses Aura to scan a Python source, either a path or Git repository, and produce JSON output in the scan home directory. The Python path can be to an individual Python file, or to a directory containing Python files.
Optionally recursively copy the output directory to the target host, or purge the output directory.
-R Recursively copy the output directory to the target host
-H Set the target host IP address, default: 52.207.108.184
-P Purge output directory
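For example, to scan a Git repository and write the JSON results into a local scan home (both arguments below are placeholders chosen for illustration):
aura-scan https://github.com/nf-core/tools.git ~/aura-scans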
Trace a conda install of a package.
strace-conda-install [-c channel] [-s suffix] [-C] [-R] [-H target-host] [-P] package
Uses strace to trace the installation of a package from a channel using conda.
A directory is created to contain all output files, and each uses a base name given by "strace-conda-install-<package>-<suffix>".
Optionally recursively copy the output directory to the target host, or purge the output directory.
-c The conda channel containing the package, default: bioconda
-s The suffix of the base name for the output directory and
files, default: ""
-C Clean conda environment
-R Recursively copy the output directory to the target host
-H Set the target host IP address, default: 52.207.108.184
-P Purge output directory
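For example, to trace the install of a single Bioconda package and clean up the conda environment afterwards (the package name is illustrative):
strace-conda-install -c bioconda -C samtools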
Trace a docker build.
strace-docker-build [-s suffix] [-C] [-R] [-H target-host] [-P] build-directory package version
Uses strace to trace the build of the docker file in the build directory with tag "ralatsdio/<package>:<version>".
A directory is created to contain all output files, and each uses a base name given by "strace-docker-build-<package>-<version>".
Optionally recursively copy the output directory to the target host, or purge the output directory.
-s The suffix of the base name for the output directory and
files, default: ""
-C Clean up new Docker images
-R Recursively copy the output directory to the target host
-H Set the target host IP address, default: 52.207.108.184
-P Purge output directory
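For example, assuming a build directory that contains the Dockerfile for a package (the directory, package name, and version below are illustrative):
strace-docker-build -C ~/biocontainers/samtools samtools 1.12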
Trace a run of an nf-core pipeline.
strace-pipeline-run [-R] [-H target-host] [-P] pipeline
Uses strace to trace the nextflow run of an nf-core pipeline.
Optionally recursively copy the output directory to the target host, or purge the output directory.
-R Recursively copy the output directory to the target host
-H Set the target host IP address, default: 52.207.108.184
-P Purge output directory
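For example, to trace a run of one of the published nf-core pipelines (passing the short pipeline name is assumed here; rnaseq is illustrative):
strace-pipeline-run rnaseq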
Run functions on a cluster.
distribute.py [-h] [-C TARGET_COUNT] [-T INSTANCE_TYPE] [-s] [-t] [-R MAX_RUNS] [-F] [-a | -c | -d | -p]
Runs bash scripts as subprocess functions on a Dask cluster.
-h, --help show this help message and exit
-C TARGET_COUNT, --target-count TARGET_COUNT
target count of machines in cluster
-T INSTANCE_TYPE, --instance-type INSTANCE_TYPE
instance type for machines in cluster
-s, --start-pool start instances in cluster
-t, --terminate-pool terminate instances in cluster
-R MAX_RUNS, --max-runs MAX_RUNS
maximum number of runs
-F, --run-function run function locally for testing
-a, --aura-scan run Aura scans
-c, --strace-conda-install
trace conda installs
-d, --strace-docker-build
trace docker builds
-p, --strace-pipeline-run
trace pipeline runs
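A typical session might start a pool, distribute one of the trace functions across it, and then terminate the pool (the target count, instance type, and run limit below are illustrative):
python distribute.py --target-count 4 --instance-type t3.medium --start-pool
python distribute.py --strace-conda-install --max-runs 100
python distribute.py --terminate-pool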
Directions for contributing to Bioconda are collected here so that simple exploits can be attempted.
- Create a Fork of the IQT Labs Bioconda recipes repository
git@github.com:IQTLabs/bioconda-recipes.git
- Create a local “clone”
git clone git@github.com:IQTLabs/bioconda-recipes.git
cd bioconda-recipes
git remote add upstream https://github.com/bioconda/bioconda-recipes.git
- Create a branch
# Make sure our master is up to date with Bioconda
git checkout master
git pull upstream master
git push origin master
# Create and checkout a new branch for our work
git checkout -b ralatsdc/recipe-for-apc
git push -u origin ralatsdc/recipe-for-apc
- Make some edits
- Test recipes locally using the CircleCI client
You can execute an almost exact copy of the Bioconda Linux build pipeline by installing the CircleCI client locally and running it from the folder where your copy of bioconda-recipes resides:
# Ensure the build container is up-to-date
docker pull quay.io/bioconda/bioconda-utils-build-env:latest
# Run the build locally
circleci build
You can use Docker volume bind-mounts to capture the local package channel with the newly built packages:
mkdir -p conda-bld
rm -rf conda-bld/*
circleci build --volume $PWD/conda-bld:/opt/conda/conda-bld
After a successful build, you can then install from the local channel by providing the path to it:
conda install -c file://$PWD/conda-bld your-package
For the apc package, for example, after modifying any of the files in recipes/apc, run these commands:
conda activate apc
cd ~/bioconda_recipes
mkdir -p conda-bld
conda remove apc
conda clean --all
sudo rm -rf conda-bld/*
circleci build --volume $PWD/conda-bld:/opt/conda/conda-bld
conda install -c file://$PWD/conda-bld apc
Note that a hash, preferably sha256, is required to verify the integrity of the source package. Generate the hash using, for example:
wget -O- https://github.com/ralatsdc/apc/archive/refs/tags/v0.1.2.tar.gz | shasum -a 256
- Push changes
- Create a pull request
Once you have opened a PR, the Bioconda build system will start testing your changes. The recipes you have added or modified will be linted and built. Unless you are very lucky, you will encounter some errors during the build that you will have to fix. Repeat the "Make some edits" and "Push changes" steps as often as needed.
Eventually, your build will “turn green”. If you are a member of Bioconda, you can now add the please review & merge label to submit your PR for review. Otherwise, just ask on Gitter or ping @bioconda/core.
Once your changes have been approved, they will be “merged” into the Bioconda main repository and the altered packages uploaded to the Bioconda channel.
- Delete your branch
- Install your package
See: https://conda.io/projects/conda/en/latest/index.html and https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
See: https://man7.org/linux/man-pages/man1/strace.1.html