GitHub - marcelm/dnaio: Efficiently read and write sequencing data from Python

dnaio processes FASTQ, FASTA and uBAM files

dnaio is a Python 3.9+ library for fast parsing and writing of FASTQ and FASTA files. Since version 1.1.0, it can also parse uBAM (unaligned BAM) files efficiently, which allows reading ONT (Oxford Nanopore Technologies) files from the dorado basecaller directly.

The code was previously part of the Cutadapt tool and has been improved significantly since it was split out.

Example usage

The main interface is the dnaio.open function:

import dnaio

# Compression (.gz here) is detected automatically.
with dnaio.open("reads.fastq.gz") as f:
    bp = 0
    for record in f:
        bp += len(record)  # length of the record's sequence
print(f"The input file contains {bp / 1e6:.1f} Mbp")

For more, see the tutorial and API documentation.

Installation

Using pip:

pip install dnaio zstandard

zstandard can be omitted if support for Zstandard (.zst) files is not required.

Features and supported file types

FASTQ input and output
FASTA input and output
uBAM (unaligned BAM) input
Compressed input and output (.gz, .bz2, .xz and .zst are detected automatically)
Paired-end data in two files
Interleaved paired-end data in a single file
Files with DOS/Windows linebreaks can be read
FASTQ files with a second header line (after the +) are supported
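To illustrate what automatic detection of compressed input can look like, here is a minimal sketch that inspects the leading "magic" bytes of a file. It is an assumption for illustration only, not dnaio's actual implementation; the names `MAGIC`, `detect_compression` and `open_maybe_compressed` are hypothetical.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading "magic" bytes that identify each compression format.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zst",
}


def detect_compression(header: bytes) -> Optional[str]:
    """Return 'gz', 'bz2', 'xz' or 'zst', or None if no magic bytes match."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


def open_maybe_compressed(path: str):
    """Open a file for binary reading, decompressing it if necessary."""
    with open(path, "rb") as f:
        kind = detect_compression(f.read(6))
    if kind == "gz":
        return gzip.open(path, "rb")
    if kind == "bz2":
        return bz2.open(path, "rb")
    if kind == "xz":
        return lzma.open(path, "rb")
    if kind == "zst":
        # Zstandard is not in the standard library of older Pythons.
        raise NotImplementedError("requires the optional zstandard package")
    return open(path, "rb")
```

Checking the file content rather than only the file name means a misnamed or extension-less file is still read correctly.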

Limitations

Multi-line FASTQ files are not supported
FASTQ and uBAM parsing is the focus of this library. The FASTA parser is not as optimized.
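The multi-line limitation is easier to see with a concrete parser. The following sketch assumes every FASTQ record occupies exactly four lines, so a record whose sequence is wrapped over several lines is rejected. It is a simplified illustration written for this README, not dnaio's parser; `Record` and `parse_fastq` are hypothetical names, and the rule that a repeated header after the `+` must match the first header is an assumption of the sketch.

```python
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    name: str
    sequence: str
    qualities: str


def parse_fastq(text: str) -> Iterator[Record]:
    """Parse FASTQ text in which every record is exactly four lines long."""
    # splitlines() accepts both "\n" and "\r\n" line endings.
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("number of lines is not a multiple of four")
    for i in range(0, len(lines), 4):
        header, sequence, plus, qualities = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1}: expected '@'")
        if not plus.startswith("+"):
            raise ValueError(f"line {i + 3}: expected '+'")
        name = header[1:]
        # Assumption: an optional second header must repeat the first one.
        if len(plus) > 1 and plus[1:] != name:
            raise ValueError(f"line {i + 3}: second header does not match")
        if len(sequence) != len(qualities):
            raise ValueError(f"line {i + 4}: sequence/quality length mismatch")
        yield Record(name, sequence, qualities)
```

Fixing the record size at four lines keeps the parsing loop simple, at the cost of rejecting wrapped sequences.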

Latest commit: Changelog v1.2.3 (marcelm, Nov 12, 2024, 213563e), 793 commits
