Name		Name	Last commit message	Last commit date
parent directory ..
exe		exe
includes		includes
input		input
obj		obj
output		output
src		src
README.md		README.md
README.txt		README.txt
checker.py		checker.py
clustering_config.cfg		clustering_config.cfg
clustering_config_w.cfg		clustering_config_w.cfg
makefile		makefile
requirements.txt		requirements.txt

README.md

Clustering

This directory hosts implementations of two distributed clustering algorithms, each designed to cluster sequenced DNA data efficiently.

Q-Gram Clustering

Q-Gram clustering is a distributed clustering algorithm for DNA storage reads proposed in [1] that minimizes the number of edit distance calculations by pre-calculating q-gram signatures for each read and then using Hamming distance comparisons between them The implementation follows the paper [1], please refer to the paper for more details on the implementation.

W-Gram Clustering

In w-gram clustering, we compute w-gram signatures instead of pre-calculating q-gram signatures. While a q-gram signature signals the existence or non-existence of a set of random substrings within a string, a w-gram signature conveys the position of the specified substrings within the string. We use the L1 norm to measure the dissimilarity between the w-gram signatures. Apart from these two modifications, the algorithm retains its structure from the baseline.

Usage

The algorithm being used can be picked in makefile and the clustering configurations can be set in clustering_config.cfg. Once they have been set, just start clustering using

make run

References

[1] Rashtchian, C., Makarychev, K., Racz, M., Ang, S., Jevdjic, D., Yekhanin, S., Ceze, L. and Strauss, K., (2017). Clustering billions of reads for DNA data storage. Advances in Neural Information Processing Systems, 30.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3-clustering

3-clustering

README.md

Clustering

Table of Contents

Q-Gram Clustering

W-Gram Clustering

Usage

References

Files

3-clustering

Directory actions

More options

Directory actions

More options

Latest commit

History

3-clustering

Folders and files

parent directory

README.md

Clustering

Table of Contents

Q-Gram Clustering

W-Gram Clustering

Usage

References