Prediction/source tracking of metagenomic samples using machine learning

maxibor/sourcepredict

Prediction/classification of the origin of a metagenomics sample.

Installation

$ conda install -c etetoolkit -c bioconda -c maxibor sourcepredict

Example

$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sample.csv -O dog_test_sample.csv
$ sourcepredict -t 6 dog_test_sample.csv
Step 1: Checking for unknown proportion
  == Sample: ERR1915662 ==
	Adding unknown
	Normalizing (GMPR)
	Computing Bray-Curtis distance
	Performing MDS embedding in 2 dimensions
	KNN machine learning
	Training KNN classifier on 6 cores...
	-> Testing Accuracy: 1.0
	----------------------
	- Sample: ERR1915662
		 known:98.61%
		 unknown:1.39%
  == Sample: ERR1915662_copy ==
	Adding unknown
	Normalizing (GMPR)
	Computing Bray-Curtis distance
	Performing MDS embedding in 2 dimensions
	KNN machine learning
	Training KNN classifier on 6 cores...
	-> Testing Accuracy: 1.0
	----------------------
	- Sample: ERR1915662_copy
		 known:98.61%
		 unknown:1.39%
Step 2: Checking for source proportion
	Computing weighted_unifrac distance on species rank
	TSNE embedding in 2 dimensions
	KNN machine learning
	Performing 5 fold cross validation on 6 cores...
	Trained KNN classifier with 10 neighbors
	-> Testing Accuracy: 0.99
	----------------------
	- Sample: ERR1915662
		 Canis_familiaris:96.14%
		 Homo_sapiens:2.44%
		 Soil:1.42%
	- Sample: ERR1915662_copy
		 Canis_familiaris:96.14%
		 Homo_sapiens:2.44%
		 Soil:1.42%
Sourcepredict result written to dog_test_sample.sourcepredict.csv
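
The log above mentions a Bray-Curtis distance and a KNN classifier that reports per-source proportions. As a rough illustration of those two ideas only, here is a minimal pure-Python sketch. It is not Sourcepredict's implementation: it skips the GMPR normalization and the MDS/t-SNE embedding steps, applies the nearest-neighbour vote directly to the distances, and uses made-up taxon counts. The function names `bray_curtis` and `knn_proportions` are hypothetical.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def knn_proportions(sample, sources, labels, k=3):
    """Share of each label among the k sources closest to the sample."""
    dists = sorted(
        (bray_curtis(sample, src), lab) for src, lab in zip(sources, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return {lab: n / k for lab, n in votes.items()}


# Hypothetical taxon counts (three taxa per sample), for illustration only.
sources = [
    [90, 5, 5],
    [80, 10, 10],
    [10, 85, 5],
    [5, 90, 5],
    [5, 5, 90],
    [10, 5, 85],
]
labels = [
    "Canis_familiaris",
    "Canis_familiaris",
    "Homo_sapiens",
    "Homo_sapiens",
    "Soil",
    "Soil",
]

sample = [85, 10, 5]
proportions = knn_proportions(sample, sources, labels, k=3)
for label, share in sorted(proportions.items()):
    print(f"{label}: {share:.2%}")
```

With this toy data, the two closest sources are the dog samples, so most of the vote goes to `Canis_familiaris`. The real tool estimates proportions from an embedding with a cross-validated classifier, so its numbers are not comparable to this sketch.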

Documentation

The Sourcepredict documentation is available at sourcepredict.readthedocs.io

Sourcepredict source file

Environments included in the default source file

  • Homo sapiens gut microbiome
  • Canis familiaris gut microbiome
  • Soil microbiome

Updating the source file

To update the source file with new Kraken results, see the instructions in the dedicated Jupyter notebook.
