
OTU filtering

Yoann Dufresne edited this page Nov 29, 2017 · 3 revisions

Basic filter for an OTU matrix. The aim of this module is to remove the OTUs that contain fewer reads than a threshold given as a parameter.

Module interactions

Main inputs and outputs

  • Minimum read by OTU: The minimum number of reads an OTU must contain to be kept. With a threshold of 5, every OTU containing 4 reads or fewer is discarded.
  • Input OTU matrix: A TSV file where each line is a cluster and each column a sample. The first line is used as the header and the first column as the cluster id. Here is an example of such a matrix:
OTU  sample_1  sample_2  sample_3
0    0         456       124
1    1         0         3
2    12        7         59
  • Filtered OTU matrix: The input OTU matrix without the lines corresponding to the OTUs under the threshold. Here is the previous example filtered with a threshold of 5:
OTU  sample_1  sample_2  sample_3
0    0         456       124
2    12        7         59
  • Representative sequences: A FASTA file containing one sequence per OTU. If the headers contain a cluster annotation such as ;cluster=2;, the software takes it into account. If the annotation is absent, the first sequence is considered the representative sequence of OTU 0, the second of OTU 1, and so on. This input can remain empty.
  • Filtered representative sequences: The input FASTA file without the representative sequences of the OTUs removed from the matrix. The cluster annotation is written to the output headers.
  • Clusterized reads: A FASTA file containing all the reads annotated with their cluster assignment (i.e. with an annotation such as ;cluster=124; in the headers). This input can remain empty.
  • Filtered clusterized reads: The same FASTA file as the clusterized reads input, without the reads belonging to the filtered-out clusters.
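
The filtering described above can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the function names `filter_otu_matrix` and `filter_fasta` are hypothetical, and the sketch assumes that the threshold is compared with the total number of reads of an OTU summed over all samples (which is consistent with the example, where OTU 1 holds 1 + 0 + 3 = 4 reads). It also assumes that, when a header has no cluster annotation, the annotation is simply appended to the end of the header.

```python
import csv
import io
import re

# Cluster annotation as described on this page, e.g. ";cluster=124;"
CLUSTER_RE = re.compile(r";cluster=(\d+);")


def filter_otu_matrix(tsv_text, min_reads):
    """Return (filtered TSV text, set of kept cluster ids).

    Assumption: an OTU is kept when its reads, summed over all
    samples, reach at least `min_reads`.
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    kept_rows = [header]
    kept_ids = set()
    for row in body:
        # First column is the cluster id, the others are per-sample counts.
        if sum(int(count) for count in row[1:]) >= min_reads:
            kept_rows.append(row)
            kept_ids.add(row[0])
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept_rows)
    return out.getvalue(), kept_ids


def filter_fasta(fasta_text, kept_ids):
    """Keep only the FASTA records whose cluster id is in `kept_ids`.

    Headers without an annotation fall back to their position in the
    file (first record = OTU 0), and the annotation is appended so that
    it is always present in the output (assumed header layout).
    """
    out_lines = []
    keep = False
    position = -1
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            position += 1
            match = CLUSTER_RE.search(line)
            if match:
                cluster = match.group(1)
            else:
                cluster = str(position)
                line = f"{line};cluster={cluster};"
            keep = cluster in kept_ids
        if keep:
            out_lines.append(line)
    return "".join(line + "\n" for line in out_lines)


# The example matrix from this page, filtered with a threshold of 5.
EXAMPLE_MATRIX = (
    "OTU\tsample_1\tsample_2\tsample_3\n"
    "0\t0\t456\t124\n"
    "1\t1\t0\t3\n"
    "2\t12\t7\t59\n"
)
filtered_matrix, kept = filter_otu_matrix(EXAMPLE_MATRIX, 5)
print(filtered_matrix, end="")
```

Passing the returned set of kept ids to `filter_fasta` applies the same selection to the representative sequences or to the clusterized reads, so the three outputs stay consistent with each other.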

