Update README.md
maxgmarin authored Mar 25, 2024
1 parent 41b6de1 commit db2dce6
Showing 1 changed file with 10 additions and 6 deletions.
16 changes: 10 additions & 6 deletions README.md
@@ -32,7 +32,7 @@ A **pan**-genome **q**uality **c**ontrol toolkit for evaluating nucleotide redun
## Motivation
![PanQC_NRC_Diagram](https://github.com/maxgmarin/panqc/raw/main/Images/PanQC_NRC_Diagram.png)

The Nucleotide Redundancy Correction (NRC) pipeline adjusts for redundancy at the DNA level in two steps (Methods). In step one, all genes predicted to be absent at the amino acid (AA) level are compared to their corresponding assembly at the nucleotide level. When the nucleotide sequence is found with high coverage and sequence identity (query coverage and sequence identity > 90%), the gene is marked as “present at the DNA level”. In step two, all genes are clustered and merged using a k-mer-based metric of nucleotide similarity: two or more genes that are divergent at the AA level but highly similar at the nucleotide level are merged into a single “nucleotide similarity gene cluster”. Finally, the pan-genome gene presence matrix is readjusted according to these results.
The panqc Nucleotide Redundancy Correction (NRC) pipeline adjusts for redundancy at the DNA level within pan-genome estimates in two steps. In step one, all genes predicted to be absent at the amino acid (AA) level are compared to their corresponding assembly at the nucleotide level. When the nucleotide sequence is found with high coverage and sequence identity (query coverage and sequence identity > 90%), the gene is marked as “present at the DNA level”. In step two, all genes are clustered and merged using a k-mer-based metric of nucleotide similarity: two or more genes that are divergent at the AA level but highly similar at the nucleotide level are merged into a single “nucleotide similarity gene cluster”. Finally, the pan-genome gene presence matrix is readjusted according to these results.
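
The two steps described above can be illustrated with a minimal Python sketch. This is not panqc's implementation: the function names, the use of Jaccard similarity over k-mer sets, the single-linkage merging, and the `k=31` and `min_similarity=0.8` defaults are all assumptions chosen for illustration. Only the > 90% coverage and identity thresholds come from the description above.

```python
from itertools import combinations

# Thresholds from the README text: both values must exceed 90%.
MIN_COVERAGE = 0.90
MIN_IDENTITY = 0.90


def present_at_dna_level(query_coverage, seq_identity):
    """Step one: decide whether a gene absent at the AA level is present in the DNA."""
    return query_coverage > MIN_COVERAGE and seq_identity > MIN_IDENTITY


def kmer_set(seq, k):
    """Return the set of all k-mers in a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (an assumed similarity metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_genes(sequences, k=31, min_similarity=0.8):
    """Step two: single-linkage clustering of genes by k-mer similarity.

    `sequences` maps gene name -> nucleotide sequence.
    Returns a sorted list of clusters, each a sorted list of gene names.
    """
    parent = {name: name for name in sequences}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmer_set(seq, k) for name, seq in sequences.items()}
    for a, b in combinations(sequences, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


def merge_presence(presence, clusters):
    """Readjust the presence matrix: a cluster is present in a genome
    if any of its member genes is present there."""
    merged = {}
    for members in clusters:
        n_genomes = len(presence[members[0]])
        merged["|".join(members)] = [
            int(any(presence[m][i] for m in members)) for i in range(n_genomes)
        ]
    return merged
```

In this sketch, two genes annotated separately in different genomes but sharing nearly all of their k-mers collapse into one row of the presence matrix, which is the kind of readjustment the paragraph above describes.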

<!---
**When to use this software**:
@@ -95,14 +95,14 @@ NOTE: Make sure that your current working directory (CWD) is `tests/data` within
## Full usage

`panqc` has 2 sub-commands:
- `nrc` - Run the full **N**ucleotide **R**edundancy **C**orrection pipeline on a pan-genome analysis.
- `utils` - Run utility scripts and sub-pipelines of the full NRC pipeline
- `nrc` - Run the full panqc **N**ucleotide **R**edundancy **C**orrection pipeline on a pan-genome analysis.
- `utils` - Run utility scripts and sub-pipelines of the full panqc NRC pipeline

---

### `panqc nrc`

Run the complete Nucleotide Redundancy Correction pipeline
Run the complete panqc Nucleotide Redundancy Correction (NRC) pipeline

```
$ panqc nrc --help
@@ -143,7 +143,7 @@ optional arguments:

### `panqc utils`

Within `utils` there are 3 sub-commands that run specific components of the NRC pipeline:
Within `utils` there are 3 sub-commands that run specific components of the panqc NRC pipeline:
- `utils asmseqcheck` - Perform alignment of all genes classified as absent to their respective assemblies.
- `utils ava` - Perform all vs all comparison of k-mer profiles of input sequences.
- `utils nscluster` - Perform nucleotide similarity clustering and readjust pan-genome estimates.
@@ -165,14 +165,18 @@ optional arguments:
```

>🚧 Check back soon for full usage for each of the utility sub-pipelines of the NRC pipeline 🚧
>🚧 Check back soon for full usage for each of the utility sub-pipelines of the panqc toolkit 🚧

## Contributing and Issues
>🚧 Check back soon 🚧


## Citing
>🚧 Check back soon 🚧

<!---
If you use `panqc` in your work, please cite:
> TBD