Translate gene #205

Merged Jun 10, 2024 (28 commits)
Changes from 19 commits
31e50f5
Add option to write proteins sequences from genes in pangenome
jpjarnoux Mar 28, 2024
e54565c
Split function to add check function
jpjarnoux Mar 28, 2024
035330e
Little refactoring
jpjarnoux Mar 28, 2024
788cd5c
Add translate arguments as kwargs
jpjarnoux Mar 28, 2024
9cb7b31
Replace cpu by threads to be more accurate
jpjarnoux Mar 28, 2024
12186a5
Write test and documentation for the new option
jpjarnoux Mar 28, 2024
7febe1b
Fix call new option for GitHub action
jpjarnoux Mar 29, 2024
ee283cb
Fix call new option for GitHub action
jpjarnoux Mar 29, 2024
922073d
Refactoring
jpjarnoux Apr 10, 2024
9aaac23
Make gene translation more flexible by cutting off the end of genes t…
jpjarnoux Apr 10, 2024
8c83f9c
Merge branch 'dev' into TranslateGene
jpjarnoux May 27, 2024
8129bfa
Fix unset variable
jpjarnoux May 27, 2024
dde184c
Merge branch 'dev' into TranslateGene
jpjarnoux Jun 3, 2024
02b201a
Replace translate_with_mmseqs by translate gene
jpjarnoux Jun 4, 2024
4cb53d1
Add a soft link in mmseqs createdb
jpjarnoux Jun 4, 2024
326403f
Replace add_spot_str function by Spot class method __str__
jpjarnoux Jun 4, 2024
7e09b19
Fix bug in projection and refactoring
jpjarnoux Jun 5, 2024
e09e5f7
Improve documentation on write fasta
jpjarnoux Jun 6, 2024
9c8fb76
Fix createdb mode
jpjarnoux Jun 7, 2024
56e4318
Improve documentation on write fasta
jpjarnoux Jun 7, 2024
13bd4e0
Add test unit for align
jpjarnoux Jun 7, 2024
b5b0e4d
Use context manager to write sequences
jpjarnoux Jun 7, 2024
1516f12
Improve GitHub workflow
jpjarnoux Jun 7, 2024
767092f
Merge branch 'dev' into TranslateGene
jpjarnoux Jun 7, 2024
e909a65
simplify unit test
JeanMainguy Jun 10, 2024
0e69f70
Solve requested change
jpjarnoux Jun 10, 2024
93a4b01
Make alignment with MMSeqs2 more flexible.
jpjarnoux Jun 10, 2024
f30ab34
clean old code
jpjarnoux Jun 10, 2024
3 changes: 2 additions & 1 deletion .github/workflows/main.yml
@@ -89,7 +89,7 @@ jobs:
ppanggolin rarefaction --output stepbystep -f -p stepbystep/pangenome.h5 --depth 5 --min 1 --max 50 -ms 10 -fd -ck 30 -K 3 --soft_core 0.9 -se $RANDOM
ppanggolin draw -p stepbystep/pangenome.h5 --tile_plot --nocloud --soft_core 0.92 --ucurve --output stepbystep -f
ppanggolin rgp -p stepbystep/pangenome.h5 --persistent_penalty 2 --variable_gain 1 --min_score 3 --dup_margin 0.05
ppanggolin spot -p stepbystep/pangenome.h5 --spot_graph --overlapping_match 2 --set_size 3 --exact_match_size 1
ppanggolin spot -p stepbystep/pangenome.h5 --output stepbystep --spot_graph --overlapping_match 2 --set_size 3 --exact_match_size 1 -f
ppanggolin draw -p stepbystep/pangenome.h5 --draw_spots -o stepbystep -f
ppanggolin module -p stepbystep/pangenome.h5 --transitive 4 --size 3 --jaccard 0.86 --dup_margin 0.05
ppanggolin write_pangenome -p stepbystep/pangenome.h5 --output stepbystep -f --soft_core 0.9 --dup_margin 0.06 --gexf --light_gexf --csv --Rtab --stats --partitions --compress --json --spots --regions --borders --families_tsv --cpu 1
@@ -100,6 +100,7 @@ jobs:
ppanggolin fasta -p stepbystep/pangenome.h5 --output stepbystep -f --prot_families module_0
ppanggolin fasta -p stepbystep/pangenome.h5 --output stepbystep -f --prot_families core
ppanggolin fasta -p stepbystep/pangenome.h5 --output stepbystep -f --gene_families module_0 --genes module_0
ppanggolin fasta -p stepbystep/pangenome.h5 --output stepbystep -f --proteins cloud --cpu $NUM_CPUS --keep_tmp

ppanggolin draw -p stepbystep/pangenome.h5 --draw_spots --spots all -o stepbystep -f
ppanggolin metrics -p stepbystep/pangenome.h5 --genome_fluidity --no_print_info --recompute_metrics --log metrics.log
61 changes: 51 additions & 10 deletions docs/user/writeFasta.md
@@ -18,7 +18,10 @@ When using the `softcore` filter, the `--soft_core` option can be used to modify

## Genes

This option can be used to write the nucleotide CDS sequences. It can be used as such, to write all of the genes of the pangenome for example:
### Nucleotide sequences

With the `--genes partition` option PPanGGOLiN will write the nucleotide CDS sequences for the given partition.
It can be used as such, to write all the genes of the pangenome for example:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES --genes all
@@ -30,34 +33,72 @@ Or to write only the persistent genes:
ppanggolin fasta -p pangenome.h5 --output MY_GENES --genes persistent
```

### Protein sequences

With the `--proteins partition` option PPanGGOLiN will write the protein sequences of the genes for the given partition.
It can be used as such, to write the protein sequences of all the genes of the pangenome for example:

## Protein families
```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES --proteins all
```

This option can be used to write the protein sequences of the representative sequences for each family. It can be used as such for all families:
Or to write only the cloud genes:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES --proteins cloud
```

To translate the gene sequences, PPanGGOLiN uses the [MMSeqs2](https://github.com/soedinglab/MMseqs2) `translatenucs` command.
You can therefore run this option on multiple threads with `--cpu`.
You can also choose the translation table with `--translate_table`.
Finally, the `--keep_tmp` option keeps the temporary directory, which contains the [MMSeqs2](https://github.com/soedinglab/MMseqs2) database; its location can be set with `--tmpdir`.
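
As a rough illustration of what translating a CDS with a given translation table involves, here is a minimal Python sketch. It is not PPanGGOLiN's implementation, which delegates the work to MMSeqs2 `translatenucs`. The function name `translate_cds`, the use of `X` for codons that cannot be resolved, and the trimming of trailing bases that do not form a complete codon are all assumptions made for this example. The codon assignments are those of NCBI translation table 11, which shares its amino acid assignments with the standard code.

```python
from itertools import product

# NCBI translation table 11 (same amino acid assignments as the standard code),
# listed in the conventional TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna: str) -> str:
    """Translate a nucleotide CDS into a protein string (hypothetical helper).

    Trailing bases that do not fill a whole codon are ignored, and codons
    containing characters other than A, C, G, T are translated as 'X'.
    Both behaviours are choices of this sketch, not documented behaviour.
    """
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable_length, 3)
    )


print(translate_cds("ATGGCCTAAGG"))  # prints "MA*"; the trailing "GG" is dropped
```

Stop codons are kept as `*` here so that the effect of the table is visible; a real pipeline may handle them differently.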

## Gene families

### Protein sequences

With the `--prot_families partition` option PPanGGOLiN will write the protein sequences of the representative gene for each family for the given partition.
It can be used as such for all families:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_PROT --prot_families all
```

or for all of the shell families for example:
Or for all the shell families for example:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_PROT --prot_families shell
```

### Nucleotide sequences

## Gene families

This option can be used to write the gene sequences of the representative sequences for each family. It can be used as such:
With the `--gene_families partition` option PPanGGOLiN will write the nucleotide sequences of the representative gene for each family for the given partition.
It can be used as such for all families:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES_FAMILIES --gene_families all
```

or for the cloud families for example:
Or for the core families for example:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES_FAMILIES --gene_families core
```


## Modules
All the preceding commands accept a module as the partition.

So you can write the protein sequences of the families in module_X as such:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --prot_families module_X
```

Or the nucleotide sequences of all genes in module_X:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_GENES_FAMILIES --gene_families cloud
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --genes module_X
```
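
Because every sequence option takes the same kind of partition value, the calls above can be generated programmatically. The following Python sketch assembles the argument list for one `ppanggolin fasta` call. The helper `build_fasta_command` and the output directory name `MY_MODULE` are hypothetical; only the flags already shown in this document (`-p`, `--output`, `-f`, `--genes`, `--proteins`, `--prot_families`, `--gene_families`) are used, and the partition value is passed through unchecked.

```python
import shlex

# Sequence options described in this document; each takes a partition value.
SEQUENCE_OPTIONS = ("--genes", "--proteins", "--prot_families", "--gene_families")


def build_fasta_command(pangenome, output, option, partition, force=False):
    """Return the argument list for one `ppanggolin fasta` call (hypothetical helper)."""
    if option not in SEQUENCE_OPTIONS:
        raise ValueError(f"unsupported sequence option: {option}")
    command = ["ppanggolin", "fasta", "-p", pangenome, "--output", output,
               option, partition]
    if force:
        # -f overwrites an existing output directory, as in the workflow commands.
        command.append("-f")
    return command


command = build_fasta_command("pangenome.h5", "MY_MODULE", "--proteins", "module_0")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where PPanGGOLiN is installed.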

## Regions
@@ -73,4 +114,4 @@ It can be used as such:

```bash
ppanggolin fasta -p pangenome.h5 --output MY_REGIONS --regions all --fasta genomes.fasta.list
```