From de63e72b9b90183072d25ac99077fc66b907696c Mon Sep 17 00:00:00 2001
From: JeanMainguy
Date: Thu, 9 Nov 2023 17:50:16 +0100
Subject: [PATCH] add specific doc for gff, tables and proksee output

---
 docs/user/Flat/gff.md | 48 +++++++++++++++++++++++++++++++++++++++
 docs/user/Flat/proksee.md | 31 +++++++++++++++++++++++++
 docs/user/Flat/tables.md | 21 +++++++++++++++++
 3 files changed, 100 insertions(+)
 create mode 100644 docs/user/Flat/gff.md
 create mode 100644 docs/user/Flat/proksee.md
 create mode 100644 docs/user/Flat/tables.md

diff --git a/docs/user/Flat/gff.md b/docs/user/Flat/gff.md
new file mode 100644
index 00000000..ac3266f1
--- /dev/null
+++ b/docs/user/Flat/gff.md
@@ -0,0 +1,48 @@
+
+The `--gff` argument generates one GFF file per genome, each containing the pangenome annotations for that genome. GFF is a widely used standard format in bioinformatics, so these files integrate readily with downstream analysis tools.
+
+To generate GFF files from a pangenome HDF5 file, you can use the following command:
+
+```bash
+ppanggolin write_genomes -p pangenome.h5 --gff -o output
+```
+
+This command creates a gff directory within the output directory, with one GFF file per genome.
+
+Pangenome annotations are recorded in the attributes column (the ninth column) of each GFF file.
+
+Annotations are attached to two kinds of features: CDS features and RGPs.
+
+CDS features have the following attributes:
+
+- **family:** ID of the gene family to which the gene belongs.
+- **partition:** The partition of the gene family: persistent, shell, or cloud.
+- **module:** If the gene family belongs to a module, the module ID is given under the key 'module'.
+- **rgp:** If the gene is part of a Region of Genomic Plasticity (RGP), the RGP name is given under the key 'rgp'.
+
+Regions of Genomic Plasticity (RGPs) are written as features of type 'region'.
+
+RGPs have the following attributes:
+
+- **spot:** ID of the spot where the RGP is inserted. When the RGP has no spot, the value 'No_spot' is used.
+- **Note:** Specifies that this feature is an RGP.
+
+
+Here are the first lines of the GFF file for the Acinetobacter baumannii AYE genome:
+
+```gff
+##gff-version 3
+##sequence-region NC_010401.1 1 5644
+##sequence-region NC_010402.1 1 9661
+##sequence-region NC_010403.1 1 2726
+##sequence-region NC_010404.1 1 94413
+##sequence-region NC_010410.1 1 3936291
+NC_010401.1	.	region	1	5644	.	+	.	ID=NC_010401.1;Is_circular=true
+NC_010401.1	ppanggolin	region	629	5591	.	.	.	Name=NC_010401.1_RGP_0;spot=No_spot;Note=Region of Genomic Plasticity (RGP)
+NC_010401.1	external	gene	629	1579	.	+	.	ID=gene-ABAYE_RS00005
+NC_010401.1	external	CDS	629	1579	.	+	0	ID=ABAYE_RS00005;Parent=gene-ABAYE_RS00005;product=replication initiation protein;family=ABAYE_RS00005;partition=cloud;rgp=NC_010401.1_RGP_0
+NC_010401.1	external	gene	1576	1863	.	+	.	ID=gene-ABAYE_RS00010
+NC_010401.1	external	CDS	1576	1863	.	+	0	ID=ABAYE_RS00010;Parent=gene-ABAYE_RS00010;product=hypothetical protein;family=ABAYE_RS00010;partition=cloud;rgp=NC_010401.1_RGP_0
+NC_010401.1	external	gene	2054	2572	.	-	.	ID=gene-ABAYE_RS00015
+NC_010401.1	external	CDS	2054	2572	.	-	0	ID=ABAYE_RS00015;Parent=gene-ABAYE_RS00015;product=tetratricopeptide repeat protein;family=HTZ92_RS18670;partition=shell;rgp=NC_010401.1_RGP_0
+```
\ No newline at end of file
diff --git a/docs/user/Flat/proksee.md b/docs/user/Flat/proksee.md
new file mode 100644
index 00000000..f839bc4e
--- /dev/null
+++ b/docs/user/Flat/proksee.md
@@ -0,0 +1,31 @@
+The `--proksee` argument generates JSON map files containing pangenome annotations, which can be visualized using Proksee at [https://proksee.ca/](https://proksee.ca/).
+
+To generate JSON map files, you can use the following command:
+
+```bash
+ppanggolin write_genomes -p pangenome.h5 --proksee -o output
+```
+
+This command creates a proksee directory within the output directory, with one JSON file per genome.
+
+
+To load a JSON map file on Proksee, follow these steps:
+1. Navigate to the "Map JSON" tab.
+2. Upload your file using the browse button.
+3. Click the "Create Map" button to generate the visualization.
+
+A genome visualized in Proksee with PPanGGOLiN annotations looks like this:
+
+
+```{image} ../_static/proksee_exemple_A_baumannii_AYE.png
+:align: center
+```
+
+*Image: Genome visualized by Proksee with PPanGGOLiN annotation.*
+
+
+The visualization consists of three tracks:
+- **Genes:** Color-coded by their gene family partition.
+- **RGP (Region of Genomic Plasticity):** The spot associated with each RGP is specified in the annotation of the object.
+- **Module:** Displays the modules within the genome. The completion of each module is specified in the annotation of the object.
+
diff --git a/docs/user/Flat/tables.md b/docs/user/Flat/tables.md
new file mode 100644
index 00000000..204d23aa
--- /dev/null
+++ b/docs/user/Flat/tables.md
@@ -0,0 +1,21 @@
+The `--tables` option writes one TSV file per genome of the pangenome into a 'tables' directory.
+The columns of each file are described in the following table:
+
+| Column | Description |
+|----------------------|--------------------------------------------------------------------------------------------------------------------------------|
+| gene | The unique identifier of the gene |
+| contig | The contig that the gene is on |
+| start | The start position of the gene |
+| stop | The stop position of the gene |
+| strand | The strand that the gene is on |
+| ori | T if the gene name is dnaA |
+| family | The identifier of the family to which the gene belongs |
+| nb_copy_in_org | The number of copies of the family in the organism (if 1, the gene has no closely related paralog in that organism) |
+| partition | The partition to which the gene family of the gene belongs |
+| persistent_neighbors | The number of neighbors classified as 'persistent' in the pangenome graph |
+| shell_neighbors | The number of neighbors classified as 'shell' in the pangenome graph |
+| cloud_neighbors | The number of neighbors classified as 'cloud' in the pangenome graph |
+
+These files can be generated with the following command:
+
+`ppanggolin write_genomes -p pangenome.h5 --tables`
\ No newline at end of file
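
As a rough sketch of how a per-genome table could be consumed downstream, the Python snippet below counts genes per partition. It assumes the TSV starts with a header row using the column names listed in the table above. The function name `count_partitions` and the sample rows are made up for illustration and are not real PPanGGOLiN output.

```python
import csv
import io
from collections import Counter


def count_partitions(tsv_text: str) -> Counter:
    """Count genes per partition in one per-genome table.

    Assumption: the first row is a header containing a 'partition' column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["partition"] for row in reader)


# Tiny hypothetical table with a subset of the documented columns.
example = (
    "gene\tcontig\tstart\tstop\tstrand\tfamily\tpartition\n"
    "g1\tc1\t1\t90\t+\tfamA\tpersistent\n"
    "g2\tc1\t100\t300\t-\tfamB\tcloud\n"
    "g3\tc1\t350\t500\t+\tfamC\tpersistent\n"
)

counts = count_partitions(example)
for partition, n_genes in sorted(counts.items()):
    print(partition, n_genes)
```

To use it on a real file, read the file contents as text and pass them to `count_partitions`. Reading the whole table through `csv.DictReader` keeps the code independent of column order, which matters if only some of the columns are needed.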