Error annotating pangenome #285

Closed
seajane opened this issue Sep 18, 2024 · 6 comments
@seajane

seajane commented Sep 18, 2024

Hello!

I am getting an error when trying to write the pangenome. It is odd because I have two different datasets and only one of them throws this error. I hesitated to write this issue, as it may be due to some configuration of my data, but so far I have been unable to figure out what could be different and what could be causing the problem.

I have ppanggolin 2.1.1 installed from source on the computing cluster. I am trying to annotate using the --anno option:

ppanggolin annotate --anno ../2024.09.16__run1test2_.txt -o ../ppgg_run1

It starts fine and runs up until writing the genome metadata.

[Screenshot: 2024-09-18 at 12:57:54 PM]

And then throws this error:


  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/main.py", line 221, in main
    ppanggolin.workflow.all.launch(args)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/workflow/all.py", line 294, in launch
    launch_workflow(args, panrgp=True, panmodule=True)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/workflow/all.py", line 61, in launch_workflow
    write_pangenome(pangenome, filename, args.force, disable_bar=args.disable_prog_bar)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/formats/writeBinaries.py", line 756, in write_pangenome
    write_metadata(pangenome, h5f, disable_bar)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/formats/writeMetadata.py", line 315, in write_metadata
    write_metadata_metatype(h5f, pangenome.status["metasources"]["genomes"][-1],
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/formats/writeMetadata.py", line 238, in write_metadata_metatype
    source_table = h5f.create_table(metatype_group, source, desc_metadata(*meta_len[:-1]), expectedrows=meta_len[-1])
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/formats/writeMetadata.py", line 87, in desc_metadata
    desc_dict = {attr: tables.StringCol(itemsize=max_value) for attr, max_value in max_len_dict.items()}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/ppanggolin/formats/writeMetadata.py", line 87, in <dictcomp>
    desc_dict = {attr: tables.StringCol(itemsize=max_value) for attr, max_value in max_len_dict.items()}
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/tables/description.py", line 199, in __init__
    atombase.__init__(self, *args, **kwargs)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/tables/atom.py", line 623, in __init__
    Atom.__init__(self, 'S%d' % itemsize, shape, dflt)
  File "/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/tables/atom.py", line 505, in __init__
    self.dtype = dtype = np.dtype((nptype, npshape))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid itemsize in generic type tuple
/rsrch5/home/genomic_med/hkbouzek/.local/lib/python3.11/site-packages/tables/file.py:114: UnclosedFileWarning:

Closing remaining open file: ../ppgg_run1_n/pangenome.h5

Thank you in advance.

@JeanMainguy
Member

Hello,

It seems like the error you're encountering might be related to how we're handling metadata on our end. Starting from version v2.1.0 (PR #227), some information about contigs/genomes is extracted from annotation files and included as metadata in the pangenome file.

In your case, it looks like some of the extracted metadata isn't being written correctly to the pangenome file, which is causing the error. The intended behavior is that if any metadata can't be written, it should be ignored, and a log is printed instead.
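
The intended behavior described above can be illustrated with a minimal sketch. This is not PPanGGOLiN's actual code: the function name `drop_unwritable_metadata` and the sample field names are hypothetical. It assumes that column widths are derived from the longest value of each field, so a field whose values are all empty ends up with a width of zero and is skipped with a log message instead of being passed on to the table description.

```python
import logging

logger = logging.getLogger("metadata_sketch")


def drop_unwritable_metadata(records):
    """Return (max_len, dropped) for a list of metadata dicts.

    max_len maps each usable field to its longest value length in bytes.
    dropped lists the fields whose longest value is empty (width 0),
    which a fixed-width string column cannot represent.
    """
    max_len = {}
    for record in records:
        for field, value in record.items():
            size = len(str(value).encode("utf-8"))
            max_len[field] = max(max_len.get(field, 0), size)

    dropped = sorted(f for f, size in max_len.items() if size == 0)
    for field in dropped:
        # Log and ignore rather than failing the whole write step.
        logger.warning("Ignoring metadata field %r: all values are empty", field)
        del max_len[field]
    return max_len, dropped


# Hypothetical metadata extracted from the source features of two genomes.
records = [
    {"organism": "Escherichia coli", "genome_md5": ""},
    {"organism": "E. coli K-12", "genome_md5": ""},
]
max_len, dropped = drop_unwritable_metadata(records)
print(max_len)   # {'organism': 16}
print(dropped)   # ['genome_md5']
```

Filtering before the widths are used keeps one unusable field from aborting the write of all other metadata.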

Would you be able to share the annotation files you used with us, assuming they're not sensitive? It would help us investigate further.

Thanks for bringing this to our attention!

@seajane
Author

seajane commented Sep 23, 2024

All of the files are publicly released, so I was collecting the information to send to you, but I thought I might first try ppanggolin with the public GenBank files, since those are all PGAP-annotated and the ones we generated were annotated by RAST. The PGAP annotation files work fine; the RAST-annotated ones fail. The annotation is just different enough to cause the error. I think this might also be the root of the issue I reported about the translation table default (#226). The single file that was failing the entire process at that time was RAST-annotated.

I also tried with FASTA files and those work well, but I do need the genes annotated with product names. As an aside, does gene product annotation occur when using FASTA input? The result doesn't have it, and I don't know if I am missing something.

@JeanMainguy
Member

When using ppanggolin with FASTA files, it won't give you any gene product annotations. It just runs prodigal to call the genes, and that's it. So if you require gene product annotation you would have to use annotation files as input.

Would you mind sharing a RAST-annotated genome that’s causing the issue? Just one problematic genome file that causes the error would be enough and very helpful for us to debug. It's probably something specific in the RAST output that we haven't accounted for yet.

@seajane
Author

seajane commented Sep 24, 2024

Archive.zip
Here are five files that were annotated by RAST

@JeanMainguy
Member

JeanMainguy commented Sep 25, 2024

Thank you so much for providing the files! They were very helpful.

The issue arises from the genome_md5 tag in the source feature of the GBK file, which is empty and causing problems downstream. I’ve addressed this in PR #287 on the branch fix_genome_metadata_handeling, and this fix will be included in the next release.

In the meantime, if you need to run PPanGGOLiN with these types of files, you can use the fix_genome_metadata_handeling branch. Just follow the instructions here and replace dev with fix_genome_metadata_handeling:

git clone --branch fix_genome_metadata_handeling https://github.com/labgem/PPanGGOLiN.git
cd PPanGGOLiN
pip install . 
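
To check which annotation files carry the empty qualifier described above before running the pipeline, something like the following sketch could be used. It relies only on the standard library and on the usual GenBank flat-file layout (feature keys indented by 5 spaces, qualifiers by 21). The function name `empty_source_qualifiers` is hypothetical, and it is an assumption that the empty tag appears in the file as `/genome_md5=""`.

```python
import re

# Feature key lines: 5 spaces, the key, then the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")
# Qualifier lines with no value or an empty quoted value.
EMPTY_QUALIFIER = re.compile(r'^ {21}/(\w+)=(?:"")?\s*$')


def empty_source_qualifiers(gbk_text):
    """Return names of empty qualifiers found in `source` features."""
    found = []
    in_features = False
    current_feature = None
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line[:1] not in (" ", ""):
            # A new top-level keyword (e.g. ORIGIN) ends the feature table.
            in_features = False
            current_feature = None
        if not in_features:
            continue
        key = FEATURE_KEY.match(line)
        if key:
            current_feature = key.group(1)
            continue
        empty = EMPTY_QUALIFIER.match(line)
        if empty and current_feature == "source":
            found.append(empty.group(1))
    return found


# Hypothetical record, built explicitly so the column layout is exact.
sample = "\n".join([
    "LOCUS       contig_1   5000 bp   DNA   linear   UNK",
    "FEATURES             Location/Qualifiers",
    " " * 5 + "source".ljust(16) + "1..5000",
    " " * 21 + '/organism="Escherichia coli"',
    " " * 21 + '/genome_md5=""',
    " " * 5 + "CDS".ljust(16) + "10..400",
    " " * 21 + '/note=""',
    "ORIGIN",
    "//",
])
print(empty_source_qualifiers(sample))  # ['genome_md5']
```

For a real file, the text can be read with `pathlib.Path(path).read_text()` and passed to the function; a non-empty result marks a file that would likely need the fixed branch.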

@JeanMainguy
Member

The fix has been released in version 2.2.0.
Please don’t hesitate to reopen this issue or create a new one if the error reappears.
Thanks
