'ascii' codec #252

Closed
yl030 opened this issue Jul 17, 2024 · 6 comments

@yl030
yl030 commented Jul 17, 2024

Hello ppanggolin team,
I installed ppanggolin manually, and when I run ppanggolin all --anno genomes.gff.list I get the following error:

100%|██████████| 2675/2675 [17:46<00:00, 2.51file/s]

0%| | 0/2675 [00:00<?, ?genome/s]
0%| | 0/2675 [00:00<?, ?genome/s]
Traceback (most recent call last):
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/bin/ppanggolin", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/main.py", line 222, in main
    ppanggolin.workflow.all.launch(args)
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/workflow/all.py", line 295, in launch
    launch_workflow(args, panrgp=True, panmodule=True)
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/workflow/all.py", line 62, in launch_workflow
    write_pangenome(pangenome, filename, args.force, disable_bar=args.disable_prog_bar)
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/formats/writeBinaries.py", line 709, in write_pangenome
    write_annotations(pangenome, h5f, disable_bar=disable_bar)
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/formats/writeAnnotations.py", line 390, in write_annotations
    write_organisms(pangenome, h5f, annotation, desc, disable_bar)
  File "/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/ppanggolin/formats/writeAnnotations.py", line 74, in write_organisms
    organism_row["name"] = org.name
    ~~~~~~~~~~~~^^^^^^^^
  File "tables/tableextension.pyx", line 1673, in tables.tableextension.Row.__setitem__
UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 0: ordinal not in range(128)
/gss1/App_os7/miniconda3/envs/ppanggolin/lib/python3.12/site-packages/tables/file.py:113: UnclosedFileWarning:

Closing remaining open file: ppanggolin_output_DATE2024-07-16_HOUR08.10.22_PID12964/pangenome.h5
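
For readers hitting the same message: '\ufeff' is a byte order mark (BOM), which some editors write at the start of a UTF-8 file. Because it appears at position 0 of the genome name, one plausible source is the first line of the list file, although the thread does not confirm this. The sketch below is plain Python, independent of ppanggolin, and uses a hypothetical two-column list file to show how a BOM ends up in the first name and how the utf-8-sig codec drops it.

```python
import codecs
import tempfile
from pathlib import Path

# Hypothetical list file (genome name, tab, path) saved with a UTF-8 BOM.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "genomes.gff.list"
    list_file.write_bytes(codecs.BOM_UTF8 + b"genomeA\tgenomeA.gff\n")

    # Plain "utf-8" keeps the BOM as the first character of the first name.
    name_with_bom = list_file.read_text(encoding="utf-8").split("\t")[0]
    # "utf-8-sig" strips a leading BOM when one is present.
    name_clean = list_file.read_text(encoding="utf-8-sig").split("\t")[0]

print(repr(name_with_bom))
print(repr(name_clean))

# Encoding the BOM-prefixed name as ASCII fails in the same way as the traceback.
try:
    name_with_bom.encode("ascii")
except UnicodeEncodeError as err:
    print(err)
```

If the BOM is indeed in the list file, re-saving that file as UTF-8 without a BOM should be enough to remove it.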

@jpjarnoux
Member

Hi!

What is the source of your annotation files?
It looks like there is an unknown field in your GFF.
Related issues that could help: #222, #232, #233.

If none of those help, could you share your files or a sample that reproduces the error?

Regards

@CedricMidoux

I had the same problem with \u2019 (in /product="ATPase/5’-3’ helicase helicase subunit RecD")
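
To locate characters like this before running the workflow, a small script can list every non-ASCII character in an annotation or list file. This is only a sketch, not part of ppanggolin; the function name find_non_ascii is hypothetical and the files are assumed to be UTF-8 text.

```python
import sys
from pathlib import Path
from typing import Iterator, Tuple


def find_non_ascii(path: Path) -> Iterator[Tuple[int, int, str]]:
    """Yield (line number, column, character) for each non-ASCII character.

    Hypothetical helper: assumes UTF-8 input; undecodable bytes are replaced
    with U+FFFD, which is itself non-ASCII and therefore also reported.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line_no, line in enumerate(handle, start=1):
            for col, char in enumerate(line, start=1):
                if ord(char) > 127:
                    yield line_no, col, char


if __name__ == "__main__":
    # Usage: python find_non_ascii.py file1.gff file2.gbff ...
    for file_name in sys.argv[1:]:
        for line_no, col, char in find_non_ascii(Path(file_name)):
            print(f"{file_name}:{line_no}:{col}: {char!r} (U+{ord(char):04X})")
```

Run on the files named in the list given to --anno, this would report both a leading '\ufeff' and typographic quotes such as '\u2019' in product fields.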

@Tsingsjeen

Has this issue been solved? I also tried the .gff3 and .gbff output files from Bakta; neither worked, and both reported the same error, "UnicodeEncodeError: 'ascii' codec...". I then went back to .fna files, which works, but the annotation is less clear.

@Tsingsjeen

Sorry for the spam, it now works after following the solution above!

@JeanMainguy
Member

Hello,

Great to hear.

We have addressed this issue in PR #291, which will be included in the next release, so this problem should no longer occur.

@JeanMainguy
Member

With version 2.2.0, non-ASCII characters are now filtered out, which should prevent the error reported above.

Please don’t hesitate to reopen this issue or create a new one if the error reappears.

Thanks!
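
For anyone on an older version who wants to clean names or product strings by hand, the kind of filtering described above could look roughly like the following. This is an illustrative sketch and not the code from PR #291; the function name to_ascii is hypothetical, and the NFKD normalization step is an added assumption so that accented letters keep their base letter instead of vanishing.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed.

    Hypothetical helper: NFKD normalization first splits an accented letter
    into a base letter plus combining marks, so the base letter survives
    when the non-ASCII marks are dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("\ufeffgenomeA"))                    # genomeA
print(to_ascii("ATPase/5\u2019-3\u2019 helicase"))  # ATPase/5-3 helicase
```

Dropping characters changes the text, so a genome name cleaned this way no longer matches the original byte for byte; that matters if names are later used to join against other tables.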
