Prevent encoding errors for data files #54

Open
dalito opened this issue Jan 21, 2025 · 0 comments
Labels
repo4cat (NFDI4Cat Central Data Repository), Type: Feature (a feature request)

Comments

Member

dalito commented Jan 21, 2025

Some people upload CSV files that are treated as ASCII-encoded but contain non-ASCII characters (for example Latin-1 or other Unicode characters). These files are then not displayed correctly. Example (degree symbol in the header of the right column): https://repository.nfdi4cat.org/file.xhtml?persistentId=hdl:21.11165/4cat/9m3x-9a96/2&version=3.0

The Dataverse docs on encoding (https://guides.dataverse.org/en/latest/user/tabulardataingest/csv-tsv.html#recognized-data-types-and-formatting) suggest using UTF-8 encoding whenever the files contain non-ASCII characters.

This issue is about adding a check on upload for the presence of characters with byte values > 127 in files that are supposed to be ASCII-encoded, so that the user can be warned about the problem. In principle, the publication of datasets with such errors should be prevented.
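A minimal sketch of what such a check could look like, written in Python for illustration only. The names `EncodingReport`, `check_encoding` and `should_warn` are hypothetical and not part of Dataverse or the repository code. The rule "warn only if bytes > 127 are present and the file is not valid UTF-8" is an assumption about the desired behaviour, not something decided in this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncodingReport:
    """Result of the (hypothetical) upload check."""
    has_non_ascii: bool          # at least one byte > 127 was found
    is_valid_utf8: bool          # the whole content decodes as UTF-8
    first_offset: Optional[int]  # byte offset of the first byte > 127, if any


def check_encoding(data: bytes) -> EncodingReport:
    """Scan raw file content for bytes outside the ASCII range (0-127)."""
    first_offset = next((i for i, b in enumerate(data) if b > 127), None)
    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False
    return EncodingReport(first_offset is not None, is_valid_utf8, first_offset)


def should_warn(report: EncodingReport) -> bool:
    """Assumed policy: warn when non-ASCII bytes are present but not UTF-8."""
    return report.has_non_ascii and not report.is_valid_utf8


# Example: the same header encoded as Latin-1 and as UTF-8.
latin1_header = "T / °C".encode("latin-1")
utf8_header = "T / °C".encode("utf-8")
print(should_warn(check_encoding(latin1_header)))
print(should_warn(check_encoding(utf8_header)))
```

With this policy, a Latin-1 degree symbol triggers a warning, while the same header saved as UTF-8 passes. A stricter variant could flag any byte > 127, as the wording above suggests.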

@dalito dalito added repo4cat NFDI4Cat Central Data Repository Type: Feature a feature request labels Jan 21, 2025