Report invalid data URL errors. #562

Open
BigBlueHat opened this issue Jan 30, 2025 · 1 comment

Comments

@BigBlueHat
Contributor

Data URLs are frequently used in JSON payloads to store images or other binary (or non-JSON-friendly text) data as a string.

They work well when stored in JSON-LD terms defined with "@type": "@id" in the context (or, less meaningfully, as raw string values).

However, the strings are often extremely long, and it can be hard to detect errors within them, such as spaces introduced when the URL is constructed incorrectly or at some point not URI-encoded properly (e.g. + getting turned into a space).
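As a minimal sketch of one way this corruption can arise: base64 uses + in its alphabet, and form-style query-string decoding treats an unescaped + as a space. The payload value and the "img" parameter name below are made up for illustration.

```javascript
// Hypothetical base64 payload containing "+" (a legal base64 character).
const payload = 'ab+/cd';

// Placed in a query string without percent-encoding, "+" decodes to a space.
const broken = new URLSearchParams('img=' + payload).get('img');

// Percent-encoding the payload first preserves the "+".
const intact = new URLSearchParams(
  'img=' + encodeURIComponent(payload)
).get('img');

console.log(JSON.stringify(broken)); // "ab /cd"
console.log(JSON.stringify(intact)); // "ab+/cd"
```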

An invalid data URL will currently be treated as a relative URL by the parser:

{
  "image": " arst829235"
}

The space character in the above (very fake) data URL makes the URL invalid. The parser will therefore drop the term or, when parsed in "safe mode", throw a "Relative object reference found." error.

It may be useful (at least as an option) to output warnings or errors when invalid data URLs are detected in a document.
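A rough sketch of what such an optional check could look like. The function name findDataUrlProblems and its messages are hypothetical and not part of the library; the two heuristics (whitespace, and the "," separator from the RFC 2397 form) are only a starting point, and the sample values are made up.

```javascript
// Hypothetical helper: not part of the library's API.
// Returns a list of human-readable problems for a data URL candidate,
// or an empty list if none of these heuristics fire.
function findDataUrlProblems(value) {
  const problems = [];
  if (typeof value !== 'string' || !/^\s*data:/i.test(value)) {
    return problems; // not a data URL candidate; nothing to report
  }
  // Whitespace is never valid inside a URL.
  const ws = value.search(/\s/);
  if (ws !== -1) {
    problems.push(`whitespace at index ${ws}`);
  }
  // RFC 2397 form: data:[<mediatype>][;base64],<data>
  if (!value.includes(',')) {
    problems.push('missing "," separator between header and data');
  }
  return problems;
}

// Made-up sample values: the first contains a stray space.
const badProblems = findDataUrlProblems('data:text/plain;base64,aGVs bG8=');
const goodProblems = findDataUrlProblems('data:text/plain;base64,aGVsbG8=');
console.log(badProblems, goodProblems);
```

Reporting the index of the first whitespace character matters here because these strings are often too long to scan by eye.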

@davidlehn
Member

  • Garbage in, garbage out.
  • The checks in toRdf.js use isAbsolute from url.js. It's a very basic check. I assume correct enough, but could use some eyes.
  • "relative {graph,subject,predicate,object} reference" is confusing when it's different types of garbage input. I'm not sure how to best determine what kind of error it is. Some specific checks for, say, whitespace in URLs might help and are cheap. What other heuristic checks would help?
  • Those error names are unofficial "safe mode" ones and could be updated or changed as needed. I think at the time the idea was that if it's not an absolute URL, it's relative. But that's not really true since it could be a bad URL/IRI. If there's a good way to differentiate and have more correct errors, that would be a nice improvement.
  • Any URL with a space has the same issue, so this should likely be a general issue vs specific "data:" handling.
{
  "image": "https://example.com/foo bar"
}
  • A related general debugging improvement that would help for many issues is optional path tracking to more easily narrow down which JSON is causing the errors. Things like that would help people at least know which data exactly is causing an issue. (This has been long planned and I started work but never finished.)
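The whitespace heuristic and the path-tracking idea from the list above could be combined roughly as follows. This is a sketch under stated assumptions: the ABSOLUTE regex is only a guess at the shape of a basic absolute-IRI test and is not the actual code in url.js, classifyIri and findSuspectIris are hypothetical names, and the walker checks every string rather than only values in IRI positions.

```javascript
// Assumed shape of a basic absolute-IRI test: scheme, ":", no whitespace.
// Illustration only; not copied from url.js.
const ABSOLUTE = /^[A-Za-z][A-Za-z0-9+.-]*:\S*$/;

// Hypothetical classifier that separates "bad" input from "relative" input.
function classifyIri(value) {
  if (ABSOLUTE.test(value)) return 'absolute';
  if (/\s/.test(value)) return 'invalid: contains whitespace';
  return 'relative';
}

// Walk a JSON value and record the path to each string that is not an
// absolute IRI. For simplicity every string is checked; a real
// implementation would only look at values in IRI positions.
function findSuspectIris(node, path = '$', out = []) {
  if (typeof node === 'string') {
    const kind = classifyIri(node);
    if (kind !== 'absolute') out.push({path, kind, value: node});
  } else if (Array.isArray(node)) {
    node.forEach((item, i) => findSuspectIris(item, `${path}[${i}]`, out));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, child] of Object.entries(node)) {
      findSuspectIris(child, `${path}.${key}`, out);
    }
  }
  return out;
}

// Made-up document: one URL with a space, one valid, one relative.
const doc = {
  image: 'https://example.com/foo bar',
  items: [{image: 'https://example.com/ok.png'}, {image: 'thumb.png'}]
};
const suspects = findSuspectIris(doc);
console.log(suspects);
```

With the sample document, the report distinguishes the URL containing a space at $.image from the genuinely relative reference at $.items[1].image, which is the distinction the current "relative ... reference" errors do not make.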
