I'm reading about confusable characters in CE and it warns "Note: This list is not guaranteed complete! Use it as a guide only. The Unicode character set will change at times in the future, so it's on you to keep up."
I think this is a noble goal, but I'm just not sure how it's going to work, especially as specified. Where is the list? I wasn't aware of a Unicode property called "confusable". Is there an ICU function to return the current set? (Sort of: they call it "spoof detection".)
So I googled and the first hit was this page, which says that (for example) letter O and digit 0 are confusable. Does that mean I can't use "O" or "0" in a CTE key without escaping it?
I dug around in the CE source code for a while (I'm not a Go programmer) and found this. That seems to match what's in the CE documentation, but it's very different from UTS#39.
I agree in principle that allowing "spoofing"/"confusables" can be problematic for human-readable text formats, but the rules I see here are so vague that, after an hour of research, I still can't tell whether "O" is a valid string key or not.
The problem I'm trying to solve is the case where otherwise valid characters make it difficult for humans to see what's going on:
c1
{
something = [abc]
}
Is something pointing to a list or a string? Because the brackets there are look-alike characters and not the ASCII [ and ], it's a string. And since it doesn't contain any reserved symbol characters, it can be printed without quotes, leading to the above situation where a computer understands what it is, but a human doesn't.
This leads to the unfortunate problem of locking down a continually changing spec (Unicode) and deciding which characters are perceptually too close to characters that alter the structure of the document... I suppose maybe I'll have to go on an exhaustive search of all possible problematic characters, which will complicate all encoders, but the alternative is to allow confusing documents.
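To make the narrower check concrete, here is a rough sketch in Python of flagging only characters that imitate structural symbols, instead of everything UTS#39 calls confusable. It rests on two assumptions: the STRUCTURAL set is a hypothetical stand-in and not the list from the CE spec, and NFKC compatibility folding is a swapped-in technique that only catches compatibility variants such as fullwidth forms, so it is not a complete confusables check.

```python
import unicodedata

# Hypothetical set of structural characters, for illustration only.
# It is NOT taken from the CE specification.
STRUCTURAL = set("[]{}()<>=:|\"'@#&$")


def find_structural_lookalikes(text, structural=STRUCTURAL):
    """Return (index, char, folded) for each character that is not itself
    structural but folds to a structural character under NFKC."""
    hits = []
    for index, char in enumerate(text):
        if char in structural:
            # The genuine structural character; the parser handles it.
            continue
        folded = unicodedata.normalize("NFKC", char)
        if len(folded) == 1 and folded in structural:
            hits.append((index, char, folded))
    return hits


# Fullwidth brackets (U+FF3B, U+FF3D) around "abc".
print(find_structural_lookalikes("\uff3babc\uff3d"))
```

Under this approach the fullwidth brackets are flagged, while letter "O" and digit "0" are not, because neither folds to a structural symbol. An encoder could then quote or escape any string that produces hits.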