Error leads to early exit/failed detection in MBCSGroupProbers #162

Orden4 · 2023-09-29T15:49:38Z

Input file for reference:
Rocio.txt

UTF-unknown was unable to detect the correct encoding (Shift JIS), while uchardet did correctly identify it.
This took a while to figure out, but I eventually traced it to a few lines like this one (line 98): victory3 ="Hay que salvar al mundo, ｿte uniras a nosotras?".
Mugen character files are the stuff of nightmares. I assume this character was made by someone Spanish/Brazilian and then edited by someone Japanese.

After some investigation I found that the MBCS group probers are practically identical to uchardet's, but there is one discrepancy that makes the results diverge: UTF-unknown exits early when it encounters an error, while uchardet simply continues. As far as I could tell, uchardet never exits because of a state machine error, in any prober at all. Indeed, after removing the early exits, I got a correct detection as well.

304NotModified · 2023-09-29T18:04:18Z

> Namely, UTF-unknown exits early when it encounters an error

What is the call you are executing?

Orden4 · 2023-09-29T18:58:42Z

Apologies, I got a bit distracted while writing and didn't explain this properly. I don't mean that the application itself exits early; I mean that it stops probing early, because of this:

if (codingState == StateMachineModel.ERROR)
{
	state = ProbingState.NotMe;
	break;
}

This check sits in the HandleData loop of every prober in the MBCS group. uchardet has no such early exit from the HandleData loop, and that is why UTF-unknown fails to find any encoding.

As for the exact call, this occurs with any parsing method, although I debugged it via the DetectFromBytes method.
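
To make the difference concrete, here is a minimal sketch of the two behaviours. It is not UTF-unknown's or uchardet's actual code: `ToyProber`, its `stopOnError` flag, and the rule that `0xFF` is the only illegal byte are all hypothetical, made up for illustration. It only shows how a single bad byte either rules a prober out or is skipped while the prober keeps collecting evidence.

```csharp
using System;

// Hypothetical toy types for illustration only; not UTF-unknown's real classes.
enum ProbingState { Detecting, NotMe }

static class ToyProber
{
    // Toy rule (an assumption): 0xFF is the only illegal byte.
    static bool IsIllegal(byte b) => b == 0xFF;

    // Returns the final state and how many legal bytes were scored.
    public static (ProbingState State, int Scored) HandleData(byte[] buf, bool stopOnError)
    {
        var state = ProbingState.Detecting;
        int scored = 0;
        foreach (byte b in buf)
        {
            if (IsIllegal(b))
            {
                if (stopOnError)
                {
                    // Early exit as quoted in this issue: one bad byte rules the prober out.
                    state = ProbingState.NotMe;
                    break;
                }
                // Lenient variant: skip the bad byte and keep scoring the rest.
                continue;
            }
            scored++;
        }
        return (state, scored);
    }
}

static class Program
{
    static void Main()
    {
        byte[] data = { 0x41, 0x42, 0xFF, 0x43, 0x44 };
        var strict = ToyProber.HandleData(data, stopOnError: true);
        var lenient = ToyProber.HandleData(data, stopOnError: false);
        Console.WriteLine($"strict:  {strict.State}, scored {strict.Scored}");
        Console.WriteLine($"lenient: {lenient.State}, scored {lenient.Scored}");
    }
}
```

With the strict flag the toy prober ends in NotMe after scoring two bytes; with the lenient flag it stays in Detecting and scores all four legal bytes. How uchardet actually recovers after an error is not covered by this sketch.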
