Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix linear congruential random number generator #4357

Merged
merged 1 commit into from
Nov 22, 2024

Conversation

stweil
Copy link
Member

@stweil stweil commented Nov 22, 2024

This reverts commit 2252936.

Fixes: #4146, #4148, #4270

@egorpugin
Copy link
Contributor

But what is the reason of incorrect work?

If we check this table
https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use

we see our parameters from original code
MMIX by Donald Knuth | 264 | 6364136223846793005 | 1442695040888963407

Here we can find that
https://en.cppreference.com/w/cpp/numeric/random
std::minstd_rand is std::linear_congruential_engine<std::uint_fast32_t,                                48271, 0, 2147483647

Maybe my port was not presicely correct in terms of that std::minstd_rand is not exact what it was, I think we should try to make our code use std.

For example, using our_rand = std::linear_congruential_engine<std::uint_fast32_t, 6364136223846793005ULL, 1442695040888963407ULL, UINT64_MAX>

@egorpugin
Copy link
Contributor

If new random was more correct, then old code has issues - in that place with delete after assert.

@marcreichman-pfi
Copy link

@stweil as i mentioned in those various tickets - this does remove the assertion failures and segfaults. I will point out, in most of these cases, compared to the original for a working variation on one of the bad images, and compared to my alternate assertion fix, the outputs are a bit different. I assume we can attribute that to the difference in random data?

Fixes: tesseract-ocr#4146, tesseract-ocr#4148, tesseract-ocr#4270
Fixes: 2252936 ("Use linear congruential random number generator [...]")
Signed-off-by: Stefan Weil <[email protected]>
@stweil
Copy link
Member Author

stweil commented Nov 22, 2024

Maybe my port was not presicely correct in terms of that std::minstd_rand is not exact what it was, I think we should try to make our code use std.

For example, using our_rand = std::linear_congruential_engine<std::uint_fast32_t, 6364136223846793005ULL, 1442695040888963407ULL, UINT64_MAX>

That's a good hint, thank you! I updated my pull request as suggested.

@stweil stweil changed the title Revert "Use linear congruential random number generator from C++11." Fix linear congruential random number generator Nov 22, 2024
@marcreichman-pfi
Copy link

@stweil @egorpugin Coincidentally I tried that change locally as well. It also does not crash anymore, but the output is changed again a bit. Perhaps related to other things which were going on in this class at the time?

@egorpugin
Copy link
Contributor

@stweil
Does it work with updated parameters like with old rand?

@egorpugin
Copy link
Contributor

Also I'll put my analysis here.

Tried to reproduce this example on windows - no success (everything works).
#4148 (comment)

Is it specific STL issue? libstdc++ only? libc++?


Analysis

We have no issues with previous random implemenation.

So, cases are:

  1. Code around assert + delete is incorrect and old random code is incorrect, so the issue stayed hidden.
  2. Code around assert + delete is incorrect and new random is more correct, so it shows the problem around assert.
  3. Code around assert + delete is correct and new random is incorrect - fix rand here.

What case do we have?

@tesseract-ocr tesseract-ocr deleted a comment from Mr586 Nov 22, 2024
@stweil
Copy link
Member Author

stweil commented Nov 22, 2024

Does it work with updated parameters like with old rand?

Yes, the updated parameters are sufficient to fix the assertion and the heap issue.

@egorpugin
Copy link
Contributor

Good, thank you.

I'm not well aware of tesseract testing, but I think it is worth to add some of those assert triggering images into the test suite.
For example, it should be a good test case for my changes with lists.

@stweil stweil merged commit 32fee19 into tesseract-ocr:main Nov 22, 2024
3 checks passed
@stweil stweil deleted the fix4148 branch November 22, 2024 20:46
@stweil
Copy link
Member Author

stweil commented Nov 22, 2024

Yes, the updated parameters are sufficient to fix the assertion and the heap issue.

They fix these issues, but cause failures of recodebeam_test and intsimdmatrix_test. My previous revert seems to work better.

@egorpugin
Copy link
Contributor

The problem is that 2,3 and 4 args to template become std::uint_fast32_t, but they should be kept as 64 bit.
std::linear_congruential_engine<std::uint_fast32_t, 6364136223846793005ULL, 1442695040888963407ULL, UINT64_MAX>

like this

std::linear_congruential_engine<std::uint_fast64_t, 6364136223846793005ULL, 1442695040888963407ULL, UINT64_MAX>

@stweil
Copy link
Member Author

stweil commented Nov 22, 2024

That does not fix the unittests. I hope I can fix it tomorrow.

@egorpugin
Copy link
Contributor

egorpugin commented Nov 22, 2024

It seems we cannot express
MMIX by Donald Knuth | 2^64 | 6364136223846793005 | 1442695040888963407
with std::linear_congruential_engine because of 2^64 modulus (we need 128 bit type for it to pass into template).
But 2^64 modulus is just a division by 1.

So the old code did exactly it plus converting a result into 32 bit int (taking first 31 bits).

    u *= 6364136223846793005ULL;
    u += 1442695040888963407ULL;
    return u >> 33;

But when I change my code to this snippet, the test does not work either.

@egorpugin
Copy link
Contributor

It is even more clear now.

Our tests depends on random generator (default seed). Output is always different - unreliable.

Tesseract execution depends on random generator. Is random even seeded there?
If no, then tesseract code has issue around that assert + delete, it is unrelated to random generator.
Random only showed that there is an issue somewhere there.

@stweil
Copy link
Member Author

stweil commented Nov 23, 2024

I restored the old random generator (and the related unittest code) now. This fixes the unittests and the reported issues.

@egorpugin
Copy link
Contributor

No, it does not fix the problem. Which is the problem around those lists, asserts and delete.
Only hides it behind unseeded rng.

@stweil
Copy link
Member Author

stweil commented Nov 23, 2024

I don't understand that. Do you have a test case which still triggers the assertion?

@egorpugin
Copy link
Contributor

Code behavior depends on random generator.
We observe crashes on one generator and do not see them on other.

That means that the problem is not in rng, but in that code itself.
Old rng just hides the problem where changes in rng uncover it.

You can try to change initial seed (1 currently) to something else and try provided cases with it.

@stweil
Copy link
Member Author

stweil commented Nov 23, 2024

The initial seed 1 is not used, because the recognition sets a new seed in SetRandomSeed (not only once, but many times). And it calls TRand::SignedRand several thousand times.

@egorpugin
Copy link
Contributor

egorpugin commented Nov 23, 2024

The only place during recognition is https://github.com/tesseract-ocr/tesseract/blob/main/src/lstm/lstmrecognizer.h#L290
It sets seed based on iteration id. It means the numbers are the same during each run, they are not random (like based on time).

@egorpugin
Copy link
Contributor

I cannot even call it 'random'. Not a fair one.
It is just a series of same numbers.

I can understand if they are time based so they change each run, but not like we have it now.

@stweil
Copy link
Member Author

stweil commented Nov 23, 2024

It's good that the series is not random, but pseudorandom, because otherwise some results would also be random (which is not desired).

And I think that I now know why commit 32fee19 still produced buggy code: the implementation of IntRand could return negative numbers which violated the specification. Fixing this also fixed the memory issues and the assertion.

@egorpugin
Copy link
Contributor

I see. Indeed, return e() >> 33; part was missing.

@egorpugin
Copy link
Contributor

It's good that the series is not random, but pseudorandom

It is not even pseudorandom. It is just a fixed series. Always the same. And it hides other code issues.

@marcreichman-pfi
Copy link

@stweil @egorpugin new data point, new segv after these changes: #4146 (comment)

Overall the stability is improved, but fwiw the image in that comment worked ok in 5.5.0 tag.

@stweil
Copy link
Member Author

stweil commented Nov 25, 2024

@marcreichman-pfi, I don't see such a regression, only a new error which you reported ("ELIST2_ITERATOR::add_before_stay_put:Error:Attempting to add an element with non nullptr links, to a list").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault with certain language packs on certain images
3 participants