-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix linear congruential random number generator #4357
Conversation
But what is the reason of incorrect work? If we check this table we see our parameters from original code Here we can find that Maybe my port was not presicely correct in terms of that For example, |
If new random was more correct, then old code has issues - in that place with |
@stweil as i mentioned in those various tickets - this does remove the assertion failures and segfaults. I will point out, in most of these cases, compared to the original for a working variation on one of the bad images, and compared to my alternate assertion fix, the outputs are a bit different. I assume we can attribute that to the difference in random data? |
Fixes: tesseract-ocr#4146, tesseract-ocr#4148, tesseract-ocr#4270 Fixes: 2252936 ("Use linear congruential random number generator [...]") Signed-off-by: Stefan Weil <[email protected]>
That's a good hint, thank you! I updated my pull request as suggested. |
@stweil @egorpugin Coincidentally I tried that change locally as well. It also does not crash anymore, but the output is changed again a bit. Perhaps related to other things which were going on in this class at the time? |
@stweil |
Also I'll put my analysis here. Tried to reproduce this example on windows - no success (everything works). Is it specific STL issue? libstdc++ only? libc++? Analysis We have no issues with previous random implemenation. So, cases are:
What case do we have? |
Yes, the updated parameters are sufficient to fix the assertion and the heap issue. |
Good, thank you. I'm not well aware of tesseract testing, but I think it is worth to add some of those assert triggering images into the test suite. |
They fix these issues, but cause failures of recodebeam_test and intsimdmatrix_test. |
The problem is that 2,3 and 4 args to template become like this
|
That does not fix the unittests. I hope I can fix it tomorrow. |
It seems we cannot express So the old code did exactly it plus converting a result into 32 bit int (taking first 31 bits).
But when I change my code to this snippet, the test does not work either. |
It is even more clear now. Our tests depends on random generator (default seed). Output is always different - unreliable. Tesseract execution depends on random generator. Is random even seeded there? |
I restored the old random generator (and the related unittest code) now. This fixes the unittests and the reported issues. |
No, it does not fix the problem. Which is the problem around those lists, asserts and delete. |
I don't understand that. Do you have a test case which still triggers the assertion? |
Code behavior depends on random generator. That means that the problem is not in rng, but in that code itself. You can try to change initial seed ( |
The initial seed 1 is not used, because the recognition sets a new seed in |
The only place during recognition is https://github.com/tesseract-ocr/tesseract/blob/main/src/lstm/lstmrecognizer.h#L290 |
I cannot even call it 'random'. Not a fair one. I can understand if they are time based so they change each run, but not like we have it now. |
It's good that the series is not random, but pseudorandom, because otherwise some results would also be random (which is not desired). And I think that I now know why commit 32fee19 still produced buggy code: the implementation of |
I see. Indeed, |
It is not even pseudorandom. It is just a fixed series. Always the same. And it hides other code issues. |
@stweil @egorpugin new data point, new segv after these changes: #4146 (comment) Overall the stability is improved, but fwiw the image in that comment worked ok in 5.5.0 tag. |
@marcreichman-pfi, I don't see such a regression, only a new error which you reported ("ELIST2_ITERATOR::add_before_stay_put:Error:Attempting to add an element with non nullptr links, to a list"). |
This reverts commit 2252936.
Fixes: #4146, #4148, #4270