InferenceSession - Catastrophic Error or Unspecified Error is thrown #22815
Comments
A bit more context: I am the developer of Amuse.ai. Our app has been out for about a year, running DirectML inference without issue. A few months back we upgraded from 1.18.1 to 1.19.0, and then we started getting a few error reports of "Catastrophic Error" when the user tried to load a model. However, it is now 10-20 reports a day, so it's somehow getting worse. A Windows update? After upgrading to 1.20.0 we now also get this new error; I'm actually hoping it's the root cause of the other two.
2024-11-13 09:19:13.0746439 [E:onnxruntime:, inference_session.cc:2118 onnxruntime::InferenceSession::Initialize::<lambda_a18664140bfa1274480334618139aa6c>::operator ()] Exception during initialization: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime.DLL!00007FFA288EE903: (caller: 00007FFA2886E449) Exception(1) tid(87c) 8000FFFF Catastrophic failure
2024-11-13 09:19:13.9065553 [E:onnxruntime:, inference_session.cc:2118 onnxruntime::InferenceSession::Initialize::<lambda_a18664140bfa1274480334618139aa6c>::operator ()] Exception during initialization: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime.DLL!00007FFA288EE903: (caller: 00007FFA2886E449) Exception(2) tid(1a78) 80004005 Unspecified error
AFAIK "com.microsoft.extensions" is used by onnxruntime-extensions. The extensions have to be registered manually by calling SessionOptions.RegisterOrtExtensions. If you call that multiple times, you'll get an error about the DomainToVersion map. However, that seems completely unrelated to the DML issues.
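Per the comment above, calling SessionOptions.RegisterOrtExtensions more than once produces the DomainToVersion error. If the app reuses a single SessionOptions instance across model loads, a run-once guard keyed on that instance is one way to rule this out. This is a minimal sketch that assumes reuse of one SessionOptions instance is the trigger (not confirmed in this thread). RunOncePerInstance is a hypothetical helper, not part of ONNX Runtime, and a plain object stands in for SessionOptions so the example runs without the native library.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, not part of ONNX Runtime: runs an action at most once
// per object instance (for example, once per SessionOptions instance).
public sealed class RunOncePerInstance<T> where T : class
{
    // Keys are held weakly, so entries disappear when an instance is garbage-collected.
    private readonly ConditionalWeakTable<T, object> _done = new ConditionalWeakTable<T, object>();
    private readonly object _gate = new object();

    // Returns true if the action ran now, false if it had already run for this instance.
    // The instance is only marked after the action returns without throwing.
    public bool Run(T instance, Action<T> action)
    {
        if (instance == null) throw new ArgumentNullException(nameof(instance));
        if (action == null) throw new ArgumentNullException(nameof(action));
        lock (_gate)
        {
            if (_done.TryGetValue(instance, out _))
                return false;
            action(instance);
            _done.Add(instance, new object());
            return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var guard = new RunOncePerInstance<object>();
        var options = new object(); // stand-in for a SessionOptions instance
        int registrations = 0;

        guard.Run(options, _ => registrations++);
        guard.Run(options, _ => registrations++); // skipped: already ran for this instance

        Console.WriteLine(registrations); // prints 1
    }
}
```

In the app itself this would be a RunOncePerInstance<SessionOptions> called as guard.Run(options, o => o.RegisterOrtExtensions()). Alternatively, building a fresh SessionOptions for each session and registering once on it avoids shared state entirely, again assuming the duplicate comes from reusing one instance.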
Ok, then that error is a new one and unrelated to the other two. I was hoping this new exception was the cause, but it just looks like a brand-new issue that bricks OnnxRuntime, sigh. We are unable to roll back to 1.18.1, as Flux and SD3-Large models do not run on the lower opset.
It seems to be system dependent: some systems do it and some don't. We have about 3000 concurrent active users, and maybe 4% face this issue. I only have one laptop that does it, and only sometimes, with no rhyme or reason: same OS, same everything. There is no state stored by the app that would affect DirectML initialization; it just seems to be a race condition inside the DML EP during initialization.
Some debugging questions:
Describe the issue
Version 1.19.0
Sometimes when creating an InferenceSession, a "Catastrophic Error" or "Unspecified Error" exception is thrown. After that, no other sessions will work at all until the application is stopped and restarted.
New Unrelated Issue from Version 1.20.0
[ErrorCode:Fail] Trying to add a domain to DomainToVersion map, but the domain is already exist with version range (1, 1000). domain: "com.microsoft.extensions"
This is new in 1.20.0 and happens at random like the other two errors, although it seems to be unrelated per the comments. I upgraded to 1.20.0 to see if the first two errors were resolved, but they have not been, and the upgrade has introduced this new one.
To reproduce
Call new InferenceSession("Model.onnx") with a known working model.
This is extremely hard to replicate, but we are getting plenty of error reports. In most cases it happens the first time after a system reboot; sometimes it just happens randomly.
Urgency
Urgent, live application that has started failing globally
Platform
Windows
OS Version
10 & 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.0
ONNX Runtime API
C#
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
1.19.0