
"QNN Engine is offline." when using a Snapdragon X/NPU #2962

Closed
barealek opened this issue Jan 10, 2025 · 55 comments
Assignees
Labels
core-team-only · Desktop · investigating (Core team or maintainer will or is currently looking into this issue) · possible bug (Bug was reported but is not confirmed or is unable to be replicated.)

Comments

@barealek

barealek commented Jan 10, 2025

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

When trying to run inference with any QNN model on a Snapdragon X Plus laptop, the error below occurs.
[screenshot of the error]

The logs specify that the required CPU/NPU is not found:

{
  "level": "info",
  "message": "\u001b[36m[QnnNativeEmbedder]\u001b[0m QNN API server is not supported on this platform - no valid CPU/NPU found. {\"validCores\":[\"Snapdragon(R) X Elite\"],\"cores\":[\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\",\"Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz\"]}",
  "service": "backend"
}

[screenshot]
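The log above suggests the engine compares detected CPU core names against an allow-list of supported chips. A minimal sketch of that kind of check (core names are taken from the log; the matching logic itself is an assumption, not AnythingLLM's actual code):

```javascript
// Hypothetical sketch of the platform check implied by the log above.
// A strict substring match against "X Elite" would reject Plus chips,
// even though Elite-compiled models can run on them.
const validCores = ["Snapdragon(R) X Elite"];
const detectedCores = Array(10).fill("Snapdragon(R) X 10-core X1P64100 @ 3.40 GHz");

const supported = detectedCores.some((core) =>
  validCores.some((valid) => core.includes(valid))
);

console.log(supported); // false: the Plus core name never contains "X Elite"
```

This would explain why the maintainer's later fix was to widen which chipsets pass validation rather than to change anything about the models themselves.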

Starting AnythingLLM and reproducing the error, the full log looks like this:

{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY ENABLED]\u001b[0m Anonymous Telemetry enabled. Telemetry helps Mintplex Labs Inc improve AnythingLLM.","service":"backend"}
{"level":"info","message":"prisma:info Starting a sqlite pool with 21 connections.","service":"backend"}
{"level":"info","message":"prisma:info Started query engine http server on http://127.0.0.1:51049","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY SENT]\u001b[0m {\"event\":\"server_boot\",\"distinctId\":\"ea8cb903-cdc7-4dbc-898a-f0c70402eefb\",\"properties\":{\"runtime\":\"desktop\"}}","service":"backend"}
{"level":"info","message":"Skipping preloading of AnythingLLMOllama - LLM_PROVIDER is qnnengine.","service":"backend"}
{"level":"info","message":"Hot loading of QnnEngine - LLM_PROVIDER is qnnengine with model llama_v3_2_3b_chat_8k.","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 4096","service":"backend"}
{"level":"info","message":"\u001b[36m[CommunicationKey]\u001b[0m RSA key pair generated for signed payloads within AnythingLLM services.","service":"backend"}
{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"[production] AnythingLLM Standalone Backend listening on port 3001. Network discovery is disabled. NPU Detected: false","service":"backend"}
{"level":"info","message":"\u001b[36m[BackgroundWorkerService]\u001b[0m Feature is not enabled and will not be started.","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 4096","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"error","message":"Error: QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.\n    at s.checkReady (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:1604)\n    at async s.streamGetChatCompletion (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:2741)\n    at async SL (C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:2892)\n    at async C:\\Users\\aleks\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:4507","service":"backend"}

Are there known steps to reproduce?

No response

@barealek barealek added the possible bug label Jan 10, 2025
@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

Same issue on snapdragon x elite for me

@timothycarambat
Member

Same issue on snapdragon x elite for me

Is this after downloading a model? Also have you tried a reboot post-download of the model?

Second, @barealek - I just got confirmation that we can run Elite compiled models on Plus chipsets, so we will patch that and re-release 1.7.2

@lachlanharrisdev

Is this after downloading a model? Also have you tried a reboot post-download of the model?

I downloaded and tried to run a model, choosing the Qualcomm LLM provider and NPU embedder, but it came up with the error. It still failed after fully rebooting the app and restarting my computer; I then tried all of the same things after uninstalling and reinstalling the app, which still didn't work.

I haven't done any additional setup of the NPU or anything outside of AnythingLLM, so I'm wondering if there is some driver(s) I'm missing? I'll let the experts figure it out.

@timothycarambat
Member

timothycarambat commented Jan 11, 2025

@lachlanharrisdev - we just pushed a new build for arm64 1.7.2-r2-arm64 (version is located in top right of app window). If you don't have that version installed, download the new build and you should be okay now.

Also, what device + chipset are you on? Plus, Elite, etc.?

@timothycarambat timothycarambat self-assigned this Jan 11, 2025
@timothycarambat timothycarambat added the core-team-only, Desktop, and investigating labels Jan 11, 2025
@lachlanharrisdev

@timothycarambat I've just installed the new build and it's still failing but it's behaving differently. After I upgraded to the new version and sent a chat, it came up with the error QNN Engine is booting. Please wait for it to finish and try again. I gave it a couple seconds, and sent a chat again, and after ~14 seconds of loading, it came up with the error from before, QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.

What I noticed is that when booting up AnythingLLM, right before the loading screen switches to the home UI, I can see a task pop up for a split second in task manager called "AnythingLLMQnnEngine", but it seems to end itself very quickly. Same task also pops up after I send a chat, after the QNN Engine "boots", but then again it quickly closes itself.

I'm currently on a Surface Laptop 7 15", running the X elite X1E-80-100.

@timothycarambat
Member

timothycarambat commented Jan 11, 2025

@lachlanharrisdev I wrote this up to debug the engine directly (app should be closed)
https://docs.google.com/document/d/1Uk9WKCXz0a6tuKeWbaoSD1gDUGglBVycNgJBsDZJB2k/edit?usp=sharing

I have the same chipset on a Dell Latitude.

@lachlanharrisdev

@timothycarambat yep, that found the issue

[WARN]  "Unable to initialize logging in the backend."
[ERROR] "Could not initialize backend due to error = 4000"
[ERROR] "Qnn initializeBackend FAILED!"
Failure to initialize model
Error: Failed to create the Genie Dialog.

If it's relevant, this was using llama 3.1 8b, not 3.2 3b.

@timothycarambat
Member

@lachlanharrisdev Now this is a very different issue from the others, then. If you run the command as administrator, does it still fail to initialize? I'm wondering how/why you would require admin to execute the LLM engine. Someone else had success with that, and I have to determine why that would ever be the case for anyone, since admin rights should not be required to start the QNN LLM API.

@timothycarambat
Member

The recent patch seemed to solve most of the issues people had (most Plus support was not enabled), but this is certainly something different.

@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

If you run the command as administrator does it still fail to initialize?

@timothycarambat nope, running it as admin now works and I do see QNN running on localhost.

[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 609845760 across 10 buffers"
AnythingLLMQnnEngine API Server: Starting chat API on host 127.0.0.1:8000
Build: 1.0.1 80b0117 Fri Dec 27 13:07:34 2024

I tried running AnythingLLM as administrator and, after the QNN engine boots, I can successfully chat. This works for me, but I'm more than happy to keep testing things out for you; I'd love to contribute in any way I can. Should we create a new issue and continue there?

@AlphaEcho11

@lachlanharrisdev - we just pushed a new build for arm64 1.7.2-r2-arm64 (version is located in top right of app window). If you don't have that version installed, download the new build and you should be okay now.

Build 1.7.2-r2-arm64 seems to be working well while running in Administrator mode.
The QNN engine still appears to fail, especially in build 1.7.2-arm64, with an additional boot failure on port 8080 (on some devices; cannot confirm all SoCs in play).

Happy hunting, everyone!

@lachlanharrisdev

lachlanharrisdev commented Jan 11, 2025

@timothycarambat I've just restarted my PC and now it seems to no longer work even with administrator mode... I'm guessing the same QNN Engine instance stayed online from the instructions in the Google Doc, and AnythingLLM used that instance instead of booting another one (if that's even possible; I know barely anything about AI and Qualcomm). Hopefully that clears up any confusion.

@AlphaEcho11 interesting, what device are you using? Just wondering if this is only a surface laptop thing

@AlphaEcho11

@lachlanharrisdev - Surface Pro 11 here, on the X Elite. After several device reboots and AnythingLLM refreshes, it's been working without issue.
What's the output of the backend logs when you have the device rebooted and attempting to get QNN engine up? Curious if it's failing or something further. Thanks in advance!

@barealek
Author

From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different

I am still having issues, even when launching as an administrative account. It seems like it's starting up now, I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here's my logs:
backend-2025-01-11.log

@AlphaEcho11

From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different

I am still having issues, even when launching as an administrative account. It seems like it's starting up now, I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here's my logs:
backend-2025-01-11.log

Thank you for the logs! Yes, seeing the QNN engine fail to get online here; going to check one more area and see if another variable is at play.

@AlphaEcho11

From the recent patch that seemed to solve most issues people had (most Plus support was not enabled) but this is certainly something different

I am still having issues, even when launching as an administrative account. It seems like it's starting up now, I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here's my logs: backend-2025-01-11.log

Can you reattempt this with the 8B model as well? Following @timothycarambat's previous recommendations and tweaking:

  1. Download & unpack the model
  2. Restart Anything LLM (in administrator mode)
  3. Load up workspace running the Qualcomm QNN engine with the model requested
  4. Watch for NPU performance, check backend log for QNN engine data

Let us know the results!

@SpontaneousDuck

Surface Laptop 7 with X Elite chip here. Just spun up 3b on version v1.7.2-r2-arm64 and had the same error. Restarted in Administrator mode and it worked! Restarted in user mode and it failed again. Seems to only work in admin mode currently.

@lachlanharrisdev

Here are my backend logs with the 8b model loaded, after launching as administrator. No luck; it still tells me to reboot the QNN Engine or AnythingLLM. Task Manager did, however, show that the QNN Engine was online and processing (between 20-40% CPU usage).

BUT when I switched to the 3b model, launching as administrator did work. @AlphaEcho11 @SpontaneousDuck would you be able to try running the 8b model and see if it completely fails for you as well (in administrator)? Maybe we're dealing with two separate problems?

backend-2025-01-12.log

@SpontaneousDuck

Same performance with 8b for me! Won't work in user mode, works fine on NPU with Admin mode.

@timothycarambat
Member

timothycarambat commented Jan 11, 2025

@lachlanharrisdev One detail worth mentioning is the memory requirements for preloading the model can be a lot for some devices - ARM64 is unified memory and the NPU has lower memory bandwidth than what the CPU can leverage.

This is why you can run larger models on the CPU but not on the NPU - the NPU has less available to use. I don't recall seeing this in the thread - but how much RAM is available on the system? These are 8K context window models, so it can be pretty demanding. Perhaps we can publish the default 4K context models to save on memory.

However, if you are, for example, on a 16GB RAM device - the 8B with 8K context can be too large and fail to allocate. You can see this by doing the debugging process of:
https://docs.google.com/document/d/1Uk9WKCXz0a6tuKeWbaoSD1gDUGglBVycNgJBsDZJB2k/edit?usp=sharing

The devices I have are 32GB memory - so pretty large. It may not be the end cause, but it is a detail for sure.
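As a rough back-of-the-envelope illustration of why an 8B model with an 8K context can strain a 16GB unified-memory device (all figures and shapes below are assumed for illustration only, not taken from the thread or from AnythingLLM):

```javascript
// Rough memory estimate: weights plus the KV cache that grows with context.
// kvCache = 2 (K and V) * layers * ctx * kvHeads * headDim * bytesPerValue.
const GiB = 1024 ** 3;

function estimateGiB({ params, bytesPerParam, layers, ctx, kvHeads, headDim, kvBytes }) {
  const weights = params * bytesPerParam;
  const kvCache = 2 * layers * ctx * kvHeads * headDim * kvBytes;
  return (weights + kvCache) / GiB;
}

// Llama-3.1-8B-like shape at 4-bit weights, fp16 KV, 8K context (illustrative):
const est = estimateGiB({
  params: 8e9,
  bytesPerParam: 0.5,
  layers: 32,
  ctx: 8192,
  kvHeads: 8,
  headDim: 128,
  kvBytes: 2,
});

console.log(est.toFixed(1)); // "4.7" (GiB) for the model alone, before runtime overhead
```

Even under these optimistic assumptions, the model plus the OS, app, and embedder can leave little headroom on a 16 GB device, while a 32 GB device has plenty.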

Outside of that, the admin mode detail is odd as I cannot replicate that error. If that is being encountered the following questions would help to be answered:

  • Is your device administered by a corporation/IT or remotely managed?
  • Does your device have multiple user accounts on it?

That would help make headway that way.

@timothycarambat
Member

Going to close this, since this thread has multiple answers and ways to debug, but I'm going to pin it so it is not duplicated. Will keep the conversation open for now until we know the solution for sure. It might just be something solvable with documentation.

@timothycarambat timothycarambat pinned this issue Jan 11, 2025
@timothycarambat timothycarambat changed the title [BUG]: "QNN Engine is offline." when using a Snapdragon X Plus laptop [BUG]: "QNN Engine is offline." when using a Snapdragon X Jan 11, 2025
@1key

1key commented Jan 11, 2025

I'm still having issues. What I've tried:

Error is "QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app."

On a Lenovo Yoga Slim 9x with a Snapdragon X Elite (with 32GB memory)

Edit:
It is a company administrated laptop (my own company, I'm the administrator). No other users on the system.

@1key

1key commented Jan 11, 2025

Here is my log:

{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY ENABLED]\u001b[0m Anonymous Telemetry enabled. Telemetry helps Mintplex Labs Inc improve AnythingLLM.","service":"backend"}
{"level":"info","message":"prisma:info Starting a sqlite pool with 25 connections.","service":"backend"}
{"level":"info","message":"prisma:info Started query engine http server on http://127.0.0.1:56827","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY SENT]\u001b[0m {\"event\":\"server_boot\",\"distinctId\":\"11aa8bfa-4c82-47c5-a7f5-2f23ed2a68c0\",\"properties\":{\"runtime\":\"desktop\"}}","service":"backend"}
{"level":"info","message":"Skipping preloading of AnythingLLMOllama - LLM_PROVIDER is qnnengine.","service":"backend"}
{"level":"info","message":"Hot loading of QnnEngine - LLM_PROVIDER is qnnengine with model llama_v3_2_3b_chat_8k.","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 8192","service":"backend"}
{"level":"info","message":"\u001b[36m[CommunicationKey]\u001b[0m RSA key pair generated for signed payloads within AnythingLLM services.","service":"backend"}
{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"[production] AnythingLLM Standalone Backend listening on port 3001. Network discovery is disabled. NPU Detected: false","service":"backend"}
{"level":"info","message":"\u001b[36m[BackgroundWorkerService]\u001b[0m Feature is not enabled and will not be started.","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 8192","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Boot failure for port 8080","service":"backend"}
{"level":"error","message":"Error: QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.\n    at s.checkReady (C:\\Users\\rob\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:1604)\n    at async s.streamGetChatCompletion (C:\\Users\\rob\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:31:2741)\n    at async SL (C:\\Users\\rob\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:2892)\n    at async C:\\Users\\rob\\AppData\\Local\\Programs\\AnythingLLM\\resources\\backend\\server.js:236:4507","service":"backend"}

One thing the log shows is 'NPU Detected: false'.
But when using the Google Doc to test QNN, the webpage shows "Qnn Engine is running.", and while the process is starting, Windows Task Manager also shows some load on the NPU.
So it seems that AnythingLLM doesn't run a correct test to check for an NPU.

@AlphaEcho11

Here is my log:

[… full log quoted above …]

One thing the log shows is 'NPU detected: False'. But when using the Google Doc to test QNN, the webpage shows "Qnn Engine is running." and while the process is starting, Windows Taskmanager is showing some load on the NPU also. So it seems that AnythingLLM doesn't do a correct test to check for an NPU.

@1key , are you running AnythingLLM on v1.7.2-r2-arm?
The issues with the boot failure of port 8080, the NPU not recognized, and then the QNN engine failing to come online all indicate you're not on the revision yet. Can you confirm?

@AlphaEcho11

32 GB of memory for me, X1E-80-100. Not administered by a company, and although I don't have any other user accounts, I did accidentally choose to install AnythingLLM for all users.

When I'm not running as administrator, I can see that QNN goes offline right after managing citations:

{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Got 1 embeddings that are 384 long","service":"backend"}
{"level":"info","message":"\u001b[36m[fillSourceWindow]\u001b[0m Need to backfill 4 chunks to fill in the source window for RAG!","service":"backend"}
{"level":"info","message":"\u001b[36m[fillSourceWindow]\u001b[0m Citations backfilled to 4 references from 0 original citations.","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 1/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 2/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 3/5","service":"backend"}
{"level":"error","message":"Error: QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.\n

Or, when there are no citations, it goes offline right after initializing QNN with a model:

{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m NPU is enabled, using QnnSDK backend at C:\\Users\\lachi\\AppData\\Local\\Programs\\AnythingLLM\\resources\\QnnSDK\\aarch64-windows-msvc\\QnnHtp.dll","service":"backend"}
{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Initialized all-minilm-l6-v2 model","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 8192","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 1/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 2/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 3/5","service":"backend"}
{"level":"error","message":"Error: QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.\n

Here's what the logs look like after doing the same thing but running as administrator. QNN goes offline, but the second attempt to reboot it works:

{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Initialized all-minilm-l6-v2 model","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 8192","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 1/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 2/5","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY SENT]\u001b[0m {\"event\":\"sent_chat\",\"distinctId\":\"d0fd9e5f-1971-49f9-8d86-3ca70f2b4d85\",\"properties\":{\"multiUserMode\":false,\"LLMSelection\":\"anythingllm_ollama\",\"Embedder\":\"qnn-native\",\"VectorDbSelection\":\"lancedb\",\"multiModal\":false,\"TTSSelection\":\"native\",\"runtime\":\"desktop\"}}","service":"backend"}
{"level":"info","message":"\u001b[32m[Event Logged]\u001b[0m - sent_chat","service":"backend"}

So from what I see, something's happening on the second attempt at booting the QNN Engine that requires administrator, and regardless, something is causing QNN to go offline after initializing a model.
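The "QNNEngine offline - retrying. n/5" lines above look like a simple readiness poll against the engine's local port. A minimal sketch of that pattern (the function name, port, and timings are guesses based on the logs, not AnythingLLM's actual implementation):

```javascript
// Hypothetical sketch of a readiness poll: hit the engine's local port a few
// times before giving up, as the "retrying. n/5" log lines suggest.
async function checkReady(url = "http://127.0.0.1:8080", retries = 5, delayMs = 1000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url); // Node 18+ global fetch
      if (res.ok) return true;
    } catch {
      console.log(`QNNEngine offline - retrying. ${attempt}/${retries}`);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app.");
}
```

If the engine process exits immediately after spawning (as the Task Manager observation earlier in the thread suggests), every poll fails and the final error is exactly the one users see.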

I also tested installing for 'all users' on the device (which, in reality, is only myself... one-man team! But I digress) and tested the app.
It seems the new revision might no longer have this prompt, but I'd have to have @timothycarambat validate that. The 1.7.2-arm installer is still good for a system-wide local installation.
However, under 1.7.2-arm, I was not able to get the QNN engine to come online regardless of whether the installation was for the current user or system-wide.

I may test via CMD at some point, if we're still uncertain whether a per-user vs. system-wide installation has any bearing on this.

"C:\Users\$USER$\Downloads\AnythingLLMDesktop-v1.7.2-r2-Arm64.exe" /allusers

@timothycarambat
Member

Indeed, 1.7.2-r2 removes this option for system-wide vs. current-user installs. This was one way I was able to replicate the boot failure, so I eliminated the option to prevent fresh installs from coming across it. However, the QNN binary is always bundled in the app, so a reinstall should be using the executable installed from the installer and have the appropriate permissions.

As I know it, the only reproduction I could emulate on a totally formatted machine was when I installed for all users on the machine and not as the current user.

Outside of that, anything else needs to be investigated or reproduced. If the engine can run via terminal, but not when invoked by the AnythingLLM application thread, then that is at least a starting point, but I cannot replicate it on the machines I have locally atm.

@AlphaEcho11

Indeed 1.7.2-r2 removes this option for system wide vs current user installs. This was one way I was able to replicate the boot failure so I elimated the option to prevent fresh installs from coming across it. However the qnn binary is always bundled in the app, so a reinstall should be using the executable installed from the installer and have the appropriate permissions.

As I know it, the only reproduction I could emulate on a totally formatted machine was when I installed for all users on machine and not as the current user.

Outside of that, anything else needs to be investigated or reproduced. If the engine can run via terminal, but not when invoked by the AnythingLLM application thread, then that is at least a starting point, but I cannot replicate on the machines I have locally atm

I'm currently re-installing under all users with 1.7.2-r2-arm, I'll let y'all know the results of my testing soon.

@AlphaEcho11

Installed system-wide with 1.7.2-r2-arm, and we're good on NPU usage, inference all running smoothly:
[screenshot: 2025-01-11 195838]

@AlphaEcho11

Having a difficult time replicating this, again...
I ran the 1.7.2-r2-arm installer system-wide, both under 'conventional' x86 as well as x64. Both times were successful in launching the app and calling the QNN engine once everything else was met.
Granted, on both instances I did manage to capture in the backend log that the QNN engine was offline initially; however, it only retried 2-3 times before successfully coming online, and inference was successful with the NPU showing usage.

The theory I had before this was: could the x86 RAM ceiling somehow be causing the QNN engine to fail prematurely, at least during load-up of the model?
I'm reading just slightly over 7 GB of RAM in use when AnythingLLM and the QNN engine are loaded, particularly with the 8B Llama model in the QNN engine. I'll assume the 3B model will take ~5 GB of RAM, so we're hitting x86 thresholds. But, just a hunch.

@timothycarambat
Member

@AlphaEcho11 This is great analysis. I do think some issues people are running into are RAM-related, but not so much on the larger 32 GB devices. That being said, the app is shipped without needing x86 emulation. In Services you can see the app is running as Arm64, not x86; however, the idea of RAM availability still very much applies either way.

There is another, and I think more common, factor: people using IT-managed/BitLocker devices. These often have many more controls around running applications, and since we had to build our own C++ interface to get this all to work, I am theorizing the QNN C++ library is probably blocked on some devices. That might explain why it works in admin mode but not when run as the user, and also why people with a sole-user system on a "non-corporate" device can run it without issue.

In this case, there is an "IT acceptance" factor, but the app is also not codesigned, which for sure is going to be blocked. So now it is more prudent than before for us to get that done and the pipeline set up. It needed to be done ages ago anyway. I'll make that a task for next week.

@AlphaEcho11

AlphaEcho11 commented Jan 12, 2025

There is another, what I think is more common, factor - people using IT-managed/BitLocker devices. These often have much more controls around running applications and since we had to build our own C++ interface to get this all to work - I am theorizing the QNN C++ library is probably blocked for some devices. That might explain the reason it works in admin mode, but not when run as the user and also why people with a sole-user system on a "non-corporate" device can run it without issue.

Suspended the BitLocker service on the local device, and launched the app as a system administrator (confirmed with `net localgroup administrators`) on a non-corporate device, but without 'Run as administrator' set. Launched AnythingLLM and am still receiving the QNN engine failure.
However, it runs perfectly fine when launching under the administrator account with the program run as administrator.
I think there's something else at play here.

@1key

1key commented Jan 12, 2025

@1key, are you running AnythingLLM on v1.7.2-r2-arm? The boot failure on port 8080, the NPU not being recognized, and the QNN engine failing to come online all indicate you're not on that revision yet. Can you confirm?

Yes, I'm 100% running R2.

@timothycarambat
Member

@AlphaEcho11 From re-reading the comments, though, it sounds like you installed the app "For all users on this computer" rather than "current user" only. The 1.7.2-r2 installer forces a current-user-only install and removed the option, so that shouldn't be possible any longer.

I ran the 1.7.2-r2-arm installer system-wide,

This line made me think that. A system-wide install would put the QNN LLM executable at a different permission level, since it was installed by an admin, thus requiring admin to spawn it.

I want to try to eliminate as many variables as possible, is all.

@talynone

talynone commented Jan 12, 2025

@1key, are you running AnythingLLM on v1.7.2-r2-arm? The boot failure on port 8080, the NPU not being recognized, and the QNN engine failing to come online all indicate you're not on that revision yet. Can you confirm?

Yes, I'm 100% running R2.

Just wanted to add I just installed AnythingLLM 30 minutes ago and I'm also running on a Lenovo Slim 7x and have the exact same symptoms/log errors as you (even running as Administrator).

image

@AlphaEcho11

@AlphaEcho11 From re-reading the comments, though, it sounds like you installed the app "For all users on this computer" rather than "current user" only. The 1.7.2-r2 installer forces a current-user-only install and removed the option, so that shouldn't be possible any longer.

I ran the 1.7.2-r2-arm installer system-wide,

This line made me think that. A system-wide install would put the QNN LLM executable at a different permission level, since it was installed by an admin, thus requiring admin to spawn it.

I want to try to eliminate as many variables as possible, is all.

I forced the install on the local system to run the .exe under all users, using:
`"file_path_1.7.2-r2-arm.exe" /all users`

I was attempting to see if there were any hiccups with installing this version for any/all users on the system. But I couldn't get any replicated issues.

The previous installs I had done for v1.7.2-arm were both system-wide and single-user, and the single-user install was under my personal administrator profile on the PC. So v1.7.2-arm was still failing to run the QNN engine, but also had the boot failure on port 8080, as well as the NPU not being detected (with an X Elite SoC).

@timothycarambat
Member

@talynone - what device are you on? It seems some kind of Unicode symbol is attached to cpu.model instead of plaintext; e.g. instead of (R) it uses ®.

Patching to r3 right now with a fix for this detection, plus added logging.
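
For what it's worth, here is a minimal sketch of the kind of Unicode-tolerant matching that could handle this. All names and the prefix list are hypothetical; this is not necessarily how the r3 patch is actually implemented:

```javascript
// Illustrative only: canonicalize OEM model strings so "®"/"™" variants
// compare equal to their "(R)"/"(TM)" plaintext forms.
function normalizeModel(model) {
  return model
    .replace(/\u00AE/g, "(R)")   // "®"  -> "(R)"
    .replace(/\u2122/g, "(TM)")  // "™" -> "(TM)"
    .replace(/\s+/g, " ")
    .trim();
}

// Check whether any reported core matches a known-valid prefix
// (the prefix list here is a made-up example).
function hasValidCore(cores, validPrefixes) {
  const prefixes = validPrefixes.map(normalizeModel);
  return cores.some((core) =>
    prefixes.some((p) => normalizeModel(core).startsWith(p))
  );
}
```

The idea is simply to canonicalize both the reported cores and the allow-list before comparing, so a ® reported by one OEM no longer fails a match against (R).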

@talynone

talynone commented Jan 12, 2025

@talynone - what device are you on? It seems some kind of Unicode symbol is attached to cpu.model instead of plaintext; e.g. instead of (R) it uses ®.

Patching to r3 right now with a fix for this detection, plus added logging.

As mentioned before, it's the Lenovo Yoga Slim 7x. Thanks.

image

@timothycarambat
Member

timothycarambat commented Jan 12, 2025

@talynone Okay 1.7.2-r3 is now live - this should be able to parse your CPU model correctly as the OEM reports it. Please download and run the app and see if the engine boots. Additionally, post the backend logs as well.
https://cdn.useanything.com/latest/AnythingLLMDesktop-Arm64.exe

@talynone

@talynone Okay 1.7.2-r3 is now live - this should be able to parse your CPU model correctly as the OEM reports it. Please download and run the app and see if the engine boots. Additionally, post the backend logs as well. https://cdn.useanything.com/latest/AnythingLLMDesktop-Arm64.exe

Thanks! It seems to work now, even in non-admin mode (though I ran it in admin mode at first). Backend log attached. @1key, you should try the new build too since you also have the Lenovo Slim 7x.

backend-2025-01-12.zip

@SpontaneousDuck

Using the new r3, it's still not starting in user mode on my Surface Laptop 7. Logs below:

{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY ENABLED]\u001b[0m Anonymous Telemetry enabled. Telemetry helps Mintplex Labs Inc improve AnythingLLM.","service":"backend"}
{"level":"info","message":"prisma:info Starting a sqlite pool with 25 connections.","service":"backend"}
{"level":"info","message":"prisma:info Started query engine http server on http://127.0.0.1:53377","service":"backend"}
{"level":"info","message":"\u001b[32m[TELEMETRY SENT]\u001b[0m {\"event\":\"server_boot\",\"distinctId\":\"cd10a950-d7f5-41e0-8da7-154d589c0101\",\"properties\":{\"runtime\":\"desktop\"}}","service":"backend"}
{"level":"info","message":"Skipping preloading of AnythingLLMOllama - LLM_PROVIDER is qnnengine.","service":"backend"}
{"level":"info","message":"Hot loading of QnnEngine - LLM_PROVIDER is qnnengine with model llama_v3_2_3b_chat_8k.","service":"backend"}
{"level":"info","message":"\u001b[36m[NativeEmbedder]\u001b[0m Initialized","service":"backend"}
{"level":"info","message":"\u001b[36m[QNN Engine]\u001b[0m Initialized with model: llama_v3_2_3b_chat_8k. Context window: 8192","service":"backend"}
{"level":"info","message":"\u001b[36m[CommunicationKey]\u001b[0m RSA key pair generated for signed payloads within AnythingLLM services.","service":"backend"}
{"level":"info","message":"\u001b[36m[EncryptionManager]\u001b[0m Loaded existing key & salt for encrypting arbitrary data.","service":"backend"}
{"level":"info","message":"[production] AnythingLLM Standalone Backend listening on port 3001. Network discovery is disabled. NPU Detected: true OS: windows (arm64)","service":"backend"}
{"level":"info","message":"\u001b[36m[BackgroundWorkerService]\u001b[0m Feature is not enabled and will not be started.","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 1/5","service":"backend"}
{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m NPU is enabled, using QnnSDK backend at C:\\Users\\k.witham\\AppData\\Local\\Programs\\AnythingLLM\\resources\\QnnSDK\\aarch64-windows-msvc\\QnnHtp.dll","service":"backend"}
{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Checking for valid NPU cores.","service":"backend"}
{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Checking core Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz - MATCH","service":"backend"}
{"level":"info","message":"\u001b[36m[QnnNativeEmbedder]\u001b[0m Initialized all-minilm-l6-v2 model","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 2/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 3/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 4/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 5/5","service":"backend"}

@timothycarambat
Member

timothycarambat commented Jan 12, 2025

@SpontaneousDuck What is your device + Specs?

AnythingLLM Standalone Backend listening on port 3001. Network discovery is disabled. NPU Detected: true OS: windows (arm64)","service":"backend"}

That means we detect a valid NPU - so everything is good to go.

{"level":"info","message":"QNNEngine offline - retrying. 2/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 3/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 4/5","service":"backend"}
{"level":"info","message":"QNNEngine offline - retrying. 5/5","service":"backend"}

However, this means the engine did not boot within the given fixed timeframe (which is quite long). That timeframe can be increased if your device has less available RAM than others, but I want to confirm that first.

One way to test that is to run the Google Drive debugging doc posted in this thread.
https://docs.google.com/document/d/1Uk9WKCXz0a6tuKeWbaoSD1gDUGglBVycNgJBsDZJB2k/edit?usp=sharing
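
For illustration, the retry behavior those log lines suggest could look something like the bounded health-check loop below. This is a sketch with hypothetical names; the real backend's retry count, interval, and health check may differ:

```javascript
// Poll a health check until the engine responds or the retry budget runs out.
// Mirrors the "QNNEngine offline - retrying. n/5" log lines above.
async function waitForEngine(healthCheck, { retries = 5, delayMs = 3000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    if (await healthCheck()) return true; // engine is online
    console.log(`QNNEngine offline - retrying. ${attempt}/${retries}`);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // engine never came online within the window
}
```

A slower machine that needs, say, 12 seconds to load the 8B model would exhaust a 5-attempt budget with a short interval, which is why bumping the timeframe can matter on low-RAM devices.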

@SpontaneousDuck

@timothycarambat I have a Surface Laptop 7 with an X Elite and 32GB of RAM. I posted the results of running the debugging steps below. Same behavior: it errors out in user mode but starts in admin mode. It did take quite a while to start in admin mode; the 3b model took about 8 seconds to start and the 8b about 12 seconds (by my rough counting).

❯ AnythingLLMQnnEngine.exe --config llama_v3_1_8b_chat_8k
[WARN]  "Unable to initialize logging in the backend."
[ERROR] "Could not initialize backend due to error = 4000"
[ERROR] "Qnn initializeBackend FAILED!"
❯ sudo AnythingLLMQnnEngine.exe --config llama_v3_1_8b_chat_8k
[INFO]  "Using create From Binary"
[INFO]  "Allocated total size = 609845760 across 10 buffers"
AnythingLLMQnnEngine API Server: Starting chat API on host 127.0.0.1:8000
Build: 1.0.1 80b0117 Fri Dec 27 13:07:34 2024

@timothycarambat
Member

timothycarambat commented Jan 12, 2025

Is this a personal device, or a corporate-managed device you have admin perms on? I have an X Elite with 32GB and that is an insanely long boot time. If you want to try: does running AnythingLLM as admin also result in a failure to boot the QNN Engine?

@SpontaneousDuck

@timothycarambat It is a corporate-managed device with BitLocker that I have admin privileges for. The controls are pretty light, though, and basically just enforce company login and BitLocker. Running AnythingLLM as admin does work just fine 😊

@timothycarambat
Member

Obviously not ideal, but at least that's a workable interim path while we figure out what is blocking running as the current user.

Just another data point to work with.

@AlphaEcho11

@talynone Okay 1.7.2-r3 is now live - this should be able to parse your CPU model correctly as the OEM reports it. Please download and run the app and see if the engine boots. Additionally, post the backend logs as well. https://cdn.useanything.com/latest/AnythingLLMDesktop-Arm64.exe

Surface Pro 11 | Snapdragon X Elite X1E-80-100

backend-2025-01-12.log
Updated to 1.7.2-r3 and ran it first as admin, then relaunched without admin, then again as admin. The QNN engine initialized successfully each time r3 ran as admin, but failed (as seen in the backend logs) during the second, non-admin run of the app.
Watching Services in real time on the host PC showed the QNN engine briefly running as a service, then dropping, when run as non-admin.

Quite interesting, but still able to run. @timothycarambat if you want me to pull anything else beyond the backend log, just let me know!

@snickler

snickler commented Jan 15, 2025

To tag on to @AlphaEcho11's discovery - here are bits and pieces of the crash dump from my AnythingLLMQnnEngine.exe process running 1.7.2-r3. The NPU feature works only when AnythingLLM is executed elevated on my Surface Laptop 7 (X1E-80-100).

Dump
CONTEXT:  (.ecxr)
 x0=0000000000000023   x1=000000b1f030c920   x2=0000000000000000   x3=00007ffaf2eefef0
 x4=000000b1f030c9b0   x5=00007ffaf2eeff80   x6=0000000000000000   x7=000000000000016b
 x8=0000000000000000   x9=00007ffaf2eef9f0  x10=000000b1f030c9d0  x11=00007ffaf2cd0208
x12=00007ffaf2eb7d24  x13=0000000000000001  x14=0000000000000010  x15=0000000000000000
x16=00008ec0075ed2a2  x17=000000b1f030c320  x18=000000b1f042d000  x19=000000000000001d
x20=000000b1f1000000  x21=000000b1f11305a0  x22=0000000000000000  x23=000000000000001d
x24=000000b1f030caa0  x25=000000000000001d  x26=000000b1f030ea10  x27=00007ffaf2ef5000
x28=0000000000000000   fp=000000b1f030bea0   lr=00007ffaf2c6d3c0   sp=000000b1f030bea0
 pc=00007ffaf2c6d3cc  psr=60001040 -ZC- EL0
ntdll!RtlpHeapFatalExceptionFilter+0x1c:
00007ffa`f2c6d3cc d43e0060 brk         #0xF003
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffaf2c6d3cc (ntdll!RtlpHeapFatalExceptionFilter+0x000000000000001c)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000023
Subcode: 0x23 FAST_FAIL_UNEXPECTED_HEAP_EXCEPTION 

PROCESS_NAME:  AnythingLLMQnnEngine.exe

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

EXCEPTION_CODE_STR:  c0000409

EXCEPTION_PARAMETER1:  0000000000000023

STACK_TEXT:  
000000b1`f030bea0 00007ffa`f2cd2e70     : 000000b1`f030bf10 d44a7ffa`f2cd2e70 000000b1`f030bf10 206bfffa`f2cd0d38 : ntdll!RtlpHeapFatalExceptionFilter+0x1c
000000b1`f030beb0 00007ffa`f2cd0d38     : 000000b1`f030bf10 206bfffa`f2cd0d38 00000000`00000000 00007ffa`f2eb7d38 : ntdll!RtlFreeHeap$filt$0+0x18
000000b1`f030bec0 00007ffa`f2c50dec     : 00000000`00000000 00007ffa`f2eb7d38 00007ffa`f2eb7d38 000000b1`f030c030 : ntdll!_C_ExecuteExceptionFilter+0x38
000000b1`f030bf20 00007ffa`f2cd0270     : 000000b1`f030bf80 c96bfffa`f2cd0270 000000b1`f030c9d0 000000b1`f030c620 : ntdll!_C_specific_handler+0xbc
000000b1`f030bf80 00007ffa`f2cd0ac4     : 000000b1`f030bfb0 4f647ffa`f2cd0ac4 00000000`00000000 000000b1`f030c9d0 : ntdll!_GSHandlerCheck_SEH+0x68
000000b1`f030bfb0 00007ffa`f2bb37fc     : 000000b1`f030c5c0 00007ffa`f2bb37fc 000000b1`f030c030 00000000`00000000 : ntdll!RtlpExecuteHandlerForException+0x14
000000b1`f030bfd0 00007ffa`f2cd0924     : 00000000`00000000 00000000`00000000 000000b1`f030c020 00000000`00000000 : ntdll!RtlDispatchException+0x2e4
000000b1`f030c620 00007ffa`f2b84830     : 80001040`0040000f 000000b1`f1000140 00000000`00000000 00000000`0000001d : ntdll!KiUserExceptionDispatch+0x24
000000b1`f030caa0 00007ffa`eea211d8     : 00000004`006b0076 86b6bced`b08b8014 ffffffff`fffffffe 000000b1`f7f72058 : ntdll!RtlFreeHeap+0x170
000000b1`f030cb70 00007ffa`1f94840c     : 000000b1`f030cbd0 431e7ffa`1f94840c 000000b1`f11265e0 000000b1`f030cc20 : ucrtbase!free_base+0x28
000000b1`f030cb90 00007ffa`1f937a34     : 000000b1`f11265e0 000000b1`f11265e0 000000b1`f11305a0 000000b1`f030f2d8 : Genie!onigenc_strlen_null+0x5603c
000000b1`f030cbd0 00007ffa`1f9066dc     : 000000b1`f030ccc0 00007ffa`1f9066dc ffffffff`fffffffe 000000b1`f030cc20 : Genie!onigenc_strlen_null+0x45664
000000b1`f030cc20 00007ffa`1f8c2bd0     : 000013a5`576ef4c8 ffffffff`fffffffe 00007ffa`1fa95570 000000b1`f030cc20 : Genie!onigenc_strlen_null+0x1430c
000000b1`f030cd00 00007ffa`1f95df98     : 00000000`00000000 00007ffa`1f95df98 000000b1`f030efd0 00007ffa`1f7fab20 : Genie!GenieDialog_embeddingQuery+0x75080
000000b1`f030cd10 00007ffa`1f7fab20     : 000000b1`f030efd0 00007ffa`1f7fab20 000000b1`f030f030 f57ffffa`1f93116c : Genie!onigenc_strlen_null+0x6bbc8
000000b1`f030cd20 00007ffa`1f932874     : 000000b1`f030f030 f57ffffa`1f93116c 000000b1`f030cd90 00007ffa`1f932874 : Genie!onig_builtin_fail+0x9bf00
000000b1`f030cd90 00007ffa`1f9302f8     : 000000b1`f030ce20 1534fffa`1f9302f8 00000003`ffffffff 475bfffa`00000004 : Genie!onigenc_strlen_null+0x404a4
000000b1`f030ce20 00007ffa`1f9322a0     : 000000b1`f030ce60 03277ffa`1f9322a0 000000b1`f030f030 000000b1`f030d590 : Genie!onigenc_strlen_null+0x3df28
000000b1`f030ce60 00007ffa`1f930810     : 000000b1`f030cec0 1e39fffa`1f930810 00000000`00000000 000000b1`f030cfe0 : Genie!onigenc_strlen_null+0x3fed0
000000b1`f030cec0 00007ffa`1f92aef0     : 000000b1`f030cf00 b13efffa`1f92aef0 000000b1`f030f030 000000b1`f030d190 : Genie!onigenc_strlen_null+0x3e440
000000b1`f030cf00 00007ffa`f2cd0ae4     : 000000b1`f030cf40 0c62fffa`f2cd0ae4 000000b1`f030db30 00007ffa`1f650000 : Genie!onigenc_strlen_null+0x38b20
000000b1`f030cf40 00007ffa`f2bb4158     : 000000b1`f030d530 00007ffa`f2bb4158 000000b1`f030cfe0 00000000`00000000 : ntdll!RtlpExecuteHandlerForUnwind+0x14
000000b1`f030cf60 00007ffa`1f930620     : 00000000`00000000 00000000`00000000 000000b1`f030cfb0 00000000`00000000 : ntdll!RtlUnwindEx+0x2e8
000000b1`f030d590 00007ffa`1f93185c     : 00000002`80000029 00000000`00000000 00000000`00000000 00000000`0000000f : Genie!onigenc_strlen_null+0x3e250
000000b1`f030d640 00007ffa`1f931c04     : 000000b1`f030d790 000000b1`f030d940 000000b1`f030d600 29737ffa`1f931100 : Genie!onigenc_strlen_null+0x3f48c
000000b1`f030d6d0 00007ffa`1f932398     : 00000000`00000000 00000000`00000000 00000000`00000000 00007ffa`1fa93e00 : Genie!onigenc_strlen_null+0x3f834
000000b1`f030d820 00007ffa`1f930810     : 000000b1`f030d880 e45afffa`1f930810 00007ffa`1f650000 000000b1`f030d940 : Genie!onigenc_strlen_null+0x3ffc8
000000b1`f030d880 00007ffa`f2cd0ac4     : 000000b1`f030d8c0 d1407ffa`f2cd0ac4 000000b1`f030fa80 730bfffa`00000000 : Genie!onigenc_strlen_null+0x3e440
000000b1`f030d8c0 00007ffa`f2bb37fc     : 000000b1`f030ded0 00007ffa`f2bb37fc 000000b1`f030d940 00000000`00000000 : ntdll!RtlpExecuteHandlerForException+0x14
000000b1`f030d8e0 00007ffa`f2bb2654     : 00000000`00000000 00000000`00000000 000000b1`f030d930 00000020`00000000 : ntdll!RtlDispatchException+0x2e4
000000b1`f030df30 00007ffa`ee405928     : 00000000`00000000 00000000`00000000 000000b1`f030df58 00000000`00000000 : ntdll!RtlRaiseException+0xf4
000000b1`f030e410 00007ffa`1f930134     : 00000081`e06d7363 00000000`00000000 00007ffa`ee405928 00000000`00000004 : KERNELBASE!RaiseException+0x58
000000b1`f030e4c0 00007ffa`1f8c4628     : 000000b1`f030e9b0 a4537ffa`1f8c4628 00007ffa`1f650000 00000000`19930520 : Genie!onigenc_strlen_null+0x3dd64
000000b1`f030e520 00007ffa`1f8c242c     : 000013a5`572439ec ffffffff`ffffffff 000000b1`f4593780 000000b1`f030e5d0 : Genie!GenieDialog_embeddingQuery+0x76ad8
000000b1`f030ea10 00007ffa`1f8c3508     : 000000b1`f7f79aa0 000000b1`f030eea4 00000000`f1002600 000000b1`f4332120 : Genie!GenieDialog_embeddingQuery+0x748dc
000000b1`f030f030 00007ffa`1f88d680     : 000000b1`f030f0f0 00007ffa`1f88d680 ffffffff`fffffffe 000000b1`f113da20 : Genie!GenieDialog_embeddingQuery+0x759b8
000000b1`f030f060 00007ffa`1f86b6dc     : 000000b1`f112bb50 866ffffa`1f80e100 000000b1`f112aa30 ffffffff`fffffffe : Genie!GenieDialog_embeddingQuery+0x3fb30
000000b1`f030f140 00007ffa`1f867b2c     : 000000b1`f030f150 000000b1`f112a8a0 00797261`6d697200 00000000`00000000 : Genie!GenieDialog_embeddingQuery+0x1db8c
000000b1`f030f2a0 00007ffa`1f89e788     : 000000b1`f1001640 000000b1`f1115000 00000000`00000000 000000b1`f1123dd0 : Genie!GenieDialog_embeddingQuery+0x19fdc
000000b1`f030f5f0 00007ffa`1f89eb1c     : 000000b1`f112f890 000000b1`f112f880 ffffffff`fffffffe 000000b1`f1123dd0 : Genie!GenieDialog_embeddingQuery+0x50c38
000000b1`f030f6d0 00007ffa`1f8716c8     : 000000b1`f030f840 00007ffa`1f8716c8 ffffffff`fffffffe 000000b1`f030f6f8 : Genie!GenieDialog_embeddingQuery+0x50fcc
000000b1`f030f740 00007ffa`1f80fdbc     : 00000000`00000000 00000000`00000000 00000063`69736100 00000000`00000000 : Genie!GenieDialog_embeddingQuery+0x23b78
000000b1`f030f8a0 00007ffa`1f8013d8     : 000000b1`f030fa40 00000050`00000006 00000000`00000028 000000b1`f1131048 : Genie!GenieDialog_tokenQuery+0xd6cc
000000b1`f030fa00 00007ff7`5474a9ac     : 000000b1`f030fd90 00007ff7`5474a9ac 00000002`00000002 000000b1`f1131f10 : Genie!GenieDialog_create+0x128
000000b1`f030fa80 00007ff7`5474dc94     : 000000b1`f1109c50 00000000`00000000 00000000`0000005b 00000000`0000005f : AnythingLLMQnnEngine+0x3a9ac
000000b1`f030fdf0 00007ff7`5474dd2c     : 000000b1`f030fe30 4b6e7ff7`5474dd2c 00000000`00000000 00000000`00000000 : AnythingLLMQnnEngine+0x3dc94
000000b1`f030fe30 00007ffa`f24e87a0     : 000000b1`f030fe40 77207ffa`f24e87a0 000000b1`f030fe50 6e47fffa`f2c3f864 : AnythingLLMQnnEngine+0x3dd2c
000000b1`f030fe40 00007ffa`f2c3f864     : 000000b1`f030fe50 6e47fffa`f2c3f864 00000000`00000000 cb7f8000`00000000 : kernel32!BaseThreadInitThunk+0x30
000000b1`f030fe50 00000000`00000000     : 00000000`00000000 cb7f8000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x44


STACK_COMMAND:  ~0s; .ecxr ; kb

SYMBOL_NAME:  ucrtbase!free_base+28

MODULE_NAME: ucrtbase

IMAGE_NAME:  ucrtbase.dll

FAILURE_BUCKET_ID:  FAIL_FAST_UNEXPECTED_HEAP_EXCEPTION_c0000409_ucrtbase.dll!free_base

OS_VERSION:  10.0.27768.1000

BUILDLAB_STR:  rs_prerelease

OSPLATFORM_TYPE:  arm64

OSNAME:  Windows 10

IMAGE_VERSION:  10.0.27768.1000

@timothycarambat
Member

@snickler What is your RAM availability on the machine?

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application.

That would indicate the model could not be loaded into memory and was therefore thrown out and dumped, for security reasons that don't apply when running as admin.

@snickler

@snickler What is your RAM availability on the machine?

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application.

That would indicate the model could not be loaded into memory and was therefore thrown out and dumped, for security reasons that don't apply when running as admin.

Naturally, I can't reproduce the crash dump when I need to, but I still receive the "QNN Engine is offline" error when running as a regular user. I have 15GB available out of 32GB.

As I was typing out this information, I decided to use procdump to capture a memory dump when the AnythingLLMQnnEngine.exe process crashed.

results
COMMENT:  
*** procdump  -accepteula -ma -e -w -l AnythingLLMQnnEngine.exe
*** Unhandled exception: C0000005.ACCESS_VIOLATION

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

CONTEXT:  (.ecxr)
 x0=000000b482144540   x1=0000000000000000   x2=000000b485200440   x3=0000000000000000
 x4=0000000000000000   x5=0000000000000000   x6=0000000000000000   x7=0000000000000000
 x8=000291000000008f   x9=000000b485222010  x10=0000000000000001  x11=0000005800020040
x12=000000b4820018c0  x13=0000000000000000  x14=0000000000000000  x15=00007ffa18444a90
x16=fffffffffffffffe  x17=0000ce858fcfe6f5  x18=000000b4810ee000  x19=000000b482144260
x20=000000b482144260  x21=000000b482130a20  x22=000000b4812fefb8  x23=000000000000000f
x24=0000000000000000  x25=000000b482144540  x26=000000b4812fe6f0  x27=0000000000000016
x28=0000000000000000   fp=000000b4812fc900   lr=00007ffa18447a34   sp=000000b4812fc8c0
 pc=00007ffa18458404  psr=60000040 -ZC- EL0
Genie!onigenc_strlen_null+0x56034:
00007ffa`18458404 f9400100 ldr         x0,[x8]
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffa18458404 (Genie!onigenc_strlen_null+0x0000000000056034)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 000291000000008f
Attempt to read from address 000291000000008f

PROCESS_NAME:  AnythingLLMQnnEngine.exe

READ_ADDRESS:  000291000000008f 

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  000291000000008f

STACK_TEXT:  
000000b4`812fc8c0 00007ffa`18447a34     : 000000b4`82144260 000000b4`82144260 000000b4`82130a20 000000b4`812fefb8 : Genie!onigenc_strlen_null+0x56034
000000b4`812fc900 00007ffa`184166dc     : 000000b4`812fc9f0 00007ffa`184166dc ffffffff`fffffffe 000000b4`812fc950 : Genie!onigenc_strlen_null+0x45664
000000b4`812fc950 00007ffa`183d2bd0     : 00001a07`d0a1e180 ffffffff`fffffffe 00007ffa`185a5570 000000b4`812fc950 : Genie!onigenc_strlen_null+0x1430c
000000b4`812fca30 00007ffa`1846df98     : 00000000`00000000 00007ffa`1846df98 000000b4`812fecb0 00007ffa`1830ab20 : Genie!GenieDialog_embeddingQuery+0x75080
000000b4`812fca40 00007ffa`1830ab20     : 000000b4`812fecb0 00007ffa`1830ab20 000000b4`812fed10 100b7ffa`1844116c : Genie!onigenc_strlen_null+0x6bbc8
000000b4`812fca50 00007ffa`18442874     : 000000b4`812fed10 100b7ffa`1844116c 000000b4`812fcac0 00007ffa`18442874 : Genie!onig_builtin_fail+0x9bf00
000000b4`812fcac0 00007ffa`184402f8     : 000000b4`812fcb50 39187ffa`184402f8 00000003`ffffffff cf50fffa`00000004 : Genie!onigenc_strlen_null+0x404a4
000000b4`812fcb50 00007ffa`184422a0     : 000000b4`812fcb90 1a667ffa`184422a0 000000b4`812fed10 000000b4`812fd2c0 : Genie!onigenc_strlen_null+0x3df28
000000b4`812fcb90 00007ffa`18440810     : 000000b4`812fcbf0 8e147ffa`18440810 00000000`00000000 000000b4`812fcd10 : Genie!onigenc_strlen_null+0x3fed0
000000b4`812fcbf0 00007ffa`1843aef0     : 000000b4`812fcc30 923cfffa`1843aef0 000000b4`812fed10 000000b4`812fcec0 : Genie!onigenc_strlen_null+0x3e440
000000b4`812fcc30 00007ffa`f2cd0ae4     : 000000b4`812fcc70 72557ffa`f2cd0ae4 000000b4`812fd860 00007ffa`18160000 : Genie!onigenc_strlen_null+0x38b20
000000b4`812fcc70 00007ffa`f2bb4158     : 000000b4`812fd260 00007ffa`f2bb4158 000000b4`812fcd10 00000000`00000000 : ntdll!RtlpExecuteHandlerForUnwind+0x14
000000b4`812fcc90 00007ffa`18440620     : 00000000`00000000 00000000`00000000 000000b4`812fcce0 00000000`00000000 : ntdll!RtlUnwindEx+0x2e8
000000b4`812fd2c0 00007ffa`1844185c     : 00000002`80000029 00000000`00000000 00000000`00000000 00000000`0000000f : Genie!onigenc_strlen_null+0x3e250
000000b4`812fd370 00007ffa`18441c04     : 000000b4`812fd4c0 000000b4`812fd670 000000b4`812fd300 3a58fffa`18441100 : Genie!onigenc_strlen_null+0x3f48c
000000b4`812fd400 00007ffa`18442398     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : Genie!onigenc_strlen_null+0x3f834
000000b4`812fd550 00007ffa`18440810     : 000000b4`812fd5b0 b1747ffa`18440810 00007ffa`18160000 000000b4`812fd670 : Genie!onigenc_strlen_null+0x3ffc8
000000b4`812fd5b0 00007ffa`f2cd0ac4     : 000000b4`812fd5f0 d807fffa`f2cd0ac4 000000b4`812ff760 dd4d7ffa`00000000 : Genie!onigenc_strlen_null+0x3e440
000000b4`812fd5f0 00007ffa`f2bb37fc     : 000000b4`812fdc00 00007ffa`f2bb37fc 000000b4`812fd670 00000000`00000000 : ntdll!RtlpExecuteHandlerForException+0x14
000000b4`812fd610 00007ffa`f2cd0924     : 00000000`00000000 00000000`00000000 000000b4`812fd660 000000b4`00000000 : ntdll!RtlDispatchException+0x2e4
000000b4`812fdc60 00007ffa`ee405928     : 60000040`0040000f 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!KiUserExceptionDispatch+0x24
000000b4`812fe0f0 00007ffa`18440134     : 00000081`e06d7363 00000000`00000000 00007ffa`ee405928 00000000`00000004 : KERNELBASE!RaiseException+0x58
000000b4`812fe1a0 00007ffa`183d4628     : 000000b4`812fe690 cd7dfffa`183d4628 00007ffa`18160000 00000000`19930520 : Genie!onigenc_strlen_null+0x3dd64
000000b4`812fe200 00007ffa`183d242c     : 00001a07`cebca9e0 ffffffff`ffffffff 000000b4`8559b720 000000b4`812fe2b0 : Genie!GenieDialog_embeddingQuery+0x76ad8
000000b4`812fe6f0 00007ffa`183d3508     : 000000b4`88e916e0 000000b4`812feb84 00000000`82002600 000000b4`85343120 : Genie!GenieDialog_embeddingQuery+0x748dc
000000b4`812fed10 00007ffa`1839d680     : 000000b4`812fedd0 00007ffa`1839d680 ffffffff`fffffffe 000000b4`82140a20 : Genie!GenieDialog_embeddingQuery+0x759b8
000000b4`812fed40 00007ffa`1837b6dc     : 000000b4`8212cc30 72297ffa`1831e100 000000b4`8212b9e0 ffffffff`fffffffe : Genie!GenieDialog_embeddingQuery+0x3fb30
000000b4`812fee20 00007ffa`18377b2c     : 000000b4`812fee30 000000b4`8212b9d0 00797261`6d697200 00000000`00000000 : Genie!GenieDialog_embeddingQuery+0x1db8c
000000b4`812fef80 00007ffa`183ae788     : 000000b4`82001640 000000b4`82116000 00000000`00000000 000000b4`82125140 : Genie!GenieDialog_embeddingQuery+0x19fdc
000000b4`812ff2d0 00007ffa`183aeb1c     : 000000b4`82133b90 000000b4`82133b80 ffffffff`fffffffe 000000b4`82125140 : Genie!GenieDialog_embeddingQuery+0x50c38
000000b4`812ff3b0 00007ffa`183816c8     : 000000b4`812ff520 00007ffa`183816c8 ffffffff`fffffffe 000000b4`812ff3d8 : Genie!GenieDialog_embeddingQuery+0x50fcc
000000b4`812ff420 00007ffa`1831fdbc     : 00000000`00000000 00000000`00000000 00000063`69736100 00000000`00000000 : Genie!GenieDialog_embeddingQuery+0x23b78
000000b4`812ff580 00007ffa`183113d8     : 000000b4`812ff720 00000050`0000000e 00000000`00000028 000000b4`82131048 : Genie!GenieDialog_tokenQuery+0xd6cc
000000b4`812ff6e0 00007ff7`5474a9ac     : 000000b4`812ffa70 00007ff7`5474a9ac 00000002`00000002 000000b4`82131f60 : Genie!GenieDialog_create+0x128
000000b4`812ff760 00007ff7`5474dc94     : 000000b4`82115d10 00000000`00000000 00000000`0000005b 00000000`0000005f : AnythingLLMQnnEngine+0x3a9ac
000000b4`812ffad0 00007ff7`5474dd2c     : 000000b4`812ffb10 d0327ff7`5474dd2c 00000000`00000000 00000000`00000000 : AnythingLLMQnnEngine+0x3dc94
000000b4`812ffb10 00007ffa`f24e87a0     : 000000b4`812ffb20 0a39fffa`f24e87a0 000000b4`812ffb30 e724fffa`f2c3f864 : AnythingLLMQnnEngine+0x3dd2c
000000b4`812ffb20 00007ffa`f2c3f864     : 000000b4`812ffb30 e724fffa`f2c3f864 00000000`00000000 ab6c8000`00000000 : kernel32!BaseThreadInitThunk+0x30
000000b4`812ffb30 00000000`00000000     : 00000000`00000000 ab6c8000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x44


STACK_COMMAND:  ~0s; .ecxr ; kb

SYMBOL_NAME:  Genie+56034

MODULE_NAME: Genie

IMAGE_NAME:  Genie.dll

FAILURE_BUCKET_ID:  INVALID_POINTER_READ_c0000005_Genie.dll!Unknown

OS_VERSION:  10.0.27768.1000

BUILDLAB_STR:  rs_prerelease

OSPLATFORM_TYPE:  arm64

OSNAME:  Windows 10

@timothycarambat
Member

@snickler Ah, that is perfect. I know what this is. The underlying Genie library (which does a lot of the malloc and free work) has a bug in it. I have already informed their eng team about it. TL;DR: they have a bug, so we have one too; whatever the trigger is, it does not happen in admin mode.

When they bump the underlying SDK I will patch it and we should be good. All of this stuff is new, so I appreciate the patience.

Seriously, thanks for helping grab that procdump. So many times I am left with a very vague explanation of what happened and can't replicate it, since I don't own every device on Earth haha

@Genius-Tools

Here's what worked for me:

  1. Download and install v1.7.3-r2 of AnythingLLM or the latest version.
  2. Download the latest Qualcomm drivers here: Qualcomm Developer
    Specifically, the:
  3. Restart your Snapdragon X Elite computer.
  4. Restart AnythingLLM in admin mode.
  5. Run your selected, newly downloaded Qualcomm QNN models locally! You've got two Meta Llama models to choose from, with more to come in due time, as Qualcomm's CEO is bullish on DeepSeek and is integrating it imminently (Microsoft added it to VS Code, BTW).

Enjoy NPU usage and snappy performance, private and offline.
Anecdotal observation: I noticed a considerable speed increase in the computer's general performance after installing these Qualcomm drivers.

@timothycarambat
Member

@Genius-Tools I looked into the MSFT DeepSeek 1.5B Qwen distill they support via VS Code. It is onnxruntime, but the official OSS library they use in that plugin does not support QNN.

So it seems to be a private fork of onnxruntime-genai currently, which means we cannot add our support the way MSFT has until they hopefully decide to merge that library support upstream. Very disappointing. From what I know, the DeepSeek model won't be on the Qualcomm AI Hub for a few months unless they go into overdrive or reprioritize it, by which point something better than DeepSeek will exist.

I wanted to deliver this for everyone without them needing VS Code or something crazy, but I'm blocked at every turn.


9 participants