"QNN Engine is offline." when using a Snapdragon X/NPU #2962
Same issue on Snapdragon X Elite for me.
Is this after downloading a model? Also, have you tried a reboot post-download of the model? Second, @barealek - I just got confirmation that we can run Elite-compiled models on Plus chipsets, so we will patch that and re-release 1.7.2.
I downloaded and tried to run a model, choosing the Qualcomm LLM provider and NPU embedder, but it came up with the error. It failed to work after fully rebooting the app and restarting my computer; I then tried all of the same things after uninstalling and reinstalling the app, which still didn't work. I haven't done any additional setup of the NPU or anything outside of AnythingLLM, so I'm wondering if there are some drivers I'm missing? I'll let the experts figure it out.
@lachlanharrisdev - we just pushed a new build for arm64. Also, what device + chipset are you on? Plus, Elite, etc.?
@timothycarambat I've just installed the new build and it's still failing, but it's behaving differently. After I upgraded to the new version and sent a chat, it came up with the error. What I noticed is that when booting up AnythingLLM, right before the loading screen switches to the home UI, I can see a task pop up for a split second in Task Manager called "AnythingLLMQnnEngine", but it seems to end itself very quickly. The same task also pops up after I send a chat, after the QNN Engine "boots", but then again it quickly closes itself. I'm currently on a Surface Laptop 7 15", running the X Elite X1E-80-100.
@lachlanharrisdev I wrote this up to debug the engine directly (app should be closed). I have the same chipset on a Dell Latitude.
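The linked doc isn't reproduced in this thread, but direct-launch debugging presumably amounts to starting the engine binary by hand and watching its output. A rough sketch, with an assumed install path (the binary name comes from the Task Manager sighting above; port 8080 is the one mentioned later in the thread):

```powershell
# Install path is an assumption -- adjust to wherever AnythingLLM landed on your machine.
Set-Location "$env:LOCALAPPDATA\Programs\AnythingLLM\resources"

# Launch the QNN engine directly so its console output stays visible.
.\AnythingLLMQnnEngine.exe

# In a second terminal, confirm something is actually listening.
Test-NetConnection -ComputerName 127.0.0.1 -Port 8080
```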
@timothycarambat yep, that found the issue
If it's relevant, this was using Llama 3.1 8B, not 3.2 3B.
@lachlanharrisdev Now this is a very different issue from the others, then. If you run the command as administrator, does it still fail to initialize? I am wondering how/why you would require admin to execute the LLM engine; someone else had success with that, and I have to determine why that would ever be the case for anyone, since admin rights should not be required to start the QNN LLM API.
The recent patch seemed to solve most issues people had (Plus support was mostly not enabled), but this is certainly something different.
@timothycarambat nope, running it as admin now works and I do see QNN running on localhost.
I tried running AnythingLLM as administrator and, after the QNN engine boots, I can successfully chat. This works for me, but I'm more than happy to keep testing things out for you; I'd love to contribute in any way I can. Should we create a new issue and continue there?
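For anyone wanting to reproduce the admin-mode test from a terminal, a minimal sketch (the install path is an assumption; point it at your actual AnythingLLM.exe):

```powershell
# Path is an assumption; adjust to your install location.
$app = "$env:LOCALAPPDATA\Programs\AnythingLLM\AnythingLLM.exe"

# -Verb RunAs triggers the UAC prompt, the same as right-click -> "Run as administrator".
Start-Process -FilePath $app -Verb RunAs
```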
Build 1.7.2-r2-arm64 seems to be working well while running in Administrator mode. Happy hunting, everyone!
@timothycarambat I've just restarted my PC and now it seems to no longer work even with administrator mode... I'm guessing the same QNN Engine instance stayed online from the instructions in the Google doc, and AnythingLLM used that instance instead of booting another one (if that's even possible; I know barely anything about AI and Qualcomm). Hopefully that clears up any confusion. @AlphaEcho11 interesting, what device are you using? Just wondering if this is only a Surface Laptop thing.
@lachlanharrisdev - Surface Pro 11 here, on the X Elite. After several device reboots and AnythingLLM refreshes, it's been working without issue.
I am still having issues, even when launching as an administrative account. It seems like it's starting up now; I get a message that roughly says "QNN is still booting, please wait", but then it just crashes and the QNN engine goes offline. Here are my logs:
Thank you for the logs! Yes, seeing the QNN engine fail to get online here; going to check one more area and see if another variable is at play. |
Can you reattempt this with the 8B model as well? Following @timothycarambat's previous recommendations and tweaking:
Let us know the results! |
Surface Laptop 7 with X Elite chip here. Just spun up 3b on version |
Here are my backend logs with the 8b model loaded, after launching as administrator. No luck; it still tells me to reboot the QNN Engine or AnythingLLM. Task Manager did, however, show that the QNN Engine was online and processing (between 20-40% CPU usage). BUT when I switched to the 3b model, launching as administrator did work. @AlphaEcho11 @SpontaneousDuck would you be able to try running the 8b model and see if it completely fails for you as well (in administrator)? Maybe we're dealing with two separate problems?
Same performance with 8b for me! Won't work in user mode, works fine on NPU with Admin mode.
@lachlanharrisdev One detail worth mentioning is that the memory requirement for preloading the model can be a lot for some devices. ARM64 is unified memory, and the NPU has lower memory bandwidth than what the CPU can leverage. This is why you can run larger models on the CPU but not on the NPU: the NPU has less available to use.

I don't recall seeing this in the thread, but how much RAM is available on the system? These are 8K context window models, so they can be pretty demanding. Perhaps we can publish the default 4K context models to save on memory. If you are, for example, on a 16GB RAM device, the 8B model with 8K context can be too large and fail to allocate. You can see this by doing the debugging process of:

The devices I have are 32GB memory, so pretty large. It may not be the end cause, but it is a detail for sure. Outside of that, the admin mode detail is odd, as I cannot replicate that error. If that is being encountered, the following questions would help to be answered:
That would help us make headway.
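On the RAM question above, a quick way to report free vs. total physical memory before loading a model (the sizing note in the comment is a rough assumption, not an official requirement):

```powershell
# WMI reports these values in KB; dividing by 1MB (1048576) yields GB.
$os = Get-CimInstance Win32_OperatingSystem
$freeGB  = [math]::Round($os.FreePhysicalMemory / 1MB, 1)
$totalGB = [math]::Round($os.TotalVisibleMemorySize / 1MB, 1)
"Free: $freeGB GB of $totalGB GB physical memory"

# Rough rule of thumb (an assumption, not an official figure): a 4-bit 8B model
# wants ~4-5 GB for weights alone, plus KV-cache for the 8K context window, so
# only a few GB free makes an allocation failure plausible on unified memory.
```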
Going to close this, since this thread has multiple answers and ways to debug, but going to pin it so it is not duplicated. Will keep the conversation open for now until we know the solution for sure. It might just be something solvable with documentation.
I'm still having issues. What I've tried:
The error is "QNN Engine is offline. Please reboot QNN Engine or AnythingLLM app." On a Lenovo Yoga Slim 9x with a Snapdragon X Elite (32GB memory). Edit:
Here is my log:
One thing the log shows is 'NPU detected: False'. |
@1key , are you running AnythingLLM on v1.7.2-r2-arm? |
I also had tested installation against 'all users' on the device (which, in reality, is only myself... one-man team! But I digress) and tested the app. I may test via CMD at some point, if we're still uncertain whether user vs. system-wide installation has any bearing on this.
|
Indeed, 1.7.2-r2 removes this option for system-wide vs. current-user installs. This was one way I was able to replicate the boot failure, so I eliminated the option to prevent fresh installs from coming across it. However, the QNN binary is always bundled in the app, so a reinstall should be using the executable placed by the installer and have the appropriate permissions. As far as I know, the only reproduction I could emulate on a totally formatted machine was when I installed for all users on the machine and not as the current user. Outside of that, anything else needs to be investigated or reproduced. If the engine can run via terminal, but not when invoked by the AnythingLLM application thread, then that is at least a starting point, but I cannot replicate it on the machines I have locally atm.
I'm currently re-installing under all users with 1.7.2-r2-arm; I'll let y'all know the results of my testing soon.
Having a difficult time replicating this, again... The theory I had before this was: could the x86 RAM ceiling somehow be causing the QNN engine to fail prematurely, at least during the load-up of the model?
@AlphaEcho11 This is great analysis. I do think some issues people are running into are RAM-related, but not so much on the larger 32GB devices. That being said, the app is shipped without needing x86 emulation. In Services you can see the app is running as Arm64 and not x86; however, the idea of RAM availability still very much applies either way.

There is another factor, which I think is more common: people using IT-managed/BitLocker devices. These often have much more control around running applications, and since we had to build our own C++ interface to get this all to work, I am theorizing the QNN C++ library is probably blocked for some devices. That might explain why it works in admin mode but not when run as the user, and also why people with a sole-user system on a "non-corporate" device can run it without issue.

In this case, there is an "IT acceptance" factor, but also the app is not codesigned, which for sure is going to be blocked. So now it is more prudent than previously for us to get that done and the pipeline set up. It needed to be done ages ago anyway. I'll make that a task for next week.
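One way to check the codesigning theory locally is to inspect the shipped binary's Authenticode status (the path here is an assumption; "NotSigned" in the output would be consistent with managed devices blocking the engine):

```powershell
# Check whether the bundled executable carries an Authenticode signature.
Get-AuthenticodeSignature "$env:LOCALAPPDATA\Programs\AnythingLLM\AnythingLLM.exe" |
    Select-Object Status, StatusMessage
```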
Suspended the BitLocker service on the local device, launched the app as a system administrator (confirmed with `net localgroup administrators`) on a non-corporate device, but did not set 'Run as administrator'. Launched AnythingLLM and am still receiving the QNN engine failure to run successfully.
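For reference, the two checks described there can be run from an elevated terminal like so (manage-bde requires elevation):

```powershell
# List members of the local Administrators group to confirm the current user is in it.
net localgroup administrators

# Show BitLocker protection status for each volume.
manage-bde -status
```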
Yes, I'm 100% running R2. |
@AlphaEcho11 From re-reading the comments, though, it sounds like you installed the app "For all users on this computer" vs "current user" only. The 1.7.2-r2 installer forces the install to be current user only and removed the option, so that shouldn't be possible any longer.
This line made me think that. An all-users install would put the QNN LLM executable at a different permission level, since it was installed by admin, thus requiring admin to spawn it. I want to try to eliminate as many variables as possible, is all.
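A quick way to test the permission-level theory is to dump the ACLs on the engine executable and see who owns it (the path is an assumption; use your own install location):

```powershell
# Show the access control list on the QNN engine binary; an all-users install
# performed by an admin could leave it accessible only at an elevated level.
icacls "$env:LOCALAPPDATA\Programs\AnythingLLM\resources\AnythingLLMQnnEngine.exe"
```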
Just wanted to add that I installed AnythingLLM 30 minutes ago; I'm also running on a Lenovo Slim 7x and have the exact same symptoms/log errors as you (even running as Administrator).
I forced an install on the local system to run the .exe under all users. I was attempting to see if there were any hiccups with installing this version for any/all users on the system, but I couldn't replicate any issues. The previous installs I had done for v1.7.2-arm had been both system-wide and single-user, and the single user was my personal, administrator profile on the PC. So v1.7.2-arm was still failing the QNN engine run, but also had the boot failure on port 8080, as well as NPU not detected (with the X Elite SoC).
@talynone - what device are you on? Seems like some kind of unicode symbols are attached. Patching to fix.
As mentioned before, the Lenovo Yoga Slim 7x, thanks.
@talynone Okay |
Thanks! It seems to work now, even in non-admin mode (though I ran it in admin mode at first). Backend log attached. @1key you should try the new build too, since you also have the Lenovo Slim 7x.
Using the new r3, it's still not starting in user mode on my Surface Laptop 7. Logs below:
|
@SpontaneousDuck What is your device + specs?
That line means we detect a valid NPU, so everything is good to go.
However, the other message means the engine did not boot within the fixed timeframe (which is quite long). This can be increased if your device has less available RAM than others, but I want to confirm that first. One way to test is to run the Google Drive debugging doc posted in this thread.
@timothycarambat I have a Surface Laptop 7 w/ X Elite and 32GB of RAM. I posted the results of running the debugging steps below. Same behavior: it errors out in user mode but starts in admin mode. It did take quite a while to start in admin mode. The 3b model took about 8 seconds to start and the 8b took about 12 seconds (by my rough counting).
|
Is this a personal device, or a corporate-managed device you have admin perms on? I have an X Elite w/ 32GB, and that is an insanely long boot time. If you wanted to try: does AnythingLLM running as admin result in failure to boot the QNN Engine?
@timothycarambat It is a corporate-managed device w/ BitLocker that I have admin privileges for. It is pretty light control though, and basically just has company login and BitLocker enforced. Running AnythingLLM as admin does work just fine 😊
Obviously not ideal, but at least that is a way to go in the interim while we figure out what is blocking running as the current user. Just another data point to work with.
Surface Pro 11 | Snapdragon X Elite X1E-80-100. backend-2025-01-12.log attached. Quite interesting, but still able to run. @timothycarambat if you want me to pull anything else beyond the backend log, just let me know!
To tag on to @AlphaEcho11's discovery - bits and pieces of the crash dump from my device. Dump:
|
@snickler What is your RAM availability on the machine?
That would indicate the model could not be loaded into memory and was therefore thrown out and dumped, for security reasons that don't apply when running as admin.
Naturally, I can't reproduce the crash dump when I need to, but I still receive the "QNN Engine is offline" error when running as a regular user. I have 15GB available out of 32GB. As I was typing out the information for this, I decided to use procdump; results:
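For anyone else trying to grab a similar dump, a sketch using Sysinternals procdump (the process name is the one observed in Task Manager earlier in the thread; the dump path is arbitrary):

```powershell
# -e writes a dump on an unhandled exception; -ma captures full process memory.
.\procdump.exe -accepteula -e -ma AnythingLLMQnnEngine.exe C:\dumps\qnn-crash.dmp
```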
@snickler Ah, that is perfect. I know what this is: it's in the underlying SDK. When they bump the underlying SDK I will patch it and we should be good. All of this stuff is new, so I appreciate the patience. Seriously, thanks for helping grab that procdump - so many times I am left with a very vague explanation of what happened and can't replicate it, since I don't own every device on Earth, haha.
Here's what worked for me
Enjoy NPU usage and snappy performance, private and offline.
@Genius-Tools I looked into the MSFT DeepSeek 1.5B Qwen distill they support via VSCode. It is onnxruntime, but the official OSS library they use in that plugin does not support QNN. So it seems to be a private fork of onnxruntime-genai currently, and we cannot add support like MSFT has until they hopefully decide to merge that library support in. Very disappointing. From what I know, the DeepSeek model won't be on the Qualcomm AI Hub for a few months unless they go into overdrive or reprioritize it, by which time something better than DeepSeek will exist. I wanted to deliver this for everyone without them needing VSCode or something crazy, but I'm blocked at every turn.
How are you running AnythingLLM?
AnythingLLM desktop app
What happened?
When trying to inference on any QNN model on a Snapdragon X Plus laptop, the issue below occurs.
The log specifies that the required CPU/NPU is not found:
Starting AnythingLLM and reproducing the error, the full log looks like this:
Are there known steps to reproduce?
No response