Add OpenWeb UI instructions #6
This is an excellent set of resources! We don't need IPEX-LLM to make this work; it's out of scope for now. However, supporting all things Intel is a long-term goal for the project, and I am very grateful you are contributing. I will add some docs later.

OpenVINO natively supports NPU, albeit with oneAPI toolkit dependencies. So this issue requires a lot of test code to evaluate the different high-level performance hints exposed by OpenVINO GenAI for each pipeline class, and then to decide how to build an API that makes set-and-forget style optimization testing possible.

Also, I drafted an implementation of a proxy specifically for OpenWebUI that I think will be pretty close to what's used for LM Studio, so we don't need to mess around with anything lower level than that. Leveraging other OpenWebUI features is TBD. In this way, adding an OpenAI proxy is more about integrating community tooling than changing OpenArc to use whatever tools others have discovered work when hacked together.

From a high level I want OpenArc to be self-contained, to reflect the design intention of realizing performance gains that simply aren't possible with llama.cpp and IPEX, mostly due to how the OpenVINO runtime manages memory. I'm still building my understanding of how these work. Some strategies include baked-in handling of NUMA architectures and experimental datatypes which offer better accuracy/speed tradeoffs for production deployments across the different generations of hardware most PCs have. That's why you don't see anything lower than INT4, which isn't even the lowest: support was recently merged for two FP8 datatypes, as well as MXFP4 and NF4, which are only really in the literature and benefit the latest chips like yours and the recently renamed (ugh) Xeon.

The utilities notebook contains the device query snippets; they provide some information on what's appropriate to choose for datatypes and the performance hints. Without setting arguments, the runtime automatically selects appropriate settings based on (I think) what's stored in the model XML, at least for decoder-only models, compared against hardware 'facts'.

Another example: there are hints that let you allocate between performance and efficiency cores, and it works from Python. It might even work heterogeneously across CPU, GPU, and NPU; I'm not sure yet. It works without pinning or assigning cores through Docker Compose, just as parameters exposed in ov_config. It gets tricky here though: if a property isn't exposed in the stub file which defines the Python API layer over C++, then they don't recommend using it, as the OpenVINO code which instantiates Core isn't tooled for generative AI. It will work, but performance might be terrible? Not sure.

Basically, Ollama integrations leave performance on the table, and THAT's what's out of scope. When I was a physics major it was a running joke that engineers would plug and chug.

To make some of this concrete, here are a few rough sketches (not final APIs):
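For the OpenWebUI piece, the proxy I drafted is basically this shape: a tiny OpenAI-compatible server that OpenWebUI can point at, which forwards requests to whatever OpenArc exposes internally. The OpenArc-side URL and payload below are placeholders, not the real endpoint:

```python
# Minimal OpenAI-style proxy sketch for OpenWebUI. The OPENARC_URL endpoint and
# its request/response shape are hypothetical placeholders.
import httpx
from fastapi import FastAPI, Request

OPENARC_URL = "http://localhost:8000/generate"  # placeholder, not the real OpenArc route

app = FastAPI()

@app.get("/v1/models")
async def list_models():
    # OpenWebUI calls this to populate its model picker.
    return {"object": "list", "data": [{"id": "openarc-local", "object": "model", "owned_by": "openarc"}]}

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    body = await request.json()
    # Flatten the chat history into a single prompt; the real proxy would apply
    # the model's chat template instead of this naive join.
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in body.get("messages", []))
    async with httpx.AsyncClient() as client:
        resp = await client.post(OPENARC_URL, json={"prompt": prompt}, timeout=300)
    text = resp.json().get("text", "")
    return {
        "id": "chatcmpl-openarc",
        "object": "chat.completion",
        "model": body.get("model", "openarc-local"),
        "choices": [{"index": 0, "message": {"role": "assistant", "content": text}, "finish_reason": "stop"}],
    }
```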
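On the datatypes: weight-only compression to the newer formats goes through NNCF. This is just the NF4 case as an illustration (the model path is a placeholder, and which modes are available depends on your NNCF/OpenVINO versions):

```python
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("model_fp16.xml")  # placeholder path to an exported OpenVINO IR

# Weight-only compression to NF4; INT4 and the newer FP8-style modes follow the
# same call where the installed NNCF build supports them.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.NF4)
ov.save_model(compressed, "model_nf4.xml")
```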
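The device query I mean in the utilities notebook is roughly this; it just asks the runtime which devices it can see and what they call themselves:

```python
import openvino as ov

core = ov.Core()
# List every device the runtime can see (CPU, GPU, NPU, ...) and its full name.
for device in core.available_devices:
    full_name = core.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {full_name}")
```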
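And a sketch of the ov_config idea for the performance hints and P-core/E-core allocation. The property names are the plain OpenVINO ones, the accepted value strings may differ between runtime versions, and I'm showing it against the base runtime API rather than the GenAI pipeline for clarity, so treat it as illustrative rather than OpenArc's final API:

```python
import openvino as ov

core = ov.Core()

# Candidate ov_config: high-level performance hint plus hybrid-core scheduling.
# Exact value strings may vary by OpenVINO release; verify against your install.
ov_config = {
    "PERFORMANCE_HINT": "LATENCY",          # or "THROUGHPUT"
    "SCHEDULING_CORE_TYPE": "PCORE_ONLY",   # keep inference on performance cores
}

# The same dict would be handed to the GenAI pipeline as ov_config; here it is
# applied directly at compile time.
compiled = core.compile_model("model.xml", "CPU", ov_config)
```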
You might find this repo interesting: https://github.com/xanderlent/intel-npu-driver-rpm
By the way, feel free to create a shared Discord chat for this project; it's much better than using random GitHub issues.