-
Yeah, I think the Quickstart guide needs another revamp into an actual guide instead of a pile of different setup options. For now I would recommend using text-generation-webui as the backend along with the addon. I should be able to provide some better instructions in the next few days.
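In the meantime, a quick way to confirm the webui backend is reachable from the Home Assistant side is to hit its OpenAI-compatible API directly. This is just a sketch, assuming the webui was started with the API enabled (the `--api` flag) on its default port 5000; the hostname is a placeholder:

```python
# Smoke test for a running text-generation-webui backend via its
# OpenAI-compatible API. Host and port are assumptions; adjust to
# wherever the addon exposes the API.
import requests

BASE_URL = "http://homeassistant.local:5000"  # placeholder host

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

If that returns a completion, the backend is up and the integration just needs to point at the same host and port.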
-
I followed the instructions for installing the addon, hoping it contained everything that would be covered by the manual instructions. I'm trying to run on a Home Assistant Green, which should be at least as powerful as a Pi, but it still has no GPU, so `llama.cpp` models are probably the only realistic choice.

When I loaded the addon, the Web UI shows up, but there are no models. So I found the manual install instructions to load one from the `/dist` folder in this repo (though the docs incorrectly link to `/docs/dist`, probably a relative URL path mistake), which I have to download and add to `addon_configs` (why not include it in the addon?). But then I still don't have it in the Web UI, which needs models in a different folder (and after putting the model there, not knowing what settings to use, there were errors and a stack trace in the Web UI when attempting to load it).
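As a sanity check that the downloaded GGUF file itself is usable, independent of the addon, something like this llama-cpp-python snippet should load it; the model path is just a placeholder for wherever the file ends up, and the thread count assumes the Green's quad-core CPU:

```python
# Try loading the GGUF model directly with llama-cpp-python, outside the
# addon, to rule out a corrupt or mismatched file. The path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="/media/models/model.q4_k_m.gguf",  # placeholder path
    n_ctx=2048,    # modest context window for a CPU-only box
    n_threads=4,   # assumed quad-core CPU
)
out = llm("The capital of France is", max_tokens=8)
print(out["choices"][0]["text"])
```

If this works but the Web UI still throws a stack trace, the problem is the addon's loader settings rather than the model file.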
So I guess I have to set up the integration as well, the custom component from HACS. There I get the "Select Backend" prompt, and based on the docs, of the six options I'd just choose "Llama.cpp (HuggingFace)". Since it's a local file, I'd think "Llama.cpp (local model)" would make more sense (I have it locally and didn't get it from HuggingFace), but when I type the name of the model file there, it can't find it.
After that I'm prompted to "please configure llama.cpp for the model" with a bunch of options about quantization. I assume this information needs to be set correctly for the specifics of the downloaded file? Why can't it be detected automatically? I have no idea what to put here, but the "HuggingFace Model" field is set to `TheBloke/phi-2-GGUF`, which leads me to believe this will download a remote model from HuggingFace rather than use the local one, though the docs say to go this path by default. No matter which quantization I choose, however, the form hangs for a long while after submitting and then reports "Unknown Error Occured".
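My guess is that the quantization picker just selects which `.gguf` file to download from that repo. If so, the manual equivalent would be something like the following (the exact filename inside `TheBloke/phi-2-GGUF` is an assumption based on TheBloke's usual naming):

```python
# Hypothetical manual equivalent of the integration's quantization picker:
# fetch one specific .gguf quantization from the HuggingFace repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/phi-2-GGUF",
    filename="phi-2.Q4_K_M.gguf",  # assumed filename for the Q4_K_M quant
)
print(path)  # local cache path of the downloaded file
```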
I'm also thinking that this model-loading scheme wouldn't use the addon at all, making the addon pointless, but that's not how the docs guided me, so I don't know. The component setup path for using text-generation-webui says "for a remote instance" in the docs, as in specifically not a local one, so I guess I wouldn't use that. But I tried it anyway, and couldn't get past the config screen, which asked for a bunch of things the addon doesn't mention, like an OpenAI API key and the model name.
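For the model name field, if the webui backend is actually running, its OpenAI-compatible API can list what it has loaded; a sketch, assuming the default API port and no API key set (if the server was started with `--api-key`, it would go in an `Authorization: Bearer` header):

```python
# List models known to a running text-generation-webui instance through
# its OpenAI-compatible /v1/models endpoint. Host/port are assumptions.
import requests

resp = requests.get("http://homeassistant.local:5000/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```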
When I tried loading the provided model in the addon web ui, and used the "llama.cpp" model loader, I got:
What's the simplest possible setup to get this thing running?