picoLLM Inference Engine Demos for iOS

Made in Vancouver, Canada by Picovoice

picoLLM Inference Engine

picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models. picoLLM Inference Engine is:

  • Accurate; picoLLM Compression improves GPTQ by significant margins
  • Private; LLM inference runs 100% locally
  • Cross-Platform
  • Runs on CPU and GPU
  • Free for open-weight models

Compatibility

  • Swift 5
  • iOS 16.0+

Models

picoLLM Inference Engine supports the following open-weight models. The models are available for download on the Picovoice Console.

  • Gemma
    • gemma-2b
    • gemma-2b-it
    • gemma-7b
    • gemma-7b-it
  • Llama-2
    • llama-2-7b
    • llama-2-7b-chat
    • llama-2-13b
    • llama-2-13b-chat
    • llama-2-70b
    • llama-2-70b-chat
  • Llama-3
    • llama-3-8b
    • llama-3-8b-instruct
    • llama-3-70b
    • llama-3-70b-instruct
  • Llama-3.2
    • llama3.2-1b-instruct
    • llama3.2-3b-instruct
  • Mistral
    • mistral-7b-v0.1
    • mistral-7b-instruct-v0.1
    • mistral-7b-instruct-v0.2
  • Mixtral
    • mixtral-8x7b-v0.1
    • mixtral-8x7b-instruct-v0.1
  • Phi-2
    • phi2

AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Everyone who signs up for the Picovoice Console receives a unique AccessKey, which you must keep secret. Even though LLM inference runs 100% offline and is completely free for open-weight models, you will need internet connectivity to validate your AccessKey with the Picovoice license servers.
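For reference, the demos pass the AccessKey to the engine at construction time. Below is a minimal sketch of initializing picoLLM and generating a completion, following the usage pattern in the picoLLM Swift documentation; treat the exact initializer and property names as assumptions to verify against the installed SDK, and note that both string values are placeholders.

```swift
import PicoLLM

// Placeholder values: the AccessKey comes from Picovoice Console and
// `modelPath` points to a `.pllm` model file on the device.
let picollm = try PicoLLM(
    accessKey: "${YOUR_ACCESS_KEY_HERE}",
    modelPath: "${MODEL_FILE_PATH}")

// Generate a single completion from a raw prompt.
let result = try picollm.generate(prompt: "roses are red")
print(result.completion)
```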

Setup

Download your desired model file from the Picovoice Console. If you did not download the file directly on your iOS device, you will need to transfer it to the device to use it with the demos: use AirDrop, or connect your iOS device to your computer via USB (or launch a simulator) and copy the model file over.
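Once transferred, the model file typically ends up in the app's sandboxed Documents directory. As an illustration, a hypothetical Foundation helper like the one below can locate it at runtime (the `.pllm` extension is an assumption about picoLLM model files):

```swift
import Foundation

// Hypothetical helper: find the first picoLLM model file (assumed
// `.pllm` extension) in the app's Documents directory, where files
// shared via AirDrop or Finder typically land.
func findModelFile() -> URL? {
    let documents = FileManager.default.urls(
        for: .documentDirectory,
        in: .userDomainMask)[0]
    let contents = (try? FileManager.default.contentsOfDirectory(
        at: documents,
        includingPropertiesForKeys: nil)) ?? []
    return contents.first { $0.pathExtension == "pllm" }
}
```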

Usage

There are two demos available: completion and chat. The completion demo accepts a prompt and a set of optional parameters and generates a single completion; it can run all models, whether instruction-tuned or not. The chat demo enables a back-and-forth conversation with the LLM, similar to ChatGPT, and requires instruction-tuned (chat) models such as llama-3-8b-instruct or phi2.
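In code, the difference between the two demos comes down to how the prompt is built: a completion is a single generate call, while a chat keeps a running dialog whose transcript is re-rendered into the chat model's instruction format on every turn. A rough sketch, reusing the `picollm` instance from the sketch above and assuming the dialog helper names (`getDialog`, `addHumanRequest`, `addLLMResponse`, `prompt`) from the picoLLM docs:

```swift
// Completion: a one-shot prompt (works with any model).
let completion = try picollm.generate(prompt: "roses are red")
print(completion.completion)

// Chat: a dialog object re-renders the whole transcript into the chat
// model's prompt template on every turn (API names assumed, see above).
let dialog = try picollm.getDialog()
try dialog.addHumanRequest(content: "What is the capital of France?")
let reply = try picollm.generate(prompt: try dialog.prompt())
try dialog.addLLMResponse(content: reply.completion)
```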

Completion Demo

  1. Go to the Completion directory and run:

```console
pod install
```

  2. Open PicoLLMCompletionDemo.xcworkspace in Xcode.

  3. Replace "${YOUR_ACCESS_KEY_HERE}" in the file ViewModel.swift with your AccessKey obtained from the Picovoice Console.

  4. Build and run the project on your device.

  5. Press the Load Model button and load the model file from your device's storage.

  6. Enter a prompt that you want a completion for, e.g. "roses are red".

  7. Experiment with the optional parameters by pressing the menu button in the top left; the sketch after these steps shows how they map onto the API.
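The optional parameters in the demo's menu correspond to arguments of generate. A hedged sketch, with parameter names taken from the picoLLM documentation and to be treated as assumptions:

```swift
// Optional generation parameters (names assumed from the picoLLM docs).
// `streamCallback` receives tokens as they are produced, which is how
// the demo can render text incrementally instead of waiting for the
// full completion.
let res = try picollm.generate(
    prompt: "roses are red",
    completionTokenLimit: 128,
    temperature: 0.7,
    topP: 0.9,
    streamCallback: { token in print(token, terminator: "") })
```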

Chat Demo

  1. Go to the Chat directory and run:

```console
pod install
```

  2. Open PicoLLMChatDemo.xcworkspace in Xcode.

  3. Replace let ACCESS_KEY = "${YOUR_ACCESS_KEY_HERE}" in the file ViewModel.swift with your AccessKey obtained from the Picovoice Console.

  4. Build and run the project on your device.

  5. Press the Load Model button and load the model file from your device's storage.

  6. Chat back and forth with the LLM using the text box at the bottom.

  7. Use the clear button in the lower right of the text box to reset the chat and start a new one; the sketch after these steps shows what a reset amounts to.
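For context, a chat turn and a reset reduce to a few dialog operations. The following sketch assumes the same (unverified) dialog API as above, along with the `picollm` instance from the earlier sketches:

```swift
// Running conversation state.
var dialog = try picollm.getDialog()

// One chat turn: append the user message, generate against the rendered
// transcript, then record the model's reply so the next turn sees it.
func respond(to userText: String) throws -> String {
    try dialog.addHumanRequest(content: userText)
    let res = try picollm.generate(prompt: try dialog.prompt())
    try dialog.addLLMResponse(content: res.completion)
    return res.completion
}

// The clear button's effect boils down to starting a fresh dialog.
func resetChat() throws {
    dialog = try picollm.getDialog()
}
```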