Infollama is a Python server that manages a token-protected proxy for Ollama.
Infollama also retrieves and displays, in a real-time UI, useful details about the Ollama server, including available models, running models, file sizes, RAM usage, and more. It also provides hardware information, particularly GPU and RAM usage.
- Run a proxy to access your Ollama API server on localhost, LAN, and WAN
- Protect your Ollama server with one token per user or usage
- Display useful details about the Ollama server (models, running models, sizes) and hardware information (CPU, GPUs, RAM and VRAM usage)
- Log Ollama API calls to a log file (in an HTTP log format) with different levels: NEVER, ERROR, INFO, PROMPT, and ALL, including the full JSON prompt request
- Python 3.10 or higher
- Ollama server running on your local machine (See Ollama repository)
- Tested on Linux Ubuntu, Windows 10/11, and macOS with Apple Silicon (Mx) chips
- Clone the repository:

  git clone https://github.com/toutjavascript/infollama-proxy.git
  cd infollama-proxy

- Create and activate a virtual environment:

  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install the required dependencies:

  pip install -r requirements.txt
Run the script with the following command:

  python proxy.py

Open the browser and navigate to http://localhost:11430/info to access the Infollama Proxy web UI.
You can modify the launch configuration with these parameters:
usage: proxy.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT] [--cors CORS] [--anonym ANONYM] [--log LOG]
--base_url BASE_URL The base_url of localhost Ollama server (default: http://localhost:11434)
--host HOST The host name for the proxy server (default: 0.0.0.0)
--port PORT The port for the proxy server (default: 11430)
--cors CORS The cors policy for the proxy server (default: *)
--anonym ANONYM Authorize the proxy server to be accessed anonymously without token (default: False)
--log LOG Define the log level that is stored in proxy.log (default: PROMPT, Could be NEVER|ERROR|INFO|PROMPT|ALL)
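For example, to run the proxy on another port with a lighter log level (the port value here is arbitrary; the flags are the ones documented above):

```
python proxy.py --port 11431 --log INFO
```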
This repository is under heavy construction. To update the source code from GitHub, open a terminal in the infollama-proxy folder and pull the latest changes:

  git pull
Infollama is not only a proxy server but also a powerful web UI that displays hardware status, such as GPU usage and temperature, memory usage, and other information.
*(Screenshot: GPU and RAM usage display in the web UI)*
You can now use the proxy to chat with your Ollama server. Infollama works as an OpenAI-compatible LLM server; you must use the base URL with port 11430 (see the example after this list):

- base_url is now `http://localhost:11430/v1`

Do not forget to provide a valid token, starting with `pro_`, defined in the `users.conf` file:

- `api_key = "pro_xxxxxxxxxxxxxx"`
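As a minimal sketch (assuming the official `openai` Python package is installed, and using `falcon3:1b` only as an example model), a chat call through the proxy could look like this:

```python
from openai import OpenAI

# Point the OpenAI client at the Infollama proxy instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:11430/v1",
    api_key="pro_xxxxxxxxxxxxxx",  # replace with a token defined in users.conf
)

# Any model already pulled on your Ollama server can be used here
response = client.chat.completions.create(
    model="falcon3:1b",
    messages=[{"role": "user", "content": "Give me three Python web servers."}],
)
print(response.choices[0].message.content)
```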
Token definitions are set in the `users.conf` file. On first launch, `users.conf` is created from the `users.default.conf` file. This text file lists the tokens line by line with this format (see the example below):

  user_type:user_name:token

- `user_type` can be `user` or `admin`. An `admin` user can access more APIs (like pull, delete, copy, ...) and can view the full log file in the web UI.
- `user_name` is a simple string of text.
- `token` is a string that must start with `pro_`.
- Parameters are separated with `:`.
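For example, a `users.conf` file following this format could contain (the user names and token values below are placeholders to replace with your own):

```
admin:alice:pro_xxxxxxxxxxxxxx
user:bob:pro_yyyyyyyyyyyyyy
```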
If the `--anonym` parameter is set at startup, `users.conf` is ignored and all accesses are authorized. The user name is then set to `openbar`.
You can log every prompt that is sent to the server. Note that responses are not logged, to preserve privacy and disk space. This proxy app has several levels of logging:

- `NEVER`: No logs at all.
- `ERROR`: Log only errors and unauthorized requests.
- `INFO`: Log useful accesses (not api/ps, api/tags, ...), excluding prompts.
- `PROMPT`: Log useful accesses (not api/ps, api/tags, ...), including prompts.
- `ALL`: Log every event, including prompts.

By default, the level is set to `PROMPT`.
The log file uses the Apache server log format. For example, one line at the `PROMPT` level looks like this:
127.0.0.1 - user1 [16/Jan/2025:15:53:10] "STREAM /v1/chat/completions HTTP/1.1" 200 {'model': 'falcon3:1b', 'messages': [{'role': 'system', 'content': "You are a helpful web developer assistant and you obey to user's commands"}, {'role': 'user', 'content': ' Give me 10 python web servers. Tell me cons and pros. Conclude by choosing the easiest one. Do not write code.'}], 'stream': True, 'max_tokens': 1048}
Correcting bugs and user issues is the priority.
- Add buttons to start and stop models
- Add dark/light display mode
- Secure token storage with HTTPOnly cookie or browser keychain if available
- Add a GPU database to compare LLM performances
- Create a more efficient installation process (docker and .bat)
- Add a simple API that returns the current usage from the server (running models, hardware details, free available VRAM, ...)
- Add a web UI to view or export logs (by user or full log if admin is connected)
- Add integrated support for tunneling to the web
- Add a fallback system to access another LLM provider if the current one is down
- Add an easy LLM speed benchmark
- Add a log file size checker
Because I needed two functionalities:

- Access to the Ollama server on LAN and over the web. As Ollama is not protected by token access, I needed a simple way to manage it.
- A real-time view of the Ollama server status
If you see the error message `Error get_device_info(): no module name 'distutils'`, try updating your install with:
pip install -U pip setuptools wheel
Fully tested with solutions like:

- ngrok:

  ngrok http http://localhost:11430

- bore.pub (but no SSL support):

  bore local 11430 --to bore.pub
IF YOU OPEN INFOLLAMA OVER THE WEB, DO NOT FORGET TO CHANGE THE DEFAULT TOKENS IN THE `users.conf` FILE.

With web access, the diagram shows access from outside your LAN.
We welcome contributions from the community. Please feel free to open an issue or a pull request.