Thanks for trying DecisionBox! Extra kudos if you could contribute back to the project, or provide your feedback here: https://github.com/fmops/decisionbox/discussions
If you want to dive right into SDK install and config, skip to Getting Started below.
Looking for the specification of the RESTful APIs? See https://decisionbox.blueteam.ai/swaggerui
If you'd like to see a quick video of the 'why' of the product, or send it to a friend, please check out: https://youtu.be/ErulvNvwKHs
The purpose of this product is to allow you to radically improve your LLM app so it gets smarter with more data.
The Pain Point
Building high-quality LLM applications often hinges on the accuracy of critical decision points within your app. While OpenAI function calls may suffice for quick prototypes, achieving production-level accuracy often necessitates laborious prompt engineering, which yields diminishing returns. Additionally, not every development team has the luxury of a dedicated data science team to support every app they build. So we set out to see if we could simplify setting up a robust Machine Learning environment quickly.
The DecisionBox SDK Success Metric
We're hoping that DecisionBox enables you to demonstrate that the critical decisions within your LLM-based app are made with high accuracy, and that you have evidence that accuracy continuously improves with more data. The SDK streamlines the data science process into a simple API, so that extensive data science expertise is not required.
Getting Started
- Install the DecisionBox SDK and create your first decision site. You'll be asked to give this site a name, and a default 'classifier' called "passthrough" will be created for you. This classifier is simply a passthrough to your hosted LLM provider, so let your app run and collect some data through it before swapping in a local classifier. The difference is that all generated responses will be recorded, so you can label them, and this reveals your baseline accuracy to improve upon.
- Under your decision site, create a new classifier with the appropriate outputs (choices). You'll give this classifier a name and make it the default, in place of the automatically created passthrough classifier.
- Replace the code in your app that currently uses OpenAI function calls or structured outputs with the DecisionBox API call to your decision site, so that the site's default classifier is invoked.
- As your app runs, you'll be collecting responses associated with the invoked classifier. These are recorded 'decisions' which you can label, if you find they can be improved. Remember, you don't need to label a ton of data to see large improvements in accuracy!
- With at least 5-10 decisions labeled, you'll have the opportunity to "train" your task-specific classifier, and promote it to production.
- After letting the app run a bit more with the newly promoted classifier, you should see improvements in accuracy. You may want to share these accuracy metrics with business stakeholders to demonstrate that your app is getting smarter with more data.
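To make the replacement step concrete, here is a minimal sketch in Python of calling a decision site over HTTP. The endpoint path, payload fields, and site name below are illustrative assumptions, not the documented API; see the SwaggerUI spec and the scripts in examples/ for the real interface.

```python
import json
import urllib.request

DECISIONBOX_URL = "http://localhost:4000"  # your DecisionBox instance

def build_invocation(input_text: str) -> dict:
    # Hypothetical payload shape -- check the SwaggerUI spec for the real schema.
    return {"input": input_text}

def invoke_decision_site(site_name: str, input_text: str) -> dict:
    # POST to a hypothetical invocation endpoint; the decision site's
    # default classifier handles the request and returns its choice.
    req = urllib.request.Request(
        f"{DECISIONBOX_URL}/api/decision_sites/{site_name}/invoke",  # assumed path
        data=json.dumps(build_invocation(input_text)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

While the passthrough classifier is the default, a call like this behaves like your original LLM call, except that every response is recorded for labeling; once you train and promote a task-specific classifier, the same call is served by it instead.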
Using docker:
export SECRET_KEY_BASE=$(openssl rand -hex 32)
docker pull ghcr.io/fmops/decisionbox:latest
docker run \
--rm \
-p 4000:4000 \
-v $(pwd)/db:/app/db \
-v $(pwd)/checkpoints:/app/checkpoints \
--env DATABASE_PATH=/app/db/db.sqlite \
--env SECRET_KEY_BASE=$SECRET_KEY_BASE \
--env PHX_HOST=localhost \
ghcr.io/fmops/decisionbox:latest \
sh -c "/app/bin/migrate && /app/bin/server"
In a web browser: http://localhost:4000
Explanation:
- Maps port `4000` on the host to `4000` in the container
- Maps volumes `./db` (sqlite DB) and `./checkpoints` (model checkpoints)
- Runs idempotent migrations and starts the server
The python examples in `examples/` show how you can invoke a decision site and submit data.
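As a hedged sketch of the "submit data" half (again, the endpoint path and field names below are assumptions; the scripts in examples/ are the authoritative reference), labeling a recorded decision might look like:

```python
import json
import urllib.request

def build_label(label: str) -> dict:
    # Hypothetical payload: the correct choice for a recorded decision.
    return {"label": label}

def submit_label(base_url: str, decision_id: int, label: str) -> int:
    # POST the label to a hypothetical labeling endpoint and return the HTTP status.
    req = urllib.request.Request(
        f"{base_url}/api/decisions/{decision_id}/label",  # assumed path
        data=json.dumps(build_label(label)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Labeled decisions accumulate against the invoked classifier, and once you have 5-10 of them you can train and promote a task-specific classifier from the UI.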
Dependencies are managed with `flake.nix` and `nix-direnv`. Set environment variables in `.envrc`; see `.envrc.example` for what variables to set.
To start your Phoenix server:
- Run `mix setup` to install and set up dependencies
- Start the Phoenix endpoint with `mix phx.server`, or inside IEx with `iex -S mix phx.server`
- Run `mix run priv/repo/seeds.exs` to seed some fixture data in order to kickstart development
Now you can visit `localhost:4000` from your browser.
- The development environment doesn't support CUDA on Mac
- You need at least 16 GB of RAM to train the model seeded in dev mode
- If the EXLA NIF fails to load under SELinux, run `execstack -c _build/dev/lib/exla/priv/libexla.so`
Switching between CPU/GPU
# export XLA_TARGET=cuda12
export XLA_TARGET=cpu
mix deps.clean xla exla && mix deps.get
Ensure the `db` directory is writable so that data created when running the app via Docker is persisted:
sudo chown your-user:your-user db
chmod 777 db