Flare AI SDK for Consensus Learning
- Consensus Learning Implementation: A Python implementation of single-node, multi-model Consensus Learning (CL). CL is a decentralized ensemble learning paradigm introduced in arXiv:2402.16157, which is now being generalized to large language models (LLMs).
- 300+ LLM Support: Leverages OpenRouter to access over 300 models via a unified interface.
- Iterative Feedback Loop: Employs an aggregation process where multiple LLM outputs are refined over a configurable number of iterations (a minimal sketch of this loop is shown after this list).
- Modular & Configurable: Easily customize models, conversation prompts, and aggregation parameters through a simple JSON configuration.
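At a high level, the iterative feedback loop looks roughly like the sketch below. This is not the SDK's actual API: the `ask` helper, the aggregation prompt wording, and the direct `requests` calls to OpenRouter's chat completions endpoint are simplifications of the real implementation in `src/flare_ai_consensus/consensus/` and `src/flare_ai_consensus/router/openrouter.py`.

```python
# Illustrative sketch of the consensus learning loop (not the SDK's actual API).
# Assumes OPEN_ROUTER_API_KEY is set and the `requests` package is available.
import os
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPEN_ROUTER_API_KEY']}"}


def ask(model: str, prompt: str) -> str:
    """Send a single-turn prompt to one model through OpenRouter."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    resp = requests.post(OPENROUTER_URL, headers=HEADERS, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def consensus(prompt: str, models: list[str], aggregator: str, iterations: int) -> str:
    # Round 0: each model answers the original prompt independently.
    answers = [ask(m, prompt) for m in models]
    for _ in range(iterations):
        # Aggregate the current answers into a single draft...
        draft = ask(aggregator, f"Combine these answers to '{prompt}':\n\n" + "\n---\n".join(answers))
        # ...then feed the draft back to every model for refinement.
        answers = [ask(m, f"{prompt}\n\nImprove on this draft answer:\n{draft}") for m in models]
    # Final aggregation over the last round of refined answers.
    return ask(aggregator, f"Combine these answers to '{prompt}':\n\n" + "\n---\n".join(answers))
```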
Before getting started, ensure you have:
- A Python 3.12 environment.
- uv installed for dependency management.
- Docker.
- An OpenRouter API Key.
You can deploy Flare AI Consensus using Docker or set up the backend and frontend manually.
- Prepare the Environment File: Rename `.env.example` to `.env` and update the variables accordingly (e.g., your OpenRouter API key).
- Build the Docker Image:
  docker build -t flare-ai-consensus .
- Run the Docker Container:
  docker run -p 80:80 -it --env-file .env flare-ai-consensus
- Access the Frontend: Open your browser and navigate to http://localhost:80/docs to interact with the Chat UI.
Flare AI Consensus is a Python-based backend. Follow these steps for manual setup:
- Install Dependencies: Use uv to install the backend dependencies:
  uv sync --all-extras
  Verify your available credits and get all supported models with:
  uv run python -m tests.credits
  uv run python -m tests.models
- Configure the CL instance: Set up your CL instance in `src/input.json`, including:
  - Models: Specify each LLM's OpenRouter `id`, along with parameters such as `max_tokens` and `temperature`.
  - Aggregator Settings: Define the aggregator model, additional context, the aggregation prompt, and how aggregated responses are handled.
  - Iterations: Set the number of iterations for the feedback loop. (An illustrative sketch of the configuration structure follows these setup steps.)
- Start the Backend: The backend runs by default on 0.0.0.0:8080:
  uv run start-backend
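For orientation, the snippet below writes out a configuration with the kind of fields described above. The exact field names are assumptions, not the SDK's schema; treat the `src/input.json` shipped with the repository as the authoritative reference.

```python
# Illustrative only: the field names below are assumptions, not the SDK's schema.
# Use the src/input.json shipped with the repository as the authoritative reference.
import json

example_config = {
    "models": [
        {"id": "google/learnlm-1.5-pro-experimental:free", "max_tokens": 500, "temperature": 0.7},
        {"id": "another/openrouter-model-id", "max_tokens": 500, "temperature": 0.5},
    ],
    "aggregator": {
        "model": {"id": "an/aggregator-model-id", "max_tokens": 800, "temperature": 0.3},
        "context": "You combine several candidate answers into one.",
        "prompt": "Aggregate the responses below into a single, consistent answer.",
    },
    "iterations": 2,
}

print(json.dumps(example_config, indent=2))
```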
For granular testing, use the following endpoints:
- Completion Endpoint (Non-Conversational):
  uv run python -m tests.completion --prompt "Who is Ash Ketchum?" --model "google/learnlm-1.5-pro-experimental:free"
- Chat Completion Endpoint (Conversational):
  uv run python -m tests.chat_completion --mode default
  Tip: In interactive mode, type `exit` to quit. (A sketch of querying the running backend directly over HTTP follows below.)
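Beyond the bundled test scripts, you can also query the running backend directly over HTTP. The route and payload field below are assumptions for illustration only; consult the interactive docs at /docs for the actual endpoint and request schema.

```python
# Hypothetical example of querying the running backend over HTTP.
# The route and payload field are assumptions; check http://localhost:8080/docs
# (or port 80 when running via Docker) for the actual endpoint and schema.
import requests

BASE_URL = "http://localhost:8080"  # manual-setup default; Docker maps port 80

payload = {"message": "Who is Ash Ketchum?"}  # field name assumed, verify in /docs
resp = requests.post(f"{BASE_URL}/api/routes/chat/", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```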
src/flare_ai_consensus/
├── attestation/ # TEE attestation implementation
│ ├── simulated_token.txt
│ ├── vtpm_attestation.py
│ └── vtpm_validation.py
├── api/ # API layer
│ ├── middleware/ # Request/response middleware
│ └── routes/ # API endpoint definitions
├── consensus/ # Core consensus learning
│ ├── aggregator.py # Response aggregation
│ └── consensus.py # Main CL implementation
├── router/ # API routing and model access
│ ├── base_router.py # Base routing interface
│ └── openrouter.py # OpenRouter implementation
├── utils/ # Utility functions
│ ├── file_utils.py # File operations
│ └── parser_utils.py # Input parsing
├── input.json # Configuration file
├── main.py # Application entry
└── settings.py # Environment settings
Deploy on a Confidential Space using AMD SEV.
- Google Cloud Platform Account: Access to the verifiable-ai-hackathon project is required.
- OpenRouter API Key: Ensure your OpenRouter API key is set in your `.env`.
- gcloud CLI: Install and authenticate the gcloud CLI.
- Set Environment Variables: Update your `.env` file with:
  TEE_IMAGE_REFERENCE=ghcr.io/flare-research/flare-ai-consensus:main # Replace with your repo build image
  INSTANCE_NAME=<PROJECT_NAME-TEAM_NAME>
- Load Environment Variables:
  source .env
  Reminder: Run the above command in every new shell session. On Windows, we recommend using Git Bash to access commands like `source`.
- Verify the Setup:
  echo $TEE_IMAGE_REFERENCE # Expected output: your repo build image
Run the following command:
gcloud compute instances create $INSTANCE_NAME \
--project=verifiable-ai-hackathon \
--zone=us-central1-c \
--machine-type=n2d-standard-2 \
--network-interface=network-tier=PREMIUM,nic-type=GVNIC,stack-type=IPV4_ONLY,subnet=default \
--metadata=tee-image-reference=$TEE_IMAGE_REFERENCE,\
tee-container-log-redirect=true,\
tee-env-OPEN_ROUTER_API_KEY=$OPEN_ROUTER_API_KEY \
--maintenance-policy=MIGRATE \
--provisioning-model=STANDARD \
--service-account=confidential-sa@verifiable-ai-hackathon.iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--min-cpu-platform="AMD Milan" \
--tags=flare-ai,http-server,https-server \
--create-disk=auto-delete=yes,\
boot=yes,\
device-name=$INSTANCE_NAME,\
image=projects/confidential-space-images/global/images/confidential-space-debug-250100,\
mode=rw,\
size=11,\
type=pd-standard \
--shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--reservation-affinity=any \
--confidential-compute-type=SEV
- After deployment, you should see an output similar to:
  NAME             ZONE           MACHINE_TYPE    PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
  consensus-team1  us-central1-b  n2d-standard-2               10.128.0.18  34.41.127.200  RUNNING
- It may take a few minutes for Confidential Space to complete startup checks. You can monitor progress via the GCP Console logs: click Compute Engine → VM Instances (in the sidebar) → select your instance → Serial port 1 (console).
  When you see a message like:
  INFO: Uvicorn running on http://0.0.0.0:80 (Press CTRL+C to quit)
  the container is ready. Navigate to the external IP of the instance (visible in the VM Instances page) to access the docs (<IP>:80/docs).
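If you prefer not to watch the serial console, a small script can poll the docs page until the container responds. This is a convenience sketch only: it assumes the `requests` package is available locally and that port 80 on the instance is reachable; replace the placeholder IP with your instance's external IP.

```python
# Convenience sketch: poll the deployed instance until the docs page responds.
# Replace EXTERNAL_IP with your instance's external IP; assumes port 80 is open.
import time
import requests

EXTERNAL_IP = "34.41.127.200"  # placeholder, use your own instance's IP
URL = f"http://{EXTERNAL_IP}:80/docs"

while True:
    try:
        if requests.get(URL, timeout=5).status_code == 200:
            print(f"Container is ready: {URL}")
            break
    except requests.RequestException:
        pass
    print("Not ready yet, retrying in 15s...")
    time.sleep(15)
```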
If you encounter issues, follow these steps:
- Check Logs:
  gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
- Verify API Key(s): Ensure that all API keys are set correctly (e.g. OPEN_ROUTER_API_KEY).
- Check Firewall Settings: Confirm that your instance is publicly accessible on port 80.
- Security & TEE Integration:
- Ensure execution within a Trusted Execution Environment (TEE) to maintain confidentiality and integrity.
- Factual Correctness:
- In line with the main theme of the hackathon, one important aspect of the outputs generated by the LLMs is their accuracy. Producing sources/citations along with the answers would lead to higher trust in the setup; sample prompts for this purpose can be found in the appendices of arXiv:2305.14627 or in James' Coffee Blog (a hypothetical citation-requesting prompt is sketched below).
- Note: only certain models may be suitable for this purpose, as references generated by LLMs are often inaccurate or not even real!
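For illustration, a citation-requesting instruction could be prepended as a system message. The wording below is hypothetical and not taken from the repository or the cited sources.

```python
# Hypothetical example of a prompt that nudges models toward citing sources.
# The wording is illustrative only and not taken from the repository.
CITATION_SYSTEM_PROMPT = (
    "Answer the question below. For every factual claim, cite a verifiable "
    "source (paper, documentation page, or URL). If you cannot find a source, "
    "say so explicitly instead of inventing one."
)

def with_citations(question: str) -> list[dict]:
    """Build an OpenRouter-style messages list with the citation instruction."""
    return [
        {"role": "system", "content": CITATION_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```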
- Prompt Engineering:
- Our approach is very similar to the Mixture-of-Agents (MoA) approach introduced in arXiv:2406.04692, which uses iterative aggregations of model responses. Their GitHub repository includes further examples of prompts that can be used as additional context for the LLMs.
- New iterations of the consensus learning algorithm could use different prompts for improving the previous responses. In this regard, the few-shot prompting techniques introduced by OpenAI in arXiv:2005.14165 work by providing models with a few examples of similar queries and responses in addition to the initial prompt (see also earlier work by Radford et al.); a minimal few-shot sketch is given after this list.
- Chain-of-Thought prompting is a linear problem-solving approach in which each step builds upon the previous one. Google's approach in arXiv:2201.11903 augments each prompt with an additional example and the chain of thought for an associated answer. (See the paper for multiple examples.)
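A minimal sketch of few-shot prompt construction in the OpenRouter/OpenAI messages format. The example pairs and the function name are invented for illustration; real examples should match the target task.

```python
# Minimal few-shot prompting sketch (illustrative; the example pairs are invented).
# A few query/response pairs are prepended to the user's actual question so the
# model can imitate the demonstrated format, as in arXiv:2005.14165.
FEW_SHOT_EXAMPLES = [
    ("What is the capital of France?", "Paris."),
    ("What is the capital of Japan?", "Tokyo."),
]

def few_shot_messages(question: str) -> list[dict]:
    messages = []
    for query, response in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": query})
        messages.append({"role": "assistant", "content": response})
    messages.append({"role": "user", "content": question})
    return messages
```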
- Dynamic resource allocation and Semantic Filters:
- An immediate improvement to the current approach would be to use dynamically-adjusted parameters. Namely, the number of iterations and number of models used in the algorithm could be adjusted to the input prompt: e.g. simple prompts do not require too many resources. For this, a centralized model could be used to decide the complexity of the task, prior to sending the prompt to the other LLMs.
- On a similar note, the number of iterations could be adjusted according to how different the model responses are. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on embeddings. These are commonly used in RAG pipelines and could also be used here with e.g. cosine similarity (a minimal sketch is given after this list). You can get started with GCloud's text embeddings -- see flare-ai-rag for more details.
- The use of LLM-as-a-Judge for evaluating other LLM outputs has shown good progress -- see also this Confident AI blogpost.
- In line with the previously mentioned LLM-as-a-Judge, a model could potentially be used for filtering bad responses. LLM-Blender, for instance, introduced in arXiv:2306.02561, uses a PairRanker that achieves a ranking of outputs through pairwise comparisons via a cross-attention encoder.
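As a concrete starting point for the embedding-based check mentioned above, here is a small sketch. It uses the sentence-transformers package as a stand-in for GCloud's text embeddings; the model name and the 0.9 threshold are arbitrary illustrative choices.

```python
# Sketch of an embedding-based stopping criterion for the consensus loop.
# Uses sentence-transformers as a stand-in for GCloud's text embeddings; the
# model name and the similarity threshold are illustrative choices.
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer

_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def responses_agree(responses: list[str], threshold: float = 0.9) -> bool:
    """Return True if every pair of responses has cosine similarity >= threshold."""
    embeddings = _encoder.encode(responses)
    for i, j in combinations(range(len(responses)), 2):
        a, b = embeddings[i], embeddings[j]
        cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        if cosine < threshold:
            return False
    return True

# Example: stop iterating early once the models' answers have converged.
# if responses_agree(current_responses):
#     break
```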
- AI Agent Swarm:
- The structure of the reference CL implementation can be adapted to swarm-type algorithms, where tasks are broken down and distributed among specialized agents for parallel processing. In this case a centralized LLM would act as an orchestrator managing the distribution of tasks -- see e.g. the swarms repo. A minimal orchestrator sketch follows.
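As a rough illustration of the orchestrator pattern (not something the repository implements), the sketch below reuses the hypothetical `ask(model, prompt)` OpenRouter helper from the consensus sketch near the top of this README; the decomposition prompt and agent roles are invented.

```python
# Minimal orchestrator sketch for a swarm-style variant (illustrative only).
# Reuses the hypothetical `ask(model, prompt)` OpenRouter helper defined in the
# consensus sketch earlier in this README; prompts and roles are invented.
def swarm_answer(prompt: str, orchestrator: str, agents: dict[str, str]) -> str:
    # 1. The orchestrator splits the task into one subtask per specialized agent.
    plan = ask(
        orchestrator,
        f"Split the following task into {len(agents)} subtasks, one per line:\n{prompt}",
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Each agent handles one subtask (these calls could also run in parallel).
    partial_results = [
        ask(model, subtask) for subtask, model in zip(subtasks, agents.values())
    ]

    # 3. The orchestrator merges the partial results into a single final answer.
    return ask(
        orchestrator,
        f"Combine the following partial results into one answer to '{prompt}':\n\n"
        + "\n---\n".join(partial_results),
    )
```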