diff --git a/.gitignore b/.gitignore
index dc55a9c..c7a03ed 100755
--- a/.gitignore
+++ b/.gitignore
@@ -165,7 +165,7 @@ cython_debug/
 *.pdf
 *.svg
 # *.jpeg
-*.png
+# *.png
 *.bmp
 
 ### VirtualEnv template
diff --git a/README.md b/README.md
index 7de1c2c..df5cca1 100644
--- a/README.md
+++ b/README.md
@@ -249,8 +249,8 @@ If you encounter issues, follow these steps:
 - _Chain of Thought_ prompting techniques are a linear problem solving approach where each step builds upon the previous one. Google's approach in [arXiv:2201.11903](https://arxiv.org/pdf/2201.11903) is to augment each prompt with an additional example and chain of thought for an associated answer. (See the paper for multiple examples.)
 - **Dynamic resource allocation and Semantic Filters**:
   - An immediate improvement to the current approach would be to use dynamically-adjusted parameters. Namely, the number of iterations and number of models used in the algorithm could be adjusted to the input prompt: _e.g._ simple prompts do not require too many resources. For this, a centralized model could be used to decide the complexity of the task, prior to sending the prompt to the other LLMs.
-  - On a similar note, the number of iterations for making progress could adjusted according to how _different_ are the model responses. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. [TBC]
-  the use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
+  - On a similar note, the number of iterations for making progress could be adjusted according to how _different_ the model responses are. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. These are commonly used in RAG pipelines, and could also be used here with _e.g._ cosine similarity. You can get started with [GCloud's text embeddings](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) -- see [flare-ai-rag](https://github.com/flare-foundation/flare-ai-rag/tree/main) for more details.
+  - The use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
   - In line with the previously mentioned LLM-as-a-Judge, a model could potentially be used for filtering _bad_ responses. LLM-Blender, for instance, introduced in [arXiv:2306.02561](https://arxiv.org/abs/2306.02561), uses a PairRanker that achieves a ranking of outputs through pairwise comparisons via a _cross-attention encoder_.
 - **AI Agent Swarm**:
   - The structure of the reference CL implementation can be changed to adapt _swarm_-type algorithms, where tasks are broken down and distributed among specialized agents for parallel processing. In this case a centralized LLM would act as an orchestrator for managing distribution of tasks -- see _e.g._ [swarms repo](https://github.com/kyegomez/swarms).
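For concreteness, here is a minimal sketch of the embedding-based stopping check suggested in the hunk above. Everything in it is illustrative rather than part of the template: `embed_texts` is a stand-in for whichever embedding backend is used (e.g. the Vertex AI text-embeddings endpoint linked in the diff), and the 0.9 threshold is an arbitrary example value.

```python
import itertools

import numpy as np


def embed_texts(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per input text.

    Wire this to an embedding backend, e.g. the Vertex AI
    text-embeddings API referenced in the README.
    """
    raise NotImplementedError


def mean_pairwise_similarity(responses: list[str]) -> float:
    """Average cosine similarity over all pairs of model responses."""
    vectors = embed_texts(responses)
    sims = [
        float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        for a, b in itertools.combinations(vectors, 2)
    ]
    return sum(sims) / len(sims)


def should_run_another_round(responses: list[str], threshold: float = 0.9) -> bool:
    # Keep iterating only while the models still disagree; once their
    # responses are near-identical in embedding space, stop early.
    return mean_pairwise_similarity(responses) < threshold
```

Plugged into the consensus loop, a check like this would let the number of improvement rounds track actual disagreement between models instead of a fixed iteration count.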
diff --git a/src/README.md b/src/README.md
index f0ab770..749fd5d 100644
--- a/src/README.md
+++ b/src/README.md
@@ -3,6 +3,19 @@
 # Flare AI Consensus
 
+## flare-ai-consensus Pipeline
+
+The flare-ai-consensus template consists of the following components:
+
+* **Router:** The primary interface that receives user requests, distributes them to the various AI models, and collects their intermediate responses.
+* **Aggregator:** Synthesizes multiple model responses into a single, coherent output.
+* **Consensus Layer:** Defines the logic for the consensus algorithm. The reference implementation is set up in the following steps:
+  * The initial prompt is sent to a set of models, with additional system instructions.
+  * Initial responses are aggregated by the Aggregator.
+  * Improvement rounds follow, where aggregated responses are sent as additional context or system instructions to the models.
+
+![flare-ai-consensus](cl_pipeline.png)
+
 ## OpenRouter Clients
 
 We implement two OpenRouter clients for interacting with the OpenRouter API: a standard sync client and an asynchronous client.
 
diff --git a/src/cl_pipeline.png b/src/cl_pipeline.png
new file mode 100644
index 0000000..7fee04d
Binary files /dev/null and b/src/cl_pipeline.png differ
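To make the three steps of the Consensus Layer concrete, here is a minimal sketch of the loop the new `src/README.md` section describes. The names (`query_model`, `aggregate`) and the fixed round count are illustrative placeholders, not the template's actual API; the reference implementation lives under `src/`.

```python
import asyncio


async def query_model(model_id: str, prompt: str, context: str | None = None) -> str:
    """Placeholder: one chat-completion call through an OpenRouter client."""
    raise NotImplementedError


async def aggregate(responses: list[str]) -> str:
    """Placeholder: the Aggregator synthesizes one coherent answer from many."""
    raise NotImplementedError


async def run_consensus(models: list[str], prompt: str, rounds: int = 2) -> str:
    # Step 1: fan the initial prompt out to every model in parallel.
    responses = list(await asyncio.gather(*(query_model(m, prompt) for m in models)))
    # Step 2: the Aggregator condenses the initial responses into one output.
    aggregated = await aggregate(responses)
    # Step 3: improvement rounds, feeding the aggregate back as extra context.
    for _ in range(rounds):
        responses = list(
            await asyncio.gather(
                *(query_model(m, prompt, context=aggregated) for m in models)
            )
        )
        aggregated = await aggregate(responses)
    return aggregated
```

A caller would then run something like `asyncio.run(run_consensus(["model-a", "model-b"], "your prompt"))`, where the model IDs are hypothetical OpenRouter identifiers.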