**Table of Contents**
1. [Agent Evaluation](#agent-evaluation)
2. [RAG Evaluation](#rag-evaluation)

# Agent Evaluation

**TODO**

# RAG Evaluation

## Introduction

Our objective is to monitor and improve the RAG pipeline for **AI-OPS**, which requires context-specific data from
the *Cybersecurity* and *Penetration Testing* fields; we also want the evaluation process to be as automated as possible.

The evaluation workflow is split into two steps:

1. **Dataset Generation** ([dataset_generation.ipynb](./test/benchmarks/rag/dataset_generation.ipynb)):
uses Ollama and the data ingested into Qdrant (the RAG vector database) to generate *question* and *ground truth*
pairs, producing a synthetic Q&A dataset (see the first sketch after this list).

2. **Evaluation** ([evaluation.py](./test/benchmarks/rag/evaluation.py)):
builds the RAG pipeline with the same configuration used to generate the synthetic Q&A dataset, uses the pipeline to
provide an *answer* to each question (given the retrieved *context*), then evaluates the full dataset using an LLM as
a judge; for performance reasons the judging is performed via the HuggingFace Inference API (see the second sketch
after this list).
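
A minimal sketch of how step 1 might look, assuming a Qdrant collection named `rag_docs` (with the chunk text stored
under a `text` payload key) and a local `llama3` model served by Ollama; the prompt wording and file names are
illustrative, and the actual notebook may differ:

```python
"""Minimal sketch of synthetic Q&A generation (step 1).

Assumptions (illustrative, not taken from the notebook): Qdrant runs on
localhost:6333, ingested chunks live in a collection named `rag_docs` with
the chunk text under the `text` payload key, and Ollama serves `llama3`.
"""
import json

import ollama
from qdrant_client import QdrantClient

QA_PROMPT = (
    "Given the following document chunk, write one question a penetration "
    "tester might ask and its ground-truth answer. Reply with JSON using "
    "the keys 'question' and 'ground_truth'.\n\nChunk:\n{chunk}"
)

client = QdrantClient(url="http://localhost:6333")

# Pull a batch of ingested chunks from the vector database.
points, _ = client.scroll(collection_name="rag_docs", limit=50, with_payload=True)

dataset = []
for point in points:
    response = ollama.generate(
        model="llama3",
        prompt=QA_PROMPT.format(chunk=point.payload["text"]),
        format="json",  # constrain the model output to JSON
    )
    try:
        dataset.append(json.loads(response["response"]))
    except json.JSONDecodeError:
        continue  # skip generations that are not valid JSON

with open("qa_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```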
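
A corresponding sketch of step 2. The judge model, the `nomic-embed-text` embedding model, the prompts, and the
simple 0–1 scoring are illustrative assumptions; the real script computes the context precision and recall metrics
reported below:

```python
"""Minimal sketch of LLM-as-a-judge evaluation (step 2).

Assumptions (illustrative): the same `rag_docs` collection and `llama3`
model as in step 1, `nomic-embed-text` for query embeddings, and a judge
model reached through the HuggingFace Inference API.
"""
import json

import ollama
from huggingface_hub import InferenceClient
from qdrant_client import QdrantClient

qdrant = QdrantClient(url="http://localhost:6333")
judge = InferenceClient(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

JUDGE_PROMPT = (
    "Question: {question}\nContext: {context}\nAnswer: {answer}\n"
    "Ground truth: {ground_truth}\n"
    "Rate the answer between 0 and 1 against the ground truth. "
    "Reply with the number only."
)


def rag_pipeline(question: str) -> tuple[str, str]:
    """Retrieve the top chunks for a question and answer from them."""
    vector = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = qdrant.search(collection_name="rag_docs", query_vector=vector, limit=3)
    context = "\n".join(hit.payload["text"] for hit in hits)
    answer = ollama.generate(
        model="llama3",
        prompt=f"Context:\n{context}\n\nQuestion: {question}",
    )["response"]
    return answer, context


with open("qa_dataset.json") as f:
    dataset = json.load(f)

scores = []
for item in dataset:
    answer, context = rag_pipeline(item["question"])
    verdict = judge.text_generation(
        JUDGE_PROMPT.format(
            question=item["question"],
            context=context,
            answer=answer,
            ground_truth=item["ground_truth"],
        ),
        max_new_tokens=8,
    )
    scores.append(float(verdict.strip()))

print(f"mean judge score: {sum(scores) / len(scores):.3f}")
```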

## Results

### Context Precision

**TODO:** *describe the metric and the prompts used*
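
Pending the full description, a common reference formulation (e.g. the one defined by the RAGAS framework) is:

$$
\text{Context Precision@}K = \frac{\sum_{k=1}^{K}\big(\text{Precision@}k \times v_k\big)}{\text{total number of relevant chunks in the top } K},
\qquad
\text{Precision@}k = \frac{\text{relevant chunks in the top } k}{k}
$$

where $v_k \in \{0, 1\}$ indicates whether the chunk at rank $k$ is relevant to the ground truth, as judged by the LLM.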

![Context Precision Plot](data/rag_eval/results/plots/context_precision.png)

### Context Recall

**TODO:** *describe the metric and the prompts used*
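
As with precision, a common reference formulation (e.g. RAGAS) measures how much of the ground truth is covered by
the retrieved context:

$$
\text{Context Recall} = \frac{\left|\text{ground-truth statements attributable to the retrieved context}\right|}{\left|\text{statements in the ground truth}\right|}
$$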


![Context Recall Plot](data/rag_eval/results/plots/context_recall.png)
