LLM as an Evaluator

Overview

This project leverages large language models (LLMs) to evaluate biomedical question-answering (QA) systems. The primary dataset used is MEDIQA. The goal is to provide a robust evaluation framework for biomedical QA.

File Structure

/llm-as-a-evaluator
├── data
│   └── mediqa          # place the downloaded MEDIQA dataset here (see Setup)
├── notebooks
│   ├── data_preprocessing.py
│   └── results.ipynb
├── src
│   └── app.py
├── README.md
└── requirements.txt

Setup

Prerequisites

  • Python 3.8+
  • pip

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/llm-as-a-evaluator.git
    cd llm-as-a-evaluator
  2. Install the required packages:

    pip install -r requirements.txt
  3. Download the MEDIQA dataset and place it in the data/mediqa directory.

API Key

To use the LLMs, you need to obtain an API key from Groq. Groq offers free API keys. Set the API key as an environment variable or in a .env file.

Obtaining Groq API Key

  1. Visit the Groq Console (https://console.groq.com).
  2. Sign in or create an account.
  3. Navigate to the API Keys section.
  4. Generate a new API key and copy it.
  5. Set the API key as an environment variable:
    export GROQ_API_KEY=your_groq_api_key_here
  6. Alternatively, create a .env file in the project root and add the API key:
    GROQ_API_KEY=your_groq_api_key_here
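With the key in place, the code can authenticate against Groq. Below is a minimal sketch of loading the key and making a first call, assuming the groq and python-dotenv packages are installed; the model name is illustrative and not necessarily what src/app.py uses:

    import os

    from dotenv import load_dotenv  # from the python-dotenv package
    from groq import Groq

    load_dotenv()  # picks up GROQ_API_KEY from a .env file, if present

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    # Quick sanity check that the key is valid.
    response = client.chat.completions.create(
        model="llama3-8b-8192",  # illustrative; any model available to your key works
        messages=[{"role": "user", "content": "Reply with OK."}],
    )
    print(response.choices[0].message.content)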
    

Features

  • Data Preprocessing: Scripts to preprocess the MEDIQA dataset.
  • Model Evaluation: Tools to evaluate QA models using LLMs.
  • Analysis Notebooks: Jupyter notebooks for detailed analysis.
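As an illustration of the Model Evaluation feature, below is a minimal sketch of LLM-as-judge scoring. The judge_answer helper, prompt wording, 1-5 scale, and model name are hypothetical stand-ins, not the actual implementation in src/app.py:

    import os

    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    def judge_answer(question: str, reference: str, candidate: str) -> str:
        """Ask the LLM judge to grade a candidate answer against a reference."""
        prompt = (
            "You are grading a biomedical QA system.\n"
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Candidate answer: {candidate}\n"
            "Rate the candidate from 1 (wrong) to 5 (fully correct). "
            "Reply with the number only."
        )
        response = client.chat.completions.create(
            model="llama3-8b-8192",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic grading
        )
        return response.choices[0].message.content.strip()

    # Example: grade one candidate answer.
    print(judge_answer(
        "What vitamin deficiency causes scurvy?",
        "Scurvy is caused by vitamin C (ascorbic acid) deficiency.",
        "It is caused by a lack of vitamin C.",
    ))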

License

This project is licensed under the MIT License; see the LICENSE file for details.

Additional Information

For more details on the MEDIQA dataset, visit the official repository.
