This project leverages large language models (LLMs) to evaluate biomedical question-answering datasets. The primary dataset used is MEDIQA. The goal is to provide a robust evaluation framework for biomedical QA systems.
/llm-as-a-evaluator
├── notebooks
│   ├── data_preprocessing.py
│   └── results.ipynb
├── src
│   └── app.py
├── README.md
└── requirements.txt
- Python 3.8+
- pip
- Clone the repository:
  `git clone https://github.com/yourusername/llm-as-a-evaluator.git`
  `cd llm-as-a-evaluator`
- Install the required packages:
  `pip install -r requirements.txt`
- Download the MEDIQA dataset and place it in the `data/mediqa` directory (a quick check is sketched below).
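Before running the preprocessing script, it can help to confirm the dataset landed where the project expects it. A minimal sketch, assuming only that the files sit under `data/mediqa` (the exact file names depend on which MEDIQA task you downloaded):

```python
# Minimal check that the MEDIQA dataset is in the expected location.
# No specific file names are assumed; this only lists whatever is present.
from pathlib import Path

data_dir = Path("data/mediqa")
if not data_dir.is_dir():
    raise FileNotFoundError(f"Expected the MEDIQA dataset under {data_dir}/")

print("Found dataset files:", sorted(p.name for p in data_dir.iterdir()))
```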
To use the LLMs, you need to obtain an API key from Groq. Groq offers free API keys. Set the API key as an environment variable or in a `.env` file.
- Visit the Groq Console.
- Sign in or create an account.
- Navigate to the API Keys section.
- Generate a new API key and copy it.
- Set the API key as an environment variable:
  `export GROQ_API_KEY=your_groq_api_key_here`
- Alternatively, create a `.env` file in the project root and add the API key: `GROQ_API_KEY=your_groq_api_key_here` (a loading sketch follows this list).
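Once the key is set, the application can read it at runtime. A minimal sketch, assuming the `python-dotenv` package is available (check `requirements.txt`); plain `os.environ` is enough if you only use the exported variable:

```python
# Minimal sketch: load GROQ_API_KEY from the environment or a .env file.
# python-dotenv is an assumption here, not necessarily what src/app.py uses.
import os

from dotenv import load_dotenv

load_dotenv()  # picks up GROQ_API_KEY from a .env file in the project root, if present

api_key = os.environ.get("GROQ_API_KEY")
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set; export it or add it to .env")
```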
- Data Preprocessing: Scripts to preprocess the MEDIQA dataset.
- Model Evaluation: Tools to evaluate QA models using LLMs.
- Analysis Notebooks: Jupyter notebooks for detailed analysis.
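To make the model-evaluation step concrete, here is a minimal sketch of scoring a single question-answer pair with an LLM through the Groq API. The model name, prompt wording, and scoring scale are illustrative assumptions, not the project's exact configuration:

```python
# Minimal sketch: use a Groq-hosted LLM as an evaluator for one QA pair.
# The model name and prompt are assumptions; adapt them to your setup.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

question = "What are common symptoms of iron-deficiency anemia?"
answer = "Fatigue, pallor, and shortness of breath are typical symptoms."

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumed model; use any model enabled for your Groq account
    messages=[
        {"role": "system", "content": "You are a strict evaluator of biomedical answers."},
        {
            "role": "user",
            "content": (
                f"Question: {question}\nAnswer: {answer}\n"
                "Rate the answer's correctness from 1 to 5 and justify briefly."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```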
This project is licensed under the MIT License; see the LICENSE file for details.
For more details on the MEDIQA dataset, visit the official repository.