evaluation-metrics

Here are 453 public repositories matching this topic...

confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Updated Nov 17, 2024
Python

AgentOps-AI / agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

agent ai openai evaluation-metrics mistral cost-estimation autogen groq agentops llm langchain anthropic evals ollama crewai

Updated Nov 16, 2024
Python

xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

tracking machine-learning real-time computer-vision robotics evaluation evaluation-metrics multi-object-tracking kitti 3d-tracking 3d-multi-object-tracking 2d-mot-evaluation 3d-mot 3d-multi kitti-3d

Updated Apr 3, 2024
Python

huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

evaluation evaluation-metrics evaluation-framework huggingface

Updated Nov 15, 2024
Python

huggingface / evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

machine-learning tutorial evaluation evaluation-metrics guidebook large-language-models llm

Updated Nov 5, 2024

google-research / rliable

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

benchmarking machine-learning google reinforcement-learning rl evaluation-metrics

Updated Aug 12, 2024
Jupyter Notebook

MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

nlp natural-language-processing hyperparameter-optimization topic-modeling nlp-library bayesian-optimization hyperparameter-tuning latent-dirichlet-allocation evaluation-metrics neural-topic-models latent-semantic-analysis topic-models hyperparameter-search non-negative-matrix-factorization nlproc

Updated Jul 25, 2024
Python

jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

python3 automatic-speech-recognition speech-to-text evaluation-metrics wer word-error-rate

Updated Nov 1, 2024
Python
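
For readers unfamiliar with the metric named in this entry, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The following is a minimal pure-Python sketch of that definition. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not taken from the jiwer code base and does not use its API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion of a reference word
                curr[j - 1] + 1,                   # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the hypothesis contains many inserted words.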

nekhtiari / image-similarity-measures

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

processing machine-learning image metrics evaluation-metrics p1

Updated Aug 31, 2024
Python

Unbabel / COMET

A Neural Framework for MT Evaluation

nlp machine-learning natural-language-processing machine-translation artificial-intelligence evaluation-metrics

Updated Jul 29, 2024
Python

proycon / pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Mor…

python nlp machine-learning natural-language-processing library linguistics computational-linguistics text-processing nlp-library search-algorithms evaluation-metrics folia language-modelling

Updated Sep 14, 2023
Python

AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

python information-retrieval evaluation comparison numba recommender-systems evaluation-metrics metasearch data-fusion score-fusion ranking-metrics information-retrieval-evaluation information-retrieval-metrics rank-fusion

Updated Jul 1, 2024
Python

relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications

information-retrieval evaluation-metrics evaluation-framework rag llmops retrieval-augmented-generation llm-evaluation

Updated Sep 2, 2024
Python

v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

audio video pytorch transformer gan multi-modal evaluation-metrics video-understanding vas video-features vqvae bmvc melgan audio-generation vggsound

Updated Jul 12, 2024
Jupyter Notebook

salesforce / factCC

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

text-summarization evaluation-metrics

Updated Jul 22, 2023
Python

TonicAI / tonic_validate

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

evaluation-metrics evaluation-framework rag large-language-models llm llms llmops retrieval-augmented-generation

Updated Nov 14, 2024
Python

FuxiaoLiu / LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

evaluation vision vqa llama object-detection gpt evaluation-metrics iclr multimodal vision-and-language hallucination vicuna gpt-4 foundation-models prompt-engineering chatgpt llava iclr2024

Updated Mar 13, 2024
Python

bheinzerling / pyrouge

A Python wrapper for the ROUGE summarization evaluation package

nlp summarization rouge evaluation-metrics

Updated Feb 10, 2021
Python

clovaai / generative-evaluation-prdc

Code base for the precision, recall, density, and coverage metrics for generative models. ICML 2020.

diversity machine-learning deep-learning evaluation generative-adversarial-network generative-model recall precision evaluation-metrics fidelity icml icml-2020 icml2020

Updated Jan 9, 2023
Python

sharmaroshan / Twitter-Sentiment-Analysis

A natural language processing problem in which sentiment analysis is performed by classifying tweets as positive or negative with machine learning models. Covers classification, text mining, text analysis, data analysis, and data visualization.

nlp machine-learning sentiment-analysis cross-validation eda data-visualization wordcloud classification data-analysis bag-of-words hashtags evaluation-metrics count-vectorizer datacleaning

Updated Nov 3, 2023
Jupyter Notebook
