How do LLMs judge the quality of summaries produced by other models, and how closely do those judgments match human ratings?
SEAHORSE is a dataset introduced for multilingual, multifaceted summarization evaluation. The dataset consists of 96K summaries with human ratings along six quality dimensions: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness.
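A minimal sketch of loading the SEAHORSE ratings for analysis is shown below. The file name and column names are assumptions for illustration only; the actual schema should be taken from the dataset's documentation.

```python
# Sketch: load SEAHORSE human ratings into a DataFrame for analysis.
# NOTE: the file name and column names below are assumed, not taken from
# the official release; adjust them to the actual dataset schema.
import pandas as pd

ratings = pd.read_csv("seahorse_train.tsv", sep="\t")

# Hypothetical column names for the six quality dimensions.
DIMENSIONS = [
    "comprehensibility", "repetition", "grammar",
    "attribution", "main_ideas", "conciseness",
]

# Quick look at the distribution of human ratings per dimension.
print(ratings[DIMENSIONS].describe())
```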
The main idea of this project is to use Large Language Models (LLMs) to evaluate the quality of summaries generated by other models. These LLM evaluations are compared against the human ratings in the SEAHORSE dataset, and the agreement between human and model judgments is analyzed. The quality dimensions considered are listed below (a prompting sketch follows the list):
- Comprehensibility: The summary can be read and understood by the rater.
- Repetition: The summary is free of unnecessarily repeated information.
- Grammar: The summary is grammatically correct.
- Attribution: All of the information provided by the summary is fully attributable to the source article.
- Main Ideas: The summary captures the main idea(s) of the source article.
- Conciseness: The summary concisely represents the information in the source article.
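The sketch below illustrates how an LLM could be prompted to rate a single summary along these six dimensions. `call_llm` is a hypothetical helper standing in for whatever chat or completion API is used, and the prompt wording is illustrative rather than the project's exact prompt.

```python
# Sketch: ask an LLM to rate one summary on the six SEAHORSE dimensions.
# `call_llm` is a hypothetical function (prompt -> response string).
import json

DIMENSIONS = [
    "comprehensibility", "repetition", "grammar",
    "attribution", "main ideas", "conciseness",
]

def rate_summary(article: str, summary: str, call_llm) -> dict:
    prompt = (
        "Rate the following summary of the article on each dimension "
        f"({', '.join(DIMENSIONS)}), answering 1 if the summary satisfies "
        "the dimension and 0 otherwise. Reply with a JSON object keyed by "
        "dimension name.\n\n"
        f"Article:\n{article}\n\nSummary:\n{summary}"
    )
    response = call_llm(prompt)   # hypothetical LLM call
    return json.loads(response)   # expects a JSON dict of 0/1 ratings
```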
The objective of this project is to evaluate how closely LLMs align with human judgment when assessing the quality of summaries generated by other models, providing insight into the reliability and limitations of LLMs as evaluators of summarization quality. The current evaluation metric is the mean squared error (MSE) between human ratings and model-generated ratings, as sketched below.
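The following is a minimal sketch of that objective, assuming the human and model ratings are already aligned per dimension as lists of 0/1 scores; the toy values at the end are illustrative only.

```python
# Sketch: per-dimension MSE between human and LLM-generated ratings.
# Assumes `human` and `model` map each dimension to aligned lists of scores.
import numpy as np

def mse_per_dimension(human: dict, model: dict) -> dict:
    return {
        dim: float(np.mean((np.array(human[dim]) - np.array(model[dim])) ** 2))
        for dim in human
    }

# Toy example with two dimensions and three summaries (not real data).
human = {"grammar": [1, 1, 0], "attribution": [1, 0, 0]}
model = {"grammar": [1, 0, 0], "attribution": [1, 0, 1]}
print(mse_per_dimension(human, model))
# {'grammar': 0.333..., 'attribution': 0.333...}
```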