define "evaluator" interface #9

keighrim · 2023-05-11T03:01:27Z

(subtask of #3)

We'd like to define a minimal but concrete behavior for the class of "evaluator". Some features are also discussed in clamsproject/aapb-annotations#2 (comment). At the very minimum, an "evaluator" should be able to

- take a batch of gold files and a batch of prediction files and return a single HTML file with the evaluation result
- take batches of gold files and batches of prediction files and return a single HTML file with all the evaluation results and the aggregated result.
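
To make the two requirements above concrete, here is a rough sketch of what such an interface could look like in Python. Nothing in it is settled: the names `Evaluator`, `score`, `evaluate`, `evaluate_many`, and the toy `ExactMatchEvaluator` are hypothetical, a "batch" is assumed to be a list of file paths, and "aggregated result" is assumed to mean the per-metric mean over batches.

```python
from abc import ABC, abstractmethod
from html import escape
from pathlib import Path
from statistics import mean
from typing import Dict, Sequence, Tuple

# Assumption: a batch is simply a sequence of file paths.
Batch = Sequence[Path]


def _table(title: str, scores: Dict[str, float]) -> str:
    """Render one metric-name -> value mapping as an HTML section."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{value:.3f}</td></tr>"
        for name, value in sorted(scores.items())
    )
    return f"<h2>{escape(title)}</h2><table>{rows}</table>"


class Evaluator(ABC):
    """Hypothetical minimal interface; subclasses only implement score()."""

    @abstractmethod
    def score(self, golds: Batch, preds: Batch) -> Dict[str, float]:
        """Return metric name -> value for one gold/prediction batch pair."""

    def evaluate(self, golds: Batch, preds: Batch) -> str:
        """One batch pair in, one HTML document out."""
        body = _table("Result", self.score(golds, preds))
        return f"<html><body>{body}</body></html>"

    def evaluate_many(self, pairs: Sequence[Tuple[Batch, Batch]]) -> str:
        """Many batch pairs in, one HTML document with all results out."""
        if not pairs:
            raise ValueError("at least one (golds, preds) pair is required")
        results = [self.score(golds, preds) for golds, preds in pairs]
        sections = [_table(f"Batch {i}", r) for i, r in enumerate(results, 1)]
        # Assumption: aggregation is the mean of each metric over batches.
        aggregated = {k: mean([r[k] for r in results]) for k in results[0]}
        sections.append(_table("Aggregated (mean over batches)", aggregated))
        return "<html><body>" + "".join(sections) + "</body></html>"


class ExactMatchEvaluator(Evaluator):
    """Toy evaluator: a prediction is correct if file contents are equal."""

    def score(self, golds: Batch, preds: Batch) -> Dict[str, float]:
        hits = sum(g.read_text() == p.read_text() for g, p in zip(golds, preds))
        return {"accuracy": hits / len(golds)}
```

A real evaluator would parse the gold format and MMIF instead of comparing raw file contents; the point here is only that both entry points return one HTML string that the caller can write to a file.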

Gold files are freely accessible from the https://github.com/clamsproject/aapb-annotations repository, but prediction files almost always need to be generated on demand, and in many cases (vision, audio apps) generating predictions will take hours, if not days, even for a small batch. Running CLAMS pipelines, waiting for the predictions (MMIF) to be generated, and finally obtaining those MMIF files should not be the responsibility of evaluators. Instead, the evaluation "runner" or "invoker" should take charge of obtaining all gold and prediction files before an evaluator runs.
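
The division of labor described above could look roughly like the sketch below, where the runner gathers files that already exist on disk and only then calls the evaluator. All names here (`collect_pairs`, `run_evaluation`) are hypothetical, and matching a gold file to its prediction by file stem is an assumption made for illustration, not an agreed convention.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Assumption: an evaluator is any callable taking (golds, preds) file lists
# and returning the evaluation report as an HTML string.
EvaluateFn = Callable[[List[Path], List[Path]], str]


def collect_pairs(gold_dir: Path, pred_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Pair every gold file with an already-generated prediction file.

    Assumption: files are matched by stem, e.g. gold `x.csv` with
    prediction `x.mmif`. Generating predictions is out of scope here.
    """
    preds_by_stem = {p.stem: p for p in pred_dir.iterdir() if p.is_file()}
    golds: List[Path] = []
    preds: List[Path] = []
    missing: List[str] = []
    for gold in sorted(p for p in gold_dir.iterdir() if p.is_file()):
        pred = preds_by_stem.get(gold.stem)
        if pred is None:
            missing.append(gold.name)
        else:
            golds.append(gold)
            preds.append(pred)
    if missing:
        # Fail before the evaluator runs rather than evaluate a partial batch.
        raise FileNotFoundError(f"no prediction for: {', '.join(missing)}")
    return golds, preds


def run_evaluation(
    evaluate: EvaluateFn, gold_dir: Path, pred_dir: Path, out_file: Path
) -> Path:
    """Runner: obtain all files first, then invoke the evaluator once."""
    golds, preds = collect_pairs(gold_dir, pred_dir)
    out_file.write_text(evaluate(golds, preds), encoding="utf-8")
    return out_file
```

With this split, the evaluator never needs to know how long the predictions took to generate or where they came from; it only sees two lists of paths.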

clams-bot added this to infra May 11, 2023

github-project-automation bot moved this to Todo in infra May 11, 2023

keighrim mentioned this issue May 11, 2023

evaluation process prototype #3

Open

3 tasks

keighrim added this to the eval-v1 milestone Jan 29, 2024
