[JS] "Context was not provided" when using eval dataset which includes context #1603
Comments
I understand that
@ssbushi please confirm this is a bug and if so mark as P0 to fix for GA.
This is not a bug. Context is extracted automatically; right now, only the output of retriever steps is extracted as context. You have the option to use custom extractors: https://firebase.google.com/docs/genkit/evaluation?#custom_extractors. But IIUC this functionality might not work in some JS module systems. Based on the example above, you might want to use the reference field instead.
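For anyone hitting the same error, here is a minimal sketch of what the custom-extractor configuration described in that doc can look like. The flow name and step name below are placeholders, and the exact config shape may differ between Genkit versions:

```js
// genkit-tools.conf.js at the project root (sketch only; names are placeholders)
module.exports = {
  evaluators: [
    {
      // The flow whose traces should be post-processed for evaluation.
      actionRef: '/flow/storyFlow',
      extractors: {
        // Pull "context" from the output of a named step instead of relying
        // on the default extraction (output of retriever steps).
        context: { outputOf: 'prepare-context' },
      },
    },
  ],
};
```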
Ahhh I see. It sounds like evals are really designed to work specifically for RAG flows, because then they have retrieved information which can be used to ground both the output and the evaluation process.

Re "reference" - that makes sense, but I'd love to use a built-in evaluator rather than going through the pain of setting up a custom evaluator. I've already done the custom thing for my outer flow, but I'd like inner flows to have evals too, ideally using built-in Genkit infra.

I think it would be really helpful to have better evaluation support for flows without retrievers because:
Maybe part of the issue here is that there isn't much documentation on setting up custom evaluators without making them a plugin. I guess it's not too hard to do, but it doesn't seem like a well-lit path and I found it a bit confusing (e.g. when/where should I register my evaluator? How do I pass structured data in Reference? How do I report results?).
@jacobsimionato, thank you very much for the feedback. I will convert our conversation into issues in this repo and track progress.

I want to clarify that evals (as a feature) are not specifically made for RAG. I understand why you would come to this conclusion -- the built-in evaluators are all RAG evaluators, and some of them even require "context" to be passed in while ignoring "reference". This definitely sounds RAG focused (and they are!), but these evaluators are not intended to be the only ones you should be using. They were implemented as a starting point for Genkit evals for our initial launch, encouraging users to install 1P or 3P plugins that provide evaluators if they need more. Like you mentioned, you could also define custom evaluators; they don't have to be registered with plugins. I can see why this is confusing and that an obvious gap exists. I am working on a proposal that lets users use custom extractors again, so that they can use "context" for evaluation without retrievers.

Action: Provide more evaluators out of the box that are not specific to RAG (e.g., exactMatch, criteria-based, etc.). Some mix of evaluators that use "reference" and "output" would be ideal.
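To make the "custom evaluators without a plugin" path more concrete, here is a rough sketch of an exact-match evaluator registered directly on the Genkit instance. The evaluator name and scoring logic are made up for illustration, and the field names follow the evaluator docs but may differ slightly across Genkit versions:

```ts
import { genkit } from 'genkit';
import { BaseEvalDataPoint } from 'genkit/evaluator';

const ai = genkit({ plugins: [] });

// Sketch: a non-RAG evaluator that compares output to reference.
export const exactMatch = ai.defineEvaluator(
  {
    name: 'custom/exactMatch',
    displayName: 'Exact Match',
    definition: 'Checks whether the output exactly matches the reference.',
  },
  async (datapoint: BaseEvalDataPoint) => {
    // Compare serialized values so structured outputs are handled too.
    const matches =
      JSON.stringify(datapoint.output) === JSON.stringify(datapoint.reference);
    return {
      testCaseId: datapoint.testCaseId,
      evaluation: {
        score: matches ? 1 : 0,
        details: {
          reasoning: matches
            ? 'Output matches the reference.'
            : 'Output differs from the reference.',
        },
      },
    };
  }
);
```

Registering it this way (rather than via a plugin) should be enough for it to show up alongside the built-in evaluators in the Dev UI and the `genkit eval:*` commands, assuming the module is loaded by your entry point.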
Yes, the documentation is also unclear and in some places outdated. Unfortunately, the DevSite and GitHub MD files are not always in sync, so even if we update the documentation on GitHub, it stays stale for a while on DevSite until it is republished. We are working on improving this workflow. I will turn this into an action item to ensure that the documentation accurately represents the scope of evals.

Action: Review documentation; ensure that evals are not portrayed as RAG-specific or focused on retrieval use cases.
Describe the bug
I'm trying to use the Faithfulness evaluator, but I'm getting the error "Context was not provided"
To Reproduce
Commands:
eval.json:
src/index.ts
story.prompt
Actual behavior
Expected behavior
Expected the eval to run successfully.
Runtime (please complete the following information):
Node version:
v22.11.0