Releasing the original mined user queries for the released MixEval test data? #51

Open
yixinL7 opened this issue Nov 10, 2024 · 1 comment

yixinL7 commented Nov 10, 2024

Dear Authors,

Congratulations on the great work and thank you for your effort in open-sourcing the related artifacts!

Would you consider releasing the original user queries used to build the released MixEval test data (i.e., the 4K data points in MixEval and the 1K data points in MixEval-hard)? I understand from #36 that you are not open-sourcing the exact web query data or pipeline, both because it would make it easier for others to hack the resampled benchmark versions and because you have another ongoing project. However, such risks should probably be low for the original queries behind the already released test data. These queries would be very interesting to study, and they would be a valuable resource for researchers!

Thank you!

Owner

Psycoy commented Nov 11, 2024

Hi @yixinL7,

Thank you for your kind words!
We're sorry, but we didn't keep the separate query batches for each dynamic version (this was done on the fly). The web query pool is not being open-sourced because of the issues mentioned in #36.
You may consider other real-world user datasets for your experiments, which are quite similar to our web queries (as shown in Figure 2 of the paper).
Sorry for the inconvenience.
