Listing calls & URL size limit of OpenML server #468

Open
janvanrijn opened this issue May 10, 2018 · 4 comments

Comments

@janvanrijn
Member

The OpenML server has a URL size limit, which prevents us from doing listing calls with too many filters. For example, when I want to filter on 1000 task IDs, the call is likely to fail. It would be great if we could automatically detect and handle this in the list-all function. However, this is a tough problem, as sometimes there are multiple filters (e.g., per task and per setup).

Extending this limit is not a real fix, as the problem will recur when using bigger filters.

@janvanrijn
Member Author

Just realized that there is a straightforward fix for this: also allowing POST variables for the listing functions (although POST means authentication).

@mfeurer what do you think?

@mfeurer
Collaborator

mfeurer commented May 11, 2018

Is the maximum URL length documented somewhere? Also, what's the server response? If the server response is something we can parse and display nicely, is there a reason to do anything beyond that?

Also, how would you handle this other than by raising an exception? Do you want to "parse" the query on the Python side to chunk it? We could go back to POST, but we reverted that a while ago to avoid having to authenticate, so I don't think that's a great idea. Maybe we can catch a server exception 'URL too long' and then try a POST request?
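To make the chunking idea concrete, here is a minimal sketch that splits one list of IDs so that each resulting URL stays under a length limit. It is not part of openml-python: `chunk_ids`, `list_in_chunks`, the `fetch` callback, and `ASSUMED_MAX_URL_LENGTH` are hypothetical names, the real server limit is not stated in this thread, and the comma-separated path format is an assumption. It only covers a single filter, so the multi-filter case mentioned above remains open.

```python
from typing import Callable, Dict, Iterator, List

# Assumption: the actual server limit is not documented in this thread.
ASSUMED_MAX_URL_LENGTH = 2000


def chunk_ids(
    ids: List[int], base_url: str, max_len: int = ASSUMED_MAX_URL_LENGTH
) -> Iterator[List[int]]:
    """Yield sublists so that base_url + ",".join(ids) fits within max_len.

    A single ID that is too long on its own is still yielded alone.
    """
    chunk: List[int] = []
    length = len(base_url)
    for i in ids:
        # One extra character for the comma when the chunk is non-empty.
        token = len(str(i)) + (1 if chunk else 0)
        if chunk and length + token > max_len:
            yield chunk
            chunk, length = [], len(base_url)
            token = len(str(i))
        chunk.append(i)
        length += token
    if chunk:
        yield chunk


def list_in_chunks(
    ids: List[int],
    base_url: str,
    fetch: Callable[[str], Dict[int, dict]],
    max_len: int = ASSUMED_MAX_URL_LENGTH,
) -> Dict[int, dict]:
    """Call the hypothetical fetch(url) once per chunk and merge the results."""
    results: Dict[int, dict] = {}
    for chunk in chunk_ids(ids, base_url, max_len):
        url = base_url + ",".join(map(str, chunk))
        results.update(fetch(url))
    return results
```

Merging per-chunk results like this assumes the listing is keyed by ID, so chunks never overlap and the order of requests does not matter.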

@janvanrijn
Member Author

I think this issue will mostly occur with setups, as these cannot be filtered by other means (tasks, runs, etc.).

I added a function to the openml contrib library:
https://github.com/openml/openml-python-contrib/blob/master/openmlcontrib/setups/functions.py

@mfeurer
Collaborator

mfeurer commented Nov 12, 2019

This can be reproduced with:

import openml

openml.datasets.list_datasets(
    data_id=list(range(10000))
)

@janvanrijn we can add a helper that issues either a GET or a POST request depending on the length of the URI. However, to do so we need to know the maximum length the server allows.
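A rough sketch of such a helper, under stated assumptions: `build_listing_request` and `ASSUMED_MAX_URL_LENGTH` are hypothetical, the limit value is a placeholder until the real one is known, and the POST branch presumes the server would accept listing filters as form fields (which, as noted earlier in the thread, would also require authentication). The request is only constructed here, not sent.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: the real server limit is unknown in this thread.
ASSUMED_MAX_URL_LENGTH = 2000


def build_listing_request(
    endpoint: str,
    filters: Dict[str, str],
    max_len: int = ASSUMED_MAX_URL_LENGTH,
) -> Request:
    """Return a GET request if the URL is short enough, else a POST request."""
    # Assumed path-style filters: endpoint/key/value/key/value
    path = "/".join(f"{key}/{value}" for key, value in filters.items())
    url = f"{endpoint}/{path}" if path else endpoint
    if len(url) <= max_len:
        return Request(url, method="GET")
    # Too long for the URL: move the filters into the request body instead.
    body = urlencode(filters).encode("utf-8")
    return Request(endpoint, data=body, method="POST")
```

For example, `build_listing_request("https://example.org/api/data/list", {"data_id": "1,2,3"})` yields a GET request, while passing 10000 comma-separated IDs pushes the URL over the assumed limit and yields a POST request with the filters in the body.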
