Long-term roadmap #130

Open
17 tasks
joaquinvanschoren opened this issue Jul 13, 2020 · 1 comment

@joaquinvanschoren
Contributor

Feedback gathered from user reviews of the current OpenML website. I'm posting it here to keep track; these items can be prioritized and split off into new issues at the appropriate time.

  • Many users find it unclear how to use OpenML. Make a better 'getting started' page and/or video tutorials.
  • OpenML feels unstructured. It lacks a high-level overview of datasets, etc.
  • Difficult to filter on image/vision datasets.
  • Datasets are all public. There is no way yet to share datasets with a limited group of people.
  • Local installation is hard.
  • Limited instructions on how to deploy in organizations. How to upgrade?
  • AutoML 'bots' do not run automatically on new datasets. Many tasks have no results.
  • Donations should be possible (in a non-intrusive way)
  • Lack of integration with other dataset repositories (e.g. Zenodo, Kaggle, data.world, UCI, open government, OpenNeuro, BrainLife.io,...)
  • Trustworthiness of results: how do we know if results are not overfitted/biased?
  • Data/algorithm management: how to correct issues?
  • Handling of large datasets. Need efficient downloading and loading of large datasets.
  • Handling of evolving datasets. E.g. where new data is added every day.
  • Automated tools for checking data quality, fixing issues
  • Collaboration: Need chat/communication/forum to support user discussions/help. Only some users find their way to GitHub issues, and those discussions are hard to find (e.g. "Making OpenML understandable for people with data (who aren't machine learners)", OpenML#455)
  • Collaboration: Streamline interactive model building. Allow people to see and understand each other's models, download and improve them, discuss online,... Maybe have one user/group-assigned 'leading model' per task (not necessarily the one with highest accuracy).
  • Unclear documentation on the security of uploaded data. How to collaborate on sensitive datasets (e.g. medical)?
@joaquinvanschoren
Contributor Author

Additional feedback:

  • Allow user-specific metadata: domain scientists often want to store additional metadata. It should be easy to store this with datasets, flows and runs.
  • Explain better how OpenML fits in existing environments: e.g. organizations that have existing data lakes
  • Unix-like sharing: permissions on user-group-world level
  • Allow sharing Jupyter notebooks, build a searchable 'gallery' of example notebooks

@joaquinvanschoren joaquinvanschoren added this to the v2.x milestone Mar 27, 2022