
Add additional metadata to the columns, dataset and benchmark #65

Merged: 6 commits merged into main from feat/more-meta-data on Dec 7, 2023

Conversation

@cwognum (Collaborator) commented Dec 4, 2023

Changelogs

Metadata for benchmarks

  • Fixes #63 (Additional attributes for Benchmark)
    • Adds the task type (i.e. multi-task or single-task), computed automatically.
    • Adds the target type (i.e. regression or classification). It can be manually specified; if not, the library tries to infer it automatically using sklearn's type_of_target().
    • Adds the train set size, computed automatically.
    • Adds the test set(s) size(s), computed automatically.
    • Adds the number of classes for classification tasks, computed automatically.
    • Adds the number of test sets, computed automatically.
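The target-type inference mentioned above can be sketched as follows. This is a minimal illustration, not the actual Polaris implementation: the helper name `infer_target_types` and the dict-of-arrays input format are assumptions, while `sklearn.utils.multiclass.type_of_target()` is the real sklearn function the changelog refers to.

```python
# Hypothetical sketch: map sklearn's fine-grained type_of_target() labels
# onto the coarse regression vs. classification distinction per target column.
import numpy as np
from sklearn.utils.multiclass import type_of_target


def infer_target_types(targets: dict[str, np.ndarray]) -> dict[str, str]:
    types = {}
    for col, y in targets.items():
        t = type_of_target(y)
        if t in ("continuous", "continuous-multioutput"):
            types[col] = "regression"
        elif t in ("binary", "multiclass", "multilabel-indicator"):
            types[col] = "classification"
        else:
            types[col] = "unknown"
    return types


print(infer_target_types({
    "logS": np.array([1.2, 3.4, -0.5]),   # continuous -> regression
    "active": np.array([0, 1, 1, 0]),     # binary -> classification
}))
```

Manual specification would simply override this inferred value, which is useful when e.g. integer-coded labels would otherwise be misread as a multiclass target.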

Metadata for datasets

  • Fixes #18 (Feat: Add precomputed fields)
    • Adds the dtype to the column annotations, computed automatically.
    • Adds the number of datapoints and number of columns, computed automatically.
    • Adds an optional field with a reference to the curation.
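The automatically computed dataset fields above can be derived directly from the underlying table. A minimal sketch, assuming the dataset wraps a pandas DataFrame; the variable names here are illustrative, not necessarily those used by Polaris.

```python
# Sketch: derive per-column dtype annotations and dataset dimensions
# automatically from a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({"smiles": ["CCO", "CCN"], "logS": [1.2, 3.4]})

# dtype per column, as a string annotation
annotations = {col: str(dtype) for col, dtype in df.dtypes.items()}

# number of datapoints (rows) and number of columns
n_datapoints, n_columns = df.shape

print(annotations)
print(n_datapoints, n_columns)
```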

Misc

  • Adds additional syntax to the dataset to make interacting with it easier, similar to pandas.DataFrame, e.g. dataset[row, col] or dataset[:, "smiles"].
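The pandas-like indexing described above could be supported with a `__getitem__` along these lines. This is a hedged sketch, assuming the dataset stores its data in an internal pandas DataFrame (`self.table`); it mirrors the described syntax but is not the actual Polaris implementation.

```python
# Hypothetical sketch of pandas-like indexing on a dataset wrapper:
# dataset[row, col] returns a single value, dataset[:, "smiles"] a column.
import pandas as pd


class Dataset:
    def __init__(self, table: pd.DataFrame):
        self.table = table

    def __getitem__(self, item):
        if isinstance(item, tuple):
            row, col = item
            if row == slice(None):
                # dataset[:, "smiles"] -> whole column
                return self.table[col]
            # dataset[row, col] -> single value
            return self.table.at[row, col]
        # dataset[row] -> full row
        return self.table.iloc[item]


ds = Dataset(pd.DataFrame({"smiles": ["CCO", "CCN"], "logS": [1.2, 3.4]}))
print(ds[0, "smiles"])      # CCO
print(list(ds[:, "logS"]))  # [1.2, 3.4]
```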

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

@cwognum cwognum added the feature Annotates any PR that adds new features; Used in the release process label Dec 4, 2023
@cwognum cwognum requested a review from hadim as a code owner December 4, 2023 19:13
@cwognum cwognum requested a review from zhu0619 December 4, 2023 19:13
@cwognum (Collaborator, Author) commented Dec 4, 2023

Worth noting that we will also have the size of the dataset (in bytes), but I will save this on the Hub side, since the Hub already receives this information anyway.

@cwognum cwognum changed the title Add additional meta-data to the columns, dataset and benchmark Add additional metadata to the columns, dataset and benchmark Dec 4, 2023
@cwognum (Collaborator, Author) commented Dec 4, 2023

Quick! Change from meta-data to metadata before @jstlaurent and @hadim spot it! 👀

Review thread on polaris/benchmark/_base.py (outdated, resolved)
@zhu0619 (Contributor) left a comment

Thanks Cas.
It looks good to me.
I have only one minor comment.

@cwognum cwognum merged commit e1b33ef into main Dec 7, 2023
4 checks passed
@cwognum cwognum deleted the feat/more-meta-data branch December 7, 2023 00:55