
Add additional metadata to the columns, dataset and benchmark #65

Merged: 6 commits merged into main from feat/more-meta-data on Dec 7, 2023

Conversation

@cwognum (Collaborator) commented Dec 4, 2023

Changelogs

Metadata for benchmarks

  • Fixes #63 (Additional attributes for Benchmark)
    • Adds the task type (i.e. multi-task or single-task), computed automatically.
    • Adds the target type (i.e. regression or classification). It can be manually specified; if not, the library tries to infer it automatically using sklearn's type_of_target().
    • Adds the train set size, computed automatically.
    • Adds the test set(s) size(s), computed automatically.
    • Adds the number of classes for classification tasks, computed automatically.
    • Adds the number of test sets, computed automatically.
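The target-type inference mentioned above can be sketched as follows. This is a minimal illustration, not the actual Polaris implementation: the helper name `infer_target_types` and the dict-of-arrays input format are assumptions, while `sklearn.utils.multiclass.type_of_target()` is the real sklearn function the changelog refers to.

```python
# Hypothetical sketch: map sklearn's fine-grained type_of_target() labels
# onto the coarse regression vs. classification distinction per target column.
import numpy as np
from sklearn.utils.multiclass import type_of_target


def infer_target_types(targets: dict[str, np.ndarray]) -> dict[str, str]:
    types = {}
    for col, y in targets.items():
        t = type_of_target(y)
        if t in ("continuous", "continuous-multioutput"):
            types[col] = "regression"
        elif t in ("binary", "multiclass", "multilabel-indicator"):
            types[col] = "classification"
        else:
            types[col] = "unknown"
    return types


print(infer_target_types({
    "logS": np.array([1.2, 3.4, -0.5]),   # continuous -> regression
    "active": np.array([0, 1, 1, 0]),     # binary -> classification
}))
```

Manual specification would simply override this inferred value, which is useful when e.g. integer-coded labels would otherwise be misread as a multiclass target.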

Metadata for datasets

  • Fixes #18 (Feat: Add precomputed fields)
    • Adds the dtype to the column annotations, computed automatically.
    • Adds the number of datapoints and number of columns, computed automatically.
    • Adds an optional field with a reference to the curation.
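The automatically computed dataset fields above can be derived directly from the underlying table. A minimal sketch, assuming the dataset wraps a pandas DataFrame; the variable names here are illustrative, not necessarily those used by Polaris.

```python
# Sketch: derive per-column dtype annotations and dataset dimensions
# automatically from a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({"smiles": ["CCO", "CCN"], "logS": [1.2, 3.4]})

# dtype per column, as a string annotation
annotations = {col: str(dtype) for col, dtype in df.dtypes.items()}

# number of datapoints (rows) and number of columns
n_datapoints, n_columns = df.shape

print(annotations)
print(n_datapoints, n_columns)
```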

Misc

  • Adds additional syntax to the dataset to make interacting with it easier, similar to pandas.DataFrame, e.g. dataset[row, col] or dataset[:, "smiles"].
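The pandas-like indexing described above could be supported with a `__getitem__` along these lines. This is a hedged sketch, assuming the dataset stores its data in an internal pandas DataFrame (`self.table`); it mirrors the described syntax but is not the actual Polaris implementation.

```python
# Hypothetical sketch of pandas-like indexing on a dataset wrapper:
# dataset[row, col] returns a single value, dataset[:, "smiles"] a column.
import pandas as pd


class Dataset:
    def __init__(self, table: pd.DataFrame):
        self.table = table

    def __getitem__(self, item):
        if isinstance(item, tuple):
            row, col = item
            if row == slice(None):
                # dataset[:, "smiles"] -> whole column
                return self.table[col]
            # dataset[row, col] -> single value
            return self.table.at[row, col]
        # dataset[row] -> full row
        return self.table.iloc[item]


ds = Dataset(pd.DataFrame({"smiles": ["CCO", "CCN"], "logS": [1.2, 3.4]}))
print(ds[0, "smiles"])      # CCO
print(list(ds[:, "logS"]))  # [1.2, 3.4]
```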

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

@cwognum cwognum added the feature Annotates any PR that adds new features; Used in the release process label Dec 4, 2023
@cwognum cwognum requested a review from hadim as a code owner December 4, 2023 19:13
@cwognum cwognum requested a review from zhu0619 December 4, 2023 19:13
@cwognum (Collaborator, Author) commented Dec 4, 2023

Worth noting that we will also have the size of the dataset (in bytes), but I will save this on the Hub side, since the Hub already receives this information anyway.

@cwognum cwognum changed the title Add additional meta-data to the columns, dataset and benchmark Add additional metadata to the columns, dataset and benchmark Dec 4, 2023
@cwognum (Collaborator, Author) commented Dec 4, 2023

Quick! Change from meta-data to metadata before @jstlaurent and @hadim spot it! 👀

Review thread on polaris/benchmark/_base.py (outdated, resolved)
@zhu0619 (Contributor) left a comment

Thanks Cas.
It looks good to me.
I have only one minor comment.

@cwognum cwognum merged commit e1b33ef into main Dec 7, 2023
4 checks passed
@cwognum cwognum deleted the feat/more-meta-data branch December 7, 2023 00:55