Docs: Refactored the documentation and deprecation warnings (#267)
* Refactored the documentation, mostly the tutorials, and deprecated some classes and methods

* Update import

* Updated ruff config and imports for deprecated

* Test and update deprecation warning
cwognum authored Feb 7, 2025
1 parent 787cdb3 commit 8c35a28
Showing 35 changed files with 1,293 additions and 3,899 deletions.
14 changes: 4 additions & 10 deletions docs/api/benchmark.md
@@ -1,16 +1,10 @@
# Base class

::: polaris.benchmark.BenchmarkSpecification
::: polaris.benchmark.BenchmarkV2Specification
    options:
        filters: ["!^_", "!md5sum", "!get_cache_path"]

---
## Subclasses

::: polaris.benchmark.SingleTaskBenchmarkSpecification

---

::: polaris.benchmark.MultiTaskBenchmarkSpecification
::: polaris.benchmark.BenchmarkSpecification
    options:
        filters: ["!^_", "!md5sum", "!get_cache_path"]

---
8 changes: 1 addition & 7 deletions docs/api/dataset.md
@@ -1,9 +1,3 @@
::: polaris.dataset.Dataset
    options:
        filters: ["!^_"]

---

::: polaris.dataset.DatasetV2
    options:
        filters: ["!^_"]
@@ -26,4 +20,4 @@
    options:
        filters: ["!^_"]

---
---
Binary file added docs/images/zarr.png
16 changes: 5 additions & 11 deletions docs/index.md
@@ -2,21 +2,15 @@

Welcome to the Polaris documentation!



---

## What is Polaris?

!!! info "Our vision"
!!! info "Our mission"

    Polaris aims to **foster the development of impactful AI models in drug discovery** by establishing a new
    and adaptive standard for measuring progress of computational tools in drug discovery.
    Polaris is on a mission to bring innovators and practitioners closer together to develop methods that matter.

Polaris is a suite of tools to implement, host and run benchmarks in computational drug discovery. Existing benchmarks leave several key challenges - related to the characteristics of datasets in drug discovery - unaddressed. This can lead to a situation in which newly proposed models do not perform as well _as advertised_ in real drug discovery programs, ultimately risking misalignment between the scientists developing the models and downstream users. With Polaris, we aim to further close that gap.
Polaris is an optimistic community that fundamentally believes in the ability of Machine Learning to radically improve lives by disrupting the drug discovery process. However, we recognize that the absence of standardized, domain-appropriate datasets, guidelines, and tools for method evaluation is limiting its current impact.

### Polaris Hub
A quick word on the [Polaris Hub](https://polarishub.io/). The hub hosts a variety of high-quality benchmarks and datasets. While the hub is built to easily integrate with the Polaris library, you can use them independently.
Polaris is a Python library designed to interact with the [Polaris Hub](https://www.polarishub.io). Our aim is to build the leading benchmarking platform for drug discovery, promoting the use of high-quality resources and domain-appropriate evaluation protocols. Learn more through our [blog posts](https://polarishub.io/blog).

## Where to next?

@@ -35,7 +29,7 @@ If you are entirely new to Polaris, this is the place to start! Learn about the

Dive deeper into the Polaris code and learn about advanced concepts to create your own benchmarks and datasets.

[:material-arrow-right: Let's get started](./tutorials/basics.ipynb)
[:material-arrow-right: Let's get started](./tutorials/submit_to_benchmark.ipynb)

---

69 changes: 36 additions & 33 deletions docs/quickstart.md
@@ -1,39 +1,48 @@
# Quickstart
Welcome to the Polaris Quickstart guide! This page will introduce you to core concepts and you'll submit a first result to a benchmark on the [Polaris Hub](https://www.polarishub.io).

## Installation
!!! warning "`polaris-lib` vs `polaris`"
    Be aware that the package name differs between _pip_ and _conda_.

First things first, let's install Polaris!
Polaris can be installed via _pip_:

We highly recommend using a [Conda Python distribution](https://github.com/conda-forge/miniforge), such as `mamba`:
```bash
pip install polaris-lib
```

or _conda_:
```bash
mamba install -c conda-forge polaris
conda install -c conda-forge polaris
```

??? info "Other installation options"
    You can replace `mamba` by `conda`. The package is also pip installable if you need it: `pip install polaris-lib`.
## Core concepts
Polaris explicitly distinguishes between **datasets** and **benchmarks**.

- A _dataset_ is simply a tabular collection of data, storing datapoints in a row-wise manner.
- A _benchmark_ defines the ML task and evaluation logic (e.g. split and metrics) for a dataset.

One dataset can therefore be associated with multiple benchmarks.
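
The one-to-many relationship between a dataset and its benchmarks can be sketched in plain Python. This is only an illustration under stated assumptions: `ToyDataset`, `ToyBenchmark`, the column names and all values are hypothetical stand-ins invented for this example, not Polaris classes or Hub data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


# Hypothetical stand-ins for illustration only; these are NOT Polaris classes.
@dataclass(frozen=True)
class ToyDataset:
    """A tabular collection of data: one dict per row (datapoint)."""

    rows: tuple[dict, ...]


@dataclass(frozen=True)
class ToyBenchmark:
    """An ML task plus evaluation logic (split and metric) for one dataset."""

    dataset: ToyDataset
    input_col: str
    target_col: str
    test_indices: tuple[int, ...]
    metric: Callable[[Sequence[float], Sequence[float]], float]

    def test_inputs(self) -> list:
        # Only the inputs of the test rows are exposed, never the targets.
        return [self.dataset.rows[i][self.input_col] for i in self.test_indices]

    def evaluate(self, predictions: Sequence[float]) -> float:
        targets = [self.dataset.rows[i][self.target_col] for i in self.test_indices]
        if len(predictions) != len(targets):
            raise ValueError("Expected one prediction per test datapoint.")
        return self.metric(targets, predictions)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values; the dataset itself carries no task or split information.
dataset = ToyDataset(
    rows=(
        {"molecule": "m0", "a": 1.0, "b": 10.0},
        {"molecule": "m1", "a": 2.0, "b": 20.0},
        {"molecule": "m2", "a": 3.0, "b": 30.0},
        {"molecule": "m3", "a": 5.0, "b": 50.0},
    )
)

# Two benchmarks share the same dataset but differ in target, split and metric.
bench_a = ToyBenchmark(dataset, "molecule", "a", (2, 3), mean_absolute_error)
bench_b = ToyBenchmark(dataset, "molecule", "b", (0,), mean_squared_error)

score_a = bench_a.evaluate([0.0, 0.0])
score_b = bench_b.evaluate([7.0])
```

Keeping the split and metric on the benchmark rather than on the dataset is what lets several evaluation protocols coexist over the same immutable data.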

## Login
To interact with the [Polaris Hub](https://polarishub.io/) from the client, you must first authenticate yourself. If you don't have an account yet, you can create one [here](https://polarishub.io/sign-up).

## Authenticating to the Polaris Hub
To interact with the [Polaris Hub](https://polarishub.io/) from the client, you must first log in. You can do this
via the following command in your terminal:
You can do this via the following command in your terminal:

```bash
polaris login
```

This will redirect you to a login page on the Polaris Hub where you can either sign in or sign up. Once either
of these options has been completed, you will see an authorization code on your screen. Copy this and paste it
back into your terminal when prompted by the client.
or in Python:
```py
from polaris.hub.client import PolarisHubClient

That's it! You're now all set to interact with datasets and benchmarks across Polaris.

## Benchmarking API

At its core, Polaris is a benchmarking library. It provides a simple API to run benchmarks. While it can be used
independently, it is built to easily integrate with the Polaris Hub. The hub hosts
a variety of high-quality datasets, benchmarks and associated results.
with PolarisHubClient() as client:
    client.login()
```

If all you care about is to partake in a benchmark that is hosted on the hub, it is as simple as:
## Benchmark API
To get started, we will submit a result to the [`polaris/hello-world-benchmark`](https://polarishub.io/benchmarks/polaris/hello-world-benchmark).

```python
import polaris as po
@@ -57,17 +66,18 @@ predictions = [0.0 for x in test]
results = benchmark.evaluate(predictions)

# Submit your results
results.upload_to_hub(owner="dummy-user")
results.upload_to_hub(owner="dummy-user", access="public")
```

That's all there is to it to partake in a benchmark. No complicated, custom data-loaders or evaluation protocol. With just a few lines of code, you can feel confident that you are properly evaluating your model and focus on what you do best: Solving the hard problems in our domain!
Through immutable datasets and standardized benchmarks, Polaris aims to serve as a source of truth for machine learning in drug discovery. The limited flexibility might differ from your typical experience, but this is by design to improve reproducibility. Learn more [here](https://polarishub.io/blog/reproducible-machine-learning-in-drug-discovery-how-polaris-serves-as-a-single-source-of-truth).

Similarly, you can easily access a dataset.
## Dataset API
Loading a benchmark will automatically load the underlying dataset. We can also directly access the [`polaris/hello-world`](https://polarishub.io/datasets/polaris/hello-world) dataset.

```python
import polaris as po

# Load the dataset from the hub
# Load the dataset from the Hub
dataset = po.load_dataset("polaris/hello-world")

# Get information on the dataset size
@@ -82,21 +92,14 @@ dataset.get_data(
# Or, similarly:
dataset[dataset.rows[0], dataset.columns[0]]

# Get an entire row
# Get an entire data point
dataset[0]
```

## Core concepts

At the core of our API are 4 core concepts, each associated with a class:

1. [`Dataset`][polaris.dataset.Dataset]: The dataset class is a carefully designed data structure, stress-tested on terabytes of data, to ensure that whatever dataset you can think of, you can easily create, store and use it.
2. [`BenchmarkSpecification`][polaris.benchmark.BenchmarkSpecification]: The benchmark specification class wraps a `Dataset` with additional metadata to produce a benchmark. Specifically, it specifies how to evaluate a model's performance on the underlying dataset (e.g. the train-test split and metrics). It provides a simple API to run said evaluation protocol.
3. [`Subset`][polaris.dataset.Subset]: The subset class should be used as a starting point for any framework-specific (e.g. PyTorch or TensorFlow) data loaders. To facilitate this, it abstracts away the non-trivial logic of accessing the data and provides several styles of access to build upon.
4. [`BenchmarkResults`][polaris.evaluate.BenchmarkResults]: The benchmark results class stores the results of a benchmark, along with additional meta-data. This object can be easily uploaded to the Polaris Hub and shared with the broader community.
Drug discovery research involves a maze of file formats (e.g. PDB for 3D structures, SDF for small molecules, and so on). Each format requires specialized knowledge to parse and interpret properly. At Polaris, we wanted to remove that barrier. We use a universal data format based on [Zarr](https://zarr.dev/). Learn more [here](https://polarishub.io/blog/dataset-v2-built-to-scale).

## Where to next?

Now that you've seen how easy it is to use Polaris, let's dive into the details through a set of tutorials!
Now that you've seen how easy it is to use Polaris, let's dive into the details through [a set of tutorials](./tutorials/submit_to_benchmark.ipynb)!

---
13 changes: 13 additions & 0 deletions docs/resources.md
@@ -0,0 +1,13 @@
# Resources

## Publications

- Correspondence in Nature Machine Intelligence: [10.1038/s42256-024-00911-w](https://doi.org/10.1038/s42256-024-00911-w).
- Preprint on Method Comparison Protocols: [10.26434/chemrxiv-2024-6dbwv-v2](https://doi.org/10.26434/chemrxiv-2024-6dbwv-v2).

## Talks

- PyData London (June, 2024): [https://www.youtube.com/watch?v=YZDfD9D7mtE](https://www.youtube.com/watch?v=YZDfD9D7mtE)
- MoML (June, 2024): [https://www.youtube.com/watch?v=Tsz_T1WyufI](https://www.youtube.com/watch?v=Tsz_T1WyufI)

---