Docs: Refactored the documentation and deprecation warnings (#267)

* Refactored the documentation, mostly the tutorials, and deprecated some classes and methods
* Update import
* Updated ruff config and imports for deprecated
* Test and update deprecation warning
polaris-hub · Feb 7, 2025 · 8c35a28
1 parent 787cdb3
Showing 35 changed files with 1,293 additions and 3,899 deletions.
diff --git a/docs/api/benchmark.md b/docs/api/benchmark.md
@@ -1,16 +1,10 @@
-# Base class
-
-::: polaris.benchmark.BenchmarkSpecification
+::: polaris.benchmark.BenchmarkV2Specification
     options:
         filters: ["!^_", "!md5sum", "!get_cache_path"]
 
----
-## Subclasses
-
-::: polaris.benchmark.SingleTaskBenchmarkSpecification
-
----
 
-::: polaris.benchmark.MultiTaskBenchmarkSpecification
+::: polaris.benchmark.BenchmarkSpecification
+    options:
+        filters: ["!^_", "!md5sum", "!get_cache_path"]
 
 ---
diff --git a/docs/api/dataset.md b/docs/api/dataset.md
@@ -1,9 +1,3 @@
-::: polaris.dataset.Dataset
-    options:
-        filters: ["!^_"]
-
---- 
-
 ::: polaris.dataset.DatasetV2
     options:
         filters: ["!^_"]
@@ -26,4 +20,4 @@
     options:
         filters: ["!^_"]
 
----
+--- 
diff --git a/docs/images/zarr.png b/docs/images/zarr.png
diff --git a/docs/index.md b/docs/index.md
@@ -2,21 +2,15 @@
 
 Welcome to the Polaris documentation!
 
-
-
---- 
-
 ## What is Polaris?
 
-!!! info "Our vision"
+!!! info "Our mission"
 
-    Polaris aims to **foster the development of impactful AI models in drug discovery** by establishing a new 
-    and adaptive standard for measuring progress of computational tools in drug discovery.
+    Polaris is on a mission to bring innovators and practitioners closer together to develop methods that matter.
 
-Polaris is a suite of tools to implement, host and run benchmarks in computational drug discovery. Existing benchmarks leave several key challenges - related to the characteristics of datasets in drug discovery - unaddressed. This can lead to a situation in which newly proposed models do not perform as well _as advertised_ in real drug discovery programs, ultimately risking misalignment between the scientists developing the models and downstream users. With Polaris, we aim to further close that gap. 
+Polaris is an optimistic community that fundamentally believes in the ability of Machine Learning to radically improve lives by disrupting the drug discovery process. However, we recognize that the absence of standardized, domain-appropriate datasets, guidelines, and tools for method evaluation is limiting its current impact.
 
-### Polaris Hub
-A quick word on the [Polaris Hub](https://polarishub.io/). The hub hosts a variety of high-quality benchmarks and datasets. While the hub is built to easily integrate with the Polaris library, you can use them independently.
+Polaris is a Python library designed to interact with the [Polaris Hub](https://www.polarishub.io). Our aim is to build the leading benchmarking platform for drug discovery, promoting the use of high-quality resources and domain-appropriate evaluation protocols. Learn more through our [blog posts](https://polarishub.io/blog).
 
 ## Where to next?
 
@@ -35,7 +29,7 @@ If you are entirely new to Polaris, this is the place to start! Learn about the
 
 Dive deeper into the Polaris code and learn about advanced concepts to create your own benchmarks and datasets. 
 
-[:material-arrow-right: Let's get started](./tutorials/basics.ipynb)
+[:material-arrow-right: Let's get started](./tutorials/submit_to_benchmark.ipynb)
 
 ---
 

diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -1,39 +1,48 @@
 # Quickstart
+Welcome to the Polaris Quickstart guide! This page introduces the core concepts and walks you through submitting a first result to a benchmark on the [Polaris Hub](https://www.polarishub.io).
 
 ## Installation
+!!! warning "`polaris-lib` vs `polaris`"
+    Be aware that the package name differs between _pip_ and _conda_.
 
-First things first, let's install Polaris!
+Polaris can be installed via _pip_:
 
-We highly recommend using a [Conda Python distribution](https://github.com/conda-forge/miniforge), such as `mamba`:
+```bash
+pip install polaris-lib
+```
 
+or _conda_: 
 ```bash
-mamba install -c conda-forge polaris
+conda install -c conda-forge polaris
 ```
 
-??? info "Other installation options"
-    You can replace `mamba` by `conda`. The package is also pip installable if you need it: `pip install polaris-lib`.
+## Core concepts
+Polaris explicitly distinguishes between **datasets** and **benchmarks**.
+
+- A _dataset_ is simply a tabular collection of data, storing datapoints in a row-wise manner. 
+- A _benchmark_ defines the ML task and evaluation logic (e.g. split and metrics) for a dataset.
+
+One dataset can therefore be associated with multiple benchmarks. 
+
+## Login
+To interact with the [Polaris Hub](https://polarishub.io/) from the client, you must first authenticate yourself. If you don't have an account yet, you can create one [here](https://polarishub.io/sign-up).
 
-## Authenticating to the Polaris Hub
-To interact with the [Polaris Hub](https://polarishub.io/) from the client, you must first login. You can do this
-via the following command in your terminal:
+You can log in via the following command in your terminal:
 
 ```bash
 polaris login
 ```
 
-This will redirect you to a login page on the Polaris Hub where you can either sign in or sign up. Once either
-of these options have been completed, you will see an authorization code on your screen. Copy this and paste it 
-back into your terminal when prompted by the client.
+or in Python: 
+```py
+from polaris.hub.client import PolarisHubClient
 
-That's it! You're now all set to interact with datasets and benchmarks across Polaris.
-
-## Benchmarking API
-
-At its core, Polaris is a benchmarking library. It provides a simple API to run benchmarks. While it can be used
-independently, it is built to easily integrate with the Polaris Hub. The hub hosts
-a variety of high-quality datasets, benchmarks and associated results.
+with PolarisHubClient() as client:
+    client.login()
+```
 
-If all you care about is to partake in a benchmark that is hosted on the hub, it is as simple as:
+## Benchmark API
+To get started, we will submit a result to the [`polaris/hello-world-benchmark`](https://polarishub.io/benchmarks/polaris/hello-world-benchmark).
 
 ```python
 import polaris as po
@@ -57,17 +66,18 @@ predictions = [0.0 for x in test]
 results = benchmark.evaluate(predictions)
 
 # Submit your results
-results.upload_to_hub(owner="dummy-user")
+results.upload_to_hub(owner="dummy-user", access="public")
 ```
 
-That's all there is to it to partake in a benchmark. No complicated, custom data-loaders or evaluation protocol. With just a few lines of code, you can feel confident that you are properly evaluating your model and focus on what you do best: Solving the hard problems in our domain!
+Through immutable datasets and standardized benchmarks, Polaris aims to serve as a source of truth for machine learning in drug discovery. The limited flexibility might differ from your typical experience, but this is by design to improve reproducibility. Learn more [here](https://polarishub.io/blog/reproducible-machine-learning-in-drug-discovery-how-polaris-serves-as-a-single-source-of-truth).
 
-Similarly, you can easily access a dataset.
+## Dataset API
+Loading a benchmark will automatically load the underlying dataset. We can also directly access the [`polaris/hello-world`](https://polarishub.io/datasets/polaris/hello-world) dataset.
 
 ```python
 import polaris as po
 
-# Load the dataset from the hub
+# Load the dataset from the Hub
 dataset = po.load_dataset("polaris/hello-world")
 
 # Get information on the dataset size
@@ -82,21 +92,14 @@ dataset.get_data(
 # Or, similarly:
 dataset[dataset.rows[0], dataset.columns[0]]
 
-# Get an entire row
+# Get an entire data point
 dataset[0]
 ```
 
-## Core concepts
-
-At the core of our API are 4 core concepts, each associated with a class:
-
-1. [`Dataset`][polaris.dataset.Dataset]: The dataset class is carefully designed data-structure, stress-tested on terra-bytes of data, to ensure whatever dataset you can think of, you can easily create, store and use it.
-2. [`BenchmarkSpecification`][polaris.benchmark.BenchmarkSpecification]: The benchmark specification class wraps a `Dataset` with additional meta-data to produce a the benchmark. Specifically, it specifies how to evaluate a model's performance on the underlying dataset (e.g. the train-test split and metrics). It provides a simple API to run said evaluation protocol.
-3. [`Subset`][polaris.dataset.Subset]: The subset class should be used as a starting-point for any framework-specific (e.g. PyTorch or Tensorflow) data loaders. To facilitate this, it abstracts away the non-trivial logic of accessing the data and provides several style of access to built upon.
-4. [`BenchmarkResults`][polaris.evaluate.BenchmarkResults]: The benchmark results class stores the results of a benchmark, along with additional meta-data. This object can be easily uploaded to the Polaris Hub and shared with the broader community.
+Drug discovery research involves a maze of file formats (e.g. PDB for 3D structures, SDF for small molecules, and so on). Each format requires specialized knowledge to parse and interpret properly. At Polaris, we wanted to remove that barrier. We use a universal data format based on [Zarr](https://zarr.dev/). Learn more [here](https://polarishub.io/blog/dataset-v2-built-to-scale).
 
 ## Where to next?
 
-Now that you've seen how easy it is to use Polaris, let's dive into the details through a set of tutorials!
+Now that you've seen how easy it is to use Polaris, let's dive into the details through [a set of tutorials](./tutorials/submit_to_benchmark.ipynb)!
 
 ---
diff --git a/docs/resources.md b/docs/resources.md
@@ -0,0 +1,13 @@
+# Resources
+
+## Publications
+
+- Correspondence in Nature Machine Intelligence: [10.1038/s42256-024-00911-w](https://doi.org/10.1038/s42256-024-00911-w).
+- Preprint on Method Comparison Protocols: [10.26434/chemrxiv-2024-6dbwv-v2](https://doi.org/10.26434/chemrxiv-2024-6dbwv-v2).
+
+## Talks
+
+- PyData London (June, 2024): [https://www.youtube.com/watch?v=YZDfD9D7mtE](https://www.youtube.com/watch?v=YZDfD9D7mtE)
+- MoML (June, 2024): [https://www.youtube.com/watch?v=Tsz_T1WyufI](https://www.youtube.com/watch?v=Tsz_T1WyufI)
+
+---