Feature: Welford std approximation #153

CatEek · 2024-06-19T10:01:52Z

Description

Changed the std computation for the iterable dataset to use Welford's algorithm.

What: Added Welford's algorithm to the iterable dataset.
Why: Provides a numerically stable, single-pass estimate of the std for a dataset spread across multiple files.
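
For reference, the textbook form of Welford's update for a new value can be written as below. The notation (n for the running count, \mu for the running mean, M_2 for the running sum of squared deviations) is an assumption made here for illustration and is not taken from the PR code.

```latex
\begin{aligned}
n_k &= n_{k-1} + 1 \\
\mu_k &= \mu_{k-1} + \frac{x_k - \mu_{k-1}}{n_k} \\
M_{2,k} &= M_{2,k-1} + (x_k - \mu_{k-1})(x_k - \mu_k) \\
\sigma_k &= \sqrt{M_{2,k} / n_k}
\end{aligned}
```

The last line gives the population std (the same convention as the default of np.std); dividing by n_k - 1 instead would give the sample std.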

Changes Made

calculate_mean_std function in the iterable dataset module
Relevant tests

melisande-c · 2024-06-19T12:38:08Z

src/careamics/dataset/dataset_utils/dataset_utils.py

+def update_iterative_stats(
+    count: np.ndarray, mean: np.ndarray, m2: np.ndarray, new_values: np.ndarray
+) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
+    """Update the mean and variance of an array iteratively.


The method should be referenced in the docstring, or at least in comments.

jdeschamps · 2024-06-19T14:29:09Z

src/careamics/dataset/dataset_utils/__init__.py

@@ -13,7 +15,12 @@
 ]


-from .dataset_utils import compute_normalization_stats, reshape_array
+from .dataset_utils import (
+    compute_normalization_stats,


I think I'd prefer from .data_utils.welford import ...

For methods that are clearly part of a particular algorithm, I would like this to be clearly identified, even in the import!

jdeschamps · 2024-06-19T14:31:12Z

src/careamics/dataset/dataset_utils/dataset_utils.py

+def update_iterative_stats(
+    count: np.ndarray, mean: np.ndarray, m2: np.ndarray, new_values: np.ndarray
+) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
+    """Update the mean and variance of an array iteratively.


Agree, there should be a mention of Welford's algorithm and maybe a link to the wiki page (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance).

jdeschamps · 2024-06-19T14:37:44Z

src/careamics/dataset/iterable_dataset.py

-                target_stds.append(target_std)
+                target_init_mean, _ = compute_normalization_stats(target)
+                target_channels = np.array(np.split(target, target.shape[1], axis=1))
+                if num_samples == 0:


This should be refactored somewhere else, especially because the whole procedure is called twice!

In the spirit of the previous comment, update_iterative_stats and the other method could become private (_update_iterative_stats, etc.), and a new function in dataset_utils.welford called welford_mean_and_std(array, ...) could do the calculation.
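
A minimal sketch of what such a refactoring could look like, not the code that was merged. Assumptions: arrays have shape (S, C, ...) with channels on axis 1 (as in compute_normalization_stats), the name _finalize_iterative_stats is hypothetical, and the helper merges a whole batch of values at once using the batched variant of Welford's update (Chan et al.), which is described on the same wiki page.

```python
from typing import Iterable, Tuple

import numpy as np


def _update_iterative_stats(
    count: int, mean: np.ndarray, m2: np.ndarray, new_values: np.ndarray
) -> Tuple[int, np.ndarray, np.ndarray]:
    """Merge a batch of values into the running per-channel statistics.

    Batched variant of Welford's algorithm, see
    https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

    new_values is assumed to be non-empty with shape (C, N).
    """
    n_new = new_values.shape[1]
    mean_new = new_values.mean(axis=1)
    m2_new = ((new_values - mean_new[:, None]) ** 2).sum(axis=1)

    total = count + n_new
    delta = mean_new - mean
    mean = mean + delta * n_new / total
    m2 = m2 + m2_new + delta**2 * count * n_new / total
    return total, mean, m2


def _finalize_iterative_stats(
    count: int, mean: np.ndarray, m2: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Return the mean and population std (same convention as np.std)."""
    return mean, np.sqrt(m2 / count)


def welford_mean_and_std(
    arrays: Iterable[np.ndarray],
) -> Tuple[np.ndarray, np.ndarray]:
    """Compute per-channel mean and std over arrays of shape (S, C, ...)."""
    count = 0
    mean = None
    m2 = None
    for array in arrays:
        n_channels = array.shape[1]
        # Bring the channel axis first and flatten the rest: (C, N).
        channels = np.moveaxis(array, 1, 0).reshape(n_channels, -1)
        channels = channels.astype(np.float64)
        if mean is None:
            mean = np.zeros(n_channels)
            m2 = np.zeros(n_channels)
        count, mean, m2 = _update_iterative_stats(count, mean, m2, channels)
    if mean is None:
        raise ValueError("No arrays were provided.")
    return _finalize_iterative_stats(count, mean, m2)
```

With this layout, the dataset code would only call welford_mean_and_std, once for the inputs and once for the targets, instead of repeating the update loop.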

jdeschamps · 2024-06-19T14:42:10Z

src/careamics/dataset/dataset_utils/dataset_utils.py

@@ -121,3 +121,69 @@ def compute_normalization_stats(image: np.ndarray) -> Tuple[np.ndarray, np.ndarr
    # Define the list of axes excluding the channel axis
    axes = tuple(np.delete(np.arange(image.ndim), 1))
    return np.mean(image, axis=axes), np.std(image, axis=axes)
+
+
+def update_iterative_stats(


Not sure whether that's "better", but another approach could be:

class WelfordStatistics:
    count: int
    means: NDArray
    m2: NDArray

    def update(self, new_values: NDArray) -> None:
        # update here
        ...

    def get_stats(self) -> Tuple[NDArray, NDArray]:
        # finalize here
        ...
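
One way the class-based idea could be fleshed out, as a sketch only. Assumptions: the n_channels constructor argument is hypothetical, update receives values of shape (N, C) with one observation per row, and each row is folded in with the classic one-value-at-a-time Welford update. A Python loop over rows is slow for images, so this is meant to show the structure rather than a production implementation.

```python
from typing import Tuple

import numpy as np
from numpy.typing import NDArray


class WelfordStatistics:
    """Running per-channel mean and std using Welford's algorithm.

    See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    """

    def __init__(self, n_channels: int) -> None:
        # n_channels is a hypothetical parameter used for this sketch.
        self.count = 0
        self.means = np.zeros(n_channels)
        self.m2 = np.zeros(n_channels)

    def update(self, new_values: NDArray) -> None:
        """Fold in new observations of shape (N, C), one row at a time."""
        for x in np.asarray(new_values, dtype=np.float64):
            self.count += 1
            delta = x - self.means  # deviation from the old mean
            self.means += delta / self.count
            # Uses both the old-mean and the new-mean deviation.
            self.m2 += delta * (x - self.means)

    def get_stats(self) -> Tuple[NDArray, NDArray]:
        """Return the mean and population std (same convention as np.std)."""
        if self.count < 1:
            raise ValueError("No values have been added yet.")
        return self.means.copy(), np.sqrt(self.m2 / self.count)
```

Keeping the state inside one object means callers never handle count, means and m2 directly, which is the main appeal over passing three arrays between free functions.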

CatEek added 4 commits June 18, 2024 17:18

draft + test

ab97671

types, docstrings

c9639c8

target stats, test

8bd270e

mypy fix

a127edf

CatEek requested a review from jdeschamps June 19, 2024 10:01

CatEek changed the title ~~Iz/feat/welford norm~~ Feature: Welform std approximation Jun 19, 2024

melisande-c reviewed Jun 19, 2024

jdeschamps reviewed Jun 19, 2024

jdeschamps changed the title ~~Feature: Welform std approximation~~ Feature: Welford std approximation Jun 19, 2024

jdeschamps mentioned this pull request Jun 21, 2024

Refactoring: lightning API package and smoke tests #161

Merged

4 tasks

CatEek and others added 2 commits June 24, 2024 12:12

stats calculation as a class refac

4ccdc25

Merge branch 'main' into iz/feat/welford_norm

e0fbf5f

jdeschamps self-requested a review June 24, 2024 10:38

jdeschamps reviewed Jun 24, 2024

jdeschamps and others added 3 commits June 24, 2024 13:22

Merge branch 'main' into iz/feat/welford_norm

f239e40

fix: fix calls to parameters in tests

7edcba2

rename test, minor refac

d812971

jdeschamps self-requested a review June 24, 2024 13:59

jdeschamps approved these changes Jun 24, 2024

CatEek merged commit 2764100 into main Jun 24, 2024
15 checks passed

CatEek deleted the iz/feat/welford_norm branch June 24, 2024 15:19
