Move tabular tool #5

Merged
merged 6 commits into from
Apr 25, 2024
23 changes: 23 additions & 0 deletions .bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
[bumpversion]
current_version = 0.1.1
commit = False
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))?
serialize =
{major}.{minor}.{patch}-{release}{dev}
{major}.{minor}.{patch}

[bumpversion:part:release]
optional_value = _
first_value = dev
values =
dev
_

[bumpversion:part:dev]

[bumpversion:file:pyproject.toml]
search = version = "{current_version}"
replace = version = "{new_version}"

[bumpversion:file:VERSION]
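
The `parse` expression in this config can be checked directly with Python's `re` module. This is a minimal sketch, assuming the expression behaves as an ordinary Python regular expression; the `parse_version` helper is hypothetical and not part of bump2version, and it uses `fullmatch` as a stricter check than the tool itself may apply.

```python
import re

# The `parse` expression copied from the config above.
PARSE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<dev>\d+))?"
)


def parse_version(version: str) -> dict:
    """Hypothetical helper: split a version string into its named parts."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a valid version: {version!r}")
    return match.groupdict()


print(parse_version("0.1.1"))
# {'major': '0', 'minor': '1', 'patch': '1', 'release': None, 'dev': None}
print(parse_version("0.2.1-dev0"))
# {'major': '0', 'minor': '2', 'patch': '1', 'release': 'dev', 'dev': '0'}
```

The optional `release` and `dev` groups are what allow both serialized forms (`{major}.{minor}.{patch}-{release}{dev}` and `{major}.{minor}.{patch}`) to round-trip through the same pattern.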
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
ignore = W503, E501
max-line-length = 88
extend-ignore = E203
21 changes: 20 additions & 1 deletion .gitignore
@@ -157,4 +157,23 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
#.idea/

# vscode
.vscode

# test data directory
data

# local manifests
src/polus/plugins/_plugins/manifests/*

# allow python scripts inside manifests dir
!src/polus/plugins/_plugins/manifests/*.py

#macOS
*.DS_Store


#husky
node_modules
30 changes: 22 additions & 8 deletions .pre-commit-config.yaml
@@ -1,11 +1,11 @@
fail_fast: true

repos:

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-added-large-files
exclude: (.*?)\.(h5)$
- id: check-case-conflict
- id: check-json
- id: pretty-format-json
@@ -26,27 +26,41 @@ repos:
args: ["--fix=lf"]
description: Forces to replace line ending by the UNIX 'lf' character.
- id: trailing-whitespace
exclude: '.bumpversion.cfg'
exclude: ".bumpversion.cfg"
- id: check-merge-conflict

- repo: https://github.com/psf/black
rev: '23.3.0'
rev: "23.3.0"
hooks:
- id: black
language_version: python3.9
exclude: ^src\/polus\/plugins\/_plugins\/models\/\w*Schema.py$
exclude: |
(?x)(
^src\/polus\/plugins\/_plugins\/models\/pydanticv1\/\w*Schema.py$|
^src\/polus\/plugins\/_plugins\/models\/pydanticv2\/\w*Schema.py$
)

- repo: https://github.com/charliermarsh/ruff-pre-commit
# Ruff version.
rev: 'v0.0.274'
rev: "v0.0.274"
hooks:
- id: ruff
exclude: ^src\/polus\/plugins\/_plugins\/models\/\w*Schema.py$
exclude: |
(?x)(
test_[a-zA-Z0-9]+.py$|
^src\/polus\/plugins\/_plugins\/models\/pydanticv1\/\w*Schema.py$|
^src\/polus\/plugins\/_plugins\/models\/pydanticv2\/\w*Schema.py$
)
args: [--fix]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v1.4.0'
rev: "v1.4.0"
hooks:
- id: mypy
exclude: ^src\/polus\/plugins\/_plugins\/models\/\w*Schema.py$
exclude: |
(?x)(
test_[a-zA-Z0-9]+.py$|
^src\/polus\/plugins\/_plugins\/models\/pydanticv1\/\w*Schema.py$|
^src\/polus\/plugins\/_plugins\/models\/pydanticv2\/\w*Schema.py$
)
additional_dependencies: [types-requests==2.31.0.1]
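
The multi-line `exclude` patterns in this config rely on the `(?x)` verbose flag, which ignores the whitespace and newlines inside the group. The sketch below tests the ruff/mypy pattern against a few paths, assuming the pattern is applied to repository-relative paths with `re.search`; the `is_excluded` helper and the sample paths are hypothetical.

```python
import re

# The ruff/mypy `exclude` pattern copied from the config above.
EXCLUDE = re.compile(
    r"""(?x)(
    test_[a-zA-Z0-9]+.py$|
    ^src\/polus\/plugins\/_plugins\/models\/pydanticv1\/\w*Schema.py$|
    ^src\/polus\/plugins\/_plugins\/models\/pydanticv2\/\w*Schema.py$
    )"""
)


def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if the hook would skip this path."""
    return EXCLUDE.search(path) is not None


for path in (
    "tests/test_cli.py",
    "tests/test_feature_subsetting.py",
    "src/polus/plugins/_plugins/models/pydanticv1/WIPPPluginSchema.py",
    "src/polus/plugins/_plugins/models/PolusComputeSchema.py",
):
    print(path, is_excluded(path))
```

Under these assumptions, `tests/test_feature_subsetting.py` is not matched, because `[a-zA-Z0-9]+` does not accept the underscore inside `feature_subsetting`. If test files with multi-word names are meant to be skipped, the character class may need to include `_`. The unescaped `.` before `py` also matches any character, which is harmless here but looser than `\.`.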
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -0,0 +1 @@
* @Nicholas-Schaub @NHotaling @hsidky
Empty file added README.md
Empty file.
1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
0.1.1
28 changes: 28 additions & 0 deletions clustering/feature-subsetting-tool/.bumpversion.cfg
@@ -0,0 +1,28 @@
[bumpversion]
current_version = 0.2.1-dev0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))?
serialize =
{major}.{minor}.{patch}-{release}{dev}
{major}.{minor}.{patch}

[bumpversion:part:release]
optional_value = _
first_value = dev
values =
dev
_

[bumpversion:part:dev]

[bumpversion:file:pyproject.toml]
search = version = "{current_version}"
replace = version = "{new_version}"

[bumpversion:file:plugin.json]
[bumpversion:file:README.md]

[bumpversion:file:VERSION]

[bumpversion:file:src/polus/tabular/clustering/feature_subsetting/__init__.py]
21 changes: 21 additions & 0 deletions clustering/feature-subsetting-tool/Dockerfile
@@ -0,0 +1,21 @@
FROM polusai/bfio:2.3.6

# environment variables defined in polusai/bfio
ENV EXEC_DIR="/opt/executables"
ENV POLUS_IMG_EXT=".ome.tif"
ENV POLUS_TAB_EXT=".csv"
ENV POLUS_LOG="INFO"

# Work directory defined in the base container
WORKDIR ${EXEC_DIR}

COPY pyproject.toml ${EXEC_DIR}
COPY VERSION ${EXEC_DIR}
COPY README.md ${EXEC_DIR}
COPY src ${EXEC_DIR}/src

RUN pip3 install ${EXEC_DIR} --no-cache-dir


ENTRYPOINT ["python3", "-m", "polus.tabular.clustering.feature_subsetting"]
CMD ["--help"]
58 changes: 58 additions & 0 deletions clustering/feature-subsetting-tool/README.md
@@ -0,0 +1,58 @@
# Feature Data Subset (v0.2.1-dev0)

This WIPP plugin subsets data based on a given feature. It works in conjunction with the `polus-feature-extraction-plugin`, which can extract features such as the mean intensity of every image in the input image collection.

# Usage
The details and usage of the plugin inputs are provided in the section below. In addition to the subsetted data, the output directory also contains a `summary.txt` file that lists which images were kept and their new filenames if they were renamed.

### Explanation of inputs
Some of the inputs are straightforward and are common to most WIPP plugins. This section provides details and examples for the inputs that may be less obvious. An image collection with the following pattern is used as an example: `r{r+}_t{t+}_p{p+}_z{z+}_c{c+}.ome.tif`, where r, t, p, z, and c stand for replicate, timepoint, position, z-position, and channel, respectively. Suppose we have 5 replicates, 3 timepoints, 50 positions, 10 z-planes, and 4 channels.

1. `inpDir` - This contains the path to the input image collection to subset data from.
2. `tabularDir` - This contains the path to the tabular files (`.csv`, `.arrow`, or `.parquet`) containing the feature values for each image. This can be the output of the feature extraction or nyxus plugin.
3. `filePattern` - File pattern of the input images.
4. `imageFeature` - Feature in the tabular data that contains the image filenames.
5. `tabularFeature` - Tabular feature that will be used to filter images.
6. `groupVar` - This is a mandatory input specifying the variables across which to subset data. It accepts either 1 or 2 variables; if 2 are provided, the second is treated as the minor grouping variable. In our example, if `z` is provided, then within a sub-collection the mean of the feature value is taken over all images with the same z. The z positions are then filtered based on the `percentile` and `removeDirection` inputs. If `z,c` is provided, then `c` is treated as the minor grouping variable, which means that the mean is taken over all images with the same z for each channel. The plugin also ensures that the same z positions are filtered out across c.
7. `percentile` and `removeDirection` - These two variables define the criteria by which images are filtered. For example, if `percentile` is `0.1` and `removeDirection` is set to `Below`, then images with a feature value below the 10th percentile are removed. If `removeDirection` is set to `Above`, then all images with a feature value greater than the 10th percentile are removed. This enables data subsetting from both `brightfield` and `darkfield` microscopy images.
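
The interaction of `groupVar`, `percentile`, and `removeDirection` can be illustrated with a small standard-library sketch. It is not the plugin's implementation: the `percentile_value` and `keep_groups` helpers, the linear-interpolation percentile, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def percentile_value(values, q):
    """Linear-interpolation percentile for q in [0, 1] (assumed method)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def keep_groups(rows, percentile, remove_direction):
    """Return the group values (e.g. z positions) that survive filtering.

    rows: iterable of (group, feature_value) pairs, one per image.
    """
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    # Mean feature value over all images sharing the same group value.
    means = {group: mean(values) for group, values in by_group.items()}
    cutoff = percentile_value(list(means.values()), percentile)
    if remove_direction == "Below":
        return sorted(g for g, m in means.items() if m >= cutoff)
    if remove_direction == "Above":
        return sorted(g for g, m in means.items() if m <= cutoff)
    raise ValueError("remove_direction must be 'Below' or 'Above'")


# Two images per z position; the per-z means are 10, 20, 30, 40, 50.
rows = [(1, 9), (1, 11), (2, 19), (2, 21), (3, 29),
        (3, 31), (4, 39), (4, 41), (5, 49), (5, 51)]
print(keep_groups(rows, 0.5, "Below"))  # [3, 4, 5]
print(keep_groups(rows, 0.5, "Above"))  # [1, 2, 3]
```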

**Optional Arguments**

8. `sectionVar` - This is an optional input that splits the input image collection into sub-collections. The analysis is done separately for each sub-collection. In our example, if the user enters `r,t` as the sectionVar, then there are 15 sub-collections (5*3), one for each combination of timepoint and replicate. If the user enters `r` as the sectionVar, then there are 5 sub-collections, one for each replicate. To treat the whole image collection as a single section, leave this input empty. NOTE: As a post-processing step, the same number of images is subsetted across different sections.
9. `padding` - This is an optional variable with a default value of 0. A padding of 3 means that 3 additional planes will be captured on either side of the subsetted data. This can be used as a sanity check to ensure that the subsetted data captures the images we want. For example, if the initial filtering selected z positions 5, 6, and 7, then a padding of 3 means that the output dataset will have z positions 2, 3, 4, 5, 6, 7, 8, 9, and 10, if all of them exist.
10. `writeOutput` - This is an optional argument with a default value of `True`. If it is set to true, then both the output image collection and the `summary.txt` file are created. If it is set to false, then the output directory contains only `summary.txt`. This lets the user tune parameters such as `percentile`, `removeDirection`, and the feature without actually creating the output image collection.
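
The `sectionVar` and `padding` arithmetic can be reproduced with a short sketch. The `sections` and `pad_selection` helpers are hypothetical and only mirror the worked numbers in this README; how the plugin actually pads or splits the collection is not shown here.

```python
from itertools import product


def sections(**variables):
    """Hypothetical helper: one dict per combination of section variables."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


def pad_selection(selected, available, padding=0):
    """Hypothetical helper: add `padding` planes on either side of the
    selected positions, keeping only positions that actually exist."""
    wanted = {z + d for z in selected for d in range(-padding, padding + 1)}
    return sorted(wanted & set(available))


# 5 replicates x 3 timepoints give 15 sub-collections.
print(len(sections(r=range(1, 6), t=range(1, 4))))  # 15

# z positions 5, 6, 7 with a padding of 3, out of 10 z-planes numbered 1-10.
print(pad_selection([5, 6, 7], range(1, 11), padding=3))
# [2, 3, 4, 5, 6, 7, 8, 9, 10]
```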



Contact [Gauhar Bains](mailto:[email protected]) for more information.

For more information on WIPP, visit the [official WIPP page](https://isg.nist.gov/deepzoomweb/software/wipp).

## Building

To build the Docker image for this plugin, run
`./build-docker.sh`.

## Install WIPP Plugin

If WIPP is running, navigate to the plugins page and add a new plugin. Paste the contents of `plugin.json` into the pop-up window and submit.

## Options

This plugin takes eleven input arguments and two output arguments:

| Name | Description | I/O | Type |
| ------------------- | ----------------------------------------------------- | ------ | ------------- |
| `--inpDir` | Input image collection to be processed by this plugin | Input | collection |
| `--tabularDir` | Path to tabular data | Input | genericData |
| `--filePattern` | Filename pattern used to separate data | Input | string |
| `--imageFeature` | Feature in tabular data with image filenames | Input | string |
| `--tabularFeature` | Tabular feature to filter image files | Input | string |
| `--padding` | Number of images to capture outside the cutoff | Input | integer |
| `--groupVar` | Variables to group by in a section | Input | string |
| `--percentile` | Percentile to remove | Input | float |
| `--removeDirection` | Remove direction above or below percentile | Input | string |
| `--sectionVar` | Variables to divide larger sections | Input | string |
| `--writeOutput` | Write output image collection or not | Input | boolean |
| `--outDir` | Output collection | Output | genericData |
| `--preview` | Generate a JSON file with outputs | Output | JSON |
1 change: 1 addition & 0 deletions clustering/feature-subsetting-tool/VERSION
@@ -0,0 +1 @@
0.2.1-dev0
4 changes: 4 additions & 0 deletions clustering/feature-subsetting-tool/build-docker.sh
@@ -0,0 +1,4 @@
#!/bin/bash

version=$(<VERSION)
docker build . -t polusai/feature-subsetting-tool:${version}
14 changes: 14 additions & 0 deletions clustering/feature-subsetting-tool/example/summary.txt
@@ -0,0 +1,14 @@
------------------------------------------------

Files :

x00_y01_p03_c1.ome.tif -----> x00_y01_p01_c1.ome.tif
x00_y01_p03_c2.ome.tif -----> x00_y01_p01_c2.ome.tif
x00_y01_p03_c3.ome.tif -----> x00_y01_p01_c3.ome.tif
x00_y01_p03_c4.ome.tif -----> x00_y01_p01_c4.ome.tif
x00_y01_p03_c5.ome.tif -----> x00_y01_p01_c5.ome.tif
x00_y01_p04_c1.ome.tif -----> x00_y01_p02_c1.ome.tif
x00_y01_p04_c2.ome.tif -----> x00_y01_p02_c2.ome.tif
x00_y01_p04_c3.ome.tif -----> x00_y01_p02_c3.ome.tif
x00_y01_p04_c4.ome.tif -----> x00_y01_p02_c4.ome.tif
x00_y01_p04_c5.ome.tif -----> x00_y01_p02_c5.ome.tif
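
A summary file in this layout can be read back into an old-name to new-name mapping. This is a minimal sketch, assuming every rename line has the form `old -----> new` as in the example file; the `parse_summary` helper is hypothetical and not part of the plugin.

```python
ARROW = "----->"


def parse_summary(text: str) -> dict:
    """Hypothetical helper: map original filenames to renamed filenames."""
    renames = {}
    for line in text.splitlines():
        if ARROW not in line:
            continue  # skips the separator, the "Files :" header, blank lines
        old, _, new = line.partition(ARROW)
        renames[old.strip()] = new.strip()
    return renames


example = """\
------------------------------------------------

Files :

x00_y01_p03_c1.ome.tif -----> x00_y01_p01_c1.ome.tif
x00_y01_p04_c5.ome.tif -----> x00_y01_p02_c5.ome.tif
"""
print(parse_summary(example))
```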
16 changes: 16 additions & 0 deletions clustering/feature-subsetting-tool/package-release.sh
@@ -0,0 +1,16 @@
# This script is designed to help package a new version of a plugin

# Get the new version
version=$(<VERSION)

# Bump the version
bump2version --config-file .bumpversion.cfg --new-version ${version} --allow-dirty part

# Build the container
./build-docker.sh

# Push to dockerhub
docker push polusai/feature-subsetting-tool:${version}

# Run pytests
python -m pytest -s tests