Added tools for dimension reduction and quality assessment #17

Merged: 14 commits, Aug 16, 2024
@@ -0,0 +1,33 @@
[bumpversion]
current_version = 0.1.0-dev0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))?
serialize =
	{major}.{minor}.{patch}-{release}{dev}
	{major}.{minor}.{patch}

[bumpversion:part:release]
optional_value = _
first_value = dev
values =
	dev
	_

[bumpversion:part:dev]

[bumpversion:file:pyproject.toml]
search = version = "{current_version}"
replace = version = "{new_version}"

[bumpversion:file:src/polus/tabular/features/dimension_reduction_quality_metrics/__init__.py]

[bumpversion:file:dimensionreductionqualitymetrics.cwl]

[bumpversion:file:ict.yaml]

[bumpversion:file:plugin.json]

[bumpversion:file:README.md]

[bumpversion:file:VERSION]
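
The `parse` option above splits a version string into named parts. A minimal Python sketch of that behaviour follows; the helper `split_version` is hypothetical, and the use of `re.fullmatch` is an assumption made for illustration rather than a statement about how bumpversion applies the pattern internally.

```python
import re

# Pattern copied from the `parse` option in the config above.
PARSE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<dev>\d+))?"
)


def split_version(version: str) -> dict:
    """Return the named parts of a version string; absent parts are None.

    Hypothetical helper, not part of bumpversion.
    """
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return match.groupdict()


# A development version carries the optional release and dev parts.
print(split_version("0.1.0-dev0"))
# A final version leaves them unset, which corresponds to the second,
# shorter `serialize` format.
print(split_version("0.1.0"))
```

Because the `-{release}{dev}` suffix is optional in the pattern, both `0.1.0-dev0` and `0.1.0` are accepted.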
25 changes: 25 additions & 0 deletions features/dimension-reduction-quality-metrics-tool/Dockerfile
@@ -0,0 +1,25 @@
FROM polusai/bfio:2.3.6

# environment variables defined in polusai/bfio
ENV EXEC_DIR="/opt/executables"
ENV POLUS_IMG_EXT=".ome.tif"
ENV POLUS_TAB_EXT=".feather"
ENV POLUS_LOG="INFO"

# Work directory defined in the base container
WORKDIR ${EXEC_DIR}

# TODO: Change the tool_dir to the tool directory
ENV TOOL_DIR="features/dimension-reduction-quality-metrics-tool"

# Copy the repository into the container
RUN mkdir tabular-tools
COPY . ${EXEC_DIR}/tabular-tools

# Install the tool
RUN pip3 install "${EXEC_DIR}/tabular-tools/${TOOL_DIR}" --no-cache-dir

# Set the entrypoint
# TODO: Change the entrypoint to the tool entrypoint
ENTRYPOINT ["python3", "-m", "polus.tabular.features.dimension_reduction_quality_metrics"]
CMD ["--help"]
55 changes: 55 additions & 0 deletions features/dimension-reduction-quality-metrics-tool/README.md
@@ -0,0 +1,55 @@
# Dimension Reduction Quality Metrics (v0.1.0-dev0)

This tool measures the quality of dimensionality reductions.
It provides the following quality metrics:

1. False Nearest Neighbors (FNN).

## FNN

Consider a query point in the original space and its k nearest neighbors there.
Find the k nearest neighbors of the same query in the reduced space.
If the two sets of neighbors differ, the reduced space does not faithfully represent the original space around that query.
FNN is the recall of the original-space neighbors among the reduced-space neighbors, averaged over a large number of queries.
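
The description above can be sketched in a few lines of Python. This is an illustrative, standard-library-only sketch and not the tool's implementation: the function names are hypothetical, the distance is fixed to Euclidean, queries are assumed to be rows of the data, and each query is excluded from its own neighbor list.

```python
import math
import random


def nearest(points, query_idx, k):
    """Indices of the k points closest to points[query_idx], excluding the query."""
    query = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: math.dist(points[i], query))
    return set(others[:k])


def mean_knn_recall(original, embedded, query_ids, k):
    """Mean fraction of original-space neighbors recovered in the embedded space."""
    recalls = []
    for q in query_ids:
        true_nn = nearest(original, q, k)
        found_nn = nearest(embedded, q, k)
        recalls.append(len(true_nn & found_nn) / k)
    return sum(recalls) / len(recalls)


rng = random.Random(0)
original = [[rng.random() for _ in range(5)] for _ in range(200)]
same = [row[:] for row in original]        # lossless "reduction"
first_two = [row[:2] for row in original]  # keeps two of the five columns
queries = rng.sample(range(200), 20)

print(mean_knn_recall(original, same, queries, k=10))  # 1.0
print(mean_knn_recall(original, first_two, queries, k=10))
```

An embedding identical to the original data scores 1.0; dropping columns generally lowers the score because some neighbors change.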

## Parameters

This tool takes the following parameters:

1. `--originalDir`: Directory containing the original data.
2. `--originalPattern`: Pattern to parse original files.
3. `--embeddedDir`: Directory containing the reduced data.
4. `--embeddedPattern`: Pattern to parse reduced files.
5. `--numQueries`: Number of queries to use.
6. `--ks`: Comma-separated list of numbers of nearest neighbors to consider.
7. `--distanceMetrics`: Comma-separated list of distance metrics to use.
8. `--qualityMetrics`: Comma-separated list of quality metrics to use.
9. `--outDir`: Output directory.
10. `--preview`: Generate JSON file with outputs without running the tool.

## Docker Container

To build the Docker image for this tool, run `./build-docker.sh`.

## Install WIPP Plugin

If WIPP is running, navigate to the plugins page and add a new plugin.
Paste the contents of `plugin.json` into the pop-up window and submit.
For more information on WIPP, visit the [official WIPP page](https://isg.nist.gov/deepzoomweb/software/wipp).

## Options

This plugin takes nine input arguments and one output argument:

| Name | Description | I/O | Type | Default |
| ------------------- | --------------------------------------------------------- | ------ | ----------- | ------------------ |
| `--originalDir` | Directory containing the original data. | Input | genericData | N/A |
| `--originalPattern` | Pattern to parse original files. | Input | string | ".*" |
| `--embeddedDir` | Directory containing the reduced data. | Input | genericData | N/A |
| `--embeddedPattern` | Pattern to parse reduced files. | Input | string | ".*" |
| `--numQueries` | Number of queries to use. | Input | int | 1000 |
| `--ks` | Comma-separated list of numbers of nearest neighbors. | Input | string | "10,100" |
| `--distanceMetrics` | Comma-separated list of distance metrics to use. | Input | string | "euclidean,cosine" |
| `--qualityMetrics` | Comma-separated list of quality metrics to use. | Input | string | "fnn" |
| `--outDir` | Output directory. | Output | genericData | N/A |
| `--preview` | Generate JSON file with outputs without running the tool. | Input | boolean | False |
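
Three of the options above are passed as comma-separated strings. The sketch below shows one way such values could be split into lists, using the defaults from the table. The helper `parse_csv_option` is hypothetical, and evaluating every combination of the three lists is an assumption made for illustration, not a description of the tool's behaviour.

```python
import itertools


def parse_csv_option(raw, convert=str):
    """Split a comma-separated option value and convert each non-empty item.

    Hypothetical helper, not part of the tool.
    """
    return [convert(item.strip()) for item in raw.split(",") if item.strip()]


# Default values taken from the options table.
ks = parse_csv_option("10,100", int)
distance_metrics = parse_csv_option("euclidean,cosine")
quality_metrics = parse_csv_option("fnn")

# Assumption: one evaluation per (k, distance metric, quality metric) triple.
combos = list(itertools.product(ks, distance_metrics, quality_metrics))
for k, distance, quality in combos:
    print(k, distance, quality)
```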
@@ -0,0 +1 @@
0.1.0-dev0
@@ -0,0 +1,22 @@
#!/bin/bash

# TODO: Change the name of the tool here
tool_dir="features"
tool_name="dimension-reduction-quality-metrics-tool"

# The version is read from the VERSION file
version=$(<VERSION)
tag="polusai/${tool_name}:${version}"
echo "Building docker image with tag: ${tag}"

# The current directory and the repository root are saved in variables
cur_dir=$(pwd)
repo_root=$(git rev-parse --show-toplevel)

# The Dockerfile and .dockerignore files are copied to the repository root before building the image
cd "${repo_root}"
cp "./${tool_dir}/${tool_name}/Dockerfile" .
cp .gitignore .dockerignore
docker build . -t "${tag}"
rm Dockerfile .dockerignore
cd "${cur_dir}"
@@ -0,0 +1,56 @@
class: CommandLineTool
cwlVersion: v1.2
inputs:
  originalDir:
    inputBinding:
      prefix: --originalDir
    type: Directory
  originalPattern:
    inputBinding:
      prefix: --originalPattern
    type: string?
  embeddedDir:
    inputBinding:
      prefix: --embeddedDir
    type: Directory
  embeddedPattern:
    inputBinding:
      prefix: --embeddedPattern
    type: string?
  numQueries:
    inputBinding:
      prefix: --numQueries
    type: int?
  ks:
    inputBinding:
      prefix: --ks
    type: string?
  distanceMetrics:
    inputBinding:
      prefix: --distanceMetrics
    type: string?
  qualityMetrics:
    inputBinding:
      prefix: --qualityMetrics
    type: string?
  outDir:
    inputBinding:
      prefix: --outDir
    type: Directory
  preview:
    inputBinding:
      prefix: --preview
    type: boolean?
outputs:
  outDir:
    outputBinding:
      glob: $(inputs.outDir.basename)
    type: Directory
requirements:
  DockerRequirement:
    dockerPull: polusai/dimension-reduction-quality-metrics-tool:0.1.0-dev0
  InitialWorkDirRequirement:
    listing:
    - entry: $(inputs.outDir)
      writable: true
  InlineJavascriptRequirement: {}
129 changes: 129 additions & 0 deletions features/dimension-reduction-quality-metrics-tool/ict.yaml
@@ -0,0 +1,129 @@
author: Najib Ishaq
contact: [email protected]
container: polusai/dimension-reduction-quality-metrics-tool:0.1.0-dev0
description: Quality metrics for dimension reductions
entrypoint: python3 -m polus.tabular.features.dimension_reduction_quality_metrics

inputs:

  - description: Original tabular data
    format:
      - originalDir
    name: originalDir
    required: true
    type: path

  - description: File pattern for the original data
    format:
      - originalPattern
    name: originalPattern
    required: false
    type: string

  - description: Dimension reduced tabular data
    format:
      - embeddedDir
    name: embeddedDir
    required: true
    type: path

  - description: File pattern for the embedded data
    format:
      - embeddedPattern
    name: embeddedPattern
    required: false
    type: string

  - description: Number of queries to use
    format:
      - numQueries
    name: numQueries
    required: false
    type: integer

  - description: Numbers of neighbors to use
    format:
      - ks
    name: ks
    required: false
    type: string

  - description: Distance metrics to use
    format:
      - distanceMetrics
    name: distanceMetrics
    required: false
    type: string

  - description: Quality metrics to use
    format:
      - qualityMetrics
    name: qualityMetrics
    required: false
    type: string

name: polusai/dimension-reduction-quality-metrics

outputs:

  - description: Output collection
    format:
      - outDir
    name: outDir
    required: true
    type: path

repository: https://github.com/polusai/tabular-tools

specVersion: 1.0.0

title: Dimension Reduction Quality Metrics

ui:

  - description: Original tabular data
    key: inputs.originalDir
    title: Original tabular data
    type: path

  - description: Pattern to parse original files
    key: inputs.originalPattern
    title: OriginalPattern
    type: text

  - description: Dimension reduced tabular data
    key: inputs.embeddedDir
    title: Dimension reduced tabular data
    type: path

  - description: Pattern to parse embedded files
    key: inputs.embeddedPattern
    title: EmbeddedPattern
    type: text

  - description: Output a JSON preview of outputs produced by this plugin
    key: inputs.preview
    title: Preview
    type: boolean

  - description: Number of queries to use
    key: inputs.numQueries
    title: numQueries
    type: integer

  - description: Numbers of neighbors to use
    key: inputs.ks
    title: ks
    type: string

  - description: Distance metrics to use
    key: inputs.distanceMetrics
    title: distanceMetrics
    type: string

  - description: Quality metrics to use
    key: inputs.qualityMetrics
    title: qualityMetrics
    type: string

version: 0.1.0-dev0