CI: Fixing actions and moved tabular-merger #3

Merged: 3 commits, Mar 11, 2024

5 changes: 4 additions & 1 deletion ruff.toml
@@ -45,7 +45,10 @@ convention = "google"
[per-file-ignores]
"__init__.py" = ["F401"]
"__main__.py" = ["B008", "S101"]
-"./**/tests/*.py" = ["S101"] # Use of assert detected.
+"./**/tests/*.py" = [
+    "S101", # Use of assert detected.
+    "PLR2004", # Use of magic value in comparison.
+]

[isort]
force-single-line = true
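`PLR2004` is Ruff's port of pylint's magic-value-comparison check: it flags comparisons against bare numeric literals, which test assertions use constantly. The hypothetical test module below (the file and function names are illustrative, not from this repository) would trip both `S101` and `PLR2004` if the per-file ignore for `tests/` were absent.

```python
"""Hypothetical tests/test_merge.py: flagged by S101 and PLR2004 outside the ignore."""


def merged_row_count(a_rows: int, b_rows: int) -> int:
    """Stand-in for a row merge: stacking two tables adds their row counts."""
    return a_rows + b_rows


def test_row_merge_count() -> None:
    # `assert` triggers S101; comparing against the bare literal 3 triggers PLR2004.
    assert merged_row_count(1, 2) == 3
```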
27 changes: 27 additions & 0 deletions transforms/tabular-merger-tool/.bumpversion.cfg
@@ -0,0 +1,27 @@
[bumpversion]
current_version = 0.1.3-dev0
commit = True
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<dev>\d+))?
serialize =
    {major}.{minor}.{patch}-{release}{dev}
    {major}.{minor}.{patch}

[bumpversion:part:release]
optional_value = _
first_value = dev
values =
    dev
    _

[bumpversion:part:dev]

[bumpversion:file:pyproject.toml]
search = version = "{current_version}"
replace = version = "{new_version}"

[bumpversion:file:plugin.json]

[bumpversion:file:VERSION]

[bumpversion:file:src/polus/images/transforms/tabular/tabular_merger/__init__.py]
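The `parse` regex splits a version string into named parts, and `serialize` lists the output formats: with this configuration the long form carries a `-dev<N>` suffix, while the short form is used once `release` reaches its optional value `_`. The stdlib sketch below mimics that round trip for illustration only; `parse_version` and `serialize` are hypothetical helpers, not bump2version's own code, and the serializer is deliberately simplified.

```python
import re

# Regex copied from the `parse` option in .bumpversion.cfg above.
PARSE = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(\-(?P<release>[a-z]+)(?P<dev>\d+))?"
)


def parse_version(version: str) -> dict:
    """Split a version string into the parts named in the regex."""
    match = PARSE.fullmatch(version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return match.groupdict()


def serialize(parts: dict) -> str:
    """Simplified serializer: emit the -{release}{dev} suffix only while a
    release label other than the optional value `_` is present."""
    base = "{major}.{minor}.{patch}".format(**parts)
    if parts.get("release") and parts["release"] != "_":
        return base + "-{release}{dev}".format(**parts)
    return base
```

For example, `parse_version("0.1.3-dev0")` yields the parts `0`, `1`, `3`, `dev`, `0`, and serializing them reproduces the original string.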
21 changes: 21 additions & 0 deletions transforms/tabular-merger-tool/Dockerfile
@@ -0,0 +1,21 @@
FROM polusai/bfio:2.1.9

# environment variables defined in polusai/bfio
ENV EXEC_DIR="/opt/executables"
ENV POLUS_IMG_EXT=".ome.tif"
ENV POLUS_TAB_EXT=".arrow"
ENV POLUS_LOG="INFO"

# Work directory defined in the base container
WORKDIR ${EXEC_DIR}

COPY pyproject.toml ${EXEC_DIR}
COPY VERSION ${EXEC_DIR}
COPY README.md ${EXEC_DIR}
RUN pip3 install --index-url https://test.pypi.org/simple/ filepattern==2.2.7
COPY src ${EXEC_DIR}/src

RUN pip3 install ${EXEC_DIR} --no-cache-dir

ENTRYPOINT ["python3", "-m", "polus.images.transforms.tabular.tabular_merger"]
CMD ["--help"]
52 changes: 52 additions & 0 deletions transforms/tabular-merger-tool/README.md
@@ -0,0 +1,52 @@
# Tabular Merger (v0.1.3-dev0)

This WIPP plugin merges all tabular files in a vaex-supported format into a single combined file, using either row or column merging. Supported formats:

1. csv
2. hdf5
3. parquet
4. feather
5. arrow

**row merging with same headers**

When `dim = rows` and `sameColumns` is set, files are assumed to have headers (column names) in the first row. If the headers differ between files, the plugin finds the headers common to all files and row-merges only those columns. An additional column named `file` is added to the output; it contains the name of the source file for each row.
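A minimal stdlib sketch of the common-headers behaviour, assuming each table is already loaded as a list of row dicts keyed by file name. The plugin itself works on vaex dataframes; `merge_rows_common` is a hypothetical name used only for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_rows_common(tables: Dict[str, List[Row]]) -> List[Row]:
    """Row-merge tables keyed by file name, keeping only shared headers.

    Each output row gains a `file` column naming its source file.
    """
    header_lists = [list(rows[0]) for rows in tables.values() if rows]
    if not header_lists:
        return []
    # Headers present in every table, kept in the first table's order.
    shared = set(header_lists[0]).intersection(*header_lists[1:])
    common = [h for h in header_lists[0] if h in shared]
    merged: List[Row] = []
    for name, rows in tables.items():
        for row in rows:
            out = {h: row[h] for h in common}
            out["file"] = name
            merged.append(out)
    return merged
```

Here a column such as `y` that appears in only one file is dropped from the merged output.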

**row merging without same headers**

When `dim = rows` and `sameColumns` is not set, files can be merged even if their headers differ: a file that lacks a given column has that column filled with `NaN` values. An additional column named `file` is added to the output; it contains the name of the source file for each row.
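The union-of-headers variant can be sketched the same way, again assuming in-memory row dicts rather than vaex dataframes. `merge_rows_union` is hypothetical; it keeps every header seen in any file and fills gaps with `math.nan`.

```python
import math
from typing import Dict, List

Row = Dict[str, object]


def merge_rows_union(tables: Dict[str, List[Row]]) -> List[Row]:
    """Row-merge tables whose headers may differ.

    The output uses the union of all headers (in first-seen order); a column
    missing from a file is filled with NaN for that file's rows.
    """
    columns: List[str] = []
    for rows in tables.values():
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
    merged: List[Row] = []
    for name, rows in tables.items():
        for row in rows:
            out = {c: row.get(c, math.nan) for c in columns}
            out["file"] = name  # record the source file for each row
            merged.append(out)
    return merged
```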

**column merging with same rows**

When `dim = columns` and `sameRows` is set, all files are assumed to have the same number of rows. Each column name is prefixed with its file name so that merged column names do not collide.
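A sketch of the equal-rows column merge on column-oriented tables (column name mapped to a list of values). The README does not say how the prefix is joined to the column name, so the `_` separator and the name `merge_columns_same_rows` are assumptions.

```python
from typing import Dict, List

Columns = Dict[str, List[object]]


def merge_columns_same_rows(tables: Dict[str, Columns]) -> Columns:
    """Column-merge tables that all have the same number of rows.

    Each output column is named "<file>_<column>" (separator assumed) so that
    identically named columns from different files do not collide.
    """
    lengths = {len(col) for cols in tables.values() for col in cols.values()}
    if len(lengths) > 1:
        raise ValueError(f"tables have differing row counts: {sorted(lengths)}")
    merged: Columns = {}
    for name, cols in tables.items():
        for col_name, values in cols.items():
            merged[f"{name}_{col_name}"] = list(values)
    return merged
```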

**column merging with unequal rows**

When `dim = columns` and `sameRows` is not set, `mapVar` must be defined to join files with unequal numbers of rows. An `indexcolumn` column is built from the `mapVar` column by indexing its values within each file, which allows the files to be joined without duplicating rows.
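One reading of the `indexcolumn` idea: a plain join on a key that repeats within files produces every pairing of the repeats. Numbering each occurrence of a key within its own file and joining on the `(key, occurrence)` pair instead matches the n-th occurrence in one file with the n-th in another. The sketch below assumes that interpretation and in-memory row dicts; the helper names and the `<file>_<column>` naming of joined columns are hypothetical, and keys missing from a file simply leave that file's columns absent.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def add_index_column(rows: List[Row], map_var: str) -> List[Row]:
    """Number repeated `map_var` values within one file: 0, 1, 2, ..."""
    seen: Dict[object, int] = {}
    indexed: List[Row] = []
    for row in rows:
        key = row[map_var]
        indexed.append({**row, "indexcolumn": seen.get(key, 0)})
        seen[key] = seen.get(key, 0) + 1
    return indexed


def join_on_map_var(tables: Dict[str, List[Row]], map_var: str) -> List[Row]:
    """Outer-join tables column-wise on (map_var, indexcolumn)."""
    joined: Dict[Tuple[object, object], Row] = {}
    for name, rows in tables.items():
        for row in add_index_column(rows, map_var):
            key = (row[map_var], row["indexcolumn"])
            out = joined.setdefault(key, {map_var: key[0], "indexcolumn": key[1]})
            for col, value in row.items():
                if col not in (map_var, "indexcolumn"):
                    out[f"{name}_{col}"] = value
    return list(joined.values())
```

With two rows keyed `1` in each of two files, a plain join on the key would yield four rows for that key; joining on the indexed pair yields two.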

If `stripExtension` is set to true, the file extension is removed from the file name in the `file` column.
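A sketch of the `stripExtension` behaviour, assuming only the final suffix is removed (this diff does not show how multi-part extensions are handled). `file_label` is a hypothetical helper.

```python
from pathlib import PurePath


def file_label(filename: str, strip_extension: bool) -> str:
    """Value written to the `file` column for rows that came from `filename`."""
    # PurePath.stem drops only the last suffix: "plate1.csv" -> "plate1",
    # while "plate1.ome.csv" -> "plate1.ome".
    return PurePath(filename).stem if strip_extension else filename
```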

For more information on WIPP, visit the [official WIPP page](https://isg.nist.gov/deepzoomweb/software/wipp).

## Building

To build the Docker image for this plugin, run
`./build-docker.sh`.

## Install WIPP Plugin

If WIPP is running, navigate to the plugins page and add a new plugin. Paste the contents of `plugin.json` into the pop-up window and submit.

## Options

This plugin takes seven input arguments and two output arguments:

| Name               | Description                                                    | I/O    | Type        |
|--------------------|----------------------------------------------------------------|--------|-------------|
| `--inpDir`         | Input data collection to be processed by this plugin           | Input  | genericData |
| `--filePattern`    | Pattern to parse tabular files                                 | Input  | string      |
| `--stripExtension` | Remove the file extension from filenames in the `file` column  | Input  | boolean     |
| `--dim`            | Perform `rows` or `columns` merging                            | Input  | enum        |
| `--sameRows`       | Merge tabular files with the same number of rows?              | Input  | boolean     |
| `--sameColumns`    | Merge tabular files with the same headers (column names)       | Input  | boolean     |
| `--mapVar`         | Column name used to join files when merging columns            | Input  | string      |
| `--outDir`         | Output collection                                              | Output | genericData |
| `--preview`        | Generate a JSON file with outputs                              | Output | JSON        |
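To make the option set concrete, here is a hypothetical `argparse` mirror of the table. The plugin itself declares `typer` as its CLI dependency, so this is not its actual parser. Treating the booleans as bare flags, and the `.+` default and `dim` choices, are assumptions taken from `plugin.json` rather than from the plugin's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the options table (the plugin uses typer)."""
    parser = argparse.ArgumentParser(prog="tabular-merger")
    parser.add_argument("--inpDir", required=True)
    parser.add_argument("--filePattern", default=".+")
    parser.add_argument("--stripExtension", action="store_true")
    parser.add_argument("--dim", choices=["rows", "columns", "default"], required=True)
    parser.add_argument("--sameRows", action="store_true")
    parser.add_argument("--sameColumns", action="store_true")
    parser.add_argument("--mapVar", default=None)  # needed for unequal-row column merges
    parser.add_argument("--outDir", required=True)
    parser.add_argument("--preview", action="store_true")
    return parser
```

For example, parsing `--inpDir /data/input --dim rows --sameColumns --outDir /data/output` leaves `filePattern` at `.+` and `mapVar` unset.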
1 change: 1 addition & 0 deletions transforms/tabular-merger-tool/VERSION
@@ -0,0 +1 @@
0.1.3-dev0
114 changes: 114 additions & 0 deletions transforms/tabular-merger-tool/plugin.json
@@ -0,0 +1,114 @@
{
  "name": "Tabular Merger",
  "version": "0.1.3-dev0",
  "title": "Tabular Merger",
  "description": "Merge vaex-supported tabular file formats into a single merged file.",
  "author": "Nicholas Schaub ([email protected]), Hamdah Shafqat Abbasi ([email protected])",
  "institution": "National Center for Advancing Translational Sciences, National Institutes of Health",
  "repository": "https://github.com/PolusAI/polus-plugins",
  "website": "https://ncats.nih.gov/preclinical/core/informatics",
  "citation": "",
  "containerId": "polusai/tabular-merger-tool:0.1.3-dev0",
  "baseCommand": [
    "python3",
    "-m",
    "polus.images.transforms.tabular.tabular_merger"
  ],
  "inputs": [
    {
      "name": "inpDir",
      "type": "genericData",
      "description": "Input data collection to be processed by this plugin",
      "required": true
    },
    {
      "name": "filePattern",
      "type": "string",
      "description": "Pattern to parse input files",
      "default": ".+",
      "required": false
    },
    {
      "name": "stripExtension",
      "type": "boolean",
      "description": "Should the file extension be removed from filenames in the merged file column?",
      "required": true
    },
    {
      "name": "dim",
      "type": "enum",
      "options": {
        "values": [
          "rows",
          "columns",
          "default"
        ]
      },
      "description": "Merging dimension",
      "required": true
    },
    {
      "name": "sameRows",
      "type": "boolean",
      "description": "Perform column merge on all files with the same number of rows?",
      "required": false
    },
    {
      "name": "sameColumns",
      "type": "boolean",
      "description": "Perform row merge on all files with the same column names",
      "required": false
    },
    {
      "name": "mapVar",
      "type": "string",
      "description": "Column name to join files column-wise",
      "required": false
    }
  ],
  "outputs": [
    {
      "name": "outDir",
      "type": "genericData",
      "description": "Output data collection"
    }
  ],
  "ui": [
    {
      "key": "inputs.inpDir",
      "title": "Input collection",
      "description": "Input data collection to be processed by this plugin"
    },
    {
      "key": "inputs.filePattern",
      "title": "filePattern",
      "description": "Pattern to parse input files",
      "default": ".+"
    },
    {
      "key": "inputs.stripExtension",
      "title": "Remove File Extension",
      "description": "Remove file extension in the merged file column"
    },
    {
      "key": "inputs.dim",
      "title": "Merging dimension",
      "description": "Merge along rows or columns?"
    },
    {
      "key": "inputs.sameRows",
      "title": "Merge files with equal rows:",
      "description": "Merge only files with matching number of rows?"
    },
    {
      "key": "inputs.sameColumns",
      "title": "Merge files with same columns:",
      "description": "Merge files with common columns between files?"
    },
    {
      "key": "inputs.mapVar",
      "title": "Column name used to merge files",
      "description": "Column name used to merge files"
    }
  ]
}
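Because the `[bumpversion:file:plugin.json]` section sets no `search` pattern, bump2version's default replaces every occurrence of the current version string in `plugin.json`, including the tag in `containerId`. A small consistency check along these lines could catch drift between the manifest, the version, and the `ui` entries. `check_manifest` is a hypothetical helper, not part of this PR.

```python
from typing import List


def check_manifest(manifest: dict, expected_version: str) -> List[str]:
    """List the places where a plugin.json manifest disagrees with itself or the version."""
    problems: List[str] = []
    if manifest.get("version") != expected_version:
        problems.append(f"version is {manifest.get('version')!r}")
    # The image tag is whatever follows the last ':' in containerId.
    tag = manifest.get("containerId", "").rpartition(":")[2]
    if tag != expected_version:
        problems.append(f"containerId tag is {tag!r}")
    # Every declared input should have a matching "inputs.<name>" UI entry.
    input_names = {item["name"] for item in manifest.get("inputs", [])}
    ui_names = {
        entry["key"].split(".", 1)[1]
        for entry in manifest.get("ui", [])
        if entry["key"].startswith("inputs.")
    }
    missing = sorted(input_names - ui_names)
    if missing:
        problems.append(f"inputs without a ui entry: {missing}")
    return problems
```

In the plugin directory it could be called as `check_manifest(json.loads(Path("plugin.json").read_text()), Path("VERSION").read_text().strip())`; an empty list means the files agree.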
34 changes: 34 additions & 0 deletions transforms/tabular-merger-tool/pyproject.toml
@@ -0,0 +1,34 @@
[tool.poetry]
name = "polus-images-transforms-tabular-tabular-merger"
version = "0.1.3-dev0"
description = "Merge vaex-supported tabular file formats into a single merged file."
authors = [
    "Nick Schaub <[email protected]>",
    "Hamdah Shafqat Abbasi <[email protected]>"
]
readme = "README.md"
packages = [{include = "polus", from = "src"}]

[tool.poetry.dependencies]
python = ">=3.9"
filepattern = "^2.0.0"
typer = "^0.7.0"
blake3 = "^0.3.3"
llvmlite = "^0.39.1"
fastapi = "^0.92.0"
astropy = "5.2.1"
vaex = "^4.17.0"
tqdm = "^4.65.0"


[tool.poetry.group.dev.dependencies]
bump2version = "^1.0.1"
pre-commit = "^3.1.0"
black = "^23.1.0"
flake8 = "^6.0.0"
mypy = "^1.0.1"
pytest = "^7.2.1"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
30 changes: 30 additions & 0 deletions transforms/tabular-merger-tool/run-plugin.sh
@@ -0,0 +1,30 @@
#!/bin/bash
version=$(<VERSION)
datapath=$(readlink --canonicalize data)

# Inputs
inpDir=/data/input
filePattern=".*"

# Output paths
outDir=/data/output

# Other params
stripExtension=false
dim=rows
mapVar="mask_intensity"

# Log level, must be one of ERROR, CRITICAL, WARNING, INFO, DEBUG
LOGLEVEL=INFO

docker run --mount type=bind,source=${datapath},target=/data/ \
    --env POLUS_LOG=${LOGLEVEL} \
    polusai/tabular-merger-tool:${version} \
    --inpDir ${inpDir} \
    --filePattern "${filePattern}" \
    --stripExtension ${stripExtension} \
    --dim ${dim} \
    --sameRows \
    --sameColumns \
    --mapVar ${mapVar} \
    --outDir ${outDir}
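The same invocation can be built as an argument list from Python, assuming Docker is on `PATH`. Passing a list to `subprocess.run` bypasses the shell, so the `.*` pattern reaches the container verbatim instead of being glob-expanded against the current directory. The image name follows `containerId` in `plugin.json`. `--stripExtension` is omitted because this diff does not show whether the CLI expects a value or a bare flag. `docker_command` is a hypothetical helper.

```python
from typing import List


def docker_command(version: str, data_path: str, map_var: str = "mask_intensity") -> List[str]:
    """Build the docker run argv for the merger as a list (no shell involved)."""
    return [
        "docker", "run",
        "--mount", f"type=bind,source={data_path},target=/data/",
        "--env", "POLUS_LOG=INFO",
        f"polusai/tabular-merger-tool:{version}",
        "--inpDir", "/data/input",
        "--filePattern", ".*",  # passed literally; no glob expansion
        "--dim", "rows",
        "--sameRows",
        "--sameColumns",
        "--mapVar", map_var,
        "--outDir", "/data/output",
    ]
```

To run it, pass the list to `subprocess.run(cmd, check=True)`, with `data_path` resolved via `os.path.realpath("data")` as the script does with `readlink --canonicalize`.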
@@ -0,0 +1,4 @@
"""Tabular Merger."""
__version__ = "0.1.3-dev0"

from . import tabular_merger