diff --git a/README.md b/README.md
index 72f2593..8e5bd54 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,20 @@
 # Morphology-analysis
-Extract morphological characteristics from image of fish trait segmentation.
-The goals of the tool is to extract presence-absence table, measurements and landmarks of fish from the segmented fish image produced by M. Maruf.
-It provides a framework for creating modularized tools by using classes and jupyter notebook to visualize and test functionalities.
-The tool is automatically containerized when a new release is published. It provides a validated version for easy integration into the [BGNN_Snakemake](https://github.com/hdr-bgnn/BGNN_Snakemake).
-This tool can me made more generalizable but it has been developed for the [Minnows Project](https://github.com/hdr-bgnn/minnowTraits).
+The primary goal of this repository is to produce a presence table (default), create landmarks (optional), visualize landmarks (optional), and extract measurements (optional) from a segmented fish image (see [BGNN-trait-segmentation](https://github.com/hdr-bgnn/BGNN-trait-segmentation)).
+Secondarily, this repository provides a framework for creating modularized tools by using classes.
 
-## 1- Segmented image .png description
+Finally, this repository also provides a way to visualize the outputs of the tools and test functionalities using a jupyter notebook.
 
-The segmented image input looks like image below, with traits color coded and identified by "blobs". The segmentation model uses [M. Maruf's segmentation code](https://github.com/hdr-bgnn/BGNN-trait-segmentation/blob/main/Segment_mini/scripts/segmentation_main.py), and is based on a Convolutional Neural Network (CNN; more specifically unet). You can find more information on the [BGNN-trait-segmentation repository](https://github.com/hdr-bgnn/BGNN-trait-segmentation). The output is 11 classes (traits) that are color coded. We are only using 9 of them, and are excluding alt_fin_ray and caudal_fin_ray.
+A Docker container image of this codebase and its dependencies is automatically built and published for each new release. The Docker container isolates dependencies and facilitates reproducible, version-tagged integration into Imageomics workflows (see for example the [BGNN core workflow](https://github.com/hdr-bgnn/BGNN_Snakemake)).
+
+This tool was originally developed for the [Minnows Project](https://github.com/hdr-bgnn/Minnow_Segmented_Traits).
+
+## 1- Input: Segmented Image
+
+The input is the output of the segmentation model: a file named basename_segmented.png, where "basename" is the file name of the original image. The segmentation model is based on a Convolutional Neural Network (CNN; more specifically, unet). More information can be found in the [BGNN-trait-segmentation repository](https://github.com/hdr-bgnn/BGNN-trait-segmentation); in particular, see the [segmentation code](https://github.com/hdr-bgnn/BGNN-trait-segmentation/blob/main/Segment_mini/scripts/segmentation_main.py).
+
+A segmented image looks like the image below, with traits color-coded and represented by "blobs", detached regions that belong to the same trait. There are 11 trait classes corresponding to the annotated traits. Here, only 9 are used (alt_fin_ray and caudal_fin_ray are excluded).
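+
+For a quick programmatic check of which trait colors appear in a segmented image, here is a minimal sketch (illustration only, not part of the tool; it assumes pillow and numpy are installed and uses the 'trunk' color from the color list shown under the image):
+
+```python
+# Illustration only: count the pixels matching one trait color in a segmented image.
+import numpy as np
+from PIL import Image
+
+img = np.array(Image.open("Test_Data/INHS_FISH_000742_segmented.png").convert("RGB"))
+trunk_color = np.array([0, 124, 124])             # 'trunk' RGB code from the color list
+trunk_mask = np.all(img == trunk_color, axis=-1)  # boolean mask of trunk pixels
+print("trunk pixels:", trunk_mask.sum())          # a non-zero count means the trunk is present
+```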
 ![segmented fish image](Test_Data/INHS_FISH_000742_segmented.png)
 
@@ -30,27 +35,39 @@ When you export this image in python using pillow library (PIL.Image.open(file_n
 * 'trunk': [0, 124, 124]
 
-## 2- Presence Absence of Traits
+## 2- Default Tool: Presence
+
+The input of the presence tool is a segmented image named "basename_segmented.png" (as generated by the [BGNN-trait-segmentation](https://github.com/hdr-bgnn/BGNN-trait-segmentation) code).
+
+The output of the presence tool is a presence table named "basename_presence.json" for each image.
+
+To check for the presence of traits, the tool:
+
+ 1. Isolates an individual trait (e.g., isolates the dorsal_fin)
+ 2. Counts the number of blobs for that trait
+ 3. Calculates the area of the largest blob as a percentage of the total blob area for that trait
+    - i.e., if only 1 blob is found, that percentage is 100%
+
+
+## 3- Optional Tools: Landmark, Visualize, Metadata, and Measure
 
-The approach that we use for checking presence and absence of traits is the following:
+The input for the measure tool is a metadata file named "basename_metadata.json" generated by [Drexel](https://github.com/hdr-bgnn/drexel_metadata).
 
- 1. Isolate indivual traits (e.g., isolate the dorsal_fin)
- 2. Count number of blobs
- 3. Calculate percentage of large blob
+To create landmarks, visualizations, and extract measurements, the tool:
+ 1. Isolates an individual trait (e.g., isolates the dorsal_fin)
+ 2. Removes any small blobs of a trait and fills in holes within blobs of a trait
+ 3. Identifies and places landmarks
+ 4. Extracts measurements of traits using landmarks (_lm), masks (_m), or a bounding box (_bbox)
 
-## 3- Other Functions
+The user can specify which method of measurement extraction to use (described below under *Method of Measurement Extraction*).
 
-The approach that we use for creating landmarks and extracting measurements is the following:
+Please create an [issue](https://github.com/hdr-bgnn/Morphology-analysis/issues/new) to suggest an additional landmark or measurement.
 
- 1. Remove small blobs and fill in gaps within each trait
- 2. Identify landmarks
- 3. Use landmarks, bounding box, and morphological tools (centroid, area, etc.) to extract the measurements
 
-If you had more features in the class and codes to extract landmarks or measurement, please create an issue or make a pull request to update the image description and corresponding table.
-
 ### Landmarks
+The landmarks used are shown and described below:
 
 ![Fish landmarks](Traits_description/Minnow_Landmarks_v1.png)
 
@@ -77,9 +94,31 @@ If you had more features in the class and codes to extract landmarks or measurem
 16 | Dorsal-most (upper) part of eye) | Top-most point of the eye mask defined by top boundary of the bbox
 17 | Ventral-most (lower) part of eye | Furthest bottom point of the eye mask defined by bottom boundary of the bbox
 18 | Center (centroid) of eye | Center of the eye mask
+
+
+### Visualize
+
+This tool saves the segmented fish image with the landmarks overlaid.
+
+Here is an [example](Test_Data/INHS_FISH_000742_image_lm.png) of a visualization of the landmarks on the segmented image.
+
+
+### Metadata
+
+This is an input file. The file should be generated by [drexel](https://github.com/hdr-bgnn/drexel_metadata/tree/Thibault/gen_metadata_mini) and named basename_metadata.json.
+This file contains information about the ruler and scale (pixels per cm) to be used by the **Measure** tool.
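+
+For instance (an illustration only, not code from this repository, and the numbers are made up), a scale given in pixels per cm converts a pixel measurement to centimeters by a simple division:
+
+```python
+# Hypothetical values for illustration; the real scale comes from basename_metadata.json.
+scale_px_per_cm = 95.5      # ruler scale reported in the metadata file (pixels per cm)
+standard_length_px = 1910   # a measurement produced by the Measure tool (pixels)
+standard_length_cm = standard_length_px / scale_px_per_cm  # = 20.0 cm
+```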
+
+The scale function within this tool outputs the presence of a ruler, the scale in pixels, and the unit (cm), if a ruler is found.
+If no ruler is found, the tool prints "none" and gives an empty ("") value for the scale and unit.
+
+Here is an [example](Test_Data/INHS_FISH_000742.json) of a metadata file.
+
-#### Measurements
+### Measure
+The output of the measure tool is a file named basename_measure.json that has measurements (in pixels) for each trait.
+
+The measurements extracted are shown and described below:
 
 ![Fish measurment](Traits_description/Minnow_Measurements_v1.png)
 
@@ -101,140 +140,173 @@ If you had more features in the class and codes to extract landmarks or measurem
 distance | eye diameter using landmarks | ED_bbox | length across the eye along the anterior-posterior (left-right) axis (distance between the left-right sides of a bounding box around the eye)
 angle | fish angle using landmarks | Fa_lm | angle of the tilt of the fish from horizontal (angle between the SL_lm and and the horizontal line of the image)
 angle | fish angle using PCA | Fa_pca | angle of the tilt of the fish from horizontal (angle between the pca through the midline of the fish mask and the horizontal line of the image)
+
-#### method of measurement extraction
+#### Method of Measurement Extraction
+
+Each method of measurement extraction is implemented in a separate Python class, which adds flexibility. These classes are defined in [Traits_class.py](Scripts/Traits_class.py).
-We created classes to add more flexibility, which can be helpful to generalize to other projects. [Trait_class](Scripts/Traits_class.py)
+Since the functions are modular, the method for extracting measurements can be specified:
-Since the functions are modular, we can specify different methods for extracting measurements.
 
 _landmarks_
 
-These measurement trait classes functions have the suffix "_lm"_ to denote the method of extraction.
-The lengths (in pixels) are calculated as the distance between two landmarks (described in the "Definition" column of the trait description csvs).
-_bbox (bounding box)_
-These trait classes functions have the suffix "_bbox"_ to denote the method of extraction.
-The lengths (in pixels) are calculated as the distance of a perpindicular line between the edges (either vertical or horizontal) of the bbox.
+These trait measurement class functions have the suffix *"_lm"* to denote the method of extraction.
+The lengths (in pixels) are calculated as the distance between two landmarks (described in the "Definition" column of the trait description csv).
+
+
+_bounding box (bbox)_
+
+
+These trait class functions have the suffix *"_bbox"* to denote the method of extraction.
+The lengths (in pixels) are calculated as the distance of a perpendicular line between the edges (either vertical or horizontal) of the bounding box (bbox).
 
-_mask_
 
-These trait classes have the suffix "_m"_ to denote the method of extraction.
 
 #### Areas
 
-Areas are calculated as the total pixels in the mask of a trait (e.g., head area is the area of the mask of the head). These are described in the "Definition" column of the Minnow_Measurements_v1.csv.
+Areas are calculated as the total pixels in the mask of a trait (e.g., head area is the area of the mask of the head). These are described in the "Definition" column of [Minnow_Measurements_v1.csv](Traits_description/Minnow_Measurements_v1.csv).
+
+_mask_
+
+
+These trait classes have the suffix *"_m"* to denote the method of extraction.
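+
+To make these conventions concrete, here is a minimal sketch (an illustration only, not the implementation in Traits_class.py; it assumes numpy and a boolean mask for the trait):
+
+```python
+# Illustrative versions of the three measurement conventions, all in pixels.
+import numpy as np
+
+def length_lm(p1, p2):
+    """_lm: straight-line distance between two landmark coordinates (row, col)."""
+    return float(np.hypot(p1[0] - p2[0], p1[1] - p2[1]))
+
+def length_bbox(mask):
+    """_bbox: horizontal extent of the bounding box around a boolean trait mask."""
+    cols = np.where(mask.any(axis=0))[0]
+    return int(cols.max() - cols.min() + 1)
+
+def area_m(mask):
+    """_m: area of a trait, counted as the total number of pixels in its mask."""
+    return int(mask.sum())
+```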
+
+
+## 4- Usage
+
+By default, `Morphology_main.py` outputs a presence table. Creating landmarks, visualizing landmarks, and measurement tables are optional outputs.
-## 4- Usage, input and output
 
+The inputs for the tools are:
-By default, without option, Morphology_main.py will output the presence and absence table. Morphology, landmarks tables and landmark image are optional output.
 
+* Segmented image: basename_segmented.png (**required**)
+* Metadata: basename_metadata.json (*optional*)
-The outputs for the functions are a series of .json files and .png file.
- + presence_matrix.json
- + measurements.json
- + landmark.json
- + landmark_image.png
 
+The outputs for the tools are:
-Usage:
 
+* Presence: basename_presence.json (**required**)
+* Landmark: basename_landmark.json (*optional*)
+* Visualize: basename_image_lm.png (*optional*)
+* Measure: basename_measure.json (*optional*)
+
+The main arguments for running `Morphology_main.py` are:
 ```
-Morphology_main.py [-h] [--metadata METADATA] [--morphology MORPHOLOGY] [--landmark LANDMARK] [--lm_image LM_IMAGE] input_image output_presence
+Morphology_main.py input_image output_presence
 ```
-Example with Test_Data:
+To add optional tools, add one or more of the following options, where the option selects the tool and its argument is the file to write (the metadata input file is passed with `--metadata`):
+
+```
+Morphology_main.py [-h] [--metadata METADATA] [--morphology MORPHOLOGY] [--landmark LANDMARK] [--lm_image LM_IMAGE] input_image output_presence
+```
+
+The options for the specific tools are:
+
+*landmark*
+```
+--landmark LANDMARK.json
+```
+
+*visualize*
+```
+--lm_image LM_IMAGE.png
+```
+
+*measure*
+```
+--morphology MORPHOLOGY.json
+```
+
+
+Here is an example using [test data](Test_Data):
+```sh
 cd Morphology-analysis/
 ./Script/Morphology_main.py --metadata Test_Data/INHS_FISH_000742.json --morphology Test_Data/INHS_FISH_000742_measure.json --landmark Test_Data/INHS_FISH_000742_landmark.json --lm_image Test_Data/INHS_FISH_000742_image_lm.png Test_Data/INHS_FISH_000742_segmented.png Test_Data/INHS_FISH_000742_presence.json
 ```
- + metadata.json : **Optional**. Path to input file metadata.json. The file should be generated by [drexel](https://github.com/hdr-bgnn/drexel_metadata/tree/Thibault/gen_metadata_mini) [example here] and formatted by [drexel_metadata_formatter](https://github.com/hdr-bgnn/drexel_metadata_formatter). [example](Test_Data/INHS_FISH_000742.json).
- + morphology.json : **Optional**. Save the morphology dictionnary with the filename provided. Expected shape, dictionnary, [example](Test_Data/INHS_FISH_000742_measure.json)
- + landmark.json : **Optional**. Save the landmark dictionnary with the filename provided. dictionnary, key = landmark label, value = calculated value [example](Test_Data/INHS_FISH_000742_landmark.json)
- + image_lm.png : **Optional**. Save the visualization image for landmarks with the filename provided. [example](Test_Data/INHS_FISH_000742_image_lm.png)
- + input_file.png : **Positional require**. Segmented fish image generated by [Maruf code](https://github.com/hdr-bgnn/BGNN-trait-segmentation/tree/main/Segment_mini), [example](Test_Data/INHS_FISH_000742_segmented.png)
- + ouput_presence.json : **Positional require**. Save the presence-absence dictionnary with the filename provided [example](Test_Data/INHS_FISH_000742_presence.json)
-
-## 5- Container, usage and release
-
-We use github action to create a container what run the main script [Morphology_main.py](Scripts/Morphology_main.py).
- 1. The workflow to build the container is defined [here](.github/workflows/Deploy_Morpholgy.yml).
- 2. The Dockerfile definition is [here](Dockerfile)
- 3. Pull command :
- ```
- docker pull ghcr.io/hdr-bgnn/morphology-analysis/morphology:latest
- #or
- singularity pull docker://ghcr.io/hdr-bgnn/morphology-analysis/morphology:latest
- ```
- 4. To access the usage. (equivalent to : Morphology_main.py -h) : "
- ```
- singularity run morphology_latest.sif
- ```
- 5. Usage :
- ```
- singularity exec morphology_latest.sif Morphology_main.py --metadata --morphology --landmark --lm_image
- # test Example
- singularity exec morphology_latest.sif Morphology_main.py --metadata Test_Data/INHS_FISH_000742.json --morphology Test_Data/INHS_FISH_000742_measure.json --landmark Test_Data/INHS_FISH_000742_landmark.json --lm_image Test_Data/INHS_FISH_000742_image_lm.png Test_Data/INHS_FISH_000742_segmented.png Test_Data/INHS_FISH_000742_presence.json
- ```
-
-## 6- Notebook to play
-**Work in Progress (open to improvement and development)**
+If no arguments are given, an error message will indicate that two positional arguments are missing: the input file and the output file. Use "-h" to display the help text with the full list of arguments.
-In development, you can check [this notebook](https://github.com/hdr-bgnn/Morphology-analysis/blob/main/Scripts/Morphology_dev.ipynb)
-You will need to use [Morphology_env.yml](https://github.com/hdr-bgnn/Morphology-analysis/blob/main/Scripts/morphology_env.yml) to set up your environment before working (required dependencies). I recommend conda, miniconda as environment manager.
-To set up your virtual environment in the OSC:
+
+## 5- Containerization & Versioning
-#go to OSC home directory
-#open a cluster
+Upon publishing a new release, a Docker container image is automatically built from the release and published to the GitHub Container Registry. The published image is tagged with major, major.minor, and major.minor.patch versions corresponding to the release.
-#clone the repository onto your home directory
-```git clone ```
+The container build is defined as a GitHub Actions workflow [here](.github/workflows/Deploy_Morpholgy.yml).
-#navigate to scripts
-```cd Morphology-analysis/Scripts```
+The Dockerfile definition is [here](Dockerfile).
-#use conda
+Pull the latest image (in an HPC environment, docker is typically not supported but singularity is):
 ```
-module load miniconda3
-conda info -e #see what environments you have; you should be on "base"
-conda env create -f morphology_env.yml -n morphology_env
+docker pull ghcr.io/hdr-bgnn/morphology-analysis/morphology:latest
+#singularity pull docker://ghcr.io/hdr-bgnn/morphology-analysis/morphology:latest
 ```
--f means files to select (which is morphology_env.yml)
--n means to name the virtual environment, which here is "morphology_env"
-
-#check that environment was made
-```conda info -e```
-#now you have a virtual environment!
-#to activate it:
-```
-source acitvate morphology_env
-#check that you're on the virtual environment
-conda info -e #you should be on "morphology_env"
-```
+Run the container (assuming an HPC environment that supports singularity but not docker):
+```
+singularity exec morphology_latest.sif Morphology_main.py --metadata --morphology --landmark --lm_image
+
+# Example
+singularity exec morphology_latest.sif Morphology_main.py --metadata Test_Data/INHS_FISH_000742.json --morphology Test_Data/INHS_FISH_000742_measure.json --landmark Test_Data/INHS_FISH_000742_landmark.json --lm_image Test_Data/INHS_FISH_000742_image_lm.png Test_Data/INHS_FISH_000742_segmented.png Test_Data/INHS_FISH_000742_presence.json
+```
-Once the environment is set up, you do not need to recreate it.
-Launch the jupyter notebook app and set your kernel to "Python Morphology_jupyter".
+
+## 6- Use locally
+
+A [jupyter notebook](Scripts/Morphology_dev.ipynb) is provided that allows you to generate and visualize morphological traits for some sample data.
+
+
+### Set up
+
+*Requirements*
+
+- Install [Jupyter](https://jupyter.org/install)
+- Other requirements are defined in [Scripts/morphology_env.yml](Scripts/morphology_env.yml)
+
+The [conda](https://docs.conda.io/en/latest/) command line tool can be used to install the requirements. If the conda command line tool is not already installed, one popular Python distribution that provides conda is [miniconda](https://docs.conda.io/en/latest/miniconda.html).
+
+*Clone Repository*
+
+Before starting, please clone this repository and check that you are in the base directory.
+
+
+### Local Usage
+
+Both jupyter and the [Scripts/morphology_env.yml](Scripts/morphology_env.yml) requirements can be installed into a single environment for simplicity.
+
+To create an environment named `morphology` with these requirements, run the following commands:
+
 ```
-#activate the virtual environment kernel for jupyter notebook
-pip install ipykernel
-python -m ipykernel install --user --name morphlogy_env --display-name "Python (Morphology_jupyter)"
+conda env create -f Scripts/morphology_env.yml
+conda activate morphology
+pip install jupyterlab
 ```
-Once you set up the kernel for jupyter notebook, you do not need to do it again.
-**Launch Jupyter notebook Morphology_dev.ipynb**
- + Use OSC dashboard [onthedemand](https://ondemand.osc.edu/pun/sys/dashboard)
- + Tab Interactive Apps
 - + Select Jupyter notebook
- + Choose the configuration you want (start with cores:1 Number_hours:1, Node_type:any)
 - + Launch
- + Navigate to ~/Morphology-analysis/Scripts/Morphology_dev.ipynb
- + Change kernel to Morphology_jupyter
-## 7- Development tricks
+Then, to start jupyter, run the following command:
-If you want to test new version of Morphology_main.py (upudated version on your local computer). You can use the container by bind the local folder (here it is in Scripts/) containing the updated version of Morphology_main.py and /pipeline/Morphology is where Morphology_main.py is expected to be in the container.
-Example:
 ```
-singularity exec --bind Scripts/:/pipeline/Morphology morpho.sif Morphology_main.py --metadata Test_Data/INHS_FISH_000742.json --morphology Test_Data/INHS_FISH_000742_measure.json --landmark Test_Data/INHS_FISH_000742_landmark.json --lm_image Test_Data/INHS_FISH_000742_image_lm.png Test_Data/INHS_FISH_000742_segmented.png Test_Data/INHS_FISH_000742_presence.json
+jupyter-lab
 ```
+
+*If you see a jupyter-lab command not found error, you may need to first run `conda activate morphology`.*
+
+Once jupyter opens in your web browser, navigate to the Scripts directory and double-click Morphology_dev.ipynb. If prompted, select the default Python environment.
+
+
+### Cluster Usage
+
+Running this notebook on a cluster requires creating a conda environment from [Scripts/morphology_env.yml](Scripts/morphology_env.yml) and setting up a jupyter [kernel](https://docs.jupyter.org/en/latest/projects/kernels.html#kernels). Clusters typically provide their own version of jupyter, so that software will not need to be installed, but a kernel must be set up so the cluster-provided jupyter software can find your conda environment.
+
+If your cluster provides conda via a miniconda3 module, the following commands will create an environment named `morphology` with the required dependencies:
+```
+module load miniconda3
+conda env create -f Scripts/morphology_env.yml
+```
+
+Next, you will need to configure the cluster-provided jupyter software to use the `morphology` conda environment by setting up a kernel.
+Consult your specific cluster's documentation for instructions on setting up the kernel. For more about setting up kernels, see the [ipython documentation](https://ipython.readthedocs.io/en/stable/install/kernel_install.html#kernels-for-different-environments).
+
+Once the kernel is set up, launch jupyter notebook, navigate to the Scripts directory, and double-click Morphology_dev.ipynb. If prompted, select the kernel associated with the `morphology` conda environment.
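+
+If your cluster documentation does not cover kernel setup, one common approach (shown only as an example; the environment name and display name are placeholders to adjust for your setup) is to register the kernel with ipykernel from inside the activated environment:
+
+```sh
+# run once, after the morphology environment has been created
+conda activate morphology
+pip install ipykernel
+python -m ipykernel install --user --name morphology --display-name "Python (morphology)"
+```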