Last pre-release checks
lukassnoek committed Jan 10, 2023
1 parent 9edc6a7 commit 7249e6d
Showing 15 changed files with 184 additions and 174 deletions.
38 changes: 29 additions & 9 deletions docs/getting_started/installation.md
@@ -1,9 +1,9 @@
# Medusa installation

- Medusa is a Python package which works with Python versions 3.9 and above. We recommend
- using Python version 3.9. Moreover, we strongly recommend to install the `medusa` package
+ Medusa is a Python package which works with Python version 3.9 and on Linux, Windows,
+ and Mac (except Mac M1/M2). Moreover, we strongly recommend installing the `medusa` package
in a separate environment, using for example [conda](https://anaconda.org/anaconda/conda).
- If you'd use *conda*, you can create a new environment named "medusa" with python 3.9
+ If you use *conda*, you can create a new environment named "medusa" with Python 3.9
as follows:

```console
@@ -18,8 +18,13 @@ conda activate medusa

The next step is to install Medusa. Medusa actually offers two versions of the package:
`medusa` and `medusa-gpu`, where the latter can be used instead of the former if you
- have access to an NVIDIA GPU (and CUDA version 11.6). When you're not sure whether
- you have access to an appropriate GPU, install the regular `medusa` package.
+ have access to an NVIDIA GPU (and CUDA version 11.6). Actually, `medusa-gpu` can also
+ be installed and used on systems without a GPU, but the installation is noticeably
+ larger (~2GB, instead of 300MB for the CPU version). When you're not sure whether
+ you have access to an appropriate GPU, we recommend installing the regular `medusa` package.
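If in doubt, one generic way to check for a usable NVIDIA GPU (standard NVIDIA tooling, not part of Medusa) is to run `nvidia-smi` in a terminal:

```console
nvidia-smi  # lists any NVIDIA GPU(s) plus the installed driver/CUDA version
```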

To install Medusa, run one of the commands listed below in your terminal (with the right
environment activated):

`````{tab-set}
@@ -37,18 +42,33 @@ pip install https://github.com/medusa-4D/medusa/releases/download/v0.0.3/medusa_
`````

- At this point, `medusa` can be used, but only the Mediapipe reconstruction model can be
- used. To be able to use the FLAME-based reconstruction models such as DECA, EMOCA, and
+ ```{note}
+ While installing Python packages/wheels from locations other than PyPI is generally
+ discouraged, Medusa actually hosts its builds in its own GitHub repository (as you can
+ see in the install commands above). The reason for doing so (instead of on PyPI) is that
+ Medusa depends on a specific version of [PyTorch](https://pytorch.org/), which itself
+ is not available on PyPI (only as a wheel). Listing non-PyPI dependencies in packages
+ is not permitted by PyPI, which is why Medusa wheels are hosted on GitHub.
+ If you want to build Medusa yourself, you can clone the repository and run the
+ `build_wheels` script, which will create a directory `dist` with two wheel files
+ (one for `medusa` and one for `medusa-gpu`).
+ ```
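For those who want to build the wheels themselves, the steps would look roughly like this (a sketch only: the repository URL is taken from the release links above, and the exact way to invoke the `build_wheels` script may differ):

```console
git clone https://github.com/medusa-4D/medusa.git
cd medusa
bash build_wheels  # writes the medusa and medusa-gpu wheels to ./dist
```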

+ At this point, `medusa` can be used, but only the Mediapipe reconstruction model will be
+ available. To be able to use the FLAME-based reconstruction models such as DECA, EMOCA, and
Spectre, you need to download some additional data. Importantly, before you do, you need
to [register](https://flame.is.tue.mpg.de/register.php) on the [FLAME website](https://flame.is.tue.mpg.de/index.html)
and accept their [license terms](https://flame.is.tue.mpg.de/modellicense.html).

After creating an account, you can download all external data with the
`medusa_download_ext_data` command. To download all data to a new directory
- (medusa_ext_data), you'd run:
+ (default location: `~/.medusa_ext_data`), you'd run:

```console
medusa_download_ext_data --directory medusa_ext_data --username your_flame_username --password your_flame_passwd
```

- After all data has been downloaded (~1.8GB), all Medusa functionality should be available!
+ where `your_flame_username` and `your_flame_passwd` are the username and password associated
+ with the account you created on the FLAME website. After all data has been downloaded
+ (~1.8GB), all Medusa functionality should be available!
29 changes: 14 additions & 15 deletions docs/index.md
@@ -6,25 +6,24 @@
![coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/lukassnoek/cb6da52c965ec24f136b74a1ebad1964/raw/medusa_interrogate_badge.json)
![Python](https://img.shields.io/badge/python-3.9-blue.svg)

- Medusa is a Python toolbox to perform 4D face reconstruction and analysis. You can use it
- to reconstruct a series of 3D meshes of (moving) faces from video files: one 3D mesh for
- each frame of the video (resulting in a "4D" representation of facial movement). In
- addition to functionality to reconstruct faces, Medusa also contains functionality to
- preprocess and visualize the resulting 4D reconstructions.
+ Medusa is a Python toolbox for face image and video analysis. It offers tools for face
+ detection, alignment, rendering, and most importantly, *4D reconstruction*.
+ Using state-of-the-art 3D reconstruction models, Medusa can track and reconstruct faces
+ in videos (one 3D mesh per face, per frame) and thus provide a way to automatically
+ measure and quantify face movement as 4D signals.

- More specifically, Medusa allows you to reconstruct, preprocess, and analyze
- frame-by-frame time series of 3D faces from videos. The data that Medusa outputs is
- basically a set of 3D points ("vertices"), which together represent face shape.
- Medusa then processes these points in a similar way that fMRI or EEG/MEG software
- processes voxels or sensors, but instead of representing "brain activity", it represents
- face movement! Medusa makes relatively few assumptions as to how you want
- to (further) analyze the face and just returns the raw set of vertices. For some ideas on
+ In Medusa, 4D reconstruction data is represented as a series of 3D meshes. Each mesh
+ describes the face shape at a particular frame in the video, and the changes in the
+ meshes over time thus describe facial *movement* (including expression) quantitatively
+ and dynamically. Medusa makes relatively few assumptions as to how you want to (further)
+ analyze the face and just returns the raw set of vertices. For some ideas on
how to analyze such data, check out the [analysis tutorials](tutorials/analysis) (WIP).
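To make the idea concrete, here is a purely illustrative sketch of what such data could look like (the array shapes and names are assumptions for illustration, not Medusa's actual output format):

```python
import numpy as np

# One 3D mesh per frame: an (n_frames, n_vertices, 3) array of vertex coordinates
n_frames, n_vertices = 100, 468  # e.g. ~100 video frames, a Mediapipe-sized mesh
verts = np.random.rand(n_frames, n_vertices, 3)  # stand-in for reconstructed vertices

# Frame-to-frame displacement of every vertex gives a simple per-vertex movement signal
disp = np.linalg.norm(np.diff(verts, axis=0), axis=-1)  # shape: (n_frames - 1, n_vertices)
movement_per_frame = disp.mean(axis=1)  # average movement across the face, per frame transition
```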

## Documentation overview

- On this website, you can find general information about Medusa (such as how to [install](getting_started/installation)
- and [cite](getting_started/citation) it), as well as several tutorials
- and details on Medusa's [command-line interface](api/cli) and [Python interface](api/python).
+ On this website, you can find general information about Medusa (such as how to
+ [install](getting_started/installation) and [cite](getting_started/citation) it), as
+ well as several tutorials and details on Medusa's [command-line interface](api/cli) and
+ [Python interface](api/python).

A great way to get more familiar with the package is to check out the [quickstart](getting_started/quickstart)!
16 changes: 8 additions & 8 deletions medusa/containers/results.py
@@ -308,23 +308,23 @@ def visualize(
# BELOW: OLD CODE TO CREATE BOUNDING BOX FROM CROPPED IMAGES
# bbox_crop = torch.tensor([[0, 0], [0, h-1], [h-1, w-1], [0, w-1]], dtype=torch.float32, device=self.device)
# bbox_crop = bbox_crop.repeat(b, 1, 1)
- # crop_mats = torch.inverse(self.crop_mats[idx])
- # bbox = transform_points(crop_mats, bbox_crop)
+ # crop_mat = torch.inverse(self.crop_mat[idx])
+ # bbox = transform_points(crop_mat, bbox_crop)

# Check for landmarks (`lms`), which we'll draw if available
if hasattr(self, "lms"):
lms = self.lms[det_idx]

if show_cropped:
# Need to crop the original images!
- crop_mats = self.crop_mats[det_idx]
+ crop_mat = self.crop_mat[det_idx]
img = warp_affine(
- img.unsqueeze(0).float(), crop_mats[:, :2, :], crop_size
+ img.unsqueeze(0).float(), crop_mat[:, :2, :], crop_size
)
img = img.to(torch.uint8).squeeze(0)

# And warp the landmarks to the cropped image space
- lms = transform_points(crop_mats, lms)
+ lms = transform_points(crop_mat, lms)

# TODO: scale radius
img = draw_keypoints(img, lms, colors=(0, 255, 0), radius=2)
@@ -334,11 +334,11 @@
if show_cropped:
template_ = template.unsqueeze(0)
else:
- crop_mats = torch.inverse(self.crop_mats[det_idx])
+ crop_mat = torch.inverse(self.crop_mat[det_idx])
template_ = template.repeat(lms.shape[0], 1, 1).to(
- crop_mats.device
+ crop_mat.device
)
- template_ = transform_points(crop_mats, template_)
+ template_ = transform_points(crop_mat, template_)

img = draw_keypoints(img, template_, colors=(0, 0, 255), radius=1.5)
img = img.to(self.device)
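The `crop_mat` tensors used above are 3x3 similarity transforms that map original-image coordinates into cropped-image space; inverting them maps points back. A minimal stand-alone sketch of that idea (plain PyTorch, not Medusa's `transform_points` helper):

```python
import torch

# A single 3x3 similarity transform (scale 0.5 plus translation), batched as (B, 3, 3)
crop_mat = torch.tensor([[[0.5, 0.0, -20.0],
                          [0.0, 0.5, -30.0],
                          [0.0, 0.0,   1.0]]])
lms = torch.rand(1, 68, 2) * 224  # (B, N, 2) landmarks in original-image coordinates

ones = torch.ones(1, 68, 1)
lms_crop = (torch.cat([lms, ones], dim=-1) @ crop_mat.transpose(1, 2))[..., :2]  # to crop space
lms_back = (torch.cat([lms_crop, ones], dim=-1)
            @ torch.inverse(crop_mat).transpose(1, 2))[..., :2]  # back to original image space
assert torch.allclose(lms, lms_back, atol=1e-4)
```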
9 changes: 5 additions & 4 deletions medusa/crop/__init__.py
@@ -1,7 +1,8 @@
"""Top-level module with two main crop models:
- * ``LandmarkAlignCropModel``
- * ``LandmarkBboxCropModel``
+ * ``AlignCropModel``
+ * ``BboxCropModel``
"""

- from .align_crop import LandmarkAlignCropModel
- from .bbox_crop import LandmarkBboxCropModel
+ from .align_crop import AlignCropModel
+ from .bbox_crop import BboxCropModel
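Downstream code that imported the old class names needs the corresponding one-line update, e.g.:

```python
# Before this commit:
# from medusa.crop import LandmarkAlignCropModel, LandmarkBboxCropModel
# After:
from medusa.crop import AlignCropModel, BboxCropModel
```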
14 changes: 7 additions & 7 deletions medusa/crop/align_crop.py
@@ -29,7 +29,7 @@
The coordinates are relative to an image of size 112 x 112."""


- class LandmarkAlignCropModel(BaseCropModel):
+ class AlignCropModel(BaseCropModel):
"""Cropping model based on functionality from the ``insightface`` package,
as used by MICA (https://github.com/Zielon/MICA).
@@ -52,7 +52,7 @@ class LandmarkAlignCropModel(BaseCropModel):
To crop an image to be used for MICA reconstruction:
>>> from medusa.data import get_example_frame
- >>> crop_model = LandmarkAlignCropModel()
+ >>> crop_model = AlignCropModel()
>>> img = get_example_frame() # path to jpg image
>>> out = crop_model(img)
"""
@@ -90,7 +90,7 @@ def __call__(self, imgs):
-------
out_crop : dict
Dictionary with cropping outputs; includes the keys "imgs_crop" (cropped
images) and "crop_mats" (3x3 crop matrices)
images) and "crop_mat" (3x3 crop matrices)
"""
# Load images here instead of in detector to avoid loading them twice
imgs = load_inputs(
@@ -100,17 +100,17 @@
out_det = self._det_model(imgs)

if out_det.get("conf", None) is None:
return {"imgs_crop": None, "crop_mats": None, **out_det}
return {"imgs_crop": None, "crop_mat": None, **out_det}

# Estimate transform landmarks -> template landmarks
- crop_mats = estimate_similarity_transform(
+ crop_mat = estimate_similarity_transform(
out_det["lms"], self.template, estimate_scale=True
)
imgs_stacked = imgs[out_det["img_idx"]]
imgs_crop = warp_affine(
- imgs_stacked, crop_mats[:, :2, :], dsize=self.output_size
+ imgs_stacked, crop_mat[:, :2, :], dsize=self.output_size
)

out_crop = {"imgs_crop": imgs_crop, "crop_mats": crop_mats, **out_det}
out_crop = {"imgs_crop": imgs_crop, "crop_mat": crop_mat, **out_det}

return out_crop
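Putting the docstring example and the documented output keys together, typical usage would look roughly like this (a sketch; the example-data helper and the output keys are taken from the docstrings above, the rest is assumed):

```python
from medusa.crop import AlignCropModel
from medusa.data import get_example_frame

crop_model = AlignCropModel()
img = get_example_frame()   # path to an example jpg image
out = crop_model(img)       # dict with "imgs_crop", "crop_mat", plus the detector outputs
imgs_crop, crop_mat = out["imgs_crop"], out["crop_mat"]
```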
12 changes: 6 additions & 6 deletions medusa/crop/bbox_crop.py
@@ -18,7 +18,7 @@
from .base import BaseCropModel


- class LandmarkBboxCropModel(BaseCropModel):
+ class BboxCropModel(BaseCropModel):
"""A model that crops an image by creating a bounding box based on a set of
face landmarks.
@@ -107,7 +107,7 @@ def __call__(self, imgs):
-------
out_crop : dict
Dictionary with cropping outputs; includes the keys "imgs_crop" (cropped
images) and "crop_mats" (3x3 crop matrices)
images) and "crop_mat" (3x3 crop matrices)
"""
# Load images here instead of in detector to avoid loading them twice

@@ -118,7 +118,7 @@
out_det = self._detector(imgs)

if out_det.get("conf", None) is None:
- return {**out_det, "imgs_crop": None, "crop_mats": None}
+ return {**out_det, "imgs_crop": None, "crop_mat": None}

n_det = out_det["lms"].shape[0]
bbox = out_det["bbox"]
@@ -173,17 +173,17 @@
device=self.device,
)
dst = dst.repeat(n_det, 1, 1)
- crop_mats = estimate_similarity_transform(
+ crop_mat = estimate_similarity_transform(
bbox[:, :3, :], dst, estimate_scale=True
)

# Finally, warp the original images (uncropped) images to the final
# cropped space
- imgs_crop = warp_affine(imgs_stack, crop_mats[:, :2, :], dsize=(h_out, w_out))
+ imgs_crop = warp_affine(imgs_stack, crop_mat[:, :2, :], dsize=(h_out, w_out))
out_crop = {
**out_det,
"imgs_crop": imgs_crop,
"crop_mats": crop_mats,
"crop_mat": crop_mat,
"lms": lms,
}
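Note that both crop models pass only the first two rows of the 3x3 matrix to `warp_affine`, which (in kornia-style APIs, at least) expects a 2x3 affine matrix; the dropped third row of a similarity transform is always `[0, 0, 1]`, so no information is lost. As a tiny sketch:

```python
import torch

crop_mat = torch.eye(3).unsqueeze(0)  # (B, 3, 3) similarity transform (identity as a stand-in)
affine = crop_mat[:, :2, :]           # (B, 2, 3) slice actually handed to warp_affine
print(affine.shape)                   # torch.Size([1, 2, 3])
```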

