luxonis · ptoupas · Nov 11, 2024 · Nov 7, 2024 · Nov 7, 2024 · Nov 7, 2024
@@ -21,13 +21,18 @@ Convert your **ONNX** models to a format compatible with any generation of Luxon
 
 ## Table of Contents
 
-- [MLOps - Compilation Library](#mlops---compilation-library)
+- [ModelConverter - Compilation Library](#modelconverter---compilation-library)
+  - [Status](#status)
   - [Table of Contents](#table-of-contents)
   - [Installation](#installation)
+    - [System Requirements](#system-requirements)
     - [Before You Begin](#before-you-begin)
     - [Instructions](#instructions)
     - [GPU Support](#gpu-support)
   - [Running ModelConverter](#running-modelconverter)
+    - [Encoding Configuration Flags](#encoding-configuration-flags)
+      - [YAML Configuration File](#yaml-configuration-file)
+      - [NN Archive Configuration File](#nn-archive-configuration-file)
     - [Sharing Files](#sharing-files)
     - [Usage](#usage)
     - [Examples](#examples)
@@ -101,9 +106,58 @@ To enable GPU acceleration for `hailo` conversion, install the [Nvidia Container
 
 ## Running ModelConverter
 
-Configuration for the conversion predominantly relies on a `yaml` config file. For reference, see [defaults.yaml](shared_with_container/configs/defaults.yaml) and other examples located in the [shared_with_container/configs](shared_with_container/configs) directory.
+There are two main ways to execute configure the conversion process:
 
-However, you have the flexibility to modify specific settings without altering the config file itself. This is done using command line arguments. You provide the arguments in the form of `key value` pairs. For better understanding, see [Examples](#examples).
+1. **YAML Config File (Primary Method)**:
+   The primary way to configure the conversion is through a YAML configuration file. For reference, you can check [defaults.yaml](shared_with_container/configs/defaults.yaml) and other examples located in the [shared_with_container/configs](shared_with_container/configs) directory.
+1. **NN Archive**:
+   Alternatively, you can use an [NN Archive](https://rvc4.docs.luxonis.com/software/ai-inference/nn-archive/#NN%20Archive) as input. An NN Archive includes a model in one of the supported formats—ONNX (.onnx), OpenVINO IR (.xml and .bin), or TensorFlow Lite (.tflite)—alongside a `config.json` file. The config.json file follows a specific configuration format as described in the [NN Archive Configuration Guide](https://rvc4.docs.luxonis.com/software/ai-inference/nn-archive/#NN%20Archive-Configuration).
+
+**Modifying Settings with Command-Line Arguments**:
+In addition to these two configuration methods, you have the flexibility to override specific settings directly via command-line arguments. By supplying `key-value` pairs in the CLI, you can adjust particular settings without explicitly altering the config files (YAML or NN Archive). For further details, refer to the [Examples](#examples) section.
+
+### Encoding Configuration Flags
+
+In the conversion process, you have options to control the color encoding format in both the YAML configuration file and the NN Archive configuration. Here’s a breakdown of each available flag:
+
+#### YAML Configuration File
+
+The `encoding` flag in the YAML configuration file allows you to specify color encoding as follows:
+
+- **Single-Value `encoding`**:
+  Setting encoding to a single value, such as *"RGB"*, *"BGR"*, *"GRAY"*, or *"NONE"*, will automatically apply this setting to both `encoding.from` and `encoding.to`. For example, `encoding: RGB` sets both `encoding.from` and `encoding.to` to *"RGB"* internally.
+- **Multi-Value `encoding.from` and `encoding.to`**:
+  Alternatively, you can explicitly set `encoding.from` and `encoding.to` to different values. For example:
+  ```yaml
+  encoding:
+    from: RGB
+    to: BGR
+  ```
+  This configuration specifies that the input data is in RGB format and will be converted to BGR format during processing.
+
+> [!NOTE]
+> If the encoding is not specified in the YAML configuration, the default values are set to `encoding.from=RGB` and `encoding.to=BGR`.
+
+> [!NOTE]
+> Certain options can be set **globally**, applying to all inputs of the model, or **per input**. If specified per input, these settings will override the global configuration for that input alone. The options that support this flexibility include `scale_values`, `mean_values`, `encoding`, `data_type`, `shape`, and `layout`.
+
+#### NN Archive Configuration File
+
+In the NN Archive configuration, there are two flags related to color encoding control:
+
+- **`dai_type`**:
+  Provides a more comprehensive control over the input type compatible with the DAI backend. It is read by DepthAI to automatically configure the processing pipeline, including any necessary modifications to the input image format.
+- **`reverse_channels` (Deprecated)**:
+  Determines the input color format of the model: when set to *True*, the input is considered to be *"RGB"*, and when set to *False*, it is treated as *"BGR"*. This flag is deprecated and will be replaced by the `dai_type` flag in future versions.
+
+> [!NOTE]
+> If neither `dai_type` nor `reverse_channels` the input to the model is considered to be *"RGB"*.
+
+> [!NOTE]
+> If both `dai_type` and `reverse_channels` are provided, the converter will give priority to `dai_type`.
+
+> [!IMPORTANT]
+> Provide mean/scale values in the original color format used during model training (e.g., RGB or BGR). Any necessary channel permutation is handled internally—do not reorder values manually.
 
 ### Sharing Files
 
@@ -158,6 +212,7 @@ You can run the built image either manually using the `docker run` command or us
 1. Execute the conversion:
 
 - If using the `docker run` command:
+
   ```bash
   docker run --rm -it \
     -v $(pwd)/shared_with_container:/app/shared_with_container/ \
@@ -168,11 +223,15 @@ You can run the built image either manually using the `docker run` command or us
     convert <target> \
     --path <s3_url_or_path> [ config overrides ]
   ```
+
 - If using the `modelconverter` CLI:
+
   ```bash
   modelconverter convert <target> --path <s3_url_or_path> [ config overrides ]
   ```
+
 - If using `docker-compose`:
+
   ```bash
   docker compose run <target> convert <target> ...
   ```
@@ -200,13 +259,17 @@ Specify all options via the command line without a config file:
 ```bash
 modelconverter convert rvc2 input_model models/yolov6n.onnx \
                         scale_values "[255,255,255]" \
-                        reverse_input_channels True \
+                        inputs.0.encoding.from RGB \
+                        inputs.0.encoding.to BGR \
                         shape "[1,3,256,256]" \
                         outputs.0.name out_0 \
                         outputs.1.name out_1 \
                         outputs.2.name out_2
 ```
 
+> [!WARNING]
+> If you modify the default stages names (`stages.stage_name`) in the configuration file (`config.yaml`), you need to provide the full path to each stage in the command-line arguments. For instance, if a stage name is changed to `stage1`, use `stages.stage1.inputs.0.name` instead of `inputs.0.name`.
+
 ## Multi-Stage Conversion
 
 The converter supports multi-stage conversion. This means conversion of multiple

@@ -37,7 +37,7 @@
     MODELS_DIR,
     OUTPUTS_DIR,
 )
-from modelconverter.utils.types import Target
+from modelconverter.utils.types import DataType, Encoding, Target
 
 logger = logging.getLogger(__name__)
 
@@ -114,7 +114,7 @@ class Format(str, Enum):
 DevOption: TypeAlias = Annotated[
     bool,
     typer.Option(
-        help="Builds a new iamge and uses the development docker-compose file."
+        help="Builds a new image and uses the development docker-compose file."
     ),
 ]
 
@@ -212,18 +212,31 @@ def extract_preprocessing(
     for inp in stage_cfg.inputs:
         mean = inp.mean_values or [0, 0, 0]
         scale = inp.scale_values or [1, 1, 1]
+        encoding = inp.encoding
+        layout = inp.layout
+
+        dai_type = encoding.to.value
+        if dai_type != "NONE":
+            if inp.data_type == DataType.FLOAT16:
+                type = "F16F16F16"
+            else:
+                type = "888"
+            dai_type += type
+            dai_type += "i" if layout == "NHWC" else "p"
 
         preproc_block = PreprocessingBlock(
-            reverse_channels=inp.reverse_input_channels,
             mean=mean,
             scale=scale,
-            interleaved_to_planar=False,
+            reverse_channels=encoding.to == Encoding.RGB,
+            interleaved_to_planar=layout == "NHWC",
+            dai_type=dai_type,
         )
         preprocessing[inp.name] = preproc_block
 
         inp.mean_values = None
         inp.scale_values = None
-        inp.reverse_input_channels = False
+        inp.encoding.from_ = Encoding.NONE
+        inp.encoding.to = Encoding.NONE
 
     return cfg, preprocessing
 
@@ -477,9 +490,13 @@ def convert(
                     archive_cfg,
                     preprocessing,
                     main_stage,
-                    exporter.inference_model_path
-                    if isinstance(exporter, Exporter)
-                    else exporter.exporters[main_stage].inference_model_path,
+                    (
+                        exporter.inference_model_path
+                        if isinstance(exporter, Exporter)
+                        else exporter.exporters[
+                            main_stage
+                        ].inference_model_path
+                    ),
                 )
                 generator = ArchiveGenerator(
                     archive_name=f"{cfg.name}.{target.value.lower()}",

@@ -237,7 +237,7 @@ def _get_alls(self, runner: ClientRunner) -> str:
                 f"{mean_values},{scale_values},{hn_name})"
             )
 
-            if inp.reverse_input_channels:
+            if inp.encoding_mismatch:
                 alls.append(
                     f"bgr_to_rgb_{safe_name} = input_conversion("
                     f"{hn_name},bgr_to_rgb)"

@@ -109,25 +109,43 @@ def _export_openvino_ir(self) -> Path:
                 reverse_only=True,
             )
             for inp in self.inputs.values():
-                if inp.mean_values is not None and inp.reverse_input_channels:
+                if inp.mean_values is not None and inp.encoding_mismatch:
                     inp.mean_values = inp.mean_values[::-1]
-                if inp.scale_values is not None and inp.reverse_input_channels:
+                if inp.scale_values is not None and inp.encoding_mismatch:
                     inp.scale_values = inp.scale_values[::-1]
-                inp.reverse_input_channels = False
+                inp.encoding.from_ = Encoding.BGR
+                inp.encoding.to = Encoding.BGR
 
+        mean_values_str = ""
+        scale_values_str = ""
         for name, inp in self.inputs.items():
+            # Append mean values in a similar style
             if inp.mean_values is not None:
-                self._add_args(
-                    args,
-                    ["--mean_values", f"{name}{_lst_join(inp.mean_values)}"],
+                if mean_values_str:
+                    mean_values_str += ","
+                mean_values_str += (
+                    f"{name}[{', '.join(str(v) for v in inp.mean_values)}]"
                 )
+
+            # Append scale values in a similar style
             if inp.scale_values is not None:
-                self._add_args(
-                    args,
-                    ["--scale_values", f"{name}{_lst_join(inp.scale_values)}"],
+                if scale_values_str:
+                    scale_values_str += ","
+                scale_values_str += (
+                    f"{name}[{', '.join(str(v) for v in inp.scale_values)}]"
                 )
-            if inp.reverse_input_channels:
-                self._add_args(args, ["--reverse_input_channels"])
+        # Extend args with mean and scale values if they were collected
+        if mean_values_str:
+            args.extend(["--mean_values", mean_values_str])
+        if scale_values_str:
+            args.extend(["--scale_values", scale_values_str])
+
+        # Append reverse_input_channels flag only once if needed
+        reverse_input_flag = any(
+            inp.encoding_mismatch for inp in self.inputs.values()
+        )
+        if reverse_input_flag:
+            args.append("--reverse_input_channels")
 
         self._add_args(args, ["--input_model", self.input_model])
 
@@ -137,7 +155,7 @@ def _export_openvino_ir(self) -> Path:
         return self.input_model.with_suffix(".xml")
 
     def _check_reverse_channels(self):
-        reverses = [inp.reverse_input_channels for inp in self.inputs.values()]
+        reverses = [inp.encoding_mismatch for inp in self.inputs.values()]
         return all(reverses) or not any(reverses)
 
     @staticmethod