
Difference in Output When Using PIL.Image vs numpy.array for Image and Mask Input. #10755

purple-k opened this issue Feb 10, 2025 · 1 comment


purple-k commented Feb 10, 2025

Hi,
I get different results when providing the image and mask as input using PIL.Image versus numpy.array. Why does this happen?
Is there an issue with my normalization method?

[Attached screenshots: the output image produced with the PIL inputs ("pillow") vs. the numpy array inputs ("array")]

pillow code

image = Image.open(image_path).convert("RGB")
mask = Image.open(mask_path).convert("L")

output_image = pipeline(
    image=image,
    mask_image=mask,
    generator=torch.Generator(device=self.device).manual_seed(0),
).images[0]

array code

image = Image.open(image_path).convert("RGB")
mask = Image.open(mask_path).convert("L")
image_array = np.array(image) / 255.0
mask_array = np.array(mask) / 255.0

output_image = pipeline(
    image=image_array,
    mask_image=mask_array,
    generator=torch.Generator(device=self.device).manual_seed(0),
).images[0]
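
To narrow this down, both inputs can be compared right after preprocessing, before the denoising loop runs. Below is a minimal diagnostic sketch; it assumes the pipeline exposes a diffusers VaeImageProcessor as `pipeline.image_processor` (inpainting pipelines typically also expose a `mask_processor` that can be checked the same way):

import numpy as np
import torch
from PIL import Image

image = Image.open(image_path).convert("RGB")
image_array = np.array(image) / 255.0

# Preprocess the same image once as a PIL image and once as a numpy array
pt_from_pil = pipeline.image_processor.preprocess(image)
pt_from_np = pipeline.image_processor.preprocess(image_array).float()  # the PIL path yields float32, /255.0 yields float64

# If these match, the /255.0 normalization is not the source of the difference
print(torch.allclose(pt_from_pil, pt_from_np, atol=1e-4))
print((pt_from_pil - pt_from_np).abs().max())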

purple-k commented Feb 10, 2025

It seems that the difference is caused by the code below.

    def preprocess(
        self,
        image: PipelineImageInput,
        height: Optional[int] = None,
        width: Optional[int] = None,
        resize_mode: str = "default",  # "default", "fill", "crop"
        crops_coords: Optional[Tuple[int, int, int, int]] = None,
    ) -> torch.Tensor:
        """
        Preprocess the image input.

        Args:
            image (`PipelineImageInput`):
                The image input; accepted formats are PIL images, NumPy arrays, and PyTorch tensors. Lists of the
                supported formats are also accepted.
            height (`int`, *optional*):
                The height of the preprocessed image. If `None`, will use `get_default_height_width()` to get the
                default height.
            width (`int`, *optional*):
                The width of the preprocessed image. If `None`, will use `get_default_height_width()` to get the
                default width.
            resize_mode (`str`, *optional*, defaults to `default`):
                The resize mode, can be one of `default`, `fill` or `crop`. If `default`, will resize the image to fit
                within the specified width and height, and it may not maintain the original aspect ratio. If `fill`,
                will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
                then center the image within the dimensions, filling empty areas with data from the image. If `crop`,
                will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
                then center the image within the dimensions, cropping the excess. Note that resize_mode `fill` and
                `crop` are only supported for PIL image input.
            crops_coords (`List[Tuple[int, int, int, int]]`, *optional*, defaults to `None`):
                The crop coordinates for each image in the batch. If `None`, will not crop the image.

        Returns:
            `torch.Tensor`:
                The preprocessed image.
        """
        supported_formats = (PIL.Image.Image, np.ndarray, torch.Tensor)

        # Expand the missing dimension for 3-dimensional pytorch tensor or numpy array that represents grayscale image
        if self.config.do_convert_grayscale and isinstance(image, (torch.Tensor, np.ndarray)) and image.ndim == 3:
            if isinstance(image, torch.Tensor):
                # if image is a pytorch tensor, it could have 2 possible shapes:
                #    1. batch x height x width: we should insert the channel dimension at position 1
                #    2. channel x height x width: we should insert the batch dimension at position 0;
                #       however, since both the channel and batch dimensions have size 1, inserting at position 1 works either way
                #    for simplicity, we insert a dimension of size 1 at position 1 for both cases
                image = image.unsqueeze(1)
            else:
                # if it is a numpy array, it could have 2 possible shapes:
                #   1. batch x height x width: insert channel dimension on last position
                #   2. height x width x channel: insert batch dimension on first position
                if image.shape[-1] == 1:
                    image = np.expand_dims(image, axis=0)
                else:
                    image = np.expand_dims(image, axis=-1)

        if isinstance(image, list) and isinstance(image[0], np.ndarray) and image[0].ndim == 4:
            warnings.warn(
                "Passing `image` as a list of 4d np.ndarray is deprecated."
                "Please concatenate the list along the batch dimension and pass it as a single 4d np.ndarray",
                FutureWarning,
            )
            image = np.concatenate(image, axis=0)
        if isinstance(image, list) and isinstance(image[0], torch.Tensor) and image[0].ndim == 4:
            warnings.warn(
                "Passing `image` as a list of 4d torch.Tensor is deprecated."
                "Please concatenate the list along the batch dimension and pass it as a single 4d torch.Tensor",
                FutureWarning,
            )
            image = torch.cat(image, axis=0)

        if not is_valid_image_imagelist(image):
            raise ValueError(
                f"Input is in incorrect format. Currently, we only support {', '.join(str(x) for x in supported_formats)}"
            )
        if not isinstance(image, list):
            image = [image]

        if isinstance(image[0], PIL.Image.Image):
            if crops_coords is not None:
                image = [i.crop(crops_coords) for i in image]
            if self.config.do_resize:
                height, width = self.get_default_height_width(image[0], height, width)
                image = [self.resize(i, height, width, resize_mode=resize_mode) for i in image]
            if self.config.do_convert_rgb:
                image = [self.convert_to_rgb(i) for i in image]
            elif self.config.do_convert_grayscale:
                image = [self.convert_to_grayscale(i) for i in image]
            image = self.pil_to_numpy(image)  # to np
            image = self.numpy_to_pt(image)  # to pt

        elif isinstance(image[0], np.ndarray):
            image = np.concatenate(image, axis=0) if image[0].ndim == 4 else np.stack(image, axis=0)

            image = self.numpy_to_pt(image)

            height, width = self.get_default_height_width(image, height, width)
            if self.config.do_resize:
                image = self.resize(image, height, width)

        elif isinstance(image[0], torch.Tensor):
            image = torch.cat(image, axis=0) if image[0].ndim == 4 else torch.stack(image, axis=0)

            if self.config.do_convert_grayscale and image.ndim == 3:
                image = image.unsqueeze(1)

            channel = image.shape[1]
            # don't need any preprocess if the image is latents
            if channel == self.config.vae_latent_channels:
                return image

            height, width = self.get_default_height_width(image, height, width)
            if self.config.do_resize:
                image = self.resize(image, height, width)

        # expected range [0,1], normalize to [-1,1]
        do_normalize = self.config.do_normalize
        if do_normalize and image.min() < 0:
            warnings.warn(
                "Passing `image` as torch tensor with value range in [-1,1] is deprecated. The expected value range for image tensor is [0,1] "
                f"when passing as pytorch tensor or numpy Array. You passed `image` with value range [{image.min()},{image.max()}]",
                FutureWarning,
            )
            do_normalize = False
        if do_normalize:
            image = self.normalize(image)

        if self.config.do_binarize:
            image = self.binarize(image)

        return image
def resize(
    self,
    image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
    height: int,
    width: int,
    resize_mode: str = "default",  # "default", "fill", "crop"
) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
    """
    Resize image.

    Args:
        image (`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`):
            The image input, can be a PIL image, numpy array or pytorch tensor.
        height (`int`):
            The height to resize to.
        width (`int`):
            The width to resize to.
        resize_mode (`str`, *optional*, defaults to `default`):
            The resize mode to use, can be one of `default`, `fill` or `crop`. If `default`, will resize the image to
            fit within the specified width and height, and it may not maintain the original aspect ratio. If `fill`,
            will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
            then center the image within the dimensions, filling empty areas with data from the image. If `crop`,
            will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
            then center the image within the dimensions, cropping the excess. Note that resize_mode `fill` and `crop`
            are only supported for PIL image input.

    Returns:
        `PIL.Image.Image`, `np.ndarray` or `torch.Tensor`:
            The resized image.
    """
    if resize_mode != "default" and not isinstance(image, PIL.Image.Image):
        raise ValueError(f"Only PIL image input is supported for resize_mode {resize_mode}")
    if isinstance(image, PIL.Image.Image):
        if resize_mode == "default":
            image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
        elif resize_mode == "fill":
            image = self._resize_and_fill(image, width, height)
        elif resize_mode == "crop":
            image = self._resize_and_crop(image, width, height)
        else:
            raise ValueError(f"resize_mode {resize_mode} is not supported")

    elif isinstance(image, torch.Tensor):
        image = torch.nn.functional.interpolate(
            image,
            size=(height, width),
        )
    elif isinstance(image, np.ndarray):
        image = self.numpy_to_pt(image)
        image = torch.nn.functional.interpolate(
            image,
            size=(height, width),
        )
        image = self.pt_to_numpy(image)
    return image
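
In the code above, the two input types take different resize paths: a PIL image is resized with image.resize(..., resample=PIL_INTERPOLATION[self.config.resample]) before it is converted to a tensor, while a numpy array is first converted to a tensor and then resized with torch.nn.functional.interpolate, which is called without a mode argument and therefore falls back to nearest-neighbor interpolation. So whenever the input size is not already the target height/width, the two paths produce slightly different pixel values, which is enough to change the diffusion output even with a fixed seed. Below is a standalone sketch (not part of diffusers; bicubic and a 512x512 target are used only as stand-ins for whatever filter and resolution the processor actually uses) that makes the gap visible:

import numpy as np
import torch
from PIL import Image

pil_image = Image.open(image_path).convert("RGB")  # assume its size differs from the target
target_h, target_w = 512, 512

# PIL path: resize with a resampling filter before converting to an array
pil_resized = np.asarray(
    pil_image.resize((target_w, target_h), resample=Image.BICUBIC), dtype=np.float32
) / 255.0

# numpy/tensor path: convert first, then F.interpolate with its default nearest-neighbor mode
tensor = torch.from_numpy(np.asarray(pil_image, dtype=np.float32) / 255.0)
tensor = tensor.permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW
tensor_resized = torch.nn.functional.interpolate(tensor, size=(target_h, target_w))
tensor_resized = tensor_resized.squeeze(0).permute(1, 2, 0).numpy()  # back to HWC

# Nonzero (often clearly visible) whenever the original size != (target_h, target_w)
print(np.abs(pil_resized - tensor_resized).max())

If that is the cause here, passing the numpy array already resized to the target resolution (or letting the pipeline resize the PIL inputs, as in the first snippet) should make the two runs agree.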
