Add ImageInterface #1186

Open
h-mayorquin opened this issue Jan 27, 2025 · 1 comment

@h-mayorquin
Collaborator

The goal of interfaces is to make conversions easier, and an interface can make creating Images containers more straightforward in several ways:

  • Reducing boilerplate. The user provides file_paths or a folder_path with images, and the interface loads the images, extracts resolution and orientation when they are available in the metadata, and writes them to disk.
  • Choosing the correct data type automatically. pynwb currently has three image data types: GrayscaleImage, RGBImage and RGBAImage. The interface can select the right one from the image metadata, so the user does not have to.
  • Setting good chunking (probably None in most cases?) and compression defaults.
  • Writing with an iterator when there are too many images (say, when they would require more than 1 GiB), so that the conversion works on most computers.
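As a rough sketch of the last point, the decision to switch to an iterator could be based on the uncompressed size of the arrays, estimated from the image shapes alone. The function names, the `itemsize=1` default (8-bit samples) and the exact 1 GiB threshold are assumptions for illustration, not an agreed design.

```python
import math

ONE_GIB = 1024 ** 3  # threshold suggested above, in bytes


def estimated_nbytes(shape, itemsize=1):
    """Uncompressed size of one image array; itemsize=1 assumes 8-bit samples."""
    return math.prod(shape) * itemsize


def should_write_iteratively(shapes, itemsize=1, threshold_bytes=ONE_GIB):
    """Return True when the images together would exceed the threshold."""
    total = sum(estimated_nbytes(shape, itemsize) for shape in shapes)
    return total > threshold_bytes


# 300 RGBA images of 1024 x 1024 pixels need 300 * 4 MiB = 1200 MiB
many_images = [(1024, 1024, 4)] * 300
print(should_write_iteratively(many_images))
```

The shapes can be read from the image headers without decoding the pixel data, so a check like this would stay cheap even for large folders.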
@h-mayorquin
Collaborator Author

I will add this as a PR later, but in the meantime here is a draft of the iterator that I have written:

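from pathlib import Path

import numpy as np
from hdmf.data_utils import AbstractDataChunkIterator, DataChunk
from PIL import Image


class SingleImageIterator(AbstractDataChunkIterator):
    """Simple iterator that returns a single image. Only the image header is read at initialization;
    the pixel data is loaded into memory at write time."""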

    def __init__(self, filename):
        self._filename = Path(filename)
        
        # Get image information without loading the full image
        with Image.open(self._filename) as img:
            self.image_mode = img.mode
            self._image_shape = img.size[::-1]  # PIL uses (width, height) instead of (height, width)
            self._max_shape = (None, None)
            
            self.number_of_bands = len(img.getbands())
            if self.number_of_bands > 1:
                self._image_shape += (self.number_of_bands,)
                self._max_shape += (self.number_of_bands,)
            
            # File size on disk in bytes
            self._size_bytes = self._filename.stat().st_size
            # Approximate in-memory size of the array, assuming the float dtype reported by `dtype`
            self._memory_size = int(np.prod(self._image_shape)) * np.dtype(float).itemsize

        self._images_returned = 0  # Number of images returned in __next__

    def __iter__(self):
        """Return the iterator object"""
        return self

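    def __next__(self):
        """
        Return the DataChunk with the single full image
        """
        if self._images_returned == 0:
            # Load the pixel data only now, cast it to the dtype declared by `dtype`,
            # and close the file handle once the array has been built
            with Image.open(self._filename) as img:
                data = np.asarray(img, dtype=self.dtype)
            selection = (slice(None),) * data.ndim
            self._images_returned += 1
            return DataChunk(data=data, selection=selection)
        else:
            raise StopIteration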

    def recommended_chunk_shape(self):
        """
        Recommend the chunk shape for the data array.
        """
        return self._image_shape

    def recommended_data_shape(self):
        """
        Recommend the initial shape for the data array.
        """
        return self._image_shape

    @property
    def dtype(self):
        """
        Define the data type of the array
        """
        return np.dtype(float)

    @property
    def maxshape(self):
        """
        Property describing the maximum shape of the data array that is being iterated over
        """
        return self._max_shape


    def __len__(self):
        """
        Return the length of the first dimension (the image height)
        """
        return self._image_shape[0]

    @property
    def size_info(self):
        """
        Return dictionary with size information
        """
        return {
            'file_size_bytes': self._size_bytes,
            'memory_size_bytes': self._memory_size,
            'shape': self._image_shape,
            'mode': self.image_mode,
            'bands': self.number_of_bands
        }

It does not yet include the mapping from the image data to the NWB data type.
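A minimal sketch of what that mapping might look like, assuming the Pillow mode string (the `image_mode` attribute in the draft) is what drives the choice. The dictionary and function names are hypothetical, and the class names are kept as strings so the snippet runs without pynwb installed. Modes other than the three listed are rejected rather than guessed.

```python
# Hypothetical mapping from Pillow mode strings to the pynwb image classes
# named in this issue. Only these three modes are covered.
MODE_TO_NWB_IMAGE_CLASS = {
    "L": "GrayscaleImage",  # 8-bit grayscale, one band
    "RGB": "RGBImage",      # three bands
    "RGBA": "RGBAImage",    # four bands
}


def nwb_image_class_name(mode: str) -> str:
    """Return the name of the pynwb image class assumed for a Pillow mode."""
    try:
        return MODE_TO_NWB_IMAGE_CLASS[mode]
    except KeyError:
        raise ValueError(f"No NWB image class assumed for Pillow mode {mode!r}") from None


print(nwb_image_class_name("RGBA"))
```

Whether other modes (for example palette or 16-bit images) should be converted first or rejected is left open here.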
