A Dataset for multi-task learning of Monocular Depth Estimation and Mask Prediction. The data was generated by taking:
- ~100 random background images
- ~110 random foreground images + flipped copies of these originals (Total = 220)
- Overlaying these foreground images on each of the backgrounds at 20 different positions (Yup! sometimes you could see people flying)
- Using the DenseDepth model to generate depth maps of the overlaid images
MonoDepthMask Dataset consists of the following images:
- bg_image: Images with only a background, e.g. malls, classrooms, college outdoors, lobbies etc.
- fg_bg_images: Images with an object/person overlaid randomly on a background
- mask_images: Ground-truth masks of the foreground object/person.
- depth_images: Ground-truth depth maps generated from fg_bg_images.
- DepthMapDataSet.csv: CSV file for the dataset; it contains the following columns:

Column Name | Column Description |
---|---|
ImageName | fg_bg_image |
MaskName | mask_image |
Depthname | depth_image |
BGImageName | bg_image |
BaseImageFName | Zip file containing fg_bg_images and mask_images |
DepthImageFName | Zip file containing depth_images |
BGType | Class to which the bg_image belongs |
BGImageFName | Zip file containing bg_images |
ImageType | Count | Dimension | Channel Space | Channelwise Mean | Channelwise StdDev | Link |
---|---|---|---|---|---|---|
fg_bg_images | 484320 | 250x250x3 | RGB | [0.56632738, 0.51567622, 0.45670792] | [0.1076622, 0.10650349, 0.12808967] | https://github.com/rajy4683/MonoMaskDepth/blob/master/README.md#fg_bg_images-and-mask_images |
bg_images | 484320 | 250x250x3 | RGB | [0.57469445, 0.52241555, 0.45992244] | [0.11322354, 0.11195428, 0.13441683] | https://github.com/rajy4683/MonoMaskDepth/blob/master/README.md#bg_images |
mask_images | 484320 | 250x250x1 | Grayscale | [0.0579508] | [0.001662] | https://github.com/rajy4683/MonoMaskDepth/blob/master/README.md#fg_bg_images-and-mask_images |
depth_images | 484320 | 320x240x1 | Grayscale | [0.3679109] | [0.03551773] | https://github.com/rajy4683/MonoMaskDepth/blob/master/README.md#depth_images |
All the above data is indexed in the below CSVs:
- FullDataSet (~480K)
- Training Data (~340K)
- Test Data (~150K)
- Sample Data (500)
How to create transparent foreground images:
- Download PNG/JPG images of people with any background
- Upload individual images to https://www.remove.bg/upload
- Since I was using the free version, images had to be processed one at a time
- Download and save the transparent images
How to create masks for the above foreground images:
- Used a simple OpenCV-based conversion:

```python
import cv2
import numpy as np
# cv2_imshow is Colab's display helper: from google.colab.patches import cv2_imshow

def generate_mask(img, debug=False):
    # img is a 4-channel (RGBA) foreground image; any pixel whose
    # channels all fall within [lower_white, upper_white] becomes mask
    lower_white = np.array([1, 1, 1, 4])
    upper_white = np.array([255, 255, 255, 4])
    mask = cv2.inRange(img, lower_white, upper_white)
    if debug:
        cv2_imshow(img)
        cv2_imshow(mask)
    return mask
```
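For reference, the range check that `cv2.inRange` performs can be sketched in plain NumPy (a simplified illustration, not the repository's code; the sample array values are made up):

```python
import numpy as np

def in_range(img, lower, upper):
    """Return a uint8 mask: 255 where every channel of a pixel lies
    within [lower, upper] (inclusive), 0 elsewhere -- the same
    semantics as cv2.inRange."""
    within = np.logical_and(img >= lower, img <= upper).all(axis=-1)
    return (within * 255).astype(np.uint8)

# Tiny 1x2 RGBA image: first pixel inside the range, second outside
img = np.array([[[10, 10, 10, 4], [0, 0, 0, 0]]], dtype=np.uint8)
mask = in_range(img, np.array([1, 1, 1, 4]), np.array([255, 255, 255, 4]))
# mask -> [[255, 0]]
```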
How were fg images overlaid on bg images to create 20 variants:
- Please refer to this notebook for the end-to-end flow
- Primarily used albumentations to generate flipped images and to resize images to fit the background. Code can be found here
- The main advantage of albumentations is that it applies the same operation to masks/bboxes as well
- FG images were of size (125, 125) or (64, 64)
- A range of 20 random positions within ((0, height_bg - height_fg), (0, width_bg - width_fg)) was used to prevent images from being cropped at the edge. Code can be found here
- A CSV file with a tuple of every background with 40 positions (flipped + regular) was created.
- Slices of this CSV file were run in parallel on 4 Colab instances to generate the 4 files listed in this section.
- To overcome disk space and Colab file-handling issues:
  - All the input files were copied to the Colab instance's local directory at the start of the run
  - All the files generated were stored locally on the Colab instance
  - The files were later zipped and saved back to Google Drive
- Currently analyzing how to make this process faster and more streamlined
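The random-placement step above can be sketched with NumPy (a simplified illustration; the actual code uses albumentations and the notebooks linked above, so the function and variable names here are illustrative):

```python
import random
import numpy as np

def overlay_at_random(bg, fg, rng=random):
    """Paste fg onto a copy of bg at a random top-left corner chosen
    from (0, h_bg - h_fg) x (0, w_bg - w_fg), so the foreground never
    spills past the background's edges."""
    h_bg, w_bg = bg.shape[:2]
    h_fg, w_fg = fg.shape[:2]
    top = rng.randint(0, h_bg - h_fg)    # inclusive bounds, as in the text
    left = rng.randint(0, w_bg - w_fg)
    out = bg.copy()
    out[top:top + h_fg, left:left + w_fg] = fg
    return out, (top, left)

# 250x250 background, 64x64 foreground (sizes from the dataset description)
bg = np.zeros((250, 250, 3), dtype=np.uint8)
fg = np.full((64, 64, 3), 255, dtype=np.uint8)
composite, (top, left) = overlay_at_random(bg, fg)
```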
How did you create your depth images?
- The base model used was DenseDepth
- The test utilities were modified to handle the following:
- From the input zip files generated above, directly read ~300 images at a time
- Resize these images to 480x640 using albumentations
- Run the model on the inputs and save the output depth data with the 'plasma' cmap. This will be changed to grayscale.
- Similar to the step above, this step was also run on 4 Colab instances in parallel to generate the respective depth images
- Code for this handling can be found here
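Reading image bytes straight out of a zip archive without extracting it, as the modified test utilities do, can be sketched with Python's standard `zipfile` module (a minimal illustration with made-up file names, not the repository's code):

```python
import io
import zipfile

def iter_zip_images(zip_source, batch_size=300):
    """Yield lists of (name, raw_bytes) pairs straight from the archive,
    avoiding an on-disk extraction -- useful on Colab where disk space
    is tight."""
    with zipfile.ZipFile(zip_source) as zf:
        names = [n for n in zf.namelist() if n.endswith((".jpg", ".png"))]
        for i in range(0, len(names), batch_size):
            yield [(n, zf.read(n)) for n in names[i:i + batch_size]]

# Build a tiny in-memory archive to demonstrate
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for k in range(5):
        zf.writestr(f"img_{k}.jpg", b"fake-image-bytes")
batches = list(iter_zip_images(buf, batch_size=2))
# 5 images with batch_size=2 -> batches of 2, 2, 1
```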
How did you calculate the mean and stddev?
- Code for the computation can be found in this notebook
- A PyTorch-based DepthDataset class was created
- This allows either PyTorch DataLoaders or plain iterators to be used over the entire dataset.
- Using Knuth's online algorithm (Welford's method), the mean and stddev were calculated over each channel of all the images.
- The dataset loading and iteration is currently very slow and will need to be improved drastically.
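The running-statistics computation (Knuth/Welford's single-pass algorithm) can be sketched in plain Python — a minimal single-channel version; the notebook linked above applies it per channel across the whole dataset:

```python
import math

class RunningStats:
    """Welford's online algorithm: numerically stable single-pass
    mean and standard deviation."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # population standard deviation
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

stats = RunningStats()
for pixel in [0.2, 0.4, 0.6, 0.8]:
    stats.update(pixel)
# stats.mean -> 0.5
```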
- https://drive.google.com/file/d/1---bC2E22KCE7g6X0lqVaPZsaJu1sGHr/view?usp=sharing
- https://drive.google.com/file/d/1--mweX6AYhvQnCyUfRaEbWqFQEPHRcUL/view?usp=sharing
- https://drive.google.com/file/d/1EpcRuBvlXJP2t4GS5zuf5iEXFbpnYlk5/view?usp=sharing
- https://drive.google.com/file/d/1ctsr5LOe3-P6SZfV_U5NFTqDTn126V8c/view?usp=sharing
- https://drive.google.com/file/d/1-LlJX-As3b0IMBOLZ0Li_0qdlwu15TMQ/view?usp=sharing
- https://drive.google.com/file/d/1-CaGwfdNp9kDzAdVO_AUvL42_wicpwqm/view?usp=sharing
- https://drive.google.com/file/d/1-RkySbCztvrLrgfNc4a64L6JdA9tw1kW/view?usp=sharing
- https://drive.google.com/file/d/1-8dVuLds3_WiO1IC2MKgLUGAbmBrcDyS/view?usp=sharing