[Quantized Depth Completion] Questions about implementation details #61
Dear HongYu, thank you for your interest!
Hi @VC86, may I ask a further question about the normals computation? Here is my implementation:

```python
import numpy as np
import torch
import torch.nn as nn

# Toy depth map: a 5x5 ramp with values 0..24, shape (B, C, H, W)
arr = np.array(range(25))
tensor = torch.from_numpy(arr).to(torch.float).reshape(1, 1, 5, 5)

# Central-difference kernels (-0.5, 0, 0.5) along x and y, with replicate padding
grad_x_layer = nn.Conv2d(1, 1, kernel_size=(1, 3), stride=1, padding=(0, 1), bias=False, padding_mode='replicate')
grad_y_layer = nn.Conv2d(1, 1, kernel_size=(3, 1), stride=1, padding=(1, 0), bias=False, padding_mode='replicate')
with torch.no_grad():
    grad_x_layer.weight = nn.Parameter(torch.tensor((-0.5, 0, 0.5)).reshape((1, 1, 1, 3)))
    grad_y_layer.weight = nn.Parameter(torch.tensor((-0.5, 0, 0.5)).reshape((1, 1, 3, 1)))
    grad_x = grad_x_layer(tensor)
    grad_y = grad_y_layer(tensor)

# Unnormalized normal of the surface z = f(x, y) is (dz/dx, dz/dy, -1); L2-normalize per pixel
minus_1 = -1 * torch.ones_like(tensor)
normals = torch.cat((grad_x, grad_y, minus_1), dim=1)
normals = normals / torch.linalg.norm(normals, dim=1, ord=2).unsqueeze(1)

print('input:\n', tensor)
print('grad_x:\n', grad_x)
print('grad_y:\n', grad_y)
print('normals:\n', normals)
```

and the output:

```
input:
 tensor([[[[ 0.,  1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.,  9.],
           [10., 11., 12., 13., 14.],
           [15., 16., 17., 18., 19.],
           [20., 21., 22., 23., 24.]]]])
grad_x:
 tensor([[[[0.5000, 1.0000, 1.0000, 1.0000, 0.5000],
           [0.5000, 1.0000, 1.0000, 1.0000, 0.5000],
           [0.5000, 1.0000, 1.0000, 1.0000, 0.5000],
           [0.5000, 1.0000, 1.0000, 1.0000, 0.5000],
           [0.5000, 1.0000, 1.0000, 1.0000, 0.5000]]]])
grad_y:
 tensor([[[[2.5000, 2.5000, 2.5000, 2.5000, 2.5000],
           [5.0000, 5.0000, 5.0000, 5.0000, 5.0000],
           [5.0000, 5.0000, 5.0000, 5.0000, 5.0000],
           [5.0000, 5.0000, 5.0000, 5.0000, 5.0000],
           [2.5000, 2.5000, 2.5000, 2.5000, 2.5000]]]])
normals:
 tensor([[[[ 0.1826,  0.3482,  0.3482,  0.3482,  0.1826],
           [ 0.0976,  0.1925,  0.1925,  0.1925,  0.0976],
           [ 0.0976,  0.1925,  0.1925,  0.1925,  0.0976],
           [ 0.0976,  0.1925,  0.1925,  0.1925,  0.0976],
           [ 0.1826,  0.3482,  0.3482,  0.3482,  0.1826]],

          [[ 0.9129,  0.8704,  0.8704,  0.8704,  0.9129],
           [ 0.9759,  0.9623,  0.9623,  0.9623,  0.9759],
           [ 0.9759,  0.9623,  0.9623,  0.9623,  0.9759],
           [ 0.9759,  0.9623,  0.9623,  0.9623,  0.9759],
           [ 0.9129,  0.8704,  0.8704,  0.8704,  0.9129]],

          [[-0.3651, -0.3482, -0.3482, -0.3482, -0.3651],
           [-0.1952, -0.1925, -0.1925, -0.1925, -0.1952],
           [-0.1952, -0.1925, -0.1925, -0.1925, -0.1952],
           [-0.1952, -0.1925, -0.1925, -0.1925, -0.1952],
           [-0.3651, -0.3482, -0.3482, -0.3482, -0.3651]]]])
```

Is it correct? After applying this implementation directly to the GT depth of NYU Depth v2, the result looks strange compared to the visualization in the paper. A minimal, reproducible snippet:

```python
import h5py
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF
from PIL import Image

with h5py.File('data/nyudepthv2/val/official/00001.h5', 'r') as f:
    gt_depth = torch.from_numpy(np.array(f['depth'], dtype=np.float32)).unsqueeze(0).unsqueeze(0)  # (B, 1, H, W)
    rgb_img = Image.fromarray(np.transpose(f['rgb'], (1, 2, 0)))

# Central-difference gradient layers, same as the toy example above
grad_x_layer = nn.Conv2d(1, 1, kernel_size=(1, 3), stride=1, padding=(0, 1), bias=False, padding_mode='replicate')
grad_y_layer = nn.Conv2d(1, 1, kernel_size=(3, 1), stride=1, padding=(1, 0), bias=False, padding_mode='replicate')
with torch.no_grad():
    grad_x_layer.weight = nn.Parameter(torch.tensor((-0.5, 0, 0.5)).reshape((1, 1, 1, 3)))
    grad_y_layer.weight = nn.Parameter(torch.tensor((-0.5, 0, 0.5)).reshape((1, 1, 3, 1)))
    grad_x = grad_x_layer(gt_depth)
    grad_y = grad_y_layer(gt_depth)

minus_1 = -1 * torch.ones_like(gt_depth)
normals = torch.cat((grad_x, grad_y, minus_1), dim=1)
normals = normals / torch.linalg.norm(normals, dim=1, ord=2).unsqueeze(1)

# Map [-1, 1] -> [0, 255] for visualization
normals = ((normals + 1) / 2 * 255).squeeze().to(torch.uint8)
TF.to_pil_image(normals).save('normals.png')
rgb_img.save('rgb.png')
```

Thank you so much!
Your code looks correct, and the test above is numerically correct as well. But indeed, the normals are not what I would expect when visualized (even though the way you convert them to uint8 also looks correct).
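As a quick hand check of those numbers: for an interior pixel of the 5×5 ramp, the central differences are 1 along x and 5 along y, so the expected unit normal is

$$\frac{(1,\ 5,\ -1)}{\sqrt{1^2 + 5^2 + (-1)^2}} = \frac{(1,\ 5,\ -1)}{\sqrt{27}} \approx (0.1925,\ 0.9623,\ -0.1925),$$

which matches the printed values.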
Thanks again for your reply! Since you mentioned that you convert to millimeters before computing the normals, I modified the code as follows:

```python
import h5py
import numpy as np
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from PIL import Image

with h5py.File('data/nyudepthv2/val/official/00001.h5', 'r') as f:
    gt_depth = torch.from_numpy(np.array(f['depth'], dtype=np.float32)).unsqueeze(0).unsqueeze(0)  # (B, 1, H, W)
    rgb_img = Image.fromarray(np.transpose(f['rgb'], (1, 2, 0)))

# Resize & center crop: (480, 640) -> (240, 320) -> (224, 304)
gt_depth, rgb_img = TF.resize(gt_depth, (240, 320)), TF.resize(rgb_img, (240, 320))
gt_depth, rgb_img = TF.center_crop(gt_depth, (224, 304)), TF.center_crop(rgb_img, (224, 304))

# Take the scale into account (meters to millimeters)
scaled_gt_depth = gt_depth * 1000.0

# Compute normals with central-difference kernels and replicate padding
grad_x_weights = torch.tensor((-0.5, 0, 0.5), dtype=torch.float, requires_grad=False)
grad_x_weights = grad_x_weights.reshape((1, 1, 1, 3))
grad_y_weights = torch.tensor((-0.5, 0, 0.5), dtype=torch.float, requires_grad=False)
grad_y_weights = grad_y_weights.reshape((1, 1, 3, 1))
with torch.no_grad():
    x_padded_dense_depth = F.pad(scaled_gt_depth, (1, 1, 0, 0), 'replicate')
    y_padded_dense_depth = F.pad(scaled_gt_depth, (0, 0, 1, 1), 'replicate')
    grad_x = F.conv2d(x_padded_dense_depth, grad_x_weights)
    grad_y = F.conv2d(y_padded_dense_depth, grad_y_weights)

minus_1 = -1 * torch.ones_like(scaled_gt_depth)
normals = torch.cat((grad_x, grad_y, minus_1), dim=1)
normals = normals / torch.linalg.norm(normals, dim=1, ord=2).unsqueeze(1)

# Visualization
print(f'normals stats: min={torch.min(normals):.2f}, max={torch.max(normals):.2f}, median={torch.median(normals):.2f}')
TF.to_pil_image(((normals + 1) / 2 * 255).squeeze().to(torch.uint8)).save('normals.png')
rgb_img.save('rgb.png')
```

and the output:

```
normals stats: min=-1.00, max=1.00, median=-0.07
```

I think the visualization is far better than the previous one.
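To see why the unit conversion matters so much here, a minimal sketch with toy numbers (the 0.01 m/pixel slope is just an illustrative value, not taken from the dataset): the z-component of the unnormalized normal is fixed at -1, so the visible contrast depends on the magnitude of the depth gradients relative to 1. In meters the gradients are tiny and every normal collapses toward (0, 0, -1); in millimeters the same geometry produces clearly varying directions.

```python
import torch

grad = torch.tensor([0.01, 0.0])  # toy slope: ~1 cm per pixel, depth in meters

# Depth in meters: gradient magnitude << 1, so the normal is almost (0, 0, -1)
n_m = torch.cat([grad, torch.tensor([-1.0])])
print(n_m / torch.linalg.norm(n_m))    # ~(0.0100, 0.0000, -0.9999)

# Depth in millimeters: gradient magnitude >> 1, so the orientation dominates
n_mm = torch.cat([grad * 1000.0, torch.tensor([-1.0])])
print(n_mm / torch.linalg.norm(n_mm))  # ~(0.9950, 0.0000, -0.0995)
```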
@james77777778 Did you manage to replicate the results of this paper? Are the results still good after quantization?
First of all, thanks for the great work, but the source code is still missing.
Could you share the training/evaluation code and the pretrained weights for this work?
Also, I'm trying to reimplement it in PyTorch, and I have some questions about the paper:
Thanks! I look forward to your kind reply.