very slow data_time compared to released logs #1924

TongkunGuan · 2023-06-07T08:56:56Z


Screenshot of the official training log (DBNet18_TotalText):

This is my log:
2023/06/07 16:45:03 - mmengine - INFO -

System environment:
sys.platform: linux
Python: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 1528941036
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.1/bin
NVCC: Not Available
GCC: gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
PyTorch: 1.10.2
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.3
OpenCV: 4.7.0
MMEngine: 0.7.3

Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
Distributed launcher: pytorch
Distributed training: True
GPU number: 4

2023/06/07 16:45:03 - mmengine - INFO - Config:
model = dict(
type='DBNet',
backbone=dict(
type='mmdet.ResNet',
depth=18,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=-1,
norm_cfg=dict(type='BN', requires_grad=True),
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
norm_eval=False,
style='caffe'),
neck=dict(
type='FPNC', in_channels=[64, 128, 256, 512], lateral_channels=256),
det_head=dict(
type='DBHead',
in_channels=256,
module_loss=dict(type='DBModuleLoss'),
postprocessor=dict(type='DBPostprocessor', text_repr_type='quad')),
data_preprocessor=dict(
type='TextDetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32))
train_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
test_pipeline = [
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]
totaltext_textdet_data_root = 'data/totaltext'
totaltext_textdet_train = dict(
type='OCRDataset',
data_root='data/totaltext',
ann_file='textdet_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
])
totaltext_textdet_test = dict(
type='OCRDataset',
data_root='data/totaltext',
ann_file='textdet_test.json',
test_mode=True,
pipeline=[
dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor'))
])
default_scope = 'mmocr'
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
dist_cfg=dict(backend='nccl'))
randomness = dict(seed=None)
default_hooks = dict(
timer=dict(type='IterTimerHook'),
logger=dict(type='LoggerHook', interval=5),
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(type='CheckpointHook', interval=20),
sampler_seed=dict(type='DistSamplerSeedHook'),
sync_buffer=dict(type='SyncBuffersHook'),
visualization=dict(
type='VisualizationHook',
interval=1,
enable=False,
show=False,
draw_gt=False,
draw_pred=False))
log_level = 'INFO'
log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
load_from = None
resume = False
val_evaluator = dict(type='HmeanIOUMetric')
test_evaluator = dict(type='HmeanIOUMetric')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
type='TextDetLocalVisualizer',
name='visualizer',
vis_backends=[dict(type='LocalVisBackend')])
opencv_num_threads = 0
mp_start_method = 'fork'
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [dict(type='PolyLR', power=0.9, eta_min=1e-07, end=1200)]
train_dataloader = dict(
batch_size=16,
num_workers=16,
pin_memory=True,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type='OCRDataset',
data_root='data/totaltext',
ann_file='textdet_train.json',
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='TorchVisionWrapper',
op='ColorJitter',
brightness=0.12549019607843137,
saturation=0.5),
dict(
type='ImgAugWrapper',
args=[['Fliplr', 0.5], {
'cls': 'Affine',
'rotate': [-10, 10]
}, ['Resize', [0.5, 3.0]]]),
dict(type='RandomCrop', min_side_ratio=0.1),
dict(type='Resize', scale=(640, 640), keep_ratio=True),
dict(type='Pad', size=(640, 640)),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape'))
]))
val_dataloader = dict(
batch_size=1,
num_workers=1,
pin_memory=True,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='data/totaltext',
ann_file='textdet_test.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
test_dataloader = dict(
batch_size=1,
num_workers=1,
pin_memory=True,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type='OCRDataset',
data_root='data/totaltext',
ann_file='textdet_test.json',
test_mode=True,
pipeline=[
dict(
type='LoadImageFromFile',
color_type='color_ignore_orientation'),
dict(type='Resize', scale=(1333, 736), keep_ratio=True),
dict(
type='LoadOCRAnnotations',
with_polygon=True,
with_bbox=True,
with_label=True),
dict(type='FixInvalidPolygon', min_poly_points=4),
dict(
type='PackTextDetInputs',
meta_keys=('img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]))
auto_scale_lr = dict(base_batch_size=16)
launcher = 'pytorch'
work_dir = './work_dirs/dbnet_resnet18_fpnc_1200e_totaltext'

2023/06/07 16:45:05 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook

before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook

before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook

before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook

after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_train_epoch:
(NORMAL ) IterTimerHook
(NORMAL ) SyncBuffersHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

before_val_epoch:
(NORMAL ) IterTimerHook
(NORMAL ) SyncBuffersHook

before_val_iter:
(NORMAL ) IterTimerHook

after_val_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) LoggerHook

after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_train:
(VERY_LOW ) CheckpointHook

before_test_epoch:
(NORMAL ) IterTimerHook

before_test_iter:
(NORMAL ) IterTimerHook

after_test_iter:
(NORMAL ) IterTimerHook
(NORMAL ) VisualizationHook
(BELOW_NORMAL) LoggerHook

after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_run:
(BELOW_NORMAL) LoggerHook

2023/06/07 16:45:06 - mmengine - INFO - load model from: torchvision://resnet18
2023/06/07 16:45:06 - mmengine - INFO - Loads checkpoint by torchvision backend from path: torchvision://resnet18
2023/06/07 16:45:06 - mmengine - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Name of parameter - Initialization information

backbone.conv1.weight - torch.Size([64, 3, 7, 7]):
PretrainedInit: load from torchvision://resnet18

backbone.bn1.weight - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.bn1.bias - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.conv1.weight - torch.Size([64, 64, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.bn1.weight - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.bn1.bias - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.conv2.weight - torch.Size([64, 64, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.bn2.weight - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.0.bn2.bias - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.conv1.weight - torch.Size([64, 64, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.bn1.weight - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.bn1.bias - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.conv2.weight - torch.Size([64, 64, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.bn2.weight - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer1.1.bn2.bias - torch.Size([64]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.conv1.weight - torch.Size([128, 64, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.bn1.weight - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.bn1.bias - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.conv2.weight - torch.Size([128, 128, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.bn2.weight - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.bn2.bias - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.downsample.0.weight - torch.Size([128, 64, 1, 1]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.downsample.1.weight - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.0.downsample.1.bias - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.conv1.weight - torch.Size([128, 128, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.bn1.weight - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.bn1.bias - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.conv2.weight - torch.Size([128, 128, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.bn2.weight - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer2.1.bn2.bias - torch.Size([128]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.conv1.weight - torch.Size([256, 128, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.bn1.weight - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.bn1.bias - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.conv2.weight - torch.Size([256, 256, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.bn2.weight - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.bn2.bias - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.downsample.0.weight - torch.Size([256, 128, 1, 1]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.downsample.1.weight - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.0.downsample.1.bias - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.conv1.weight - torch.Size([256, 256, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.bn1.weight - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.bn1.bias - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.conv2.weight - torch.Size([256, 256, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.bn2.weight - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer3.1.bn2.bias - torch.Size([256]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.conv1.weight - torch.Size([512, 256, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.bn1.weight - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.bn1.bias - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.conv2.weight - torch.Size([512, 512, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.bn2.weight - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.bn2.bias - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.downsample.0.weight - torch.Size([512, 256, 1, 1]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.downsample.1.weight - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.0.downsample.1.bias - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.conv1.weight - torch.Size([512, 512, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.bn1.weight - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.bn1.bias - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.conv2.weight - torch.Size([512, 512, 3, 3]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.bn2.weight - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

backbone.layer4.1.bn2.bias - torch.Size([512]):
PretrainedInit: load from torchvision://resnet18

neck.lateral_convs.0.conv.weight - torch.Size([256, 64, 1, 1]):
Initialized by user-defined init_weights in ConvModule

neck.lateral_convs.1.conv.weight - torch.Size([256, 128, 1, 1]):
Initialized by user-defined init_weights in ConvModule

neck.lateral_convs.2.conv.weight - torch.Size([256, 256, 1, 1]):
Initialized by user-defined init_weights in ConvModule

neck.lateral_convs.3.conv.weight - torch.Size([256, 512, 1, 1]):
Initialized by user-defined init_weights in ConvModule

neck.smooth_convs.0.conv.weight - torch.Size([64, 256, 3, 3]):
Initialized by user-defined init_weights in ConvModule

neck.smooth_convs.1.conv.weight - torch.Size([64, 256, 3, 3]):
Initialized by user-defined init_weights in ConvModule

neck.smooth_convs.2.conv.weight - torch.Size([64, 256, 3, 3]):
Initialized by user-defined init_weights in ConvModule

neck.smooth_convs.3.conv.weight - torch.Size([64, 256, 3, 3]):
Initialized by user-defined init_weights in ConvModule

det_head.binarize.0.weight - torch.Size([64, 256, 3, 3]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.1.weight - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.1.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.3.weight - torch.Size([64, 64, 2, 2]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.3.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.4.weight - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.4.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.6.weight - torch.Size([64, 1, 2, 2]):
The value is the same before and after calling init_weights of DBNet

det_head.binarize.6.bias - torch.Size([1]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.0.weight - torch.Size([64, 256, 3, 3]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.1.weight - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.1.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.3.weight - torch.Size([64, 64, 2, 2]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.3.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.4.weight - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.4.bias - torch.Size([64]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.6.weight - torch.Size([64, 1, 2, 2]):
The value is the same before and after calling init_weights of DBNet

det_head.threshold.6.bias - torch.Size([1]):
The value is the same before and after calling init_weights of DBNet
2023/06/07 16:45:06 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
2023/06/07 16:45:06 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
2023/06/07 16:45:06 - mmengine - INFO - Checkpoints will be saved to /home/guantongkun/Projects/STD/mmocr/work_dirs/dbnet_resnet18_fpnc_1200e_totaltext.
2023/06/07 16:47:08 - mmengine - INFO - Epoch(train) [1][ 5/20] lr: 7.0000e-03 eta: 6 days, 18:43:28 time: 24.4138 data_time: 12.6575 memory: 6773 loss: 15.3831 loss_prob: 12.2361 loss_thr: 2.2011 loss_db: 0.9459
2023/06/07 16:47:14 - mmengine - INFO - Epoch(train) [1][10/20] lr: 7.0000e-03 eta: 3 days, 13:07:39 time: 12.7745 data_time: 6.3293 memory: 6773 loss: 10.8112 loss_prob: 8.1802 loss_thr: 1.6932 loss_db: 0.9379
2023/06/07 16:47:19 - mmengine - INFO - Epoch(train) [1][15/20] lr: 7.0000e-03 eta: 2 days, 11:11:58 time: 1.1214 data_time: 0.0011 memory: 6773 loss: 5.7262 loss_prob: 3.5901 loss_thr: 1.1740 loss_db: 0.9620
2023/06/07 16:47:25 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_totaltext_20230607_164457
2023/06/07 16:47:25 - mmengine - INFO - Epoch(train) [1][20/20] lr: 7.0000e-03 eta: 1 day, 22:10:27 time: 1.0894 data_time: 0.0012 memory: 6773 loss: 5.1352 loss_prob: 2.9833 loss_thr: 1.1549 loss_db: 0.9971
2023/06/07 16:49:04 - mmengine - INFO - Epoch(train) [2][ 5/20] lr: 6.9947e-03 eta: 2 days, 15:22:40 time: 10.4632 data_time: 8.4200 memory: 6773 loss: 5.0528 loss_prob: 2.9071 loss_thr: 1.1457 loss_db: 1.0000
2023/06/07 16:49:09 - mmengine - INFO - Epoch(train) [2][10/20] lr: 6.9947e-03 eta: 2 days, 6:02:17 time: 10.4838 data_time: 8.4199 memory: 6773 loss: 5.0326 loss_prob: 2.8929 loss_thr: 1.1397 loss_db: 1.0000
2023/06/07 16:49:15 - mmengine - INFO - Epoch(train) [2][15/20] lr: 6.9947e-03 eta: 1 day, 23:21:27 time: 1.1075 data_time: 0.0015 memory: 6773 loss: 5.0037 loss_prob: 2.8684 loss_thr: 1.1353 loss_db: 1.0000
2023/06/07 16:49:20 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_totaltext_20230607_164457
2023/06/07 16:49:20 - mmengine - INFO - Epoch(train) [2][20/20] lr: 6.9947e-03 eta: 1 day, 18:17:36 time: 1.0708 data_time: 0.0018 memory: 6773 loss: 4.9785 loss_prob: 2.8439 loss_thr: 1.1347 loss_db: 1.0000
2023/06/07 16:51:03 - mmengine - INFO - Epoch(train) [3][ 5/20] lr: 6.9895e-03 eta: 2 days, 4:49:18 time: 10.8226 data_time: 8.2966 memory: 6773 loss: 4.9583 loss_prob: 2.8286 loss_thr: 1.1297 loss_db: 1.0000
2023/06/07 16:51:09 - mmengine - INFO - Epoch(train) [3][10/20] lr: 6.9895e-03 eta: 2 days, 0:16:51 time: 10.8679 data_time: 8.2962 memory: 6773 loss: 4.9430 loss_prob: 2.8215 loss_thr: 1.1216 loss_db: 0.9999
2023/06/07 16:51:15 - mmengine - INFO - Epoch(train) [3][15/20] lr: 6.9895e-03 eta: 1 day, 20:34:28 time: 1.1370 data_time: 0.0010 memory: 6773 loss: 4.9207 loss_prob: 2.8178 loss_thr: 1.1056 loss_db: 0.9973
2023/06/07 16:51:20 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_totaltext_20230607_164457
2023/06/07 16:51:20 - mmengine - INFO - Epoch(train) [3][20/20] lr: 6.9895e-03 eta: 1 day, 17:25:11 time: 1.0849 data_time: 0.0009 memory: 6773 loss: 4.8702 loss_prob: 2.8197 loss_thr: 1.1268 loss_db: 0.9237
2023/06/07 16:52:58 - mmengine - INFO - Epoch(train) [4][ 5/20] lr: 6.9842e-03 eta: 2 days, 0:15:56 time: 10.3282 data_time: 5.7911 memory: 6773 loss: 4.8344 loss_prob: 2.8201 loss_thr: 1.1547 loss_db: 0.8596
2023/06/07 16:53:03 - mmengine - INFO - Epoch(train) [4][10/20] lr: 6.9842e-03 eta: 1 day, 21:19:33 time: 10.3602 data_time: 5.7912 memory: 6773 loss: 4.7782 loss_prob: 2.8201 loss_thr: 1.1319 loss_db: 0.8262
2023/06/07 16:53:09 - mmengine - INFO - Epoch(train) [4][15/20] lr: 6.9842e-03 eta: 1 day, 18:48:50 time: 1.1299 data_time: 0.0013 memory: 6773 loss: 4.6503 loss_prob: 2.8263 loss_thr: 1.1211 loss_db: 0.7029
2023/06/07 16:53:14 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_totaltext_20230607_164457
2023/06/07 16:53:14 - mmengine - INFO - Epoch(train) [4][20/20] lr: 6.9842e-03 eta: 1 day, 16:33:49 time: 1.1076 data_time: 0.0016 memory: 6773 loss: 4.5212 loss_prob: 2.8308 loss_thr: 1.0923 loss_db: 0.5981
2023/06/07 16:54:48 - mmengine - INFO - Epoch(train) [5][ 5/20] lr: 6.9790e-03 eta: 1 day, 21:28:58 time: 9.8801 data_time: 7.8095 memory: 6773 loss: 4.4301 loss_prob: 2.8336 loss_thr: 1.0420 loss_db: 0.5545
2023/06/07 16:54:54 - mmengine - INFO - Epoch(train) [5][10/20] lr: 6.9790e-03 eta: 1 day, 19:22:58 time: 9.9481 data_time: 7.8095 memory: 6773 loss: 4.3626 loss_prob: 2.8351 loss_thr: 1.0086 loss_db: 0.5188
2023/06/07 16:55:00 - mmengine - INFO - Epoch(train) [5][15/20] lr: 6.9790e-03 eta: 1 day, 17:29:20 time: 1.1599 data_time: 0.0017 memory: 6773 loss: 4.3495 loss_prob: 2.8334 loss_thr: 0.9989 loss_db: 0.5172
2023/06/07 16:55:05 - mmengine - INFO - Exp name: dbnet_resnet18_fpnc_1200e_totaltext_20230607_164457
2023/06/07 16:55:05 - mmengine - INFO - Epoch(train) [5][20/20] lr: 6.9790e-03 eta: 1 day, 15:44:32 time: 1.0755 data_time: 0.0012 memory: 6773 loss: 4.3041 loss_prob: 2.8167 loss_thr: 0.9812 loss_db: 0.5062
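
To see where the time goes, the timestamps in the log can be compared directly. This is only a rough sketch: the lines below are copied from the log above for epochs 1 and 2 and trimmed to the fields used here, and the helper names (`parse`, `wall_clock_gaps`) are made up for this example. Since `log_processor` is set to `window_size=10`, the printed `time` and `data_time` values are presumably averages over up to the last 10 iterations, so the raw wall-clock gaps are easier to read.

```python
import re
from datetime import datetime

# Lines copied from the log above, trimmed to the fields used here.
LOG = """
2023/06/07 16:47:08 - mmengine - INFO - Epoch(train) [1][ 5/20] time: 24.4138 data_time: 12.6575
2023/06/07 16:47:14 - mmengine - INFO - Epoch(train) [1][10/20] time: 12.7745 data_time: 6.3293
2023/06/07 16:47:19 - mmengine - INFO - Epoch(train) [1][15/20] time: 1.1214 data_time: 0.0011
2023/06/07 16:47:25 - mmengine - INFO - Epoch(train) [1][20/20] time: 1.0894 data_time: 0.0012
2023/06/07 16:49:04 - mmengine - INFO - Epoch(train) [2][ 5/20] time: 10.4632 data_time: 8.4200
2023/06/07 16:49:09 - mmengine - INFO - Epoch(train) [2][10/20] time: 10.4838 data_time: 8.4199
2023/06/07 16:49:15 - mmengine - INFO - Epoch(train) [2][15/20] time: 1.1075 data_time: 0.0015
2023/06/07 16:49:20 - mmengine - INFO - Epoch(train) [2][20/20] time: 1.0708 data_time: 0.0018
"""

# Matches the timestamp, epoch number, and iteration number of a training log line.
PATTERN = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"Epoch\(train\)\s+\[(\d+)\]\[\s*(\d+)/(\d+)\]"
)


def parse(log):
    """Return (timestamp, epoch, iteration) for every training log line."""
    records = []
    for line in log.splitlines():
        match = PATTERN.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            records.append((ts, int(match.group(2)), int(match.group(3))))
    return records


def wall_clock_gaps(records):
    """Seconds between consecutive log lines, labelled with the later line."""
    gaps = []
    for (t0, _, _), (t1, epoch, iteration) in zip(records, records[1:]):
        gaps.append((epoch, iteration, (t1 - t0).total_seconds()))
    return gaps


gaps = wall_clock_gaps(parse(LOG))
for epoch, iteration, seconds in gaps:
    print(f"epoch {epoch} iter {iteration:2d}: {seconds:4.0f} s since previous log line")
```

Every block of 5 iterations takes about 5 to 6 s, except the first block of epoch 2, which takes 99 s. The delay therefore seems to be concentrated at the start of each epoch rather than spread evenly over the iterations.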

Main issue:
Although num_workers=16, only a few CPU cores are busy during training. Is this what causes the slow data_time, and where does the problem come from?
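
One thing that might be worth checking is how many CPU cores the training process is actually allowed to use, compared with the total number of data-loading workers. The sketch below is only a diagnostic idea, not a confirmed cause: `usable_cpu_count` and `worker_budget` are hypothetical helper names, and the values 16 and 4 are taken from `num_workers=16` and `GPU number: 4` in the log above (in distributed training each GPU process creates its own set of workers).

```python
import os


def usable_cpu_count():
    """Number of CPU cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback for other platforms


def worker_budget(num_workers_per_gpu, num_gpus, usable_cpus):
    """Total worker processes and how many of them share each usable core."""
    total_workers = num_workers_per_gpu * num_gpus
    return total_workers, total_workers / usable_cpus


cpus = usable_cpu_count()
total, per_core = worker_budget(num_workers_per_gpu=16, num_gpus=4, usable_cpus=cpus)
print(f"usable cores: {cpus}, machine cores: {os.cpu_count()}")
print(f"data-loading workers: {total} ({per_core:.1f} per usable core)")
```

If the number of usable cores is much smaller than the number of cores on the machine, the process may be pinned to a subset of cores (for example by a container limit or an affinity setting), which would match the observation that only a few cores are busy. If there are several workers per usable core, lowering `num_workers` could be tried as an experiment.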

