This is an ongoing GSoC project; these are my experimental results.
Hyper-parameters:
batch_size=128
lr=3e-3
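As a rough sketch, the settings above could be collected in one configuration object. The class name `TrainConfig`, the helper `steps_per_epoch`, and the dataset size used below are hypothetical and not taken from the project code; the `epochs` value comes from the Epoch column of the results table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyper-parameters listed above."""
    batch_size: int = 128
    lr: float = 3e-3
    epochs: int = 300  # from the Epoch column of the results table


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


cfg = TrainConfig()
# 10_000 is a made-up training-set size, used only for illustration.
print(cfg, steps_per_epoch(10_000, cfg.batch_size))
```

Freezing the dataclass keeps the values from being changed by accident between runs that are meant to be comparable.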
The table below reports the mean absolute error (MAE) on the test set. Some results are still being sorted out and will be posted soon.
NN Architecture | Model I | Model II | Model III | Epoch |
---|---|---|---|---|
ResNet34 (pretrained on ImageNet) | 0.2849 | 0.2162 | 0.1332 | 300 |
Pure ViT | 0.4420 (15*15) | 0.2437 (8*8) | 0.2175 (8*8) | 300 |
CNN-T | 0.1459 | | | 300 |
MobileNet V2 (pretrained on ImageNet) | 0.1568 | | | 300 |
CvT-13 | 0.2548 | 0.1403 | | 300 |
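For reference, MAE is the average of the absolute differences between predictions and targets. The following is a minimal pure-Python sketch of that definition; the function name and the sample values are illustrative only and are not taken from the project code or its data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |target - prediction| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 5.0]
mae = mean_absolute_error(y_true, y_pred)
print(mae)  # absolute errors 0.5, 0.0, 1.0, 1.0 -> 2.5 / 4 = 0.625
```

Lower is better: a value of 0 would mean every prediction matches its target exactly.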
Scatter plots on the test set for each architecture
Arch | Model I | Model II | Model III |
---|---|---|---|
ResNet | ![]() | ![]() | ![]() |
Pure ViT | ![]() | ![]() | ![]() |
CNN-T | ![]() | | |
MobileNet V2 | ![]() | | |
CvT-13 | ![]() | ![]() | |
- DeepLense, (2021), GitHub repository.
- Yurii's Post link.
- ResNet34: He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- Pure ViT: code referenced from vit-pytorch.
- CNN-T: Li, S.; Wu, C.; Xiong, N. Hybrid Architecture Based on CNN and Transformer for Strip Steel Surface Defect Classification. Electronics 2022, 11, 1200. https://doi.org/10.3390/electronics11081200
- MobileNet V2 link
- CvT: Wu, H.; Xiao, B.; Codella, N.; et al. CvT: Introducing Convolutions to Vision Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 22–31.