@joanrod
Hi, Juan! Thanks for sharing this project!
I have a question about Section 4.2 of your paper.
I'm not sure exactly which feature maps are used in the computation of the OCR perceptual loss.
Section 4.2 states the following:
"... through the OCR model, and extract L feature maps from intermediate layers. Specifically, we store the activation map after each upsampling layer..."
However, in the code, it seems that the feature maps are extracted from the VGG16-BN backbone of the network rather than from the upsampling layers (which sit in the U-Net part of the network):
https://github.com/joanrod/ocr-vqgan/blob/68e36b568b59df275940296c164b1cf40585512b/taming/modules/losses/craft.py#L89
https://github.com/joanrod/ocr-vqgan/blob/68e36b568b59df275940296c164b1cf40585512b/taming/modules/losses/lpips.py#L28-L29
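For reference, the behavior the paper's wording describes (storing the activation map after each upsampling layer) could be sketched with PyTorch forward hooks roughly as below. This is an illustrative sketch only, not the repo's actual code; `TinyUNetDecoder` and `collect_upsample_activations` are hypothetical names:

```python
# Hypothetical sketch: capture activations after each upsampling stage of a
# CRAFT-style OCR model via forward hooks, per the paper's Section 4.2 wording.
import torch
import torch.nn as nn


class TinyUNetDecoder(nn.Module):
    """Stand-in for the U-Net half of the OCR model: two upsampling stages."""

    def __init__(self):
        super().__init__()
        self.up1 = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv1 = nn.Conv2d(8, 8, 3, padding=1)
        self.up2 = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv2 = nn.Conv2d(8, 4, 3, padding=1)

    def forward(self, x):
        x = self.conv1(self.up1(x))
        return self.conv2(self.up2(x))


def collect_upsample_activations(model, x):
    """Store the activation map after each upsampling layer (paper wording)."""
    feats, handles = [], []
    for m in model.modules():
        if isinstance(m, nn.Upsample):
            handles.append(
                m.register_forward_hook(lambda mod, inp, out: feats.append(out))
            )
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return feats


model = TinyUNetDecoder().eval()
feats = collect_upsample_activations(model, torch.randn(1, 8, 16, 16))
# Two upsampling stages -> two stored feature maps (L = 2 here).
```

In contrast, the linked `lpips.py` lines slice feature maps out of the VGG16-BN backbone, which is what prompted the question.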