
Photorealistic, attention-based, text-guided human image editing with a latent mapper for StyleGAN2.

image of my project

This work is a reimplementation of the paper FEAT: Face Editing with Attention with additional changes and improvements.

Setup

  1. Clone this repository
  2. Change into the repository directory: cd GanVinci
  3. Create the conda environment from environment.yml: conda env create -f environment.yml
  4. Download StyleGAN2 config-f weights from here
  5. Place StyleGAN2 weights under checkpoints/
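
As a quick sanity check after the last step, the following minimal sketch lists whatever files sit directly under checkpoints/. The helper name list_checkpoint_files is hypothetical and not part of this repository, and since the README does not state the name of the weights file, the sketch only checks that some file is present.

```python
from pathlib import Path


def list_checkpoint_files(checkpoint_dir="checkpoints"):
    """Return the sorted names of regular files directly under checkpoint_dir.

    Hypothetical helper for illustration; returns an empty list when the
    directory is missing or contains no files.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


files = list_checkpoint_files()
if files:
    print("Found weight files:", ", ".join(files))
else:
    print("No files under checkpoints/; download the StyleGAN2 config-f weights first.")
```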

Training

To train a text guided image edit (e.g. beard, smiling_person, open_mouth, blond_hair etc.) execute:


with the following parameters

  • --clip_text edit text, e.g. beard, smile or open_mouth, type str
  • --batch_size batch size (needs to be 1 if --male_only or --female_only is activated), type int, default 1
  • --lr learning rate, type float, default 0.0001
  • --lambda_att latent attention regression loss factor, type float, default 0.005
  • --lambda_tv total variation loss factor, type float, default 0.00001
  • --lambda_l2 L2 loss factor, type float, default 0.8
  • --att_layer layer of the attention map, type int, default 8
  • --att_channel number of channels of the attention map, type int, default 32
  • --att_start start attention layer of the latent mapper, type int, default 0
  • --lr_step_size learning rate step size for the scheduler, type int, default 5000
  • --lr_gamma learning rate decay factor (gamma) of the scheduler, type float, default 0.5
  • --alpha factor of the latent mapper, type float, default 0.5
  • --clip_only_steps number of steps trained with the CLIP loss only, which improves convergence for some edits, type int, default 0
  • --size output image size of the generator, type int, default 1024
  • --iterations number of training iterations, type int, default 20000
  • --truncation truncation ratio, type float, default 1
  • --truncation_mean number of vectors used to calculate the mean for the truncation, type int, default 4096
  • --stylegan2_ckpt path to the StyleGAN2 model checkpoint, type str, default
  • --channel_multiplier channel multiplier of the generator (config-f = 2, else = 1), type int, default 2
  • --male_only flag that only uses images of male people
  • --female_only flag that only uses images of female people
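
The parameter list above can be summarized as an argparse definition. This is a sketch that only mirrors the documented names, types and defaults; it is not the parser used by the training script. build_train_parser and parse_train_args are hypothetical names, the default of --stylegan2_ckpt is left unset because the README does not state it, and the batch size check is one possible way to enforce the documented constraint.

```python
import argparse


def build_train_parser():
    """Hypothetical parser mirroring the documented training parameters."""
    p = argparse.ArgumentParser(description="Train a text-guided edit (sketch)")
    p.add_argument("--clip_text", type=str, help="edit text, e.g. beard")
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--lr", type=float, default=0.0001)
    # Loss weights
    p.add_argument("--lambda_att", type=float, default=0.005)
    p.add_argument("--lambda_tv", type=float, default=0.00001)
    p.add_argument("--lambda_l2", type=float, default=0.8)
    # Attention map and latent mapper
    p.add_argument("--att_layer", type=int, default=8)
    p.add_argument("--att_channel", type=int, default=32)
    p.add_argument("--att_start", type=int, default=0)
    p.add_argument("--alpha", type=float, default=0.5)
    # Optimization schedule
    p.add_argument("--lr_step_size", type=int, default=5000)
    p.add_argument("--lr_gamma", type=float, default=0.5)
    p.add_argument("--clip_only_steps", type=int, default=0)
    p.add_argument("--iterations", type=int, default=20000)
    # Generator
    p.add_argument("--size", type=int, default=1024)
    p.add_argument("--truncation", type=float, default=1.0)
    p.add_argument("--truncation_mean", type=int, default=4096)
    p.add_argument("--stylegan2_ckpt", type=str)  # default not stated in the README
    p.add_argument("--channel_multiplier", type=int, default=2)
    # Data filters
    p.add_argument("--male_only", action="store_true")
    p.add_argument("--female_only", action="store_true")
    return p


def parse_train_args(argv):
    """Parse argv and enforce batch_size == 1 when a gender filter is active."""
    parser = build_train_parser()
    args = parser.parse_args(argv)
    if (args.male_only or args.female_only) and args.batch_size != 1:
        parser.error("--batch_size must be 1 with --male_only or --female_only")
    return args
```

For example, parse_train_args(["--clip_text", "beard", "--male_only"]) yields the defaults listed above, while adding "--batch_size", "4" to the same call exits with an error.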

A few example invocations are provided in the bash_examples/ folder.

Inference

Inference requires the trained edit checkpoints to be placed in a folder structure like the following example:

├── 0-8/
│   ├── beard/
│   │   │   ├──  checkpoints/  
│   │   │   │   ├──
│   │   │   │   ├──    
│   │   ... ... ...
│   │   │   │   └──    
│   ...
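
The layout above can be expressed as a small path helper. This is a sketch under two assumptions: that the top-level folder name 0-8 is formed from --att_start and --att_layer (their defaults are 0 and 8), and that everything lives under edits/ as described in the Pre-trained Edits section. The function name edit_checkpoint_dir is hypothetical, and the checkpoint file names are not shown in this README, so the helper stops at the checkpoints/ directory.

```python
from pathlib import Path


def edit_checkpoint_dir(clip_text, att_start=0, att_layer=8, root="edits"):
    """Build the directory expected to hold the checkpoints of one edit.

    Assumption: the first folder level is "<att_start>-<att_layer>".
    """
    return Path(root) / f"{att_start}-{att_layer}" / clip_text / "checkpoints"


print(edit_checkpoint_dir("beard").as_posix())
```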

To apply a trained text guided image edit execute:


with the following parameters

  • --clip_text name of the edit (e.g. beard, smile etc.); if "", standard StyleGAN2 image generation is applied, type str, default ""
  • --alpha factor of the latent mapper, type float, default 0.1
  • --att_layer layer of the attention map, type int, default 8
  • --att_channel number of channels of the attention map, type int, default 32
  • --att_start start attention layer of the latent mapper, type int, default 0
  • --mask_threshold threshold for applying the mask based on predicted pixels, type float, default 0.8
  • --train_iter iteration step of the edit checkpoint, type str, default ""
  • --size output image size of the generator, type int, default 1024
  • --sample number of samples to be generated for each image, type int, default 1
  • --pics number of images to be generated, type int, default 20
  • --truncation truncation ratio, type float, default 1
  • --truncation_mean number of vectors used to calculate the mean for the truncation, type int, default 4096
  • --ckpt path to the model checkpoint, type str, default
  • --channel_multiplier channel multiplier of the generator (config-f = 2, else = 1), type int, default 2
  • --seed random seed for image generation, type int, default 0
  • --male_only flag that only uses images of male people
  • --female_only flag that only uses images of female people
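
To illustrate how --truncation and --truncation_mean interact, here is a minimal pure-Python sketch of the usual StyleGAN truncation trick: a mean latent is estimated from a number of sampled latents, and each latent is then pulled toward that mean by the truncation ratio. The function names are hypothetical and plain lists stand in for tensors; the repository's own generator code is not shown here, so treat this as an assumption about the mechanism rather than its implementation.

```python
def mean_latent(samples):
    """Average a list of equally sized latent vectors component-wise.

    With --truncation_mean 4096, 'samples' would hold 4096 latents.
    """
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def truncate_latent(w, w_mean, truncation=1.0):
    """Interpolate w toward w_mean; truncation=1.0 leaves w unchanged."""
    return [m + truncation * (x - m) for x, m in zip(w, w_mean)]


w_mean = mean_latent([[0.0, 2.0], [2.0, 4.0]])
print(truncate_latent([3.0, 7.0], w_mean, truncation=0.5))
```

Lower truncation values trade diversity for fidelity, while the default of 1 disables the effect.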

A few example inference invocations are provided in the bash_examples/ folder.

Pre-trained Edits

You can download some weights of pre-trained edits here. To apply a pre-trained edit, leave the folder structure as it is and place everything under edits/, as explained in the inference section.


This code borrows heavily from stylegan2-pytorch and the model is based on the paper FEAT: Face Editing with Attention.