Photorealistic, attention-based, text-guided human image editing with a latent mapper for StyleGAN2.
This work is a reimplementation of the paper FEAT: Face Editing with Attention, with additional changes and improvements.
- Clone this repository:
  ```
  git clone https://github.com/Psarpei/GanVinci.git
  ```
- Change into the repository directory:
  ```
  cd GanVinci
  ```
- Create the conda environment from `environment.yml`:
  ```
  conda env create -f environment.yml
  ```
- Download the StyleGAN2 config-f weights from here.
- Place the StyleGAN2 weights under `checkpoints/`.
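Assuming the downloaded file keeps the name `stylegan2-ffhq-config-f.pt` (the default expected by `--stylegan2_ckpt` below) and lies in the current directory, placing it could look like this:

```
mkdir -p checkpoints
mv stylegan2-ffhq-config-f.pt checkpoints/
```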
To train a text-guided image edit (e.g. `beard`, `smiling_person`, `open_mouth`, `blond_hair`, etc.), execute

```
python3 train_FEAT.py
```

with the following parameters (an example invocation is shown after the list):
- `--clip_text` edit text, e.g. `beard`, `smile` or `open_mouth` (type `str`)
- `--batch_size` batch size; needs to be 1 if `--male_only` or `--female_only` is activated (type `int`, default `1`)
- `--lr` learning rate (type `float`, default `0.0001`)
- `--lambda_att` latent attention regression loss factor (type `float`, default `0.005`)
- `--lambda_tv` total variation loss factor (type `float`, default `0.00001`)
- `--lambda_l2` L2 loss factor (type `float`, default `0.8`)
- `--att_layer` layer of the attention map (type `int`, default `8`)
- `--att_channel` number of channels of the attention map (type `int`, default `32`)
- `--att_start` start attention layer of the latent mapper (type `int`, default `0`)
- `--lr_step_size` learning rate step size for the scheduler (type `int`, default `5000`)
- `--lr_gamma` gamma for the learning rate scheduler (type `float`, default `0.5`)
- `--alpha` factor of the latent mapper (type `float`, default `0.5`)
- `--clip_only_steps` number of steps trained using only the CLIP loss, for better convergence on some edits (type `int`, default `0`)
- `--size` output image size of the generator (type `int`, default `1024`)
- `--iterations` number of training iterations (type `int`, default `20000`)
- `--truncation` truncation ratio (type `float`, default `1`)
- `--truncation_mean` number of vectors used to calculate the mean for truncation (type `int`, default `4096`)
- `--stylegan2_ckpt` path to the StyleGAN2 model checkpoint (type `str`, default `stylegan2-ffhq-config-f.pt`)
- `--channel_multiplier` channel multiplier of the generator; config-f = 2, otherwise 1 (type `int`, default `2`)
- `--male_only` flag to use only images of male people
- `--female_only` flag to use only images of female people
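For example, a training run for a beard edit that mostly relies on the default values might look like the following sketch (the edit text, the `--male_only` flag and the checkpoint path are illustrative assumptions, not prescribed settings):

```
python3 train_FEAT.py \
    --clip_text beard \
    --batch_size 1 \
    --iterations 20000 \
    --att_start 0 \
    --att_layer 8 \
    --stylegan2_ckpt checkpoints/stylegan2-ffhq-config-f.pt \
    --male_only
```

Note that `--batch_size 1` is required here because `--male_only` is set.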
A few example training invocations are provided in the `bash_examples/` folder.
For inference, the trained edit checkpoints must be placed in a folder structure like the following example:
```
edits/
├── 0-8/
│   ├── beard/
│   │   ├── checkpoints/
│   │   │   ├── 01000_beard.pt
│   │   │   ├── 02000_beard.pt
│   │   │   ...
│   │   │   └── 20000_beard.pt
│   ...
...
```
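The directory names in this example presumably encode the training configuration: reading `0-8` as `--att_start 0` and `--att_layer 8`, and `20000_beard.pt` as the edit text plus iteration step, is an assumption based on the defaults listed above. Under that assumption, manually placing a trained `beard` checkpoint could look like this:

```
mkdir -p edits/0-8/beard/checkpoints
cp /path/to/20000_beard.pt edits/0-8/beard/checkpoints/
```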
To apply a trained text-guided image edit, execute

```
python3 generate.py
```

with the following parameters (an example invocation is shown after the list):
- `--clip_text` name of the edit (e.g. `beard`, `smile`, etc.); if `""`, standard StyleGAN2 image generation is applied (type `str`, default `""`)
- `--alpha` factor of the latent mapper (type `float`, default `0.1`)
- `--att_layer` layer of the attention map (type `int`, default `8`)
- `--att_channel` number of channels of the attention map (type `int`, default `32`)
- `--att_start` start attention layer of the latent mapper (type `int`, default `0`)
- `--mask_threshold` threshold for applying the mask based on predicted pixels (type `float`, default `0.8`)
- `--train_iter` iteration step of the edit checkpoint (type `str`, default `""`)
- `--size` output image size of the generator (type `int`, default `1024`)
- `--sample` number of samples to be generated for each image (type `int`, default `1`)
- `--pics` number of images to be generated (type `int`, default `20`)
- `--truncation` truncation ratio (type `float`, default `1`)
- `--truncation_mean` number of vectors used to calculate the mean for truncation (type `int`, default `4096`)
- `--ckpt` path to the model checkpoint (type `str`, default `stylegan2-ffhq-config-f.pt`)
- `--channel_multiplier` channel multiplier of the generator; config-f = 2, otherwise 1 (type `int`, default `2`)
- `--seed` random seed for image generation (type `int`, default `0`)
- `--male_only` flag to generate only images of male people
- `--female_only` flag to generate only images of female people
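Given the folder structure shown above, generating a few edited images might look like the following sketch (the concrete edit name, iteration step and checkpoint path are illustrative assumptions):

```
python3 generate.py \
    --clip_text beard \
    --train_iter 20000 \
    --att_start 0 \
    --att_layer 8 \
    --pics 20 \
    --seed 0 \
    --ckpt checkpoints/stylegan2-ffhq-config-f.pt
```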
A few example inference invocations are provided in the `bash_examples/` folder.
You can download weights of some pre-trained edits here.
To apply a pre-trained edit, keep the folder structure as it is and place everything under `edits/`, as explained in the inference section.
This code borrows heavily from stylegan2-pytorch, and the model is based on the paper FEAT: Face Editing with Attention.