corresponding Webui version: v1.10.1
corresponding Forge version: v2.0.1
Stable Diffusion is a text-to-image generative AI model, similar to online services like Midjourney and Bing. Users can input prompts (text descriptions), and the model will generate images based on these prompts. The main advantage of Stable Diffusion is that it is open-source, completely free to use, and can be run locally without any censorship.
Furthermore, there are many community-developed extensions (tools) that can perform a wide range of functions, as well as numerous community-trained models that can achieve various styles and concepts, giving users full creative control over the generations.
This section will cover some basic information on how to start generating images locally
To run AI models locally, a Graphics Card (GPU) from Nvidia is required
Note
While it is possible to run Stable Diffusion on AMD, Apple, or even Intel GPUs, the setup can be more complicated and the processing speed is usually slower
When choosing a GPU, VRAM is the most important spec. Sufficient VRAM is essential to fit the entire model into memory; processing speed is only relevant once the model can be fully loaded. If your GPU does not have enough VRAM, the model will be partially loaded into system swap memory instead, significantly reducing the processing speed by up to ten times. Additionally, aim for the RTX 30 series or later, as older GPUs have limited support for half precision.
This is why an RTX 3060 (12 GB VRAM) is generally recommended over an RTX 3070 (8 GB VRAM) or an RTX 2080 (11 GB VRAM).
- Budget: RTX 3060 (12 GB VRAM)
- Entry Level: RTX 4060 Ti (16 GB VRAM)
- Enthusiast: Second-Hand RTX 3090 (24 GB VRAM)
- Professional: RTX 4090 (24 GB VRAM)
- Baller: RTX 5090 (32 GB VRAM)
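If you are unsure how much VRAM your GPU actually has, one quick way to check is through PyTorch (which the Webui installs anyway); this is only a small sketch that assumes a working CUDA installation:

```python
import torch

# Query the first CUDA device; change the index if you have multiple GPUs
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU:  {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected")
```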
What I am using: RTX 4070 Ti Super (16 GB VRAM)
Tip
If you do not have a capable system, there are also (paid) online services for Stable Diffusion
Listed below are some popular User Interfaces to run Stable Diffusion:
- Automatic1111 Webui: The classic frontend that helped popularize Stable Diffusion in the early days, with a large community and extension support; though development seems to be halted as of now
- Forge Webui: A frontend based on Automatic1111, with more optimized memory management that allows even low-end devices to run Stable Diffusion; though some may find Gradio 4 unresponsive, and development seems to be halted as of now
Tip
Installation Guide for Automatic1111 / Forge
- ComfyUI: An advanced node-based frontend, providing maximum flexibility for users to build custom complex workflows; though the learning curve is steeper as a result
- Fooocus: A rather simple frontend, suitable for those who simply want beautiful artworks by just writing a prompt
- SwarmUI, InvokeAI, SD.Next: Other notable frontends; though personally I've never tried them
- Stability Matrix: A platform that simplifies the frontend installation and model management
What I am using: Forge Classic, based on the "previous" (Gradio 3) version of Forge, with a few minor updates and optimizations added
This section will cover various things the word "model" can refer to in the context of Stable Diffusion
There have been numerous architectures released over the years. Listed below are the most widely adopted versions as of now:
- Stable Diffusion 1.5: The good ol' version that basically brought local image generation into the mainstream. As the smallest model, with a base resolution of only 512x512 and an inability to generate hands reliably, its quality is less competitive nowadays. But it remains quite popular thanks to its lower system requirements.
  - (hereinafter referred to as SD1)
- Stable Diffusion XL: The newer and bigger version with a base resolution of 1024x1024. It understands prompts better and has a lower chance of generating eldritch abominations. But it also has considerably higher system requirements.
  - (hereinafter referred to as SDXL)
- Pony Diffusion V6 XL: A specialized version of SDXL that underwent extensive training, causing many features to be incompatible between the two, resulting in a distinct category of its own. This model was specially fine-tuned on "Booru tags" to generate anime/cartoon images, with good prompt comprehension and the ability to generate correct hands/feet.
  - (hereinafter referred to as Pony)
- Illustrious: Another specialized version of SDXL that was fine-tuned on "Booru tags" to generate anime images. It has even better prompt comprehension, without a broken Text Encoder. Additionally, since the training dataset was not obfuscated, many popular characters can be directly prompted.
  - TL;DR: The Illustrious series is significantly better than the Pony series nowadays
Tip
Booru Tags refer to the comma-separated tags used by anime image boards; see this post of Hatsune Miku for example, or the illustrative prompt below
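For reference, a Booru-style prompt is simply a comma-separated list of tags; the exact tags below are only a made-up illustration:

```
1girl, hatsune miku, twintails, aqua hair, smile, looking at viewer, outdoors, masterpiece, best quality
```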
- Flux: One of the latest architectures, capable of generating legible text and next-gen quality, with next-gen hardware requirements...
  - (Don't even bother if you have less than 12 GB VRAM)
  - There are 3 variants:
    - pro: Only available through paid online services
    - dev: Distilled from pro; uses a standard number of steps (non-commercial)
    - schnell: Further distilled from dev; requires fewer steps
  - Technically, this is not Stable Diffusion
- Stable Diffusion 3.5: Another one of the latest architectures
  - There are 2 variants:
    - large: Quality is comparable to Flux; can also generate text
    - medium: Slightly worse quality; but is also lighter to run
  - (Though, it seems there is no wide adoption yet)
  - (hereinafter referred to as SD3)
Important
Models of different architectures are not compatible with each other (eg. a LoRA trained for SD1 will not work with SDXL, etc.)
What I am using: Hassaku XL (Illustrious) - A model further trained on Illustrious
Checkpoint is what "model" usually refers to. It is the whole package that contains the UNet, Text Encoder, and VAE mentioned below.
- For SD1 and SDXL, models are usually distributed in a single checkpoint
- For Flux and SD3, the components are usually distributed separately due to their large size
Note
Put Checkpoint in ~webui\models\Stable-diffusion
U-Net is the core component that processes the latent noise to generate the image.
Note
Put UNet in ~webui\models\Stable-diffusion
Note
For Flux and SD3, the "core component" is DiT instead of U-Net, but people on the internet generally still call it UNet
Text Encoder is the component that converts the human-readable prompts into vectors that the model understands.
- SD1, SDXL, SD3, and Flux all use the small Clip text encoder(s) (~300 MB)
- SD3 and Flux additionally use the big T5 text encoder (~10 GB at fp16 precision)
Note
Put Text Encoder in ~webui\models\text_encoder
VAE is the component that converts RGB images to / from latent space
Note
Put VAE in ~webui\models\VAE
Embedding (or Textual Inversion) is a technique used to train the Text Encoder to learn specific concepts (eg. Characters, Style, etc.)
Note
Put Embedding in ~webui\embeddings
LoRA (and LyCORIS) is a technique used to train the UNet (and optionally the Text Encoder) to learn specific concepts (eg. Characters, Style, etc.)
Important
Most LoRAs require trigger words to function correctly; make sure to read the description
Note
Put LoRA in ~webui\models\Lora
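To recap the locations mentioned in the Notes above, the folder layout looks roughly like this, where ~webui is your Webui installation folder:

```
~webui
├── embeddings              ← Embedding / Textual Inversion
└── models
    ├── Stable-diffusion    ← Checkpoint / UNet
    ├── text_encoder        ← Text Encoder
    ├── VAE                 ← VAE
    └── Lora                ← LoRA / LyCORIS
```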
- To change Checkpoint, select from the Stable Diffusion checkpoint dropdown at the top of the UI
  - To use the separated components in Forge, refer to this Announcement
- To use Embedding, click on the model card to add the filename to the prompt field
- To use LoRA, click on the model card to add the syntax to the prompt field (see the example below)
- To manually set a VAE, select from the SD VAE dropdown in the VAE section of the Settings tab
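For illustration, a prompt that uses both an Embedding and a LoRA could look like the following, where my_embedding and my_lora are placeholder filenames and 0.8 is the LoRA weight:

```
masterpiece, best quality, 1girl, my_embedding, <lora:my_lora:0.8>
```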
Tip
- If you don't see your newly added models, remember to click the Refresh button first
- By default, the Webui should detect the architecture of LoRAs and hide incompatible ones automatically; if you encounter a situation where known-compatible LoRAs are not showing up, you can enable the Always show all networks on the Lora page toggle in the Extra Networks section of the Settings tab
CivitAI is a site that hosts all sorts of models and user-generated resources like guides. When browsing, you can click the Filters to look for a specific model type or architecture. Every page contains user comments, ratings, and samples. Remember to check the Base Model in the Details before downloading.
Note
.safetensors is a format to store the weights safely; always choose .safetensors if available
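If you are curious what a .safetensors file actually contains, you can peek at its header with the safetensors Python package without loading the full weights; this is just a quick sketch, with the filename as a placeholder:

```python
from safetensors import safe_open

# Only the header is read here, so this is fast even for multi-GB checkpoints
with safe_open("model.safetensors", framework="pt") as f:
    print(f.metadata())          # optional metadata written by the trainer, may be None
    print(list(f.keys())[:5])    # names of the first few tensors
```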
On HuggingFace, you may come across repositories where each component is separated into sub-folders, such as epiCRealism. This format is used by the Diffusers library. However, the Webui does not support this format. Usually, you can find a link in the Model card to the checkpoint format used by the Webui instead.
This section will cover some of the terms used within the Webui
- txt2img: Generate an image based on prompts
- img2img: Generate an image based on another image and prompts
- Extras: Postprocess images (incl. Upscale, Caption, Crop, etc.)
  - You can download upscaler models from OpenModelDB
Note
Put upscaler in ~webui\models\ESRGAN
What I am using: 2x-NomosUni-esrgan-multijpg
- PNG Info: You can upload a generated image to see the infotext (ie. the prompts and parameters used), provided that the metadata was not removed from the image (see the example below)
- Checkpoint Merger: The easiest way to spam more junk onto CivitAI
- Train: Broken; use Kohya_SS instead
- Settings: Settings 💀
- Extensions: Install & Manage Extensions
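If you'd rather not open the Webui just to check an image's infotext, the same metadata can usually be read with Pillow, since the Webui stores it in the PNG text chunk named "parameters"; a small sketch with a placeholder filename:

```python
from PIL import Image

img = Image.open("00001-1234567890.png")
# The infotext lives in the "parameters" text chunk (absent if metadata was stripped)
print(img.info.get("parameters", "No infotext found"))
```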
- Prompt: The text for what you want in the output
- Negative Prompt: The text for what you don't want in the output
- Sampling Method: Read this article for explanations and examples
  - TL;DR: Euler a tends to generate smoother images, suitable for anime models; DPM++ 2M Karras tends to generate detailed images, suitable for realistic models
Note
Since Webui v1.9.0, the Sampler and Scheduler are two separate dropdowns
- Sampling Steps: The number of denoising iterations
- Hires. Fix: Upscale, then run through the pipeline a second time to improve the output
- Refiner: Obsolete; just ignore it...
- Width/Height: The resolution of the generated image
Important
Keep it at 512x512 for SD1 models; around 1024x1024 for SDXL, SD3, and Flux models; and keep both width and height at multiples of 64 (eg. 1152x896)
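For example, to pick a non-square resolution that stays close to the SDXL pixel budget while keeping both sides at multiples of 64, you can snap an aspect ratio to the nearest valid size; a small sketch (the 1024x1024 budget and the rounding scheme are just one possible convention):

```python
def snap_resolution(aspect_w: float, aspect_h: float, budget: int = 1024) -> tuple[int, int]:
    """Scale an aspect ratio to roughly budget*budget pixels, rounded to multiples of 64."""
    scale = (budget * budget / (aspect_w * aspect_h)) ** 0.5
    width = round(aspect_w * scale / 64) * 64
    height = round(aspect_h * scale / 64) * 64
    return width, height

print(snap_resolution(4, 3))   # (1152, 896), matching the example above
print(snap_resolution(16, 9))  # (1344, 768)
```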
- Batch Count: How many batches to generate (in series)
- Batch Size: How many images per batch (in parallel)
- CFG Scale: How strongly the prompts influence the output
  - 4 ~ 8 is generally fine
  - Low values generate blurry images; high values generate "burnt" (really high-contrast and distorted) images
  - Generally, the lower the steps, the lower the CFG
    - eg. for LCM, Lightning, Turbo, set it to 1 instead
  - For Flux models, CFG should be set to 1 as well
- Seed: The random seed that affects the latent noise generation
  - You should get the same output if you use the same parameters and seed
Tip
Go to the Stable Diffusion section of the Settings tab, and change Random number generator source to CPU for maximum reproducibility across different systems
This section will cover some settings that are recommended to change, or are frequently asked about
These settings modify the saving behavior
- File format for images:
  - png is lossless with the largest filesize
  - jpg is lossy with the smallest filesize
  - webp is in-between, but takes slightly longer to save
- Save copy of large images as JPG:
  - If enabled, when you generate a high-resolution image, it will additionally save a .jpg once the filesize threshold is reached
These settings can improve the generation speed
- Cross attention optimization:
  - xformers if enabled; sdp - scaled dot product otherwise
  - Automatic for Forge
- Pad prompt/negative prompt - Enable
- Persistent cond cache - Enable
- Batch cond/uncond - Enable
These settings affect the generation details
- Enable quantization in K samplers:
  - I myself did enable this, though the effect seems minimal
  - Seems to be always enabled for Forge
- Emphasis mode:
  - Setting it to No norm prevents a problem where certain SDXL models tend to generate pure noise
- Clip skip:
  - Long story short, some models work better with 2; but setting it to 2 does not worsen the results for other models anyway. So just set it to 2...
These settings affect the preview during generation
- Live preview method: TAESD, significantly faster
- Return image with chosen live preview method on interrupt: Enable, makes interrupts faster
These settings modify the behavior of the Webui
- Automatically open webui in browser on startup:
- Change this if you don't want the browser to start automatically
- Show gradio deprecation warnings in console: Disable, like why...
Extensions are basically 3rd-party add-ons that provide additional features not native to the Webui
- Go to the Extensions tab
- Switch to the Install from URL sub-tab
- In the URL for extension's git repository field, paste in the link to the Extension's GitHub page
- (Optional) In certain use cases, you may need to fill out the Specific branch name field, usually for development or compatibility reasons
- (Optional) You can fill out Local directory name to set a custom folder name (the practical use of this is to sort Extensions)
- Click the Install button
- Some Extensions may install additional dependencies; this can take some time
- Once the installation is finished, an Installed into ... line will appear under the Install button
- Switch to the Installed sub-tab
- Click the Apply and restart UI button
- Go to the Extensions tab
- Click the Check for updates button
- Once finished, some Extensions may show the behind HEAD label in the Update column
- Click the Apply and restart UI button
- Some Extensions may install additional dependencies; this can take some time
Tip
Sometimes, after clicking the Apply and restart UI button, the browser will refresh before the Webui is actually ready, and you may see a bunch of disconnected errors or broken UI text. When this happens, do not use the Webui yet. Wait for the console to show the Running on local URL: ... line, then manually refresh (F5) the browser again. After that, the Webui will be good to go.
Some essential Extensions that basically everyone should have
- ControlNet
- RegionalPrompter / ForgeCouple
- MultiDiffusion & TiledVAE
Some Extensions written by yours truly
- Prompt Format
- Tabs Extension
- Easy Tag Insert
- Image Comparison
- IC Light
- Yapping
- Boomer
Some more places to learn about Stable Diffusion
- Reddit r/StableDiffusion
- Webui Features
- Youtube Sebastian Kamph
- Youtube OlivioSarikas
- Forge Troubleshoot
  - eg. Press anykey to continue...
- If you appreciate my work and wish to support me, you can buy me a coffee~