Merge pull request #3221 from vladmandic/dev
merge dev to master
vladmandic authored Jun 13, 2024
2 parents 84f9caa + 2b17186 commit c8b5ed4
Showing 121 changed files with 328,919 additions and 885 deletions.
93 changes: 93 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,98 @@
# Change Log for SD.Next

## Update for 2024-06-13

### Highlights for 2024-06-13

First, yes, it's here and supported: [**StabilityAI Stable Diffusion 3 Medium**](https://stability.ai/news/stable-diffusion-3-medium)
For details on how to download and use it, see the [Wiki](https://github.com/vladmandic/automatic/wiki/SD3)

#### What else?

A lot of work on state-of-the-art multi-lingual models with both [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT) and [MuLan](https://github.com/mulanai/MuLan)
Plus tons of minor features such as an optimized initial install experience, **T-Gate** and **ResAdapter**, additional ModernUI themes (both light and dark), and fixes since the last release, which was only two weeks ago!

### Full Changelog for 2024-06-13

#### New Models

- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
yup, supported!
quote: *"Stable Diffusion 3 Medium is a multimodal diffusion transformer (MMDiT) model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency"*
SD.Next also supports switching the optional T5 text encoder on-the-fly, as well as loading the model from either a diffusers repo or a single safetensors file
for details, see the [Wiki](https://github.com/vladmandic/automatic/wiki/SD3); a minimal loading sketch follows this list
- [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT) bilingual English/Chinese diffusion transformer model
note: this is a very large model at ~17GB, but it can be used with less VRAM via model offloading
simply select it from networks -> models -> reference; the model will be auto-downloaded on first use
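
Under the hood, both new models load as diffusers pipelines. As a rough sketch of loading SD3 directly with diffusers, with the optional T5 text encoder skipped (the repo id, dtype, and prompt are assumptions; SD.Next manages all of this internally):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# sketch only: passing text_encoder_3=None / tokenizer_3=None skips the
# large optional T5 encoder, which significantly reduces VRAM usage
pipe = StableDiffusion3Pipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',  # assumed repo id
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to('cuda')

image = pipe('photo of a cat wearing a hat', num_inference_steps=28, guidance_scale=7.0).images[0]
image.save('sd3.png')
```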

#### New Functionality

- [MuLan](https://github.com/mulanai/MuLan) multi-language prompts
write your prompts in ~110 auto-detected languages!
compatible with *SD15* and *SDXL*
enable in scripts -> MuLan and set the encoder to `InternVL-14B-224px`
*note*: right now this is more of a proof-of-concept before smaller and/or quantized models are released
the model will be auto-downloaded on first use: note its huge size of 27GB
even executing it in FP16 will require ~16GB of VRAM for the text encoder alone
examples:
- English: photo of a beautiful woman wearing a white bikini on a beach with a city skyline in the background
- Croatian: fotografija lijepe žene u bijelom bikiniju na plaži s gradskim obzorom u pozadini
- Italian: Foto di una bella donna che indossa un bikini bianco su una spiaggia con lo skyline di una città sullo sfondo
- Spanish: Foto de una hermosa mujer con un bikini blanco en una playa con un horizonte de la ciudad en el fondo
- German: Foto einer schönen Frau in einem weißen Bikini an einem Strand mit einer Skyline der Stadt im Hintergrund
- Arabic: صورة لامرأة جميلة ترتدي بيكيني أبيض على شاطئ مع أفق المدينة في الخلفية
- Japanese: 街のスカイラインを背景にビーチで白いビキニを着た美しい女性の写真
- Chinese: 一个美丽的女人在海滩上穿着白色比基尼的照片, 背景是城市天际线
- Korean: 도시의 스카이라인을 배경으로 해변에서 흰색 비키니를 입은 아름 다운 여성의 사진
- [T-Gate](https://github.com/HaozheLiu-ST/T-GATE) speed up generation by gating the step after which cross-attention is no longer computed
enable via scripts -> t-gate
compatible with *SD15*
- **PCM LoRAs** allow for fast denoising using fewer steps with standard *SD15* and *SDXL* models
download from <https://huggingface.co/Kijai/converted_pcm_loras_fp16/tree/main>; see the loading sketch after this list
- [ByteDance ResAdapter](https://github.com/bytedance/res-adapter) resolution-free model adapter
allows using resolutions from 0.5x to 2.0x of the original model resolution, compatible with *SD15* and *SDXL*
enable via scripts -> resadapter and select the desired model
- **Kohya HiRes Fix** allows for higher-resolution generation using standard *SD15* models
enable via scripts -> kohya-hires-fix
*note*: an alternative to the regular hidiffusion method, with a different approach to scaling
- 4 additional built-in custom-trained **ControlNet SDXL** models from Xinsir: OpenPose, Canny, Scribble, AnimePainter
thanks @lbeltrame
- add torch **full deterministic mode**
enable in settings -> compute -> use deterministic mode
typical output differences are not large; it's disabled by default as it does have some performance impact (see the sketch after this list)
- new sampler: **Euler FlowMatch**
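
The PCM LoRAs mentioned above load through the standard diffusers LoRA mechanism; a minimal sketch, where the base model choice and the `weight_name` file are placeholders to be picked from the linked repo:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',  # assumed base model
    torch_dtype=torch.float16,
).to('cuda')
# weight_name is a placeholder; pick an actual file from the repo above
pipe.load_lora_weights('Kijai/converted_pcm_loras_fp16', weight_name='<pcm-lora-file>.safetensors')
# PCM LoRAs target low step counts with reduced guidance
image = pipe('photo of a cat', num_inference_steps=8, guidance_scale=2.0).images[0]
```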
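
The full deterministic mode maps onto torch's standard determinism switches; exactly which flags SD.Next toggles is an assumption, but the usual set looks like this:

```python
import os
import torch

# cuBLAS must be configured before the first CUDA call for full determinism
os.environ.setdefault('CUBLAS_WORKSPACE_CONFIG', ':4096:8')

torch.use_deterministic_algorithms(True)   # raise on ops without a deterministic implementation
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # autotuning picks kernels non-deterministically
```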

#### Improvements

- additional modernui themes
- reintroduce prompt attention normalization, disabled by default, enable in settings -> execution
this can drastically help with unbalanced prompts
- further work on improving python 3.12 functionality and removal of its experimental flag
note: the recommended version remains python 3.11 for all users, except DirectML users, who should stay on python 3.10
- improved **installer** for initial installs
initial install does a single-pass install of all required packages with correct versions
subsequent runs check package versions only as necessary
- add env variable `SD_PIP_DEBUG` to write `pip.log` for all pip operations
also improved installer logging
- add python version check for `torch-directml`
- do not install `tensorflow` by default
- improve metadata/infotext parser
add `cli/image-exif.py` that can be used to view/extract metadata from images; see the sketch after this list
- lower overhead on generate calls
- auto-synchronize modernui and core branches
- add option to pad prompt with zeros, thanks @Disty
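
For PNG outputs, the infotext that the parser and `cli/image-exif.py` work with is stored in a text chunk named `parameters`, so it can also be inspected with plain Pillow (the file name below is a placeholder):

```python
from PIL import Image

img = Image.open('output.png')  # placeholder file name
# SD-style generation parameters live in a PNG text chunk named 'parameters'
print(img.text.get('parameters', '<no infotext>'))
```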

#### Fixes

- cumulative fixes since the last release
- fix apply/unapply hidiffusion for sd15
- fix controlnet reference enabled check
- fix face-hires with control batch count
- install pynvml on-demand
- apply rollback-vae option to latest torch versions, thanks @Iaotle
- skip face-hires if strength is 0
- restore all sampler configuration on sampler change

## Update for 2024-06-02

- fix textual inversion loading
Expand Down
36 changes: 20 additions & 16 deletions README.md
@@ -16,9 +16,9 @@
## Table of contents

- [SD.Next Features](#sdnext-features)
- [Backend support](#backend-support)
- [Model support](#model-support)
- [Platform support](#platform-support)
- [Backend support](#backend-support)
- [Examples](#examples)
- [Install](#install)
- [Notes](#notes)
@@ -31,7 +31,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
- Multiple UIs!
**Standard | Modern**
- Multiple diffusion models!
**Stable Diffusion 1.5/2.1 | SD-XL | LCM | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | Würstchen | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | etc.**
**Stable Diffusion 1.5/2.1/XL/3.0 | LCM | Lightning | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | Würstchen | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | HunyuanDiT | etc.**
- Built-in Control for Text, Image, Batch and video processing!
**ControlNet | ControlNet XS | Control LLLite | T2I Adapters | IP Adapters**
- Multiplatform!
@@ -54,30 +54,19 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG

*Main interface using **ModernUI***:
![Screenshot-Dark](html/screenshot-modernui.jpg)
![Screenshot-Dark](html/screenshot-modernui-sd3.jpg)

For screenshots and information on other available themes, see the [Themes Wiki](https://github.com/vladmandic/automatic/wiki/Themes)

<br>

## Backend support

**SD.Next** supports two main backends: *Diffusers* and *Original*:

- **Diffusers**: Based on new [Huggingface Diffusers](https://huggingface.co/docs/diffusers/index) implementation
Supports *all* models listed below
This backend is set as default for new installations
See [wiki article](https://github.com/vladmandic/automatic/wiki/Diffusers) for more information
- **Original**: Based on [LDM](https://github.com/Stability-AI/stablediffusion) reference implementation and significantly expanded on by [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
This backend is fully compatible with most existing functionality and extensions written for *A1111 SDWebUI*
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, Stable Cascade, PixArt, Playground, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Model support

Additional models will be added as they become available and there is public interest in them

- [RunwayML Stable Diffusion](https://github.com/Stability-AI/stablediffusion/) 1.x and 2.x *(all variants)*
- [StabilityAI Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
- [StabilityAI Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) Base, XT 1.0, XT 1.1
- [LCM: Latent Consistency Models](https://github.com/openai/consistency_models)
- [Playground](https://huggingface.co/playgroundai/playground-v2-256px-base) *v1, v2 256, v2 512, v2 1024 and latest v2.5*
@@ -90,6 +79,7 @@ Additional models will be added as they become available and there is public int
- [PixArt-α XL 2](https://github.com/PixArt-alpha/PixArt-alpha) *Medium and Large*
- [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma)
- [Warp Wuerstchen](https://huggingface.co/blog/wuertschen)
- [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT)
- [Tsinghua UniDiffusion](https://github.com/thu-ml/unidiffuser)
- [DeepFloyd IF](https://github.com/deep-floyd/IF) *Medium and Large*
- [ModelScope T2V](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b)
@@ -102,11 +92,12 @@ Additional models will be added as they become available and there is public int


Also supported are modifiers such as:
- **LCM** and **Turbo** (*adversarial diffusion distillation*) networks
- **LCM**, **Turbo** and **Lightning** (*adversarial diffusion distillation*) networks
- All **LoRA** types such as LoCon, LyCORIS, HADA, IA3, Lokr, OFT
- **IP-Adapters** for SD 1.5 and SD-XL
- **InstantID**, **FaceSwap**, **FaceID**, **PhotoMerge**
- **AnimateDiff** for SD 1.5
- **MuLan** multi-language support

## Platform support

@@ -120,6 +111,19 @@ Also supported are modifiers such as:
- *Apple M1/M2* on *OSX* using built-in support in Torch with **MPS** optimizations
- *ONNX/Olive*

## Backend support

**SD.Next** supports two main backends: *Diffusers* and *Original*:

- **Diffusers**: Based on new [Huggingface Diffusers](https://huggingface.co/docs/diffusers/index) implementation
Supports *all* models listed below
This backend is set as default for new installations
See [wiki article](https://github.com/vladmandic/automatic/wiki/Diffusers) for more information
- **Original**: Based on [LDM](https://github.com/Stability-AI/stablediffusion) reference implementation and significantly expanded on by [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
This backend is fully compatible with most existing functionality and extensions written for *A1111 SDWebUI*
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, Stable Cascade, PixArt, Playground, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Examples

*IP Adapters*:
19 changes: 0 additions & 19 deletions TODO.md
@@ -2,37 +2,18 @@

Main ToDo list can be found at [GitHub projects](https://github.com/users/vladmandic/projects)

## Fix

- ultralytics package install

## Future Candidates

- stable diffusion 3.0: unreleased
- boxdiff <https://github.com/huggingface/diffusers/pull/7947>
- animatediff-sdxl <https://github.com/huggingface/diffusers/pull/6721>
- async lowvram: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14855>
- fp8: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14031>
- profiling: <https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/716>
- kohya-hires-fix: <https://github.com/huggingface/diffusers/pull/7633>
- hunyuan-dit: <https://github.com/huggingface/diffusers/pull/8290>
- init latents: variations, img2img
- diffusers public callbacks
- include reference styles
- lora: sc lora, dora, etc
- controlnet: additional models
- resadapter: <https://github.com/bytedance/res-adapter>
- t-gate: <https://huggingface.co/docs/diffusers/main/en/optimization/tgate>

## Experimental

- [MuLan](https://github.com/mulanai/MuLan) Multi-language prompts - write your prompts in ~110 auto-detected languages!
Compatible with SD15 and SDXL
Enable in scripts -> MuLan and set the encoder to `InternVL-14B-224px`
(that is currently the only supported encoder, but others will be added)
Note: the model will be auto-downloaded on first use: note its huge size of 27GB
Even executing it in FP16 will require ~16GB of VRAM for the text encoder alone
*Note*: uses a fixed prompt parser, so prompt attention will not be applied
- [SDXL Flash Mini](https://huggingface.co/sd-community/sdxl-flash-mini)
a lighter SDXL variant that consumes less video memory, with little drop in quality
to use, simply select from *networks -> models -> reference -> SDXL Flash Mini*
67 changes: 7 additions & 60 deletions cli/image-exif.py
@@ -4,70 +4,17 @@
import io
import re
import sys
import json
import importlib.util
from PIL import Image, ExifTags, TiffImagePlugin, PngImagePlugin
from rich import print # pylint: disable=redefined-builtin


def unquote(text):
    if len(text) == 0 or text[0] != '"' or text[-1] != '"':
        return text
    try:
        return json.loads(text)
    except Exception:
        return text


def parse_generation_parameters(infotext):
    if not isinstance(infotext, str):
        return {}
    re_param = re.compile(r'\s*([\w ]+):\s*("(?:\\"[^,]|\\"|\\|[^\"])+"|[^,]*)(?:,|$)') # multi-word: value
    re_size = re.compile(r"^(\d+)x(\d+)$") # int x int
    basic_params = ['steps', 'seed', 'width', 'height', 'sampler', 'size', 'cfg scale', 'hires'] # first param is one of those

    sanitized = infotext.replace('prompt:', 'Prompt:').replace('negative prompt:', 'Negative prompt:').replace('Negative Prompt', 'Negative prompt') # cleanup everything in brackets so re_params can work
    sanitized = re.sub(r'<[^>]*>', lambda match: ' ' * len(match.group()), sanitized)
    sanitized = re.sub(r'\([^)]*\)', lambda match: ' ' * len(match.group()), sanitized)
    sanitized = re.sub(r'\{[^}]*\}', lambda match: ' ' * len(match.group()), sanitized)

    params = dict(re_param.findall(sanitized))
    params = { k.strip():params[k].strip() for k in params if k.lower() not in ['hashes', 'lora', 'embeddings', 'prompt', 'negative prompt']} # remove some keys
    if len(list(params)) == 0:
        first_param = None
    else:
        try:
            first_param, first_param_idx = next((s, i) for i, s in enumerate(params) if any(x in s.lower() for x in basic_params))
        except Exception:
            first_param, first_param_idx = next(iter(params)), 0
        if first_param_idx > 0:
            for _i in range(first_param_idx):
                params.pop(next(iter(params)))
    params_idx = sanitized.find(f'{first_param}:') if first_param else -1
    negative_idx = infotext.find("Negative prompt:")

    prompt = infotext[:params_idx] if negative_idx == -1 else infotext[:negative_idx] # prompt can be with or without negative prompt
    negative = infotext[negative_idx:params_idx] if negative_idx >= 0 else ''
module_file = os.path.abspath(__file__)
module_dir = os.path.dirname(module_file)
module_spec = importlib.util.spec_from_file_location('infotext', os.path.join(module_dir, '..', 'modules', 'infotext.py'))
infotext = importlib.util.module_from_spec(module_spec)
module_spec.loader.exec_module(infotext)

    for k, v in params.copy().items(): # avoid dict-has-changed
        if len(v) > 0 and v[0] == '"' and v[-1] == '"':
            v = unquote(v)
        m = re_size.match(v)
        if v.replace('.', '', 1).isdigit():
            params[k] = float(v) if '.' in v else int(v)
        elif v == "True":
            params[k] = True
        elif v == "False":
            params[k] = False
        elif m is not None:
            params[f"{k}-1"] = int(m.group(1))
            params[f"{k}-2"] = int(m.group(2))
        elif k == 'VAE' and v == 'TAESD':
            params["Full quality"] = False
        else:
            params[k] = v
    params["Prompt"] = prompt.replace('Prompt:', '').strip()
    params["Negative prompt"] = negative.replace('Negative prompt:', '').strip()
    return params
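
For reference, this parser (whose role now moves to `modules/infotext.py`) turns an infotext string into a typed dict; a rough illustration of the expected shape, not verified output:

```python
sample = 'photo of a cat\nNegative prompt: blurry\nSteps: 20, Sampler: Euler a, CFG scale: 6.0, Seed: 42, Size: 512x512'
params = parse_generation_parameters(sample)
# roughly: {'Steps': 20, 'Sampler': 'Euler a', 'CFG scale': 6.0, 'Seed': 42,
#           'Size': '512x512', 'Size-1': 512, 'Size-2': 512,
#           'Prompt': 'photo of a cat', 'Negative prompt': 'blurry'}
```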


class Exif: # pylint: disable=single-string-used-for-slots
@@ -132,7 +79,7 @@ def decode(self, s: bytes):

    def parse(self):
        x = self.exif.pop('parameters', None) or self.exif.pop('UserComment', None)
        res = parse_generation_parameters(x)
        res = infotext.parse(x)
        return res

    def get_bytes(self):
2 changes: 2 additions & 0 deletions cli/simple-txt2img.py
@@ -48,6 +48,7 @@ def generate(args): # pylint: disable=redefined-outer-name
    options['sampler_name'] = args.sampler
    options['width'] = int(args.width)
    options['height'] = int(args.height)
    options['restore_faces'] = args.faces
    data = post('/sdapi/v1/txt2img', options)
    t1 = time.time()
    if 'images' in data:
@@ -71,6 +72,7 @@ def generate(args): # pylint: disable=redefined-outer-name
    parser.add_argument('--height', required=False, default=512, help='image height')
    parser.add_argument('--steps', required=False, default=20, help='number of steps')
    parser.add_argument('--seed', required=False, default=-1, help='initial seed')
    parser.add_argument('--faces', action='store_true', help='restore faces')
    parser.add_argument('--sampler', required=False, default='Euler a', help='sampler name')
    parser.add_argument('--output', required=False, default=None, help='output image file')
    parser.add_argument('--model', required=False, help='model name')
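
The new `--faces` flag maps directly onto the `restore_faces` field of the `/sdapi/v1/txt2img` payload; a minimal sketch of the equivalent raw API call (server URL and prompt are assumptions):

```python
import requests

payload = {
    'prompt': 'photo of a cat',  # assumed prompt
    'steps': 20,
    'sampler_name': 'Euler a',
    'width': 512,
    'height': 512,
    'restore_faces': True,       # what the new --faces flag enables
}
res = requests.post('http://127.0.0.1:7860/sdapi/v1/txt2img', json=payload, timeout=300)
images = res.json().get('images', [])
```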
41 changes: 41 additions & 0 deletions configs/sd3/model_index.json
@@ -0,0 +1,41 @@
{
  "_class_name": "StableDiffusion3Pipeline",
  "_diffusers_version": "0.29.0.dev0",
  "_name_or_path": "stabilityai/stable-diffusion-3-medium",
  "scheduler": [
    "diffusers",
    "FlowMatchEulerDiscreteScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModelWithProjection"
  ],
  "text_encoder_2": [
    "transformers",
    "CLIPTextModelWithProjection"
  ],
  "text_encoder_3": [
    "transformers",
    "T5EncoderModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_2": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_3": [
    "transformers",
    "T5TokenizerFast"
  ],
  "transformer": [
    "diffusers",
    "SD3Transformer2DModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
6 changes: 6 additions & 0 deletions configs/sd3/scheduler/scheduler_config.json
@@ -0,0 +1,6 @@
{
  "_class_name": "FlowMatchEulerDiscreteScheduler",
  "_diffusers_version": "0.29.0.dev0",
  "num_train_timesteps": 1000,
  "shift": 3.0
}
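
This config maps one-to-one onto the diffusers scheduler class that backs the new **Euler FlowMatch** sampler; as a sketch, the same scheduler can be constructed directly:

```python
from diffusers import FlowMatchEulerDiscreteScheduler

# mirrors configs/sd3/scheduler/scheduler_config.json
scheduler = FlowMatchEulerDiscreteScheduler(num_train_timesteps=1000, shift=3.0)
```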