Commit
Merge branch 'dev' into master
vladmandic authored Mar 1, 2024
2 parents d182cd5 + 4345794 commit 013d5a0
Showing 68 changed files with 1,570 additions and 2,606 deletions.
1 change: 1 addition & 0 deletions .pylintrc
@@ -16,6 +16,7 @@ ignore-paths=/usr/lib/.*$,
^modules/dml/.*$,
^modules/models/diffusion/.*$,
^modules/xadapters/.*$,
^modules/tcd/.*$,
ignore-patterns=
ignored-modules=
jobs=0
128 changes: 76 additions & 52 deletions README.md
@@ -56,19 +56,6 @@ For screenshots and information on other available themes, see [Themes Wiki](ht
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, PixArt, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Control

**SD.Next** comes with built-in control for all types of processing: text2image, image2image, video2video, and batch

*Control interface*:
![Screenshot-Control](html/screenshot-control.jpg)

*Control processors*:
![Screenshot-Process](html/screenshot-processors.jpg)

*Masking*:
![Screenshot-Mask](html/screenshot-mask.jpg)

## Model support

Additional models will be added as they become available and there is public interest in them
@@ -110,7 +97,6 @@ Also supported are modifiers such as:
*InstantID*:
![Screenshot-InstantID](html/screenshot-instantid.jpg)


> [!IMPORTANT]
> - Loading any model other than standard SD 1.x / SD 2.x requires use of backend **Diffusers**
> - Loading any other models using **Original** backend is not supported
@@ -151,51 +137,89 @@ Also supported are modifiers such as:

Once SD.Next is installed, simply run `webui.ps1` or `webui.bat` (*Windows*) or `webui.sh` (*Linux or MacOS*)

Below is a partial list of all available parameters; run `webui --help` for the full list:
List of available parameters; run `webui --help` for the full & up-to-date list:

Server options:
--config CONFIG Use specific server configuration file, default: config.json
--ui-config UI_CONFIG Use specific UI configuration file, default: ui-config.json
--medvram Split model stages and keep only active part in VRAM, default: False
--lowvram Split model components and keep only active part in VRAM, default: False
--ckpt CKPT Path to model checkpoint to load immediately, default: None
--vae VAE Path to VAE checkpoint to load immediately, default: None
--data-dir DATA_DIR Base path where all user data is stored, default:
--models-dir MODELS_DIR Base path where all models are stored, default: models
--share Enable UI accessible through Gradio site, default: False
--insecure Enable extensions tab regardless of other options, default: False
--listen Launch web server using public IP address, default: False
--auth AUTH Set access authentication like "user:pwd,user:pwd"
--autolaunch Open the UI URL in the system's default browser upon launch
--docs Mount API docs, default: False
--no-hashing Disable hashing of checkpoints, default: False
--no-metadata Disable reading of metadata from models, default: False
--backend {original,diffusers} force model pipeline type
--config CONFIG Use specific server configuration file, default: config.json
--ui-config UI_CONFIG Use specific UI configuration file, default: ui-config.json
--medvram Split model stages and keep only active part in VRAM, default: False
--lowvram Split model components and keep only active part in VRAM, default: False
--ckpt CKPT Path to model checkpoint to load immediately, default: None
--vae VAE Path to VAE checkpoint to load immediately, default: None
--data-dir DATA_DIR Base path where all user data is stored, default:
--models-dir MODELS_DIR Base path where all models are stored, default: models
--allow-code Allow custom script execution, default: False
--share Enable UI accessible through Gradio site, default: False
--insecure Enable extensions tab regardless of other options, default: False
--use-cpu USE_CPU [USE_CPU ...] Force use CPU for specified modules, default: []
--listen Launch web server using public IP address, default: False
--port PORT Launch web server with given server port, default: 7860
--freeze Disable editing settings
--auth AUTH Set access authentication like "user:pwd,user:pwd"
--auth-file AUTH_FILE Set access authentication using file, default: None
--autolaunch Open the UI URL in the system's default browser upon launch
--docs Mount API docs, default: False
--api-only Run in API only mode without starting UI
--api-log Enable logging of all API requests, default: False
--device-id DEVICE_ID Select the default CUDA device to use, default: None
--cors-origins CORS_ORIGINS Allowed CORS origins as comma-separated list, default: None
--cors-regex CORS_REGEX Allowed CORS origins as regular expression, default: None
--tls-keyfile TLS_KEYFILE Enable TLS and specify key file, default: None
--tls-certfile TLS_CERTFILE Enable TLS and specify cert file, default: None
--tls-selfsign Enable TLS with self-signed certificates, default: False
--server-name SERVER_NAME Sets hostname of server, default: None
--no-hashing Disable hashing of checkpoints, default: False
--no-metadata Disable reading of metadata from models, default: False
--disable-queue Disable queues, default: False
--subpath SUBPATH Customize the URL subpath for usage with reverse proxy
--backend {original,diffusers} force model pipeline type
--allowed-paths ALLOWED_PATHS [ALLOWED_PATHS ...] add additional paths to paths allowed for web access

Setup options:
--debug Run installer with debug logging, default: False
--reset Reset main repository to latest version, default: False
--upgrade Upgrade main repository to latest version, default: False
--requirements Force re-check of requirements, default: False
--quick Run with startup sequence only, default: False
--use-directml Use DirectML if no compatible GPU is detected, default: False
--use-openvino Use Intel OpenVINO backend, default: False
--use-ipex Force use Intel OneAPI XPU backend, default: False
--use-cuda Force use nVidia CUDA backend, default: False
--use-rocm Force use AMD ROCm backend, default: False
--use-xformers Force use xFormers cross-optimization, default: False
--skip-requirements Skips checking and installing requirements, default: False
--skip-extensions Skips running individual extension installers, default: False
--skip-git Skips running all GIT operations, default: False
--skip-torch Skips running Torch checks, default: False
--skip-all Skips running all checks, default: False
--experimental Allow unsupported versions of libraries, default: False
--reinstall Force reinstallation of all requirements, default: False
--safe Run in safe mode with no user extensions

--reset Reset main repository to latest version, default: False
--upgrade Upgrade main repository to latest version, default: False
--requirements Force re-check of requirements, default: False
--quick Bypass version checks, default: False
--use-directml Use DirectML if no compatible GPU is detected, default: False
--use-openvino Use Intel OpenVINO backend, default: False
--use-ipex Force use Intel OneAPI XPU backend, default: False
--use-cuda Force use nVidia CUDA backend, default: False
--use-rocm Force use AMD ROCm backend, default: False
--use-zluda Force use ZLUDA, AMD GPUs only, default: False
--use-xformers Force use xFormers cross-optimization, default: False
--skip-requirements Skips checking and installing requirements, default: False
--skip-extensions Skips running individual extension installers, default: False
--skip-git Skips running all GIT operations, default: False
--skip-torch Skips running Torch checks, default: False
--skip-all Skips running all checks, default: False
--skip-env Skips setting of env variables during startup, default: False
--experimental Allow unsupported versions of libraries, default: False
--reinstall Force reinstallation of all requirements, default: False
--test Run test only and exit
--version Print version information
--ignore Ignore any errors and attempt to continue
--safe Run in safe mode with no user extensions

Logging options:
--log LOG Set log file, default: None
--debug Run installer with debug logging, default: False
--profile Run profiler, default: False
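
For reference, a minimal sketch (not part of the README itself) of how these server options combine in practice: assuming the server was started with `--listen --port 7860 --auth user:pwd`, an API client authenticates with HTTP basic auth, the same pattern used by the `cli/simple-info.py` script added in this commit. Host, port, and credentials below are placeholders.

```python
# Sketch: query a server started with `webui.sh --listen --port 7860 --auth user:pwd`.
# The /sdapi/v1/options endpoint and HTTPBasicAuth pattern follow cli/simple-info.py.
import requests
from requests.auth import HTTPBasicAuth

resp = requests.get('http://127.0.0.1:7860/sdapi/v1/options',
                    auth=HTTPBasicAuth('user', 'pwd'), timeout=60)
resp.raise_for_status()
print(resp.json())  # current server settings as JSON
```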

## Notes

### Control

**SD.Next** comes with built-in control for all types of processing: text2image, image2image, video2video, and batch

*Control interface*:
![Screenshot-Control](html/screenshot-control.jpg)

*Control processors*:
![Screenshot-Process](html/screenshot-processors.jpg)

*Masking*:
![Screenshot-Mask](html/screenshot-mask.jpg)

### **Extensions**

SD.Next comes with several extensions pre-installed:
Expand Down
10 changes: 3 additions & 7 deletions TODO.md
@@ -5,18 +5,14 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
## Candidates for next release

- defork
- stable cascade: <https://github.com/vladmandic/automatic/wiki/Stable-Cascade>
- stable diffusion 3.0
- ipadapter masking: <https://github.com/huggingface/diffusers/pull/6847>
- init latents: variations, tiling, img2img
- x-adapter: <https://github.com/showlab/X-Adapter>
- diffusers public callbacks
- image2video: pia and vgen pipelines
- video2video
- async lowvram: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14855>
- init latents: variations, tiling, img2img
- diffusers public callbacks
- remove builtin: controlnet
- remove builtin: image-browser
- remove training: ti
- remove training: hypernetwork

## Control missing features

57 changes: 57 additions & 0 deletions cli/simple-info.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python
import os
import time
import base64
import logging
import argparse
import requests
import urllib3


sd_url = os.environ.get('SDAPI_URL', "http://127.0.0.1:7860")
sd_username = os.environ.get('SDAPI_USR', None)
sd_password = os.environ.get('SDAPI_PWD', None)


logging.basicConfig(level = logging.INFO, format = '%(asctime)s %(levelname)s: %(message)s')
log = logging.getLogger(__name__)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def auth():
    if sd_username is not None and sd_password is not None:
        return requests.auth.HTTPBasicAuth(sd_username, sd_password)
    return None


def get(endpoint: str, dct: dict = None):
    req = requests.get(f'{sd_url}{endpoint}', json=dct, timeout=300, verify=False, auth=auth())
    if req.status_code != 200:
        return { 'error': req.status_code, 'reason': req.reason, 'url': req.url }
    else:
        return req.json()


def post(endpoint: str, dct: dict = None):
    req = requests.post(f'{sd_url}{endpoint}', json = dct, timeout=300, verify=False, auth=auth())
    if req.status_code != 200:
        return { 'error': req.status_code, 'reason': req.reason, 'url': req.url }
    else:
        return req.json()


def info(args): # pylint: disable=redefined-outer-name
    t0 = time.time()
    with open(args.input, 'rb') as f:
        content = f.read()
    data = post('/sdapi/v1/png-info', { 'image': base64.b64encode(content).decode() })
    t1 = time.time()
    log.info(f'received: {data} time={t1-t0:.2f}')


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description = 'simple-info')
    parser.add_argument('--input', required=True, help='input image')
    args = parser.parse_args()
    log.info(f'info: {args}')
    info(args)
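
The script boils down to a single `POST /sdapi/v1/png-info` call with a base64-encoded image; a hedged standalone equivalent (the input filename is a placeholder, and a default unauthenticated local server is assumed):

```python
# Standalone equivalent of `python cli/simple-info.py --input image.png`
import base64
import requests

with open('image.png', 'rb') as f:  # placeholder input image
    payload = {'image': base64.b64encode(f.read()).decode()}
resp = requests.post('http://127.0.0.1:7860/sdapi/v1/png-info', json=payload, timeout=300)
print(resp.json())  # embedded generation parameters, if present
```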
43 changes: 43 additions & 0 deletions configs/playground-v2.5-1024px-aesthetic.fp16_vae.json
@@ -0,0 +1,43 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.27.0.dev0",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "force_upcast": true,
  "in_channels": 3,
  "latent_channels": 4,
  "layers_per_block": 2,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 1024,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ],
  "latents_mean": [
    -1.6574,
    1.886,
    -1.383,
    2.5155
  ],
  "latents_std": [
    8.4927,
    5.9022,
    6.5498,
    5.2299
  ],
  "scaling_factor": 0.5
}
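
The `latents_mean` and `latents_std` fields are the notable additions in this config: Playground v2.5 uses a shifted, per-channel-scaled latent space, so latents must be denormalized before VAE decoding. A sketch of that step using the values above, following the diffusers SDXL pipeline convention (the assumption being that SD.Next inherits this path via diffusers):

```python
# Sketch: denormalize latents before VAE decode when the VAE config supplies
# per-channel latents_mean/latents_std (values taken from the config above).
import torch

latents = torch.randn(1, 4, 128, 128)  # stand-in for denoised latents
latents_mean = torch.tensor([-1.6574, 1.886, -1.383, 2.5155]).view(1, 4, 1, 1)
latents_std = torch.tensor([8.4927, 5.9022, 6.5498, 5.2299]).view(1, 4, 1, 1)
scaling_factor = 0.5

latents = latents * latents_std / scaling_factor + latents_mean
# image = vae.decode(latents).sample  # then decode as usual
```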
2 changes: 1 addition & 1 deletion html/locale_en.json
@@ -133,7 +133,7 @@
{"id":"","label":"Refiner start","localized":"","hint":"Refiner pass will start when base model is this much complete (set to 0 or 1 to run after full base model run)"},
{"id":"","label":"Refiner steps","localized":"","hint":"Number of steps to use for refiner pass"},
{"id":"","label":"Secondary CFG Scale","localized":"","hint":"CFG scale used for refiner pass"},
{"id":"","label":"Guidance rescale","localized":"","hint":"Rescale CFG generated noise to avoid overexposed images"},
{"id":"","label":"Rescale guidance","localized":"","hint":"Rescale CFG generated noise to avoid overexposed images"},
{"id":"","label":"Secondary Prompt","localized":"","hint":"Prompt used for both second encoder in base model (if it exists) and for refiner pass (if enabled)"},
{"id":"","label":"Secondary negative prompt","localized":"","hint":"Negative prompt used for both second encoder in base model (if it exists) and for refiner pass (if enabled)"},
{"id":"","label":"Width","localized":"","hint":"Image width"},
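The relabeled "Rescale guidance" option above corresponds to CFG rescale; for context, a sketch of the standard operation as implemented by diffusers' `rescale_noise_cfg` helper (the assumption being that this is the algorithm behind the option):

```python
# Sketch of CFG guidance rescale (Lin et al., "Common Diffusion Noise Schedules
# and Sample Steps are Flawed"), mirroring diffusers' rescale_noise_cfg helper.
import torch

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.7):
    # match the std of the CFG-combined prediction to the text-conditioned one
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    rescaled = noise_cfg * (std_text / std_cfg)
    # blend with the unrescaled prediction to avoid overly plain images
    return guidance_rescale * rescaled + (1.0 - guidance_rescale) * noise_cfg
```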
13 changes: 13 additions & 0 deletions html/reference.json
@@ -169,6 +169,11 @@
"desc": "Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground. Images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL, according to Playground’s user study.",
"preview": "playgroundai--playground-v2-1024px-aesthetic.jpg"
},
"Playground v2.5": {
"path": "playground-v2.5-1024px-aesthetic.fp16.safetensors@https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/resolve/main/playground-v2.5-1024px-aesthetic.fp16.safetensors?download=true",
"desc": "Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2. Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.",
"preview": "playgroundai--playground-v2-1024px-aesthetic.jpg"
},
"DeepFloyd IF Medium": {
"path": "DeepFloyd/IF-I-M-v1.0",
"desc": "DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model, that can generate pictures with new state-of-the-art for photorealism and language understanding. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID-30K score of 6.66 on the COCO dataset. It is modular and composed of frozen text mode and three pixel cascaded diffusion modules, each designed to generate images of increasing resolution: 64x64, 256x256, and 1024x1024.",
Expand All @@ -184,6 +189,14 @@
"desc": "Amused is a lightweight text to image model based off of the muse architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.",
"preview": "amused--amused-512.jpg"
},
"KOALA 700M": {
"path": "huggingface/etri-vilab/koala-700m-llava-cap",
"variant": "fp16",
"skip": true,
"desc": "Fast text-to-image model, called KOALA, by compressing SDXL's U-Net and distilling knowledge from SDXL into our model. KOALA-700M can generate a 1024x1024 image in less than 1.5 seconds on an NVIDIA 4090 GPU, which is more than 2x faster than SDXL.",
"preview": "etri-vilab--koala-700m-llava-cap.jpg"
},
"Tsinghua UniDiffuser": {
"path": "thu-ml/unidiffuser-v1",
"desc": "UniDiffuser is a unified diffusion framework to fit all distributions relevant to a set of multi-modal data in one transformer. UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.\nSpecifically, UniDiffuser employs a variation of transformer, called U-ViT, which parameterizes the joint noise prediction network. Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from Stable Diffusion, a pretrained image ViT-B/32 CLIP encoder, a pretrained text ViT-L CLIP encoder, and a GPT-2 text decoder finetuned by ourselves.",
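As a usage note for entries like this one, a hedged sketch of loading the UniDiffuser reference through diffusers (pipeline class and call pattern per the diffusers documentation; device and dtype are assumptions):

```python
# Sketch: load the thu-ml/unidiffuser-v1 reference via diffusers' UniDiffuserPipeline
# and run text-to-image (the pipeline infers the mode from the inputs).
import torch
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained('thu-ml/unidiffuser-v1', torch_dtype=torch.float16)
pipe = pipe.to('cuda')  # assumes a CUDA device
result = pipe(prompt='an astronaut riding a horse', num_inference_steps=20)
result.images[0].save('unidiffuser.png')
```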