Commit
Merge branch 'dev' into master
vladmandic authored Mar 1, 2024
2 parents d182cd5 + 4345794 commit 013d5a0
Showing 68 changed files with 1,570 additions and 2,606 deletions.
1 change: 1 addition & 0 deletions .pylintrc
@@ -16,6 +16,7 @@ ignore-paths=/usr/lib/.*$,
^modules/dml/.*$,
^modules/models/diffusion/.*$,
^modules/xadapters/.*$,
^modules/tcd/.*$,
ignore-patterns=
ignored-modules=
jobs=0
128 changes: 76 additions & 52 deletions README.md
@@ -56,19 +56,6 @@ For screenshots and information on other available themes, see [Themes Wiki](ht
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, PixArt, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Control

**SD.Next** comes with built-in control for all types of processing: text2image, image2image, video2video, and batch

*Control interface*:
![Screenshot-Control](html/screenshot-control.jpg)

*Control processors*:
![Screenshot-Process](html/screenshot-processors.jpg)

*Masking*:
![Screenshot-Mask](html/screenshot-mask.jpg)

## Model support

Additional models will be added as they become available and there is public interest in them
@@ -110,7 +97,6 @@ Also supported are modifiers such as:
*InstantID*:
![Screenshot-InstantID](html/screenshot-instantid.jpg)


> [!IMPORTANT]
> - Loading any model other than standard SD 1.x / SD 2.x requires use of backend **Diffusers**
> - Loading any other models using **Original** backend is not supported
@@ -151,51 +137,89 @@ Also supported are modifiers such as:

Once SD.Next is installed, simply run `webui.ps1` or `webui.bat` (*Windows*) or `webui.sh` (*Linux or MacOS*)

Below is a partial list of all available parameters; run `webui --help` for the full list:
List of available parameters; run `webui --help` for the full & up-to-date list:

Server options:
--config CONFIG Use specific server configuration file, default: config.json
--ui-config UI_CONFIG Use specific UI configuration file, default: ui-config.json
--medvram Split model stages and keep only active part in VRAM, default: False
--lowvram Split model components and keep only active part in VRAM, default: False
--ckpt CKPT Path to model checkpoint to load immediately, default: None
--vae VAE Path to VAE checkpoint to load immediately, default: None
--data-dir DATA_DIR Base path where all user data is stored, default:
--models-dir MODELS_DIR Base path where all models are stored, default: models
--share Enable UI accessible through Gradio site, default: False
--insecure Enable extensions tab regardless of other options, default: False
--listen Launch web server using public IP address, default: False
--auth AUTH Set access authentication like "user:pwd,user:pwd"
--autolaunch Open the UI URL in the system's default browser upon launch
--docs Mount API docs, default: False
--no-hashing Disable hashing of checkpoints, default: False
--no-metadata Disable reading of metadata from models, default: False
--backend {original,diffusers} force model pipeline type
--config CONFIG Use specific server configuration file, default: config.json
--ui-config UI_CONFIG Use specific UI configuration file, default: ui-config.json
--medvram Split model stages and keep only active part in VRAM, default: False
--lowvram Split model components and keep only active part in VRAM, default: False
--ckpt CKPT Path to model checkpoint to load immediately, default: None
--vae VAE Path to VAE checkpoint to load immediately, default: None
--data-dir DATA_DIR Base path where all user data is stored, default:
--models-dir MODELS_DIR Base path where all models are stored, default: models
--allow-code Allow custom script execution, default: False
--share Enable UI accessible through Gradio site, default: False
--insecure Enable extensions tab regardless of other options, default: False
--use-cpu USE_CPU [USE_CPU ...] Force use CPU for specified modules, default: []
--listen Launch web server using public IP address, default: False
--port PORT Launch web server with given server port, default: 7860
--freeze Disable editing settings
--auth AUTH Set access authentication like "user:pwd,user:pwd"
--auth-file AUTH_FILE Set access authentication using file, default: None
--autolaunch Open the UI URL in the system's default browser upon launch
--docs Mount API docs, default: False
--api-only Run in API only mode without starting UI
--api-log Enable logging of all API requests, default: False
--device-id DEVICE_ID Select the default CUDA device to use, default: None
--cors-origins CORS_ORIGINS Allowed CORS origins as comma-separated list, default: None
--cors-regex CORS_REGEX Allowed CORS origins as regular expression, default: None
--tls-keyfile TLS_KEYFILE Enable TLS and specify key file, default: None
--tls-certfile TLS_CERTFILE Enable TLS and specify cert file, default: None
--tls-selfsign Enable TLS with self-signed certificates, default: False
--server-name SERVER_NAME Sets hostname of server, default: None
--no-hashing Disable hashing of checkpoints, default: False
--no-metadata Disable reading of metadata from models, default: False
--disable-queue Disable queues, default: False
--subpath SUBPATH Customize the URL subpath for usage with reverse proxy
--backend {original,diffusers} force model pipeline type
--allowed-paths ALLOWED_PATHS [ALLOWED_PATHS ...] add additional paths to paths allowed for web access

Setup options:
--debug Run installer with debug logging, default: False
--reset Reset main repository to latest version, default: False
--upgrade Upgrade main repository to latest version, default: False
--requirements Force re-check of requirements, default: False
--quick Run with startup sequence only, default: False
--use-directml Use DirectML if no compatible GPU is detected, default: False
--use-openvino Use Intel OpenVINO backend, default: False
--use-ipex Force use Intel OneAPI XPU backend, default: False
--use-cuda Force use nVidia CUDA backend, default: False
--use-rocm Force use AMD ROCm backend, default: False
--use-xformers Force use xFormers cross-optimization, default: False
--skip-requirements Skips checking and installing requirements, default: False
--skip-extensions Skips running individual extension installers, default: False
--skip-git Skips running all GIT operations, default: False
--skip-torch Skips running Torch checks, default: False
--skip-all Skips running all checks, default: False
--experimental Allow unsupported versions of libraries, default: False
--reinstall Force reinstallation of all requirements, default: False
--safe Run in safe mode with no user extensions

--reset Reset main repository to latest version, default: False
--upgrade Upgrade main repository to latest version, default: False
--requirements Force re-check of requirements, default: False
--quick Bypass version checks, default: False
--use-directml Use DirectML if no compatible GPU is detected, default: False
--use-openvino Use Intel OpenVINO backend, default: False
--use-ipex Force use Intel OneAPI XPU backend, default: False
--use-cuda Force use nVidia CUDA backend, default: False
--use-rocm Force use AMD ROCm backend, default: False
--use-zluda Force use ZLUDA, AMD GPUs only, default: False
--use-xformers Force use xFormers cross-optimization, default: False
--skip-requirements Skips checking and installing requirements, default: False
--skip-extensions Skips running individual extension installers, default: False
--skip-git Skips running all GIT operations, default: False
--skip-torch Skips running Torch checks, default: False
--skip-all Skips running all checks, default: False
--skip-env Skips setting of env variables during startup, default: False
--experimental Allow unsupported versions of libraries, default: False
--reinstall Force reinstallation of all requirements, default: False
--test Run test only and exit
--version Print version information
--ignore Ignore any errors and attempt to continue
--safe Run in safe mode with no user extensions

Logging options:
--log LOG Set log file, default: None
--debug Run installer with debug logging, default: False
--profile Run profiler, default: False
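
For reference, a minimal sketch (not part of the README itself) of how these server options combine in practice: assuming the server was started with `--listen --port 7860 --auth user:pwd`, an API client authenticates with HTTP basic auth, the same pattern used by the `cli/simple-info.py` script added in this commit. Host, port, and credentials below are placeholders.

```python
# Sketch: query a server started with `webui.sh --listen --port 7860 --auth user:pwd`.
# The /sdapi/v1/options endpoint and HTTPBasicAuth pattern follow cli/simple-info.py.
import requests
from requests.auth import HTTPBasicAuth

resp = requests.get('http://127.0.0.1:7860/sdapi/v1/options',
                    auth=HTTPBasicAuth('user', 'pwd'), timeout=60)
resp.raise_for_status()
print(resp.json())  # current server settings as JSON
```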

## Notes

### Control

**SD.Next** comes with built-in control for all types of processing: text2image, image2image, video2video, and batch

*Control interface*:
![Screenshot-Control](html/screenshot-control.jpg)

*Control processors*:
![Screenshot-Process](html/screenshot-processors.jpg)

*Masking*:
![Screenshot-Mask](html/screenshot-mask.jpg)

### **Extensions**

SD.Next comes with several extensions pre-installed:
Expand Down
10 changes: 3 additions & 7 deletions TODO.md
@@ -5,18 +5,14 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
## Candidates for next release

- defork
- stable cascade: <https://github.com/vladmandic/automatic/wiki/Stable-Cascade>
- stable diffusion 3.0
- ipadapter masking: <https://github.com/huggingface/diffusers/pull/6847>
- init latents: variations, tiling, img2img
- x-adapter: <https://github.com/showlab/X-Adapter>
- diffusers public callbacks
- image2video: pia and vgen pipelines
- video2video
- async lowvram: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14855>
- init latents: variations, tiling, img2img
- diffusers public callbacks
- remove builtin: controlnet
- remove builtin: image-browser
- remove training: ti
- remove training: hypernetwork

## Control missing features

57 changes: 57 additions & 0 deletions cli/simple-info.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python
import os
import time
import base64
import logging
import argparse
import requests
import urllib3


sd_url = os.environ.get('SDAPI_URL', "http://127.0.0.1:7860")
sd_username = os.environ.get('SDAPI_USR', None)
sd_password = os.environ.get('SDAPI_PWD', None)


logging.basicConfig(level = logging.INFO, format = '%(asctime)s %(levelname)s: %(message)s')
log = logging.getLogger(__name__)
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def auth():
    if sd_username is not None and sd_password is not None:
        return requests.auth.HTTPBasicAuth(sd_username, sd_password)
    return None


def get(endpoint: str, dct: dict = None):
    req = requests.get(f'{sd_url}{endpoint}', json=dct, timeout=300, verify=False, auth=auth())
    if req.status_code != 200:
        return { 'error': req.status_code, 'reason': req.reason, 'url': req.url }
    else:
        return req.json()


def post(endpoint: str, dct: dict = None):
    req = requests.post(f'{sd_url}{endpoint}', json = dct, timeout=300, verify=False, auth=auth())
    if req.status_code != 200:
        return { 'error': req.status_code, 'reason': req.reason, 'url': req.url }
    else:
        return req.json()


def info(args): # pylint: disable=redefined-outer-name
    t0 = time.time()
    with open(args.input, 'rb') as f:
        content = f.read()
    data = post('/sdapi/v1/png-info', { 'image': base64.b64encode(content).decode() })
    t1 = time.time()
    log.info(f'received: {data} time={t1-t0:.2f}')


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description = 'simple-info')
    parser.add_argument('--input', required=True, help='input image')
    args = parser.parse_args()
    log.info(f'info: {args}')
    info(args)
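
The script boils down to a single `POST /sdapi/v1/png-info` call with a base64-encoded image; a hedged standalone equivalent (the input filename is a placeholder, and a default unauthenticated local server is assumed):

```python
# Standalone equivalent of `python cli/simple-info.py --input image.png`
import base64
import requests

with open('image.png', 'rb') as f:  # placeholder input image
    payload = {'image': base64.b64encode(f.read()).decode()}
resp = requests.post('http://127.0.0.1:7860/sdapi/v1/png-info', json=payload, timeout=300)
print(resp.json())  # embedded generation parameters, if present
```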
43 changes: 43 additions & 0 deletions configs/playground-v2.5-1024px-aesthetic.fp16_vae.json
@@ -0,0 +1,43 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.27.0.dev0",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "force_upcast": true,
  "in_channels": 3,
  "latent_channels": 4,
  "layers_per_block": 2,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 1024,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ],
  "latents_mean": [
    -1.6574,
    1.886,
    -1.383,
    2.5155
  ],
  "latents_std": [
    8.4927,
    5.9022,
    6.5498,
    5.2299
  ],
  "scaling_factor": 0.5
}
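
The `latents_mean` and `latents_std` fields are the notable additions in this config: Playground v2.5 uses a shifted, per-channel-scaled latent space, so latents must be denormalized before VAE decoding. A sketch of that step using the values above, following the diffusers SDXL pipeline convention (the assumption being that SD.Next inherits this path via diffusers):

```python
# Sketch: denormalize latents before VAE decode when the VAE config supplies
# per-channel latents_mean/latents_std (values taken from the config above).
import torch

latents = torch.randn(1, 4, 128, 128)  # stand-in for denoised latents
latents_mean = torch.tensor([-1.6574, 1.886, -1.383, 2.5155]).view(1, 4, 1, 1)
latents_std = torch.tensor([8.4927, 5.9022, 6.5498, 5.2299]).view(1, 4, 1, 1)
scaling_factor = 0.5

latents = latents * latents_std / scaling_factor + latents_mean
# image = vae.decode(latents).sample  # then decode as usual
```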
2 changes: 1 addition & 1 deletion html/locale_en.json
@@ -133,7 +133,7 @@
{"id":"","label":"Refiner start","localized":"","hint":"Refiner pass will start when base model is this much complete (set to 0 or 1 to run after full base model run)"},
{"id":"","label":"Refiner steps","localized":"","hint":"Number of steps to use for refiner pass"},
{"id":"","label":"Secondary CFG Scale","localized":"","hint":"CFG scale used for refiner pass"},
{"id":"","label":"Guidance rescale","localized":"","hint":"Rescale CFG generated noise to avoid overexposed images"},
{"id":"","label":"Rescale guidance","localized":"","hint":"Rescale CFG generated noise to avoid overexposed images"},
{"id":"","label":"Secondary Prompt","localized":"","hint":"Prompt used for both second encoder in base model (if it exists) and for refiner pass (if enabled)"},
{"id":"","label":"Secondary negative prompt","localized":"","hint":"Negative prompt used for both second encoder in base model (if it exists) and for refiner pass (if enabled)"},
{"id":"","label":"Width","localized":"","hint":"Image width"},
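The relabeled "Rescale guidance" option above corresponds to CFG rescale; for context, a sketch of the standard operation as implemented by diffusers' `rescale_noise_cfg` helper (the assumption being that this is the algorithm behind the option):

```python
# Sketch of CFG guidance rescale (Lin et al., "Common Diffusion Noise Schedules
# and Sample Steps are Flawed"), mirroring diffusers' rescale_noise_cfg helper.
import torch

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.7):
    # match the std of the CFG-combined prediction to the text-conditioned one
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    rescaled = noise_cfg * (std_text / std_cfg)
    # blend with the unrescaled prediction to avoid overly plain images
    return guidance_rescale * rescaled + (1.0 - guidance_rescale) * noise_cfg
```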
13 changes: 13 additions & 0 deletions html/reference.json
@@ -169,6 +169,11 @@
"desc": "Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground. Images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL, according to Playground’s user study.",
"preview": "playgroundai--playground-v2-1024px-aesthetic.jpg"
},
"Playground v2.5": {
"path": "playground-v2.5-1024px-aesthetic.fp16.safetensors@https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/resolve/main/playground-v2.5-1024px-aesthetic.fp16.safetensors?download=true",
"desc": "Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2. Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.",
"preview": "playgroundai--playground-v2-1024px-aesthetic.jpg"
},
"DeepFloyd IF Medium": {
"path": "DeepFloyd/IF-I-M-v1.0",
"desc": "DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model, that can generate pictures with new state-of-the-art for photorealism and language understanding. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID-30K score of 6.66 on the COCO dataset. It is modular and composed of frozen text mode and three pixel cascaded diffusion modules, each designed to generate images of increasing resolution: 64x64, 256x256, and 1024x1024.",
Expand All @@ -184,6 +189,14 @@
"desc": "Amused is a lightweight text to image model based off of the muse architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.",
"preview": "amused--amused-512.jpg"
},
"KOALA 700M": {
"path": "huggingface/etri-vilab/koala-700m-llava-cap",
"variant": "fp16",
"skip": true,
"desc": "Fast text-to-image model, called KOALA, by compressing SDXL's U-Net and distilling knowledge from SDXL into our model. KOALA-700M can generate a 1024x1024 image in less than 1.5 seconds on an NVIDIA 4090 GPU, which is more than 2x faster than SDXL.",
"preview": "etri-vilab--koala-700m-llava-cap.jpg"
},
"Tsinghua UniDiffuser": {
"path": "thu-ml/unidiffuser-v1",
"desc": "UniDiffuser is a unified diffusion framework to fit all distributions relevant to a set of multi-modal data in one transformer. UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.\nSpecifically, UniDiffuser employs a variation of transformer, called U-ViT, which parameterizes the joint noise prediction network. Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from Stable Diffusion, a pretrained image ViT-B/32 CLIP encoder, a pretrained text ViT-L CLIP encoder, and a GPT-2 text decoder finetuned by ourselves.",
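As a usage note for entries like this one, a hedged sketch of loading the UniDiffuser reference through diffusers (pipeline class and call pattern per the diffusers documentation; device and dtype are assumptions):

```python
# Sketch: load the thu-ml/unidiffuser-v1 reference via diffusers' UniDiffuserPipeline
# and run text-to-image (the pipeline infers the mode from the inputs).
import torch
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained('thu-ml/unidiffuser-v1', torch_dtype=torch.float16)
pipe = pipe.to('cuda')  # assumes a CUDA device
result = pipe(prompt='an astronaut riding a horse', num_inference_steps=20)
result.images[0].save('unidiffuser.png')
```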