Stable Diffusion: The Heavyweight of AI Image Generation

2025-09-13 00:12:51
Tags: AI image generation · AI image inpainting · AI image style transfer · Free for commercial use
stablediffusion Stability-AI/stablediffusion

When it comes to AI image generation, there is no getting around Stable Diffusion. Open-sourced by Stability AI, it supports text-to-image generation, image inpainting, and style transfer, and it is free for commercial use (non-commercial use is, of course, also fine). Install a WebUI on an ordinary PC and you can start generating right away.

Project size: 73.44 KB
Languages: Python 100.00%
License: see LICENSE

Stable Diffusion Version 2


This repository contains Stable Diffusion models trained from scratch and will be continuously updated with
new checkpoints. The following list provides an overview of all currently available models. More coming soon.

News

March 24, 2023

Stable UnCLIP 2.1

December 7, 2022

Version 2.1

  • New stable diffusion model (Stable Diffusion 2.1-v, Hugging Face) at 768x768 resolution and (Stable Diffusion 2.1-base, HuggingFace) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned from 2.0 on a less restrictive NSFW filtering of the LAION-5B dataset.
    By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with ATTN_PRECISION=fp16 python <your_script.py> (see the sketch after this list).
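
A minimal sketch of launching the sampling script with half-precision attention, assuming ATTN_PRECISION is read from the environment as described above; the checkpoint path is a placeholder, and this is only relevant when xformers is not installed:

import os
import subprocess

# Enable fp16 attention via the environment variable mentioned above.
env = dict(os.environ, ATTN_PRECISION="fp16")
subprocess.run(
    ["python", "scripts/txt2img.py",
     "--prompt", "a professional photograph of an astronaut riding a horse",
     "--ckpt", "<path/to/768model.ckpt>",          # placeholder path
     "--config", "configs/stable-diffusion/v2-inference-v.yaml",
     "--H", "768", "--W", "768"],
    env=env,
    check=True,
)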

November 24, 2022

Version 2.0

  • New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.
  • The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.
  • Added an x4 upscaling latent text-guided diffusion model.
  • New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.
  • A text-guided inpainting model, finetuned from SD 2.0-base.

We follow the original repository and provide basic inference scripts to sample from the models.


The original Stable Diffusion model was created in a collaboration with CompVis and RunwayML and builds upon the work:

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach*,
Andreas Blattmann*,
Dominik Lorenz,
Patrick Esser,
Björn Ommer

CVPR '22 Oral |
GitHub | arXiv | Project page

and many others.

Stable Diffusion is a latent text-to-image diffusion model.


Requirements

You can update an existing latent diffusion environment by running

conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
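
As a quick, optional sanity check of the environment above (not part of the repository; the pinned versions are taken from the install commands):

import torch
import torchvision
import transformers

# Verify the versions pinned by the install commands above.
print("torch:", torch.__version__)                 # expected 1.12.1
print("torchvision:", torchvision.__version__)     # expected 0.13.1
print("transformers:", transformers.__version__)   # expected 4.19.2
print("CUDA available:", torch.cuda.is_available())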

xformers efficient attention

For more efficiency and speed on GPUs, we highly recommend installing the xformers library.

Tested on A100 with CUDA 11.4.
Installation needs a somewhat recent version of nvcc and gcc/g++, obtain those, e.g., via

export CUDA_HOME=/usr/local/cuda-11.4
conda install -c nvidia/label/cuda-11.4.0 cuda-nvcc
conda install -c conda-forge gcc
conda install -c conda-forge gxx_linux-64==9.5.0

Then, run the following (compiling takes up to 30 min).

cd ..
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e .
cd ../stablediffusion

Upon successful installation, the code will automatically default to memory efficient attention
for the self- and cross-attention layers in the U-Net and autoencoder.
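
A small, optional check (not part of the repository) that xformers and its memory-efficient attention ops are importable, which is what the automatic fallback above depends on:

try:
    import xformers
    import xformers.ops  # memory-efficient attention kernels
    print("xformers", xformers.__version__, "available; memory-efficient attention will be used")
except ImportError:
    print("xformers not installed; falling back to vanilla attention")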

General Disclaimer

Stable Diffusion models are general text-to-image diffusion models and therefore mirror biases and (mis-)conceptions that are present
in their training data. Although efforts were made to reduce the inclusion of explicit pornographic material, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations.
The weights are research artifacts and should be treated as such.

Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding model card.
The weights are available via the StabilityAI organization at Hugging Face under the CreativeML Open RAIL++-M License.

Stable Diffusion v2

Stable Diffusion v2 refers to a specific configuration of the model
architecture that uses a downsampling-factor 8 autoencoder with an 865M UNet
and OpenCLIP ViT-H/14 text encoder for the diffusion model. The SD 2-v model produces 768x768 px outputs.
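
To make the downsampling factor concrete, a rough back-of-the-envelope sketch; the 4-channel latent width is the usual Stable Diffusion convention and is an assumption here, not stated above:

# A 768x768 image passes through the f=8 autoencoder before the 865M UNet sees it.
H = W = 768
f = 8                                   # autoencoder downsampling factor
latent_shape = (1, 4, H // f, W // f)   # assumed 4 latent channels
print(latent_shape)                     # (1, 4, 96, 96)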

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints:

(Figure: relative improvements of the checkpoints across classifier-free guidance scales.)

Text-to-Image


Stable Diffusion 2 is a latent diffusion model conditioned on the penultimate text embeddings of a CLIP ViT-H/14 text encoder.
We provide a reference script for sampling.

Reference Sampling Script

This script incorporates invisible watermarking of the outputs, to help viewers identify the images as machine-generated.
We provide the configs for the SD2-v (768px) and SD2-base (512px) model.
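
A hedged sketch of what the watermarking step amounts to, using the invisible-watermark package installed in the requirements above; the payload string and file names are illustrative, not necessarily what the script embeds:

import cv2
from imwatermark import WatermarkEncoder

img = cv2.imread("sample.png")             # BGR image, as written by the sampler
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"SDV2")    # illustrative payload
img_wm = encoder.encode(img, "dwtDct")     # embed via the DWT+DCT method
cv2.imwrite("sample_watermarked.png", img_wm)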

First, download the weights for SD2.1-v and SD2.1-base.

To sample from the SD2.1-v model, run the following:

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt <path/to/768model.ckpt> --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768

or try out the Web Demo on Hugging Face Spaces.

To sample from the base model, use

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt <path/to/model.ckpt> --config <path/to/config.yaml>

By default, this uses the DDIM sampler, and renders images of size 768x768 (which it was trained on) in 50 steps.
Empirically, the v-models can be sampled with higher guidance scales.

Note: The inference config for all model versions is designed to be used with EMA-only checkpoints.
For this reason use_ema=False is set in the configuration, otherwise the code will try to switch from
non-EMA to EMA weights.
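
For completeness, a hedged alternative to the reference script: sampling the same 2.1 checkpoints through the diffusers library (installed by the requirements above). The Hugging Face Hub model id stabilityai/stable-diffusion-2-1 refers to the 2.1-v weights; the output file name and the guidance scale of 9.0 are illustrative choices consistent with the notes above:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = pipe(
    "a professional photograph of an astronaut riding a horse",
    height=768, width=768,
    num_inference_steps=50,   # matches the 50-step default above
    guidance_scale=9.0,       # v-models tolerate higher guidance scales
).images[0]
image.save("astronaut.png")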

Enable Intel® Extension for PyTorch* optimizations in Text-to-Image script

If you’re planning on running Text-to-Image on an Intel® CPU, try sampling an image with TorchScript and Intel® Extension for PyTorch* optimizations. Intel® Extension for PyTorch* extends PyTorch with up-to-date feature optimizations for an extra performance boost on Intel® hardware. It can optimize the memory layout of operators to the channels-last memory format (generally beneficial for Intel CPUs), take advantage of the most advanced instruction set available on the machine, optimize operators, and more; a conceptual sketch of what these optimizations amount to follows the commands below.

Prerequisites

Before running the script, make sure you have all the needed libraries installed (the optimizations were verified on Ubuntu 20.04). Install jemalloc, numactl, Intel® OpenMP and Intel® Extension for PyTorch*.

apt-get install numactl libjemalloc-dev
pip install intel-openmp
pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable

To sample from the SD2.1-v model with TorchScript+IPEX optimizations, run the following. Remember to specify the desired number of instances you want to run the program on via --ninstance.

MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000 python -m intel_extension_for_pytorch.cpu.launch --ninstance <number of instances> --enable_jemalloc scripts/txt2img.py --prompt "a corgi is playing guitar, oil on canvas" --ckpt <path/to/768model.ckpt> --config configs/stable-diffusion/intel/v2-inference-v-fp32.yaml --H 768 --W 768 --precision full --device cpu --torchscript --ipex

To sample from the base model with IPEX optimizations, use

MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000 python -m intel_extension_for_pytorch.cpu.launch --ninstance <number of instances> --enable_jemalloc scripts/txt2img.py --prompt "a corgi is playing guitar, oil on canvas" --ckpt <path/to/model.ckpt> --config configs/stable-diffusion/intel/v2-inference-fp32.yaml --n_samples 1 --n_iter 4 --precision full --device cpu --torchscript --ipex

If you’re using a CPU that supports bfloat16, consider sampling from the model with bfloat16 enabled for a performance boost, like so:

# SD2.1-v
MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000 python -m intel_extension_for_pytorch.cpu.launch --ninstance <number of instances> --enable_jemalloc scripts/txt2img.py --prompt "a corgi is playing guitar, oil on canvas" --ckpt <path/to/768model.ckpt> --config configs/stable-diffusion/intel/v2-inference-v-bf16.yaml --H 768 --W 768 --precision full --device cpu --torchscript --ipex --bf16
# SD2.1-base
MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000 python -m intel_extension_for_pytorch.cpu.launch --ninstance <number of instances> --enable_jemalloc scripts/txt2img.py --prompt "a corgi is playing guitar, oil on canvas" --ckpt <path/to/model.ckpt> --config configs/stable-diffusion/intel/v2-inference-bf16.yaml --precision full --device cpu --torchscript --ipex --bf16
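
For intuition, a conceptual sketch (not taken from the repository) of what the --ipex and --bf16 flags do under the hood: optimize a torch module with Intel® Extension for PyTorch*, switch to the channels-last memory layout, and run under bf16 autocast. The Conv2d module is just a stand-in for the UNet:

import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).eval()   # stand-in for the UNet
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model, dtype=torch.bfloat16)               # bf16 only if the CPU supports it

x = torch.randn(1, 4, 96, 96).to(memory_format=torch.channels_last)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out = model(x)
print(out.shape)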

Image Modification with Stable Diffusion


Depth-Conditional Stable Diffusion

To augment the well-established img2img functionality of Stable Diffusion, we provide a shape-preserving stable diffusion model.

Note that the original method for image modification introduces significant semantic changes w.r.t. the initial image.
If that is not desired, download our depth-conditional stable diffusion model and the dpt_hybrid MiDaS model weights, place the latter in a folder midas_models and sample via

python scripts/gradio/depth2img.py configs/stable-diffusion/v2-midas-inference.yaml <path-to-checkpoint>

or

streamlit run scripts/streamlit/depth2img.py configs/stable-diffusion/v2-midas-inference.yaml <path-to-checkpoint>

This method can be used on the samples of the base model itself.
For example, take this sample generated by an anonymous discord user.
Using the gradio or streamlit script depth2img.py, the MiDaS model first infers a monocular depth estimate given this input,
and the diffusion model is then conditioned on the (relative) depth output.


This model is particularly useful for a photorealistic style; see the examples.
For a maximum strength of 1.0, the model removes all pixel-based information and only relies on the text prompt and the inferred monocular depth estimate.
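
A hedged sketch of the same depth-conditioned model driven through the diffusers library instead of the gradio/streamlit scripts; the Hub model id is stabilityai/stable-diffusion-2-depth, and the file names and prompt are illustrative assumptions:

import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")
# strength=1.0 would discard all pixel information and keep only the prompt
# plus the inferred monocular depth, as described above.
result = pipe(prompt="a photorealistic rendering of the same scene",
              image=init_image, strength=0.8).images[0]
result.save("depth2img.png")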


Classic Img2Img

For running the “classic” img2img, use

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8 --ckpt <path/to/model.ckpt>

and adapt the checkpoint and config paths accordingly.
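
Equivalently, a hedged diffusers sketch of classic img2img with the same strength setting; the base model id stabilityai/stable-diffusion-2-1-base and the file names are illustrative assumptions:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("init.png").convert("RGB").resize((512, 512))
result = pipe(prompt="A fantasy landscape, trending on artstation",
              image=init_image, strength=0.8).images[0]
result.save("img2img.png")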

Image Upscaling with Stable Diffusion

After downloading the weights, run

python scripts/gradio/superresolution.py configs/stable-diffusion/x4-upscaling.yaml <path-to-checkpoint>

or

streamlit run scripts/streamlit/superresolution.py -- configs/stable-diffusion/x4-upscaling.yaml <path-to-checkpoint>

for a Gradio or Streamlit demo of the text-guided x4 superresolution model.
This model can be used both on real inputs and on synthesized examples. For the latter, we recommend setting a higher
noise_level, e.g. noise_level=100.
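
A hedged diffusers sketch of the same x4 upscaler, including the higher noise_level recommended above for synthetic inputs; the Hub model id is stabilityai/stable-diffusion-x4-upscaler, and the file names and prompt are illustrative assumptions:

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("low_res.png").convert("RGB")
upscaled = pipe(prompt="a detailed photograph",
                image=low_res, noise_level=100).images[0]
upscaled.save("upscaled_x4.png")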

Image Inpainting with Stable Diffusion


Download the SD 2.0-inpainting checkpoint and run

python scripts/gradio/inpainting.py configs/stable-diffusion/v2-inpainting-inference.yaml <path-to-checkpoint>

or

streamlit run scripts/streamlit/inpainting.py -- configs/stable-diffusion/v2-inpainting-inference.yaml <path-to-checkpoint>

for a Gradio or Streamlit demo of the inpainting model.
This script adds invisible watermarking to the demo in the RunwayML repository, but both should work interchangeably with the checkpoints/configs.
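
A hedged diffusers sketch of the same inpainting checkpoint, as an alternative to the demo scripts; the Hub model id is stabilityai/stable-diffusion-2-inpainting, and the file names and prompt are illustrative assumptions:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")   # white pixels mark the region to repaint
result = pipe(prompt="a red leather sofa", image=image, mask_image=mask).images[0]
result.save("inpainted.png")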

Shout-Outs

License

The code in this repository is released under the MIT License.

The weights are available via the StabilityAI organization at Hugging Face, and released under the CreativeML Open RAIL++-M License.

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models},
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

                

                


