Open-Sora

Open-Sora/docs/README_zh.md at main · hpcaitech/Open-Sora

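Open-Sora: A Fully Open-Source, Efficient Reproduction of Sora-like Video Generation

Open-Sora is an initiative dedicated to producing high-quality video efficiently and making its models, tools, and content accessible to everyone. By embracing open-source principles, Open-Sora not only makes advanced video generation techniques available at low cost, but also offers a streamlined, user-friendly solution that reduces the complexity of video production. With Open-Sora, we hope more developers will join us in exploring innovation, creativity, and inclusivity in content creation. [English]

Open-Sora is still at an early stage and is under active development.

📰 News

[2024.03.18] 🔥 We released Open-Sora 1.0, a fully open-source project for video generation.
Open-Sora 1.0 supports the full pipeline, including video data preprocessing, accelerated training, and inference.
With only 3 days of training, the model weights we provide can generate 2-second 512x512 videos.
[2024.03.04] Open-Sora: an open-source Sora reproduction that cuts cost by 46% and extends the sequence length to nearly one million.

🎥 Latest Videos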

2s 512×512	2s 512×512	2s 512×512

A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop.	A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff.	The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall.

A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...]	The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...]	A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...]

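Videos are downsampled to .gif for display. Click to view the original videos. Prompts are trimmed for display; see here for the full prompts. See more samples in our gallery.

🔆 New Features

📍 Open-Sora-v1 is released. Model weights are available here. With only 400K video clips and about 200 H800 GPU-days of training (compared with the 152M samples used by Stable Video Diffusion), we can generate 2-second 512×512 videos.
✅ Three-stage training from an image diffusion model to a video diffusion model. We provide the weights for each stage.
✅ Support for training acceleration, including an accelerated transformer, faster T5 and VAE, and sequence parallelism. When training on 64x512x512 videos, Open-Sora speeds up training by 55%. See the training acceleration documentation for details.
✅ We provide video cutting and captioning tools for data preprocessing. Instructions are available here, and our data collection plan is described in the dataset documentation.
✅ We found that the VQ-VAE from VideoGPT has low quality, so we adopted a higher-quality VAE from Stability-AI. We also found that patching along the time dimension degrades generation quality. See our report for more discussion.
✅ We investigated different architectures, including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussion.
✅ Support for CLIP and T5 text conditioning.
✅ By treating images as single-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet and UCF101). See the command documentation for more instructions.
✅ Support for inference with the official weights of DiT, Latte, and PixArt.
✅ Refactored the codebase. See the structure documentation to learn about the project structure and how to use the config files.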
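The list above mentions treating images as single-frame videos and using sequence parallelism for long clips. The sketch below illustrates why both fit the same model: the transformer only sees a sequence of tokens, and the sequence grows quickly with frame count and resolution. The VAE stride of 8 per spatial side and the 1x2x2 patch size are assumptions chosen for illustration, not values taken from the Open-Sora configs.

```python
def num_tokens(frames, height, width, vae_stride=8, patch=(1, 2, 2)):
    """Count transformer tokens for a clip of shape frames x height x width.

    vae_stride and patch are illustrative assumptions, not values read
    from the Open-Sora configuration files.
    """
    pt, ph, pw = patch
    latent_h, latent_w = height // vae_stride, width // vae_stride
    return (frames // pt) * (latent_h // ph) * (latent_w // pw)


# An image is handled as a video that has a single frame.
image_tokens = num_tokens(1, 256, 256)
short_clip_tokens = num_tokens(16, 256, 256)
long_clip_tokens = num_tokens(64, 512, 512)
print(image_tokens, short_clip_tokens, long_clip_tokens)
```

Under these assumptions a 64x512x512 clip yields 256 times as many tokens as a single 256x256 image, which is the kind of sequence length that sequence parallelism is meant to spread across GPUs.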

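Next Steps [sorted by priority]

Complete the data processing pipeline (including dense optical flow, aesthetic scores, text-image similarity, deduplication, etc.). See the dataset documentation for more information. [in progress]
Train a video VAE. [in progress]

Support image and video conditioning.
Evaluation pipeline.
Incorporate better schedulers, such as rectified flow in SD3.
Support variable aspect ratios, resolutions, and durations.
Support SD3 once it is released.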

Installation

# create a virtual env and activate it
conda create -n opensora python=3.10
conda activate opensora

# install torch
# the command below is for CUDA 12.1, choose install commands from 
# https://pytorch.org/get-started/locally/ based on your own CUDA version
pip3 install torch torchvision

# install flash attention (optional)
pip install packaging ninja
pip install flash-attn --no-build-isolation

# install apex (optional)
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

# install xformers
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

# install this project
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install -v .

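After installation, we recommend reading the structure documentation to learn about the project structure and how to use the config files.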
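A quick way to confirm that the installation steps took effect is to check which packages are importable in the active environment. The sketch below uses only the Python standard library; the import names (`torch`, `torchvision`, `xformers`, `flash_attn`, `apex`) are the usual module names for the packages installed above, and the split into required and optional follows the comments in the install commands.

```python
import importlib.util

# Import names of the packages installed by the commands above.
REQUIRED = ("torch", "torchvision", "xformers")
OPTIONAL = ("flash_attn", "apex")  # marked "(optional)" in the install steps


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("missing required packages:", find_missing(REQUIRED))
print("missing optional packages:", find_missing(OPTIONAL))
```

An empty list for the required packages means the environment is ready for the inference and training commands that follow.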

Model Weights

Resolution	Data	Iterations	Batch Size	GPU Days (H800)	URL
16×256×256	366K	80k	8×64	117	🔗
16×256×256	20K HQ	24k	8×64	45	🔗
16×512×512	20K HQ	20k	2×64	35	🔗

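Our model's weights are partially initialized from PixArt-α. The model has 724M parameters. For more information about training, see our report. For more information about the dataset, see the data documentation. HQ means high quality.

⚠️ Limitations: Our model was trained on a limited budget, so quality and text alignment are relatively poor. In particular, the model performs poorly when generating humans and cannot follow detailed instructions. We are working on improving quality and text alignment.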
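The GPU-day column can be cross-checked against the figures quoted earlier in this README (roughly 200 H800 days in total, and about 3 days of training). The arithmetic below is a small sketch; reading the "×64" factor in the batch-size column as the number of GPUs is an assumption, since the table does not say so explicitly.

```python
# GPU-days per training stage, copied from the table above.
stage_gpu_days = {
    "16x256x256 (366K)": 117,
    "16x256x256 (20K HQ)": 45,
    "16x512x512 (20K HQ)": 35,
}

# Assumption: the "x64" factor in the batch-size column is the GPU count.
num_gpus = 64

total_gpu_days = sum(stage_gpu_days.values())
wall_clock_days = total_gpu_days / num_gpus
print(f"{total_gpu_days} GPU-days ~ {wall_clock_days:.1f} days on {num_gpus} GPUs")
```

Under that assumption the three stages add up to 197 GPU-days, which corresponds to a little over 3 days of wall-clock time.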

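Inference

To run inference with the weights we provide, first download the T5 weights into pretrained_models/t5_ckpts/t5-v1_1-xxl. Then download the model weights. Run the following commands to generate samples. See here to customize the configuration.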

# Sample 16x256x256 (5s/sample)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

# Sample 16x512x512 (20s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth

# Sample 64x512x512 (40s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth

# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth

The speeds above were measured on an H800 GPU. For inference with other models, see here for more instructions.
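When sampling is driven from a script rather than typed by hand, the commands above can be assembled programmatically. The helper below is a minimal sketch: `build_inference_cmd` is a hypothetical name introduced here, and it only uses the flags that appear in this README.

```python
import shlex


def build_inference_cmd(config, ckpt_path, num_gpus=1):
    """Assemble the torchrun command shown in the examples above.

    Hypothetical helper: only flags that appear in this README are used.
    """
    return [
        "torchrun", "--standalone",
        "--nproc_per_node", str(num_gpus),
        "scripts/inference.py", config,
        "--ckpt-path", ckpt_path,
    ]


cmd = build_inference_cmd(
    "configs/opensora/inference/64x512x512.py",
    "./path/to/your/ckpt.pth",
    num_gpus=2,  # more than one process enables sequence parallelism
)
print(shlex.join(cmd))
# To launch it from Python: subprocess.run(cmd, check=True)
```

Keeping the command as a list of arguments avoids shell-quoting problems if the checkpoint path contains spaces.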

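Data Processing

High-quality data is the key to a high-quality model. The datasets we used and our data collection plan are described here. We provide tools to process video data. Currently, our data processing pipeline includes the following steps:

Download the dataset. [docs]
Split videos into clips. [docs]
Generate video captions. [docs]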
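The output of these steps is eventually handed to training as a CSV file (the `--data-path YOUR_CSV_PATH` argument in the training commands). The sketch below shows one way clip paths and captions could be collected into such a file. The two-column layout, the file names, and the helper `write_caption_csv` are hypothetical; check the project's data documentation for the format that `scripts/train.py` actually expects.

```python
import csv
import io


def write_caption_csv(rows, out):
    """Write (video_path, caption) pairs as CSV rows.

    The two-column layout is a hypothetical example, not the confirmed
    format used by the project.
    """
    writer = csv.writer(out)
    for video_path, caption in rows:
        writer.writerow([video_path, caption])


# Hypothetical clips (step 2) paired with their captions (step 3).
clips = [
    ("clips/forest_000.mp4", "A serene night scene in a forested area."),
    ("clips/cliff_000.mp4", "Drone footage of a coastal cliff."),
]
buffer = io.StringIO()
write_caption_csv(clips, buffer)
print(buffer.getvalue())
```

Using the `csv` module rather than manual string joining keeps captions that contain commas or quotes correctly escaped.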

Training

To launch training, first download the T5 weights into pretrained_models/t5_ckpts/t5-v1_1-xxl. Then run the following commands to launch training on a single node.

# 1 GPU, 16x256x256
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
# 8 GPUs, 64x512x512
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
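Both prerequisites named above (the T5 weights directory and the data CSV) can be verified before a long job is queued. The following is a minimal sketch; `preflight` is a hypothetical helper, and the only path it assumes is the T5 directory given in this section.

```python
from pathlib import Path

# Directory named in the training instructions above.
T5_DIR = Path("pretrained_models/t5_ckpts/t5-v1_1-xxl")


def preflight(data_csv, t5_dir=T5_DIR):
    """Return a list of problems that would make the training launch fail."""
    problems = []
    if not Path(t5_dir).is_dir():
        problems.append(f"T5 weights not found in {t5_dir}")
    if not Path(data_csv).is_file():
        problems.append(f"data CSV not found: {data_csv}")
    return problems


for problem in preflight("YOUR_CSV_PATH"):
    print("warning:", problem)
```

Run it from the repository root so that the relative T5 path resolves the same way it does for `scripts/train.py`.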

To launch training on multiple nodes, prepare a hostfile according to the ColossalAI documentation and run the following command.

colossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
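The hostfile passed to `--hostfile` can be generated from a list of node names. The sketch below assumes the layout of one host name per line; confirm this against the ColossalAI documentation for your version. The host names and the helper `render_hostfile` are hypothetical.

```python
def render_hostfile(hosts):
    """Render one host name per line, dropping blanks and duplicates.

    The one-host-per-line layout is an assumption about the hostfile format.
    """
    unique = list(dict.fromkeys(h.strip() for h in hosts if h.strip()))
    return "\n".join(unique) + "\n"


# Hypothetical host names; replace them with your own nodes.
text = render_hostfile(["node-0", "node-1", "node-0"])
print(text, end="")
# Path("hostfile").write_text(text) would save it for the command above.
```

Removing duplicates while preserving order keeps the first listed node first, which is convenient when that node is meant to act as the master.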

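For training other models and for advanced usage, see here for more instructions.

Contribution

If you would like to contribute to this project, please refer to the contribution guidelines.

Acknowledgement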

DiT: Scalable Diffusion Models with Transformers.
OpenDiT: An acceleration for DiT training. We adopt valuable acceleration strategies for training progress from OpenDiT.
PixArt: An open-source DiT-based text-to-image model.
Latte: An attempt to efficiently train DiT for video.
StabilityAI VAE: A powerful image VAE model.
CLIP: A powerful text-image embedding model.
T5: A powerful text encoder.
LLaVA: A powerful image captioning model based on Yi-34B.

We are grateful for their excellent work and generous contributions to open source.

Citation

@software{opensora,
  author = {Zangwei Zheng and Xiangyu Peng and Yang You},
  title = {Open-Sora: Democratizing Efficient Video Production for All},
  month = {March},
  year = {2024},
  url = {https://github.com/hpcaitech/Open-Sora}
}

Zangwei Zheng and Xiangyu Peng equally contributed to this work during their internship at HPC-AI Tech.

Star History

Posted 2024-03-20 21:23 by freedragon
