Weili Nie • Julius Berner • Chao Liu • Arash Vahdat
FastGen is a PyTorch-based framework for building fast generative models using various distillation and acceleration techniques. It supports:
- large-scale training with ≥10B parameters.
- different tasks and modalities, including text-to-image (T2I), image-to-video (I2V), and video-to-video (V2V).
- various distillation methods, including consistency models, distribution matching distillation, self-forcing, and more.
```
fastgen/
├── fastgen/
│   ├── callbacks/        # Training callbacks (EMA, profiling, etc.)
│   ├── configs/          # Configuration system
│   │   ├── experiments/  # Experiment configs
│   │   └── methods/      # Method-specific configs
│   ├── datasets/         # Dataset loaders
│   ├── methods/          # Training methods (CM, DMD2, SFT, KD, etc.)
│   ├── networks/         # Neural network architectures
│   ├── third_party/      # Third-party dependencies
│   ├── trainer.py        # Main training loop
│   └── utils/            # Utilities (distributed, checkpointing)
├── scripts/              # Inference and evaluation scripts
├── tests/                # Unit tests
├── Makefile              # Development commands (lint, format, test)
└── train.py              # Main training entry point
```
Recommended: Use the provided Docker container for a consistent environment. See CONTRIBUTING.md for Docker setup instructions. Otherwise, create a new conda environment:

```bash
conda create -y -n fastgen python=3.12.3 pip
conda activate fastgen
```
```bash
git clone https://gitlab-master.nvidia.com/genair/fastgen.git
cd fastgen
pip install -e .
```

For W&B logging, get your API key and save it to credentials/wandb_api.txt or set the WANDB_API_KEY environment variable.
Without either of these, W&B will prompt for your API key interactively.
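If you prefer to do this in code, the same environment variable can be set from Python before training starts (a minimal sketch; only the variable name WANDB_API_KEY comes from the instructions above):

```python
import os

# Equivalent to `export WANDB_API_KEY=...` in the shell; set this before
# FastGen initializes W&B so no interactive prompt appears.
os.environ["WANDB_API_KEY"] = "<your-api-key>"
```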
For more details, including S3 storage and other environment variables, see fastgen/configs/README.md.
Before running the following commands, download the CIFAR-10 dataset and pretrained EDM models:
```bash
python scripts/download_data.py --dataset cifar10
```

For other datasets and models, see fastgen/networks/README.md and fastgen/datasets/README.md.
```bash
python train.py --config=fastgen/configs/experiments/EDM/config_dmd2_test.py
```

If you run out of memory, try a smaller batch size, e.g., dataloader_train.batch_size=32, which automatically uses gradient accumulation to match the global batch size.
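The toy snippet below (not FastGen code; the model, data, and sizes are made up) illustrates why a smaller per-step batch with gradient accumulation reproduces the update of the full global batch:

```python
import torch
from torch import nn

# A micro-batch of 32 with 4 accumulation steps reproduces the gradient
# of a single global batch of 128.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
global_batch, micro_batch = 128, 32
accum_steps = global_batch // micro_batch  # 4

data = torch.randn(global_batch, 10)
target = torch.randn(global_batch, 1)

optimizer.zero_grad()
for i in range(accum_steps):
    xb = data[i * micro_batch:(i + 1) * micro_batch]
    yb = target[i * micro_batch:(i + 1) * micro_batch]
    # Scale each micro-batch loss so the summed gradients equal the
    # mean-loss gradient over the full global batch.
    loss = nn.functional.mse_loss(model(xb), yb) / accum_steps
    loss.backward()
optimizer.step()  # one update, equivalent to a single global-batch step
```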
Expected Output: See the training log for a link to the run on wandb.ai. Training outputs go to $FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/. With default settings, outputs are organized as follows:
```
FASTGEN_OUTPUT/fastgen/cifar10/debug/
├── checkpoints/   # Model checkpoints in the format {iteration:07d}.pth
│   ├── 0001000.pth
│   └── ...
├── config.yaml    # Resolved configuration for reproducibility
├── wandb_id.txt   # W&B run ID for resuming
└── ...
```
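As a quick sanity check, checkpoints and the resolved config can be inspected with standard tooling. This is a sketch assuming the .pth files are ordinary torch-serialized dictionaries; the exact contents depend on FastGen's checkpoint format:

```python
import torch
import yaml

run_dir = "FASTGEN_OUTPUT/fastgen/cifar10/debug"

# Resolved config written at launch time (see config.yaml above).
with open(f"{run_dir}/config.yaml") as f:
    config = yaml.safe_load(f)

# Load a checkpoint on CPU and list its top-level keys; assumes a standard
# torch.save()-style dictionary, which may differ from the actual format.
ckpt = torch.load(f"{run_dir}/checkpoints/0001000.pth", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```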
For multi-GPU training, use DDP:
```bash
torchrun --nproc_per_node=8 train.py \
    --config=fastgen/configs/experiments/EDM/config_dmd2_test.py - \
    trainer.ddp=True log_config.name=test_ddp
```

For large models, use FSDP2 for model sharding by replacing trainer.ddp=True with trainer.fsdp=True.
```bash
python scripts/inference/image_model_inference.py --config fastgen/configs/experiments/EDM/config_dmd2_test.py \
    --classes=10 --prompt_file=scripts/inference/prompts/classes.txt \
    --ckpt=FASTGEN_OUTPUT/fastgen/cifar10/debug/checkpoints/0002000.pth - log_config.name=test_inference
```

For other inference modes and FID evaluation, see scripts/README.md.
Override any config parameter using Hydra-style syntax (note the - separator):

```bash
python train.py --config=path/to/config.py - key=value nested.key=value
```

Detailed documentation is available in each component's README:
| Component | Documentation | Description |
|---|---|---|
| Methods | fastgen/methods/README.md | Training methods (sCM, MeanFlow, DMD2, Self-Forcing, etc.) |
| Networks | fastgen/networks/README.md | Network architectures (EDM, SD, SDXL, Flux, WAN, CogVideoX, Cosmos) and pretrained models |
| Configs | fastgen/configs/README.md | Configuration system, environment variables, and creating custom configs |
| Datasets | fastgen/datasets/README.md | Dataset preparation and WebDataset loaders |
| Callbacks | fastgen/callbacks/README.md | Training callbacks (EMA, logging, gradient clipping, etc.) |
| Inference | scripts/README.md | Inference modes (T2I, T2V, I2V, V2V, etc.) and FID evaluation |
| Third Party | fastgen/third_party/README.md | Third-party dependencies (Depth Anything V2, etc.) |
FastGen implements methods from the following categories:

| Category | Methods |
|---|---|
| Consistency Models | CM, sCM, TCM, MeanFlow |
| Distribution Matching | DMD2, f-Distill, LADD, CausVid, Self-Forcing |
| Fine-Tuning | SFT, CausalSFT |
| Knowledge Distillation | KD, CausalKD |
See fastgen/methods/README.md for details.
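To give a flavor of what these methods optimize, here is a minimal sketch of a consistency-distillation-style training step. It is illustrative only, not FastGen's implementation (which lives in fastgen/methods/); `student`, `ema_student`, and `solver_step` are hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

def consistency_distillation_step(student, ema_student, solver_step, x0, sigmas):
    """One toy CM-style distillation step on a clean image batch x0."""
    # Pick adjacent noise levels sigma_lo < sigma_hi from a fixed 1-D schedule.
    i = torch.randint(0, len(sigmas) - 1, (x0.shape[0],), device=x0.device)
    s_lo, s_hi = sigmas[i], sigmas[i + 1]
    x_hi = x0 + s_hi.view(-1, 1, 1, 1) * torch.randn_like(x0)  # noisy at sigma_hi

    with torch.no_grad():
        # One ODE solver step from sigma_hi down to sigma_lo along the teacher
        # trajectory; the EMA student's prediction there is the regression target.
        x_lo = solver_step(x_hi, s_hi, s_lo)
        target = ema_student(x_lo, s_lo)

    # Self-consistency: predictions at adjacent points on the same trajectory
    # should agree, so all noise levels map to the same clean sample.
    return F.mse_loss(student(x_hi, s_hi), target)
```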
FastGen is designed to be agnostic to the network and data, so you can add your own architectures and datasets (see fastgen/networks/README.md and fastgen/datasets/README.md). For reference, we provide the following implementations:
| Data | Networks |
|---|---|
| Image | EDM, EDM2, DiT, SD 1.5, SDXL, Flux |
| Video | WAN (T2V, I2V, VACE), CogVideoX, Cosmos Predict2 |
See fastgen/networks/README.md for details. Not all combinations of methods and networks are currently supported; we cover typical use cases in our predefined configs in fastgen/configs/experiments.
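As a rough illustration of what "network-agnostic" means in practice, a custom denoiser is essentially a torch module taking a noisy batch and a noise level. The class below is a hypothetical example, not FastGen's actual network interface, which is documented in fastgen/networks/README.md:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical custom architecture: predicts the clean image from
    (noisy image, noise level). Not FastGen's real network interface."""

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-sample noise level into an extra conditioning channel.
        cond = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, cond], dim=1))
```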
We plan to provide distilled student checkpoints for CIFAR-10 and ImageNet soon.
We welcome contributions! Please see CONTRIBUTING.md for details.
We thank everyone who has helped design, build, and test FastGen!
- Core contributors: Weili Nie, Julius Berner, Chao Liu
- Other contributors: James Lucas, David Pankratz, Sihyun Yu, Willis Ma, Yilun Xu, Shengqu Cai, Xinyin Ma, Yanke Song
- Collaborators: Sophia Zalewski, Wei Xiong, Christian Laforte, Sajad Norouzi, Kaiwen Zheng, Miloš Hašan, Saeed Hadadan, Gene Liu, David Dynerman, Grace Lam, Pooya Jannaty, Jan Kautz, and many more.
- Project lead: Arash Vahdat
This project is licensed under the Apache License 2.0 - see LICENSE for details. Third-party licenses are documented in licenses/README.md.
If you find FastGen useful, please cite:

```bibtex
@article{fastgen2026,
  title={NVIDIA FastGen: Fast Generation from Diffusion Models},
  author={Nie, Weili and Berner, Julius and Liu, Chao and Vahdat, Arash},
  url={https://github.com/NVlabs/FastGen},
  year={2026},
}
```
