Weili Nie • Julius Berner • Chao Liu • Arash Vahdat
FastGen is a PyTorch-based framework for building fast generative models using various distillation and acceleration techniques. It supports:
- large-scale training with ≥10B parameters.
- different tasks and modalities, including text-to-image (T2I), image-to-video (I2V), and video-to-video (V2V).
- various distillation methods, including consistency models, distribution matching distillation, self-forcing, and more.
```
fastgen/
├── fastgen/
│   ├── callbacks/        # Training callbacks (EMA, profiling, etc.)
│   ├── configs/          # Configuration system
│   │   ├── experiments/  # Experiment configs
│   │   └── methods/      # Method-specific configs
│   ├── datasets/         # Dataset loaders
│   ├── methods/          # Training methods (CM, DMD2, SFT, KD, etc.)
│   ├── networks/         # Neural network architectures
│   ├── third_party/      # Third-party dependencies
│   ├── trainer.py        # Main training loop
│   └── utils/            # Utilities (distributed, checkpointing)
├── scripts/              # Inference and evaluation scripts
├── tests/                # Unit tests
├── Makefile              # Development commands (lint, format, test)
└── train.py              # Main training entry point
```
Recommended: Use the provided Docker container for a consistent environment. See CONTRIBUTING.md for Docker setup instructions. Otherwise, create a new conda environment:

```bash
conda create -y -n fastgen python=3.12.3 pip
conda activate fastgen
```
```bash
git clone https://gitlab-master.nvidia.com/genair/fastgen.git
cd fastgen
pip install -e .
```

For W&B logging, get your API key and save it to credentials/wandb_api.txt or set the WANDB_API_KEY environment variable.
Without either of these, W&B will prompt for your API key interactively.
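If you prefer to do this in code, the same environment variable can be set from Python before training starts (a minimal sketch; only the variable name WANDB_API_KEY comes from the instructions above):

```python
import os

# Equivalent to `export WANDB_API_KEY=...` in the shell; set this before
# FastGen initializes W&B so no interactive prompt appears.
os.environ["WANDB_API_KEY"] = "<your-api-key>"
```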
For more details, including S3 storage and other environment variables, see fastgen/configs/README.md.
Before running the following commands, download the CIFAR-10 dataset and pretrained EDM models:
```bash
python scripts/download_data.py --dataset cifar10
```

For other datasets and models, see fastgen/networks/README.md and fastgen/datasets/README.md.
```bash
python train.py --config=fastgen/configs/experiments/EDM/config_dmd2_test.py
```

If you run out of memory, try a smaller batch size, e.g., dataloader_train.batch_size=32, which automatically uses gradient accumulation to match the global batch size.
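The toy snippet below (not FastGen code; the model, data, and sizes are made up) illustrates why a smaller per-step batch with gradient accumulation reproduces the update of the full global batch:

```python
import torch
from torch import nn

# A micro-batch of 32 with 4 accumulation steps reproduces the gradient
# of a single global batch of 128.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
global_batch, micro_batch = 128, 32
accum_steps = global_batch // micro_batch  # 4

data = torch.randn(global_batch, 10)
target = torch.randn(global_batch, 1)

optimizer.zero_grad()
for i in range(accum_steps):
    xb = data[i * micro_batch:(i + 1) * micro_batch]
    yb = target[i * micro_batch:(i + 1) * micro_batch]
    # Scale each micro-batch loss so the summed gradients equal the
    # mean-loss gradient over the full global batch.
    loss = nn.functional.mse_loss(model(xb), yb) / accum_steps
    loss.backward()
optimizer.step()  # one update, equivalent to a single global-batch step
```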
Expected Output: See the training log for a link to the run on wandb.ai. Training outputs go to $FASTGEN_OUTPUT_ROOT/{project}/{group}/{name}/. With default settings, outputs are organized as follows:
```
FASTGEN_OUTPUT/fastgen/cifar10/debug/
├── checkpoints/   # Model checkpoints in the format {iteration:07d}.pth
│   ├── 0001000.pth
│   └── ...
├── config.yaml    # Resolved configuration for reproducibility
├── wandb_id.txt   # W&B run ID for resuming
└── ...
```
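As a quick sanity check, checkpoints and the resolved config can be inspected with standard tooling. This is a sketch assuming the .pth files are ordinary torch-serialized dictionaries; the exact contents depend on FastGen's checkpoint format:

```python
import torch
import yaml

run_dir = "FASTGEN_OUTPUT/fastgen/cifar10/debug"

# Resolved config written at launch time (see config.yaml above).
with open(f"{run_dir}/config.yaml") as f:
    config = yaml.safe_load(f)

# Load a checkpoint on CPU and list its top-level keys; assumes a standard
# torch.save()-style dictionary, which may differ from the actual format.
ckpt = torch.load(f"{run_dir}/checkpoints/0001000.pth", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```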
For multi-GPU training, use DDP:
```bash
torchrun --nproc_per_node=8 train.py \
    --config=fastgen/configs/experiments/EDM/config_dmd2_test.py - \
    trainer.ddp=True log_config.name=test_ddp
```

For large models, use FSDP2 for model sharding by replacing trainer.ddp=True with trainer.fsdp=True.
```bash
python scripts/inference/image_model_inference.py --config fastgen/configs/experiments/EDM/config_dmd2_test.py \
    --classes=10 --prompt_file=scripts/inference/prompts/classes.txt \
    --ckpt=FASTGEN_OUTPUT/fastgen/cifar10/debug/checkpoints/0002000.pth - log_config.name=test_inference
```

For other inference modes and FID evaluation, see scripts/README.md.
Override any config parameter using Hydra-style syntax (note the - separator):

```bash
python train.py --config=path/to/config.py - key=value nested.key=value
```

Detailed documentation is available in each component's README:
| Component | Documentation | Description |
|---|---|---|
| Methods | fastgen/methods/README.md | Training methods (sCM, MeanFlow, DMD2, Self-Forcing, etc.) |
| Networks | fastgen/networks/README.md | Network architectures (EDM, SD, SDXL, Flux, WAN, CogVideoX, Cosmos) and pretrained models |
| Configs | fastgen/configs/README.md | Configuration system, environment variables, and creating custom configs |
| Datasets | fastgen/datasets/README.md | Dataset preparation and WebDataset loaders |
| Callbacks | fastgen/callbacks/README.md | Training callbacks (EMA, logging, gradient clipping, etc.) |
| Inference | scripts/README.md | Inference modes (T2I, T2V, I2V, V2V, etc.) and FID evaluation |
| Third Party | fastgen/third_party/README.md | Third-party dependencies (Depth Anything V2, etc.) |
FastGen implements methods from the following categories:

| Category | Methods |
|---|---|
| Consistency Models | CM, sCM, TCM, MeanFlow |
| Distribution Matching | DMD2, f-Distill, LADD, CausVid, Self-Forcing |
| Fine-Tuning | SFT, CausalSFT |
| Knowledge Distillation | KD, CausalKD |
See fastgen/methods/README.md for details.
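To give a flavor of what these methods optimize, here is a minimal sketch of a consistency-distillation-style training step. It is illustrative only, not FastGen's implementation (which lives in fastgen/methods/); `student`, `ema_student`, and `solver_step` are hypothetical stand-ins:

```python
import torch
import torch.nn.functional as F

def consistency_distillation_step(student, ema_student, solver_step, x0, sigmas):
    """One toy CM-style distillation step on a clean image batch x0."""
    # Pick adjacent noise levels sigma_lo < sigma_hi from a fixed 1-D schedule.
    i = torch.randint(0, len(sigmas) - 1, (x0.shape[0],), device=x0.device)
    s_lo, s_hi = sigmas[i], sigmas[i + 1]
    x_hi = x0 + s_hi.view(-1, 1, 1, 1) * torch.randn_like(x0)  # noisy at sigma_hi

    with torch.no_grad():
        # One ODE solver step from sigma_hi down to sigma_lo along the teacher
        # trajectory; the EMA student's prediction there is the regression target.
        x_lo = solver_step(x_hi, s_hi, s_lo)
        target = ema_student(x_lo, s_lo)

    # Self-consistency: predictions at adjacent points on the same trajectory
    # should agree, so all noise levels map to the same clean sample.
    return F.mse_loss(student(x_hi, s_hi), target)
```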
FastGen is designed to be agnostic to the network and data, so you can add your own architectures and datasets (see fastgen/networks/README.md and fastgen/datasets/README.md). For reference, we provide the following implementations:
| Data | Networks |
|---|---|
| Image | EDM, EDM2, DiT, SD 1.5, SDXL, Flux |
| Video | WAN (T2V, I2V, VACE), CogVideoX, Cosmos Predict2 |
See fastgen/networks/README.md for details. Not all combinations of methods and networks are currently supported; we cover typical use cases in our predefined configs in fastgen/configs/experiments.
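As a rough illustration of what "network-agnostic" means in practice, a custom denoiser is essentially a torch module taking a noisy batch and a noise level. The class below is a hypothetical example, not FastGen's actual network interface, which is documented in fastgen/networks/README.md:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical custom architecture: predicts the clean image from
    (noisy image, noise level). Not FastGen's real network interface."""

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-sample noise level into an extra conditioning channel.
        cond = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, cond], dim=1))
```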
We plan to provide distilled student checkpoints for CIFAR-10 and ImageNet soon.
We welcome contributions! Please see CONTRIBUTING.md for details.
We thank everyone who has helped design, build, and test FastGen!
- Core contributors: Weili Nie, Julius Berner, Chao Liu
- Other contributors: James Lucas, David Pankratz, Sihyun Yu, Willis Ma, Yilun Xu, Shengqu Cai, Xinyin Ma, Yanke Song
- Collaborators: Sophia Zalewski, Wei Xiong, Christian Laforte, Sajad Norouzi, Kaiwen Zheng, Miloš Hašan, Saeed Hadadan, Gene Liu, David Dynerman, Grace Lam, Pooya Jannaty, Jan Kautz, and many more.
- Project lead: Arash Vahdat
This project is licensed under the Apache License 2.0 - see LICENSE for details. Third-party licenses are documented in licenses/README.md.
If you find FastGen useful, please cite:

```bibtex
@article{fastgen2026,
  title={NVIDIA FastGen: Fast Generation from Diffusion Models},
  author={Nie, Weili and Berner, Julius and Liu, Chao and Vahdat, Arash},
  url={https://github.com/NVlabs/FastGen},
  year={2026},
}
```
