Conversation

@RexBearIU (Collaborator) commented on Jan 16, 2026

Description

This pull request significantly updates and modernizes the knowledge distillation tutorial for MaxText, aligning it with current best practices and tooling. The guide now uses Qwen3-32B as the teacher model (via vLLM) and Llama-3.1-8B as the student, streamlines the setup with Hyperdisk storage, and provides new scripts and commands for dataset generation and fine-tuning. The instructions have been clarified, unnecessary conversion steps removed for the teacher, and the fine-tuning process updated for the latest MaxText and vLLM workflows.

Tests

Manually triggered the distillation pipeline and monitored the execution flow step-by-step. Confirmed that the training loop finished and resources were released.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

export USERNAME_OR_ORG=<Owner of Hugging Face repository>
export RUN_NAME=<unique name for the run>
export HF_TOKEN=<your-hf-token> # e.g., hf_BA6...
export BASE_DIRECTORY=<your-base-directory> # e.g., knowledge-distillation
Collaborator commented:

nit: # e.g., gs://

RexBearIU (author) replied:

Thanks. We use a mounted Hyperdisk because writing large model files and many small I/O ops directly to gs:// is often much slower. The tutorial writes to /mnt/hyperdisk for performance and reproducibility, and I fixed the duplicated env export in the doc.

To install MaxText and its dependencies for post-training (including vLLM for the teacher), run the following:

1. Follow the [MaxText installation instructions](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#install-maxtext):

   uv pip install -r dependencies/requirements/requirements.txt
Collaborator commented:

nit: maxtext).


We will use vLLM to generate the dataset from the teacher model.

Create a Python script named `generate_distillation_data_vllm.py` with the following content (the script writes a Parquet dataset compatible with MaxText SFT):
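
The script itself lives in the tutorial diff and is not reproduced in this thread. As a rough illustration only, a minimal sketch of such a generator could look like the one below; the teacher model name (Qwen/Qwen3-32B), the prompt source (`prompts.txt`), the output path under `/mnt/hyperdisk`, and the `prompt`/`completion` column names are assumptions for this sketch, not the tutorial's actual schema.

```python
"""Minimal sketch of a vLLM-based distillation data generator (illustrative only)."""
import os

import pandas as pd
from vllm import LLM, SamplingParams


def main():
    # Load the teacher model through vLLM. tensor_parallel_size is an
    # illustrative value and depends on the accelerator topology.
    llm = LLM(model="Qwen/Qwen3-32B", tensor_parallel_size=8)
    sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

    # Hypothetical prompt source: one prompt per line.
    with open("prompts.txt", "r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]

    # Use the chat interface (recent vLLM releases) so the teacher's chat
    # template is applied to each prompt.
    conversations = [[{"role": "user", "content": p}] for p in prompts]
    outputs = llm.chat(conversations, sampling_params)

    # Collect teacher responses alongside their prompts.
    records = [
        {"prompt": p, "completion": out.outputs[0].text}
        for p, out in zip(prompts, outputs)
    ]

    # Write a Parquet dataset; the path under the Hyperdisk mount is an assumption.
    out_path = os.path.join("/mnt/hyperdisk", "distillation_data.parquet")
    pd.DataFrame(records).to_parquet(out_path, index=False)
    print(f"Wrote {len(records)} examples to {out_path}")


if __name__ == "__main__":
    main()
```

With HF_TOKEN exported as in the environment setup above, a script like this could be run with `python generate_distillation_data_vllm.py`; writing Parquet via pandas additionally requires pyarrow.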
RexBearIU (author) replied:

Agreed. I’ve already created the script.

codecov bot commented on Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


RexBearIU force-pushed the jackyf/docs/distillation branch from 8005986 to 84aa2ed on January 21, 2026 at 09:14
RexBearIU force-pushed the jackyf/docs/distillation branch from 84aa2ed to eb215d2 on January 21, 2026 at 09:28