Conversation

@RexBearIU (Collaborator) commented on Jan 16, 2026

Description

This pull request significantly updates and modernizes the knowledge distillation tutorial for MaxText, aligning it with current best practices and tooling. The guide now uses Qwen3-32B as the teacher model (via vLLM) and Llama-3.1-8B as the student, streamlines the setup with Hyperdisk storage, and provides new scripts and commands for dataset generation and fine-tuning. The instructions have been clarified, unnecessary conversion steps removed for the teacher, and the fine-tuning process updated for the latest MaxText and vLLM workflows.

Tests

Manually triggered the distillation pipeline and monitored the execution flow step-by-step. Confirmed that the training loop finished and resources were released.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

export USERNAME_OR_ORG=<Owner of Hugging Face repository>
export RUN_NAME=<unique name for the run>
export HF_TOKEN=<your-hf-token> # e.g., hf_BA6...
export BASE_DIRECTORY=<your-base-directory> # e.g., knowledge-distillation
Collaborator commented:

nit: # e.g., gs://

RexBearIU (author) replied:

Thanks. We use a mounted Hyperdisk because writing large model files and many small I/O ops directly to gs:// is often much slower. The tutorial writes to /mnt/hyperdisk for performance and reproducibility, and I fixed the duplicated env export in the doc.

To install MaxText and its dependencies for post-training (including vLLM for the teacher), run the following:

1. Follow the [MaxText installation instructions](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#install-maxtext):

   uv pip install -r dependencies/requirements/requirements.txt
Collaborator commented:

nit: maxtext).


We will use vLLM to generate the dataset from the teacher model.

Create a Python script named `generate_distillation_data_vllm.py` with the following content (the script writes a Parquet dataset compatible with MaxText SFT):
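
The script itself lives in the tutorial diff and is not reproduced in this thread. As a rough illustration only, a minimal sketch of such a generator could look like the one below; the teacher model name (Qwen/Qwen3-32B), the prompt source (`prompts.txt`), the output path under `/mnt/hyperdisk`, and the `prompt`/`completion` column names are assumptions for this sketch, not the tutorial's actual schema.

```python
"""Minimal sketch of a vLLM-based distillation data generator (illustrative only)."""
import os

import pandas as pd
from vllm import LLM, SamplingParams


def main():
    # Load the teacher model through vLLM. tensor_parallel_size is an
    # illustrative value and depends on the accelerator topology.
    llm = LLM(model="Qwen/Qwen3-32B", tensor_parallel_size=8)
    sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

    # Hypothetical prompt source: one prompt per line.
    with open("prompts.txt", "r", encoding="utf-8") as f:
        prompts = [line.strip() for line in f if line.strip()]

    # Use the chat interface (recent vLLM releases) so the teacher's chat
    # template is applied to each prompt.
    conversations = [[{"role": "user", "content": p}] for p in prompts]
    outputs = llm.chat(conversations, sampling_params)

    # Collect teacher responses alongside their prompts.
    records = [
        {"prompt": p, "completion": out.outputs[0].text}
        for p, out in zip(prompts, outputs)
    ]

    # Write a Parquet dataset; the path under the Hyperdisk mount is an assumption.
    out_path = os.path.join("/mnt/hyperdisk", "distillation_data.parquet")
    pd.DataFrame(records).to_parquet(out_path, index=False)
    print(f"Wrote {len(records)} examples to {out_path}")


if __name__ == "__main__":
    main()
```

With HF_TOKEN exported as in the environment setup above, a script like this could be run with `python generate_distillation_data_vllm.py`; writing Parquet via pandas additionally requires pyarrow.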
RexBearIU (author) replied:

Agreed. I’ve already created the script.

codecov bot commented on Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


RexBearIU force-pushed the jackyf/docs/distillation branch from 8005986 to 84aa2ed on January 21, 2026 at 09:14
RexBearIU force-pushed the jackyf/docs/distillation branch from 84aa2ed to eb215d2 on January 21, 2026 at 09:28