

@p1k0pan commented Jan 25, 2026

After training Qwen3-VL-8B with Megatron, the resulting `torch_dist` checkpoint could not be converted to Hugging Face (HF) format. This PR adds the conversion code.
Tested on Qwen3-VL-8B only; I am not sure whether it also works for the Qwen3-VL MoE models.
