@RissyRan
Collaborator

Description

Add FLOPs calculation for DeepSeek v3.2. This PR depends on this change

  • Add an indexer FLOPs helper function
  • Add an option to include the indexer FLOPs in the MLA calculation
  • Add a unit test that roughly estimates TFLOPs
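
For readers unfamiliar with what an indexer FLOPs helper has to count, here is a minimal sketch. It is not the code from this PR: the function name, the parameter names, and the assumed indexer layout (a query projection from the low-rank query latent, a single shared key projection, a per-head weight projection, and a dense query-key scoring pass) are all illustrative assumptions. It counts forward-pass FLOPs only and ignores causal masking and any training multiplier.

```python
def indexer_flops_sketch(batch, seq_len, emb_dim, q_lora_rank,
                         index_n_heads, index_head_dim):
    """Hypothetical forward-pass FLOPs estimate for a sparse-attention indexer.

    All names and the layer layout are assumptions for illustration,
    not the helper added in this PR. A matmul of shapes
    [m, k] x [k, n] is counted as 2 * m * k * n FLOPs.
    """
    tokens = batch * seq_len

    # Assumed query projection: q_lora_rank -> index_n_heads * index_head_dim
    q_proj = 2 * tokens * q_lora_rank * index_n_heads * index_head_dim
    # Assumed key projection shared across heads: emb_dim -> index_head_dim
    k_proj = 2 * tokens * emb_dim * index_head_dim
    # Assumed per-head weighting projection: emb_dim -> index_n_heads
    head_weight_proj = 2 * tokens * emb_dim * index_n_heads
    projection_flops = q_proj + k_proj + head_weight_proj

    # Dense scoring: one dot product of length index_head_dim for every
    # (query, key) pair and every index head.
    scoring_flops = 2 * batch * seq_len * seq_len * index_n_heads * index_head_dim

    return projection_flops, scoring_flops


# Tiny example with made-up sizes, only to show the call shape.
proj, score = indexer_flops_sketch(
    batch=1, seq_len=4, emb_dim=8, q_lora_rank=2,
    index_n_heads=2, index_head_dim=3,
)
print(proj, score)
```

Returning the projection and scoring terms separately mirrors the "learnable weight flops" versus "attention flops" split reported in the test output, so either term can be added to the matching bucket.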

Tests

  • Tested DeepSeek v2-16b: no impact
# Before the change
Per train step:
 Total TFLOPs: 593.27 
 split as 81.24% learnable weight flops and 18.76% attention flops
before the change: 593.272422531072

# After the change
Per train step:
 Total TFLOPs: 593.27 
 split as 81.24% learnable weight flops and 18.76% attention flops
after change: 593.272422531072
  • Tested DeepSeek v3.2: totals differ when the indexer is enabled (expected)
# Enable this feature
Per train step:
 Total TFLOPs: 4288.47 
 split as 85.91% learnable weight flops and 14.09% attention flops
enable use_sparse_indexer: 4288.4690104811525

# Disable this feature like v3
Per train step:
 Total TFLOPs: 4103.37 
 split as 87.74% learnable weight flops and 12.26% attention flops
disable use_sparse_indexer: 4103.370952409088
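
As a quick sanity check on the v3.2 numbers above, the indexer's contribution can be read off as the difference between the two totals. The snippet below only redoes that arithmetic on the values printed in the test output; the variable names are illustrative.

```python
# Totals copied from the test output above (TFLOPs per train step).
enabled_total = 4288.4690104811525   # use_sparse_indexer enabled
disabled_total = 4103.370952409088   # use_sparse_indexer disabled

# Extra FLOPs attributed to the indexer, and its size relative to
# the run without the indexer.
indexer_delta = enabled_total - disabled_total
relative_increase = indexer_delta / disabled_total

print(f"indexer delta: {indexer_delta:.2f} TFLOPs")
print(f"relative increase: {relative_increase:.2%}")
```

This works out to roughly 185.1 TFLOPs, or about a 4.5% increase over the total with the indexer disabled.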

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 9.09091% with 20 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/MaxText/maxtext_utils.py | 9.09% | 20 Missing ⚠️

