@viiccwen
Contributor

Purpose of PR

Fixes a bug in launch_l2_norm_batch (f64) where processing more than 65535 samples resulted in an invalid CUDA kernel launch. The fix adds early validation that returns an error when num_samples exceeds the CUDA 1D grid dimension limit.
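
For reviewers, here is a minimal sketch of the kind of early check described above. It is not the code in this PR: `validate_num_samples`, `LaunchError`, and `MAX_GRID_DIM` are hypothetical names chosen for illustration, and the limit is hard-coded to the 65535 figure quoted above rather than queried from the device.

```rust
/// Assumed grid dimension limit, taken from the figure quoted in this PR
/// description. Hypothetical constant; a real implementation might query
/// the device properties instead.
const MAX_GRID_DIM: usize = 65_535;

/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum LaunchError {
    /// The requested batch does not fit in a single kernel launch.
    TooManySamples { num_samples: usize, max: usize },
}

/// Checks `num_samples` before any kernel launch is attempted.
/// Returns the grid size as `u32` on success, so the caller never
/// issues a launch with an out-of-range grid dimension.
pub fn validate_num_samples(num_samples: usize) -> Result<u32, LaunchError> {
    if num_samples > MAX_GRID_DIM {
        return Err(LaunchError::TooManySamples {
            num_samples,
            max: MAX_GRID_DIM,
        });
    }
    // Safe to narrow: the value is at most 65_535 at this point.
    Ok(num_samples as u32)
}
```

Rejecting the request up front means the caller gets a descriptive error instead of an invalid launch reported later by the CUDA runtime.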

Related Issues or PRs

closes #967

Changes Made

  • Bug fix
  • New feature
  • Refactoring
  • Documentation
  • Test
  • CI/CD pipeline
  • Other

Breaking Changes

  • Yes
  • No

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes
  • Successfully built and ran all unit tests or manual tests locally
  • PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
  • Code follows ASF guidelines

@viiccwen
Contributor Author

cc @rich7420, @ryankert01

@guan404ming added this to the Qumat 0.5.1 milestone Jan 29, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit

2 participants