[QDP] Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit #968

viiccwen · 2026-01-28T17:33:27Z

Purpose of PR

Fixes a bug in launch_l2_norm_batch (f64) where attempting to process more than 65535 samples would result in an invalid CUDA kernel launch. The fix adds early validation to return an error when num_samples exceeds the CUDA 1D grid dimension limit.
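For readers who want to see the shape of such a guard, here is a minimal host-side sketch. It is not the code from this PR: `kMaxGridDim`, `LaunchStatus`, and `validate_num_samples` are hypothetical names, and the 65535 limit is taken from the description above. The real launcher may use a different error type or read the limit from the device properties.

```cpp
#include <cstddef>

// Assumed limit, taken from the PR description. The actual constant name and
// value source in the project may differ.
constexpr std::size_t kMaxGridDim = 65535;

// Hypothetical status type standing in for whatever error code the real
// launcher returns.
enum class LaunchStatus { Ok, InvalidValue };

// Hypothetical guard: reject a batch whose sample count cannot be mapped onto
// the grid, before any kernel launch is attempted.
inline LaunchStatus validate_num_samples(std::size_t num_samples) {
    if (num_samples > kMaxGridDim) {
        return LaunchStatus::InvalidValue;
    }
    return LaunchStatus::Ok;
}
```

A launcher would call this check first and return early on `LaunchStatus::InvalidValue`, so the caller sees an explicit error instead of an invalid launch configuration.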

Related Issues or PRs

closes #967

Changes Made

Breaking Changes

Yes
No

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes
Successfully built and ran all unit tests or manual tests locally
PR title follows "MAHOUT-XXX: Brief Description" format (if related to an issue)
Code follows ASF guidelines

viiccwen · 2026-01-28T17:57:13Z

cc @rich7420, @ryankert01

fix: Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit

76021d0

guan404ming added this to the Qumat 0.5.1 milestone Jan 29, 2026
