Fix AutoTuner tactic timing (%globaltimer) for Confidential Computing(CC) by elvischenv · Pull Request #3638 · flashinfer-ai/flashinfer

elvischenv · 2026-06-15T07:29:12Z

Targets release-v0.6.11 (the v0.6.11.post1 line).

Under Confidential Computing, cudaEventElapsedTime is unreliable on the bounce-buffer path (it can return negative values), so the min(measured_time) ranking in AutoTuner.choose_one picks a near-random tactic on each rank and bakes it into the tuning cache. This PR instead times the candidate run with the GPU %globaltimer register, read by a tiny JIT-compiled stamp kernel. The return signature is unchanged, so choose_one and the cache format stay the same.
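Here is a minimal, hypothetical sketch of the failure mode. It is not the PR's code. It shows a `min()`-based ranking over per-tactic timings, where a single negative event reading wins outright, and a helper that converts two nanosecond `%globaltimer` stamps into the per-iteration milliseconds the ranking compares. The names `pick_tactic` and `globaltimer_ms` and all sample numbers are illustrative assumptions. Only the `(t1 - t0) / 1e6 / repeat` conversion comes from the reviewed diff.

```python
def pick_tactic(measured_ms: dict[int, float]) -> int:
    """Return the tactic id with the smallest measured time (min-based ranking)."""
    return min(measured_ms, key=measured_ms.get)


def globaltimer_ms(t0_ns: int, t1_ns: int, repeat: int) -> float:
    """Convert two %globaltimer stamps (nanoseconds) into mean milliseconds per run."""
    if repeat <= 0:
        raise ValueError("repeat must be positive")
    return (t1_ns - t0_ns) / 1e6 / repeat


# Hypothetical event readings: one bogus negative value makes tactic 2
# "win" the min() ranking regardless of its real speed.
event_timings = {0: 0.110, 1: 0.095, 2: -3.2}
print(pick_tactic(event_timings))  # 2

# The same tactics timed from (hypothetical) globaltimer stamps over 10 runs each:
# tactic 0 -> 0.11 ms, tactic 1 -> 0.095 ms, tactic 2 -> 0.13 ms.
stamps = {0: (1_000_000, 2_100_000), 1: (5_000_000, 5_950_000), 2: (9_000_000, 10_300_000)}
gt_timings = {tid: globaltimer_ms(t0, t1, repeat=10) for tid, (t0, t1) in stamps.items()}
print(pick_tactic(gt_timings))  # 1
```

Because the stamps come from one monotonic GPU clock, their difference cannot go negative. The ranking logic itself does not need to change.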

The timer is controlled by FLASHINFER_AUTOTUNE_TIMER (auto|globaltimer|cudaevent). In auto mode, %globaltimer is used only when CC is detected via NVML, so behavior outside CC is unchanged. FLASHINFER_CONFIDENTIAL_COMPUTE=1/0 overrides the detection.
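The selection rules above can be expressed as a small pure function. This is a hypothetical illustration, not FlashInfer's implementation. `select_autotune_timer` and the `detect_cc` callable, which stands in for the NVML query, are invented names. Lower-casing the mode, raising `ValueError` on unknown modes, and ignoring override values other than 1/0 are also assumptions.

```python
from typing import Callable, Mapping


def select_autotune_timer(env: Mapping[str, str], detect_cc: Callable[[], bool]) -> str:
    """Resolve the autotuner timer: "globaltimer" or "cudaevent".

    env: environment mapping (e.g. os.environ).
    detect_cc: hypothetical stand-in for the NVML-based CC check.
    """
    mode = env.get("FLASHINFER_AUTOTUNE_TIMER", "auto").strip().lower()
    if mode in ("globaltimer", "cudaevent"):
        return mode  # explicit choice wins; CC detection is skipped
    if mode != "auto":
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unsupported FLASHINFER_AUTOTUNE_TIMER={mode!r}")

    # FLASHINFER_CONFIDENTIAL_COMPUTE=1/0 overrides detection; any other
    # value (or absence) falls through to detection (assumed behavior).
    override = env.get("FLASHINFER_CONFIDENTIAL_COMPUTE")
    if override == "1":
        cc = True
    elif override == "0":
        cc = False
    else:
        cc = detect_cc()
    return "globaltimer" if cc else "cudaevent"


print(select_autotune_timer({}, detect_cc=lambda: True))  # globaltimer
print(select_autotune_timer({}, detect_cc=lambda: False))  # cudaevent
print(select_autotune_timer({"FLASHINFER_CONFIDENTIAL_COMPUTE": "0"}, detect_cc=lambda: True))  # cudaevent
```

Keeping detection behind a callable means off-CC machines with an explicit mode or override never touch NVML. It also makes the policy easy to unit-test.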

See CC_AUTOTUNER_FIX.md.

Under Confidential Computing, cudaEventElapsedTime is unreliable (it can return negative values on the bounce-buffer path), so the min(measured_time) ranking in AutoTuner.choose_one picks a near-random tactic per rank and bakes it into the tuning cache. Time the candidate run with the GPU %globaltimer register (tiny JIT stamp kernel) instead; the return signature is the same, so choose_one and the cache format are unchanged.

Controlled by FLASHINFER_AUTOTUNE_TIMER (auto|globaltimer|cudaevent); auto uses %globaltimer only when CC is detected (NVML), so off-CC behavior is unchanged. FLASHINFER_CONFIDENTIAL_COMPUTE=1/0 overrides detection.

Mirrors TensorRT-LLM PR #11657. See CC_AUTOTUNER_FIX.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-15T07:29:20Z

Important

Review skipped

Draft detected.


gemini-code-assist

Code Review

This pull request introduces a Confidential Computing (CC) safe autotuner timing mechanism using the GPU's %globaltimer register to replace the unreliable cudaEventElapsedTime under CC environments. It includes CC detection via NVML, a JIT-compiled stamp kernel, and configuration controls. The feedback suggests optimizing the timing retrieval in pure_profile by copying the CUDA tensor to the CPU in a single transfer (ts.cpu().tolist()) instead of calling .item() twice, which reduces host-device communication overhead.


gemini-code-assist · 2026-06-15T07:30:18Z

+                        _run_kernels()
+                    gt_stamp(ts[1:2])
+                    stream.synchronize()
+                    return (ts[1].item() - ts[0].item()) / 1e6 / repeat


Calling .item() twice on a CUDA tensor triggers two separate synchronous device-to-host copies. Since stream.synchronize() has already been called, we can copy the entire tensor to the CPU in a single transfer and unpack it using .tolist(). This reduces host-device communication overhead during profiling.

Suggested change

- return (ts[1].item() - ts[0].item()) / 1e6 / repeat
+ t0, t1 = ts.cpu().tolist()
+ return (t1 - t0) / 1e6 / repeat

nvpohanh · 2026-06-15T12:02:39Z

@elvischenv The description looks outdated. Could you update it?


elvischenv changed the title from "Confidential Computing: CC-safe AutoTuner tactic timing (%globaltimer)" to "Fix AutoTuner tactic timing (%globaltimer) for Confidential Computing(CC)" on Jun 15, 2026.

elvischenv wants to merge 1 commit into flashinfer-ai:release-v0.6.11 from elvischenv:cc-autotuner-fixed.
