feat: experimental two-phase (head-chunked) Ulysses all-to-all by csgoogle · Pull Request #428 · AI-Hypercomputer/maxdiffusion

csgoogle · 2026-06-24T13:56:30Z

Add an opt-in ULYSSES_ATTENTION_CHUNKS env var to split the Ulysses all-to-all into per-head-group passes, so XLA's async-collective scheduler can overlap one group's attention compute with the next group's all-to-all. Defaults to 1 (current single-shot path, no behavior change). Numerically identical to single-shot since heads are independent.

Notes:

Requires async-collective LIBTPU flags to actually overlap.
Gain is largest when the all-to-all is a meaningful fraction of attention time (high context parallelism or shorter sequences). At WAN 2.2 720p (sequence length ~75600) attention is compute-bound, so the win is small (~3% in a microbenchmark); at sequence length ~24k we observe ~10% gains.
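
To illustrate why chunking by head leaves the result unchanged, here is a minimal pure-Python sketch (not the PR's JAX code). The function names are hypothetical, and no collectives or sharding are modeled; a comment marks where the real kernel would presumably issue its all-to-all passes.

```python
import math


def attention_one_head(q, k, v):
    """Plain softmax attention for a single head; q, k, v are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(len(v[0]))]
        )
    return out


def attention_all_heads(qs, ks, vs):
    """Single-shot path: every head is processed in one pass."""
    return [attention_one_head(q, k, v) for q, k, v in zip(qs, ks, vs)]


def attention_chunked(qs, ks, vs, num_chunks):
    """Chunked path: process heads in `num_chunks` equal groups, then concatenate."""
    num_heads = len(qs)
    if num_chunks < 1 or num_heads % num_chunks != 0:
        raise ValueError(f"cannot split heads={num_heads} into chunks={num_chunks}")
    step = num_heads // num_chunks
    out = []
    for start in range(0, num_heads, step):
        group = slice(start, start + step)
        # Assumption: in the real kernel each pass would be
        # all-to-all -> local attention -> all-to-all for this head group only.
        out.extend(attention_all_heads(qs[group], ks[group], vs[group]))
    return out


# Small deterministic example: 4 heads, 3 tokens, head dimension 2.
qs = [[[0.1 * (h + t + d) for d in range(2)] for t in range(3)] for h in range(4)]
ks = [[[0.2 * (h - t + d) for d in range(2)] for t in range(3)] for h in range(4)]
vs = [[[float(h * t + d) for d in range(2)] for t in range(3)] for h in range(4)]

# No head reads another head's data, so the grouping cannot change the output.
assert attention_chunked(qs, ks, vs, num_chunks=2) == attention_all_heads(qs, ks, vs)
```

Because each head runs exactly the same arithmetic in both paths, the comparison here is exact rather than approximate.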

github-actions · 2026-06-24T13:56:43Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

google-cla · 2026-06-24T13:56:46Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Perseus14 · 2026-06-24T20:13:14Z

+  # math is identical to the single-shot path (heads are independent); requires
+  # async-collective LIBTPU flags to actually overlap, and the per-chunk head
+  # count must still be shardable across the context axis.
+  num_chunks = int(os.environ.get("ULYSSES_ATTENTION_CHUNKS", "1"))


Let's move this to the config file so it can be used by any Ulysses-type kernel.
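
A config-driven variant of the quoted line could look roughly like the sketch below. The `AttentionConfig` dataclass and `resolve_num_chunks` helper are hypothetical names, not maxdiffusion APIs; only the `ulysses_attention_chunks` key and the shardability constraint come from this PR. The sketch enforces the stricter even-split rule from the quoted comment and does not model any remainder handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical stand-in for the attention section of a config file."""

    ulysses_attention_chunks: int = 1  # 1 keeps the existing single-shot path


def resolve_num_chunks(config: AttentionConfig, num_heads: int, num_shards: int) -> int:
    """Return the chunk count after checking it against heads and context shards."""
    num_chunks = config.ulysses_attention_chunks
    if num_chunks < 1:
        raise ValueError(f"ulysses_attention_chunks must be >= 1, got {num_chunks}.")
    if num_heads % num_chunks != 0:
        raise ValueError(f"got heads={num_heads}, not divisible by chunks={num_chunks}.")
    heads_per_chunk = num_heads // num_chunks
    # Each chunk must still be shardable across the context axis.
    if heads_per_chunk % num_shards != 0:
        raise ValueError(
            f"got heads_per_chunk={heads_per_chunk} and context_shards={num_shards}."
        )
    return num_chunks
```

For example, with 40 heads and 4 context shards, a chunk count of 2 passes (20 heads per chunk), while 4 is rejected because 10 heads per chunk cannot be split across 4 shards.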

Perseus14 · 2026-06-24T20:14:24Z

        f"got heads={num_heads} and context_shards={num_shards}."
    )
+
+  # EXPERIMENTAL: split the all-to-all into `num_chunks` head-groups so XLA's


Does this work on ulysses + ring as well?

csgoogle (Jun 30, 2026): Yes, updated the code.

Add a ulysses_attention_chunks attention config to split the Ulysses all-to-all into head-group passes. The chunked path lets XLA overlap all-to-all collectives with head-parallel local attention compute while preserving the existing single-shot path by default. Apply the same chunking to plain Ulysses and Ulysses+Ring, and allow the final chunk to carry the remainder when the requested chunk count does not divide the Ulysses head groups evenly. Add mocked attention tests for numerical and layout equivalence across chunk counts.
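
The remainder rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `chunk_head_ranges` is a hypothetical helper, a "head group" is taken to mean `num_shards` consecutive heads (one per context shard), and a chunk count larger than the number of groups is simply rejected.

```python
def chunk_head_ranges(num_heads: int, num_shards: int, num_chunks: int):
    """Split heads into `num_chunks` (start, stop) ranges of whole head groups.

    The final range absorbs any leftover groups when `num_chunks` does not
    divide the group count evenly.
    """
    if num_heads % num_shards != 0:
        raise ValueError(f"got heads={num_heads} and context_shards={num_shards}.")
    num_groups = num_heads // num_shards  # assumption: one group = num_shards heads
    if not 1 <= num_chunks <= num_groups:
        raise ValueError(f"chunks={num_chunks} must be in [1, {num_groups}].")

    base = num_groups // num_chunks
    remainder = num_groups % num_chunks
    ranges = []
    start = 0
    for i in range(num_chunks):
        groups = base + (remainder if i == num_chunks - 1 else 0)
        stop = start + groups * num_shards
        ranges.append((start, stop))
        start = stop
    return ranges


# 40 heads over 4 shards gives 10 groups; 3 chunks hold 3, 3, and 4 groups.
assert chunk_head_ranges(40, 4, 3) == [(0, 12), (12, 24), (24, 40)]
```

Every range length is a multiple of `num_shards`, so each chunk stays shardable across the context axis even when the last one is larger.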

csgoogle force-pushed the sagarchapara/ulysses-two-phase branch from 7240f50 to 0d936f8 on June 24, 2026 13:59

csgoogle requested a review from Perseus14 June 24, 2026 14:05

Perseus14 reviewed Jun 24, 2026

Perseus14 requested a review from eltsai June 24, 2026 20:14

csgoogle force-pushed the sagarchapara/ulysses-two-phase branch from 0d936f8 to 14eb661 on June 30, 2026 19:56

csgoogle force-pushed the sagarchapara/ulysses-two-phase branch from 14eb661 to e09195e on June 30, 2026 20:02

csgoogle marked this pull request as ready for review July 1, 2026 05:52

csgoogle requested a review from entrpn as a code owner July 1, 2026 05:52

csgoogle wants to merge 1 commit into main from sagarchapara/ulysses-two-phase
