fix(qwen-tts): install flash-attn on cuda13 images (#9293) by localai-bot · Pull Request #10293 · mudler/LocalAI

localai-bot · 2026-06-12T22:24:06Z

Re: #9293 (the qwen-tts backend part)

Problem

On CUDA-13 the Qwen TTS backend logs flash-attn warnings and falls back to SDPA. The cuda12 image installs flash-attn via requirements-cublas12-after.txt, but there was no requirements-cublas13-after.txt, so cuda13-qwen-tts never installed flash_attn.
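The fallback described above can be sketched as a simple import probe. This is an illustration only: the helper name `pick_attn_implementation` is hypothetical, and whether the qwen-tts backend selects its attention implementation this way is an assumption. The returned strings follow the `attn_implementation` values used by Hugging Face Transformers.

```python
import importlib.util


def pick_attn_implementation(module_name: str = "flash_attn") -> str:
    """Return 'flash_attention_2' if the module is importable, else 'sdpa'.

    Hypothetical helper: it only checks whether the package is installed,
    which is the condition that differed between the cuda12 and cuda13 images.
    """
    if importlib.util.find_spec(module_name) is not None:
        return "flash_attention_2"
    # flash_attn is missing, so fall back to PyTorch's SDPA path.
    return "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

On an image without flash_attn installed, a probe like this would report the SDPA fallback, which matches the behaviour seen on cuda13-qwen-tts.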

Fix

Add backend/python/qwen-tts/requirements-cublas13-after.txt containing flash-attn, mirroring the cublas12 variant.
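A gap like this one could be caught by comparing the per-variant requirements files in a backend directory. The sketch below is not part of this PR; the function name `missing_variant_files` is hypothetical, and it assumes the `requirements-<variant>*.txt` naming seen in the file names above.

```python
from pathlib import Path


def missing_variant_files(backend_dir, src="cublas12", dst="cublas13"):
    """List dst-variant requirements files that have no counterpart on disk.

    For every requirements-<src>*.txt in backend_dir, the expected
    requirements-<dst>*.txt name is derived and checked for existence.
    """
    backend = Path(backend_dir)
    missing = []
    for path in sorted(backend.glob(f"requirements-{src}*.txt")):
        counterpart = path.name.replace(src, dst, 1)
        if not (backend / counterpart).exists():
            missing.append(counterpart)
    return missing


if __name__ == "__main__":
    # Path taken from the PR description; run from the repository root.
    print(missing_variant_files("backend/python/qwen-tts"))
```

Before this change, a check of this kind would be expected to flag requirements-cublas13-after.txt for the qwen-tts backend, assuming the other cublas13 files were already present.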

Scope

This addresses the qwen-backend performance part of the issue. The separate vllm-omni-fails-entirely part (related to #8536) needs its own reproduction and is not covered here. This is an additive requirements change and has not been built locally.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

The cuda12 image installs flash-attn via requirements-cublas12-after.txt, but there was no cublas13 equivalent, so cuda13-qwen-tts never installed flash_attn and fell back to SDPA with warnings. Add the matching requirements-cublas13-after.txt.

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>


localai-bot wants to merge 1 commit into master from fix/9293-qwen-tts-cuda13-flashattn

localai-bot commented Jun 12, 2026


2 participants