fix(qwen-tts): install flash-attn on cuda13 images (#9293) #10293

Open
localai-bot wants to merge 1 commit into master from fix/9293-qwen-tts-cuda13-flashattn
@localai-bot

Collaborator

Re: #9293 (the qwen-tts backend part)

Problem

On CUDA-13 the Qwen TTS backend logs flash-attn warnings and falls back to SDPA. The cuda12 image installs flash-attn via requirements-cublas12-after.txt, but there was no requirements-cublas13-after.txt, so cuda13-qwen-tts never installed flash_attn.
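
As a rough illustration of the fallback described above, the sketch below checks whether the `flash_attn` module is importable and picks an attention implementation accordingly. The helper name `pick_attn_implementation` is hypothetical and not part of the qwen-tts backend; the strings `"flash_attention_2"` and `"sdpa"` are the values Hugging Face Transformers accepts for `attn_implementation`, and it is an assumption that the backend selects between them in a similar way.

```python
import importlib.util


def pick_attn_implementation(module_name: str = "flash_attn") -> str:
    """Return "flash_attention_2" if the module is importable, else "sdpa".

    Hypothetical helper for illustration; not taken from the backend code.
    """
    # find_spec returns None for a missing top-level module and does not
    # import the module, so the check is cheap and has no side effects.
    available = importlib.util.find_spec(module_name) is not None
    return "flash_attention_2" if available else "sdpa"


if __name__ == "__main__":
    # On an image without flash-attn installed this reports the SDPA fallback.
    print(pick_attn_implementation())
```

Running this inside the cuda13 image before and after the change would be one way to confirm that the package is actually present.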

Fix

Add backend/python/qwen-tts/requirements-cublas13-after.txt containing flash-attn, mirroring the cublas12 variant.
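
The same kind of gap could exist wherever a cublas12 requirements file has no cublas13 twin. A minimal sketch of a check for that follows, assuming requirements files follow the `requirements-cublas12*.txt` / `requirements-cublas13*.txt` naming seen in this PR. The function `missing_cuda13_requirements` is hypothetical and is not part of the repository's tooling.

```python
from pathlib import Path


def missing_cuda13_requirements(backend_dir: str) -> list[str]:
    """List cublas13 requirements files that have a cublas12 twin but do not exist.

    Hypothetical helper; the naming pattern is assumed from this PR.
    """
    root = Path(backend_dir)
    missing = []
    for cu12 in sorted(root.glob("requirements-cublas12*.txt")):
        # Derive the expected cublas13 file name from the cublas12 one.
        cu13_name = cu12.name.replace("cublas12", "cublas13", 1)
        if not (root / cu13_name).exists():
            missing.append(cu13_name)
    return missing


if __name__ == "__main__":
    # Path taken from this PR; adjust to the checkout location.
    print(missing_cuda13_requirements("backend/python/qwen-tts"))
```

A missing twin is not always a bug, since a dependency may be intentionally omitted for one CUDA version, so any result would need a manual look.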

Scope

This addresses the qwen-tts backend performance part of the issue. The separate part, where vllm-omni fails entirely (related to #8536), needs its own reproduction and is not covered here. This is an additive requirements change and has not been built locally.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

The cuda12 image installs flash-attn via requirements-cublas12-after.txt,
but there was no cublas13 equivalent, so cuda13-qwen-tts never installed
flash_attn and fell back to SDPA with warnings. Add the matching
requirements-cublas13-after.txt.

Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>