Tags · mudler/LocalAI

v4.5.6

feat(config): default swa_full:true for sliding-window-attention mode…

…ls (#10611)

LocalAI enables a cross-request prompt-prefix cache (cache_reuse, see
core/config/serving_defaults.go) so repeated prefixes — system prompts,
RAG context, agent scaffolds, multi-turn chat — are not reprocessed every
turn. For sliding-window-attention (SWA) models (Gemma 2/3, Cohere2,
Llama 4, ...) this silently does nothing: llama.cpp defaults to a reduced
SWA KV cache sized to the sliding window, and that reduced cache cannot
preserve a prompt prefix across requests, so every turn reprocesses the
whole prompt anyway.

llama.cpp's --swa-full (params.swa_full, already wired through the
LocalAI llama.cpp backend's `swa_full` option) keeps the full KV cache so
the shared prefix is reused. Enable it automatically, but only for models
that are actually SWA: detection reads the gguf-parser-normalized
`<arch>.attention.sliding_window` metadata (which also applies llama.cpp's
family rules, e.g. Phi-3 → not SWA), right where the GGUF is already
parsed for defaults. It is never applied to dense models (pure memory
waste) and never overrides an explicit user `swa_full`/`n_swa` choice.

Tradeoff: the full SWA cache scales with context_size, so it costs more
memory at large contexts — hence the SWA gating and the documented
`swa_full:false` opt-out.

Assisted-by: Claude:claude-opus-4-8 [Claude Code] golangci-lint

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 30, 2026
02b007a
zip
tar.gz
Notes
Downloads

v4.5.5

fix(backends): whisper darwin run.sh loads whichever fallback lib exi…

…sts (.so/.dylib) (#10553)

fix(backends): whisper darwin run.sh loads whichever fallback lib exists

The macOS branch hardcoded WHISPER_LIBRARY=$CURDIR/libgowhisper-fallback.dylib,
but the cmake build emits a Mach-O named libgowhisper-fallback.so on darwin, so
the Go loader panicked at runtime ("dlopen ...dylib: no such file") and the
backend exited ("grpc service not ready") — breaking e.g. the silero-vad-ggml
VAD on darwin. Pick whichever of .dylib/.so is present so it is robust to the
build's naming either way.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 27, 2026
d11b202
zip
tar.gz
Notes
Downloads

v4.5.4

fix(backends): derive darwin RUN_BINARY from the exec line only (#10541)

golang-darwin.sh's packaging check derived the launch binary by grepping every
$CURDIR/... reference in run.sh and taking the last one. Backends that pick a
runtime CPU variant assign it via unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so`
lines, so the heuristic returned `libgo<x>-avx512.so` — a variant Darwin never
builds (arm64 builds only fallback) — and the check then failed with
"package/libgo<x>-avx512.so not found ... refusing to package (#10267)",
breaking the darwin builds for whisper, sam3-cpp, vibevoice-cpp and friends.

Scan only the `exec` line(s) (the actual launch contract) and tolerate a
quoted `exec "$CURDIR"/<binary>`. parakeet-cpp's parakeet-cpp-grpc and the
quoted-only backends (sherpa/piper/opus) resolve correctly; no Linux change.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 27, 2026
14b29eb
zip
tar.gz
Notes
Downloads

v4.5.3

fix(llama-cpp): stop reinterpreting plain-string message content as J…

…SON (#10524) (#10538)

The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the
tokenizer-template path and blindly json::parse'd each message's content
string. LocalAI's Go layer always flattens content to a plain string, so a
user prompt that merely looks like JSON (e.g. mealie's ingredient array
["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and
rejected by oaicompat_chat_params_parse with "unsupported content[].type".

Normalize content per role instead: user/system/developer content is opaque
text and is never JSON-sniffed; assistant/tool content still collapses a literal
JSON null/object (tool-call bookkeeping) to a string, but a plain string is
never turned into an array/scalar. The array defense is role-independent, so the
role gate only governs the benign null/object case.

While here, extract the duplicated per-message reconstruction and the
pre-template content sanitization into shared, unit-tested helpers
(message_content.h) so the streaming (PredictStream) and non-streaming (Predict)
paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the
dead tool-role parse branches, and the redundant Predict-only tool_calls branch,
while preserving the prior #7324 (null content -> "") and #7528 (tool array
content -> string) fixes.

Tests:
- backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for
  all three helpers (#10524, #7324, #7528, multimodal), discovered and run by
  `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired
  as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON).
- core/schema/message_test.go: Go regression pinning that ToProto flattens a
  JSON-array-looking text part to the verbatim string.
- prepare.sh now copies message_content.h into the build tree.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 26, 2026
f0d0bff
zip
tar.gz
Notes
Downloads

v4.5.2

fix(backends): make the opus backend build and package on macOS/Darwin (

#10523)

The opus Go backend (WebRTC audio codec) never built on macOS, so the
published master-metal-darwin-arm64-opus image shipped source only — no
opus binary and no libopusshim — because every step assumed Linux.

- Makefile: hardcoded libopusshim.so with no OS handling. Mirror
  sherpa-onnx: SHIM_EXT=so / dylib on Darwin and build
  libopusshim.$(SHIM_EXT). On Darwin link the shim with
  -undefined dynamic_lookup so it resolves opus_encoder_ctl from the
  already globally-loaded libopus (codec.go dlopens it RTLD_GLOBAL
  first) instead of baking an absolute Homebrew path into the dylib,
  keeping the packaged shim relocatable.
- run.sh: hardcoded LD_LIBRARY_PATH + libopusshim.so even on macOS. Add
  a Darwin branch exporting DYLD_LIBRARY_PATH and the .dylib shim, like
  sherpa-onnx/run.sh.
- package.sh: bundle libopusshim.$(SHIM_EXT) and libopus*.dylib (not
  just .so) into package/lib so the OCI image (which ships package/.)
  is self-contained on a runtime with no Homebrew; add a Darwin arch
  branch so it doesn't warn/skip.
- backend_build_darwin.yml: install + link opus and pkg-config via brew
  so the Makefile's `pkg-config opus` resolves on the macOS runner, and
  cache opus' Cellar dir.

Go code is unchanged; darwin build is validated in CI.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 26, 2026
6afe127
zip
tar.gz
Notes
Downloads

v4.5.1

fix(backends): ship the package/ dir for darwin go backend images (#1…

…0522)

fix(backends): ship the package/ dir for darwin go backends

golang-darwin.sh packaged the whole backend source/build dir as the OCI
image (backend/go/$BACKEND/.), so the runtime dylibs ended up under
package/lib and backend-assets/lib while run.sh looks in $CURDIR/lib. As a
result a backend like sherpa-onnx could not dlopen its libsherpa-shim.dylib
at runtime and exited immediately (the model then 500s with "grpc service
not ready"); it started fine only when run from inside package/.

Ship package/. instead — the self-contained run.sh + binary + lib/ bundle —
matching the Linux Dockerfile.golang (`COPY .../package/. ./`). Backends
that don't assemble a package/ fall back to the backend dir, and the
binary-existence guard now checks the directory actually shipped.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 26, 2026
f58dcef
zip
tar.gz
Notes
Downloads

v4.5.0

chore(model-gallery): ⬆️ update checksum (#10469)

⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

Jun 23, 2026
deb430f
zip
tar.gz
Notes
Downloads

v4.4.3

test(e2e): live-server voice-recognition gate test (#10324)

Add mock-backend VoiceEmbed/VoiceVerify (deterministic DC-offset speaker
discrimination) and a verify-mode gated realtime pipeline, then drive the
real HTTP/WS stack: an authorized speaker reaches response.done while an
unauthorized one is dropped before the LLM with a speaker_not_authorized
event.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 13, 2026
4d3d54d
zip
tar.gz
Notes
Downloads

v4.4.2

fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packag…

…es (#10257)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

Jun 11, 2026
58cdc05
zip
tar.gz
Notes
Downloads

v4.4.1

docs: fix broken relref to realtime page (#10255)

Hugo fails the gh-pages build with REF_NOT_FOUND because the relref
in model-configuration.md uses the 'docs/' prefix; refs are resolved
relative to content/, so the page lives at 'features/openai-realtime'
(as the other ref in the same file already uses).


Assisted-by: Claude Code:claude-fable-5

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

Jun 11, 2026
f618636
zip
tar.gz
Notes
Downloads

PreviousNext

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v4.5.6

v4.5.5

v4.5.4

v4.5.3

v4.5.2

v4.5.1

v4.5.0

v4.4.3

v4.4.2

v4.4.1

Uh oh!

Tags: mudler/LocalAI