Tags: mudler/LocalAI

v4.5.6

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
feat(config): default swa_full:true for sliding-window-attention models (#10611)

LocalAI enables a cross-request prompt-prefix cache (cache_reuse, see
core/config/serving_defaults.go) so repeated prefixes — system prompts,
RAG context, agent scaffolds, multi-turn chat — are not reprocessed every
turn. For sliding-window-attention (SWA) models (Gemma 2/3, Cohere2,
Llama 4, ...) this silently does nothing: llama.cpp defaults to a reduced
SWA KV cache sized to the sliding window, and that reduced cache cannot
preserve a prompt prefix across requests, so every turn reprocesses the
whole prompt anyway.

llama.cpp's --swa-full (params.swa_full, already wired through the
LocalAI llama.cpp backend's `swa_full` option) keeps the full KV cache so
the shared prefix is reused. Enable it automatically, but only for models
that are actually SWA: detection reads the gguf-parser-normalized
`<arch>.attention.sliding_window` metadata (which also applies llama.cpp's
family rules, e.g. Phi-3 → not SWA), right where the GGUF is already
parsed for defaults. It is never applied to dense models (pure memory
waste) and never overrides an explicit user `swa_full`/`n_swa` choice.

Tradeoff: the full SWA cache scales with context_size, so it costs more
memory at large contexts — hence the SWA gating and the documented
`swa_full:false` opt-out.
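
The gating described above can be sketched roughly as follows. This is a Python illustration, not LocalAI's Go implementation: the function name, the `metadata` dict, and the `user_options` dict are hypothetical stand-ins. Only the `<arch>.attention.sliding_window` key and the `swa_full`/`n_swa` option names come from the commit message, and treating a missing or zero window as "not SWA" is an assumption.

```python
from typing import Optional


def default_swa_full(arch: str, metadata: dict, user_options: dict) -> Optional[bool]:
    """Return True to enable swa_full by default, or None to leave it untouched.

    Hypothetical helper: `metadata` stands in for the normalized GGUF
    key/value pairs, `user_options` for options set explicitly in the config.
    """
    # Never override an explicit user choice.
    if "swa_full" in user_options or "n_swa" in user_options:
        return None

    # Assumption: a missing or zero window means the model is not SWA,
    # so a dense model never pays for the larger cache.
    window = metadata.get(f"{arch}.attention.sliding_window", 0)
    if not window:
        return None

    return True
```

With this shape, a config that sets `swa_full: false` keeps its value because the helper returns `None` before looking at the metadata.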

Assisted-by: Claude:claude-opus-4-8 [Claude Code] golangci-lint

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.5

fix(backends): whisper darwin run.sh loads whichever fallback lib exists (.so/.dylib) (#10553)

The macOS branch hardcoded WHISPER_LIBRARY=$CURDIR/libgowhisper-fallback.dylib,
but the cmake build emits a Mach-O named libgowhisper-fallback.so on darwin, so
the Go loader panicked at runtime ("dlopen ...dylib: no such file") and the
backend exited ("grpc service not ready") — breaking e.g. the silero-vad-ggml
VAD on darwin. Pick whichever of .dylib/.so is present so it is robust to the
build's naming either way.
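
The selection logic can be sketched as below. The actual fix lives in the backend's `run.sh`; this Python version only illustrates the idea, and the function name and the choice to try `.dylib` before `.so` are assumptions.

```python
from pathlib import Path
from typing import Optional


def pick_fallback_library(curdir: str, stem: str = "libgowhisper-fallback") -> Optional[Path]:
    """Return the first fallback library that exists, trying .dylib and then .so."""
    for ext in (".dylib", ".so"):  # order is an assumption, not taken from run.sh
        candidate = Path(curdir) / f"{stem}{ext}"
        if candidate.is_file():
            return candidate
    # Nothing found: the caller decides how to fail.
    return None
```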

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.4

fix(backends): derive darwin RUN_BINARY from the exec line only (#10541)

golang-darwin.sh's packaging check derived the launch binary by grepping every
$CURDIR/... reference in run.sh and taking the last one. Backends that pick a
runtime CPU variant assign it via unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so`
lines, so the heuristic returned `libgo<x>-avx512.so`, a variant Darwin never
builds (arm64 builds only the fallback variant), and the check then failed with
"package/libgo<x>-avx512.so not found ... refusing to package (#10267)",
breaking the darwin builds for whisper, sam3-cpp, vibevoice-cpp and friends.

Scan only the `exec` line(s) (the actual launch contract) and tolerate a
quoted `exec "$CURDIR"/<binary>`. parakeet-cpp's parakeet-cpp-grpc and the
quoted-only backends (sherpa/piper/opus) resolve correctly; no Linux change.
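
A minimal sketch of the exec-only scan, written in Python for illustration (the real check is a shell script). The regular expression, the function name, and the rule of taking the last `exec` line are assumptions. It covers only the unquoted and quoted `$CURDIR` forms mentioned above.

```python
import re
from typing import Optional

# Matches `exec $CURDIR/<binary>`, `exec "$CURDIR"/<binary>` and `exec "$CURDIR/<binary>"`.
EXEC_RE = re.compile(r'^\s*exec\s+"?\$\{?CURDIR\}?"?/([A-Za-z0-9._+-]+)')


def run_binary(run_sh: str) -> Optional[str]:
    """Return the binary named on the last `exec` line, or None if there is none."""
    found = None
    for line in run_sh.splitlines():
        # Assignments such as LIBRARY=$CURDIR/... never start with `exec`,
        # so they are ignored here.
        match = EXEC_RE.match(line)
        if match:
            found = match.group(1)
    return found
```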

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.3

fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524) (#10538)

The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the
tokenizer-template path and blindly json::parse'd each message's content
string. LocalAI's Go layer always flattens content to a plain string, so a
user prompt that merely looks like JSON (e.g. mealie's ingredient array
["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and
rejected by oaicompat_chat_params_parse with "unsupported content[].type".

Normalize content per role instead: user/system/developer content is opaque
text and is never JSON-sniffed; assistant/tool content still collapses a literal
JSON null/object (tool-call bookkeeping) to a string, but a plain string is
never turned into an array/scalar. The array defense is role-independent, so the
role gate only governs the benign null/object case.
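
The per-role rule can be sketched as follows. The real helpers are C++ in `message_content.h`; this Python version is an illustration only. The function name is hypothetical, and re-serializing an object back to text is an assumption about what "collapses to a string" means.

```python
import json

OPAQUE_ROLES = {"user", "system", "developer"}


def normalize_content(role: str, content: str) -> str:
    """Return the message content as a plain string, following the per-role rules."""
    # user/system/developer: the text is opaque and is never parsed as JSON.
    if role in OPAQUE_ROLES:
        return content

    # assistant/tool: only a literal JSON null or object is collapsed.
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return content

    if parsed is None:
        return ""  # null content -> "" (the #7324 behaviour)
    if isinstance(parsed, dict):
        return json.dumps(parsed)  # assumption: an object is kept as its JSON text
    # Arrays and scalars stay exactly as written, whatever the role.
    return content
```

Under these rules the mealie-style prompt `["1/4 cup brown sugar", ...]` reaches the template as the same string it arrived as.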

While here, extract the duplicated per-message reconstruction and the
pre-template content sanitization into shared, unit-tested helpers
(message_content.h) so the streaming (PredictStream) and non-streaming (Predict)
paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the
dead tool-role parse branches, and the redundant Predict-only tool_calls branch,
while preserving the prior #7324 (null content -> "") and #7528 (tool array
content -> string) fixes.

Tests:
- backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for
  all three helpers (#10524, #7324, #7528, multimodal), discovered and run by
  `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired
  as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON).
- core/schema/message_test.go: Go regression pinning that ToProto flattens a
  JSON-array-looking text part to the verbatim string.
- prepare.sh now copies message_content.h into the build tree.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.2

fix(backends): make the opus backend build and package on macOS/Darwin (#10523)

The opus Go backend (WebRTC audio codec) never built on macOS, so the
published master-metal-darwin-arm64-opus image shipped source only — no
opus binary and no libopusshim — because every step assumed Linux.

- Makefile: hardcoded libopusshim.so with no OS handling. Mirror
  sherpa-onnx: SHIM_EXT=so / dylib on Darwin and build
  libopusshim.$(SHIM_EXT). On Darwin link the shim with
  -undefined dynamic_lookup so it resolves opus_encoder_ctl from the
  already globally-loaded libopus (codec.go dlopens it RTLD_GLOBAL
  first) instead of baking an absolute Homebrew path into the dylib,
  keeping the packaged shim relocatable.
- run.sh: hardcoded LD_LIBRARY_PATH + libopusshim.so even on macOS. Add
  a Darwin branch exporting DYLD_LIBRARY_PATH and the .dylib shim, like
  sherpa-onnx/run.sh.
- package.sh: bundle libopusshim.$(SHIM_EXT) and libopus*.dylib (not
  just .so) into package/lib so the OCI image (which ships package/.)
  is self-contained on a runtime with no Homebrew; add a Darwin arch
  branch so it doesn't warn/skip.
- backend_build_darwin.yml: install + link opus and pkg-config via brew
  so the Makefile's `pkg-config opus` resolves on the macOS runner, and
  cache opus' Cellar dir.

Go code is unchanged; darwin build is validated in CI.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.1

fix(backends): ship the package/ dir for darwin go backend images (#10522)

golang-darwin.sh packaged the whole backend source/build dir as the OCI
image (backend/go/$BACKEND/.), so the runtime dylibs ended up under
package/lib and backend-assets/lib while run.sh looks in $CURDIR/lib. As a
result a backend like sherpa-onnx could not dlopen its libsherpa-shim.dylib
at runtime and exited immediately (the model then 500s with "grpc service
not ready"); it started fine only when run from inside package/.

Ship package/. instead — the self-contained run.sh + binary + lib/ bundle —
matching the Linux Dockerfile.golang (`COPY .../package/. ./`). Backends
that don't assemble a package/ fall back to the backend dir, and the
binary-existence guard now checks the directory actually shipped.
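
The fallback and the guard can be sketched like this. It is a Python illustration of the logic only; the actual change is in `golang-darwin.sh`, and both function names and the error wording are hypothetical.

```python
from pathlib import Path


def ship_dir(backend_dir: str) -> Path:
    """Return package/ when the backend assembled one, else the backend dir."""
    package = Path(backend_dir) / "package"
    return package if package.is_dir() else Path(backend_dir)


def check_binary(backend_dir: str, binary: str) -> Path:
    """Fail early if the launch binary is missing from the directory that ships."""
    shipped = ship_dir(backend_dir)
    if not (shipped / binary).is_file():
        raise FileNotFoundError(f"{shipped / binary} not found, refusing to package")
    return shipped
```

Checking the shipped directory rather than the source directory is the point of the guard: a binary that exists only outside `package/` would still be missing from the image.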

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.5.0

chore(model-gallery): ⬆️ update checksum (#10469)

⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v4.4.3

test(e2e): live-server voice-recognition gate test (#10324)

Add mock-backend VoiceEmbed/VoiceVerify (deterministic DC-offset speaker
discrimination) and a verify-mode gated realtime pipeline, then drive the
real HTTP/WS stack: an authorized speaker reaches response.done while an
unauthorized one is dropped before the LLM with a speaker_not_authorized
event.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.4.2

fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages (#10257)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

v4.4.1

docs: fix broken relref to realtime page (#10255)

Hugo fails the gh-pages build with REF_NOT_FOUND because the relref
in model-configuration.md uses the 'docs/' prefix; refs are resolved
relative to content/, so the page lives at 'features/openai-realtime'
(as the other ref in the same file already uses).


Assisted-by: Claude Code:claude-fable-5

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>