Changelog

Every improvement, automatically tracked from our commit history.

← Prev Page 75 of 117 Next →
February 19, 2026
patch Core

Debug(android): add step-by-step logging to setup_unified_recovery FFI

Details

Adds android_log calls at each step of the unified recovery setup flow
(mnemonic generation, vault recovery, keypair generation, S3 upload) to
aid debugging on Android, where stdout/stderr are not visible. Converts
existing eprintln calls to android_log for consistency.
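
The per-step logging described above could look roughly like the following Rust sketch. Only the helper name android_log and the four step names come from this entry; the helper's signature, the "recovery" log tag, the message format, and the functions format_step and log_all_steps are assumptions made for illustration. On Android the sketch calls the NDK's __android_log_write; on other targets it falls back to eprintln so it can be run anywhere.

```rust
/// Step names as listed in the changelog entry.
const STEPS: [&str; 4] = [
    "mnemonic generation",
    "vault recovery",
    "keypair generation",
    "S3 upload",
];

/// Hypothetical message format: "<flow> [<n>/<total>] <step name>".
fn format_step(index: usize, name: &str) -> String {
    format!("setup_unified_recovery [{}/{}] {}", index + 1, STEPS.len(), name)
}

/// Assumed shape of the android_log helper: writes to logcat via the NDK.
#[cfg(target_os = "android")]
fn android_log(message: &str) {
    use std::ffi::CString;
    use std::os::raw::{c_char, c_int};

    #[link(name = "log")]
    extern "C" {
        fn __android_log_write(prio: c_int, tag: *const c_char, text: *const c_char) -> c_int;
    }
    const ANDROID_LOG_INFO: c_int = 4;

    // The tag is an illustrative choice, not taken from the changelog.
    let tag = CString::new("recovery").unwrap();
    // CString::new fails on interior NUL bytes, so replace them first.
    let text = CString::new(message.replace('\0', " ")).unwrap();
    unsafe {
        __android_log_write(ANDROID_LOG_INFO, tag.as_ptr(), text.as_ptr());
    }
}

/// Fallback for non-Android targets, where stderr is visible.
#[cfg(not(target_os = "android"))]
fn android_log(message: &str) {
    eprintln!("{}", message);
}

/// Logs one line per step and returns the lines that were logged.
fn log_all_steps() -> Vec<String> {
    let mut lines = Vec::new();
    for (index, name) in STEPS.iter().enumerate() {
        let line = format_step(index, name);
        android_log(&line);
        lines.push(line);
    }
    lines
}
```

In a real flow each android_log call would sit immediately before or after the step it describes, so the last line in logcat shows where the setup stopped.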

patch Desktop Shell

Remove redundant VaultInitialize call causing 'already initialized' error (v1.40.3)

Desktop 1.40.2 → 1.40.3 | 8963f599
Details

SaveAiApiKeyAsync was calling RequestVaultUnlockAsync (which internally
initializes the vault if needed) and then calling VaultInitialize again
with a hardcoded password. The second call always failed with
AlreadyInitialized because the vault had just been initialized by the
unlock flow. Simplified to a single VaultIsUnlocked +
RequestVaultUnlockAsync check, which handles both initialization and
unlock.
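
The control flow of the fix can be modelled with a small Rust sketch. The real code is in the Desktop Shell and its types are not shown in this entry, so the Vault struct, its fields, and the snake_case function names below are hypothetical stand-ins for VaultInitialize, VaultIsUnlocked, RequestVaultUnlockAsync, and SaveAiApiKeyAsync; passwords and async are left out.

```rust
#[derive(Debug, PartialEq)]
enum VaultError {
    AlreadyInitialized,
}

/// Hypothetical in-memory model of the vault state.
#[derive(Default)]
struct Vault {
    initialized: bool,
    unlocked: bool,
    api_key: Option<String>,
}

impl Vault {
    /// Stand-in for VaultInitialize: fails if called a second time.
    fn initialize(&mut self) -> Result<(), VaultError> {
        if self.initialized {
            return Err(VaultError::AlreadyInitialized);
        }
        self.initialized = true;
        Ok(())
    }

    /// Stand-in for VaultIsUnlocked.
    fn is_unlocked(&self) -> bool {
        self.unlocked
    }

    /// Stand-in for RequestVaultUnlockAsync: initializes if needed, then unlocks.
    fn request_unlock(&mut self) -> Result<(), VaultError> {
        if !self.initialized {
            self.initialize()?;
        }
        self.unlocked = true;
        Ok(())
    }
}

/// Fixed flow: one unlock check and no separate initialize call afterwards.
fn save_ai_api_key(vault: &mut Vault, key: &str) -> Result<(), VaultError> {
    if !vault.is_unlocked() {
        vault.request_unlock()?;
    }
    vault.api_key = Some(key.to_string());
    Ok(())
}
```

Calling initialize after request_unlock in this model returns AlreadyInitialized, which mirrors the failure the redundant call produced.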

patch Desktop Shell

Switch to InteractiveExecutor and report token count (v1.40.2)

Details

Replace StatelessExecutor with InteractiveExecutor using a fresh context
per inference request. StatelessExecutor was creating and destroying
multiple Metal contexts per call, potentially causing inference issues.
InteractiveExecutor uses a single stable context for the duration of
each request. Also added pre-tokenization diagnostic logging (prompt
token count) and set TokensUsed in the response so AiService logs
accurate token counts.

patch Desktop Shell

Resolve double BOS token and enable Metal GPU offload (v1.40.2)

Desktop 1.40.1 → 1.40.2 | 73d97471
Details

StatelessExecutor automatically prepends a BOS token, but the Llama 3.x
chat template also included <|begin_of_text|> (the BOS token), causing
a double BOS that confused the model into immediately emitting EOS and
returning 0 tokens. Removed the explicit BOS from the template.

Also changed GpuLayerCount from 0 (CPU-only) to -1 (offload all layers)
so the model uses Metal acceleration on Apple Silicon, which should
significantly improve inference speed.
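
The double-BOS condition can be illustrated at the string level with a short Rust sketch. It does not use the real executor or tokenizer: with_tokenizer_bos is a hypothetical stand-in for an executor that always prepends BOS, and strip_explicit_bos and leading_bos_count are illustrative helpers, not functions from the codebase.

```rust
/// BOS marker used by the Llama 3.x chat template.
const BOS: &str = "<|begin_of_text|>";

/// Hypothetical stand-in for an executor that always prepends a BOS token.
fn with_tokenizer_bos(prompt: &str) -> String {
    format!("{}{}", BOS, prompt)
}

/// Removes one explicit BOS marker from the start of a template, if present.
fn strip_explicit_bos(prompt: &str) -> &str {
    prompt.strip_prefix(BOS).unwrap_or(prompt)
}

/// Counts how many BOS markers appear back to back at the start of the text.
fn leading_bos_count(text: &str) -> usize {
    let mut rest = text;
    let mut count = 0;
    while let Some(remaining) = rest.strip_prefix(BOS) {
        rest = remaining;
        count += 1;
    }
    count
}
```

A template that starts with the marker ends up with two leading BOS markers once the executor adds its own; stripping the explicit one first leaves exactly one.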

patch Desktop Shell

Use model-specific chat templates for local LLM inference (v1.40.1)

Desktop 1.40.0 → 1.40.1 | f42603cf
Details

The local LLama provider was hardcoded to use the Phi-3 chat template
(<|system|>...<|end|>) for all models. When running Llama 3.2 or
Mistral models, the unrecognized special tokens caused the model to
immediately emit EOS, returning 0 tokens. Added FormatPrompt() that
selects the correct chat template and anti-prompts per model family:

  • Llama 3.x: <|begin_of_text|><|start_header_id|>... format
  • Mistral: [INST]...[/INST] format
  • Phi-3: <|system|>...<|end|> format (default)
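
A per-family template selector along these lines might look like the Rust sketch below. It is not the FormatPrompt() implementation: detect_family, the file-name matching rule, the exact whitespace in each template, and the anti-prompt strings are assumptions. The Llama 3.x branch includes <|begin_of_text|> as listed above; whether to keep it depends on whether the tokenizer already adds a BOS token.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelFamily {
    Llama3,
    Mistral,
    Phi3,
}

/// Hypothetical detection rule based on the model file name; Phi-3 is the default.
fn detect_family(model_name: &str) -> ModelFamily {
    let name = model_name.to_lowercase();
    if name.contains("llama-3") || name.contains("llama3") {
        ModelFamily::Llama3
    } else if name.contains("mistral") {
        ModelFamily::Mistral
    } else {
        ModelFamily::Phi3
    }
}

/// Returns the formatted prompt and the anti-prompts (stop strings) for a family.
fn format_prompt(family: ModelFamily, system: &str, user: &str) -> (String, Vec<&'static str>) {
    match family {
        ModelFamily::Llama3 => (
            format!(
                "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{}<|eot_id|>\
                 <|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|>\
                 <|start_header_id|>assistant<|end_header_id|>\n\n",
                system, user
            ),
            vec!["<|eot_id|>"],
        ),
        // Mistral's instruct format has no separate system role, so the
        // system text is folded into the first instruction (an assumption here).
        ModelFamily::Mistral => (
            format!("[INST] {}\n\n{} [/INST]", system, user),
            vec!["</s>"],
        ),
        ModelFamily::Phi3 => (
            format!(
                "<|system|>\n{}<|end|>\n<|user|>\n{}<|end|>\n<|assistant|>\n",
                system, user
            ),
            vec!["<|end|>"],
        ),
    }
}
```

Keeping the template and its anti-prompts in one match arm makes it harder for a family to get one without the other.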