Debug(android): add step-by-step logging to setup_unified_recovery FFI
Details
Adds android_log calls at each step of the unified recovery setup flow
(mnemonic generation, vault recovery, keypair generation, S3 upload) to
aid debugging on Android where stdout/stderr are not visible. Converts
existing eprintln calls to android_log for consistency.
Remove redundant VaultInitialize call causing 'already initialized' error (v1.40.3)
Details
SaveAiApiKeyAsync was calling RequestVaultUnlockAsync (which internally
initializes the vault if needed) and then calling VaultInitialize again
with a hardcoded password. The second call always failed with
AlreadyInitialized since the vault was just initialized by the unlock
flow. Simplified to a single VaultIsUnlocked + RequestVaultUnlockAsync
check, which handles both initialization and unlock.
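The failure mode and the fix can be illustrated with a small sketch in Python rather than the project's own code. The `Vault` class and the three functions below are hypothetical stand-ins for VaultInitialize, RequestVaultUnlockAsync, and SaveAiApiKeyAsync; they model only the state transitions described above, not the real API.

```python
class AlreadyInitialized(Exception):
    """Raised when initialize() is called on an already-initialized vault."""


class Vault:
    """Hypothetical stand-in for the real vault; tracks state only."""

    def __init__(self):
        self.initialized = False
        self.unlocked = False
        self.secrets = {}

    def initialize(self, password):
        if self.initialized:
            raise AlreadyInitialized("vault is already initialized")
        self.initialized = True


def request_vault_unlock(vault, password):
    """Models the unlock flow: initializes the vault if needed, then unlocks."""
    if not vault.initialized:
        vault.initialize(password)
    vault.unlocked = True


def save_ai_api_key_old(vault, password, key):
    """Old flow: the second initialize() always fails after the unlock flow."""
    request_vault_unlock(vault, password)
    vault.initialize("hardcoded-password")  # raises AlreadyInitialized
    vault.secrets["ai_api_key"] = key


def save_ai_api_key(vault, password, key):
    """Simplified flow: one unlocked check, one unlock request, no re-init."""
    if not vault.unlocked:
        request_vault_unlock(vault, password)
    vault.secrets["ai_api_key"] = key
```

In this model the old flow fails on every call regardless of the vault's starting state, which matches the 'already initialized' error in the title.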
Switch to InteractiveExecutor and report token count (v1.40.2)
Details
Replace StatelessExecutor with InteractiveExecutor, using a fresh context
per inference request. StatelessExecutor was creating and destroying
multiple Metal contexts per call, potentially causing inference issues.
InteractiveExecutor uses a single stable context for the whole request.
Also added pre-tokenization diagnostic logging (prompt token count) and
set TokensUsed in the response so AiService logs accurate token counts.
Resolve double BOS token and enable Metal GPU offload (v1.40.2)
Details
StatelessExecutor automatically prepends a BOS token, but the Llama 3.x
chat template also included <|begin_of_text|> (the BOS token), causing
a double BOS that confused the model into immediately emitting EOS and
returning 0 tokens. Removed the explicit BOS from the template.
Also changed GpuLayerCount from 0 (CPU-only) to -1 (offload all layers)
so the model uses Metal acceleration on Apple Silicon, which should
significantly improve inference speed.
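A minimal sketch of how the duplicate BOS arises, written in Python rather than the project's own code. The names `executor_prepare`, `OLD_TEMPLATE`, `NEW_TEMPLATE`, and `count_leading_bos` are illustrative assumptions: the executor is modeled as a function that always prepends the BOS token, as described above, and the templates are cut down to a single user turn.

```python
BOS = "<|begin_of_text|>"


def executor_prepare(prompt: str) -> str:
    """Models an executor that always prepends BOS to the prompt text."""
    return BOS + prompt


# Template before the fix: carries its own explicit BOS.
OLD_TEMPLATE = BOS + "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"

# Template after the fix: leaves BOS to the executor.
NEW_TEMPLATE = "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"


def count_leading_bos(text: str) -> int:
    """Counts how many BOS markers appear back to back at the start."""
    count = 0
    while text.startswith(BOS):
        count += 1
        text = text[len(BOS):]
    return count


old_prompt = executor_prepare(OLD_TEMPLATE.format(user="Hello"))
new_prompt = executor_prepare(NEW_TEMPLATE.format(user="Hello"))
print(count_leading_bos(old_prompt), count_leading_bos(new_prompt))
```

A check like `count_leading_bos` could also serve as a cheap diagnostic before inference, although the source does not say the project added one.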
Use model-specific chat templates for local LLM inference (v1.40.1)
Details
The local LLama provider was hardcoded to use the Phi-3 chat template
(<|system|>...<|end|>) for all models. When running Llama 3.2 or
Mistral models, the unrecognized special tokens caused the model to
immediately emit EOS, returning 0 tokens. Added FormatPrompt() that
selects the correct chat template and anti-prompts per model family:
- Llama 3.x: <|begin_of_text|><|start_header_id|>... format
- Mistral: [INST]...[/INST] format
- Phi-3: <|system|>...<|end|> format (default)
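The selection logic can be sketched as follows, in Python rather than the project's own code. Everything here is a set of assumptions for illustration: the source does not say how FormatPrompt detects the model family (substring matching on the model name is a guess), the anti-prompt lists are plausible stop strings rather than the project's actual values, and folding the system text into the Mistral [INST] block is one common convention. The Llama 3.x template includes <|begin_of_text|> as this entry describes; the v1.40.2 entry above later removed it.

```python
def detect_family(model_name: str) -> str:
    """Guesses the model family from the model name (assumed heuristic)."""
    name = model_name.lower()
    if "llama-3" in name or "llama3" in name:
        return "llama3"
    if "mistral" in name:
        return "mistral"
    return "phi3"  # default, matching the entry above


def format_prompt(model_name: str, system: str, user: str):
    """Returns (prompt, anti_prompts) for the detected model family."""
    family = detect_family(model_name)
    if family == "llama3":
        prompt = (
            "<|begin_of_text|>"
            "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
            "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n"
        )
        return prompt, ["<|eot_id|>"]
    if family == "mistral":
        # Assumed convention: system text is placed inside the [INST] block.
        prompt = "[INST] " + system + "\n\n" + user + " [/INST]"
        return prompt, ["[INST]"]
    prompt = (
        "<|system|>\n" + system + "<|end|>\n"
        "<|user|>\n" + user + "<|end|>\n"
        "<|assistant|>\n"
    )
    return prompt, ["<|end|>"]


prompt, stops = format_prompt("Llama-3.2-3B-Instruct.gguf", "Be brief.", "Hi")
```

Returning the anti-prompts together with the prompt keeps the stop strings tied to the template that produced them, so a caller cannot pair a Llama prompt with Phi-3 stop strings.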