| How to setup a local coding agent on macOS(ikyle.me) | |
| 475 points by kkm 2 days ago | 117 comments | |
tl;dr: Running Gemma 4 26B-A4B locally on an M1 Max via llama.cpp with Metal hits 58 tok/s, but adding a Q8 MTP draft model for speculative decoding (with `--spec-draft-n-max 3`) boosts it to 72 tok/s — faster than equivalent MLX setups. Pairing llama-server's OpenAI-compatible endpoint with the Pi terminal agent (configured for both text and image input via the multimodal projector) yields a usable local coding agent with screenshot support. Qwen3.6 35B-A3B is a stronger coder but runs slower at ~55 tok/s. | |
HN Discussion:
| |