whichllm
Andyyyy64
CLI that detects your hardware and ranks local LLMs that will run well on it, scored against real benchmarks.
What is whichllm?
A CLI that detects your hardware (GPU, CPU, RAM) and ranks the local LLMs that will actually run well on it, scored against real benchmarks (LiveBench, Artificial Analysis, Aider, Arena ELO) rather than parameter count alone.
Pros & Cons
Pros
- Evidence-based ranking from multiple leaderboards, not a size heuristic
- Confidence markers (~ for estimated, ? for no data) make uncertainty explicit
- Scriptable JSON output, plus GPU simulation for purchase planning
Cons
- Speed figures are estimates, not measured guarantees
- Ollama integration requires mapping Hugging Face model IDs by hand
- Still in an early 0.x release (v0.5.8)
License
MIT (OSI-open)
When it is interesting
Deciding what to run, or which GPU to buy, before you commit.
When it is too early
If you need measured throughput rather than estimates.
This repo was featured in the 2026-06 edition of the Open-Source AI Radar.
oMLX
jundot
macOS-native LLM inference server for Apple Silicon with continuous batching and SSD-tiered caching.
apfel
Arthur-Ficial
Expose the on-device Apple Intelligence model on macOS 26 as a zero-setup OpenAI-compatible local API.
shimmy
Michael-A-Kuykendall
Pure-Rust local inference engine with an OpenAI-compatible API, shipped as one binary.