The Momentum Report

July 15, 2026

llm CLI Tool Guide 2026: Run Any LLM From Your Terminal

Signal Trigger

Why We're Covering This

The llm CLI tool by Simon Willison carries a viral score of 67 in the HookFlow dataset this week, which puts it in the same tracked tier as Granola and ahead of LangChain (61) and Amazon Bedrock (61). The raw dataset shows a delta_7d of -13 for llm this week, but that figure falls within a flagged recalibration cycle; the cross-agent validation layer, which confirms nine tools with clean momentum signals, reports a validated delta of +14 for llm. The coverage case isn't a single-week spike. It's a structural pattern: an infrastructure CLI built by the co-creator of Django and creator of Datasette is holding heat parity with consumer-facing AI products. That matters to builders: if practitioners are standardizing on llm as their provider-agnostic command layer, what does that mean for your current integration architecture?

A.R.C. Analysis

Architecture · Reliability · Context

Architecture

llm is a Python CLI distributed via PyPI (pip install llm). It's not a wrapper around a single model; it's a plugin host. The core ships with OpenAI support, and every other provider (Anthropic Claude, Google Gemini, Mistral, Groq, local models via Ollama) arrives through the plugin interface. The core is not tied to any one model. Switching from GPT-4o to Claude 3.5 Sonnet requires no code change, only a different -m flag or a changed default model.

All prompt history writes to a local SQLite database automatically, with no telemetry or cloud sync. The tool composes cleanly with shell pipelines: stdin and stdout are first-class interfaces. For production integration, llm slots into an existing shell-based CI/CD or data pipeline without a library dependency, new SDK, or HTTP client layer. Open-weight models run fully locally when paired with Ollama. Cloud inference requires only the target provider's API key.

Reliability

llm carries a viral score of 67 in the current HookFlow dataset. The project is Apache 2.0 licensed and maintained by Simon Willison, whose maintenance track record on Datasette (active since 2017) is the most relevant comparator. That project has never been abandoned despite no VC backing; llm follows the same model.

Community composition matters. Scout log patterns show llm's adopters are disproportionately developers, researchers, and data scientists—not early-adopter consumers chasing novelty. That cohort retains tools longer and churns less once embedded in workflows. The SQLite-based prompt log creates a local data asset that increases switching cost over time, in the tool's favor. No rate-limit complaints appear in current community data. No pricing instability risk: the tool itself is free; costs lie entirely upstream at the model provider layer.

Verdict: Build with it. A maintained, Apache-licensed CLI with a local-first data model and no single-provider dependency is a low-risk infrastructure bet.

Context

Real-world deployments surfacing in scout logs differ from marketing narratives. Practitioners use llm for four distinct jobs: (1) one-shot queries against multiple providers without opening a browser—faster than any chat UI for developers already in a terminal; (2) piping shell output directly into an LLM step inside existing scripts (git diff | llm "write a commit message"); (3) systematic model comparison—running the same prompt against Claude, GPT-4o, and a local Ollama model in sequence, then querying the SQLite log to diff outputs; (4) building lightweight personal knowledge infrastructure from logged AI interactions, queryable via standard SQL.
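Use case 3 can be sketched as a short shell loop. This is a minimal illustration, not a pattern taken from the scout logs: compare_models is a hypothetical helper name, and the model IDs in the usage comment are placeholders to be replaced with whatever llm models lists on your machine.

```shell
# Sketch: send one prompt to several models in sequence.
# compare_models is a hypothetical helper, not part of llm itself.
compare_models() {
  local prompt
  prompt="$1"
  shift
  local model
  for model in "$@"; do
    printf '=== %s ===\n' "$model"
    # -m selects the model; llm also records each call in its local log.
    llm -m "$model" "$prompt" || printf '(call to %s failed)\n' "$model" >&2
  done
}

# Example (placeholder model IDs):
# compare_models "Explain CRDTs in two sentences" gpt-4o claude-3.5-sonnet llama3
```

Because every call lands in the same local log, the responses can be compared afterwards with llm logs or a SQL query.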

The plugin ecosystem is the architectural lever making use cases 1 through 3 viable without workflow fragmentation. See HookFlow's Ollama tool profile for local model execution patterns that pair directly with llm, and Claude Code's profile for the contrasting agent-first approach to the same provider tier.

Installation & Quick Start

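Install via PyPI. A recent Python 3 is required; older llm releases supported Python 3.8, but newer ones have raised the minimum, so check the llm page on PyPI for the current requirement.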

pip install llm

Set your first API key:

llm keys set openai
# Paste your key at the prompt

Run your first prompt:

llm "explain vector embeddings in one sentence"

Every prompt is logged automatically. Review your history immediately:

llm logs

That covers the core workflow. Everything else (additional providers, local models, structured output) is additive, mostly through plugins.

Plugin System: Adding More Models

The plugin interface is where llm separates from every single-provider CLI. Each plugin is a standard PyPI package.

# Add Anthropic Claude (the plugin was previously published as llm-claude-3)
llm install llm-anthropic

# Add Google Gemini
llm install llm-gemini

# Add local models via Ollama
llm install llm-ollama

After installation, list every available model across all installed plugins:

llm models

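Switch providers by passing the -m flag. Model IDs are defined by each plugin, so copy them from the llm models output; the IDs below are examples:

llm -m claude-3.5-sonnet "summarize this document"
llm -m gemini-1.5-pro-latest "summarize this document"
llm -m llama3 "summarize this document"   # llm-ollama uses the Ollama model name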

A default model and aliases reduce friction further. Set the default model once:

llm models default claude-3.5-sonnet

From that point, bare llm "prompt" calls route to Claude without any flag. To give a model a short name of your own, use llm aliases set <alias> <model-id>; note that an alias named "default" does not change which model bare calls use. Switching providers in a team context becomes a one-line configuration change, not a codebase change. For full local model execution without any outbound API call, the Ollama integration documented at /tools/ollama is the reference path.

Prompt Logging & History

Every interaction—prompt, model, response, timestamp—writes to a local SQLite database. Location:

llm logs path
# Returns something like: /Users/yourname/Library/Application Support/io.datasette.llm/logs.db

Query recent logs:

llm logs
# Shows the 3 most recent logged responses by default

llm logs -n 20
# Last 20 entries

Export to JSON for programmatic processing:

llm logs --json

Because the store is SQLite, you can query it directly with any SQL client or with Datasette (Willison's own tool):

-- The responses table stores the time of each call in the datetime_utc column
SELECT model, prompt, response, datetime(datetime_utc, 'localtime') AS logged_at
FROM responses
ORDER BY datetime_utc DESC
LIMIT 50;
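The same database can be queried from a script. The sketch below assumes the sqlite3 command-line shell is installed and that the responses table has a model column, as in the query above; responses_per_model is a hypothetical helper name.

```shell
# Sketch: count logged responses per model using the sqlite3 CLI.
# Assumes sqlite3 is on PATH; responses_per_model is a hypothetical helper.
responses_per_model() {
  local db
  # "llm logs path" prints the location of the log database.
  db="$(llm logs path)" || return 1
  sqlite3 "$db" 'SELECT model, COUNT(*) AS n FROM responses GROUP BY model ORDER BY n DESC;'
}

# Example:
# responses_per_model
```

Resolving the path at call time keeps the helper portable across machines where the log lives in different directories.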

Production use cases this enables include audit trails for LLM-assisted decisions, personal prompt pattern analysis, building a searchable knowledge base from AI-assisted research sessions, and cross-provider output comparison on identical prompts over time. The data never leaves your machine unless you explicitly export it.

Shell Scripting with llm

Three patterns surface repeatedly in scout log analysis of practitioner deployments:

Automated commit message generation

git diff --staged | llm "write a concise git commit message for this diff"

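Add it as a git alias and it becomes a single short command in any commit workflow. Review the suggested message before committing; the model sees only the diff, not the intent behind it.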
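One way to package the one-liner is a small shell function that skips the model call when nothing is staged. This is a sketch under stated assumptions: suggest_commit_message is a hypothetical name, and the instruction is passed with -s (system prompt) so the diff arrives on stdin.

```shell
# Sketch: print a suggested commit message for the staged changes.
# suggest_commit_message is a hypothetical helper, not part of git or llm.
suggest_commit_message() {
  # "git diff --staged --quiet" exits 0 when there are no staged changes.
  if git diff --staged --quiet; then
    echo "nothing staged; stage changes with git add first" >&2
    return 1
  fi
  # The diff is the prompt (stdin); -s supplies the instruction.
  git diff --staged | llm -s "Write a concise git commit message for this diff"
}

# Example:
# suggest_commit_message
```

The guard avoids spending an API call on an empty diff, which is easy to trigger by accident in a commit workflow.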

Error log diagnosis

tail -n 50 /var/log/app/error.log | llm "identify the root cause and suggest a fix"

Pipe the last 50 lines of any log file into llm. Works with any model—route complex traces to a high-context model, quick lookups to a faster or cheaper one.
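The routing idea can be sketched as two small functions. Everything here is illustrative: pick_model and diagnose_log are hypothetical names, the 200-line threshold is arbitrary, and the default model IDs are placeholders that can be overridden through the LLM_BIG_MODEL and LLM_FAST_MODEL environment variables (also invented for this sketch).

```shell
# Sketch: route long log excerpts to a larger model, short ones to a cheaper one.
# All names, the threshold, and the default model IDs are assumptions.
pick_model() {
  local line_count
  line_count="$1"
  local threshold
  threshold="${LLM_ROUTE_THRESHOLD:-200}"
  if [ "$line_count" -gt "$threshold" ]; then
    printf '%s\n' "${LLM_BIG_MODEL:-claude-3.5-sonnet}"
  else
    printf '%s\n' "${LLM_FAST_MODEL:-gpt-4o-mini}"
  fi
}

diagnose_log() {
  local file
  file="$1"
  local lines
  lines="${2:-50}"
  local excerpt
  excerpt="$(tail -n "$lines" "$file")" || return 1
  local count
  # Arithmetic expansion drops any padding that wc adds on some platforms.
  count=$(( $(printf '%s\n' "$excerpt" | wc -l) ))
  printf '%s\n' "$excerpt" |
    llm -m "$(pick_model "$count")" -s "Identify the likely root cause and suggest a fix"
}

# Example:
# diagnose_log /var/log/app/error.log 50
```

Line count is a crude proxy for complexity; token count or the presence of a stack trace would be reasonable alternatives.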

Structured JSON extraction with jq

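llm "extract the company name, ARR, and funding round from this text. Return only a JSON object with the keys company_name, arr, and funding_round: $(cat deal_notes.txt)" \
  | jq '.company_name, .arr, .funding_round'

Combine llm's text output with jq to extract structured fields from unstructured input inside a pipeline. The jq filter works only if the model returns bare JSON with exactly those keys, which is why the prompt names them; jq will fail on output wrapped in prose or code fences. In many data-wrangling contexts this pattern replaces a Python script with a one-liner.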
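Prompt-only JSON extraction depends on the model cooperating. Recent llm releases document a --schema option that asks the model for output matching a schema; the sketch below assumes your installed version has that option and that the chosen model and plugin support schemas (check llm --help). extract_deal_fields is a hypothetical helper name.

```shell
# Sketch: request schema-constrained JSON instead of relying on prompt wording.
# Assumes an llm release with --schema and a model that supports schemas.
extract_deal_fields() {
  local notes_file
  notes_file="$1"
  # The comma-separated list uses llm's concise schema syntax for field names.
  llm --schema 'company_name, arr, funding_round' \
    -s "Extract the company name, ARR, and funding round from this text" \
    < "$notes_file"
}

# Example:
# extract_deal_fields deal_notes.txt | jq -r '.company_name'
```

Reading the notes from stdin rather than interpolating them into the prompt string also avoids shell quoting problems with long or oddly formatted files.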

Frequently Asked Questions

How does llm compare to the OpenAI CLI?

The OpenAI CLI (openai) is provider-specific—it only calls OpenAI endpoints. llm is provider-agnostic by design: the same command syntax routes to any installed plugin backend. If your team is already using multiple providers or anticipates switching, llm's plugin architecture eliminates the need for separate CLI tools per vendor. The prompt logging to SQLite also has no equivalent in the OpenAI CLI.

Can I use llm with local models, with no data leaving my machine?

Yes. Install the llm-ollama plugin, run Ollama locally, and select a local model with -m modelname, using the name Ollama gives the model (llm models lists the exact IDs). No API key required. No outbound network calls for the inference step. The prompt log writes to local SQLite only. The full local execution path is documented in HookFlow's Ollama tool profile.

What happens to my prompts—is anything sent to a server besides the model API?

Nothing beyond the model provider's API endpoint. The llm tool itself has no telemetry, no analytics calls, and no cloud sync. Your prompts go to whichever model backend you target (OpenAI, Anthropic, etc.) and are logged locally to SQLite. What the provider does with them is governed by that provider's own data policy. If you run a fully local model via Ollama, no prompt data leaves your machine at any point.

Track the Heat Score Live

The viral score and cross-platform signal data for llm—along with every other tool in the HookFlow dataset—updates continuously. If you're making a build-vs-buy decision on CLI tooling, provider abstraction layers, or local model infrastructure, the live score is the signal to watch.

→ Track the llm heat score and 40+ AI infrastructure tools live at hookflow.ai
