Entitybits
Perspective · 2026

How we think about AI.

The AI tooling landscape shifted substantially between 2023 and 2026, and we have shipped through every phase of it. Here are five things we believe today, and how they shape what we build.

01

Models are not the moat. The system around them is.

Anyone can call Claude Opus 4.7, GPT-5, Gemini 3, or Llama 4. The moat is the system around the model: retrieval that actually retrieves the right thing, evals that catch regressions before users do, prompts that are versioned and tested like code, and infrastructure that survives a production load test.
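One concrete piece of that system is treating prompts as versioned artifacts. A minimal sketch, with illustrative template names (nothing here is a specific framework's API): pin each template to a content hash and log that hash with every model call, so a silent prompt edit shows up as a version change in your traces.

```python
import hashlib

# Illustrative prompt registry; the template name and text are assumptions.
PROMPT_TEMPLATES = {
    "support_answer": (
        "Answer the user's question using only the context:\n"
        "{context}\n\nQ: {question}"
    ),
}

def prompt_version(name: str) -> str:
    """Short content hash of a template, suitable for logging alongside
    every model call so regressions can be traced to a prompt change."""
    text = PROMPT_TEMPLATES[name]
    return hashlib.sha256(text.encode()).hexdigest()[:12]

version = prompt_version("support_answer")
```

The same hash can gate deploys: if the hash changed, the eval suite for that template has to pass before the new version ships.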

Where we've shipped this — Enterprise RAG with neural reranking
02

Agents are real, but most "agentic" demos are not.

A working agent has tool access, durable memory, structured outputs, and recovery logic for when a tool call fails. Most of the production work we do now involves agentic patterns.
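The recovery-logic part is the piece demos usually skip. A minimal sketch, with hypothetical tool names: retry transient failures with backoff, then hand the agent loop a structured error it can reason about instead of crashing the run.

```python
import time

def search_tool(query: str) -> dict:
    # Stand-in for a real tool call; name and shape are illustrative.
    return {"results": [f"doc about {query}"]}

def broken_tool(**kwargs):
    # Stand-in for a tool whose upstream dependency is down.
    raise TimeoutError("upstream timeout")

def call_tool_with_recovery(tool, args, retries=2, backoff=0.5):
    """Retry transient failures with exponential backoff, then return a
    structured error the agent loop can act on instead of raising."""
    for attempt in range(retries + 1):
        try:
            return {"ok": True, "data": tool(**args)}
        except Exception as exc:
            if attempt == retries:
                return {"ok": False, "error": str(exc)}
            time.sleep(backoff * (2 ** attempt))

result = call_tool_with_recovery(search_tool, {"query": "pricing"})
failure = call_tool_with_recovery(broken_tool, {}, retries=1, backoff=0)
```

The point of the `{"ok": False, ...}` shape is that the failure re-enters the agent's context as data, so the model can choose a different tool or escalate rather than the whole run dying on an exception.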

The Model Context Protocol (MCP) has become a useful standard for exposing internal APIs to LLMs, and we build MCP servers as part of integration work where it makes sense.

Where we've shipped this — AI agents across 6+ programmatic platforms
03

Hallucination reduction is a stack, not a setting.

A real hallucination stack layers grounding through retrieval, citation requirements, structural constraints in the prompt, output validation against schemas, confidence-aware fallbacks, and a human review path for the cases the model still gets wrong. Each layer catches what the others miss. Nothing about this is solved by switching models.
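The post-generation layers can be sketched in a few lines. This is illustrative, assuming a hypothetical output contract (the field names and threshold are ours, not a standard): parse the raw model output, validate it against a schema, require citations, and route low-confidence answers to human review.

```python
import json

# Assumed output contract for the model's JSON response.
REQUIRED = {"answer": str, "citations": list, "confidence": float}

def validate_and_gate(raw: str, min_confidence: float = 0.7) -> dict:
    """Layered post-generation checks; each return is a different layer
    catching what the previous ones missed."""
    try:
        out = json.loads(raw)                       # layer: structural parse
    except json.JSONDecodeError:
        return {"route": "human_review", "reason": "unparseable output"}
    for field, typ in REQUIRED.items():             # layer: schema validation
        if not isinstance(out.get(field), typ):
            return {"route": "human_review", "reason": f"bad field: {field}"}
    if not out["citations"]:                        # layer: citation requirement
        return {"route": "human_review", "reason": "no citations"}
    if out["confidence"] < min_confidence:          # layer: confidence fallback
        return {"route": "human_review", "reason": "low confidence"}
    return {"route": "serve", "payload": out}       # all layers passed

good = validate_and_gate('{"answer": "42", "citations": ["doc-3"], "confidence": 0.9}')
bad = validate_and_gate('{"answer": "42", "citations": [], "confidence": 0.9}')
```

A fluent but uncited answer gets routed to review here even though it parsed and validated, which is exactly the failure mode a single "use a better model" fix never addresses.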

Where we've shipped this — Multi-layer RAG with citations and validation
04

Evals are the single biggest gap in most teams' AI work.

Teams ship a prompt, watch it work in three test cases, and call it done. Six weeks later quality has drifted, traffic has shifted, the underlying model has been updated, and nobody has a number for how good or bad the output is.

We treat eval design as a deliverable on every AI engagement.
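The deliverable can start very small. A minimal sketch, where `model` is a stub standing in for a real LLM call and the cases and scorer are illustrative assumptions: run every case, compute a pass rate, and gate deploys on a threshold so "how good is it" always has a number.

```python
def model(prompt: str) -> str:
    # Stub standing in for a real LLM call, so the harness shape is visible.
    return "Paris" if "France" in prompt else "unknown"

# Illustrative eval cases; real suites pin these in version control
# and grow them from production failures.
EVAL_CASES = [
    {"prompt": "Capital of France?", "expect": "Paris"},
    {"prompt": "Capital of Atlantis?", "expect": "unknown"},
]

def run_evals(cases, threshold=0.9):
    """Score every case with an exact-match check and return a pass rate
    that a CI job can gate a deploy on."""
    passed = sum(model(c["prompt"]) == c["expect"] for c in cases)
    rate = passed / len(cases)
    return {"pass_rate": rate, "ship": rate >= threshold}

report = run_evals(EVAL_CASES)
```

Exact match is the crudest possible scorer; real suites layer in rubric grading or model-graded checks, but even this shape catches the "quality drifted six weeks later" failure because the number gets recomputed on every change.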

Where we've shipped this — Eval-driven analytics with explainable outputs
05

The buying question has changed.

Two years ago, the question was "Can we use AI for this?" Today the question is "Should we, and at what cost?" Compute is not free, latency is not free, and a wrong AI answer in a customer-facing surface is more expensive than no AI answer at all.

We help clients answer the second question honestly.

This is the operating context we work in, and it shapes how we scope, build, and hand off every engagement.

Have a real AI system to build?

30-minute scoping call. We'll tell you if it's worth doing — and if not, what to do instead.

Start the conversation