Making an Impact With Large Language Models

Our CTIO on a real, measurable LLM use case: turning millions of dollars of manual document data entry into a controllable, multi-model extraction pipeline, and why the right answer often isn't the biggest model.

There’s no shortage of grand claims about what AI will do someday. I’m more interested in what it can do now: what’s realistically achievable with the technology as it exists today, with a return on investment you can measure. This is one of those cases.

The problem

One of our Fortune 100 Clients spends several million dollars a year converting information from mostly handwritten forms into digital text through manual data entry. The forms are genuinely hard: dense, multi-page, highly variable layouts with many key-value pairs, open-ended questions, and tables, in hundreds of distinct formats and multiple languages. Over the years the Client had explored options with their technology partners, but the result was usually a shift in where the cost landed rather than a real reduction in it.

The approach

The solution we developed extracts document data near-autonomously while keeping the Client in full control of cost and quality. Rather than betting everything on one model, it uses multiple LLMs coordinated by a central “conductor” that routes work and scores confidence. When one model’s output isn’t confident enough, another model acts as a second pair of eyes, and confidence improves incrementally as a result. The Client decides which models to add to or remove from the processing engine, based on the quality each use case requires and what they’re willing to spend. Human involvement drops to the cases that genuinely need it.

The iterative design is the point. LLMs each have strengths and weaknesses; the value comes from knowing where a second model improves the output and where human review is still warranted. The same pattern generalizes well beyond this one Client’s forms.
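For readers who think in code, here is a minimal sketch of the conductor pattern described above. It is an illustration under stated assumptions, not our production implementation: the names `conduct`, `Extractor`, and `FieldResult`, the 0.9 threshold, and the rule for combining confidence when two models agree are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an extractor takes a document and returns
# {field_name: (extracted_value, confidence between 0 and 1)}.
Extractor = Callable[[str], Dict[str, Tuple[str, float]]]


@dataclass
class FieldResult:
    value: str
    confidence: float
    needs_human: bool  # True when no model combination reached the threshold


def conduct(
    document: str,
    extractors: List[Extractor],
    threshold: float = 0.9,  # assumed quality bar, set per use case
) -> Dict[str, FieldResult]:
    """Call models in order, escalating only while some field is unsure."""
    best: Dict[str, Tuple[str, float]] = {}
    for extract in extractors:
        # Skip the remaining (typically costlier) models once every
        # field seen so far is confident enough.
        if best and all(conf >= threshold for _, conf in best.values()):
            break
        for name, (value, conf) in extract(document).items():
            prev = best.get(name)
            if prev is None:
                best[name] = (value, conf)
            elif prev[1] >= threshold:
                continue  # already confident; keep the earlier answer
            elif prev[0] == value:
                # Assumption: agreement counts as independent evidence,
                # so the combined confidence rises (noisy-OR rule).
                best[name] = (value, 1 - (1 - prev[1]) * (1 - conf))
            elif conf > prev[1]:
                # Disagreement: keep whichever answer is more confident.
                best[name] = (value, conf)
    return {
        name: FieldResult(value, conf, needs_human=conf < threshold)
        for name, (value, conf) in best.items()
    }
```

Adding or removing a model is just a change to the `extractors` list, and the threshold is the knob that trades cost against quality. Anything still flagged `needs_human` after the last model goes to a reviewer.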

Why this matters for how we choose AI

This use case is a good illustration of a principle we hold generally: the best tool is the one that fits the problem, not the biggest one available. There’s real room to experiment with processing cost (including the literal energy cost of running large models), and you often find that a frontier-scale model isn’t required to hit the quality you need. Smaller, efficient multimodal models can do the job; the Allen Institute’s “Molmo” family is one example, competitive with much larger models while being trained on far less data. Choosing deliberately across models (by quality, cost, and fit) is exactly what a specialist should be doing on a Client’s behalf.

In the interest of being straight about maturity: some components of this solution have been validated in a sandbox, and others remain conceptual. That’s the honest state of the work, and it’s how something moves responsibly from concept toward production.

If you have a document-heavy process that’s expensive and stubbornly manual, let’s talk about what’s realistically possible.
