
The evolution of Fount's agent harness

Fount team · 7 min read

Part 2 of the Marketer Graph series. Start with the overview, or read Part 1: the causal growth marketing problem first.

Quinley, our marketing chief-of-staff agent, has run on three significant versions of our agent harness: Quinley Base, Quinley Prime and Quinley Graph. This is the road from one to the next, and what the evals show.

The progression of Fount's agent harness

There is a piece of history worth telling first, because it explains where we are now. Insurers' data is, to put it plainly, spread out. It sits in many different systems and is stored in many different ways, across carriers, administrators, ad platforms, CRMs and no small number of spreadsheets, and the rules that govern it have shifted over the years as products, vendors and teams have changed. When Fount first started building with agents we already believed a knowledge graph was the right way to tame that, because we could see the problems it would solve. So we tried it. The graph structures we used then were strict and rigid, complex to build, and slow to stand up for even a single customer, and for an insurer, whose data lives in more places and forms than almost anyone's, they were punishing to set up and harder still to keep correct. They were the right idea at the wrong time, so we set them aside and built what got the job done for marketers immediately, the semantic layer that grounded Quinley Base. We came back to the graph only once agents had evolved, and once our own work on the agent harness and the graph structure had made it feasible to build and maintain at the speed a real customer needs.

Quinley (read: the harness behind it) has been through three significant versions to get to where it is now. The first was Quinley Base.

With Quinley Base we connected every customer's marketing data into a single warehouse/lake structure, covering the ad platforms, the events, the calls, the initiatives, experiments, marketing plans and the customer records. To let Quinley Base query all of that correctly, we gave it a semantic layer: a detailed, hand-built description of that customer's warehouse, the tables, the data rules, migrations, processes and the definitions, so the agent knew what everything meant before it wrote a query. A marketer could then ask a question in plain language and Quinley Base would lean on that semantic layer to write the query, run it against the unified data and context and return the numbers, with no SQL, no exports and no waiting on an analyst. For a lot of marketers that was the first time they could interrogate their own marketing data in a connected way at the speed of a conversation, and it remains a meaningful step ahead of how most of the market still operates. Quinley Base was very good at telling marketers what happened. Why it happened, the cause behind the move, was still theirs to work out.

Quinley Prime gave the agent a hand of its own. Where the first Quinley answered a question and forgot it, Prime could write, read and edit its own content through a structured file system: it drafts and revises analyses, creates plans it can self-reference, builds and edits reports, and keeps its own memories from one session to the next, so what it (and marketers) learned in a conversation last week is still there this week. It also became omnipresent, sitting on every page of the product with the live numbers from whatever dashboard the marketer is looking at, so they can point at a chart and ask what it means, why it moved and whether it is good or bad, and have the answer grounded in exactly what is on screen. Even so, Prime still described the world more than it explained it. It could show them what moved and keep a record of it, but the causal threads between everything it now produced were not something it could follow.

Despite the advancement we saw with Quinley Prime, something was still missing. Both versions shared the same limitation: the semantic layer was fixed and the relationships between entities had to be fully read in context. It was a careful, detailed description of the warehouse, and it evolved, but a person had to build it and a person had to keep it current, and it could not keep up with the business. All the work Prime could now produce, the analyses it ran, the reports it wrote, the memories it kept, sat alongside that semantic layer rather than inside the grounding the next answer would use. The relationships between things, which is where cause and effect actually live, were nowhere the agent could follow: that this caveat applies to that measure, that this experiment caused that result, that this exception explains that anomaly. Nothing remembered which analyses had already been run and verified, so the same work was redone again and again. And to teach the system something genuinely new, someone still had to go and update the semantic layer by hand. The institutional knowledge an experienced analyst carries around in their head stayed with the analyst, and each answer was only as good as the question behind it.

Quinley Graph was built to solve that. It runs on a new structure we built underneath the agent that we call the Marketer Graph, which turns the semantic layer and everything the agent produces into one living, connected graph that keeps itself current. Storing those relationships natively is what lets the agent begin to reason about cause and effect rather than just report the numbers. How that structure works is the subject of Part 3.
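To make "relationships the agent can follow" concrete, here is a minimal Python sketch. It is not Fount's implementation. The `Graph` class, the node names and the relation labels (`applies_to`, `caused`, `explains`) are all hypothetical, chosen only to mirror the three examples given earlier: a caveat that applies to a measure, an experiment that caused a result, and an exception that explains an anomaly.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy directed graph with labeled edges (hypothetical, for illustration only)."""

    # incoming[target] holds (relation, source) pairs that point at that target
    incoming: dict = field(default_factory=dict)

    def relate(self, source: str, relation: str, target: str) -> None:
        """Record that `source` stands in `relation` to `target`."""
        self.incoming.setdefault(target, []).append((relation, source))

    def context_for(self, node: str) -> list:
        """Return every (relation, source) pair that says something about `node`."""
        return list(self.incoming.get(node, []))


g = Graph()
# All node names below are invented examples, not real customer data.
g.relate("caveat:tracking_gap_march", "applies_to", "measure:cost_per_lead")
g.relate("experiment:landing_page_test", "caused", "result:lead_volume_up")
g.relate("exception:call_center_outage", "explains", "anomaly:call_drop")

for relation, source in g.context_for("measure:cost_per_lead"):
    print(f"{source} --{relation}--> measure:cost_per_lead")
```

The point of the sketch is only that a caveat stored as an edge travels with the measure it qualifies, so an agent asking about the measure can look it up instead of needing to have it in context already.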

How the three versions compare

| Version | What changed | The basis |
| --- | --- | --- |
| Quinley Base | Ask the marketing and customer data anything in plain language, grounded in a detailed description of the warehouse | Leans on a fixed semantic layer to write and run the query |
| Quinley Prime | The agent can write, read and edit its own work: analyses, reports and memories | The same grounding, but it now authors and keeps its own content, and reads the live contents and artifacts the marketer is looking at |
| Quinley Graph | The semantic layer and everything the agent produces become one living graph of relationships, understood instantly and multimodally | Runs vetted definitions and verified analyses from the graph, carrying their caveats, confidence and connections |

We score each version on the same internal eval benchmark, which sorts questions by the kind of work they demand. The categories are combined in a similar spirit to text-to-SQL benchmarks such as Spider and BIRD: simple metric lookups, aggregations and breakdowns, multi-hop diagnostic queries that chain across tables, joins across separate data sources, and questions that turn on grounding a marketer's phrase to the right definition or value. We also benchmark the agent's qualitative conclusions against manually labeled training data for causal claims. Labels of that kind have their own challenges, but for the purpose of this exercise they are more than sufficient.

For comparison, we take Quinley Base as the index and show Prime and the Graph as their improvement over it, graded as described above. As one would expect, the simple lookups show only a modest improvement over Prime, given the old approach already handled them. It is the multi-hop, cross-source, grounding and confidence-driven reference tasks that saw the biggest gains. These were the ones that challenged our agents most and needed the most intervention, and they are where the Graph pulls away.

| Eval category | Quinley Base | Quinley Prime (with dynamic marketer context provided) | Quinley Graph |
| --- | --- | --- | --- |
| Simple metric lookups | Index | +40% | +53% |
| Aggregations and breakdowns | Index | +68% | +93% |
| Multi-hop diagnostic queries | Index | +23% | +74% |
| Joins across data sources | Index | +17% | +84% |
| Grounding terms to definitions | Index | +31% | +85% |
| Marketer causal claims | Index | +35% | +65% |

Token efficiency moved even more than performance. Prime carries more in its context to reference and index its own analyses, reports and memories, so it uses about seven percent more tokens per answer than Quinley Base. The Graph went the other way, and hard: it uses roughly 4.3 times fewer tokens per answer than Prime, around a 75 percent cut against the Quinley Base baseline.

| Cost per answer | Quinley Base | Quinley Prime | Quinley Graph |
| --- | --- | --- | --- |
| Average tokens used | Index | +7% | -75% |
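The two token figures can be checked against each other with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in this post (Prime at about +7% over Base, the Graph at roughly 4.3 times fewer tokens than Prime). The variable names are ours, and setting Base to 1.0 is just an indexing convention.

```python
# Token use per answer, indexed to Quinley Base = 1.0 (rounded figures from this post).
base = 1.0
prime = base * 1.07   # about seven percent more tokens than Base
graph = prime / 4.3   # roughly 4.3 times fewer tokens than Prime

cut_vs_base = 1 - graph / base
print(f"Graph uses {graph:.3f}x Base tokens, a {cut_vs_base:.0%} cut")
# prints: Graph uses 0.249x Base tokens, a 75% cut
```

So a 4.3x reduction measured from Prime and a 75 percent cut measured from Base describe the same result seen from two baselines.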

The reason is the whole point of the graph. The earlier versions had to carry a large description of the warehouse and reason over raw tables on every question, which fills the context window before the real thinking starts. The Graph retrieves only the few vetted definitions a question actually needs and runs them, so the model reads far less to answer the same question. That saved context is not only cheaper, it provides headroom to let the agent take on more problems, reason further and chain more steps inside the same window, which is what makes acting on the data practical rather than just possible.
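As a rough illustration of that difference, the sketch below compares loading every definition on every question with loading only the ones a question names. It is a toy under stated assumptions, not how Quinley retrieves anything: the `DEFINITIONS` entries are invented, the matching is naive substring search, and word count stands in for tokens.

```python
# Hypothetical vetted definitions; names and wording are invented for illustration.
DEFINITIONS = {
    "cost per lead": "Ad spend divided by qualified leads, excluding test campaigns.",
    "qualified lead": "A lead with a verified phone number and a quoted product.",
    "bind rate": "Policies bound divided by quotes issued in the same period.",
    "call conversion": "Inbound calls over 60 seconds that end in a quote.",
}


def full_context() -> str:
    """Earlier approach in miniature: send every definition on every question."""
    return "\n".join(f"{name}: {text}" for name, text in DEFINITIONS.items())


def retrieved_context(question: str) -> str:
    """Graph-style approach in miniature: send only definitions the question names."""
    q = question.lower()
    hits = {name: text for name, text in DEFINITIONS.items() if name in q}
    return "\n".join(f"{name}: {text}" for name, text in hits.items())


def rough_size(text: str) -> int:
    """Word count as a crude stand-in for a token count."""
    return len(text.split())


question = "Why did cost per lead rise last week?"
print(rough_size(full_context()), "words with everything loaded")
print(rough_size(retrieved_context(question)), "words with only the matched definition")
```

With four short definitions the gap is small. The argument in the text is that it grows with the size of the warehouse description, while the retrieved slice stays roughly proportional to the question.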



Frequently asked questions

What are the three versions of Quinley?

Quinley Base connected a customer's marketing data into one warehouse and used a hand-built semantic layer to answer questions in plain language. Quinley Prime gave the agent a hand of its own: it could write, read and edit its own analyses, reports and memories, and it sat on every page of the product with the live numbers on screen. Quinley Graph runs on the Marketer Graph, which turns the semantic layer and everything the agent produces into one living, connected graph that keeps itself current.

How much better is Quinley Graph?

Indexed against Quinley Base, the Graph shows a 40%+ improvement on verifiable tasks overall. Its largest gains over Prime come on the hardest categories: multi-hop diagnostic queries, joins across data sources, grounding a marketer's phrase to the right definition, and causal claims. Simple metric lookups improve only modestly over Prime (+40% to +53% against the Base index), because the older approach already handled them well.

Why does the Graph use far fewer tokens?

The earlier versions had to carry a large description of the warehouse and reason over raw tables on every question, which fills the context window before the real thinking starts. The Graph retrieves only the few vetted definitions a question actually needs and runs them, so the model reads far less. That works out to roughly 4.3 times fewer tokens per answer than Prime, around a 75 percent cut against the Quinley Base baseline, and the saved context is headroom to reason further and chain more steps.

How are the versions benchmarked?

On the same internal eval, which sorts questions by the kind of work they demand (simple lookups, aggregations, multi-hop diagnostics, joins across sources, and grounding terms to definitions), combined in a similar spirit to text-to-SQL benchmarks like Spider and BIRD. Qualitative agent conclusions are benchmarked against manually labeled training data for causal claims. Quinley Base is the index, and Prime and Graph are shown as their improvement over it.