
Trust, rails and how we know it's right

Fount team · 5 min read

Part 4 of the Marketer Graph series. Start with the overview, or read Part 3: turning marketing into a graph first.

A connected graph is only useful if the numbers coming off it can be trusted. This is how Quinley produces answers on rails, how every figure stays traceable, how we prove it is right, and how it keeps getting sharper.

Answers on traceable, trusted, governable rails

Fount operates in an industry built on trust and the careful pricing of risk. Insurers live and die by getting numbers right, by knowing what they can rely on and what they cannot, and by being able to show their working when a regulator, a reinsurer or a board asks how a figure was reached. A marketing system that plugs into that world cannot be a black box that hands over a confident number and asks to be taken on faith. It has to be trustworthy in the same way the rest of the business is held to account: every figure traceable to its source, every definition agreed rather than improvised, and every step something a marketer could defend if challenged. That is why the answer path runs on rails rather than on improvisation.

When it is finally time to produce a figure, Quinley does not hand improvised SQL to the warehouse. It runs a vetted definition. We hold these as models, each one a typed function from inputs to a table: the query lives in the graph with named, validated parameters rather than values pasted into a string, so a model can only be run with the inputs it declares and within the ranges it allows. Recipes compose several of those models into a whole report, binding one set of inputs once and threading it through every step. Each is versioned, so improving a definition writes a new version rather than quietly rewriting history, and every result comes back with its source tables and how fresh the underlying data is. Execution is read-only, and if one step of a report cannot be produced, that section comes back marked unavailable rather than confidently wrong, while the rest of the report still stands. Every figure also carries the definition that produced it alongside those source tables, so any number can be followed back to exactly how it was produced. The freedom is in the question a marketer asks. The execution runs on rails.
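To make models and recipes concrete, here is a minimal sketch in Python. It is not Fount's implementation: the `Param`, `Model` and `run_recipe` names, the in-memory SQLite stand-in for the warehouse and the toy `leads` table are all assumptions for illustration. It shows inputs bound as named SQL parameters and checked against declared types and ranges, a version and source tables travelling with each result, a read-only connection (via SQLite's `PRAGMA query_only`), and a failed step marked unavailable while the rest of the report stands. Data freshness is left out to keep it short.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Param:
    """A named, typed input with an optional allowed range (hypothetical schema)."""
    name: str
    kind: type
    min_value: Optional[Any] = None
    max_value: Optional[Any] = None

    def validate(self, value: Any) -> Any:
        if isinstance(value, bool) or not isinstance(value, self.kind):
            raise TypeError(f"{self.name} must be {self.kind.__name__}")
        if self.min_value is not None and value < self.min_value:
            raise ValueError(f"{self.name}={value} is below {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            raise ValueError(f"{self.name}={value} is above {self.max_value}")
        return value


@dataclass(frozen=True)
class Model:
    """A versioned, vetted query; inputs only ever arrive as bound parameters."""
    name: str
    version: int
    params: tuple
    sql: str  # uses named placeholders such as :year, never string formatting
    source_tables: tuple

    def run(self, conn: sqlite3.Connection, **inputs: Any) -> dict:
        declared = {p.name: p for p in self.params}
        if set(inputs) != set(declared):
            raise ValueError(f"{self.name} expects exactly {sorted(declared)}")
        bound = {n: declared[n].validate(v) for n, v in inputs.items()}
        rows = conn.execute(self.sql, bound).fetchall()
        # Provenance travels with the figure: which definition, which tables.
        return {"status": "ok", "rows": rows,
                "definition": f"{self.name}@v{self.version}",
                "sources": list(self.source_tables)}


def run_recipe(conn: sqlite3.Connection, steps: list, **inputs: Any) -> dict:
    """Bind one set of inputs once and thread it through every model in order."""
    conn.execute("PRAGMA query_only = ON")  # this connection now rejects writes
    report = {}
    for section, model in steps:
        wanted = {p.name: inputs[p.name] for p in model.params if p.name in inputs}
        try:
            report[section] = model.run(conn, **wanted)
        except (sqlite3.Error, TypeError, ValueError) as exc:
            # A failed step is marked unavailable; the other sections still stand.
            report[section] = {"status": "unavailable", "reason": str(exc)}
    return report


# Toy warehouse and two models sharing one (year, month) input set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (year INTEGER, month INTEGER, channel TEXT, n INTEGER)")
conn.executemany("INSERT INTO leads VALUES (?, ?, ?, ?)", [
    (2024, 5, "paid_social", 30), (2024, 5, "search", 50), (2024, 4, "search", 45)])
conn.commit()

period = (Param("year", int, 2000, 2100), Param("month", int, 1, 12))
leads_by_channel = Model(
    "leads_by_channel", 2, period,
    "SELECT channel, SUM(n) FROM leads WHERE year = :year AND month = :month "
    "GROUP BY channel ORDER BY channel", ("leads",))
spend_by_channel = Model(  # its source table does not exist in this toy warehouse
    "spend_by_channel", 1, period,
    "SELECT channel, SUM(amount) FROM spend WHERE year = :year AND month = :month "
    "GROUP BY channel", ("spend",))

report = run_recipe(conn, [("leads", leads_by_channel), ("spend", spend_by_channel)],
                    year=2024, month=5)
print(report["leads"]["rows"])    # [('paid_social', 30), ('search', 50)]
print(report["spend"]["status"])  # unavailable
```

Passing `month=13` would fail validation before any SQL runs, so that section would also come back unavailable rather than as a wrong number.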

So the question we started with runs quite differently now. When a marketer asks why the leads were down last month, the graph works through it: leads fell fourteen percent, the fall was concentrated in paid social, and inside paid social it sits in one channel where spend held flat but conversions dropped. It links that channel to the creative concept that had been carrying it, notes that the concept was paused on the twelfth against a ticket in Asana, and flags that the recut which replaced it was already showing fatigue. They get the chain of causes, the numbers behind each step, and the source for each one. Prime could already give the next person a head start, but only if someone had run that analysis and written it down first. The graph changes three things about that. The connections are now stored natively rather than worked out from context each time, so the chain above holds together on its own. It reaches across more of the picture, including the ad creative itself, which is vectorized so the concept and its performance sit in the same fabric as the numbers. And it stays live, so the next person benefits whether or not anyone remembered to save the work.
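The first part of that chain, finding where a fall is concentrated, can be sketched as a simple drill-down: at each level, compare two periods, follow the segment with the largest drop, and repeat until there is nothing further to split. This is a minimal illustration with hypothetical segment names and made-up counts, not how Quinley actually reasons; linking a channel to creative concepts or Asana tickets is out of scope here.

```python
from typing import Union

# A segment tree: either a lead count, or named sub-segments (hypothetical shape).
Tree = Union[int, dict]


def total(node: Tree) -> int:
    """Sum all lead counts under a node."""
    return node if isinstance(node, int) else sum(total(child) for child in node.values())


def drill_down(before: Tree, after: Tree) -> list:
    """Follow the largest fall level by level; return (segment, change) per step."""
    path = []
    while isinstance(before, dict) and isinstance(after, dict):
        names = sorted(set(before) | set(after))
        change = {n: total(after.get(n, 0)) - total(before.get(n, 0)) for n in names}
        worst = min(names, key=lambda n: change[n])  # ties broken alphabetically
        path.append((worst, change[worst]))
        before, after = before.get(worst, 0), after.get(worst, 0)
    return path


# Toy counts for two months; the names and numbers are made up.
last_month = {"paid_social": {"channel_a": 400, "channel_b": 200},
              "search": {"channel_c": 500}}
this_month = {"paid_social": {"channel_a": 280, "channel_b": 190},
              "search": {"channel_c": 480}}

print(total(this_month) - total(last_month))  # -150
print(drill_down(last_month, this_month))     # [('paid_social', -130), ('channel_a', -120)]
```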

How we know it works

All of this only matters if a marketer can stand behind the number when someone above them asks where it came from. A recommendation that cannot be defended is worse than no recommendation at all, so we hold the system to a hard standard. We do not check Quinley's answers against figures we typed in by hand, because hand-typed figures are prone to human error and go stale. Instead, each test carries a piece of reference logic. At the moment of grading we run that logic against the warehouse, and against any documents embedded in the Fount platform, to work out the correct answer right then; we run the agent; and we compare the two. Every test runs twice, once with the graph switched on and once with it switched off, and that comparison is the measurement that actually justifies the whole project. On our hardest cases, the ones built around exactly the confusions that produce plausible wrong answers, a deliberately small model answered four of seven correctly with the graph and none at all without it. The cases are not invented for the test; they are drawn from a real corpus of around four hundred customer sessions, so the traps we check against are the traps marketers actually hit. The point of all of it is simple: when a marketer carries a number into a meeting, it holds up because the evidence behind it is clear.
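A minimal sketch of that grading loop, under stated assumptions: an in-memory SQLite table stands in for the warehouse, `EvalCase`, `grade`, `run_suite` and `toy_agent` are hypothetical names, and the toy agent simply uses the right definition of a lead when the graph is on and counts every row (a plausible wrong answer) when it is off. What it illustrates is that the expected answer is recomputed from the data at grading time rather than typed in, and that every case runs with the graph on and off.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional

Agent = Callable[[str, sqlite3.Connection, bool], Optional[float]]


@dataclass
class EvalCase:
    question: str
    reference: Callable[[sqlite3.Connection], float]  # computes the truth when graded
    tolerance: float = 1e-6


def grade(case: EvalCase, agent: Agent, conn: sqlite3.Connection, graph_on: bool) -> bool:
    expected = case.reference(conn)  # recomputed now, never hand-typed
    answer = agent(case.question, conn, graph_on)
    return answer is not None and abs(answer - expected) <= case.tolerance


def run_suite(cases: list, agent: Agent, conn: sqlite3.Connection) -> dict:
    """Run every case twice, with the graph on and off, and count passes."""
    return {label: sum(grade(c, agent, conn, on) for c in cases)
            for label, on in (("graph_on", True), ("graph_off", False))}


def toy_agent(question: str, conn: sqlite3.Connection, graph_on: bool) -> Optional[float]:
    # Hypothetical stand-in: with the graph it knows "leads" means qualified leads;
    # without it, it counts every row, a plausible but wrong answer.
    sql = ("SELECT COUNT(*) FROM leads WHERE status = 'qualified'" if graph_on
           else "SELECT COUNT(*) FROM leads")
    return float(conn.execute(sql).fetchone()[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (status TEXT)")
conn.executemany("INSERT INTO leads VALUES (?)", [("qualified",), ("qualified",), ("spam",)])

cases = [EvalCase(
    "How many leads did we get?",
    lambda c: float(c.execute(
        "SELECT COUNT(*) FROM leads WHERE status = 'qualified'").fetchone()[0]))]

print(run_suite(cases, toy_agent, conn))  # {'graph_on': 1, 'graph_off': 0}

# New data arrives; the expected answer moves with it, with no test edits needed.
conn.execute("INSERT INTO leads VALUES ('qualified')")
print(cases[0].reference(conn))           # 3.0
print(run_suite(cases, toy_agent, conn))  # {'graph_on': 1, 'graph_off': 0}
```

Keeping the reference as code rather than a stored number is the design choice that stops tests going stale: the same case stays valid as the warehouse changes.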

This is the same loop that took AI from autocomplete to writing real software. The coding assistants got good not because the models simply got smarter, but because they stopped trusting plausible-looking output and started running it: write the code, run the tests and the type-checker, read the failures, fix them. We do the equivalent for marketing. The reference logic is our test suite, the warehouse is the thing it runs against, and a number that does not reconcile is a failing test, caught before it ever reaches the marketer. Marketing analytics has never had that loop. It does now.

It gets sharper every week

Finally, the graph is built to learn. Every action it takes is logged, and that record is the raw material for its own improvement: the whole marketing loop is logged and graphed. When the agent works out something useful in the course of answering, it can deposit that back into the graph, but only ever as a low-confidence candidate, never on top of a trusted definition. Separately, a quieter background process reviews recent activity, notices where calls failed and which questions keep coming up, and proposes additions to the graph's knowledge. These proposals are grounded in evidence and stripped of any individual's identity, and they sit waiting to be ratified rather than going live on their own. Definitions that keep getting used become candidates for promotion, and a candidate that passes our checks can earn its way to canonical. I would rather say plainly that this flywheel is not yet fully automatic than oversell it, but the mechanism is wired end to end: the graph watches how it is used, proposes ways to get better, and keeps anything unproven out of the answers a customer ever sees.
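The guard rails in that flywheel can be sketched as a small state machine. In this hypothetical `KnowledgeStore`, anything proposed by the agent or the background process starts as a candidate, can never overwrite a canonical entry, and is only promoted once it has been ratified, used often enough and passed a check; only canonical entries are visible to answers. The threshold, the `checks` callable and the example definitions are assumptions for illustration, and ratification is a manual call, in keeping with the flywheel not yet being fully automatic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    CANDIDATE = "candidate"  # low confidence, never used in customer answers
    CANONICAL = "canonical"  # trusted definition


@dataclass
class Entry:
    key: str
    body: str
    status: Status = Status.CANDIDATE
    uses: int = 0
    ratified: bool = False


class KnowledgeStore:
    """Hypothetical store enforcing the candidate-to-canonical path."""

    def __init__(self, promotion_threshold: int = 3):
        self.entries: dict = {}
        self.promotion_threshold = promotion_threshold

    def propose(self, key: str, body: str) -> bool:
        """Agent or background deposit: always a candidate, never over a canonical entry."""
        existing = self.entries.get(key)
        if existing is not None and existing.status is Status.CANONICAL:
            return False
        self.entries[key] = Entry(key, body)  # a fresh proposal needs fresh review
        return True

    def record_use(self, key: str) -> None:
        if key in self.entries:
            self.entries[key].uses += 1

    def ratify(self, key: str) -> None:
        self.entries[key].ratified = True

    def promote(self, key: str, checks: Callable[[Entry], bool]) -> bool:
        """Promote only a ratified, frequently used candidate that passes the checks."""
        e = self.entries[key]
        if (e.status is Status.CANDIDATE and e.ratified
                and e.uses >= self.promotion_threshold and checks(e)):
            e.status = Status.CANONICAL
            return True
        return False

    def answerable(self) -> dict:
        """Only canonical entries are visible to customer-facing answers."""
        return {k: e.body for k, e in self.entries.items() if e.status is Status.CANONICAL}


store = KnowledgeStore(promotion_threshold=2)
store.entries["lead"] = Entry("lead", "a qualified lead", Status.CANONICAL)

print(store.propose("lead", "any form fill"))           # False: canonical is protected
print(store.propose("cpl", "spend / qualified leads"))  # True: stored as a candidate
store.record_use("cpl")
store.record_use("cpl")
print(store.promote("cpl", checks=lambda e: True))      # False: not yet ratified
store.ratify("cpl")
print(store.promote("cpl", checks=lambda e: True))      # True
print(store.answerable())  # {'lead': 'a qualified lead', 'cpl': 'spend / qualified leads'}
```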


← Part 3 · Overview · Next: Causal AI and governed action →

Frequently asked questions

What does 'answers on rails' mean?

When it is time to produce a figure, Quinley does not hand improvised SQL to the warehouse. It runs a vetted definition held as a model: a typed function from inputs to a table, with named, validated parameters rather than values pasted into a string. Recipes compose several models into a whole report. Each is versioned, execution is read-only, and every figure comes back with the definition and source tables behind it, so any number can be followed back to exactly how it was produced. The freedom is in the question a marketer asks; the execution runs on rails.

How do you actually verify an answer is correct?

We do not check Quinley's answers against figures typed in by hand, because those are error-prone and go stale. Each test carries a piece of reference logic that we run against the warehouse and embedded documents at the moment of grading to work out the correct answer right then, we run the agent, and we compare the two. Every test runs twice, once with the graph on and once with it off. On our hardest cases, a deliberately small model answered four of seven correctly with the graph and none at all without it.

Where do the test cases come from?

They are not invented for the test. They are drawn from a real corpus of around four hundred customer sessions, so the traps we check against (the confusions that produce plausible wrong answers) are the ones marketers actually hit.

How does the graph improve over time?

Every action it takes is logged, and that record is the raw material for its own improvement. When the agent works out something useful, it can deposit it back into the graph, but only ever as a low-confidence candidate, never on top of a trusted definition. A quieter background process reviews recent activity, notices where calls failed and which questions keep coming up, and proposes additions that are grounded in evidence and stripped of any individual's identity; these wait to be ratified rather than going live on their own. Definitions that keep getting used can earn their way from candidate to canonical.

How is this like the way AI got good at coding?

The coding assistants got good not because the models simply got smarter, but because they stopped trusting plausible-looking output and started running it: write the code, run the tests and the type-checker, read the failures, fix them. We do the equivalent for marketing. The reference logic is the test suite, the warehouse is what it runs against, and a number that does not reconcile is a failing test, caught before it ever reaches the marketer. Marketing analytics never had that loop. It does now.