Turning marketing into a graph you can search
Part 3 of the Marketer Graph series. Start with the overview, or first read Part 2: the evolution of our agent harness.
This is the structure underneath Quinley Graph: how the Marketer Graph is built, and how the agent finds the right knowledge in it from a question asked in plain language.
How the Marketer Graph changes how our agents work
The Marketer Graph changes what sits underneath the agent. With Prime, the grounding came in two separate pieces: the semantic layer that described the warehouse, and the growing body of artifacts the agent produced and could search through (the analyses, the reports, the memories). The two sat side by side but were not connected, so the relationships between them had to be worked out afresh for every question. The Marketer Graph folds both into a single living structure in which everything is linked and which keeps itself current, and we put that structure between the agent and the warehouse.
At the storage layer it is deliberately simple: three tables, one for the things we know, one for how those things connect, and one that records every action the system takes.
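As a minimal sketch of that three-table layout, here is one way it could look in SQLite. The table roles (nodes, edges, events) come from the text. Every column name, the `open_graph`, `add_node` and `add_edge` helpers, and the choice of SQLite itself are hypothetical illustrations, not our production schema.

```python
import json
import sqlite3

# Hypothetical schema: the three table roles come from the text; all columns are assumptions.
SCHEMA = """
CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,
    kind        TEXT NOT NULL,                  -- e.g. 'measure', 'caveat', 'analysis'
    title       TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    confidence  TEXT NOT NULL DEFAULT 'candidate'
                CHECK (confidence IN ('canonical', 'ratified', 'candidate'))
);
CREATE TABLE edges (
    src  TEXT NOT NULL REFERENCES nodes(id),
    dst  TEXT NOT NULL REFERENCES nodes(id),
    kind TEXT NOT NULL,                         -- e.g. 'has_caveat', 'conflicts_with'
    PRIMARY KEY (src, dst, kind)
);
CREATE TABLE events (
    seq     INTEGER PRIMARY KEY AUTOINCREMENT,  -- append-only, in insertion order
    action  TEXT NOT NULL,
    payload TEXT NOT NULL                       -- JSON description of the action
);
"""


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def _log(conn: sqlite3.Connection, action: str, payload: dict) -> None:
    conn.execute("INSERT INTO events (action, payload) VALUES (?, ?)",
                 (action, json.dumps(payload)))


def add_node(conn, node_id, kind, title, description="", confidence="candidate"):
    # The write and its event-log entry commit together, or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO nodes (id, kind, title, description, confidence) "
            "VALUES (?, ?, ?, ?, ?)",
            (node_id, kind, title, description, confidence))
        _log(conn, "add_node", {"id": node_id, "kind": kind})


def add_edge(conn, src, dst, kind):
    with conn:
        conn.execute("INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
                     (src, dst, kind))
        _log(conn, "add_edge", {"src": src, "dst": dst, "kind": kind})
```

Keeping the event log in the same transaction as the write means the log can never disagree with the graph about what happened, and the CHECK constraint keeps confidence limited to the three levels described below.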
The things we know are held as nodes, and they come in two layers. The first layer is everything needed to query the data correctly: the concepts a marketer talks about, the entities in the warehouse, the dimensions and scopes they slice by, the measures themselves, and, crucially, the caveats attached to them. This is the old semantic layer, but broken into connected pieces rather than held as one fixed document. The second layer is everything the business and the agent produce as they go: the analyses that have been run and verified along with their results, the models and recipes behind them, the experiments and their outcomes, the exceptions worth remembering, the product information, and the playbooks and insights built up over time. So the graph holds "here is exactly how blended CPL is computed for this account", and "here is the verified analysis we ran last month", and "here is what happened the last time renters spend was back-loaded into the final week".
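The two layers can be made explicit by tagging every node kind with its layer. The kinds below are the ones the paragraph lists. The enum layout, the string labels and the `kinds_in` helper are hypothetical, shown only as a sketch of the taxonomy.

```python
from enum import Enum


class Layer(Enum):
    QUERY = "query"          # what is needed to query the warehouse correctly
    KNOWLEDGE = "knowledge"  # what the business and the agent produce over time


class NodeKind(Enum):
    # Layer one: the old semantic layer, broken into connected pieces.
    CONCEPT = ("concept", Layer.QUERY)
    ENTITY = ("entity", Layer.QUERY)
    DIMENSION = ("dimension", Layer.QUERY)
    SCOPE = ("scope", Layer.QUERY)
    MEASURE = ("measure", Layer.QUERY)
    CAVEAT = ("caveat", Layer.QUERY)
    # Layer two: accumulated work and institutional memory.
    ANALYSIS = ("analysis", Layer.KNOWLEDGE)
    MODEL = ("model", Layer.KNOWLEDGE)
    RECIPE = ("recipe", Layer.KNOWLEDGE)
    EXPERIMENT = ("experiment", Layer.KNOWLEDGE)
    EXCEPTION = ("exception", Layer.KNOWLEDGE)
    PRODUCT_INFO = ("product_info", Layer.KNOWLEDGE)
    PLAYBOOK = ("playbook", Layer.KNOWLEDGE)
    INSIGHT = ("insight", Layer.KNOWLEDGE)

    def __init__(self, label: str, layer: Layer) -> None:
        # Enum passes each member's tuple value in as constructor arguments.
        self.label = label
        self.layer = layer


def kinds_in(layer: Layer) -> list[NodeKind]:
    """All node kinds that belong to one layer, in declaration order."""
    return [kind for kind in NodeKind if kind.layer is layer]
```

For example, `kinds_in(Layer.QUERY)` returns the six kinds that make up the decomposed semantic layer.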
The connections matter as much as the nodes. When the graph links a measure to its caveat, or marks two definitions of "lead" as being in conflict, or records that one decision led to a particular outcome, those links are part of the structure. The links from a decision to its outcome are the beginnings of a causal picture rather than a correlational one. That means when the agent reaches for a metric, it picks up the caveats and the conflicts that hang off it in the same motion, because they are one step away by design rather than something it has to remember to go and check. This is where marketing differs from code. A codebase hands these relationships to the agent for free, through the compiler and the references that already link one file to another, which is a large part of why AI got good at coding so fast. Marketing data has no compiler and arrives with none of that structure, so we have to build it, and that construction is most of the work and most of the value.
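The "one step away by design" idea can be sketched as a lookup that never returns a node on its own. Resolving a node always gathers its caveats and conflicts in the same call. The edge kinds `has_caveat` and `conflicts_with` and the `Graph` class are hypothetical names chosen for illustration.

```python
# Edge kinds that qualify a node; these names are hypothetical.
QUALIFYING = ("has_caveat", "conflicts_with")


class Graph:
    def __init__(self) -> None:
        self.titles: dict[str, str] = {}
        self.out_edges: dict[str, list[tuple[str, str]]] = {}

    def add_node(self, node_id: str, title: str) -> None:
        self.titles[node_id] = title
        self.out_edges.setdefault(node_id, [])

    def link(self, src: str, kind: str, dst: str, symmetric: bool = False) -> None:
        self.out_edges[src].append((kind, dst))
        if symmetric:  # a conflict between two definitions holds in both directions
            self.out_edges[dst].append((kind, src))

    def resolve(self, node_id: str) -> dict:
        """Return a node together with everything one hop away that qualifies it."""
        result = {"node": self.titles[node_id]}
        for kind in QUALIFYING:
            result[kind] = [self.titles[dst]
                            for k, dst in self.out_edges[node_id] if k == kind]
        return result
```

Because `resolve` is the only way to read a node, the agent cannot pick up a metric without also picking up whatever hangs off it.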
Because the knowledge lives in the graph rather than in a document someone maintains, it stays current on its own. When Prime runs and verifies a new analysis, writes a report, or logs an experiment or an exception, that work becomes a node in the graph straight away, wired to the measures, entities and caveats it touches. The agent can use it from the very next question, with no marketer re-entering anything and no one hand-updating a semantic layer. The graph is also multimodal: it holds not only tables and text but the creatives themselves, so an ad image, its concept, its visual components, its message and its performance sit together as connected knowledge rather than in five separate places.
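A minimal sketch of that write path follows. A new artifact is written as a node at once, linked to every existing node it touches, and logged. The `LiveGraph` class, the `touches` edge kind and the id format are all hypothetical. A real pipeline would also have to work out *which* nodes an analysis touches, and that step is not shown here.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LiveGraph:
    nodes: dict[str, dict] = field(default_factory=dict)           # id -> attributes
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, kind, dst)
    events: list[tuple] = field(default_factory=list)              # append-only log
    _next_id: count = field(default_factory=lambda: count(1))

    def add_node(self, node_id: str, kind: str, title: str) -> None:
        self.nodes[node_id] = {"kind": kind, "title": title}
        self.events.append(("add_node", node_id))

    def record_artifact(self, kind: str, title: str, touches: list[str]) -> str:
        """Write a new analysis, report, experiment or exception straight into the
        graph, linked to every existing node it touches."""
        missing = [t for t in touches if t not in self.nodes]
        if missing:  # refuse to create dangling links
            raise KeyError(f"artifact references unknown nodes: {missing}")
        node_id = f"{kind}:{next(self._next_id)}"
        self.add_node(node_id, kind, title)
        for target in touches:
            self.edges.add((node_id, "touches", target))
        self.events.append(("link_artifact", node_id, tuple(touches)))
        return node_id

    def artifacts_touching(self, target: str) -> list[str]:
        """Everything recorded against a node, e.g. all analyses of one measure."""
        return sorted(src for src, kind, dst in self.edges
                      if kind == "touches" and dst == target)
```

The key property is that nothing sits in a staging area waiting for someone to curate it. The moment `record_artifact` returns, the new node is reachable from the measures it was linked to.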
Sitting on every piece of knowledge is a confidence level, and this is the part that quietly does the most work. A definition is either canonical, ratified or merely a candidate, and only the first two are ever allowed to produce a number. A definition the system is not yet sure of is held back and shown as a candidate rather than being silently folded into an answer. This is what lets Quinley decline to be confidently wrong, which in marketing analytics is worth more than almost any individual feature, because the expensive mistakes are not the obvious errors but the plausible numbers that were quietly computed the wrong way.
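The gating rule is simple enough to sketch directly. The three confidence levels and the "only canonical or ratified may produce a number" rule come from the text. The `Definition` and `HeldBack` types and the `answer` function are hypothetical, and CPL is computed here as spend divided by leads purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Union


class Confidence(Enum):
    CANONICAL = "canonical"
    RATIFIED = "ratified"
    CANDIDATE = "candidate"


# Only these levels are ever allowed to produce a number.
NUMBER_PRODUCING = {Confidence.CANONICAL, Confidence.RATIFIED}


@dataclass(frozen=True)
class Definition:
    name: str
    confidence: Confidence
    compute: Callable[[dict], float]  # hypothetical: how the metric is computed


@dataclass(frozen=True)
class HeldBack:
    """Returned instead of a number when the definition is not trusted yet."""
    name: str
    reason: str


def answer(defn: Definition, inputs: dict) -> Union[float, HeldBack]:
    if defn.confidence in NUMBER_PRODUCING:
        return defn.compute(inputs)
    # A candidate is surfaced as a candidate, never silently computed into the answer.
    return HeldBack(defn.name, f"definition is only a {defn.confidence.value}")


cpl = Definition("blended_cpl", Confidence.RATIFIED,
                 lambda x: x["spend"] / x["leads"])
print(answer(cpl, {"spend": 1000.0, "leads": 40}))  # 25.0
```

Returning a distinct `HeldBack` type, rather than `None` or a zero, forces whatever renders the answer to say explicitly that a number was withheld.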
Finding the right knowledge, by meaning - built for insurance marketing
For the graph to be useful it has to surface the right knowledge from a question asked in everyday language, and a marketer does not phrase things the way a database is shaped. This has been one of our biggest lessons: a marketer talks about quotes and sales opportunities interchangeably, while data systems, and hence agents, treat them as quite different things and return different answers depending on which word the marketer happens to use that day.
We handle that by giving every node a mathematical fingerprint of its meaning: a high-dimensional embedding produced by a modern embedding model applied specifically to insurance marketing. We are careful about what we feed it. We embed the meaning of a thing, its description and the words a person would actually use, rather than the raw SQL or the column names, because the point of the fingerprint is to bridge the gap between how a marketer talks and how the warehouse is built, including the various strange ways third-party vendors have chosen to store and share data.
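What goes into the embedding model matters as much as the model. This sketch shows only the input side. It assembles the text to embed from a node's title, description and the aliases people use, and it deliberately leaves out the SQL and physical column names. The `Node` fields and `embedding_text` are hypothetical, and the embedding model call itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    description: str
    aliases: list[str] = field(default_factory=list)   # words a marketer actually uses
    sql: str = ""                                       # physical definition: NOT embedded
    columns: list[str] = field(default_factory=list)    # vendor column names: NOT embedded


def embedding_text(node: Node) -> str:
    """Text handed to the embedding model: meaning and vocabulary only."""
    parts = [node.title, node.description]
    if node.aliases:
        parts.append("Also called: " + ", ".join(node.aliases))
    return "\n".join(p for p in parts if p)


# Hypothetical example node; the description and column name are illustrative only.
cpl = Node(
    title="Blended CPL",
    description="Total spend divided by total leads across all channels.",
    aliases=["cost per lead", "CPL"],
    sql="SELECT SUM(spend) / SUM(leads) FROM ...",
    columns=["vnd_lds_cnt_v2"],
)
print(embedding_text(cpl))
```

Keeping the physical details out of the embedded text stops an opaque vendor column name from dragging the fingerprint away from what the measure means to a person.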
Retrieval is a hybrid search: four independent passes run at once and are fused together. The first is a classic full-text pass over the titles and descriptions. The second is a value-literal pass: it takes a phrase like a plan name or a status and resolves it through a value dictionary to the exact field and value in the data, even when the underlying column is named nothing like it. The third walks outward across the graph from the strongest matches to pull in their connected neighbours, down-weighted so that corroboration never outranks a direct hit. The fourth is the semantic pass, a nearest-neighbour search over the embeddings through an HNSW (hierarchical navigable small world) index. We combine the four with reciprocal rank fusion, so a result that surfaces in several passes on its own merits rises to the top, which is far more dependable than trusting any single ranking. It is the same retrieval problem that AI coding assistants solve over a large codebase, finding the few relevant pieces among thousands by blending meaning-based and literal search, and we solve it the same way. Whatever comes back is self-describing and carries its caveats first, ahead of the numbers, so a metric never arrives without the warning about how it is easy to get wrong.
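Reciprocal rank fusion scores each result as the sum, over the passes that returned it, of 1 / (k + rank), with k = 60 a common default. Here is a minimal sketch with two assumptions of our own: a per-pass weight, so the graph pass counts for less, and a rule that sorts any result with at least one direct hit ahead of neighbour-only results. That rule is one way to guarantee corroboration never outranks a direct hit. The pass names, weights and `fuse` function are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def fuse(passes: dict[str, list[str]],
         weights: Optional[dict[str, float]] = None,
         k: int = 60,
         corroborating: frozenset[str] = frozenset({"graph"})) -> list[tuple[str, float]]:
    """Weighted reciprocal rank fusion over several ranked result lists."""
    weights = weights or {}
    scores: dict[str, float] = defaultdict(float)
    direct: set[str] = set()
    for name, ranking in passes.items():
        w = weights.get(name, 1.0)
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += w / (k + rank)  # the RRF term, scaled by the pass weight
            if name not in corroborating:
                direct.add(doc)
    # Direct hits always sort ahead of neighbour-only results; within each group, by score.
    return sorted(scores.items(), key=lambda item: (item[0] in direct, item[1]),
                  reverse=True)


# Hypothetical ranked outputs from the four passes for one question.
passes = {
    "fulltext": ["cpl", "leads", "spend"],
    "value_literal": ["cpl"],
    "semantic": ["leads", "cpl"],
    "graph": ["cpl_caveat", "cpl"],
}
print([doc for doc, _ in fuse(passes, weights={"graph": 0.5})])
# ['cpl', 'leads', 'spend', 'cpl_caveat']
```

Here `cpl` wins because three passes agree on it and the graph pass corroborates it, while `cpl_caveat`, reached only through the graph, is still returned but never ahead of a direct match.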
Frequently asked questions
How is the Marketer Graph stored?
At the storage layer it is deliberately simple: three tables, one for the things we know (nodes), one for how those things connect (edges), and one that records every action the system takes (events). Nodes come in two layers - everything needed to query the data correctly, and everything the business and the agent produce as they go, such as verified analyses, experiments, exceptions and playbooks.
What is a confidence level and why does it matter?
Every piece of knowledge carries a confidence level - canonical, ratified or candidate - and only canonical or ratified definitions are ever allowed to produce a number. A definition the system is not yet sure of is held back and shown as a candidate rather than silently folded into an answer. This is what lets the agent decline to be confidently wrong, which in marketing analytics is worth more than almost any individual feature.
How does the graph find the right knowledge from a plain-language question?
Through a hybrid search: four independent passes run at once and are fused with reciprocal rank fusion. The passes are a full-text pass over titles and descriptions; a value-literal pass that resolves a phrase like a plan name to the exact field and value through a value dictionary; a graph-neighbour pass that walks outward from the strongest matches; and a semantic pass that runs a nearest-neighbour search over embeddings through an HNSW index. A result that surfaces across several passes rises to the top.
What do the embeddings actually capture?
Every node gets a high-dimensional embedding produced with a modern embedding model, applied specifically to insurance marketing. We embed the meaning of a thing - its description and the words a person would actually use - rather than the raw SQL or column names, because the point is to bridge how a marketer talks to how the warehouse is built, including the strange ways third-party vendors store and share data.