
The Well of Kvasir: When the RAG Pipeline Learned to Listen

· 9 min read
Jean-Noël Schilling
Locki one / French maintainer

How a single email from a candidate revealed four distinct failures in four distinct layers — and why Separation of Concerns was the only way to fix them

The Email That Changed Everything

Five days before the municipal elections, Florent Lardic — head of the Construire l'Avenir list — wrote to us. He had tried the comparator. He found it useful, relatively impartial. But there was a problem.

"There's a big hole in the racket on a decisive issue of the campaign: we are explicit about the PLL school (first tract, then a six-page tract, Facebook posts) and none of it comes through."

A hole in the racket: the French idiom for a blind spot. On the most contentious issue in the campaign, the Pierre-Le-Lec school renovation, his list's explicit, detailed position was invisible. The comparator showed: "Does not specifically mention the Ecole Pierre Le Lec in its programme."

This was not a matter of bias. All four lists suffered the same blindness. The system had their documents. It had their tracts, their editorials, their Facebook publications. It simply could not find them when a citizen asked about the school.

Florent's email was a gift. Not because it was comfortable — it was not — but because it pointed, with the precision of someone who knows their own programme, to the exact place where the pipeline failed. A candidate's feedback, in civic AI, carries a specificity that no automated metric can match. He knew what his documents contained. The system should have known too.

The Trace That Told the Story

We opened the Opik trace for the Pierre-Le-Lec comparison. Two spans: rag_compare_retrieval and rag_compare_synthesis.

The retrieval span told us everything. Twelve chunks retrieved — three per list, across all four lists. All represented. But best_distance: 0.5094. above_threshold_count: 0. Not a single chunk was confidently relevant to the query.

The synthesis span told us something else: the LLM had managed to produce a structured comparison table despite the weak context. It extracted positions, arguments, sources. It filled a table with content. But it was working from echoes — fragments where "ecole" appeared near other topics, chunks where the school was mentioned in passing, not as the subject.

The paradox: the output looked reasonable. The input was hollow.

This is the most dangerous failure mode in RAG: plausible synthesis from insufficient retrieval. The citizen reads a comparison table and trusts it. They don't see that the AI was assembling fragments rather than finding the source. They don't know that Construire l'Avenir's six-page tract about the school was sitting in the vector store, unretrieved, because the query "pierre le lec" didn't land close enough in embedding space.

Born from Collective Wisdom

In Norse mythology, when the Aesir and Vanir gods ended their war, they sealed the peace by each spitting into a shared vessel. From that vessel, Kvasir was born — the wisest being alive, carrying the collective knowledge of every deity. He traveled the nine worlds answering every question put to him. No query could stump him, because his wisdom was not his own — it was everyone's, distilled into one.

We named our RAG specialist agent Kvasir because the parallel is exact. The vector store carries the collective words of all four lists, the participatory programme, six years of municipal documents. No single source contains the full picture. The answer emerges from the intersection of fragments — like wisdom emerging from the combined knowledge of the gods.

Kvasir's first task was to diagnose why the well had gone dry for Pierre-Le-Lec.

Four Layers, Four Failures

The diagnosis revealed not one problem but four — each in a different layer of the pipeline, each requiring its own fix. Separation of Concerns was not an architectural luxury. It was the only way to see clearly.

Layer 1 — Refine (pre-retrieval input quality). The query "pierre le lec" hit the vector store as-is. Three words. The embedding model mapped them to general French semantics — not to a specific school renovation project. The refiner — the cheap pre-processing step that corrects spelling and expands vague queries — had a gazetteer for candidate names but not for local places and projects. It did not know that "pierre le lec" should expand to "projet de renovation de l'ecole Pierre-Le-Lec, regroupement scolaire, programme Petites Villes de Demain."

Layer 2 — Retrieval (search depth). The compare mode retrieved 3 chunks per list. For a well-indexed topic, three is enough. For a topic where the relevant content is embedded in broader documents — a six-page tract that covers the school alongside ten other proposals — three is too few. The relevant chunk might be the fourth or fifth closest.

Layer 3 — Metrics (quality measurement). The relevance threshold was set at 0.5 — a distance below which a chunk is considered "confidently relevant." This threshold was calibrated for English content with the generic all-MiniLM-L6-v2 embedding model. French civic vocabulary, with its Breton proper nouns and administrative jargon, systematically scores higher distances. Every chunk was being marked "not confident" when many were, in fact, adequate.

Layer 4 — Ingestion (data preparation). The chunks in ChromaDB were raw text — the embedding model saw only the words in the chunk, without knowing which list it came from, what category it belonged to, or what the document's title was. A chunk about the school buried in page four of a tract had no contextual signal to help the embedding capture its topic.

Four layers. Four independent concerns. No layer could solve another's problem.

The Cheapest Fix First

TRIZ teaches Prior Action: perform the cheap correction before the expensive step, so the expensive step operates on clean input. We applied the same principle to the fix order.

Layer 1 cost nothing. We added a local places gazetteer to the refiner's prompt — the same pattern as the candidate name gazetteer that had already proven itself. "Ecole Pierre-Le-Lec: projet de renovation et regroupement scolaire, programme Petites Villes de Demain. Port d'Audierne: port de peche, criee, activite langoustiere." When the refiner detects a local place name, it expands the query with associated terms — the same way it already expands "Bosser" to "Eric Bosser (Cap sur Notre Futur)."
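The post describes the gazetteer living inside the refiner's LLM prompt; a minimal deterministic sketch of the same idea, with hypothetical entries and function names, looks like this:

```python
# Hypothetical sketch of gazetteer-based query expansion.
# The actual refiner does this via an LLM prompt; a plain lookup
# shows the mechanism: a known place name triggers associated terms.
LOCAL_PLACES = {
    "pierre le lec": (
        "projet de renovation de l'ecole Pierre-Le-Lec, "
        "regroupement scolaire, programme Petites Villes de Demain"
    ),
    "port d'audierne": "port de peche, criee, activite langoustiere",
}

def expand_query(query: str) -> str:
    """Append gazetteer terms when a known local place appears in the query."""
    q = query.lower()
    extras = [terms for place, terms in LOCAL_PLACES.items() if place in q]
    return query if not extras else f"{query} ({'; '.join(extras)})"

print(expand_query("pierre le lec"))
```

The expanded query now carries the vocabulary the documents actually use, so the embedding lands in the right neighborhood.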

Layer 2 cost nothing. A default parameter change: n_per_list from 3 to 5. The caller can still override. But the default now acknowledges that comparison queries need more context per list to catch the chunks that matter.
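A sketch of what that default change might look like, assuming a ChromaDB-style collection and a hypothetical function name (the project's real signature may differ):

```python
# Hypothetical signature: compare-mode retrieval now defaults to five
# chunks per list instead of three; the caller can still override.
def retrieve_per_list(collection, query_embedding, lists, n_per_list=5):
    """Retrieve the n_per_list closest chunks for each list separately."""
    results = {}
    for name in lists:
        hits = collection.query(
            query_embeddings=[query_embedding],
            n_results=n_per_list,
            where={"list": name},  # restrict the search to one list's documents
        )
        results[name] = list(zip(hits["documents"][0], hits["distances"][0]))
    return results
```

Retrieving per list, rather than globally, guarantees every list is represented even when one list's documents dominate the nearest neighbors.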

Layer 3 cost nothing. A constant calibrated with a comment: _RELEVANCE_THRESHOLD = 0.55. The comment explains why: "Calibrated for all-MiniLM-L6-v2 on French civic content. With French-optimized embeddings, tighten back to 0.5."
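The fix reduces to one calibrated constant and the filter that uses it; a minimal sketch:

```python
# Calibrated for all-MiniLM-L6-v2 on French civic content.
# With French-optimized embeddings, tighten back to 0.5.
_RELEVANCE_THRESHOLD = 0.55

def confident_chunks(chunks_with_distances):
    """Keep chunks whose embedding distance falls below the threshold."""
    return [(text, d) for text, d in chunks_with_distances
            if d < _RELEVANCE_THRESHOLD]

retrieved = [("tract p.4", 0.493), ("editorial", 0.509), ("FB post", 0.61)]
print(confident_chunks(retrieved))  # the 0.61 chunk is filtered out
```

Under the old 0.5 threshold, the 0.509 chunk from the original trace would have been marked "not confident" too, which is exactly the miscalibration the trace exposed.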

Layer 4 required a re-ingestion. We added a metadata prefix to each chunk: [Programme prioritaire | ecole | Construire l'Avenir]. The embedding model now sees, in the first tokens of every chunk, what document this is, what category it belongs to, and which list produced it. The topic signal travels with the text.
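The prefixing step itself is small; a sketch, with a hypothetical helper name:

```python
# Hypothetical sketch of the ingestion-time metadata prefix: the document
# title, category, and list name travel in the chunk's first tokens, so
# the embedding captures the topic signal along with the text.
def prefix_chunk(text, title, category, list_name):
    """Prepend a bracketed metadata header to a raw chunk before embedding."""
    return f"[{title} | {category} | {list_name}]\n{text}"

chunk = prefix_chunk(
    "Renovation immediate du batiment, budget de 5 millions d'euros...",
    title="Programme prioritaire",
    category="ecole",
    list_name="Construire l'Avenir",
)
print(chunk.splitlines()[0])  # [Programme prioritaire | ecole | Construire l'Avenir]
```

The cost is a handful of tokens per chunk; the benefit is that a fragment from page four of a tract announces its own context to the embedding model.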

171 documents. 511 chunks. Re-ingested in seconds.

The Numbers

Before: best_distance = 0.509. above_threshold_count = 0/12. Zero chunks confidently relevant across all four lists.

After the full pipeline — expanded query, deeper retrieval, calibrated threshold, enriched chunks:

| List | Before (best) | After (best) | Above threshold |
| --- | --- | --- | --- |
| Construire l'Avenir | 0.665 | 0.493 | 4/5 |
| Passons a l'Action ! | 0.614 | 0.430 | 5/5 |
| S'unir pour Audierne-Esquibien | 0.606 | 0.492 | 5/5 |
| Cap sur Notre Futur | 0.618 | 0.413 | 4/5 |

18 out of 20 chunks above threshold. From zero.

But numbers are abstractions. The real measure is what the citizen reads.

Before, the comparator said Construire l'Avenir does not specifically mention the school. After, it says: "For the immediate renovation of the building, with a budget of 5 million euros (26 euros per inhabitant per year). The works contracts are already signed. The financing plan does not require raising taxes." With sources. With arguments from the opposition. With each list's actual position, cited and comparable.

The school appeared. Not because we added information — it was always there. But because the pipeline learned to look where the documents actually speak.

What the Well Remembers

Kvasir's first act was diagnostic, not creative. He didn't build new capabilities. He looked at the existing pipeline through the lens of Separation of Concerns and found that four layers, each slightly miscalibrated, combined to produce a blind spot.

This is the deepest lesson. A RAG pipeline is not a single system — it is a chain of transformations. Query enters. Query is refined. Query is embedded. Embedding searches the store. Results are filtered. Context is built. Context is synthesized. Each step is a layer with its own concern, its own failure modes, its own calibration.
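That chain can be made literal: each stage an independent, swappable function. A toy sketch (every name here is illustrative, not the project's actual API):

```python
# Sketch: the pipeline as an explicit chain of single-concern stages.
# Each stage can be tested, calibrated, and replaced on its own.
def refine(query): return query + " (expanded)"
def embed(query): return [float(len(query))]          # stand-in embedding
def search(vector): return [("chunk-a", 0.49), ("chunk-b", 0.61)]
def filter_hits(hits): return [h for h in hits if h[1] < 0.55]
def build_context(hits): return "\n".join(text for text, _ in hits)

def answer(query):
    """Run every layer in order; a failure in any one is visible in isolation."""
    return build_context(filter_hits(search(embed(refine(query)))))

print(answer("pierre le lec"))
```

The point of the structure is diagnostic: when the output is hollow, you can inspect each link of the chain separately, which is exactly how the four Pierre-Le-Lec failures were found.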

When retrieval fails, the instinct is to upgrade the engine — bigger embeddings, fancier re-rankers, more compute. But the windshield was dirty before the engine was slow. The query was vague before the embedding was generic. The chunks were unlabeled before the distance was high.

Florent's email, precise and constructive, pointed to the symptom. Kvasir found the four causes. Separation of Concerns made the fixes independent, testable, reversible. And TRIZ ordered them by cost: fix the input, then the depth, then the measurement, then the data.

In Norse myth, Kvasir's blood was mixed with honey to create the Mead of Poetry — the sacred drink that gives anyone who tastes it the gift of knowledge. What we built today is more modest: a pipeline that, for the first time, consistently finds what the documents actually say about the most contested issue in a small commune's elections.

But the principle is the same. Knowledge exists. It is scattered, buried in PDFs and Facebook posts and six-page tracts slipped under doors. The well's job is not to create knowledge. It is to make it findable — for any citizen who asks, about any topic, from any list.

The well is open. The water is clear. And Florent's school renovation project is no longer invisible.


Related: The RAG Adventure Begins | Clean the Windshield | The Red Thread | The Gazetteer Guard