The Order of Things: When Neutrality Needs a Shuffle
How a WhatsApp message revealed a systemic bias we didn't intend — and why open source made the fix possible in hours
The Message
It arrived mid-afternoon, between two meetings. A citizen — sharp, technically literate, with thirty minutes of break time — had noticed something we hadn't.
"Is it normal that Lardic is always cited first in your comparison tool?"
She'd clicked through several themes in the comparison tool. Each time, the same list appeared first: Construire l'Avenir. Then Guillon. Then her own list. Then Bosser. Always the same order. Even after fifteen minutes of inactivity. Even after reloading the page.
"If you want to highlight Lardic, go ahead, but don't present yourselves as neutral, because that isn't honest."
She was right. Not about the intent — there was none — but about the effect. A tool that claims neutrality but presents one list first, every time, in every comparison, is not neutral. Perception matters as much as principle. And in the two weeks before a municipal election, it matters more.
The Chain
Tracing the cause took less time than the conversation.
Since Python 3.7 (2018), dictionaries preserve insertion order: what was a CPython 3.6 implementation detail became a language guarantee. When we defined the four electoral lists in the configuration, we typed them in a particular order, by internal slug: ca, paa, spae, csnf. That order propagated, unchanged, through every layer:
- The UI extracted the dictionary keys as a list: `["ca", "paa", "spae", "csnf"]`
- The retrieval layer queried ChromaDB for each list, in that order
- The context builder concatenated excerpts, list by list, in that order
- The LLM received the prompt with Construire l'Avenir's excerpts first
LLMs tend to present information in the order they receive it. The model wasn't biased. The prompt was ordered. And the order was set the day someone typed a Python dictionary literal.
No one chose this. No one noticed it. That's the point.
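The whole chain starts with a few lines like these. The slugs are the ones named above; the dict literal itself is an illustrative stand-in for the real configuration file:

```python
# Illustrative stand-in for the real list configuration; only the
# slugs and list names are taken from the article.
LISTS = {
    "ca": "Construire l'Avenir",
    "paa": "Passons à l'Action",
    "spae": "S'unir pour Audierne-Esquibien",
    "csnf": "Cap sur Notre Futur",
}

# Since Python 3.7, dict keys come back in insertion order, every time.
slugs = list(LISTS)
print(slugs)  # ['ca', 'paa', 'spae', 'csnf'], on every run
```

Nothing downstream reorders `slugs`, so whatever order the literal was typed in becomes the order the LLM sees.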
The Fix
The fix lives in two places, belt and suspenders.
In the code (app/agents/ocapistaine/features/compare.py): before building the context that the LLM will read, the list order is shuffled with random.shuffle(). Every request gets a different permutation. Four lists produce 24 possible orderings, all equally likely.
In the prompt (app/rag/prompts.py): an explicit instruction tells the model to follow the order of the extracts it receives, rather than imposing its own. The LLM doesn't choose who goes first — the dice do.
Total cost: zero. No model change, no re-indexing, no infrastructure. Two lines of random.shuffle() and one sentence in the system prompt.
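A minimal sketch of the code half of the fix, assuming a `slugs` list like the one in the configuration. The function name is illustrative, not the actual API; the real code is in `app/agents/ocapistaine/features/compare.py`:

```python
import random
from math import factorial

def shuffled_list_order(slugs: list) -> list:
    """Return a fresh random permutation of the lists for this request."""
    order = slugs.copy()   # never mutate the shared configuration
    random.shuffle(order)  # uniform over all permutations
    return order

# Four lists -> 4! = 24 equally likely orderings.
assert factorial(4) == 24
```

Copying before shuffling matters: `random.shuffle` works in place, and shuffling the shared configuration list would leak one request's ordering into the next.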
Why Open Source Matters Here
The citizen who flagged this didn't have time to read the code — she said so herself. But she knew the code was there. That's the difference between "trust me, I'm neutral" and "here's how it works, check for yourself."
The prompts that instruct the AI are open source. The retrieval logic is open source. The configuration that defines which lists exist and how they're compared is open source. Anyone — a candidate, a journalist, a citizen with thirty minutes — can trace the exact path from question to answer.
Where to look
For those who want to verify — or who simply want to understand how the tool treats their programme:
| What | Where |
|---|---|
| Compare prompt (what the AI is told) | app/rag/prompts.py — the COMPARE_SYSTEM_PROMPT variable |
| Synced prompts (production version) | app/prompts/local/ocapistaine_rag.json — ocapistaine.compare_system |
| Retrieval logic (how documents are fetched) | app/rag/retrieval.py — search_compare() function |
| Context builder (how results are assembled) | app/agents/ocapistaine/features/compare.py — the shuffle + concatenation |
| Agent persona (core neutrality principles) | app/agents/ocapistaine/prompts.py — "Neutralité absolue entre les listes" |
| Query refinement (how questions are pre-processed) | app/prompts/local/ocapistaine_rag.json — ocapistaine.refine_system |
All files are in the OCapistaine repository, branch dev.
The information pipeline, step by step
When a citizen asks a comparison question, here is exactly what happens:
1. Refine — A cheap LLM call corrects spelling, resolves proper nouns (e.g., "van praet" becomes "Van Praët"), and detects the thematic category. This uses a gazetteer of known candidate names.
2. Retrieve — ChromaDB is queried independently for each list. The same question is searched against each list's indexed programme documents. Results come back with a distance score measuring semantic relevance.
3. Shuffle — The list order is randomised. No list is systematically first.
4. Build context — Excerpts from each list are assembled into a single prompt, with clear section headers per list.
5. Synthesise — The LLM (currently Mistral) reads all excerpts and produces a structured comparison, following the prompt's rules: neutral, factual, no sources in text, structured format.
6. Display — The response is shown to the citizen, with source documents listed separately below.
The model never sees which list "we" prefer — because we don't.
The Data Gap
The shuffle fixes the order. It doesn't fix the depth.
Our ChromaDB collection holds 511 chunks. The distribution is not even:
| List | Chunks | Share |
|---|---|---|
| Reference documents | 392 | 77% |
| Passons à l'Action | 55 | 11% |
| Construire l'Avenir | 31 | 6% |
| S'unir pour Audierne-Esquibien | 27 | 5% |
| Cap sur Notre Futur | 6 | 1% |
A list with 6 chunks will produce thinner answers than one with 55, regardless of order. This isn't bias — it's data availability. Lists that haven't published their full programme yet will naturally have less material to compare.
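The Share column follows directly from the chunk counts; recomputing it also makes the scale of the gap concrete:

```python
# Recomputing the table's Share column from the chunk counts above.
counts = {
    "Reference documents": 392,
    "Passons à l'Action": 55,
    "Construire l'Avenir": 31,
    "S'unir pour Audierne-Esquibien": 27,
    "Cap sur Notre Futur": 6,
}
total = sum(counts.values())
assert total == 511

shares = {name: round(100 * n / total) for name, n in counts.items()}
# Roughly a 9x gap in raw material between the best-covered
# programme (55 chunks) and the least-covered one (6 chunks).
```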
The solution is simple: publish the programme, and we'll index it. The ingestion pipeline is ready. The well is open.
The Lesson
Building a neutral tool is not a one-time decision. It's a continuous discipline. You can write "neutralité absolue" in your system prompt and mean it — and still have a dictionary literal undermine it.
The citizen who noticed wasn't looking for bad faith. She was looking for rigour. That's the kind of scrutiny a civic tool should welcome, not fear.
We fixed it in hours. Not because the problem was simple — ordering bias in information presentation is a well-studied phenomenon in cognitive science — but because the architecture was transparent enough that tracing cause to effect was straightforward.
The prompts are open. The code is open. The data pipeline is documented. And now, the order is random.
As it should have been from the start.
Related: The Well of Kvasir | Ò Capistaine, My Capistaine | The Lighthouse Manifesto
