
The Conversation Loop: When Citizens Teach the AI What Matters

· 6 min read
Jean-Noël Schilling
Locki one / French maintainer

How streaming, threads, and two small buttons turned a Q&A tool into a learning system

The Question After the Question

The RAG system could answer questions. It could find documents, retrieve chunks, synthesize a response. But every question was an island. A citizen would ask about housing, get an answer, then ask "and what about the school?" — and the AI would start from zero, as if they had never spoken.

This is the gap between a search engine and a conversation. A search engine answers queries. A conversation builds understanding. And if you are building a civic assistant six days before a municipal election, understanding is what citizens need.

Memory in Three Exchanges

The fix seems simple: pass the conversation history to the LLM. But how much history? Too little, and follow-ups break. Too much, and the context window fills with old answers instead of fresh document chunks.

We settled on three exchanges — six messages. The last three question-answer pairs travel with every new query, injected between the system prompt and the current user question (with its RAG context). The retrieval still searches only on the current question. The LLM sees what was discussed, but the documents are always fresh.

This is a deliberate separation of concerns. ChromaDB finds the relevant chunks. The conversation history provides continuity. The LLM synthesizes both. Each layer does what it does best.
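That separation can be sketched in a few lines. This is a hedged illustration, not the project's actual code: `build_messages`, `HISTORY_PAIRS`, and the OpenAI-style message dicts are assumptions made for the example.

```python
from collections import deque

HISTORY_PAIRS = 3  # last three question-answer pairs = six messages

def build_messages(system_prompt, history, question, rag_context):
    """Assemble the prompt: system prompt, recent history, then the
    current question with its freshly retrieved RAG context."""
    recent = list(history)[-2 * HISTORY_PAIRS:]  # at most six messages
    user_turn = f"Context:\n{rag_context}\n\nQuestion: {question}"
    return (
        [{"role": "system", "content": system_prompt}]
        + recent
        + [{"role": "user", "content": user_turn}]
    )

# A deque with maxlen lets old turns fall off automatically as the
# conversation grows past three exchanges.
history = deque(maxlen=2 * HISTORY_PAIRS)
```

Note that `rag_context` is rebuilt from retrieval on every turn, while `history` only ever carries the last six messages — the documents stay fresh even when the conversation is long.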

Streaming: Respect the Wait

Before streaming, every question meant a blank screen and a spinner. "Recherche en cours..." (search in progress). The citizen waits. Five seconds. Ten. Is it broken? Is it thinking? Did I ask something wrong?

Streaming changes the contract between the system and the user. The first tokens arrive within a second. The citizen reads as the AI writes. The wait becomes a conversation, not a void.

The implementation required threading streaming through four layers: the Streamlit UI calls the agent, which calls the feature, which calls the provider. Each provider — Ollama, OpenAI, Claude, Mistral — has its own streaming protocol. The failover wrapper tries each in sequence; if one fails at connection time, it moves to the next. Once a stream starts, tokens flow directly.
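The connection-time failover can be sketched as a thin wrapper over token iterators. A minimal illustration under assumptions: `failover_stream` and the `provider(prompt)` call shape are invented for this example; in reality each provider adapter would translate its own streaming protocol (Ollama, OpenAI, Claude, Mistral) into such an iterator.

```python
def failover_stream(providers, prompt):
    """Try each provider in order. If one fails before yielding its
    first token, fall through to the next. Once tokens flow, errors
    are no longer intercepted -- the stream has started."""
    last_error = None
    for provider in providers:
        try:
            stream = provider(prompt)   # may raise at connection time
            first = next(stream)        # may raise before the first token
        except Exception as exc:
            last_error = exc
            continue

        def _tokens(first=first, stream=stream):
            yield first
            yield from stream           # tokens flow directly from here on
        return _tokens()
    raise RuntimeError("all providers failed") from last_error
```

The key design choice is probing for the first token inside the `try`: that is what distinguishes "failed at connection time, try the next provider" from "failed mid-stream, which the caller must see."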

There is something honest about streaming. The citizen sees exactly what the AI produces, token by token. No hidden post-processing. No delayed reveal. The response is the response, unfolding in real time.

Two Buttons and a Thread

After the response streams in, two buttons appear: a thumbs up and a thumbs down.

That is the entire feedback interface. No five-star scales, no comment boxes, no "rate this on seven dimensions of helpfulness." A citizen who just read an answer about local housing policy should not need a PhD in survey design to say whether the answer was useful.

But behind those buttons, something important happens. Each click sends a score to Opik — a 1.0 or a 0.0 — tagged as a user_rating on the specific trace for that question. And every trace in a session belongs to the same thread.

Opik threads are what make this work as a learning system. A thread groups all the traces from a single conversation. In the Opik UI, you can see the full exchange: what the citizen asked, what the AI answered, which answer got a thumbs up, which got a thumbs down. The thread is the unit of conversation. The trace is the unit of judgment.
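The click-to-score path can be made concrete by showing the payload it produces. This is a sketch, not the actual SDK call: the shape — a trace id, a score name, a value — follows the description above, and `user_rating_score` is an illustrative helper whose output would be handed to the Opik client.

```python
def user_rating_score(trace_id: str, thumbs_up: bool) -> dict:
    """Turn a button click into a 1.0 / 0.0 user_rating score
    attached to the trace for that specific question."""
    return {
        "id": trace_id,          # the trace being judged
        "name": "user_rating",   # the tag described above
        "value": 1.0 if thumbs_up else 0.0,
    }
```

Because every trace already belongs to its session's thread, nothing extra is needed at click time — the score lands on one trace, and the thread context comes for free.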

This matters because a thumbs-down on question three, after two thumbs-ups, tells you something specific. The system was doing well, then it failed. The RAG context was probably relevant (questions one and two worked), so the issue is likely in the synthesis — the prompt, the model, the way the answer was framed. Without the thread, that signal is lost in a sea of individual traces.

The Loop That Closes

Here is the methodology that emerges:

The citizen asks a question. ChromaDB retrieves relevant documents. The LLM synthesizes a response, streaming it token by token. The citizen reads, considers, and clicks a button. That click becomes a feedback score on a trace, within a thread, inside Opik.

Later — daily, weekly, whenever we choose — we can query Opik for traces with low user ratings. We can see what was asked, what context was retrieved, what was synthesized. We can look at the thread to understand the conversational flow. And we can use those insights to refine the prompts, adjust the retrieval, or identify gaps in the document corpus.
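That periodic triage can be sketched as a pass over exported trace records. The record shape here — a `thread_id` plus a `scores` dict — is an assumption for illustration; the point is the grouping, which turns isolated thumbs-downs back into conversations you can read end to end.

```python
from collections import defaultdict

def low_rated_by_thread(traces, threshold=0.5):
    """Collect traces whose user_rating fell below `threshold`,
    grouped by thread so the conversational context is preserved."""
    flagged = defaultdict(list)
    for trace in traces:
        score = trace.get("scores", {}).get("user_rating")
        if score is not None and score < threshold:
            flagged[trace["thread_id"]].append(trace)
    return dict(flagged)
```

Reading the flagged traces thread by thread is what surfaces the pattern described earlier: two good answers followed by a bad one points at synthesis, not retrieval.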

This is the continuous improvement loop made concrete. Not through automated metrics alone — though we have those too, measuring confidence and retrieval distance — but through the irreplaceable signal of a person who read the answer and decided whether it was good enough.

An automated metric can tell you the response was well-formed. Only a citizen can tell you it was helpful.

What Trust Looks Like

There is a temptation in AI development to optimize for the metrics you can measure. Response length. Latency. Confidence scores. Token counts. These are useful. They are not sufficient.

The thumbs-down from a citizen who asked about school renovation and got an answer about road maintenance — that signal carries more information than a thousand automated evaluations. It says: the system failed where it mattered, at the moment it mattered, for a person who needed it to work.

Building a civic assistant is not a machine learning problem. It is a trust problem. The citizens of Audierne-Esquibien are being asked to trust that an AI system will fairly represent what their candidates promise. That trust is earned one answer at a time, and it can be lost in a single bad response.

The conversation loop — history for context, streaming for transparency, feedback for accountability — is how we try to deserve that trust. Not by being perfect, but by being honest about what we know, responsive to what we get wrong, and committed to getting better.

The buttons are small. The commitment behind them is not.


Related: The RAG Adventure Begins | The Gazetteer Guard | Grounding AI in Reality