5 posts tagged with "Observability"

Articles on observability practices and tools

Forseti461 Feature Architecture: Modular Prompts with Opik Versioning

· 4 min read
Jean-Noël Schilling
Locki one / French maintainer

Today we completed a major architectural milestone: modular prompt management for Forseti461. Each feature now has its own versioned prompt in Opik, enabling independent optimization and A/B testing.

From a single monolithic prompt to a clean separation of concerns — each Forseti feature can now evolve independently while sharing a common persona.
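The pattern described above — one versioned prompt per feature, all sharing a common persona — can be sketched with a minimal in-memory registry. This is an illustrative sketch of the versioning pattern, not the Opik SDK; the `PromptRegistry` class and `PERSONA` string are hypothetical names for this example.

```python
# Illustrative sketch (not the Opik SDK): a per-feature prompt registry
# where each feature's prompt is versioned independently and every
# version shares a common persona preamble.

PERSONA = "You are Forseti461, a respectful moderator for civic platforms."

class PromptRegistry:
    def __init__(self):
        self._versions = {}  # feature name -> list of prompt bodies

    def commit(self, feature, body):
        """Store a new version of a feature's prompt; return its 1-based version number."""
        self._versions.setdefault(feature, []).append(body)
        return len(self._versions[feature])

    def get(self, feature, version=None):
        """Fetch a specific version (1-based) or the latest, with the persona prepended."""
        history = self._versions[feature]
        body = history[(version or len(history)) - 1]
        return f"{PERSONA}\n\n{body}"

registry = PromptRegistry()
registry.commit("moderation", "Reject personal attacks and spam; explain why.")
v2 = registry.commit("moderation", "Reject attacks, spam, off-topic posts; give actionable feedback.")
print(v2)  # 2 — the moderation feature evolved without touching other features
```

Because each feature's history is independent, A/B testing reduces to fetching two versions of the same feature and comparing their scores.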

Forseti461 Prompt v1: Charter-Proofing AI Moderation for Audierne2026

· 7 min read
Jean-Noël Schilling
Locki one / French maintainer

Forseti461 is an AI agent that automatically moderates citizen contributions to participatory democracy platforms — approving only concrete, constructive, locally relevant ideas while rejecting personal attacks, spam, off-topic posts, or misinformation, and always explaining decisions with respectful, actionable feedback.

This weekend, Facebook reminded us that democracy is fragile. Toxic comments, personal attacks, and off-topic rants flooded discussions about local issues. The signal gets lost in the noise. Citizens disengage. Constructive voices give up.

What if we could protect civic discourse at scale?

First Submission: Building a Charter Validation Testing Framework

· 3 min read
Jean-Noël Schilling
Locki one / French maintainer

Goal: Create a systematic approach to test and improve our AI-powered charter validation system.

For the Encode Hackathon first submission, we focused on building the infrastructure to ensure Forseti461 (our charter validation agent) catches all violations reliably. The key insight: you can't improve what you can't measure.
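The "measure first" idea can be sketched as a tiny test harness: run a validator over labeled charter examples and compute how often it matches the expected verdict. Everything here is hypothetical for illustration — `keyword_validator` is a toy stand-in for the real Forseti461 agent, and the cases are invented.

```python
# Hypothetical sketch of a charter-validation test harness: run a
# validator over labeled examples and measure how reliably it agrees
# with the expected verdicts.

CASES = [
    ("Plant more trees along the harbour road.", "approve"),
    ("You are an idiot and so is the mayor.", "reject"),
    ("BUY CHEAP WATCHES http://spam.example", "reject"),
]

def keyword_validator(text):
    """Toy stand-in for the agent: flags obvious attacks and spam links."""
    lowered = text.lower()
    if "idiot" in lowered or "http://" in lowered:
        return "reject"
    return "approve"

def evaluate(validator, cases):
    """Return the fraction of cases where the validator matches the label."""
    hits = sum(1 for text, label in cases if validator(text) == label)
    return hits / len(cases)

print(f"accuracy: {evaluate(keyword_validator, CASES):.2f}")  # accuracy: 1.00
```

With a harness like this in place, any prompt change can be judged by a number instead of a gut feeling — which is exactly what makes systematic improvement possible.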

OPIK: AI Evaluation and Observability

· 18 min read
Jean-Noël Schilling
Locki one / French maintainer

This lecture, led by Abby Morgan, an AI Research Engineer, introduces AI evaluation as a systematic feedback loop for moving prototypes to production-ready systems. It outlines the four key components of a useful evaluation: a target capability, a test set, a scoring method, and decision rules. The session distinguishes general benchmarks from product-specific evaluations and emphasizes the need for observability in agent evaluation. It demonstrates OPIK, an open-source tool, for tracking, debugging, and evaluating LLM agents through features like traces, spans, LLM-as-a-judge scoring, and regression-testing datasets.
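The four components named above can be wired together in a few lines. This is an illustrative sketch of the feedback loop, not OPIK's API; `moderate`, the test cases, and the 0.9 threshold are assumptions made for the example.

```python
# Illustrative sketch of the four evaluation components: a target
# capability (moderation), a test set, a scoring method, and a decision
# rule that gates promotion to production.

TEST_SET = [
    {"input": "Fix the potholes on Rue Danton.", "expected": "approve"},
    {"input": "Everyone on this council is corrupt scum.", "expected": "reject"},
]

def moderate(text):
    """Stand-in for the agent under evaluation."""
    return "reject" if "scum" in text.lower() else "approve"

def accuracy(agent, test_set):
    """Scoring method: exact-match accuracy over the test set."""
    correct = sum(1 for case in test_set if agent(case["input"]) == case["expected"])
    return correct / len(test_set)

def decision_rule(score, baseline, threshold=0.9):
    """Ship only if the score clears the bar and does not regress against the baseline."""
    return score >= threshold and score >= baseline

score = accuracy(moderate, TEST_SET)
print(decision_rule(score, baseline=0.9))  # True — both cases pass
```

Regression testing, as covered in the session, is the same loop run on every change: the previous version's score becomes the `baseline`, so a new prompt that drops below it is rejected automatically.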