
5 articles tagged with "Observability"

Articles on observability practices and tools

View all tags

Forseti461 Feature Architecture: Modular Prompts with Opik Versioning

· 4 min read
Jean-Noël Schilling
Locki one / French maintainer

Today we completed a major architectural milestone: modular prompt management for Forseti461. Each feature now has its own versioned prompt in Opik, enabling independent optimization and A/B testing.

tip

From a single monolithic prompt to a clean separation of concerns — each Forseti feature can now evolve independently while sharing a common persona.
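A minimal sketch of what this can look like with the Opik Python SDK's prompt library (the prompt names and templates below are illustrative, not the actual Forseti461 prompts): registering a template under a stable name versions it, so each feature's prompt can be fetched and rolled forward independently.

```python
import opik

# Assumes Opik credentials are already configured (e.g. via `opik configure`).
client = opik.Opik()

# Hypothetical per-feature prompts: each name keeps its own version history,
# and re-registering a changed template records a new commit under that name.
client.create_prompt(
    name="forseti461-moderation",
    prompt="You are Forseti461, a respectful civic moderator. Review: {{contribution}}",
)
client.create_prompt(
    name="forseti461-summary",
    prompt="You are Forseti461. Summarize the approved ideas: {{ideas}}",
)

# Fetch one feature's latest prompt without touching the others; pinning or
# comparing commits is what enables independent optimization and A/B tests.
moderation = client.get_prompt(name="forseti461-moderation")
print(moderation.commit)  # version identifier of the current template
print(moderation.format(contribution="Add a bike lane on the harbor road."))
```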

Forseti461 Prompt v1: Charter-Compliant AI Moderation for Audierne2026

· 8 min read
Jean-Noël Schilling
Locki one / French maintainer

Forseti461 is an AI agent that automatically moderates citizen contributions on participatory democracy platforms: it approves only ideas that are concrete, constructive, and locally relevant, rejects personal attacks, spam, off-topic posts, and misinformation, and always explains its decisions with respectful, actionable feedback.

tip

This weekend, Facebook reminded us that democracy is fragile. Toxic comments, personal attacks, and off-topic rants flooded discussions about local issues. The signal gets lost in the noise. Citizens disengage. Constructive voices give up.

What if we could protect civic debate at scale?
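The decision contract described above (approve only what is concrete, constructive, and locally relevant; reject with a named reason; always explain) can be pictured as a small structured verdict. A hypothetical sketch, not Forseti461's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class RejectionReason(Enum):
    PERSONAL_ATTACK = "personal_attack"
    SPAM = "spam"
    OFF_TOPIC = "off_topic"
    MISINFORMATION = "misinformation"

@dataclass
class ModerationVerdict:
    approved: bool
    reason: Optional[RejectionReason]  # set only when the contribution is rejected
    feedback: str                      # respectful, actionable explanation, always present

# Shape of a rejection under these rules:
verdict = ModerationVerdict(
    approved=False,
    reason=RejectionReason.OFF_TOPIC,
    feedback="This thread covers the harbor renovation; please post transport "
             "ideas in the mobility discussion so they can be reviewed there.",
)
print(verdict)
```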

First Submission: Building a Charter Validation Testing Framework

· 3 min read
Jean-Noël Schilling
Locki one / French maintainer

Goal: Create a systematic approach to test and improve our AI-powered charter validation system.

For our first Encode Hackathon submission, we focused on building the infrastructure to ensure Forseti461 (our charter validation agent) catches all violations reliably. The key insight: you can't improve what you can't measure.
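A minimal sketch of that idea, with hypothetical test cases and a placeholder `moderate()` standing in for the real agent: a labeled set of contributions scored as a simple pass rate gives you a number to improve against.

```python
# Hypothetical harness: labeled contributions paired with the expected verdict.
CASES = [
    ("Install benches along the harbor walk.", "approve"),
    ("The mayor is an idiot.", "reject"),    # personal attack
    ("BUY CHEAP WATCHES NOW!!!", "reject"),  # spam
]

def moderate(text: str) -> str:
    """Placeholder heuristic so the sketch runs; the real agent replaces this."""
    lowered = text.lower()
    return "reject" if ("idiot" in lowered or "buy" in lowered) else "approve"

def pass_rate() -> float:
    """Fraction of labeled cases where the verdict matches the label."""
    hits = sum(moderate(text) == expected for text, expected in CASES)
    return hits / len(CASES)

print(f"pass rate: {pass_rate():.0%}")  # measure first, then improve
```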

OPIK: AI Evaluation and Observability

· 18 min read
Jean-Noël Schilling
Locki one / French maintainer

This lecture, led by Abby Morgan, an AI Research Engineer, introduces AI evaluation as a systematic feedback loop for transitioning prototypes to production-ready systems. It outlines the four key components of a useful evaluation: a target capability, a test set, a scoring method, and decision rules. The session differentiates between general benchmarks and specific product evaluations, emphasizing the need for observability in agent evaluation. It demonstrates using OPIK, an open-source tool, to track, debug, and evaluate LLM agents through features like traces, spans, 'LLM as a judge', and regression testing datasets.
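As a taste of the tracing the session demonstrates, Opik's `@track` decorator logs each decorated call as a trace, with nested calls recorded as spans; the functions below are illustrative and assume an Opik workspace is already configured.

```python
from opik import track

@track
def retrieve_charter_rule(question: str) -> str:
    # Nested tracked calls appear as spans under the parent trace.
    return "Contributions must be concrete, constructive, and locally relevant."

@track
def answer(question: str) -> str:
    rule = retrieve_charter_rule(question)
    # A real LLM call would go here; a canned reply keeps the sketch self-contained.
    return f"Per the charter: {rule}"

print(answer("Why was my comment rejected?"))
```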