Aller au contenu principal

OCapistaine Application Guide

Consolidated documentation for the OCapistaine application stack

This guide replaces the deprecated ARCHITECTURE.md, FIRECRAWL_GUIDE.md, and QUICKSTART.md from docs/docs/.


Overview

OCapistaine is an AI-powered civic transparency system for local democracy in Audierne, France. It processes municipal documents and citizen contributions through a layered architecture.

Key Components:

  • Streamlit UI - Citizen-facing interface (app/front.py)
  • FastAPI - REST API for integrations (app/main.py)
  • Forseti Agent - Charter validation (app/agents/forseti/)
  • Mockup System - Testing framework (app/mockup/)
  • Crawlers - Document acquisition (src/)

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ Streamlit UI │ │ FastAPI REST │ │ N8N (Vaettir repo) │ │
│ │ (front.py) │ │ (main.py) │ │ FB / Email / Chat │ │
│ └────────┬────────┘ └────────┬────────┘ └──────────┬──────────────┘ │
└───────────┼────────────────────┼─────────────────────┼──────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Providers │ │ Logging │ │ i18n │ │ Services │ │
│ │ (LLM APIs) │ │ System │ │ (EN/FR) │ │ (orchestration)│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ BUSINESS LOGIC LAYER │
│ ┌───────────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ AGENTS (ELv2) │ │ PROCESSORS (Apache 2.0) │ │
│ │ ┌─────────────────────────┐ │ │ ┌───────────────────────────┐ │ │
│ │ │ Forseti (Charter) │ │ │ │ Mockup Processor │ │ │
│ │ │ RAG Agent (pending) │ │ │ │ (validation testing) │ │ │
│ │ └─────────────────────────┘ │ │ └───────────────────────────┘ │ │
│ └───────────────────────────────┘ └─────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ DATA & EXTERNAL SERVICES │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Redis │ │ Firecrawl │ │ Opik │ │ Vaettir/N8N │ │
│ │ Cache │ │ (crawl) │ │ (tracing) │ │ (workflows) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘

Directory Structure

app/                           # Main application (replaces src/ for app code)
├── front.py # Streamlit UI entry point
├── sidebar.py # Streamlit sidebar
├── main.py # FastAPI entry point
├── i18n.py # Internationalization (EN/FR)

├── agents/ # AI Agents (ELv2 licensed)
│ ├── forseti/ # Charter validation agent
│ │ ├── agent.py
│ │ ├── features/
│ │ └── prompts.py
│ └── tracing/ # Opik tracing integration
│ └── opik.py

├── processors/ # Data processors (Apache 2.0)
│ └── mockup_processor.py # Charter testing workflow

├── mockup/ # Testing framework (Apache 2.0)
│ ├── generator.py # Contribution generation
│ ├── levenshtein.py # Text mutations
│ ├── llm_mutations.py # LLM-based mutations
│ ├── storage.py # Redis persistence
│ ├── dataset.py # Opik export
│ └── batch_view.py # Streamlit UI

├── providers/ # LLM providers (Apache 2.0)
│ ├── base.py
│ ├── gemini.py
│ ├── mistral.py
│ ├── claude.py
│ └── ollama.py

├── logging/ # Domain logging (Apache 2.0)
│ ├── config.py
│ └── domains.py

└── translations/ # i18n JSON files
├── en.json
└── fr.json

src/ # Crawling utilities (Apache 2.0)
├── config.py # Data source configuration
├── firecrawl_utils.py # Firecrawl manager
├── crawl_municipal_docs.py # Main crawler script
├── download_pdfs.py # PDF downloader
├── extract_pdf_urls.py # URL extraction
└── extract_text_from_pdfs.py # OCR/text extraction

ext_data/ # Crawled documents
├── mairie_arretes/ # ~4010 arrêtés
├── mairie_deliberations/ # Council deliberations
└── commission_controle/ # Commission documents

Quick Start

1. Environment Setup

# Install dependencies
poetry install

# Configure environment
cp .env.example .env
# Edit .env with your API keys:
# - FIRECRAWL_API_KEY
# - GEMINI_API_KEY (or other LLM)
# - OPIK_API_KEY (optional, for tracing)

2. Run the Application

# Start Streamlit UI (port 8502)
./scripts/run_streamlit.sh

# Or manually:
poetry run streamlit run app/front.py --server.port 8502

# Start FastAPI (port 8050) - optional
poetry run uvicorn app.main:app --reload --port 8050

3. Run Document Crawlers

# Dry run (preview)
poetry run python src/crawl_municipal_docs.py --dry-run

# Scrape single page (test)
poetry run python src/crawl_municipal_docs.py --source mairie_arretes --mode scrape

# Full crawl
poetry run python src/crawl_municipal_docs.py --source mairie_arretes --mode crawl --max-pages 100

Document Crawling (Firecrawl)

Data Sources

SourceURLDocuments
mairie_arretesaudierne.bzh/publications-arretes/~4010
mairie_deliberationsaudierne.bzh/deliberations-conseil-municipal/~3965
commission_controleaudierne.bzh/systeme/documentheque/?documents_category=49Variable

Commands

# Test API connection
poetry run python examples/simple_scrape.py

# Explore structure (single page)
poetry run python src/crawl_municipal_docs.py --source mairie_arretes --mode scrape

# Limited crawl (validate)
poetry run python src/crawl_municipal_docs.py --source mairie_arretes --mode crawl --max-pages 10

# Full crawl (production)
poetry run python src/crawl_municipal_docs.py --source all --mode crawl --max-pages 500

Output Structure

ext_data/<source>/
├── *.md # Markdown content
├── *.html # HTML content
├── *_metadata.json # Page metadata
├── index_<timestamp>.md # Crawl index
├── crawl_metadata_<timestamp>.json
└── errors.log

PDF Processing

# Download PDFs from crawled metadata
poetry run python src/download_pdfs.py

# Extract text from PDFs
poetry run python src/extract_text_from_pdfs.py

Opik Integration (Dual Tracing)

OCapistaine uses dual Opik tracing for complete observability:

┌─────────────────────────────────────────────────────────────────────────┐
│ OPIK DUAL TRACING │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ OCapistaine App │ │ Vaettir (N8N Workflows) │ │
│ │ (Python Tracing) │ │ (Workflow Tracing) │ │
│ ├─────────────────────────┤ ├─────────────────────────────────┤ │
│ │ • Forseti agent calls │ │ • Facebook integrations │ │
│ │ • LLM provider requests │ │ • Email processing │ │
│ │ • Mockup validation │ │ • Chatbot workflows │ │
│ │ • Charter evaluation │ │ • Cross-channel orchestration │ │
│ └───────────┬─────────────┘ └───────────────┬─────────────────┘ │
│ │ │ │
│ └──────────────┬──────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Comet ML / Opik Dashboard │ │
│ │ (ocapistaine-dev workspace)│ │
│ └─────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘

Why Dual Tracing?

SourceWhat It TracesUse Case
App (Python)Agent logic, LLM calls, validationDebug prompt performance, measure accuracy
Vaettir (N8N)Multi-channel workflows, integrationsTrack citizen journey, FB→Agent→Response

Both feed into the same Opik workspace (ocapistaine-dev) for unified observability.

App-Side Configuration

# .env
OPIK_API_KEY=your_comet_api_key
OPIK_WORKSPACE=ocapistaine-dev

App-Side Usage

from app.agents.tracing.opik import OpikTracer

# Automatic tracing via decorator
@OpikTracer.track
def process_contribution(text: str):
# Your logic here
pass

# Or with spans for detailed tracing
with OpikTracer.span("charter_validation") as span:
result = forseti.validate(contribution)
span.log_output({"valid": result.is_valid})

Location: app/agents/tracing/opik.py

Vaettir-Side Integration

Vaettir (N8N) has its own Opik integration for workflow tracing:

  • Repo: github.com/locki-io/vaettir
  • Traces: FB messages, email, chatbot interactions
  • Correlation: Workflow ID links to app traces

Unified Dashboard

Access all traces at: comet.com → ocapistaine-dev workspace

ViewShows
TracesAll LLM calls from both sources
DatasetsExported mockup contributions
ExperimentsPrompt optimization results
MetricsLatency, accuracy, cost

Mockup Testing System

The mockup system tests Forseti's charter validation with controlled variations.

Generate Test Contributions

from app.mockup.generator import MockContribution, generate_variations

# Create base contribution
base = MockContribution(
text="Je propose d'améliorer l'éclairage du port",
category="environnement",
expected_valid=True
)

# Generate variations
variations = generate_variations(base, count=10)

Run Validation Tests

Access via Streamlit UI → "Mockup" tab, or programmatically:

from app.processors.mockup_processor import MockupProcessor

processor = MockupProcessor()
results = processor.run_batch_validation(contributions)

Export to Opik

from app.mockup.dataset import export_to_opik

export_to_opik(contributions, dataset_name="forseti-charter-2026-01-28")

See MOCKUP.md for detailed documentation.


Internationalization (i18n)

The UI supports English and French.

Usage

from app.i18n import get_text, set_language

set_language("fr") # or "en"
text = get_text("welcome_message")

Adding Translations

Edit app/translations/{lang}.json:

{
"welcome_message": "Bienvenue sur Ò Capistaine",
"submit_button": "Soumettre"
}

See I18N.md for details.


Logging System

Domain-based logging with rotation.

Domains

DomainPurpose
presentationUI events
servicesBusiness logic
agentsAI agent activity
processorsData processing
providersLLM API calls
adaptersExternal integrations
dataData access

Usage

from app.logging import get_logger

logger = get_logger("agents")
logger.info("Forseti validation started", extra={"contribution_id": "123"})

See LOGGING.md for details.


Environment Variables

# Required
FIRECRAWL_API_KEY=your_key # Document crawling

# LLM Providers (at least one)
GEMINI_API_KEY=your_key
MISTRAL_API_KEY=your_key
OPENAI_API_KEY=your_key

# Optional
OPIK_API_KEY=your_key # Tracing (or use Vaettir)
OPIK_WORKSPACE=ocapistaine-dev
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=5

Running in Production

Docker Compose

version: "3.8"
services:
streamlit:
build: .
ports:
- "8502:8502"
environment:
- REDIS_HOST=redis
depends_on:
- redis

redis:
image: redis:7
ports:
- "6379:6379"

ngrok (Demo/Hackathon)

# Start ngrok tunnel
./scripts/start_ngrok.py

# Access at: ocapistaine.ngrok.app

License Split

ComponentLicensePrize Eligible
app/providers/Apache 2.0Yes
app/logging/Apache 2.0Yes
app/mockup/Apache 2.0Yes
app/processors/Apache 2.0Yes
app/i18n.pyApache 2.0Yes
src/ (crawlers)Apache 2.0Yes
app/agents/ELv2No
app/agents/forseti/ELv2No

See the COLLABORATION_ADDENDUM.md file in the repository root for hackathon prize distribution rules.


DocumentDescription
Forseti Agent](../agents/forseti/ARCHITECTURE.md) - Charter validation agentForseti agent features and adding new features
Prompt ManagementPrompt registry and Opik integration
AUTO_CONTRIBUTIONS.mdAuto-contribution workflow
MOCKUP.mdMockup testing system details
LOGGING.mdLogging system guide
I18N.mdInternationalization guide
STREAMLIT_SETUP.mdStreamlit configuration
sovereignty/rag-document-storage.mdData sovereignty & storage architecture (sovereign)

Opik & Optimization

DocumentDescription
opik/EXPERIMENT_WORKFLOW.mdPrompt optimization experiments
opik/CONTINUOUS_IMPROVEMENT.mdMethodology for AI feature improvement

Scheduler & Tasks

DocumentDescription
scheduler/README.mdAPScheduler integration
scheduler/TASK_BOILERPLATE.mdTask implementation guide
scheduler/tasks/TASK_OPIK_EVALUATE.mdScheduled Opik evaluation task

Troubleshooting

Firecrawl Issues

# Check API key
echo $FIRECRAWL_API_KEY

# Test connection
poetry run python examples/simple_scrape.py

# Check errors
cat ext_data/mairie_arretes/errors.log

Redis Connection

# Verify Redis is running
redis-cli ping

# Check connection
poetry run python -c "from app.data.redis_client import get_redis; print(get_redis().ping())"

Streamlit Port Busy

# Kill existing process
./scripts/run_streamlit.sh # Auto-kills port 8502

Migration Notes

From src/ to app/

The src/ directory now contains only crawling utilities. Application code has moved to app/:

Old LocationNew Location
src/agents/app/agents/
src/services/app/services/
src/processors/app/processors/
src/config.pyapp/providers/config.py (for app), src/config.py (for crawlers)

Opik Changes

Opik is now primarily used via Vaettir (N8N workflows). Local tracing remains in app/agents/tracing/opik.py but is secondary.


Last updated: 2026-01-28