
When the Bottleneck Moves: From Chunking to Anonymization

· 5 min read
Jean-Noël Schilling
Locki one / French maintainer

How TRIZ and Theory of Constraints guided us from a 120-second timeout to a zero-LLM first pass

The Constraint That Came Back

Two weeks ago, we celebrated. A 58,880-character municipal transcript had been choking our pipeline — five chunks, five timeouts, zero themes extracted. The fix was textbook Theory of Constraints: identify the bottleneck (local model inference speed), exploit it (reduce chunk size from 15k to 8k characters), subordinate everything else. Themes started flowing.

Then Step 5 of the Five Focusing Steps whispered what it always whispers: repeat — find the new constraint.

The new constraint was sitting right where we'd left it. Anonymization. The same 58k document that now chunked beautifully for theme extraction was still being sent whole to a 7B model for PII detection. One shot. 58,000 characters. 120-second timeout. Same bottleneck, different feature.

The Contradiction

TRIZ teaches us to name contradictions precisely before trying to resolve them. Here's ours:

IF we use an LLM for anonymization, THEN we get context-aware PII detection (names vs. institutions, addresses vs. place names), BUT large documents exceed the inference budget and time out.

The physical contradiction underneath is sharper:

The anonymization system must be intelligent (to distinguish "Jean Dupont" from "Dupont SA") AND must be fast (to handle 58k characters without timing out).

The first instinct — the one we resisted — was to apply the same solution that worked for theme extraction: segment the document into chunks, anonymize each chunk, reassemble. But anonymization isn't theme extraction. A name that appears in chunk 3 needs the same placeholder as in chunk 1. Entity consistency across chunks requires either a shared state machine or a merge-and-deduplicate pass. The complexity is real, and the failure mode is worse: inconsistent anonymization is worse than no anonymization.
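The consistency requirement can be made concrete. Here is a minimal sketch, with an illustrative class name and structure not taken from the codebase, of the shared state a chunked anonymizer would need so that "Jean Dupont" in chunk 3 receives the placeholder it was assigned in chunk 1:

```python
class EntityRegistry:
    """Shared state a chunked anonymizer would need: the same surface
    form must always map to the same placeholder, across all chunks."""

    def __init__(self):
        self._placeholders = {}  # entity text -> placeholder
        self._counters = {}      # entity type -> next index

    def placeholder_for(self, text: str, entity_type: str) -> str:
        # Reuse the placeholder if this entity was seen in an earlier chunk.
        if text not in self._placeholders:
            self._counters[entity_type] = self._counters.get(entity_type, 0) + 1
            self._placeholders[text] = f"[{entity_type}_{self._counters[entity_type]}]"
        return self._placeholders[text]


registry = EntityRegistry()
# Chunk 1 and chunk 3 both mention "Jean Dupont" -> identical placeholder.
p1 = registry.placeholder_for("Jean Dupont", "PERSONNE")
p3 = registry.placeholder_for("Jean Dupont", "PERSONNE")
assert p1 == p3 == "[PERSONNE_1]"
```

Every chunk worker would have to share this registry (with locking), or a merge pass would have to reconcile per-chunk maps afterwards; either way, the complexity lands exactly where a mistake means inconsistent anonymization.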

Segmentation was the wrong principle this time.

Prior Action

TRIZ Principle #10 — Prior Action — says: perform the required action in advance, either fully or partially.

The insight: we already had an NLP-based PII detector sitting unused in our codebase. Opik's Presidio guardrail runs locally, uses named entity recognition instead of generative inference, has no context window limit, and processes text in milliseconds. We'd built validate_no_pii() months ago as a post-processing check. It was never called.

What if we stopped treating NLP detection as validation and started treating it as the first pass?

```
Document (58k chars)
        │
        ▼
[Presidio NER]               ← no LLM, no context limit, 200ms
        │
        ▼
Entities detected: Jean Dupont, [email protected], 06 12 34 56 78
        │
        ▼
[Deterministic replacement]  ← [PERSONNE_1], [EMAIL_1], [TELEPHONE_1]
        │
        ▼
Pre-anonymized text (58k chars, PII replaced)
        │
        ▼
[LLM enrichment]             ← CONDITIONAL: skip if PII found AND text > 15k chars
```
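The first two stages of that flow fit in a few lines. The function names below mirror those listed in the code reference at the end of the post, but the bodies are illustrative: simple regexes stand in for Presidio's NER, and only the span-to-placeholder contract matters.

```python
import re

# Illustrative stand-in for the NER pass: in the real pipeline,
# detect_pii_entities() runs Presidio; here, regexes sketch the same
# contract — a sorted list of (start, end, entity_type) spans.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "TELEPHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),
}

def detect_pii_entities(text):
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), entity_type))
    return sorted(spans)

def apply_pii_replacements(text, spans):
    # Number placeholders left-to-right, then replace right-to-left so
    # earlier offsets stay valid as the string changes length.
    counters, numbered = {}, []
    for start, end, entity_type in sorted(spans):
        counters[entity_type] = counters.get(entity_type, 0) + 1
        numbered.append((start, end, f"[{entity_type}_{counters[entity_type]}]"))
    for start, end, placeholder in reversed(numbered):
        text = text[:start] + placeholder + text[end:]
    return text

text = "Contact: jean@example.org ou 06 12 34 56 78."
clean = apply_pii_replacements(text, detect_pii_entities(text))
# -> "Contact: [EMAIL_1] ou [TELEPHONE_1]."
```

Because the replacement is deterministic, the same entity always yields the same placeholder, with no generative step to drift or time out.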

The LLM doesn't disappear. It becomes optional. For a 500-character citizen comment, the LLM still runs — it distinguishes "La Mairie" (keep as keyword) from "Marie Dupont" (anonymize) with nuance that NER can't match. But for the 58k transcript that was timing out? The NLP pass handles the bulk. The LLM is subordinated to the constraint rather than fighting it.
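The gate itself is a one-liner. A sketch, assuming the 15k-character threshold from the flow above and a hypothetical helper name (the real gating lives in _apply_anonymization):

```python
LLM_CHAR_LIMIT = 15_000  # threshold named in the post; assumed constant name

def should_run_llm(text: str, pii_found: bool) -> bool:
    """LLM enrichment is skipped only when the NLP pass already found
    PII AND the document is too large for a single inference call."""
    return not (pii_found and len(text) > LLM_CHAR_LIMIT)

# A short citizen comment still gets LLM nuance...
assert should_run_llm("500-char comment", pii_found=True)
# ...while a 58k transcript with detected PII stays NLP-only.
assert not should_run_llm("x" * 58_000, pii_found=True)
```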

The Five Steps, Again

| Step | Before (broken) | After (PII-first) |
|------|-----------------|-------------------|
| IDENTIFY | LLM inference on large docs times out at 120s | Same constraint, same root cause |
| EXPLOIT | | Use existing NLP guardrail as first pass (0ms → 200ms, $0.00) |
| SUBORDINATE | LLM was the only path | LLM becomes enrichment, gated by document size |
| ELEVATE | Would require bigger model or longer timeout | NLP pass eliminates the need |
| REPEAT | Anonymization was the constraint | Now: keyword extraction quality (NLP can't distinguish institutions from PII) |

The key insight from ToC: you don't always elevate the constraint. Sometimes you subordinate around it so thoroughly that it stops being the constraint at all.

What the Ideal Looks Like

TRIZ's Ideal Final Result asks: what if the system performed its function without existing?

For large documents, the LLM anonymizer now doesn't exist. Its function — PII detection and replacement — is performed by the NLP pass. The document arrives, entities are detected in 200ms, placeholders are applied deterministically, and theme extraction proceeds on clean text. No inference queue. No timeout. No API cost.

For small documents, the LLM still earns its keep. It extracts keywords that feed theme extraction. It understands that "Audierne" is a place to preserve, not a name to redact. The two systems complement rather than compete.

The metadata tells the story: anonymization_type: "pii" means the NLP pass handled everything. anonymization_type: "pii+llm" means both contributed. The pipeline doesn't care which path it took — it just needs clean text.
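A sketch of that branch, with a hypothetical helper name; only the two metadata values quoted above come from the pipeline:

```python
def anonymization_type(llm_ran: bool) -> str:
    """Record which path produced the clean text: "pii" when the NLP
    pass handled everything, "pii+llm" when both passes contributed."""
    return "pii+llm" if llm_ran else "pii"

metadata = {"anonymization_type": anonymization_type(llm_ran=False)}
# -> {"anonymization_type": "pii"} for the 58k transcript, NLP-only path
```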

The Pattern Behind the Pattern

What interests me most isn't the technical fix. It's the rhythm.

We solved chunking for theme extraction. The bottleneck moved to anonymization. We solved anonymization with a different principle (Prior Action instead of Segmentation). The bottleneck will move again — probably to keyword quality, since NLP can't extract the semantic keywords that LLMs do.

Theory of Constraints says this is the natural order. You never "finish" — you cycle. Each resolution reveals the next constraint. The system gets faster, cheaper, more capable, but it never reaches equilibrium. It's a ratchet, not a destination.

TRIZ says something subtler: the type of contradiction changes. The first time, it was a technical contradiction (accuracy vs. speed). This time, it was a physical contradiction (the system must be intelligent AND fast). Next time, it might be an organizational contradiction (who validates the anonymization quality?).

The methodology doesn't give you the answer. It gives you the question to ask. And the question, asked precisely, contains half the answer already.


Code reference: app/mockup/anonymizer.py (detect_pii_entities, apply_pii_replacements), app/mockup/field_input.py (_apply_anonymization PII-first flow)

Related: The Anonymization Trilemma | Reliability Without the Cloud Tax