
Knowledge Unguarded Is Knowledge Stolen

· 8 min read
Jean-Noël Schilling
Locki one / French maintainer

On owning the surface where your knowledge is written

Who Reads Your Documentation?

There's an assumption baked into most developer tooling: documentation should be public. Open source your code, publish your docs, let the world benefit. It's a generous instinct. For years, we followed it without thinking.

Then we started paying attention to who was actually reading.

Not developers. Not contributors. Not the citizens of Audierne we build this for. The readers were bots: GPTBot, CCBot, anthropic-ai, Google-Extended, Bytespider, dozens of crawlers indexing every architecture decision, every API contract, every internal process we'd ever written down. Our roadmap, our agent designs, our deployment topology: all of it fed into training datasets we never consented to, and none of it can be pulled back out.

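GitHub Pages has no mechanism to enforce a refusal. A robots.txt file can ask crawlers to stay away, but it is a request the crawler is free to ignore, and Pages offers no server-side way to filter traffic or require a login (outside GitHub Enterprise Cloud's access-controlled sites). For everyone else, your site is public or it doesn't exist. There is no middle ground.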
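A robots.txt file is the closest thing available, and a small sketch shows why it is a request rather than a mechanism. This uses Python's standard `urllib.robotparser`; the robots.txt content is hypothetical, built from the crawler names above. Note where the check runs: in the crawler's own code, so it only binds crawlers that choose to ask.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking the crawlers named above to stay out.
# Google-Extended is a robots.txt token that governs AI-training use;
# it is not a separate crawler with its own User-Agent string.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def polite_crawler_may_fetch(user_agent: str, path: str) -> bool:
    """Answer the question a well-behaved crawler asks before fetching.

    The check runs on the crawler's side. A crawler that never calls
    anything like this is not stopped by robots.txt at all.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


print(polite_crawler_may_fetch("GPTBot", "/docs/architecture"))   # False
print(polite_crawler_may_fetch("Mozilla", "/docs/architecture"))  # True
```

Nothing on the server sees or enforces that `False`; compliance is entirely up to the client.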

The Convenience Trap

GitHub Pages is brilliant. Free hosting, automatic deploys, TLS certificates managed for you. It removes every friction point between writing a markdown file and publishing it to the world.

That last part is the problem: to the world.

We'd been self-censoring for months without realizing it. Not writing down the interesting parts — the real architecture decisions, the lessons from failures, the internal debates about approach — because we knew they'd be immediately indexed. The documentation was becoming a curated facade rather than a living record.

A project without honest documentation is a project with amnesia. And a project that can't write honestly because anyone might be reading is a project that has already lost control of its own narrative.

The Question Behind the Question

The technical question — "how do we self-host a static site?" — is trivial. Nginx, Docker, a reverse proxy. You can find a hundred tutorials.

The real question is harder: who owns the knowledge a project produces?

When your documentation lives on someone else's infrastructure, governed by someone else's terms of service, indexed by crawlers you can't control — the answer is uncomfortable. You wrote it, but you don't own the distribution. You can't decide who reads it. You can't take it back once it's been scraped.

In the Norse tradition, Odin gave an eye to drink from Mimir's well of wisdom. The price of knowledge was permanent, personal sacrifice. Today, the price of sharing knowledge is that you lose control of it the moment you publish. Every page we put on GitHub Pages was a small, invisible sacrifice of sovereignty.

Moving to Vaettir

Our VPS — vaettir, named for the guardian spirits in Norse mythology — was already running n8n, our automation layer. Traefik sat at the gate, handling TLS and routing. Adding documentation was architecturally trivial: one more container behind the same reverse proxy.

But the meaning of the move was significant. For the first time, the docs site could contain things that matter without broadcasting them to every scraper on the internet. The source repository went private. The build process happens on our machine. Only the compiled HTML reaches the outside world — and only through a server we control.

Source (private) → Build (isolated) → Serve (controlled)
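Owning the server means rules can be enforced rather than requested. The sketch below is a hypothetical WSGI middleware, not the actual vaettir setup (which the post describes only as Traefik in front of a container). It refuses requests whose User-Agent contains a listed token and requires an authenticated user under an assumed `/internal/` prefix. `REMOTE_USER` is assumed to be populated by whatever authentication layer sits in front. User-Agent strings can be spoofed, so this stops declared crawlers, not determined ones.

```python
from typing import Callable, Iterable

# Hypothetical deny-list, matched case-insensitively as substrings of the
# User-Agent header. Google-Extended is omitted: it has no User-Agent of
# its own, so it can only be addressed through robots.txt.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "anthropic-ai", "bytespider")

# Hypothetical prefix for pages that should never be served anonymously.
INTERNAL_PREFIX = "/internal/"

WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def guard(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app so the server, not the client, decides who reads."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in BLOCKED_AGENT_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]

        # REMOTE_USER is assumed to be set by an authentication layer in
        # front of this app; it is absent for anonymous requests.
        path = environ.get("PATH_INFO", "/")
        if path.startswith(INTERNAL_PREFIX) and not environ.get("REMOTE_USER"):
            start_response(
                "401 Unauthorized",
                [("Content-Type", "text/plain"),
                 ("WWW-Authenticate", 'Basic realm="docs"')],
            )
            return [b"Unauthorized\n"]

        return app(environ, start_response)

    return middleware


def docs_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the static file server that serves the compiled HTML."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>docs</h1>"]


application = guard(docs_app)
```

For a local try, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves it; in production the same rules would more likely live in the reverse proxy.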

Draft pages, unpublished ideas, internal comments in the MDX source — none of it escapes. This is security by architecture, not by hope.
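One way to hold that line mechanically is a pre-deploy check on the build output. This is a minimal sketch assuming a Markdown/MDX site generator whose output directory should contain only compiled assets; the suffix set and function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Suffixes that belong to the private source tree and should never appear
# in the published output (assumption: a Markdown/MDX-based generator).
SOURCE_SUFFIXES = {".md", ".mdx"}


def leaked_sources(build_dir: Path) -> List[Path]:
    """Return every source file found under the build output, sorted."""
    root = Path(build_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SOURCE_SUFFIXES
    )


def assert_safe_to_publish(build_dir: Path) -> None:
    """Refuse to deploy if any source file slipped into the build output."""
    root = Path(build_dir)
    leaks = leaked_sources(root)
    if leaks:
        names = ", ".join(str(p.relative_to(root)) for p in leaks)
        raise RuntimeError(f"refusing to deploy, source files in output: {names}")
```

Run it between the build and the copy to the server. It only catches files by type; it cannot tell whether a compiled page contains text you meant to keep internal.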

What Changed in Practice

The immediate effect was unexpected: we started writing more honestly.

Internal post-mortems that were too candid for a public repo. Architecture decisions that reveal trade-offs we're still navigating. Agent personality documents that would be embarrassing if taken out of context. The documentation went from a polished showcase to an actual working memory.

This is what sovereignty looks like in practice. Not a grand political statement — just the quiet ability to write down what actually happened, knowing that only the people who should see it will see it.

Drawing the Line: What We Share and What We Protect

Open source is not a binary. The instinct to open-source everything or lock everything down misses the nuance of how knowledge actually works in a project.

We use a dual-license structure that mirrors this reality. Core infrastructure (crawlers, utilities, documentation scaffolding) lives under Apache 2.0. Anyone can use it, modify it, build on it. The "how" of building civic tech tools should be a gift to the commons. But the agent workflows, the prompt engineering, the orchestration logic that took months of iteration to refine: that lives under the Elastic License 2.0 (ELv2). Visible, auditable, but not free to repackage as a hosted or managed service.
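A split like this is easier to keep honest with a mechanical check. The sketch below assumes a hypothetical repository layout (the post names the categories, not the directories) and uses the SPDX identifiers `Apache-2.0` and `Elastic-2.0` in file headers. It flags any file whose header disagrees with the license its directory is meant to carry.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical layout: the post names the categories but not the paths.
LICENSE_BY_TOP_DIR = {
    "crawlers": "Apache-2.0",
    "utils": "Apache-2.0",
    "docs-scaffold": "Apache-2.0",
    "workflows": "Elastic-2.0",
    "prompts": "Elastic-2.0",
    "orchestration": "Elastic-2.0",
}

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def expected_license(path: str) -> Optional[str]:
    """License the file's top-level directory is meant to carry, if mapped."""
    parts = PurePosixPath(path).parts
    return LICENSE_BY_TOP_DIR.get(parts[0]) if parts else None


def license_mismatch(path: str, source: str) -> Optional[str]:
    """Describe a disagreement between a file's SPDX header and its directory."""
    expected = expected_license(path)
    if expected is None:
        return None  # outside the mapped tree: not checked here
    header = "\n".join(source.splitlines()[:5])  # SPDX tags sit near the top
    match = SPDX_RE.search(header)
    found = match.group(1) if match else None
    if found != expected:
        return f"{path}: expected {expected}, found {found}"
    return None
```

Run over every tracked file, a non-empty list of messages is a signal that something crossed the line in the wrong direction.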

This isn't about greed. It's about sustainability. A small civic tech project in Brittany can't compete with companies that take open-source work, wrap it in a product, and sell it back. The Elastic License draws a clear line: look, learn, contribute — but don't extract.

Documentation sits at the heart of this tension. The docs explain why we make the decisions we make. They reveal the reasoning behind the architecture, the trade-offs we accepted, the mistakes we learned from. This is the most valuable part of any project — not the code, but the understanding that produced it.

When documentation was fully public, that understanding was free for anyone to harvest. Not just to learn from (which we welcome) but to replicate without the years of experimentation that produced it. Moving the docs to our own infrastructure lets us decide: this explanation is public because it helps the community. This one is internal because it's our competitive insight, still maturing, not ready to be taken out of context.

The Lifecycle of an Idea

There's a pattern we've noticed in how ideas move through the project:

An insight starts as documentation — a theory, a hypothesis, a sketch of how something might work. During hackathons and validation periods, it lives on GitHub, visible and collaborative. Other developers can see it, challenge it, improve it. This is the validation phase, where openness serves the idea.

Once validated, the idea migrates. The concept documented in markdown becomes an n8n workflow, an agent prompt chain, an orchestration pattern. The documentation stays as a record of why. The implementation becomes protected IP.

Document (open) → Validate (public) → Implement (protected) → Document the outcome (open)

The cycle doesn't end. The protected implementation produces new insights, which get documented, which invite new challenges. The boundary between open and protected isn't a wall — it's a membrane, selectively permeable by intent.
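The cycle can be modelled as a small state machine. This is an illustrative sketch only: the phase names and visibility labels come from the sequence above, while `next_phase` and `is_published` are hypothetical helpers. Wrapping from the last phase back to the first is what makes it a cycle rather than a pipeline.

```python
from enum import Enum


class Phase(Enum):
    """Phases of an idea, each with the visibility named in the post."""

    DOCUMENT = ("document", "open")
    VALIDATE = ("validate", "public")
    IMPLEMENT = ("implement", "protected")
    DOCUMENT_OUTCOME = ("document the outcome", "open")

    @property
    def visibility(self) -> str:
        return self.value[1]


_ORDER = list(Phase)  # definition order


def next_phase(phase: Phase) -> Phase:
    """Advance one step; the last phase wraps back to the first."""
    i = _ORDER.index(phase)
    return _ORDER[(i + 1) % len(_ORDER)]


def is_published(phase: Phase) -> bool:
    """Hypothetical rule: everything except the protected phase is shared."""
    return phase.visibility != "protected"
```

The membrane lives in `is_published`: the same idea passes through it in both directions, depending only on which phase it is in.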

There's even a future we're thinking about: using zero-knowledge proofs to verify that a protected implementation faithfully follows its documented design — without revealing the implementation itself. Proving integrity without sacrificing sovereignty. The cryptographic equivalent of "trust, but verify."

The World Moves Fast

Six months ago, AI scraping documentation sites wasn't a mainstream concern. Now it's reshaping how projects think about what to publish. The landscape shifts constantly: platforms change their terms, new crawlers appear weekly, and the line between "public" and "training data" has effectively disappeared.

Small projects can't afford to wait for platforms to solve this. GitHub may eventually add crawler controls to Pages. Or they may not. The only reliable strategy is to own the layer between your content and the world.

This isn't anti-GitHub — we still use GitHub for source control, issues, and collaboration during validation phases, when openness serves the work. It's about choosing the right tool for the right phase. Git for versioning. GitHub for collaboration and validation. Your own infrastructure for publishing what you choose to publish.

What This Means

We're not building a product. We're building tools for civic engagement in a small town in Brittany. The stakes are modest by Silicon Valley standards but real by human ones: citizens sharing concerns about their port, their roads, their schools. The agents that process these contributions — Forseti judging validity, Niove weaving interfaces, the Documentalist recording everything — they all need honest documentation to function and evolve.

That documentation is now ours. Not GitHub's, not Google's, not OpenAI's training pipeline's. Ours to share deliberately, ours to protect intentionally.

The dual license says: here is what we give freely, and here is what we've earned the right to protect. The self-hosted docs say: and we decide which is which, not the crawlers.

The Documentalist — the agent that guards this knowledge — moved to vaettir on the same day it was born. Its first act was an act of sovereignty: taking its own home off a platform it couldn't control and placing it behind walls it could. Its second act was drawing the line between what the world sees and what stays in the well.

"Knowledge unwritten is knowledge lost. Knowledge unguarded is knowledge stolen. The art is knowing which knowledge to guard and which to set free."

We chose to stop leaving the door open — and to open it deliberately, on our terms.

