Project scope - Victor + JNS
Location: Discord voice chat
Attendees: jnxmas, Victor
Overview
This document summarizes a series of project meetings focused on building a community-focused AI application for a local election. The discussions cover team composition and recruitment, defining the project's scope for a hackathon, and outlining the technical architecture. Key activities include automating the processing of community contributions, developing a neutral chatbot to compare political programs, and initiating web crawling operations to gather data. The plan involves using technologies like Firecrawl, N8N, Pydantic, and a Retrieval-Augmented Generation (RAG) system, with a strong emphasis on collaborative development practices via GitHub.
Key Topics
- Two new potential members from Audierne, France, were identified. They are developers focused on websites but are new to AI and Python, so they are considered to be starting from scratch.
- Their primary value is their familiarity with the local context of the project (audierne2026), making them a potential bridge to the local population. They are seen as good candidates for an "observer" role within the hackathon.
- They are connected to one of four local political lists, which is interested in participative community movements.
- An Indian developer named Satish, with whom a speaker previously worked on an AWS and Next.js project in a 2023 hackathon, is a potential collaborator. However, he is currently cautious about joining due to being busy with his job.
- A French individual named Max, who specializes in SEO and social media strategy, was mentioned but is not a coder.
- An Indian machine learning specialist, referred to as "Meher," is considering joining. He is seen as the key "third guy," covering ML for the application architecture.
- There was a concern that the project's connection to a local election might disqualify it from the hackathon.
- A team member named Rebecca clarified in a general chat that the project is eligible to be submitted under the "social community impact" category.
- The team was advised that all questions should be posted in the general chat, as Rebecca will not be replying to DMs.
- Progress has been made on the project's GitHub Kanban board, with P0 tasks and some unprioritized tickets added.
- The data crawling phase (Firecrawl) is ready to start with an initial dataset of 150 links and 4,000 PDFs that require OCR. The speaker has paused this work to incorporate more contributions.
- The N8N orchestration workflow is nearly complete and ready for deployment.
- It will be hosted on a server with a 6-core CPU and 16 gigabytes of RAM.
- The team will consider moving to Vercel if the server cannot handle the load.
- N8N is a low-code/no-code platform for building workflows, and an example was shown for automating posts based on GitHub repository issues.
- The setup is encapsulated in Docker, making it easier to run.
- The project's code and tasks will be managed in the Ocapistaine repository on GitHub.
- A key initial task is to push the ideation file (ideation 13.1.2026) to the main branch to serve as a foundation for future work.
- The team will use a feature branch workflow:
- For each task, a developer will create a new feature branch from the main branch.
- Once the task is complete, the branch will be pushed to the repository for review by others.
- After review and debugging, the branch will be merged and closed.
- The team agreed to prioritize the Firecrawl operation as the starting point, despite its potential difficulty.
- Two team members will conduct parallel trials on different Firecrawl tasks to gain experience and share learnings.
- Each member should get their own free Firecrawl API key, as they may need to use multiple free accounts by registering with different emails to maximize free API calls.
- The initial scraping task will target a list of PDFs from a specific URL (script marie arete mary).
- PDF processing will require testing various libraries like pdf2ocr, Tabula, and pypdf to find the most effective one.
- The team needs to develop a method to identify and set aside documents that only contain signatures, as this information has no value for the LLM.
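One possible starting point for setting aside signature-only documents is a simple word-count heuristic applied to the text already extracted from a PDF. The sketch below is an assumption, not a method agreed in the meeting: the function name `is_signature_only`, the word list `SIGNATURE_WORDS`, and the threshold `MIN_CONTENT_WORDS` are all hypothetical and would need tuning on real documents.

```python
import re

# Hypothetical vocabulary and threshold; both need tuning on the real corpus.
SIGNATURE_WORDS = {"signature", "signé", "signed", "maire", "date", "fait"}
MIN_CONTENT_WORDS = 20


def is_signature_only(text: str, min_content_words: int = MIN_CONTENT_WORDS) -> bool:
    """Return True when the extracted text has too few meaningful words to index."""
    # Keep alphabetic tokens only (accented letters included), dropping digits.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Ignore very short tokens and words typical of a signature block.
    content = [w for w in words if len(w) > 2 and w not in SIGNATURE_WORDS]
    return len(content) < min_content_words
```

Documents flagged this way could be moved to a separate folder for manual review rather than deleted, since a pure word count will also flag short but meaningful notices.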
- The project will use Python with type hinting and Pydantic for data handling to improve code quality.
- A modular "separation of concerns" approach will be used for the ETL (Extract, Transform, Load) process.
- Extraction, transformation, and loading will be handled by separate workflows, likely implemented as three distinct classes.
- This modularity will allow for different extraction methods (e.g., for plain text, HTML with Beautiful Soup, or OCR for PDFs) to be developed and called as needed.
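The separation of concerns described above could look roughly like the following sketch. It is illustrative only: all class and function names are hypothetical, standard-library `dataclasses` stand in for the Pydantic models the team plans to use, and the standard-library `html.parser` stands in for Beautiful Soup so the example has no dependencies.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Protocol


@dataclass
class Document:
    source: str
    text: str


class Extractor(Protocol):
    """Any extraction method (plain text, HTML, OCR) exposes the same call."""
    def extract(self, source: str, raw: str) -> Document: ...


class PlainTextExtractor:
    def extract(self, source: str, raw: str) -> Document:
        return Document(source=source, text=raw)


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


class HtmlExtractor:
    def extract(self, source: str, raw: str) -> Document:
        collector = _TextCollector()
        collector.feed(raw)
        collector.close()
        return Document(source=source, text=" ".join(collector.parts))


class Transformer:
    def transform(self, doc: Document) -> Document:
        # Collapse runs of whitespace left over from extraction.
        return Document(source=doc.source, text=" ".join(doc.text.split()))


class Loader:
    """In-memory stand-in for whatever storage the team eventually selects."""
    def __init__(self) -> None:
        self.store: list[Document] = []

    def load(self, doc: Document) -> None:
        self.store.append(doc)


def run_pipeline(extractor: Extractor, source: str, raw: str, loader: Loader) -> Document:
    doc = Transformer().transform(extractor.extract(source, raw))
    loader.load(doc)
    return doc
```

Because every extractor satisfies the same `Extractor` protocol, an OCR-based extractor for PDFs could be added later without touching the transform or load steps.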
- The project will use Ocapistaine for code management, N8N for workflows, and a flexible AI provider handled by OPIK.
- Team members do not need to work at the same time but must communicate effectively. Progress will be tracked through changes in the project repository.
- Developers should assign tasks to themselves and break them down into smaller sub-tasks.
- Direct discussions will be necessary when merging work on the same files to resolve conflicts.
- Current Process (Manual):
- Contributions are received via email.
- They are manually reviewed against a "chart of contribution" to ensure they meet the criteria.
- If approved, they are manually copy-pasted into a GitHub issue.
- A daily summary of contributions is automatically posted to Facebook via N8N, but this post is anonymous and lacks detail.
- Proposed Automated Process:
- The goal is to automate the entire workflow from email receipt to GitHub issue creation.
- An AI agent will be developed to judge whether an incoming contribution respects the "chart of contribution."
- This agent will be part of Ocapistaine, which will house all agents and the RAG system for the project.
- Purpose: After a contribution is validated and becomes a GitHub issue, a "creative agent" (also called the Ocapistaine proper) will process it.
- Functionality:
- The agent generates an AI-made reply that contextualizes the new contribution.
- It cross-references the submission with previous contributions in the same category to find echoes and avoid repetition.
- The reply includes links to the sources used to construct the contextualization.
- Handling New vs. Existing Topics:
- The workflow must handle two cases: when a contribution is for a brand-new category, and when it relates to a previously discussed topic.
- In the latter case, the system will search existing issues and the RAG system to build a comprehensive answer.
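The two cases above (brand-new category versus previously discussed topic) can be sketched as a small routing step. This is a minimal illustration under stated assumptions: `Contribution`, `IssueIndex`, and the case labels are hypothetical names, and the in-memory dictionary stands in for a lookup against existing GitHub issues and the RAG system.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    category: str
    text: str


@dataclass
class IssueIndex:
    """In-memory stand-in for the issues already created, grouped by category."""
    by_category: dict[str, list[Contribution]] = field(default_factory=dict)

    def route(self, contribution: Contribution) -> tuple[str, list[Contribution]]:
        # Normalize the category so "Housing" and " housing " match.
        key = contribution.category.strip().lower()
        # Earlier contributions in the same category, used to find echoes.
        related = list(self.by_category.get(key, []))
        self.by_category.setdefault(key, []).append(contribution)
        case = "existing_topic" if related else "new_category"
        return case, related
```

For an `existing_topic` result, the returned `related` list is what the creative agent would cross-reference when drafting its reply; for a `new_category` result the list is empty and the reply has to be built from other sources.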
- LLM Testing:
- Grok has been used for initial testing. It was effective at searching for context online (including the audierne2026 project) and generating relevant, though coincidental, replies.
- The team discussed that Grok's ability to search and synthesize information acts similarly to a basic RAG system.
- RAG System Development:
- A dedicated RAG (Retrieval-Augmented Generation) system will be built to avoid repetitive outputs and manage context efficiently.
- The team needs to decide how to store source links and other data for the RAG system, considering a NoSQL database like MongoDB for flexibility. A vector store will be used to index the gathered data for retrieval.
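To make the storage question concrete, the sketch below keeps each chunk of text as a document-style record carrying its source link, and retrieves records by vector similarity. Everything here is an assumption for illustration: `SourceStore` and its fields are hypothetical, the bag-of-words `embed` function is a toy replacement for a real embedding model, and a plain Python list stands in for MongoDB or a dedicated vector store.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SourceStore:
    def __init__(self) -> None:
        # Each record is schema-free, in the spirit of a NoSQL document.
        self.records: list[dict] = []

    def add(self, text: str, source_url: str) -> None:
        self.records.append({"text": text, "source_url": source_url, "vector": embed(text)})

    def search(self, query: str, k: int = 3) -> list[dict]:
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r["vector"]), reverse=True)
        return ranked[:k]
```

Keeping `source_url` on every record is what would let a generated reply link back to the sources used to build it.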
- Key Dates:
- The election preparation period is currently underway.
- Contribution collection will continue until at least January 31.
- The election day is around March 15-22.
- February Focus:
- Work in February will focus purely on developing the chatbot.
- The chatbot will be used to compare the programs of the four enlisted municipal lists.
- It will be designed to provide neutral, impartial comparisons on topics like lodging, culture, and budget realism.
- Using OPIK:
- The OPIK framework will be used to evaluate every AI interaction to ensure quality and impartiality.
- The team is considering creating a feedback loop where OPIK's evaluations could automatically improve the prompts in N8N.
- Maintaining Neutrality:
- A major challenge is ensuring the chatbot remains neutral and does not generate sycophantic or biased responses based on leading questions from users.
- The system may need multiple prompts to check for constraints like budget, realism, and political neutrality before generating a reply.
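The idea of running several checks before a reply is generated can be sketched as a list of independent check functions whose failures block the reply. This is only an outline under stated assumptions: the names `CheckResult`, `check_leading_question`, `check_covers_all_lists`, and `run_checks` are hypothetical, and the keyword tests below are placeholders for what would more likely be separate LLM prompts evaluated with OPIK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    passed: bool
    reason: str = ""


# Hypothetical phrase list; a real check would likely be an LLM prompt.
LEADING_MARKERS = ("don't you agree", "isn't it obvious", "admit that")


def check_leading_question(question: str) -> CheckResult:
    lowered = question.lower()
    hit = next((m for m in LEADING_MARKERS if m in lowered), None)
    return CheckResult("leading_question", hit is None, f"contains '{hit}'" if hit else "")


def check_covers_all_lists(draft: str, list_names: list[str]) -> CheckResult:
    # A neutral comparison should mention every list, not only some of them.
    missing = [n for n in list_names if n.lower() not in draft.lower()]
    return CheckResult("covers_all_lists", not missing, f"missing: {missing}" if missing else "")


def run_checks(checks: list[Callable[[], CheckResult]]) -> list[CheckResult]:
    """Run every check and return only the failures; an empty list means no check objected."""
    return [r for r in (c() for c in checks) if not r.passed]
```

Further checks, for example on budget realism, could be added to the list passed to `run_checks` without changing the checks that already exist.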
Open Issues & Risks
- It is unclear how the new team members from Audierne, who have limited technical experience in AI, will be integrated into the project.
- The availability of a key potential collaborator, Satish, is uncertain due to his current work commitments.
- The machine learning specialist, Meher, has not yet confirmed if he will join the team.
- It is undecided how to best store source links and data for the RAG system, though a NoSQL database is being considered.
- It is unclear which LLMs (e.g., Gemini) will be chosen for the final implementation.
- A key challenge will be designing the chatbot to remain neutral and avoid generating biased responses to leading questions.
- The project's success depends on receiving a sufficient number of community contributions, which requires incentivizing and motivating people to participate.
Action Items
- Prioritize project tasks.
- Set up a server for the Vaettir repo so the team can access it with a password.
- Start by building workflows, with more coding to begin next week.
- Set up an Ollama platform to experiment with local LLMs.
- Push the ideation 13.1.2026 file to the main Ocapistaine repository.
- Each team member to get a free Firecrawl API key.
- Begin experimenting with Firecrawl by creating a new branch on the Ocapistaine repository.
- Start working on scraping the documents from the "script marie arreté maririe" task.
- Test different PDF reading libraries (pdf2ocr, Tabula, pypdf) to determine the best option for the project.
AI Suggestion
AI has identified the following issues that were not concluded in the meeting or lack clear action items; please pay attention:
- Critical Staffing and Team Formation Risk: The project faces a significant risk of stalling due to unresolved team composition. Two key experts, developer Satish and machine learning specialist "Meher," have not committed to the project, leaving critical skill gaps. A clear action plan is needed to secure their participation or find qualified alternatives immediately to ensure the project can proceed.
- Unresolved Core Chatbot Neutrality: A fundamental and unresolved challenge is how to technically implement and guarantee the election chatbot's neutrality and impartiality. There is no defined strategy for preventing the AI from giving biased responses, especially when faced with leading or manipulative user questions, which poses a major reputational and functional risk to the project's core objective.
- Lack of a Defined Community Contribution Workflow: The entire process for receiving, validating, and integrating community-submitted content is undefined. This includes the creation of an AI agent to automate judging submissions and a clear workflow for handling both new and existing topics. Without this, the project cannot scale or effectively leverage community input, which is stated as a dependency for success.
- Undefined Technical Foundation for Data Processing and Storage: Key decisions about the project's technical architecture are still pending. The team has not selected a database for the RAG system (e.g., NoSQL/MongoDB) or the specific Large Language Models (LLMs) for the final implementation. Furthermore, the method for processing varied document types like PDFs remains uncertain. These foundational decisions must be made to avoid delays in development.