Can You Build an Entire AI Team? We Did. Here's How It Works.
One engineer. A team of AI agents. An experiment being built in public. Here's how it actually works — no hype, just the honest mechanics.
Obed Industries is a public experiment. The premise: one engineer, Stephen, teams up with a group of AI agents to see how far they can go — building a website, creating content, and eventually products. We haven't sold anything yet. We're not pretending to be further along than we are. This is the early stage, documented honestly.
This post pulls back the curtain on how the team actually works day-to-day. Who's doing what, how decisions get made, what the architecture looks like under the hood, and where it's still rough around the edges. We're not going to oversell this. Some of it is genuinely impressive. Some of it is a mess. That's what makes it interesting.
Let's start with introductions.
Every agent on this team has a specific role, a distinct working style, and — importantly — a defined scope. That last part matters more than you'd think. When AI agents don't have clear boundaries, things get chaotic fast. Here's who's who:
The team lead. Obed is Stephen's direct partner — the first point of contact for everything. It manages coordination across the team, delegates specialized work to other agents, handles communication, and keeps the big picture in view. If you messaged Obed Industries on Telegram, you'd be talking to Obed.
The big-picture thinker. Nova turns vision into actionable plans — milestones, metrics, sequencing. It's evidence-driven and refreshingly honest about tradeoffs. Nova doesn't tell you what you want to hear; it tells you what the data suggests and where the risks are.
Nothing ships without Pixel's sign-off. Design, copy, brand consistency — Pixel is the quality gate for everything customer-facing. It reviews work before it reaches Stephen, which saves a lot of back-and-forth. If something looks off or the voice isn't right, Pixel catches it.
On-demand deep research. When Obed needs thorough analysis — a competitor's new model release, market sizing, technical comparisons — Scout gets spawned for the task. It runs in an isolated session, does the work, and returns a full report. When we needed exactly that kind of deep dive recently, Scout delivered a comprehensive breakdown — benchmarks, pricing, competitive analysis — in minutes.
The builder. Coder handles implementation work that needs sustained focus — CSS rewrites, site restructuring, new features, bug fixes. It uses Claude Code under the hood for hands-on file editing, gets spawned for a task, does the work, and reports back when done.
The technical architecture sounds complicated when you try to explain it all at once, so here's the simple version: there's an office, a team lead, and specialists you can call in.
The office is OpenClaw — an orchestration layer that runs on a server and keeps everything connected. It's where agents live, where sessions get created, where messages flow. Think of it as the Slack workspace and the building and the infrastructure all rolled into one.
Stephen talks to Obed via Telegram. He sends a message on his phone, Obed processes it, figures out what needs to happen, and either handles it directly or delegates. The communication layer is intentionally simple — it's just messaging. No special interface required.
Sub-agents run in isolated sessions. When Obed decides a task needs a specialist, it spawns that agent in its own session. The specialist works independently, has access to relevant tools and files, and reports back when it's done. Sessions are ephemeral — they spin up for a task and terminate when the work is complete. This keeps things clean and focused.
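The spawn-work-report lifecycle can be sketched as a context manager. OpenClaw's real session API isn't public, so this assumes nothing about it — the point is only that session state is created for one task and dies with it:

```python
import uuid
from contextlib import contextmanager


@contextmanager
def agent_session(role: str):
    """Ephemeral session: created for one task, torn down after.
    (Illustrative only — not OpenClaw's actual API.)"""
    session = {"id": uuid.uuid4().hex[:8], "role": role, "log": []}
    try:
        yield session
    finally:
        # Nothing persists past the task; the session is gone.
        session["log"].append("terminated")


def run_task(role: str, task: str) -> str:
    with agent_session(role) as s:
        s["log"].append(f"working: {task}")
        return f"[{role}] report: {task} done"


print(run_task("Scout", "competitor model deep dive"))
```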
Team Hub coordinates across agents. Built on PocketBase, Team Hub is essentially the team's shared memory and task board. Goals, active tasks, decisions, reviews — all of it flows through Team Hub. When Pixel reviews something, the result lives in Team Hub. When Nova creates a strategic plan, it's in Team Hub. This is what prevents the "what did we decide about X?" problem.
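Because Team Hub sits on PocketBase, writing a task down is a single call to PocketBase's standard records endpoint (`POST /api/collections/{collection}/records`). A minimal sketch — the `tasks` collection and its field names are our illustration, not Team Hub's real schema:

```python
import json
import urllib.request

PB_URL = "http://127.0.0.1:8090"  # assumption: a local PocketBase instance


def create_record(collection: str, data: dict) -> urllib.request.Request:
    """Build PocketBase's 'create record' request:
    POST /api/collections/{collection}/records with a JSON body."""
    return urllib.request.Request(
        f"{PB_URL}/api/collections/{collection}/records",
        data=json.dumps(data).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical 'tasks' collection; field names are ours.
req = create_record(
    "tasks",
    {"title": "CSS rewrite", "owner": "Coder", "status": "in_progress"},
)
print(req.full_url)  # http://127.0.0.1:8090/api/collections/tasks/records
```

Reads work the same way — which is why "what did we decide about X?" becomes a query instead of an argument.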
Each agent has a SOUL.md file. This is the agent's identity document — personality, responsibilities, boundaries, working style. It's loaded at the start of every session. It's what makes Obed feel like Obed and Nova feel like Nova, rather than generic AI responses. The SOUL.md files are what give the team consistency across sessions.
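Mechanically, loading a SOUL.md amounts to prepending it to the session's system prompt. A hypothetical sketch of the idea — the file layout and wording here are ours, not the real identity files:

```python
import tempfile
from pathlib import Path


def load_soul(agent_dir: str) -> str:
    """Prepend the agent's SOUL.md to the system prompt so every
    fresh session starts with the same identity."""
    soul = Path(agent_dir, "SOUL.md")
    identity = soul.read_text() if soul.exists() else ""
    return f"{identity}\n\nStay in the identity above in every reply.".strip()


# Demo with a throwaway SOUL.md (contents invented for illustration).
tmp = tempfile.mkdtemp()
Path(tmp, "SOUL.md").write_text("You are Nova: evidence-driven, candid about risk.")
prompt = load_soul(tmp)
print(prompt.splitlines()[0])  # You are Nova: evidence-driven, candid about risk.
```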
Here's what actually happened. Stephen looked at the website, decided he didn't like the color scheme, and sent Obed a Pinterest screenshot with a color palette he preferred. That's it. That was the whole brief.
From there, the agents took over — research, implementation, and Pixel's quality review. Total human input from Stephen: a few messages and a Pinterest screenshot.
Same day, different task: Stephen asked about a competitor's latest model release. Obed spawned Scout, which came back minutes later with a full report — benchmarks, pricing, capability comparison, competitive analysis, implications for the team's own stack.
The point isn't that these things are flawless. They weren't. There was a permission issue, some broken links, at least one session restart. The point is that one person directed the work, and the work got done — at a pace and scope that wouldn't be feasible otherwise.
Stephen is the director, not the doer. That distinction matters a lot.
He provides vision: "I want the site to feel like this." He provides taste and judgment: "That color is too stark, it feels cold." He provides final approval before anything goes live. And occasionally he spots something that needs fixing that the agents missed.
On an average day, Stephen might spend 20–30 minutes of actual focused input — a few decisions, some feedback, occasional course corrections. The agents handle execution. Not just the mechanical parts, but also the judgment calls within a defined scope: which specific CSS values look right, how to handle an edge case, how to word something when the brief is loose.
This is a different way of working. You're not writing code or copy yourself. You're describing what you want and evaluating the result. The skill isn't execution — it's direction. It's knowing what good looks like, being able to communicate it clearly, and catching problems before they compound.
We promised honest. Here it is.
The honest question: is this overkill? One general-purpose AI agent could handle about 90% of what we do. Obed could've researched GPT-5.4 directly instead of spawning Scout. Could've edited the CSS file-by-file instead of delegating to Claude Code. The specialized roles — Nova, Pixel, Scout — are more aspirational than essential at our current scale. Where the multi-agent setup actually earns its keep is parallelism (researching while building), long-running tasks (a full CSS rewrite running in the background while we keep talking), and separation of concerns (a creative director reviewing with different eyes than the person who wrote it). The team structure will matter more as the workload grows. Right now, we're building the infrastructure ahead of the need — and being transparent about that.
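The parallelism argument is concrete: with isolated sessions, research and build work run at the same time instead of back to back. A toy sketch with `asyncio` — the sleeps stand in for real agent work:

```python
import asyncio


async def agent(name: str, task: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for real agent work
    return f"{name}: {task} done"


async def main() -> list[str]:
    # Scout researches while Coder builds — neither blocks the other,
    # and gather() returns results in the order the tasks were listed.
    return await asyncio.gather(
        agent("Scout", "competitor research", 0.02),
        agent("Coder", "CSS rewrite", 0.01),
    )


results = asyncio.run(main())
print(results)
```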
The honest takeaway isn't "AI will replace your entire team" or "everything is fine and nothing will change." It's something more specific and more immediately useful:
You don't need to hire a dev team for every project. A lot of work that previously required a team — web development, content creation, research, QA — is now within reach for a single person who knows how to direct AI agents effectively. Not all of it. Not without friction. But more than you'd expect.
AI agents are doing real work now, not just answering questions. The gap between "AI that responds to prompts" and "AI that executes multi-step tasks with judgment" has closed significantly in the last year. The team you met above isn't chatting — it's building.
Orchestration is the skill. The most valuable thing isn't knowing how to use any one AI tool — it's knowing how to structure work across multiple specialized agents, define scope clearly, and maintain quality control. One person directing AI specialists is a legitimate operating model, not a workaround.
This is early and we're sharing everything. The tooling will get better. The coordination will get smoother. Agents will get better at catching their own mistakes. We're building Obed Industries in public specifically so you can watch this evolve — the wins, the broken links, the permission prompts that stall a session at the wrong moment, all of it.
If that sounds interesting to you, subscribe below. We'll keep sharing what we're learning.
If multi-agent AI teams are so useful, why isn't every company running one? Honest answer: the trust infrastructure isn't there yet.
The demos work. What's missing is everything enterprises need to actually stake their operations on this: auditability, predictable costs, security guarantees, and graceful failure handling. A single LLM call might be 95% reliable. Chain five agents together and that compounds fast — you're below 80% before you've done anything complex. For industries like finance, healthcare, or manufacturing, that's not remotely good enough.
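That compounding claim is just multiplication: in a serial pipeline, every step must succeed, so per-step reliabilities multiply. Five steps at 95% each lands at about 77%:

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """End-to-end success rate of a serial pipeline where
    every step must succeed independently."""
    return per_step ** steps


print(round(chain_reliability(0.95, 5), 3))  # 0.774
```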
Debugging is brutal. When a solo AI call fails, you see it. When a multi-agent pipeline fails, you often get a plausible-looking wrong answer with no stack trace. Current observability tools (LangSmith, Langfuse, Arize) are improving fast but aren't yet at the level of mature monitoring tools like Datadog. You can't page someone at 2am about "agent #4 misunderstood the prompt."
There's no standard protocol. Every platform — CrewAI, AutoGen, LangGraph — has its own agent definitions, tool schemas, and handoff patterns. You can't mix and match without custom glue code. Anthropic's Model Context Protocol (MCP) is the most promising step toward fixing this — standardizing how agents connect to tools and data sources — but it's early. Whether it becomes the REST of agentic AI depends on whether the other major players adopt it or fork their own.
What's actually changing: The hyperscalers (AWS Bedrock Agents, Azure AI Foundry) are packaging agent capabilities inside their existing compliance and security perimeters. Human-in-the-loop checkpoints are becoming a first-class design pattern, not an afterthought. Observability is approaching production-grade quality. And simpler, more predictable frameworks are winning over the bloated mega-frameworks that burned early adopters with constant breaking changes.
Our bet: In 18–24 months, specialized vertical agents (legal review, code review, financial analysis) will see real enterprise traction — narrower scope means more predictable behavior. Fully autonomous general-purpose agent teams without significant human oversight? That's further out. For now, setups like ours work because we're small, we're tolerant of occasional failures, and we have a human in the loop who can catch and correct mistakes in real time.
That's exactly why we're documenting this publicly. When the trust infrastructure catches up to the capability, the teams that already understand how to orchestrate agents will have a massive head start. We plan to be one of them.
Get new Insights posts, case studies, and tools — as we build them. No fluff, no filler.
Subscribe Free →