
Two Agencies Built the Same AI System Without Ever Talking to Each Other

AgencyBoxx Team

We had a call recently with another HubSpot agency owner. Someone we had never met before, operating in a completely different market, serving a different client base, running a different size team. What we discovered is a textbook case of convergent evolution in AI agency operations.

Within 10 minutes, we realized we had independently built almost the exact same AI operations system.

Key takeaway: When two experienced operators independently build nearly identical multi-agent AI systems without ever communicating, you are witnessing convergent evolution that proves these are not experimental nice-to-haves but operational necessities driven by universal agency pain points.

Same agent types. Same use cases. Same architectural decisions. Same trust boundaries. Same "I don't let my agents send emails" policy. Even the same instinct to name their agents after fictional characters so the team could keep track of who does what.

This was not a coincidence. This was convergent evolution. According to Fortune Business Insights, the agentic AI market is projected to reach $196.6 billion by 2034 at 43.8% CAGR. When an entire market is converging on the same architecture independently, you are looking at something structural, not a trend. And it tells you something important about where agency operations are heading.

The Lineup Was Nearly Identical

Before this call, we had spent months building out our agent roster inside OpenClaw. Nine purpose built AI agents, each handling a specific operational function that was eating our time. We assumed our setup was unique because we had never seen anyone else in the HubSpot ecosystem doing this at our scale.

Then this other agency owner started walking through their agent list:

Time tracking enforcement. They had one. We had one. Both agents monitor the team's logged hours, flag gaps, and chase people down over Slack DM when entries are missing or descriptions are blank. Both of us had built this because project managers were spending 15+ minutes every single day manually checking who had logged time. Multiply that by 250 working days and you are burning 60+ hours a year on a task a script can do better.
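The core of a time tracking enforcement agent like this is deterministic: diff the logged entries against the expected roster and days, then hand the flags to a Slack notifier. Here is a minimal sketch in plain Python; the entry schema, names, and dates are hypothetical, not any specific time-tracking vendor's API.

```python
# Hypothetical time entries pulled from a tracking tool's API.
# The field names here are illustrative, not a real vendor schema.
entries = [
    {"person": "alice", "date": "2025-06-02", "hours": 7.5, "description": "HubSpot workflow audit"},
    {"person": "bob",   "date": "2025-06-02", "hours": 8.0, "description": ""},
    {"person": "alice", "date": "2025-06-03", "hours": 6.0, "description": "Client onboarding"},
]

def flag_issues(entries, team, workdays):
    """Return Slack-ready flags for missing days and blank descriptions."""
    logged = {(e["person"], e["date"]) for e in entries}
    flags = []
    for person in team:
        for day in workdays:
            if (person, day) not in logged:
                flags.append(f"{person}: no time logged for {day}")
    for e in entries:
        if not e["description"].strip():
            flags.append(f'{e["person"]}: blank description on {e["date"]}')
    return flags

flags = flag_issues(entries, team=["alice", "bob"], workdays=["2025-06-02", "2025-06-03"])
for f in flags:
    print(f)  # in production, each flag would become a Slack DM
```

Everything above is pure set logic, which is exactly why it costs nothing to run daily: no model call is needed to know that an entry is missing.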

Email triage and personal assistant. They had one. We had one. Both systems classify incoming email, filter spam, detect newsletters, auto label by category, and generate draft replies that go into a review queue. Neither system sends anything automatically. Both require a human to review and approve before anything goes out.

Sales prospecting and contact enrichment. They had one. We had one. Both agents pull prospect companies from defined sources, enrich contacts through services like Hunter.io, validate email addresses, discover LinkedIn profiles, and organize everything for outreach. Our system had processed 7,300+ prospects and found 2,880+ contacts. Theirs was on a similar trajectory.

Project management and account oversight. They had one. We had one. Both monitor project activity, create task summaries from meeting transcripts, push follow ups back into the project management system, and flag when things are falling behind.

File and asset management. They had one. We had one. Both organize client deliverables in Google Drive, ensure folder structures stay clean, and keep documentation properly filed.

Personal assistant and calendar management. Both of us had built agents to handle scheduling logistics and daily briefings.

QA and monitoring. Both of us had agents checking the output of other agents and systems, validating that work was done correctly before it reached a client.

We are talking about two completely independent implementations, built months apart, by people who had never exchanged a single message. And the overlap was not 50%. It was closer to 90%.

Here is how the agent types lined up side by side:

| Agent Type | Agency A (Us) | Agency B (Them) | Core Function |
|---|---|---|---|
| Time Tracking | Yes | Yes | Monitor logged hours, flag gaps, chase missing entries |
| Email Triage / Assistant | Yes | Yes | Classify, filter spam, draft replies with human approval |
| Sales Prospecting | Yes | Yes | Enrich contacts, validate emails, organize for outreach |
| Project Oversight | Yes | Yes | Summarize tasks, flag overdue items, push follow-ups |
| File / Asset Management | Yes | Yes | Organize deliverables, maintain folder structures |
| Calendar / Briefings | Yes | Yes | Schedule logistics, daily intelligence briefings |
| QA / Monitoring | Yes | Yes | Validate agent output, check work before client delivery |

This level of convergent evolution in AI agency operations is not coincidence. It is signal.

"2-5 humans supervising 50-100 agents is the emerging organizational model for knowledge work." -- McKinsey & Company

We have previously broken down the 8 agents every agency should build first, and that sequence matches almost exactly what both of us built independently.

Why This Keeps Happening

Agencies look different on the surface. Different niches, different team sizes, different tech stacks, different client expectations. But underneath all of that, every agency that runs on HubSpot and serves multiple clients simultaneously hits the same set of operational walls:

Shared inboxes grow faster than humans can triage them. Whether you have 10 clients or 75, the email volume eventually outpaces the team's ability to stay on top of it. SLA clocks start ticking the moment a message lands, and the first sign of trouble is usually a client escalation, not an internal catch.

Time tracking is a daily battle. Every agency owner we have ever spoken to has the same complaint: the team forgets to log time, leaves descriptions blank, or lets timers run overnight. This creates billing gaps, inaccurate budgets, and awkward client conversations. And every day, someone on the team (usually the most expensive person) spends time chasing the rest of the team to fix it.

The founder is the bottleneck. The person who should be selling, strategizing, and building relationships is instead spending hours on email triage, meeting follow ups, and operational oversight. This is true whether the agency has 5 people or 50.

Prospecting never happens because delivery always takes priority. Outbound sales requires sustained, repetitive effort: finding contacts, enriching data, validating emails, personalizing outreach. But when a client deadline is looming, prospecting is the first thing that gets dropped. Every time.

Institutional knowledge is trapped in people's heads. SOPs live in Notion pages nobody reads. HubSpot documentation is scattered across help articles. Onboarding a new hire means weeks of tribal knowledge transfer that could be handled by a searchable knowledge system.

These problems are not unique to any single agency. They are structural. They come with the business model. Multi-agent workflows grew over 300% in 2025-2026, according to Syncari and PYMNTS research. And when two experienced operators independently set out to solve them with AI agents, they arrive at nearly the same architecture because the problems are the same everywhere.

The Trust Boundary Was Identical Too

The most striking overlap was not the agent lineup. It was the trust model.

Both of us had independently arrived at the same hard rule: AI agents draft. Humans approve. Nothing goes out without a checkpoint.

When we asked the other agency owner about their approach to outbound communication, their response was immediate and blunt. No hesitation. Every agent creates Gmail drafts. Every draft gets reviewed by a human. No agent has permission to send anything to a client autonomously.

This was not a decision either of us made because we read it in a best practices guide. We both arrived at it because we understand what a single wrong email can do to a client relationship. When you are a white label agency or you are managing someone else's HubSpot portal, a misrouted message or a hallucinated response is not an embarrassing mistake. It is a relationship ending event.

The human in the loop pattern is not a limitation of the technology. It is the entire point. The value of AI agents is not replacing human judgment. It is eliminating the 45 minutes of prep work that happens before a human makes a two minute decision. Draft the email. Prepare the report. Stage the response. Then let a human hit send.
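The structural enforcement of that checkpoint matters more than the policy itself: agents should only be able to stage a draft, never release it. A minimal sketch of the pattern, with all class and field names hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    to: str
    subject: str
    body: str
    approved: bool = False

class ReviewQueue:
    """Agents may stage drafts; only a human action releases one."""
    def __init__(self):
        self.pending = []
        self.sent = []

    def stage(self, draft):
        # Agents call this. It is the only write access they get.
        self.pending.append(draft)

    def approve_and_send(self, index):
        # Only a human reviewer calls this; the real send happens here.
        draft = self.pending.pop(index)
        draft.approved = True
        self.sent.append(draft)
        return draft

q = ReviewQueue()
q.stage(Draft("client@example.com", "Weekly status", "Draft body..."))
print(len(q.pending), len(q.sent))  # staged but nothing has gone out
```

In a real deployment the same split shows up as permission scopes: the agent's credentials can create Gmail drafts but hold no send permission at all, so the boundary survives even a misbehaving agent.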

Anyone telling agencies to remove humans from the loop entirely is selling something they have never had to stand behind with real client relationships on the line.

Both of Us Named Our Agents

This is a small detail, but it stuck with us.

Both agencies had independently decided to give their agents human readable names instead of functional labels. We went with characters from the 1982 film Tron. They started with Transformers, then switched to human names when the roster grew beyond what they could remember.

It sounds trivial. It is not.

When you have 15 or 20 agents running and something breaks, "the time tracking agent has an error" is far less useful than a specific name that everyone on the team instantly recognizes. Naming agents creates a mental model. The team starts thinking of them as colleagues with defined responsibilities, not abstract processes. It makes the whole system easier to reason about, easier to troubleshoot, and easier to onboard new team members into.

If you are running more than three or four agents, name them. Pick a theme. Your team will thank you.

The Architectural Split: Scripts vs. SOPs

The one area where we diverged was in how much decision making we give the AI.

Our approach leans heavily on scripted Python. When we build a new agent, we start in Claude Code and build out the core functionality as deterministic scripts. API calls, data formatting, routing logic, compliance checks: all scripted. Then we bring AI in surgically for the parts that genuinely require reasoning, natural language understanding, or content generation. For a deeper dive into this architecture, see how we run 20 agents for $1 a day with a script-first approach.
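The shape of that split can be shown in a few lines: deterministic rules handle the bulk of the traffic for free, and the model is only consulted for the ambiguous tail. This is an illustrative sketch, with the keyword rules, queue names, and the `ask_model` stub all hypothetical:

```python
def ask_model(subject: str) -> str:
    # Placeholder for a single, narrow LLM call (a real version would
    # hit an API client here and be constrained to the known queues).
    return "general"

def classify_ticket(subject: str) -> str:
    """Deterministic routing first; fall back to the model only when rules miss."""
    rules = {"invoice": "billing", "bug": "support", "password": "support"}
    for keyword, queue in rules.items():
        if keyword in subject.lower():
            return queue       # zero AI tokens spent on this path
    return ask_model(subject)  # surgical AI call for the ambiguous tail

print(classify_ticket("Invoice overdue for May"))    # handled by rules
print(classify_ticket("Strange dashboard layout"))   # falls through to the model
```

The cost difference described below follows directly from this structure: the more traffic the rules absorb, the fewer tokens the system burns.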

The other agency took a different approach. Their agents operate primarily through AI driven standard operating procedures. The agent receives an SOP that says "this is how we do X, this is what happens after a meeting ends, this is the next step." The AI interprets and executes against those instructions, making more autonomous decisions within the boundaries of the SOP.

Both approaches work. Both have tradeoffs.

The scripted approach costs almost nothing to run (we operate 20 agents across three OpenClaw instances for about $1/day in AI credits) because the agents are making fewer AI calls. The tradeoff is that scripted agents cannot easily learn from their own mistakes, because they are making fewer independent decisions.

The SOP approach gives agents more flexibility and allows them to self assess (the other agency's agents write daily self assessments documenting what worked and what failed). The tradeoff is higher token costs and a wider surface area for unexpected behavior.

There is no single right architecture. But if you are burning through AI credits and wondering why, check how much decision making you are outsourcing to the model versus scripting deterministically. We learned this lesson the hard way when our first pure AI approach burned through $150 in four hours.

What Convergent Evolution Tells You

When two experienced operators, working independently, arrive at nearly identical solutions to the same problems, that is signal. It means the problems are real, the solutions are proven, and the pattern is repeatable.

According to McKinsey, the emerging organizational model is 2-5 humans supervising 50-100 agents. That projection aligns with what we are seeing on the ground: agencies that adopt multi-agent systems are not replacing their teams, they are amplifying them. The humans stay in the loop for judgment and approval while agents handle the repetitive operational load.

It also means the window is open. Right now.

The tools are available to everyone. OpenClaw is open source. Claude Code can handle most of the setup. A Mac Studio with decent specs can run the entire operation. The technical barriers have never been lower.

But 95% of agency owners are not going to sit down and build this. Not because they cannot. Because they are running agencies. They are on client calls, chasing invoices, hiring, and putting out fires. The tools are commodities. The implementation is the differentiator. For those who are ready to start, we wrote an open letter to agency owners sitting on the AI sidelines that lays out the case plainly.

That is the gap AgencyBoxx was built to fill. We spent 200+ hours and 12+ months building, breaking, fixing, and refining a production AI operations system on 75+ real agency clients. We did the work so that the next agency does not have to start from zero.

But whether you build it yourself or bring in a system that is already battle tested, the takeaway from this call was clear: the agencies that figure this out first will operate at a fundamentally different level than the ones still doing everything manually.

The problems are universal. The solutions are converging. The only question is how long you wait to start. Explore the full agent roster to see how these roles map to a production system.

Frequently Asked Questions

Why are different agencies building the same AI system?

Because agencies share the same structural operational pain points regardless of niche, size, or tech stack. Time tracking gaps, inbox overload, founder bottlenecks, inconsistent prospecting, and trapped institutional knowledge are universal to the multi-client agency business model. When experienced operators independently set out to solve these problems with AI agents, they converge on nearly identical architectures because the underlying problems are identical.

What is convergent evolution in agency technology?

Convergent evolution is a concept borrowed from biology where unrelated organisms independently develop similar traits because they face the same environmental pressures. In agency technology, it describes how independent agency owners, with no communication between them, are building nearly identical multi-agent AI systems. The overlap in agent types, trust boundaries, and architectural decisions is not coincidence; it is a signal that the pattern is structurally sound and repeatable.

How many AI agents does an agency need?

Most agencies start with 2 to 3 agents (time tracking enforcement and email triage are the universal starting points) and expand from there. Our production system runs 20 agents across three instances. The other agency we spoke with had a comparable roster. The sweet spot depends on your team size and operational complexity, but the 8 core agents (time tracking, email triage, client experience, knowledge base, prospecting, project oversight, critical alerts, and service watchdog) cover the pain points that every multi-client agency faces.

Is building AI operations a competitive moat?

Yes, and the moat deepens over time. Every day the system runs, it accumulates operational intelligence: learned email domains, proven recipes, client voice profiles, correction history, and refined escalation thresholds. After 12+ months of production use, our system is meaningfully ahead of anyone starting from scratch today. The tools are free and available to everyone, but the accumulated operational knowledge and the hundreds of hours of refinement are what create a durable competitive advantage.

AgencyBoxx is an AI operations platform built inside a real agency, not a lab. Book a Walkthrough to see the system running on live client data.