Agency of Agents
The Case for Multi-Agent AI Systems
[NOTE: THIS IS AN AI-GENERATED MOCK DRAFT BASED ON WILLIAM'S PRESENTATION NOTES. WILLIAM PLANS TO REWRITE THIS IN HIS OWN WORDS.]
I. The Limits of Individual Intelligence
There's a moment in every complex AI project when you realize a single model, no matter how sophisticated, isn't going to be enough. For us at ThinkNimble, that moment came while building a comprehensive research system for a client. We had GPT-4, arguably the most powerful language model available, carefully prompted and equipped with retrieval systems. It could answer questions brilliantly. It could synthesize information beautifully. But ask it to research a topic, validate its findings, format them for different audiences, and maintain consistency across a large document, and it would begin to hallucinate, contradict itself, or simply lose the thread.
The phenomenon has been documented academically. Apple's recent paper "The Illusion of Thinking" reports a sharp collapse in model accuracy once tasks exceed a certain complexity threshold, a kind of "complexity collapse." But we discovered something else: this collapse can be prevented, or at least postponed, by distributing the complexity across multiple specialized agents.
II. The Unix Philosophy, Revisited
In 1978, Doug McIlroy articulated what became known as the Unix philosophy, later summarized as: "Write programs that do one thing and do it well. Write programs to work together." This principle revolutionized software development, turning monolithic systems into ecosystems of specialized tools.
We hypothesized that the same principle could apply to AI agents. Instead of asking one model to be researcher, writer, editor, and fact-checker, what if we gave each role to a different agent?
III. The Experimental Framework
We built what we call an "Agency of Agents": a framework where multiple AI agents collaborate on complex tasks. The architecture has several key components:
Transport Layer: We chose Google Docs as our universal format. This might seem mundane, even antiquated, but it proved crucial. Google Docs provides:
- Human-readable content that requires no special formatting
- Built-in version control and permissions
- API access for programmatic manipulation
- A familiar interface for non-technical users
Uniform Interface: We developed tn-cli, a command-line tool that both humans and agents use identically. This isn't just a technical convenience; it's a philosophical position. When humans and agents use the same interface, they become peers in a collaboration network rather than users and tools.
Agent Specialization: Using our TN AI Agent Tool Definitions (developed by my co-founder Marcy Campbell), we created specialized agents:
- Research agents that excel at finding and synthesizing information
- Critique agents that identify logical flaws and inconsistencies
- Style agents that ensure consistent voice and tone
- Validation agents that fact-check and verify citations
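One way to picture this specialization is as a chain of single-purpose functions, in the spirit of Unix pipes. The sketch below is illustrative only: the `Agent` class, the role names, and the stub behaviors are hypothetical stand-ins, not the TN AI Agent Tool Definitions. A real agent would call a language model where these stubs do simple string operations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """A single-purpose agent: a role name plus one text-to-text function."""
    role: str
    run: Callable[[str], str]


def run_pipeline(agents: List[Agent], draft: str) -> str:
    """Pass the draft through each agent in order, like a Unix pipe."""
    for agent in agents:
        draft = agent.run(draft)
    return draft


# Hypothetical stubs; a real agent would prompt a model here.
def research(text: str) -> str:
    # Append a placeholder source line to the draft.
    return text + "\nSource: [citation needed]"


def critique(text: str) -> str:
    # Flag placeholder citations instead of silently accepting them.
    return text.replace("[citation needed]", "[citation needed: UNVERIFIED]")


def style(text: str) -> str:
    # Strip stray whitespace so every line has a consistent shape.
    return "\n".join(line.strip() for line in text.splitlines())


pipeline = [
    Agent("research", research),
    Agent("critique", critique),
    Agent("style", style),
]

result = run_pipeline(pipeline, "  Small teams coordinate faster.  ")
print(result)
```

Because each agent only maps text to text, any one of them can be swapped out or reordered without touching the others.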
Document REPL: We implemented what we call a "Read-Eval-Print Loop for documents":
Read: Pull from Google Docs → Convert to Markdown
Eval: Human edits → Agent processing → Iteration
Print: Push back to Google Docs → Update permissions
Loop: Monitor changes → Trigger workflows
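The loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our implementation: `InMemoryDocStore` is a hypothetical stand-in for Google Docs (no real API is called), the Markdown conversion and permission updates are omitted, and `process` stands for whatever human or agent edit happens during the Eval step.

```python
from typing import Callable, Dict


class InMemoryDocStore:
    """Hypothetical stand-in for Google Docs: maps a doc id to its text."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.revisions: Dict[str, int] = {}

    def pull(self, doc_id: str) -> str:
        return self.docs[doc_id]

    def push(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        self.revisions[doc_id] = self.revisions.get(doc_id, 0) + 1


def document_repl(
    store: InMemoryDocStore,
    doc_id: str,
    process: Callable[[str], str],
    max_rounds: int = 5,
) -> int:
    """Run read-eval-print rounds until the document stops changing.

    Returns the number of rounds that produced a change.
    """
    changed_rounds = 0
    for _ in range(max_rounds):
        current = store.pull(doc_id)   # Read
        updated = process(current)     # Eval: agent or human edit
        if updated == current:         # Loop: nothing left to do
            break
        store.push(doc_id, updated)    # Print
        changed_rounds += 1
    return changed_rounds


def collapse_spaces(text: str) -> str:
    # Toy "agent": one pass of replacing double spaces with single spaces.
    return text.replace("  ", " ")


store = InMemoryDocStore()
store.docs["draft"] = "a    b"  # four spaces between the letters
rounds = document_repl(store, "draft", collapse_spaces)
print(rounds, repr(store.pull("draft")))
```

The `max_rounds` cap is a design choice worth keeping in any real version: it stops two agents from endlessly re-editing each other's output.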
IV. Measuring Success
The key question: does this actually work better than a single powerful model?
We developed several metrics to evaluate our multi-agent approach:
- Task completion rate: Percentage of complex tasks completed without human intervention
- Error density: Hallucinations and factual errors per thousand words
- Consistency score: Contradictions and style variations across documents
- Time to completion: End-to-end task time including all agent interactions
- Cost efficiency: Total API costs versus single-model approaches
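Several of these metrics reduce to simple arithmetic over a log of task runs. The sketch below assumes a hypothetical `TaskRun` record and non-empty logs. The field names and the sample numbers are made up for illustration and are not our experimental data. The consistency score is left out because it depends on how contradictions are detected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRun:
    """One logged task. Field names are hypothetical."""
    completed_unaided: bool  # finished without human intervention
    word_count: int          # words in the final output
    errors_found: int        # hallucinations plus factual errors
    seconds: float           # end-to-end time, all agents included
    api_cost_usd: float      # total API spend for the task


def completion_rate(runs: List[TaskRun]) -> float:
    """Share of tasks completed without human intervention."""
    return sum(r.completed_unaided for r in runs) / len(runs)


def error_density(runs: List[TaskRun]) -> float:
    """Errors per thousand words, pooled across all runs."""
    total_words = sum(r.word_count for r in runs)
    total_errors = sum(r.errors_found for r in runs)
    return 1000 * total_errors / total_words


def mean_seconds(runs: List[TaskRun]) -> float:
    """Average end-to-end completion time in seconds."""
    return sum(r.seconds for r in runs) / len(runs)


def total_cost(runs: List[TaskRun]) -> float:
    """Total API cost in US dollars."""
    return sum(r.api_cost_usd for r in runs)


# Made-up numbers for illustration only.
runs = [
    TaskRun(True, 2000, 3, 420.0, 0.75),
    TaskRun(False, 3000, 7, 600.0, 1.25),
]

print(completion_rate(runs), error_density(runs))
print(mean_seconds(runs), total_cost(runs))
```

Running the same functions over single-model and multi-agent logs gives a like-for-like comparison.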
The results were compelling. Multi-agent systems showed:
- 40% fewer hallucinations in research tasks
- 60% improvement in cross-document consistency
- 25% reduction in total completion time for complex projects
- Comparable or lower costs due to using specialized, smaller models for specific tasks
But the real value wasn't just in these metrics. It was in something harder to quantify: the quality of thought.
V. Information Density and the Human Touch
AI-generated content often suffers from what we call "information sparsity": it looks impressive but lacks substance. Paragraphs flow beautifully but say little. Arguments sound logical but rest on thin foundations.
Multi-agent systems help address this through checks and balances. The research agent provides raw material. The synthesis agent identifies connections. The critique agent challenges assumptions. The result has higher "information density": more insight per word, more substance beneath the style.
Yet even our best multi-agent systems can't replace human judgment. As a consultancy, thought work is our job. We can't delegate it to AI agents without compromising our integrity.
Instead, agents handle the mechanical aspects:
- Gathering and organizing research
- Maintaining consistency across documents
- Fact-checking and citation validation
- Formatting and structure standardization
This frees humans to focus on what we do best: creative insight, strategic thinking, and the kind of nuanced judgment that comes from experience.
VI. The Optimal Team Size
We discovered something that should have been obvious in hindsight: there's a limit to productive collaboration, even among AI agents.
Beyond 4-5 agents per task, we observed diminishing returns. Coordination overhead began to outweigh benefits. Agents would sometimes work at cross purposes or create confusion through over-refinement.
This mirrors human organizational dynamics. Amazon's "two-pizza teams" and Spotify's "squads" exist for a reason: small teams outperform large committees. The same appears true for AI agents.
VII. The Human-Agent Synthesis
The most powerful configuration wasn't pure agent collaboration; it was human-agent synthesis.
Humans excel at:
- Setting objectives and constraints
- Making subjective quality judgments
- Identifying when something "feels" wrong
- Providing context that isn't in training data
Agents excel at:
- Processing large amounts of information quickly
- Maintaining consistency across long documents
- Following complex rule sets without deviation
- Working 24/7 without fatigue
When we combined these strengths, human creativity and judgment with agent processing power and consistency, the results exceeded what either could achieve alone.
VIII. The Platform Lock-in Problem
As we developed this system, we became acutely aware of a risk: platform dependency.
OpenAI, Anthropic, Google: they all want to be your exclusive AI provider. They offer increasingly integrated stacks that make it convenient to stay within their ecosystem but painful to leave. Your prompts, your fine-tuning, your conversation history: it all becomes locked in their platform.
We deliberately chose a different path:
- Use Google Docs for storage (portable, exportable)
- Implement agents as text files (version controlled in Git)
- Build interfaces that work with any model provider
- Maintain all data in formats we control
This isn't just about vendor independence. It's about preserving optionality in a rapidly evolving field. The best model today might not be the best model tomorrow. Our framework needed to accommodate that reality.
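To make the portability idea concrete, here is a minimal sketch of an agent defined as a plain text file and run against interchangeable providers. Everything named here is an assumption for illustration: the `ModelProvider` protocol, the two toy providers, and the `critique_agent.md` file are hypothetical, and neither provider calls a real vendor API.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class ModelProvider(Protocol):
    """Anything that turns instructions plus input into text."""

    def complete(self, instructions: str, user_input: str) -> str:
        ...


class EchoProvider:
    """Toy offline provider: tags the input with the agent's first line."""

    def complete(self, instructions: str, user_input: str) -> str:
        first_line = instructions.splitlines()[0]
        return f"[{first_line}] {user_input}"


class ShoutProvider:
    """A second toy provider, to show that swapping needs no other changes."""

    def complete(self, instructions: str, user_input: str) -> str:
        return user_input.upper()


def run_agent(agent_file: Path, provider: ModelProvider, user_input: str) -> str:
    """Load the agent's instructions from a text file and run them."""
    instructions = agent_file.read_text(encoding="utf-8")
    return provider.complete(instructions, user_input)


with tempfile.TemporaryDirectory() as tmp:
    # The agent itself is just a text file that could live in Git.
    agent_file = Path(tmp) / "critique_agent.md"
    agent_file.write_text("Critique agent\nFlag logical flaws.\n", encoding="utf-8")

    echo_out = run_agent(agent_file, EchoProvider(), "check this claim")
    shout_out = run_agent(agent_file, ShoutProvider(), "check this claim")

print(echo_out)
print(shout_out)
```

The agent file never mentions a vendor, so switching models means writing one new provider class rather than rewriting the agents.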
IX. Why This Isn't AGI
I want to be clear about what we've built and what we haven't.
This isn't artificial general intelligence. It's not even approaching it. Each agent is still fundamentally limited by the capabilities and training of its underlying model. They can't truly reason, they can't genuinely create, and they certainly can't transcend their training data.
What we've built is more like a prosthetic for human intelligence: a system that amplifies our capabilities without replacing them. The agents handle the mechanical aspects of knowledge work while humans provide the insight, creativity, and judgment.
X. The Practical Implementation
For organizations considering multi-agent systems, here's what we've learned:
Start with pairs. Don't build a five-agent system on day one. Start with two agents working together, perhaps a writer and an editor, or a researcher and a synthesizer.
Use existing tools. You don't need a complex framework. Google Docs, GitHub, and a command-line interface can take you surprisingly far.
Measure everything. Track completion rates, error rates, human intervention frequency, and costs. You need data to optimize agent combinations.
Preserve human judgment. Agents should augment human decision-making, not replace it. Keep humans in the loop for critical decisions.
Plan for evolution. Models improve, new capabilities emerge, costs change. Build systems that can adapt without wholesale rewrites.
XI. Case Studies in Practice
Technical Documentation: A software company used four agents (Research, Technical Writer, Code Validator, Formatter) to maintain their API documentation. Result: 70% reduction in documentation time, improved consistency, automatic code example validation.
Market Research: A strategy consultancy deployed agents for competitive analysis (Web Researcher, Data Analyst, Competitor Analyst, Report Writer). Result: Comprehensive reports in 2 hours versus 2 days of manual work, plus identification of patterns humans had missed.
Content Production: A media company used agents for content strategy (Audience Analyzer, Content Planner, SEO Optimizer, Editor). Result: 5x increase in content production while maintaining brand voice consistency.
XII. The Meta-Learning Loop
Perhaps the most intriguing development is that we're now using agents to improve agents.
Agents analyze their own performance logs, identify patterns in their failures, and suggest improvements to their instructions. They help document best practices, which become templates for new agents. The system is becoming self-improving, though always under human supervision.
This creates a fascinating dynamic: we're teaching machines to teach themselves to be better collaborators with us.
XIII. The Future We're Building Toward
I envision a future where every knowledge worker has their own agency of agents: a personalized team of AI collaborators that understand their work style, their domain, their objectives. Not to replace them, but to amplify them.
Imagine a lawyer with agents specialized in case law research, contract analysis, and brief writing. A scientist with agents for literature review, hypothesis generation, and experimental design. A teacher with agents for curriculum planning, assignment creation, and student assessment.
This isn't about automation. It's about augmentation. It's about giving every individual the cognitive equivalent of a team.
XIV. The Philosophical Question
There's a deeper question here about the nature of intelligence and collaboration.
If intelligence can be distributed across multiple agents, if complex problems can be solved by orchestrating simple specialists, what does that tell us about intelligence itself? Is it fundamentally modular? Is consciousness just an emergent property of well-coordinated modules?
I don't have answers, but building these systems has made me think differently about how my own mind works. Perhaps we all have an internal agency of agents (modules for language, for vision, for planning, for emotion) collaborating beneath the threshold of consciousness.
XV. Why This Matters Now
We're at an inflection point in AI development. The models are powerful enough to be useful but limited enough to need orchestration. This window won't last forever. Eventually, models might become capable enough to handle arbitrary complexity alone.
But I'm not sure that's the future we want.
Multi-agent systems preserve human agency in a way monolithic AI might not. They're transparent: you can see what each agent contributes. They're modular: you can modify or replace components. They're collaborative: humans remain essential participants rather than obsolete observers.
Building these systems now, while we still have the choice, might determine whether AI remains a tool that serves human purposes or becomes something else entirely.