Stop Training Models. Start Thinking About Token Economics

by Cody Chamberlain

Why the real AI moat in cybersecurity isn't the model — it's how you use it.


There's a conversation happening in cybersecurity boardrooms that needs a hard reset.

Too many mid-market and startup security companies are still talking about AI like it's 2022 — debating whether to train their own models, hire ML engineers, and build proprietary LLMs. Meanwhile, the hyperscalers are spending tens of billions of dollars a year on foundation model development. Anthropic, OpenAI, Google, Meta — these companies are in a capital expenditure arms race that no $50M-revenue security vendor is going to win. Or even compete in.

And they shouldn't try.

The real opportunity for cybersecurity companies isn't in building models. It's in using them with ruthless economic precision.


The Pentest Team Analogy

Think about how a penetration testing engagement actually works. You don't put your most experienced red teamer on port scans. You don't have your principal consultant running Nmap against ten thousand hosts. That would be an absurd misallocation of talent and cost.

Instead, you tier the work. Junior analysts handle enumeration and reconnaissance. Mid-level testers run structured exploit attempts and analyze scan output. Senior consultants synthesize findings, identify attack chains, make judgment calls about lateral movement, and determine what to pursue next.

This isn't just an efficiency play. It's how you get the best results. Each tier operates where its capabilities matter most.

AI models work the same way. And cybersecurity companies should be designing their architectures accordingly.


The Tiered Model Architecture

Here's what this looks like in practice for something like autonomous penetration testing:

The strategist — expensive reasoning models. Models like Claude Opus handle the high-stakes cognitive work. They plan attack paths, chain findings across hosts and services, decide when to pivot, interpret ambiguous results, and make risk-weighted decisions about resource allocation. This is the senior consultant. You pay a premium, and you use them sparingly on the decisions that actually change outcomes.

The analyst — mid-tier models. Models like Claude Sonnet handle semi-structured analysis. Parsing scan output. Classifying vulnerabilities against known CVEs. Writing targeted exploit code. Correlating findings from different tools and phases. This is your experienced mid-level tester — capable, reliable, and significantly cheaper per token than the reasoning tier.

The operator — lightweight, fast models. Models like Claude Haiku or purpose-built small models handle high-volume, repetitive execution. Banner grabbing. Fuzzing parameter generation. Log parsing. Formatting enumeration results. This is where the majority of computational work happens, and it can run at a fraction of the cost.
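
In practice, the glue is a thin routing layer that maps task categories to tiers. Here's a minimal sketch of what that could look like in Python; the task categories and model identifiers are placeholders, not any particular vendor's API:

```python
# Illustrative routing layer for an autonomous pentest pipeline. The task
# categories and model identifiers are placeholders, not a real product API.

from dataclasses import dataclass
from enum import Enum, auto


class Tier(Enum):
    OPERATOR = auto()    # lightweight, high-volume execution
    ANALYST = auto()     # mid-tier analysis and correlation
    STRATEGIST = auto()  # expensive reasoning and planning


# Hypothetical mapping of task categories to tiers.
TASK_TIERS = {
    "banner_grab": Tier.OPERATOR,
    "parse_scan_output": Tier.OPERATOR,
    "classify_vulnerability": Tier.ANALYST,
    "write_exploit_poc": Tier.ANALYST,
    "plan_attack_path": Tier.STRATEGIST,
    "decide_pivot": Tier.STRATEGIST,
}

# Placeholder model identifiers per tier; swap in whatever models you actually run.
TIER_MODELS = {
    Tier.OPERATOR: "small-fast-model",
    Tier.ANALYST: "mid-tier-model",
    Tier.STRATEGIST: "frontier-reasoning-model",
}


@dataclass
class Task:
    category: str
    payload: str


def route(task: Task) -> str:
    """Pick a model for a task; default unknown work to the analyst tier."""
    tier = TASK_TIERS.get(task.category, Tier.ANALYST)
    return TIER_MODELS[tier]


if __name__ == "__main__":
    for t in [Task("banner_grab", "10.0.0.5:443"), Task("plan_attack_path", "chain current findings")]:
        print(f"{t.category} -> {route(t)}")
```

The point isn't this particular routing table. It's that tier assignment becomes an explicit, auditable design decision instead of defaulting every call to the most capable model.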

The economics are stark. If your reasoning model costs 10-15x what a lightweight model costs per token, roughly 80% of your workload is repetitive task execution, and only a small slice of decisions genuinely needs the expensive tier, you're looking at the difference between an autonomous test that costs $500 in compute and one that costs $50. At scale, that's the difference between a viable product and a science project.
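
The back-of-the-envelope math looks something like this; the relative prices and the post-routing token split below are illustrative assumptions, not published pricing:

```python
# Back-of-the-envelope blended-cost calculation. The relative prices and the
# token split are illustrative assumptions, not any vendor's published pricing.

RELATIVE_PRICE = {"operator": 1.0, "analyst": 4.0, "strategist": 15.0}

# Assumed share of total tokens handled by each tier once routing is in place.
TOKEN_SHARE = {"operator": 0.90, "analyst": 0.08, "strategist": 0.02}

blended = sum(RELATIVE_PRICE[tier] * TOKEN_SHARE[tier] for tier in RELATIVE_PRICE)
all_strategist = RELATIVE_PRICE["strategist"]

print(f"blended relative cost per token: {blended:.2f}")          # 1.52
print(f"everything on the reasoning tier: {all_strategist:.2f}")  # 15.00
print(f"savings factor: {all_strategist / blended:.1f}x")         # ~9.9x
```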


This Is a Strategy Question, Not a Technical One

The mistake I see companies making isn't technical. It's strategic.

There's a gravitational pull toward the idea that AI differentiation means model differentiation. That you need your own foundation model, your own training pipeline from scratch, your own proprietary weights to have something defensible. For the vast majority of cybersecurity companies, that's a losing bet against organizations spending tens of billions on the problem.

But here's the nuance that gets lost in the "build vs. buy" debate: there's a massive middle ground between training a foundation model and using one off the shelf.

Techniques like LoRA (Low-Rank Adaptation) let you take a capable open-weight foundation model and fine-tune it with domain-specific expertise at a fraction of the cost of full training. If you have years of penetration testing data — successful attack chains, exploitation techniques, the subtle pattern recognition that separates a good pentester from a great one — you can inject that expertise into an existing model without starting from scratch.

This is the real exception to the "don't train models" argument. You're not competing with the hyperscalers on general intelligence. You're standing on their shoulders and adding a layer of specialized knowledge that they'll never prioritize building themselves. A foundation model doesn't know the difference between a textbook SQL injection and the kind of chained, context-dependent exploitation that an experienced red teamer spots instinctively. But a LoRA-adapted model trained on real engagement data? That starts to close the gap.

The cost profile is compelling too. Full model training might run millions of dollars. A LoRA fine-tune on domain-specific data can cost orders of magnitude less while producing meaningful performance improvements on the tasks that actually matter for your product.
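
For a sense of what this looks like in code, here's a minimal sketch using Hugging Face's peft library; the base model, adapter rank, and target modules are placeholders you'd choose for your own architecture and data:

```python
# Minimal LoRA fine-tuning sketch using Hugging Face transformers + peft.
# "gpt2" is only a small, loadable stand-in; in practice you'd pick a capable
# open-weight base model, and target_modules depends on that model's architecture.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # stand-in for a capable open-weight base model
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)  # used to tokenize the fine-tuning data

lora_config = LoraConfig(
    r=16,                       # adapter rank: tiny relative to the model's hidden size
    lora_alpha=32,              # scaling applied to the adapter updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection; model-specific
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the total parameters

# From here, train as usual (e.g., with transformers' Trainer) on curated engagement
# data: attack chains, exploitation write-ups, triage decisions and their outcomes.
```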

So the strategic picture has two layers. First, orchestration intelligence — the ability to decompose complex security workflows into tasks, route each task to the right model tier, and coordinate the results into something that's greater than the sum of its parts. Second, domain-adapted models — using techniques like LoRA to make the models at each tier meaningfully better at cybersecurity-specific tasks than their generic counterparts.

Think about what both of these require. Deep domain expertise in the actual security workflow. Understanding which decisions are high-stakes versus routine. Building tooling, integrations, and feedback loops. And curating the specialized data that makes fine-tuning actually work.

None of that comes from training a foundation model. All of it is something a focused cybersecurity company is uniquely positioned to build.


The Vulnerability Management Connection

This gets even more interesting when you connect the orchestration layer to existing security intelligence.

If you already have a vulnerability management platform that identifies, prioritizes, and tracks vulnerabilities across an environment, you have something incredibly valuable: context. Your AI-driven testing system doesn't need to spend expensive reasoning tokens rediscovering the attack surface from scratch. It inherits a prioritized view of where to focus.

The reasoning model's budget of expensive tokens gets allocated to the decisions that actually matter — evaluating whether a chain of medium-severity findings creates a critical attack path, or determining whether a particular configuration weakness is exploitable in this specific environment. The cheap models handle the grunt work of validating individual findings at scale.
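
As a sketch of what that hand-off could look like, a prioritized finding list from the vulnerability management platform can gate which work ever reaches the strategist tier. The data shapes, severity scale, and budget here are hypothetical:

```python
# Sketch: seeding the orchestration layer with vulnerability-management context so
# expensive reasoning tokens only go where they matter. The Finding structure,
# severity scale, and reasoning budget are hypothetical.

from dataclasses import dataclass


@dataclass
class Finding:
    host: str
    severity: float        # e.g., a CVSS-like 0-10 score from the VM platform
    validated: bool = False


def plan_work(findings: list[Finding], reasoning_budget: int) -> dict:
    """Route high-volume validation to the operator tier; reserve the strategist
    tier for the highest-priority findings, where chained weaknesses might form
    a critical attack path."""
    operator_tasks = [f for f in findings if not f.validated]   # cheap, broad validation
    prioritized = sorted(findings, key=lambda f: f.severity, reverse=True)
    strategist_tasks = prioritized[:reasoning_budget]           # expensive chain analysis
    return {"operator_tasks": operator_tasks, "strategist_tasks": strategist_tasks}


findings = [
    Finding("10.0.0.5", severity=6.4),
    Finding("10.0.0.9", severity=5.1),
    Finding("10.0.0.12", severity=8.8),
]
plan = plan_work(findings, reasoning_budget=2)
print([f.host for f in plan["strategist_tasks"]])  # ['10.0.0.12', '10.0.0.5']
```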

This is where the compounding advantage lives. The more context you feed into the orchestration layer, the more efficiently you can allocate model resources, and the better your results get per dollar spent.


What This Means for the Market

If I'm right about this, a few things follow.

Cybersecurity companies should stop hiring ML engineers to train foundation models and start hiring AI engineers who understand both token economics and domain adaptation. The skill set that matters isn't "can you build a transformer from scratch." It's "can you design a system that orchestrates model tiers efficiently, and can you fine-tune those models with LoRA to be genuinely better at security tasks than anything a hyperscaler will ship out of the box."

Investors should be skeptical of cybersecurity startups claiming model differentiation — but should pay attention to domain adaptation strategies. If a company is just wrapping a foundation model API, that's thin differentiation. But if they're combining intelligent orchestration with LoRA-adapted models trained on real security engagement data, that's a meaningful and defensible technical moat. The question to ask isn't just "how are you using AI?" It's "what domain-specific data flywheel are you building that makes your models get better over time?"

The hyperscalers will keep making models better and cheaper. That's a tailwind, not a threat, for companies building on top of them. Every improvement in model capability and every reduction in token cost makes your orchestration layer more valuable, not less. You're surfing someone else's R&D curve.

The winners will be companies that treat AI compute like a managed resource, not a fixed cost. Just like cloud computing moved from "buy a server" to sophisticated resource management, AI usage will move from "call the API" to intelligent orchestration that dynamically allocates model resources based on task complexity, cost constraints, and quality requirements.


The Bottom Line

The AI revolution in cybersecurity isn't going to be won by whoever builds the best foundation model. It's going to be won by whoever builds the best system for using models — and for making them smarter in their specific domain. The companies that win will be the ones that understand token economics, respect the tiering, invest in domain adaptation through techniques like LoRA, and focus their engineering effort on the orchestration layer that turns general intelligence into domain-specific excellence.

Stop trying to compete with the hyperscalers on building foundation models. Start building the intelligence and adaptation layers that make their models work harder — and smarter — for your customers.

That's where the real moat is.