Data is the only moat a bootstrapper has left

If the only thing your SaaS does is transform data on the way through, an agent can replace it. Arvid Kahl’s estimate is that a transformative-only product is about two hours away from being a skill someone drops into Claude Code, and once you accept that number, the question of what a small founder should build answers itself. Build a system of record. Make it API-first. Own the data, because the data is the one thing the agent cannot regenerate.

This is the most useful planning principle I have heard for indie SaaS in 2026, and it is worth working through carefully because it quietly invalidates a large share of what people are currently building.

The two kinds of SaaS, and only one survives

Kahl’s distinction is between a transformative product and a system-of-record product, and the line between them is sharp once you see it.

A transformative product takes data in, does something to it, and sends data out, while storing nothing of lasting value. A tool that takes a transcript and returns a summary. A tool that takes a CSV and returns a cleaned CSV. A tool that takes an image and returns a resized image. The value is entirely in the transformation, and the transformation is exactly the kind of thing a model now does natively. His two-hours-away line is about this category. If your whole product is a transformation, then the moment a capable model can do that transformation, your product is a prompt, and someone will write that prompt as a skill inside their coding tool and never pay you again.

A system-of-record product is different. It holds the canonical version of something the customer’s business depends on. The list of their patients, their inventory, their compliance events, their maintenance logs, their customer interactions over years. The value is not in any single transformation. It is in being the place where the truth lives, accumulated over time, that everything else in the customer’s operation refers back to. An agent can transform data in two hours. It cannot, in two hours, become the four-year-old trusted record of a company’s operations, because that record is the slow accumulation of real-world human activity that only ever happened once.

So the first planning question for a small founder is brutal and clarifying. Is what I am building a transformation or a record? If it is a transformation, the model is coming for it. If it is a record, the model needs you, because the model has no data of its own.

The line between the two is not always obvious at a glance, and a lot of products that feel like records are actually transformations wearing a database. A tool that summarizes meetings looks like it has data, because it stores the summaries, but if the customer’s real source of truth lives elsewhere and your summaries are a disposable byproduct, you are a transformation with a cache, and the moment a model summarizes meetings natively the cache is worthless. The honest test is whether the customer’s business would be damaged if your stored data vanished overnight. If they would shrug and regenerate it, you are a transformation. If they would panic because you held the canonical version of something they depend on (the patient list, the inventory counts, the compliance log, the years of customer interactions), then you are a record. Kahl’s two-hours-away estimate applies to the first kind and not the second, and the difference between them is the difference between a product a skill replaces in an afternoon and a product the skill has to call to get the data.

The reason this question is uncomfortable is that the most appealing products to build are usually transformations. A transformation is easy to demo, easy to explain, and often genuinely clever, which is exactly what makes it fun to work on and fun to launch. A record is the opposite. It is boring to demo, because the value only appears after months of accumulation, and boring to explain, because “we are the place your data lives” does not light anyone up at a launch. The market rewards the boring one and punishes the clever one, which is the inverse of what a founder’s instincts and the launch-day incentives push toward. Most people building transformations are not making a mistake out of ignorance. They are following the dopamine, building the thing that demos well, and the data-moat argument is largely a warning that the thing that demos well is the thing with no defense.

Why data is specifically the bootstrapper’s moat

Kahl’s framing of data as the only moat is sometimes read as generic advice, but it is sharper for a small team than for anyone else, and the reason is the economics of agents.

Agents are expensive and ephemeral. He makes the point with his own product, PodScan, which collects podcast data continuously in the background, cheaply, at scale. To have an agent do that same collection live would cost an absurd amount, tens of thousands of dollars a day, because the agent has to do the work every time and the work is enormous. The data that a small founder has been quietly collecting in the background for years is, by contrast, just sitting there, already paid for, already accumulated. The asymmetry is the moat. You spent three years and modest server costs building a dataset that an agent would have to spend a fortune to reconstruct, and even then it could not reconstruct the historical part, because the past already happened and was only recorded by you.

This is why data is a better moat for a two-person team than for a large company in one specific sense. The large company’s moats were capital, headcount, and distribution, all of which AI is eroding. The small team’s moat is a thing AI cannot erode at all, because it is not a capability, it is a possession. Capability gets cheaper every quarter. A four-year-old proprietary dataset does not get cheaper. It gets more valuable, because the world keeps moving and your record keeps being the only one that captured it.

The PodScan example is worth dwelling on because it makes the asymmetry concrete rather than abstract. PodScan sits in the background collecting podcast data continuously, cheaply, accumulating something that grows a little every day at almost no marginal cost. For an agent to reproduce that on demand, it would have to do the entire collection live, every time, which Kahl pegs at the absurd end of the cost scale, tens of thousands of dollars a day, because the work is enormous and the agent has no stored result to draw on. That gap between the cheap, patient, background accumulation and the expensive, frantic, live reconstruction is the moat stated as a number. And the part that makes it permanent is that even unlimited money cannot buy the historical slice. The podcasts that aired three years ago and were recorded by your system can never be re-collected by anyone, because that moment is gone and only your record kept it. An agent can match your capability tomorrow. It can never retroactively have been collecting since three years ago. Time is the one input no model can prompt its way to, and a record-keeping product turns time into an asset that compounds in exactly the direction AI cannot follow.
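The asymmetry is easy to put into arithmetic. The sketch below compares the cumulative cost of quiet background collection with the cost of an agent redoing the work live every day. The function name and both per-day figures are illustrative placeholders, not numbers from Kahl or PodScan. The only thing taken from the argument above is the order of magnitude for the live agent (tens of thousands of dollars a day).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostComparison:
    background_total: float  # cumulative cost of collecting quietly over the period
    live_agent_total: float  # cumulative cost of an agent redoing the work every day
    ratio: float             # how many times more the live approach costs


def compare_collection_costs(
    background_cost_per_day: float,
    agent_cost_per_day: float,
    days: int,
) -> CostComparison:
    """Compare cumulative background collection cost with live agent collection cost."""
    if days <= 0 or background_cost_per_day <= 0:
        raise ValueError("days and background_cost_per_day must be positive")
    background_total = background_cost_per_day * days
    live_agent_total = agent_cost_per_day * days
    return CostComparison(
        background_total=background_total,
        live_agent_total=live_agent_total,
        ratio=live_agent_total / background_total,
    )


# Placeholder inputs, NOT measured figures:
# $50/day of servers versus $20,000/day of live agent work, over three years.
result = compare_collection_costs(50.0, 20_000.0, days=3 * 365)
print(f"background: ${result.background_total:,.0f}")
print(f"live agent: ${result.live_agent_total:,.0f}")
print(f"ratio: {result.ratio:.0f}x")
```

Whatever real numbers you plug in, the calculation still leaves out the part that matters most. It prices three years of live agent work going forward, and no budget buys the three years that have already passed.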

Having the data is only half of it

The part of Kahl’s argument that people skip is the second half, and it is the half that turns data from a dormant asset into a moat. Having data is half the moat. Making the data available is the other half.

The instinct of a founder who has been told data is the moat is to hoard it, lock it behind the UI, and guard it. Kahl argues the opposite. The product has to be API-first, and you should chase what he calls usage parity, where a human in the UI, a developer hitting the API, and an agent calling your tool all have first-class access to the same data and actions. The reasoning is about where the customer’s work is moving. The customer is increasingly going to be operating through agents. If your data is only reachable by a human clicking through your interface, then when the customer’s agent goes looking for that data, it cannot get to yours and it routes around you to whatever it can reach. The locked-up dataset is a moat that strands you outside the customer’s actual workflow.

Making it API-first, and going further to make it callable by agents directly, means you become the place the agent has to go to get the truth. The data is the moat, and the API is the drawbridge that lets the moat actually defend something instead of just sitting there.

The usage-parity idea is the part worth internalizing, because it runs against a decade of SaaS instinct. The old playbook said the UI was the product and the API was a grudging add-on for the technical few. Kahl’s argument flips the priority. As the customer’s work moves into agents, the API and the agent-callable interface become the primary way the data gets used, and the human UI becomes one of several front doors rather than the only one. A product where the human screen, the developer API, and the agent tool call all reach the same data and the same actions with equal standing is a product that stays embedded in the customer’s workflow no matter how that workflow shifts. A product that locked everything behind the human screen is a product the customer’s agent cannot use, and the moment the customer starts working through agents, that product is invisible to the exact place the work is now happening. Owning the data does you no good if the data is unreachable from where the work moved.
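One way to picture usage parity in code is a single set of actions on the record, with the human UI, the developer API, and the agent tool call each acting as a thin adapter over it. This is a minimal sketch under assumptions, not a description of how Kahl or PodScan builds things. The inventory example and every name here (`RecordStore`, `build_actions`, the three `handle_*` functions) are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecordStore:
    """The system of record: the one place the data and its actions live."""
    items: dict[str, int] = field(default_factory=dict)

    def set_count(self, sku: str, count: int) -> dict:
        if count < 0:
            raise ValueError("count cannot be negative")
        self.items[sku] = count
        return {"sku": sku, "count": count}

    def get_count(self, sku: str) -> dict:
        return {"sku": sku, "count": self.items.get(sku, 0)}


def build_actions(store: RecordStore) -> dict[str, Callable[..., dict]]:
    """One registry of actions, shared by every front door."""
    return {"set_count": store.set_count, "get_count": store.get_count}


def handle_ui_form(actions: dict, action: str, form: dict[str, str]) -> str:
    """Human UI: form fields arrive as strings, the result is rendered as text."""
    args: dict = dict(form)
    if "count" in args:
        args["count"] = int(args["count"])
    result = actions[action](**args)
    return f"{result['sku']}: {result['count']} in stock"


def handle_api_request(actions: dict, action: str, body: str) -> str:
    """Developer API: JSON in, JSON out."""
    return json.dumps(actions[action](**json.loads(body)))


def handle_agent_tool_call(actions: dict, tool_call: dict) -> dict:
    """Agent: a tool call carrying an action name and structured arguments."""
    return actions[tool_call["name"]](**tool_call["arguments"])


store = RecordStore()
actions = build_actions(store)

# A human writes through the UI; the API and the agent read the same record.
ui_message = handle_ui_form(actions, "set_count", {"sku": "widget", "count": "12"})
api_view = handle_api_request(actions, "get_count", '{"sku": "widget"}')
agent_view = handle_agent_tool_call(
    actions, {"name": "get_count", "arguments": {"sku": "widget"}}
)
print(ui_message)
print(api_view)
print(agent_view)
```

The design choice is that no front door has its own logic or its own copy of the data. Anything added to the action registry becomes reachable from the screen, the API, and the agent at the same time, which is what keeps the three at equal standing.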

What this changes about what you build

The combined principle reshapes the product decision in a way I think most indie hackers have not absorbed.

It pushes you toward boring, accumulating, record-keeping products and away from clever, one-shot, transformation products. The clever transformation is the fun thing to build and the thing that demos well, and it is precisely the thing with no durable moat. The boring record-keeper is the unglamorous thing nobody wants to build, and it is the thing with a moat that compounds. This rhymes with everything else worth knowing about small SaaS. The defensible businesses are almost always the boring ones, because boring usually means deep in a specific domain with real data piling up, and that depth is the part AI cannot shortcut.

It also pushes you to start collecting data on day one, even before you know exactly how it becomes the moat. Every interaction your product records (the metadata, the corrections users make, the choices they make) is a brick in a wall that gets harder to climb the longer you run. Kahl’s point about metadata being your unique moat is easy to underrate. The exhaust of your product’s normal operation is, over years, a dataset nobody else has, and the founder who designed to capture it from the start ends up with a moat the founder who threw it away can never rebuild.

The corrections point is the sharpest version of this, because it is the kind of data that is genuinely impossible to manufacture. When a user edits what your product produced, fixes a categorization, overrides a default, or flags an output as wrong, they are handing you a signal about the real world that exists nowhere else. That signal is the gap between what a generic model would do and what is actually correct for this specific customer in this specific situation, and it is generated by the one person capable of generating it, the human who knows the right answer. Accumulate enough of those corrections over years and you have a dataset of human judgment about your exact domain that no amount of model capability can reproduce, because it was never a transformation. It was a record of what real people knew and the model did not. The founder who logs every correction from the first week is quietly building the most defensible asset available to a small team, and it costs almost nothing to capture, only the discipline to design for it before you understand its value.
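Capturing corrections from the first week can be as small as an append-only log. The sketch below assumes a JSON Lines file on disk and a hypothetical `Correction` record. The field names and the invoice example are made up for illustration, and a real product would more likely write to its own database.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Correction:
    customer_id: str
    item_id: str
    field_name: str
    produced_value: str   # what the product generated
    corrected_value: str  # what the human said was right
    corrected_at: float   # Unix timestamp of the correction


class CorrectionLog:
    """Append-only JSON Lines log of user corrections (hypothetical design)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record(
        self,
        customer_id: str,
        item_id: str,
        field_name: str,
        produced_value: str,
        corrected_value: str,
    ) -> Optional[Correction]:
        # An unchanged value is a confirmation, not a correction, so skip it.
        if produced_value == corrected_value:
            return None
        entry = Correction(
            customer_id, item_id, field_name,
            produced_value, corrected_value, time.time(),
        )
        # Append only: earlier corrections are never rewritten or deleted.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(entry)) + "\n")
        return entry

    def read_all(self) -> list[Correction]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [Correction(**json.loads(line)) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log = CorrectionLog(Path(tmp) / "corrections.jsonl")
    log.record("acme", "invoice-17", "category", "office supplies", "lab equipment")
    log.record("acme", "invoice-18", "category", "travel", "travel")  # unchanged
    corrections = log.read_all()

print(len(corrections), corrections[0].corrected_value)
```

Storing both the produced value and the corrected value is the point of the design. The pair is the gap between what the product guessed and what the human knew, and the timestamp turns each pair into part of a history that cannot be backfilled later.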

The check to run on your own roadmap

Take whatever you are building or planning and run it through two questions. First, am I a transformation or a record? If you are purely a transformation, assume a model eats you and rethink toward holding something the customer depends on over time. Second, is my data reachable by the customer’s agents and developers, not just by a human clicking through my UI? If it is locked behind the interface, you have a moat that cannot defend you, because the workflow is moving to a place from which your data cannot be reached.

The summary is the one Kahl keeps returning to, and it is the right one to plan around. As building gets trivial and transformation gets free, the only thing that stays scarce is real-world human-generated data made accessible. A bootstrapper who builds a record instead of a transformation, and opens that record to the agents the customer is starting to run, is building the one kind of moat that does not get cheaper every time a better model ships.
