Adrian Krebs, Co-Founder & CEO of Kadoa
Collecting data from the web hasn't changed in decades and remains fundamentally broken, yet everyone depends on it. Engineers write brittle scripts for each individual source and then have to maintain them constantly. Tech debt accumulates, and the maintenance cost ends up dictating what gets scraped and what doesn't.
Today, we are launching our Kadoa Assistant and Web Scraping OS to give everyone the tools to source web datasets at scale, at a speed and quality that wasn't possible before.
One platform, two users:
We built it alongside the world's most sophisticated and demanding hedge funds and asset managers.
Imagine having a team of senior web scraping engineers always available to you and your team. We spent years building reliable, accurate scrapers by hand, reverse-engineered the tools and judgment that work takes, and encoded all of it in an agent. Now you have access to all of this through a simple chat interface.
Until now, setting up a web scraper meant either writing code or clicking through a configuration wizard, inspecting pages, and manually fixing things whenever something broke. We think of the Kadoa Assistant not as just a better tool, but as a fundamentally better way to extract web data.
The Assistant is simply a faster and more intuitive way to create web datasets.
We built the Assistant alongside leading investment firms, and early adoption has been strong. Across the investment firms that were part of our early-access program:
The large firms with the big engineering muscles all have their in-house web scraping infrastructure and teams. These are often the central data teams that then manage and maintain all data pipelines for the different teams/pods.
The small and mid-sized firms usually outsource their web scraping because they don't have the manpower to build, and especially to maintain, a large fleet of pipelines. They often still have a very traditional and slow process that goes from an internal Jira ticket -> compliance -> data engineer -> web scraping provider -> QA -> dataset.
AI definitely helps with vibe-generating scripts for simple scrapers, but the firms that try to build it all in-house tend to struggle with these challenges:
blocking is an increasing issue (because of the flood of AI bots) and most funds don't have the proxy and unblocking capabilities to keep up with it
scaling web scrapers is really hard because every scraper adds tech debt and ongoing maintenance cost. You need proper observability, infra monitoring, etc. to run a large number of scrapes.
compliance usually wants to audit and approve the scrapes, and generally there are no automated processes in place to do that properly
with Claude Code, PMs and analysts are starting to vibe-code their own scraper scripts, which leads to security, compliance, and data governance issues.
central data teams are actively moving away from siloed data and trying to centralize all data to then give AI access to that.
Our Web Scraping OS deals with all of the complexities above and bundles our AI data infrastructure that generates, maintains, and monitors your pipelines automatically, while your central data team keeps control of everything running across every workflow.

Many teams have a fragmented web data landscape: multiple scraping vendors and tools, internal engineering time spent building and maintaining pipelines, and the opportunity cost of the data they never collect because it is too expensive to justify. The Web Scraping OS consolidates it all and becomes the one-stop shop for all web scraping.
In finance, if data is late, missing, or wrong, somebody feels it (and calls you in the middle of the night). That's why we built a lot of tooling and infrastructure to ensure the highest accuracy and reliability for our customers. All of our data has to be verifiable and auditable while the workflows run on autopilot.
Data used for investment decisions cannot just be "probably correct." LLMs produce probabilistic output that can contain hallucinations and is not verifiable on its own. Kadoa produces deterministic pipelines that generate verifiable data: every value traces back to its source, down to the page it came from. You audit the number, you do not just trust it.
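To make "every value traces back to its source" concrete, here is a minimal sketch of what a traceable value could look like. It is an illustration only, not Kadoa's implementation: the names `TracedValue`, `trace`, and `verify`, and the example URL, are hypothetical, and the sketch assumes the extracted value appears verbatim in the page text.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedValue:
    """Hypothetical record: an extracted value plus the evidence needed to audit it."""
    field: str
    value: str
    source_url: str
    snippet: str       # text surrounding the value on the source page
    page_sha256: str   # fingerprint of the exact page content that was read


def _fingerprint(page_text: str) -> str:
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def trace(field: str, value: str, source_url: str, page_text: str) -> TracedValue:
    """Attach provenance to a value; refuse values that are not on the page."""
    idx = page_text.find(value)
    if idx == -1:
        raise ValueError(f"{value!r} does not appear in {source_url}")
    # Keep a little context on both sides so a reviewer can see where it came from.
    snippet = page_text[max(0, idx - 20): idx + len(value) + 20]
    return TracedValue(field, value, source_url, snippet, _fingerprint(page_text))


def verify(traced: TracedValue, page_text: str) -> bool:
    """Audit check: same page content, and the value is still present in it."""
    return _fingerprint(page_text) == traced.page_sha256 and traced.value in page_text


page = "<td>Revenue</td><td>$4.2B</td>"
record = trace("revenue", "$4.2B", "https://example.com/report", page)
print(verify(record, page))
```

The point of the design is that the check is deterministic: given the stored page content, an auditor can re-run `verify` and get the same answer every time, and a value that was never on the page cannot be recorded at all.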

You see the health metrics of every workflow in one place on our observability dashboard. It includes success rate per source, mean time to recovery (MTTR), turnaround time, SLA metrics, etc.
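For readers less familiar with these metrics, the sketch below shows one common way to compute success rate per source and MTTR from a log of scraper runs. It is a simplified illustration, not how the Kadoa dashboard computes them: the `Run` record and both function names are hypothetical, and MTTR is assumed here to mean the time from the first failed run of an outage to the next successful run of the same source.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Run:
    """Hypothetical log entry for one scraper run."""
    source: str
    started_at: datetime
    ok: bool


def success_rate_per_source(runs: list[Run]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run.source] += 1
        if run.ok:
            successes[run.source] += 1
    return {source: successes[source] / totals[source] for source in totals}


def mean_time_to_recovery(runs: list[Run]) -> Optional[timedelta]:
    """Average outage length across all sources; None if nothing ever recovered."""
    by_source: dict[str, list[Run]] = defaultdict(list)
    for run in sorted(runs, key=lambda r: r.started_at):
        by_source[run.source].append(run)

    outages: list[timedelta] = []
    for source_runs in by_source.values():
        outage_start = None
        for run in source_runs:
            if not run.ok and outage_start is None:
                outage_start = run.started_at          # outage begins
            elif run.ok and outage_start is not None:
                outages.append(run.started_at - outage_start)  # recovered
                outage_start = None
    if not outages:
        return None
    return sum(outages, timedelta()) / len(outages)
```

Outages that are still ongoing at the end of the log are left out of the average in this sketch; a production dashboard would likely surface them separately.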

Kadoa Assistant and the Web Scraping OS are available today.
Get in touch to test it for free
