Introducing Kadoa Assistant and Web Scraping OS

Adrian Krebs, Co-Founder & CEO of Kadoa

Collecting data from the web hasn't changed in decades and remains fundamentally broken, yet everyone depends on it. Engineers write brittle scripts for each individual source and have to maintain them constantly. The tech debt keeps accumulating, and the cost of maintenance ends up dictating what teams can and cannot afford to scrape.

Today, we are launching Kadoa Assistant and Web Scraping OS to give everyone the tools to source web datasets at scale, at a speed and quality that weren't possible before.

One platform, two users:

  • Kadoa Assistant lets you build a web dataset with just a prompt. PMs and analysts use it without writing code, while data engineers can speed up their pipeline generation.
  • Web Scraping OS is our AI data infrastructure that generates, maintains, and monitors data pipelines automatically. Central data teams stay in control of everything that's happening across all their workflows.

We built it alongside the world's most sophisticated and demanding hedge funds and asset managers.

Kadoa Assistant

Imagine having a team of senior web scraping engineers always available to you and your team. We spent years building reliable, accurate scrapers by hand, reverse-engineered the tools and judgment that work takes, and encoded all of it in an agent. Now you have access to all of this through a simple chat interface.

Until now, setting up a web scraper meant either writing code or clicking through a configuration wizard, inspecting pages, and manually fixing things whenever something broke. We don't think of the Kadoa Assistant as just a better tool, but as a fundamentally better way to extract web data.

How it works

  1. You write a prompt describing the data you want to extract.
  2. Kadoa explores the site and finds the most reliable source for the data (e.g. an API endpoint, embedded JSON, or a CSV file).
  3. It proposes a data structure that you can customize.
  4. It builds the deterministic data pipeline, runs tests, and validates the data.
  5. You review and approve the sample data. You stay in full control.
  6. The workflow goes live and produces the full dataset, with automated scheduling, notifications, and validations.
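
To make step 2 more concrete, here is a minimal sketch of one way to look for embedded JSON in a page before falling back to parsing rendered HTML. This is an illustration only, not Kadoa's implementation: the class name `EmbeddedJsonFinder`, the helper `find_embedded_json`, and the choice of script types to inspect are all assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonFinder(HTMLParser):
    """Hypothetical helper: collects JSON payloads embedded in <script> tags."""

    # Assumption: these two script types are where structured data usually lives.
    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON.


def find_embedded_json(html):
    """Return every JSON payload found in JSON-typed <script> tags, in page order."""
    finder = EmbeddedJsonFinder()
    finder.feed(html)
    finder.close()
    return finder.payloads
```

Structured payloads like these tend to change less often than page markup, which is one reason a pipeline built on them can be more reliable than one built on CSS selectors.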

The Assistant is simply a faster and more intuitive way to create web datasets.

Early results

We built the Assistant alongside leading investment firms, and early adoption has been strong. Across the investment firms that were part of our early-access program:

  • Workflow setup time cut by 60%
  • Non-technical team members like PMs and analysts building datasets on their own
  • Configuration-related support requests down 52%
  • Batch updates and complex configurations 5X faster

Web Scraping OS

Large firms with big engineering teams all have their own in-house web scraping infrastructure. It is usually run by a central data team that manages and maintains all data pipelines for the different teams and pods.

Small and mid-sized firms usually outsource their web scraping because they don't have the manpower to build, and especially to maintain, a large fleet of pipelines. They often still have a very traditional and slow process that goes from an internal Jira ticket -> compliance -> data engineer -> web scraping provider -> QA -> dataset.

AI definitely helps with vibe-generating scripts for simple scrapers, but the firms that try to build it all in-house tend to struggle with these challenges:

  • Blocking is an increasing issue (because of the flood of AI bots), and most funds don't have the proxy and unblocking capabilities to keep up with it.

  • Scaling web scrapers is really hard because every scrape comes with tech debt and a maintenance cost. You need proper observability, infra monitoring, etc. to handle a lot of scrapes.

  • Compliance usually wants to audit and approve the scrapes, and generally there are no automated processes in place to do that properly.

  • With Claude Code, PMs and analysts are now starting to vibe code their own scraper scripts, which is leading to security, compliance, and data governance issues.

  • Central data teams are actively moving away from siloed data and trying to centralize all data so they can give AI access to it.

Our Web Scraping OS deals with all of the complexities above. It bundles our AI data infrastructure, which generates, maintains, and monitors your pipelines automatically, while your central data team keeps control of everything running across every workflow.

How the Kadoa OS works

Many teams have a fragmented web data landscape: multiple scraping vendors and tools, the internal engineering time to build and maintain pipelines, and the opportunity cost of the data they never collect because it is too expensive to justify. The Web Scraping OS consolidates it all and becomes the one-stop shop for all web scraping.

[Figure: Total cost of web scraping, today vs. consolidated on Kadoa. Today (fragmented): traditional scraping vendors, internal engineering time, opportunity cost of limits. Consolidated on Kadoa: ~30% lower TCO.]

Purpose-built for finance

In finance, if data is late, missing, or wrong, somebody feels it (and calls you in the middle of the night). That's why we built a lot of tooling and infrastructure to ensure the highest accuracy and reliability for our customers. All of our data has to be verifiable and auditable while the workflows run on autopilot.

Provably right data and deterministic code

Data used for investment decisions cannot just be "probably correct." LLMs produce probabilistic output that can contain hallucinations and is not verifiable. Kadoa produces deterministic pipelines that generate verifiable data: every value traces back to its source, down to the page it came from. You audit the number, you do not just trust it.

Deterministic, not probabilistic.
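
As a rough illustration of what "traces back to its source" can mean in practice, the sketch below attaches provenance to each extracted value and lets an auditor re-check it against an archived copy of the page. It is not Kadoa's data model: `TracedValue`, `extract_traced`, `verify`, and the marker-based extraction rule are hypothetical names and simplifications chosen for this example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedValue:
    """Hypothetical record: an extracted value plus the evidence behind it."""
    field: str
    value: str
    source_url: str
    snippet: str      # exact page text the value was read from
    page_sha256: str  # fingerprint of the page content at extraction time


def _fingerprint(page_text):
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def extract_traced(field, page_text, source_url, start_marker, end_marker):
    """Deterministic rule: take the text between two fixed markers.

    Raises ValueError if a marker is missing, so a changed page fails loudly
    instead of silently producing a wrong value.
    """
    marker_pos = page_text.index(start_marker)
    start = marker_pos + len(start_marker)
    end = page_text.index(end_marker, start)
    return TracedValue(
        field=field,
        value=page_text[start:end].strip(),
        source_url=source_url,
        snippet=page_text[marker_pos:end + len(end_marker)],
        page_sha256=_fingerprint(page_text),
    )


def verify(traced, archived_page_text):
    """Audit check: same page content, and the value really appears in it."""
    return (
        _fingerprint(archived_page_text) == traced.page_sha256
        and traced.snippet in archived_page_text
        and traced.value in traced.snippet
    )
```

Because the rule is plain code rather than a model call, running it twice on the same page yields the same record, and a failed `verify` points to either a changed page or a tampered value.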

Observability

You see all health metrics of every workflow in one place on our observability dashboard. It includes success rate per source, MTTR, turnaround time, SLA metrics, etc.

Observability dashboard
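
For readers who want a feel for two of these metrics, here is a minimal sketch of how success rate per source and MTTR could be computed from run logs. The input shapes and function names are assumptions for illustration, MTTR is read here as mean time to recovery, and none of this describes how the Kadoa dashboard computes its figures.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def success_rate_per_source(runs):
    """runs: iterable of (source, succeeded) pairs. Returns {source: rate in 0..1}."""
    totals = defaultdict(lambda: [0, 0])  # source -> [successes, total runs]
    for source, succeeded in runs:
        totals[source][0] += int(succeeded)
        totals[source][1] += 1
    return {source: ok / total for source, (ok, total) in totals.items()}


def mean_time_to_recovery(incidents):
    """incidents: list of (failed_at, recovered_at) datetimes.

    Returns the average outage duration as a timedelta, or None if there
    were no incidents.
    """
    if not incidents:
        return None
    total = sum((recovered - failed for failed, recovered in incidents), timedelta())
    return total / len(incidents)
```

For example, a source with one success and one failure gets a rate of 0.5, and two incidents lasting one and three hours give an MTTR of two hours.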

Kadoa Assistant and the Web Scraping OS are available today.

Get in touch to test it for free

