I built SCRAPR after running into the same problem over and over:
getting structured data from websites is still way harder than it should be.
Most scraping tools today fall into two buckets:
• Browser automation (Puppeteer / Selenium) — slow, fragile, and expensive
• Traditional scrapers — break constantly on modern JS-heavy sites
SCRAPR approaches the problem differently.
Instead of rendering the page or parsing messy HTML, SCRAPR intercepts the real network calls websites use internally, then reconstructs clean structured data from them.
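As a rough sketch of that idea (the function name, regex, and sample page are mine for illustration, not SCRAPR's actual internals): scan the raw page source for URLs that look like internal data APIs, instead of parsing the rendered HTML.

```python
import re

# Hypothetical sketch of endpoint discovery: look through a page's
# inline scripts for URLs that look like internal data APIs
# (REST-style /api/ paths or /graphql endpoints).
API_URL_RE = re.compile(r'https?://[^\s"\']+/(?:api|graphql)[^\s"\']*')

def find_api_endpoints(page_source: str) -> list[str]:
    """Return unique API-looking URLs found in raw page source, in order."""
    seen, endpoints = set(), []
    for url in API_URL_RE.findall(page_source):
        if url not in seen:
            seen.add(url)
            endpoints.append(url)
    return endpoints

# Made-up page source with an embedded fetch() call, for illustration.
sample = '''
<script>
  fetch("https://example.com/api/v1/events?page=1")
    .then(r => r.json())
    .then(render);
</script>
'''
print(find_api_endpoints(sample))
# → ['https://example.com/api/v1/events?page=1']
```

The real engine presumably does far more than a regex pass, but the payoff is the same: once you have the endpoint, you get clean JSON instead of markup.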
How does this engine handle JavaScript-heavy or dynamic content without a browser, and what mechanisms ensure data accuracy when the source website changes its layout?
@mordrag For JavaScript-heavy sites, the engine doesn’t use a browser. Instead it inspects the page’s source and scripts to find the network calls the site’s own frontend makes to load its data (fetch/axios requests, GraphQL queries). Then it calls those data endpoints directly and pulls the real content from there. This makes it much faster and lighter than running a full browser.
Looks cool — but how well does it actually handle hard targets like Cloudflare, JS-heavy sites, proxies, and rate limits in the real world?
@paradox_hash The engine doesn’t rely on browser automation, so for JS-heavy sites it looks for the actual data endpoints the site calls (REST APIs, GraphQL, fetch requests) and pulls data directly from those. That sidesteps most of the rendering and layout issues that break traditional scrapers.
For rate limits and protection layers, it behaves more like a well-behaved HTTP client: it paces and adapts its requests rather than brute-forcing pages.
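One common way to implement "adapting requests rather than brute-forcing" is exponential backoff with jitter after a 429/503 response. A minimal sketch under that assumption (my names and defaults, not SCRAPR's actual logic):

```python
import random

# Hypothetical rate-limit adaptation: after each failed attempt,
# wait an exponentially growing, randomly jittered delay before
# retrying, capped so delays never grow unbounded.
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), full jitter."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Typical usage: on a 429, time.sleep(backoff_delay(attempt)) and retry.
```

The jitter matters: without it, many clients retrying on the same schedule hit the server in synchronized waves.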
Hey Sukrit, that frustration of scraping tools either being slow and fragile or breaking constantly on modern sites is painfully real. Was there a specific project where you watched your scraper break for the tenth time on some JS-heavy page and thought okay, there has to be a completely different approach?
@vouchy Yeah honestly that exact frustration is what started it 😅
I kept hitting sites where traditional scrapers would either break when the layout changed or become super slow because they needed a full browser. After dealing with that enough times, it felt obvious that the approach itself needed to change.
So instead of relying on fragile selectors or browser automation, the engine focuses on understanding page structure and the data sources behind the page. That way it’s much less likely to break when the UI changes.
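To illustrate why extracting from the data source is more change-resistant than CSS selectors (the JSON payload below is made up, not a real SCRAPR response): a site can rename every class and restructure its DOM, but its internal API tends to keep the same field names.

```python
import json

# Hypothetical API response of the kind a site's own frontend consumes.
# Extraction keys on stable JSON field names ("title", "date") rather
# than on CSS selectors tied to the current layout.
api_response = json.loads('{"events": [{"title": "Mostra", "date": "2024-05-01"}]}')

rows = [(e["title"], e["date"]) for e in api_response["events"]]
print(rows)  # → [('Mostra', '2024-05-01')]
```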
This approach is super clever — basically doing what I always do manually in Chrome DevTools Network tab (hunting for those fetch/GraphQL calls) but automated 😮
Does the engine just statically analyze the page source to find those internal API requests, or does it use AI/LLM in some way to detect and reconstruct the right endpoints even on tricky sites?
And how well does it handle completely arbitrary URLs — like, throw any random modern site at it and it still finds the clean data source reliably?
AutonomyAI
This is such a clean solution to a problem that's been annoying developers forever. Rooting for you!
Sounds cool. Would love to try it out for example on https://www.maxxi.art/events/categories/mostre/