Chandan Kumar

Geekflare Scraping API v2 - RAG-ready web scraping that cuts your LLM token costs

Feeding raw page data directly into your AI agents eats up context windows and spikes your OpenAI and Anthropic costs. Earlier this year, we launched standard HTML, JSON, and Markdown extraction. Today, we are introducing output formats built entirely for AI: markdown-llm, text-llm, and html-llm. We automatically strip out navbars, footers, ads, and scripts, delivering only the context your models actually need. With the text-llm output format, you can save up to 85% on tokens compared to raw HTML.

Chandan Kumar

Hello, everyone! 👋

Earlier this year, we launched the Geekflare Scraping API with standard Markdown, JSON, and HTML support. We prioritized your feedback about feeding our scraping results directly into AI agents and RAG pipelines.

Today we are launching our new -llm endpoints (markdown-llm, text-llm, html-llm). We do the heavy lifting behind the scenes to clean the DOM, strip the boilerplate, and return optimized structured content ready for generation.

Refer to the API reference for all supported formats.

You save up to 85% on tokens compared to raw HTML, speed up your LLM response times, and get better AI accuracy because the noise is gone.
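To get a feel for where savings like this come from, here is a minimal sketch that strips obvious boilerplate from a page and compares a rough token estimate before and after. It is not the Geekflare implementation: the `SKIP_TAGS` set, the helper names, and the "about 4 characters per token" heuristic are all assumptions made for illustration, and real tokenizer counts will differ.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption, not the
# actual rule set used by the API).
SKIP_TAGS = {"script", "style", "nav", "footer", "header", "aside"}


class VisibleText(HTMLParser):
    """Collects text that sits outside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of SKIP_TAGS elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_boilerplate(html):
    """Return only the text found outside boilerplate elements."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def rough_tokens(text):
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def savings_percent(raw_html):
    """Estimated token reduction (in percent) from stripping boilerplate."""
    before = rough_tokens(raw_html)
    after = rough_tokens(strip_boilerplate(raw_html))
    return 100.0 * (before - after) / before
```

Calling `savings_percent(raw_html)` on a few of your own pages gives a ballpark figure; the number depends heavily on how much markup and script each page carries.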

I will be hanging out in the comments all day. Please let me know what you think and what you are building!

DAYAL PUNJABI

@chandankumar How consistent is the DOM cleaning across different CMS like Webflow vs WordPress? Any "gotcha" content types you've seen trip up the llm endpoints?

Chandan Kumar

@dayal_punjabi Hello Dayal,

Our DOM cleaning is consistent across platforms because we don’t rely on CMS-specific class names. Instead, our engine uses a mix of semantic HTML analysis (<article>, <main>, etc.) and text-to-DOM density scoring to isolate the primary content block and strip away the noise.
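For readers curious what semantic analysis combined with density scoring can look like, here is a toy sketch. It is not the engine described above: the scoring formula (text length divided by tag count), the `SEMANTIC_BONUS` weights, and all names are assumptions chosen to keep the idea visible. Link-heavy blocks such as menus score low because they pack many tags around little text, while `<article>` and `<main>` get a boost.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
CANDIDATES = {"div", "section", "article", "main"}
SEMANTIC_BONUS = {"article": 2.0, "main": 1.5}  # illustrative weights only


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.text = []   # text chunks in document order, descendants included
        self.tags = 1    # this element plus all of its descendants


class DensityScorer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []
        self.best = (0.0, "")   # (score, text) of the best block so far

    def handle_starttag(self, tag, attrs):
        if tag in VOID:          # void elements never get a closing tag
            if self.stack:
                self.stack[-1].tags += 1
            return
        self.stack.append(Node(tag))

    def handle_data(self, data):
        if (self.stack and data.strip()
                and self.stack[-1].tag not in ("script", "style")):
            self.stack[-1].text.append(data.strip())

    def handle_endtag(self, tag):
        if tag not in [n.tag for n in self.stack]:
            return               # stray closing tag, ignore it
        while self.stack:        # also closes unclosed children on the way
            node = self.stack.pop()
            self.close_node(node)
            if node.tag == tag:
                break

    def close_node(self, node):
        text = " ".join(node.text)
        if node.tag in CANDIDATES:
            # Density: characters of text per tag, boosted for semantic tags.
            bonus = SEMANTIC_BONUS.get(node.tag, 1.0)
            score = len(text) / node.tags * bonus
            if score > self.best[0]:
                self.best = (score, text)
        if self.stack:           # roll totals up into the parent element
            self.stack[-1].text.extend(node.text)
            self.stack[-1].tags += node.tags


def main_content(html):
    """Return the text of the highest-scoring candidate block."""
    scorer = DensityScorer()
    scorer.feed(html)
    scorer.close()
    while scorer.stack:          # score anything left unclosed
        scorer.close_node(scorer.stack.pop())
    return scorer.best[1]
```

Because nothing here looks at class names or IDs, the same logic applies to markup from any CMS. A production engine would need much more care around tables, code blocks, and deeply nested wrappers.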

We continuously tune the engine as we come across issues with tables and <pre> or <code> tags.

If you run into any issues, please let me know.

DAYAL PUNJABI

@chandankumar Thanks a lot for the response. And congrats on the launch!

Chandan Kumar

@dayal_punjabi thank you so much!

Emma Watson

Congratulations on the launch.

Chandan Kumar

@emma_watson21 Thank you so much!