How are you currently turning websites into RAG-ready data?
While building OpenFetcher, I noticed a common pain point across RAG projects:
Most web crawlers give you raw HTML, broken text, or too much noise, and fixing it costs time and tokens.
OpenFetcher approaches crawling differently:
- Crawls full domains (even when sitemaps are missing or broken)
- Converts content into clean, structured Markdown (a rough sketch of this step follows below)
- Optimized for embeddings, agents, and context windows
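For anyone curious what that conversion step looks like in practice, here's a minimal sketch of a generic HTML-to-Markdown cleanup pass, not OpenFetcher's actual implementation. It assumes the `requests`, `beautifulsoup4`, and `markdownify` packages, and the URL is just a placeholder:

```python
# Minimal sketch: fetch a page, strip noisy elements, and convert the
# remaining content to Markdown for embedding. This is a generic example,
# not OpenFetcher's internal pipeline.
import requests
from bs4 import BeautifulSoup
from markdownify import markdownify as to_markdown

def page_to_markdown(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements that add noise to embeddings and waste context tokens.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()

    # Convert whatever remains (preferably the <body>) to Markdown.
    return to_markdown(str(soup.body or soup))

if __name__ == "__main__":
    print(page_to_markdown("https://example.com")[:500])
```

The same idea scales up by chunking the resulting Markdown on headings or paragraph boundaries before embedding, so each chunk fits comfortably in a context window.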
That said, the hosted version has limits.
Power users can self-host OpenFetcher for faster crawls and unlimited pages.
Curious to learn from the community:
What tools are you using today for web → RAG?
What's the biggest issue you face: cost, quality, speed, or scale?
Would love your feedback and ideas.
