How are you currently turning websites into RAG-ready data?
While building OpenFetcher, I noticed a common pain point across RAG projects:
Most web crawlers give you raw HTML, broken text, or too much noise, and fixing it costs time and tokens.
OpenFetcher approaches crawling differently:
- Crawls full domains (even when sitemaps are missing or broken)
- Converts content into clean, structured Markdown (a rough sketch of this step follows below)
- Optimized for embeddings, agents, and context windows
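For anyone curious what that conversion step looks like in practice, here's a minimal sketch of a generic HTML-to-Markdown cleanup pass, not OpenFetcher's actual implementation. It assumes the `requests`, `beautifulsoup4`, and `markdownify` packages, and the URL is just a placeholder:

```python
# Minimal sketch: fetch a page, strip noisy elements, and convert the
# remaining content to Markdown for embedding. This is a generic example,
# not OpenFetcher's internal pipeline.
import requests
from bs4 import BeautifulSoup
from markdownify import markdownify as to_markdown

def page_to_markdown(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements that add noise to embeddings and waste context tokens.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()

    # Convert whatever remains (preferably the <body>) to Markdown.
    return to_markdown(str(soup.body or soup))

if __name__ == "__main__":
    print(page_to_markdown("https://example.com")[:500])
```

The same idea scales up by chunking the resulting Markdown on headings or paragraph boundaries before embedding, so each chunk fits comfortably in a context window.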
That said, the hosted version has limits.
Power users can self-host OpenFetcher for faster crawls and unlimited pages.
Curious to learn from the community:
What tools are you using today for web → RAG?
What's the biggest issue you face: cost, quality, speed, or scale?
Would love your feedback and ideas.
