Hi ProductHunt! I'm thrilled to introduce my new product, which enables you to effortlessly scrape structured data from any website without the need for custom scraping logic. It's as simple as providing the URL and specifying a JSON schema for the data you want to get back.
Extract data from any website with our robust scraping engine, eliminating the need for custom selectors.
Enhance your data using our built-in data enrichment tools to fill in missing details in your dataset.
We handle headless browsers, proxies, and LLMs to ensure you obtain the structured data you're looking for.
Under the hood, SingleAPI uses a headless browser to render the website and extract its raw text. After some additional processing, that text is passed to LLMs (Large Language Models), which return the desired data in JSON format.
This makes SingleAPI layout/design agnostic: it can get data from any website, even after the website changes its layout or design.
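To make the flow above a bit more concrete, here is a minimal Python sketch of the same idea. It is not SingleAPI's actual code: the function names (`html_to_text`, `build_prompt`, `extract`) are made up for illustration, the rendered HTML is assumed to come from a headless browser that is out of scope here, and `call_llm` is a placeholder for whatever model client you use.

```python
import json
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(rendered_html: str) -> str:
    """Reduce already-rendered HTML to plain text, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(rendered_html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(page_text: str, schema: dict) -> str:
    """Hypothetical prompt format: the schema first, then the page text."""
    return (
        "Extract the fields described by this JSON schema from the page text. "
        "Reply with JSON only.\n"
        f"Schema: {json.dumps(schema)}\n"
        f"Page text:\n{page_text}"
    )


def extract(rendered_html: str, schema: dict, call_llm: Callable[[str], str]) -> dict:
    """Text extraction -> LLM call -> parsed JSON, with a basic required-field check."""
    text = html_to_text(rendered_html)
    data = json.loads(call_llm(build_prompt(text, schema)))
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"LLM response is missing required fields: {missing}")
    return data
```

Because nothing in this sketch depends on CSS selectors, a layout change only alters the text handed to the model, which is the property the layout-agnostic claim relies on.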
Feel free to ask me any questions. Thank you!
@naoto_shibata_morph Thanks, Naoto! There should be no limitations on the websites that SingleAPI is able to parse. We're also thinking about letting users choose between different LLMs. This should help make responses faster for different use cases.
@semanser Thank you Andriy! I'm wondering if your product will solve the problems our customers have.
Your idea of choosing different models sounds great! I will try it!
Very cool concept, feels like it's 5 years in the future 🤯
What does the response time look like? :)
Huge congrats on the launch! Psyched to give this a try.
@esus Thanks! The response time really depends on how much data you want to fetch from a single page.
I would say it's around 4-5 seconds on a typical page (e.g., to fetch product information) and up to 15 seconds when fetching arrays of items (e.g., the list of news or some very big chunks of text).
I've done lots of optimizations to try to minimize the response time: prompt text filtering, some advanced logic to identify when the website is fully loaded, etc. This has allowed me to reduce the response time from 20+ seconds to just a few :)
There are still some things that I would like to implement, but I'm pretty satisfied with the results so far.
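For anyone curious what "prompt text filtering" could look like, here is one possible interpretation as a small Python sketch. The exact rules SingleAPI applies aren't described in this thread, so the function name `filter_prompt_text`, the deduplication rule, and the `max_chars` budget are all assumptions made for illustration.

```python
import re


def filter_prompt_text(text: str, max_chars: int = 4000) -> str:
    """Shrink page text before it goes into the prompt.

    Assumed rules: collapse whitespace, drop empty lines, drop lines that
    repeat an earlier line (case-insensitive), then cap the total length.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        key = line.lower()
        if key in seen:  # repeated menus and footers add tokens but no information
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)[:max_chars]
```

A shorter prompt generally means fewer tokens for the model to read, which is one plausible way filtering like this could cut response time.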
@vzotov Thanks! The model is used on every request, which makes it a bit slower (the average LLM response time is around 1-2 seconds), but it helps to handle a lot of edge cases related to layout changes, incorrect selectors, etc. Being able to choose between selector caching and using the LLM on every request could actually be a nice additional feature.
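As a rough sketch of how that choice between selector caching and an LLM call on every request might work (this is not implemented in SingleAPI; the class `HybridExtractor` and the callables `llm_extract` and `derive_selectors` are hypothetical names used only for this example):

```python
from typing import Callable, Dict, Optional

# A cached extractor returns None when its selectors no longer match the page.
Extractor = Callable[[str], Optional[dict]]


class HybridExtractor:
    """Try cached selectors first; fall back to the LLM when they fail."""

    def __init__(
        self,
        llm_extract: Callable[[str], dict],
        derive_selectors: Callable[[str, dict], Extractor],
    ) -> None:
        self._llm_extract = llm_extract  # slow path: page -> data
        self._derive_selectors = derive_selectors  # (page, data) -> fast extractor
        self._cache: Dict[str, Extractor] = {}
        self.llm_calls = 0

    def extract(self, site_key: str, page: str, use_cache: bool = True) -> dict:
        if use_cache and site_key in self._cache:
            data = self._cache[site_key](page)
            if data is not None:
                return data
            del self._cache[site_key]  # selectors broke, e.g. after a layout change
        self.llm_calls += 1
        data = self._llm_extract(page)
        if use_cache:
            self._cache[site_key] = self._derive_selectors(page, data)
        return data
```

Passing `use_cache=False` keeps today's behavior of calling the LLM on every request, while the default path only pays the LLM latency on the first request and after the cached selectors stop matching.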
Such a great idea, JSON makes software development super duper easy. It is human readable and that's about it. I hate YAML btw but I still use it because Google Cloud requires it.
@kingromstar Thank you! We're planning to support more data formats in the future. In the end, it's pretty easy to convert JSON to anything nowadays. But JSON was an obvious choice since it's one of the most popular and well-supported data formats out there.
A very good one. I do hope to see this further developed soon as our company will be needing this kind of service.