Extract emails, dates, and URLs instantly and privately

DataSieve 2.0 - Extract structured data from text, files and archives.


10d ago

DataSieve helps you turn unstructured text into clean, usable data in seconds. Drop in text, files, folders, or even archives, and extract what you need in one pass: emails, phone numbers, URLs, dates, financial data, and more. Everything runs locally on your device, with no cloud and no tracking.

What you can do
- Extract multiple data types at once
- Process text, PDFs, EPUBs, CSV, JSON, Word files, and more
- Export results to JSON, XLSX, DOCX, and more
- Define your own custom extractors

Replies



Maker

📌

Hey everyone, I’m the developer behind DataSieve (previously TextMine). This update has been a big step forward compared to the first version. The main focus for 2.x was flexibility and scale. Being able to scan folders and archives, and define custom extractors, makes it much more useful for real workflows instead of just one-off text inputs. I also spent time improving extraction accuracy for more complex data types like financial info and international formats. Happy to answer any questions, and I’d really appreciate any feedback, especially around usability and edge cases.


12d ago

@albemala What about PDFs where the data is an image instead of actual text? For example, a financial report scanned into an image-only PDF instead of saved as a text document.


10d ago


Maker

@jonathan_alonso That's not supported yet. I'm planning to support extracting data from images in the next major release, and that will also cover images inside PDFs.


10d ago

Hi Alberto, I like your idea of running everything locally. Is the list of attributes to extract static, or can I define custom ones?


10d ago


Maker

@alberto_polini You can define custom ones using regexes!
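
For anyone wondering what a regex-based custom extractor might look like in general, here is a minimal Python sketch. It is not DataSieve's actual configuration format or API: the extractor names (`invoice_id`, `iso_date`), the patterns, and the `extract` helper are all made up for illustration.

```python
import re

# Hypothetical custom extractors: a name mapped to a regular expression.
# These names and patterns are illustrative only, not DataSieve's own.
CUSTOM_EXTRACTORS = {
    "invoice_id": re.compile(r"\bINV-\d{6}\b"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract(text, extractors=CUSTOM_EXTRACTORS):
    """Return every match for each extractor, keyed by extractor name."""
    return {name: pattern.findall(text) for name, pattern in extractors.items()}


sample = "Invoice INV-004217 was issued on 2024-03-15 and paid on 2024-04-02."
results = extract(sample)
print(results)
```

Each extractor runs over the same text independently, so adding a new data type only means adding one more name and pattern.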


10d ago

Does it work on external websites? For example, I need to extract the names of all geographical objects mentioned across several websites. Can your system do that?


9d ago


Maker

@natalia_iankovych Not yet. Extracting data from websites is something I'm planning to add in a future major release. Stay tuned!


9d ago

Nice — structured data extraction is one of those problems that sounds simple until you actually try it. How does it handle ambiguous fields? For example, does it distinguish between a phone number and a fax number in unstructured text? Asking because I work on a similar challenge with voice-to-form mapping.


10d ago


Maker

@webappski as of now, the app doesn't distinguish between phone numbers and fax numbers. Structured data extraction is not easy indeed!
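
To illustrate why this is hard: a phone pattern alone matches fax numbers too, so telling them apart needs surrounding context. The sketch below shows one possible heuristic in Python, assuming a nearby "fax" label is present in the text. It is not something the app does today, and the `NUMBER` pattern, the `classify_numbers` helper, and the 12-character window are all hypothetical choices.

```python
import re

# Loose pattern for phone-like digit runs; illustrative only.
NUMBER = re.compile(r"\+?\d[\d\s().-]{6,}\d")
# Label searched for in the text just before a number.
FAX_LABEL = re.compile(r"\bfax\b", re.IGNORECASE)


def classify_numbers(text, window=12):
    """Tag a number as 'fax' if the word 'fax' appears within
    `window` characters before it, otherwise as 'phone'."""
    tagged = []
    for match in NUMBER.finditer(text):
        context = text[max(0, match.start() - window):match.start()]
        kind = "fax" if FAX_LABEL.search(context) else "phone"
        tagged.append((kind, match.group()))
    return tagged


sample = "Call us at +1 415 555 0100. Fax: +1 415 555 0199."
tagged = classify_numbers(sample)
print(tagged)
```

A label-based rule like this breaks down as soon as the label is missing, abbreviated, or placed after the number, which is part of why unstructured text stays ambiguous.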


10d ago