Launching today

LocalPDF.io

Process your legal/medical/financial documents locally

36 followers

PDF Editor

Processing sensitive data shouldn't mean compromising privacy. With cloud AI platforms raising trust concerns and compliance hurdles, privacy-centric firms need complete control. LocalPDF lets you analyze legal, medical, and financial documents 100% locally on your device. Zero data leaves your machine, ensuring enterprise-grade security compliance without relying on OpenAI, Gemini or Anthropic.

Free Options

Launch tags: Privacy • Legal • Artificial Intelligence

Launch Team / Built With

Framer — Launch websites with enterprise needs at startup speeds.

Promoted

LocalPDF.io

Hunter

@Y Combinator @aaron_epstein I am launching LocalPDF.io. Could you please check it out?

Thanks.

1d ago

LocalPDF.io

Hunter

📌

Hey Product Hunt! 👋

I'll be honest: I don't trust cloud AI platforms. Handing over sensitive legal, medical, or financial documents to third-party servers always felt like an unacceptable security risk to me. We shouldn't have to choose between using powerful AI tools and keeping our data secure.

I built LocalPDF.io because I wanted a way to process and analyze my documents where I am the only one in control. It operates completely offline, right on your device. Nothing is sent to the cloud. Nothing is used to train someone else's model.

If you work in a privacy-centric firm or you're just protective of your own data like I am, I built this for you. I'm here all day to answer your questions. Would love to hear your feedback!

1d ago

Product Hunt

A lot of local-document tools break down on real-world PDFs (scanned docs, tables, multi-column layouts, mixed file types). What did you implement to make extraction/indexing reliable in those cases, and what are the failure modes you still see?

18h ago

LocalPDF.io

Hunter

Hey @curiouskitty, here is exactly what we built and where it still breaks:

How we made extraction reliable:

Scanned & Messy Layouts: We try native extraction first. If the file is scanned, or if our app detects the layout is mangled (e.g., averaging >300 chars per line), it automatically falls back to Apple's Vision AI. It renders the page at 2x scale and reads the text visually.
Mixed File Types: We don't force everything into a PDF parser. We built dedicated native extractors for different formats (.docx, .md, .csv, etc.) before passing them to the AI pipeline.
Safe Indexing (Chunking): We don't chop text blindly at a character limit. Our chunking algorithm searches backward from the limit to split only at natural paragraph (\n\n) or sentence (. ) breaks, and consecutive chunks overlap so context carries across the boundary.
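
For anyone who wants to picture the fallback check and the chunking described above, here is a minimal Python sketch. It is not LocalPDF's actual code: the function names, the 1000-character limit, and the 100-character overlap are assumptions made for illustration. Only the >300 chars-per-line threshold and the paragraph/sentence separators come from the description above.

```python
def looks_mangled(text: str, max_avg_line_len: int = 300) -> bool:
    """Heuristic: True when natively extracted text should be re-read with OCR."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        # No extractable text at all: treat it like a scanned page (assumption).
        return True
    avg_len = sum(len(ln) for ln in lines) / len(lines)
    return avg_len > max_avg_line_len


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text at paragraph or sentence breaks, with overlapping chunks."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            window = text[start:end]
            # Search backward from the limit: paragraph break first, then sentence.
            for sep in ("\n\n", ". "):
                cut = window.rfind(sep)
                # Require the cut to lie past the overlap so each chunk advances.
                if cut > overlap:
                    end = start + cut + len(sep)
                    break
            # If no break is found, fall back to a hard cut at max_chars.
        chunks.append(text[start:end])
        if end >= len(text):
            break
        # Step back so the next chunk repeats the tail of this one.
        start = end - overlap
    return chunks
```

A hard cut still happens when a window contains no paragraph or sentence break, so a sketch like this only reduces mid-sentence splits rather than eliminating them.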

Our honest failure modes:

Multi-Column Layouts: Our OCR currently sorts text strictly by the Y-axis (top-to-bottom). So on a 2-column academic paper, it mistakenly reads straight across the page (left-to-right), which jumbles the text.
Tables: We extract the text, but the grid structure gets completely flattened to plain text. If you ask the AI to "compare row 3 and 4," it struggles to understand the spatial formatting.

We are actively working on a layout-aware X/Y sorting update to fix columns and tables!
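
To make the column problem concrete, here is a small Python sketch of why Y-only sorting jumbles a two-column page, plus one simple way to order by column first. It is an illustration under stated assumptions, not the planned update: the `Box` type, the top-left coordinate origin, and the fixed `split_x=0.5` column boundary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge, normalized 0..1 (assumed)
    y: float  # top edge, normalized 0..1, increasing downward (assumed)
    text: str


def y_only_order(boxes: list[Box]) -> list[str]:
    """Sort top-to-bottom only; lines from both columns interleave."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def two_column_order(boxes: list[Box], split_x: float = 0.5) -> list[str]:
    """Read the whole left column first, then the whole right column."""
    left = sorted((b for b in boxes if b.x < split_x), key=lambda b: b.y)
    right = sorted((b for b in boxes if b.x >= split_x), key=lambda b: b.y)
    return [b.text for b in left + right]


page = [
    Box(0.05, 0.10, "L1"), Box(0.55, 0.10, "R1"),
    Box(0.05, 0.20, "L2"), Box(0.55, 0.20, "R2"),
]
print(y_only_order(page))      # ['L1', 'R1', 'L2', 'R2']
print(two_column_order(page))  # ['L1', 'L2', 'R1', 'R2']
```

A fixed midpoint split only works for clean two-column pages. Full-width headings, figures, and tables would need real column detection, which this sketch does not attempt.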

9h ago
