Stan Girard

Megaparse [LW24] - Open-source Document Parser to Markdown with OCR/LLMs

Megaparse is a file parser optimized for LLM ingestion. It parses PDF, DOCX, and PPTX files into a format that is ideal for LLMs. All of that is accessible from a Python package, an API, or a queue.

Replies

Tony Tong
Megaparse sounds super useful for prepping docs for LLMs! Love the flexibility with Python, API, or queue. Does it handle complex layouts or metadata well?
Ioannis Tsiokos
Love it. Markdown is becoming the de facto standard for AI input processing, and proper conversion to it (without having to install a million packages) will be paramount.
Robin Philibert
Really nice! Open source, with OCR and table optimization, perfect for LLM workflows. Congrats to the team! 🙌