Olga
A Rust engine for Intelligent Document Processing, exposed through strictly typed Python bindings (PyO3 + maturin). Four formats, one API: PDF, DOCX, XLSX, HTML → clean structure, cross-page tables, RAG chunks, pre-flight processability.
# Ten-second tour. import olgadoc doc = olgadoc.Document.open("report.pdf") print(doc.format, doc.page_count) # ('PDF', 12) # Health pre-flight report = doc.processability() if report.is_blocked(): raise SystemExit(report.blockers) # Structured JSON tree — mypy-narrowed for el in doc.to_json()["elements"]: if el["type"] == "heading": print(f"h{el['level']}: {el['text']}")