
JSON tree

Document.to_json() returns a full structural tree of the document. Every node is discriminated on its type literal, so a type checker such as mypy --strict narrows each branch to exactly one variant.
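As a rough illustration of that narrowing, the sketch below declares two stand-in variants and branches on the type key. HeadingSketch, ParagraphSketch and the text and level field names are assumptions made for this example; the library's own TypedDicts may carry different or additional keys.

```python
from typing import Literal, TypedDict, Union


# Hypothetical stand-ins for two variants; the real TypedDicts may differ.
class HeadingSketch(TypedDict):
    type: Literal["heading"]
    text: str
    level: int


class ParagraphSketch(TypedDict):
    type: Literal["paragraph"]
    text: str


ElementSketch = Union[HeadingSketch, ParagraphSketch]


def describe(node: ElementSketch) -> str:
    """Return a one-line summary, branching on the ``type`` discriminator."""
    if node["type"] == "heading":
        # A type checker treats ``node`` as HeadingSketch here, so ``level`` is known.
        return f"h{node['level']}: {node['text']}"
    # Only ParagraphSketch remains in this branch.
    return f"p: {node['text']}"


heading: HeadingSketch = {"type": "heading", "text": "Intro", "level": 2}
paragraph: ParagraphSketch = {"type": "paragraph", "text": "Hello."}
print(describe(heading))
print(describe(paragraph))
```

Accessing node["level"] in the paragraph branch would be reported by the type checker, which is the practical benefit of the literal discriminator.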

Top-level

DocumentJson

Bases: TypedDict

Full structured JSON tree returned by Document.to_json().

The tree carries every structural element the engine resolved, along with document-level metadata, per-page geometry, and (when relevant) decoder warnings. The payload round-trips cleanly through json.dumps() / json.loads().
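The round-trip property can be checked with a few lines of standard-library code. The dictionary below is a hand-written stand-in for the value Document.to_json() returns; its key names (children, text) are illustrative assumptions, not the library's schema.

```python
import json

# Hypothetical stand-in for the dict returned by Document.to_json();
# the key names are illustrative only.
tree = {
    "type": "document",
    "children": [{"type": "paragraph", "text": "Hello."}],
}

encoded = json.dumps(tree)     # serialize to a JSON string
decoded = json.loads(encoded)  # parse it back into plain dicts and lists
round_trips = decoded == tree  # equal when only JSON-native types are present
print(round_trips)
```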

JsonSource

Bases: TypedDict

Document-level metadata block inside the JSON tree.

JsonPageInfo

Bases: TypedDict

Per-page geometry entry inside the JSON tree's pages array.

JsonWarning

Bases: TypedDict

Decoder or structuring warning embedded in the JSON tree.

JsonBBox

Bases: TypedDict

Bounding box in normalized 0–1 page coordinates inside the JSON tree.
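Because the coordinates are normalized, a consumer has to scale them by the page size to get absolute positions. The sketch below shows one way to do that; the x0, y0, x1, y1 field names and the page_width and page_height parameters are assumptions for illustration, so check JsonBBox and JsonPageInfo for the actual keys.

```python
from typing import Tuple, TypedDict


class BBoxSketch(TypedDict):
    # Hypothetical field names; the real JsonBBox keys may differ.
    x0: float
    y0: float
    x1: float
    y1: float


def to_absolute(
    bbox: BBoxSketch, page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a normalized 0-to-1 box to page units."""
    return (
        bbox["x0"] * page_width,
        bbox["y0"] * page_height,
        bbox["x1"] * page_width,
        bbox["y1"] * page_height,
    )


box: BBoxSketch = {"x0": 0.25, "y0": 0.5, "x1": 0.75, "y1": 1.0}
print(to_absolute(box, 600.0, 800.0))
```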

Discriminated element union

JsonElementType (module attribute)

JsonElementType = Literal['document', 'section', 'heading', 'paragraph', 'table', 'table_row', 'table_cell', 'list', 'list_item', 'image', 'code_block', 'block_quote', 'page_header', 'page_footer', 'footnote', 'aligned_line']
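When JSON arrives from an untrusted or older source, it can help to check a type string against this alias at runtime. The sketch below re-declares the alias locally so it is self-contained and uses typing.get_args to recover the allowed values; is_known_type is a hypothetical helper, not part of the library.

```python
from typing import Literal, get_args

# Local copy of the alias above so the sketch runs on its own.
JsonElementType = Literal[
    "document", "section", "heading", "paragraph", "table", "table_row",
    "table_cell", "list", "list_item", "image", "code_block", "block_quote",
    "page_header", "page_footer", "footnote", "aligned_line",
]

# get_args() returns the literal values as a tuple of strings.
KNOWN_TYPES = frozenset(get_args(JsonElementType))


def is_known_type(value: object) -> bool:
    """Hypothetical helper: True if ``value`` is one of the declared type literals."""
    return isinstance(value, str) and value in KNOWN_TYPES


print(is_known_type("table_cell"))
print(is_known_type("figure"))
```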

Element variants

JsonDocumentElement

Bases: TypedDict

Root document element in the JSON tree.
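A common first step with the root element is a depth-first walk over the tree. The sketch below assumes nested elements live under a children key, which this page does not confirm; walk and the sample data are hypothetical and only illustrate the traversal pattern.

```python
from collections import Counter
from typing import Any, Dict, Iterator


def walk(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield ``node`` and then every descendant, depth first."""
    yield node
    # ``children`` is an assumed key name; leaves simply have none.
    for child in node.get("children", []):
        yield from walk(child)


# Hand-written sample tree, not real library output.
root = {
    "type": "document",
    "children": [
        {
            "type": "section",
            "children": [
                {"type": "heading", "text": "Intro", "level": 1},
                {"type": "paragraph", "text": "First."},
                {"type": "paragraph", "text": "Second."},
            ],
        }
    ],
}

counts = Counter(node["type"] for node in walk(root))
print(counts["paragraph"], counts["heading"])
```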

JsonSectionElement

Bases: TypedDict

A section element (logical grouping under a heading).

JsonHeadingElement

Bases: TypedDict

A heading element with its text and level.

JsonParagraphElement

Bases: TypedDict

A paragraph element with its text content.

JsonTableElement

Bases: TypedDict

A table element with headers, data grid, and optional detailed cells.

JsonTableRowElement

Bases: TypedDict

A table_row element — normally inlined into its parent table.

JsonTableCellElement

Bases: TypedDict

A table_cell element with its row/column position and text.

JsonTableCellDetail

Bases: TypedDict

Detailed cell inside the cells array of a JsonTableElement.

The cells array is only emitted when at least one cell has a non-trivial rowspan or colspan. rowspan, colspan and is_header are only present when they depart from their defaults.
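Since these keys may be absent, consumers should read them with explicit fallbacks. The sketch below assumes defaults of 1 for rowspan and colspan and False for is_header; this page does not state the defaults, so treat those values and the cell_span helper as assumptions.

```python
from typing import Any, Dict, Tuple


def cell_span(cell: Dict[str, Any]) -> Tuple[int, int, bool]:
    """Return (rowspan, colspan, is_header), filling in assumed defaults."""
    return (
        cell.get("rowspan", 1),        # assumed default: spans one row
        cell.get("colspan", 1),        # assumed default: spans one column
        cell.get("is_header", False),  # assumed default: a data cell
    )


# Hand-written sample cells, not real library output.
plain_cell = {"text": "42"}
merged_header = {"text": "Totals", "colspan": 3, "is_header": True}
print(cell_span(plain_cell))
print(cell_span(merged_header))
```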

JsonListElement

Bases: TypedDict

A list element (ordered distinguishes ordered from bulleted).

JsonListItemElement

Bases: TypedDict

A list_item element with its text content.

JsonImageElement

Bases: TypedDict

An image element with its MIME format and optional alt text.

JsonCodeBlockElement

Bases: TypedDict

A code_block element with optional language tag.

JsonBlockQuoteElement

Bases: TypedDict

A block_quote element with its text content.

JsonPageHeaderElement

Bases: TypedDict

A page_header element — recurring header text on a page.

JsonPageFooterElement

Bases: TypedDict

A page_footer element — recurring footer text on a page.

JsonFootnoteElement

Bases: TypedDict

A footnote element carrying its identifier and text.

JsonAlignedLineElement

Bases: TypedDict

A layout-preserving line with optional per-span formatting.

JsonSpan

Bases: TypedDict

A formatted run inside a JsonAlignedLineElement.

bold, italic and link are only emitted when at least one span on the line carries formatting — a plain paragraph line omits the spans array entirely.
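Code that consumes aligned lines therefore has to tolerate both a missing spans array and missing formatting keys. The sketch below renders a line to Markdown-like text under several assumptions: that the line and each span carry a text key, and that link holds a URL string. render_line is a hypothetical helper, not a library function.

```python
from typing import Any, Dict, List


def render_line(line: Dict[str, Any]) -> str:
    """Render an aligned line, falling back to plain text when no spans exist."""
    spans = line.get("spans")
    if not spans:
        # Plain line: no spans array, so use the line's own text (assumed key).
        return line.get("text", "")
    parts: List[str] = []
    for span in spans:
        text = span.get("text", "")
        if span.get("bold"):
            text = f"**{text}**"
        if span.get("italic"):
            text = f"*{text}*"
        if span.get("link"):
            # Assumes ``link`` is a URL string.
            text = f"[{text}]({span['link']})"
        parts.append(text)
    return "".join(parts)


# Hand-written sample lines, not real library output.
plain = {"type": "aligned_line", "text": "Total  42"}
styled = {
    "type": "aligned_line",
    "text": "Total 42",
    "spans": [
        {"text": "Total ", "bold": True},
        {"text": "42", "link": "https://example.com"},
    ],
}
print(render_line(plain))
print(render_line(styled))
```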