JSON tree¶
Document.to_json() returns the full structural tree of the document.
Every node is discriminated on its type literal, so mypy --strict
narrows the element union to exactly one variant per branch.
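As a rough illustration of that narrowing, the sketch below uses two hypothetical stand-in TypedDicts (HeadingStub and ParagraphStub) rather than the library's own classes. Only the type discriminator is taken from this page; the text and level key names are assumptions.

```python
from typing import Literal, TypedDict, Union


# Hypothetical stand-ins for two element variants. Only the "type"
# discriminator comes from the docs; the other keys are assumed.
class HeadingStub(TypedDict):
    type: Literal["heading"]
    text: str
    level: int


class ParagraphStub(TypedDict):
    type: Literal["paragraph"]
    text: str


ElementStub = Union[HeadingStub, ParagraphStub]


def describe(el: ElementStub) -> str:
    # Comparing the "type" key against a literal lets a type checker
    # narrow the union to a single TypedDict in each branch.
    if el["type"] == "heading":
        return f"h{el['level']}: {el['text']}"
    # Only ParagraphStub remains here.
    return f"p: {el['text']}"
```

Accessing el["level"] in the second branch would be rejected by a strict type checker, which is the practical benefit of the discriminated union.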
Top-level¶
DocumentJson ¶
Bases: TypedDict
Full structured JSON tree returned by Document.to_json().
The tree carries every structural element the engine resolved, along
with document-level metadata, per-page geometry, and (when relevant)
decoder warnings. The payload round-trips cleanly through
json.dumps() / json.loads().
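The round-trip property can be checked with the standard json module. This sketch uses a small hand-written dictionary in place of a real Document.to_json() result; its key names (children, text) are assumptions, not the documented schema.

```python
import json

# Hand-written stand-in for a Document.to_json() result. The key names
# are assumptions for illustration only.
tree = {
    "type": "document",
    "children": [{"type": "paragraph", "text": "Hello"}],
}

encoded = json.dumps(tree)      # serialize to a JSON string
decoded = json.loads(encoded)   # parse it back into plain dicts and lists

# A payload made only of JSON-native values compares equal after the trip.
round_trips = decoded == tree
```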
JsonSource ¶
Bases: TypedDict
Document-level metadata block inside the JSON tree.
JsonPageInfo ¶
Bases: TypedDict
Per-page geometry entry inside the JSON tree's pages array.
JsonWarning ¶
Bases: TypedDict
Decoder or structuring warning embedded in the JSON tree.
JsonBBox ¶
Bases: TypedDict
Bounding box in normalized 0–1 page coordinates inside the JSON tree.
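Normalized coordinates have to be scaled by the page size before they can be drawn or cropped. The sketch below assumes hypothetical key names x0, y0, x1 and y1; this page states only that the values are normalized to the 0–1 range, so check the actual field names before relying on it.

```python
from typing import Tuple, TypedDict


class BBoxStub(TypedDict):
    # Hypothetical key names; only the 0-1 normalization comes from the docs.
    x0: float
    y0: float
    x1: float
    y1: float


def to_pixels(
    bbox: BBoxStub, page_width: float, page_height: float
) -> Tuple[float, float, float, float]:
    """Scale a normalized box to absolute units for a page of the given size."""
    return (
        bbox["x0"] * page_width,
        bbox["y0"] * page_height,
        bbox["x1"] * page_width,
        bbox["y1"] * page_height,
    )
```

The page width and height would presumably come from the matching JsonPageInfo entry, although its field names are not listed here.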
Discriminated element union¶
JsonElement (module attribute) ¶
JsonElement = Union[JsonDocumentElement, JsonSectionElement, JsonHeadingElement, JsonParagraphElement, JsonTableElement, JsonTableRowElement, JsonTableCellElement, JsonListElement, JsonListItemElement, JsonImageElement, JsonCodeBlockElement, JsonBlockQuoteElement, JsonPageHeaderElement, JsonPageFooterElement, JsonFootnoteElement, JsonAlignedLineElement]
JsonElementType (module attribute) ¶
JsonElementType = Literal['document', 'section', 'heading', 'paragraph', 'table', 'table_row', 'table_cell', 'list', 'list_item', 'image', 'code_block', 'block_quote', 'page_header', 'page_footer', 'footnote', 'aligned_line']
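A Literal alias like this can also be turned into a runtime set with typing.get_args, which is handy for validating untrusted input. The sketch redeclares the literal locally under the hypothetical name ElementType so that it runs without the library installed.

```python
from typing import Literal, get_args

# Local copy of the JsonElementType literal, so the snippet is self-contained.
ElementType = Literal[
    "document", "section", "heading", "paragraph", "table", "table_row",
    "table_cell", "list", "list_item", "image", "code_block", "block_quote",
    "page_header", "page_footer", "footnote", "aligned_line",
]

# get_args returns the literal values as a tuple of strings.
KNOWN_TYPES = frozenset(get_args(ElementType))


def is_known_type(value: str) -> bool:
    """Return True when value is one of the sixteen element type strings."""
    return value in KNOWN_TYPES
```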
Element variants¶
JsonDocumentElement ¶
Bases: TypedDict
Root document element in the JSON tree.
JsonSectionElement ¶
Bases: TypedDict
A section element (logical grouping under a heading).
JsonHeadingElement ¶
Bases: TypedDict
A heading element with its text and level.
JsonParagraphElement ¶
Bases: TypedDict
A paragraph element with its text content.
JsonTableElement ¶
Bases: TypedDict
A table element with headers, data grid, and optional detailed cells.
JsonTableRowElement ¶
Bases: TypedDict
A table_row element — normally inlined into its parent table.
JsonTableCellElement ¶
Bases: TypedDict
A table_cell element with its row/column position and text.
JsonTableCellDetail ¶
Bases: TypedDict
Detailed cell inside the cells array of a JsonTableElement.
The cells array is only emitted when at least one cell has a
non-trivial rowspan or colspan. rowspan, colspan and
is_header are only present when they depart from their defaults.
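Because those three keys are omitted when they hold their defaults, consumers usually want to fill them back in before processing. The helper below is a minimal sketch; normalize_cell is a hypothetical name, and the default values (1, 1 and False) are assumptions, since this page does not state them.

```python
from typing import Any, Dict, Mapping


def normalize_cell(cell: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of cell with the optional keys filled in."""
    filled = dict(cell)
    # Assumed defaults: a cell covers one row and one column and is not a header.
    filled.setdefault("rowspan", 1)
    filled.setdefault("colspan", 1)
    filled.setdefault("is_header", False)
    return filled
```

Working on a copy leaves the original tree untouched, so it still serializes exactly as the engine emitted it.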
JsonListElement ¶
Bases: TypedDict
A list element (the ordered flag distinguishes ordered lists from bulleted ones).
JsonListItemElement ¶
Bases: TypedDict
A list_item element with its text content.
JsonImageElement ¶
Bases: TypedDict
An image element with its MIME format and optional alt text.
JsonCodeBlockElement ¶
Bases: TypedDict
A code_block element with optional language tag.
JsonBlockQuoteElement ¶
Bases: TypedDict
A block_quote element with its text content.
JsonPageHeaderElement ¶
Bases: TypedDict
A page_header element — recurring header text on a page.
JsonPageFooterElement ¶
Bases: TypedDict
A page_footer element — recurring footer text on a page.
JsonFootnoteElement ¶
Bases: TypedDict
A footnote element carrying its identifier and text.
JsonAlignedLineElement ¶
Bases: TypedDict
A layout-preserving line with optional per-span formatting.
JsonSpan ¶
Bases: TypedDict
A formatted run inside a JsonAlignedLineElement.
bold, italic and link are only emitted when at least one
span on the line carries formatting; a plain paragraph line omits the
spans array entirely.
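Since the spans array may be absent, code that renders a line needs a fallback path. The sketch below converts a line to Markdown under several assumptions: the line and each span carry a text key, and link holds a URL string. The function name line_to_markdown is hypothetical.

```python
from typing import Any, List, Mapping


def line_to_markdown(line: Mapping[str, Any]) -> str:
    """Render an aligned line, falling back to plain text without spans."""
    spans: List[Mapping[str, Any]] = line.get("spans", [])
    if not spans:
        # Plain line: no spans array, so use the line's own text (assumed key).
        return line.get("text", "")
    parts = []
    for span in spans:
        text = span.get("text", "")
        # Formatting keys are optional, so read them with .get().
        if span.get("bold"):
            text = f"**{text}**"
        if span.get("italic"):
            text = f"*{text}*"
        if span.get("link"):
            text = f"[{text}]({span['link']})"
        parts.append(text)
    return "".join(parts)
```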