Payloads
Every method that returns a dict is annotated with a real TypedDict. These
are runtime classes, introspectable with __annotations__, and fully typed
for mypy --strict consumers; the returned values themselves are plain dicts.
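A minimal sketch of what "runtime class" means in practice. ExampleBox below is a hypothetical stand-in, not one of this library's payloads, and its field names are invented for illustration; the same introspection calls apply to any TypedDict class.

```python
from typing import TypedDict, get_type_hints


class ExampleBox(TypedDict):
    # Hypothetical fields, for illustration only.
    x0: float
    y0: float
    x1: float
    y1: float


# The class exists at runtime, so its declared fields can be inspected.
fields = ExampleBox.__annotations__
# get_type_hints also resolves string (forward-reference) annotations.
hints = get_type_hints(ExampleBox)

# Instances are ordinary dicts; the TypedDict only informs the type checker.
box: ExampleBox = {"x0": 0.0, "y0": 0.0, "x1": 10.0, "y1": 20.0}
print(type(box) is dict)  # True
```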
Geometry
BoundingBox
Bases: TypedDict
Axis-aligned bounding box in page coordinates (points).
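A sketch of typical arithmetic on an axis-aligned box measured in points (1 point = 1/72 inch). The field names x0, y0, x1, y1 and the assumption that x0 <= x1 and y0 <= y1 are hypothetical; check BoundingBox.__annotations__ for the real keys before adapting this.

```python
from typing import Optional, TypedDict

POINTS_PER_INCH = 72.0  # 1 PostScript/PDF point = 1/72 inch


class Box(TypedDict):
    # Hypothetical field names; the real BoundingBox keys may differ.
    x0: float
    y0: float
    x1: float
    y1: float


def width(b: Box) -> float:
    return b["x1"] - b["x0"]


def height(b: Box) -> float:
    return b["y1"] - b["y0"]


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Overlap of two axis-aligned boxes, or None if they do not overlap."""
    x0, y0 = max(a["x0"], b["x0"]), max(a["y0"], b["y0"])
    x1, y1 = min(a["x1"], b["x1"]), min(a["y1"], b["y1"])
    if x0 >= x1 or y0 >= y1:
        return None
    return {"x0": x0, "y0": y0, "x1": x1, "y1": y1}


page_box: Box = {"x0": 0.0, "y0": 0.0, "x1": 612.0, "y1": 792.0}  # US Letter
print(width(page_box) / POINTS_PER_INCH, height(page_box) / POINTS_PER_INCH)  # 8.5 11.0
```

Because the box is axis-aligned, overlap reduces to clamping each axis independently; the result does not depend on whether y grows up or down, as long as each box is normalized.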
PageDimensions
Bases: TypedDict
Effective page dimensions returned by the Page.dimensions attribute.
Extracted content
Link
Bases: TypedDict
A hyperlink extracted from a document.
Table
Bases: TypedDict
A reconstructed table, potentially spanning multiple pages.
TableCell
Bases: TypedDict
A single cell inside a Table.
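One common way to consume table cells is to place them into a dense row-major grid. This is a sketch under assumptions: the cell keys row, col and text are hypothetical (the real TableCell fields are not listed here), and row or column spans are ignored.

```python
from typing import List, TypedDict


class Cell(TypedDict):
    # Hypothetical fields: zero-based grid position plus cell text.
    row: int
    col: int
    text: str


def to_grid(cells: List[Cell]) -> List[List[str]]:
    """Place cells into a dense row-major grid; missing positions become ""."""
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    return grid


cells: List[Cell] = [
    {"row": 0, "col": 0, "text": "Name"},
    {"row": 0, "col": 1, "text": "Qty"},
    {"row": 1, "col": 0, "text": "Bolt"},
    {"row": 1, "col": 1, "text": "12"},
]
for r in to_grid(cells):
    print(" | ".join(r))
```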
SearchHit
Bases: TypedDict
A single match returned by the Document.search or Page.search method.
Chunk
Bases: TypedDict
A per-page text chunk returned by the Document.chunks_by_page method.
OutlineEntry
Bases: TypedDict
A single heading in the result of the Document.outline method.
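A sketch of rendering outline entries as an indented table of contents. It assumes the outline is a flat list in document order and that each entry has a numeric nesting level and a title; the keys level and title are hypothetical, not the documented OutlineEntry schema.

```python
from typing import List, TypedDict


class Heading(TypedDict):
    # Hypothetical fields: 1 = top-level heading, 2 = subsection, ...
    level: int
    title: str


def render_outline(entries: List[Heading], indent: str = "  ") -> str:
    """Render a flat, document-ordered heading list as an indented text outline."""
    return "\n".join(indent * (e["level"] - 1) + e["title"] for e in entries)


outline: List[Heading] = [
    {"level": 1, "title": "Introduction"},
    {"level": 2, "title": "Scope"},
    {"level": 1, "title": "Results"},
]
print(render_outline(outline))
```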
ExtractedImage
Bases: TypedDict
A raster image extracted from a document.
alt_text is required to be present as a key but may be None when
the source format does not carry alt text.
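In typing terms, "required key whose value may be None" is a key annotated Optional[str] in a total TypedDict, as opposed to a NotRequired[str] key, which may be absent entirely. The sketch below uses a hypothetical Image stand-in; only the alt_text contract mirrors the description above, and the data field is invented for illustration.

```python
from typing import Optional, TypedDict


class Image(TypedDict):
    # Hypothetical stand-in; only alt_text mirrors the documented contract.
    alt_text: Optional[str]  # key always present, value may be None
    data: bytes              # invented field, for illustration only


def describe(img: Image) -> str:
    # No .get() or "in" check needed: the key is guaranteed to exist,
    # but the value still has to be checked for None.
    alt = img["alt_text"]
    return alt if alt is not None else "(no alt text)"


# At runtime the class records which keys are required.
print("alt_text" in Image.__required_keys__)  # True
```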
Processability
HealthIssueKind (module attribute)
HealthIssueKind = Literal['Encrypted', 'EmptyContent', 'DecodeFailed', 'ApproximatePagination', 'HeuristicStructure', 'PartialExtraction', 'MissingPart', 'UnresolvedStyle', 'UnresolvedRelationship', 'MissingMedia', 'TruncatedContent', 'MalformedContent', 'FilteredArtifact', 'SuspectedArtifact', 'OtherWarnings']
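Because HealthIssueKind is a Literal, its allowed values can be recovered at runtime with typing.get_args, which is handy for validating strings from logs or config. The alias is re-declared locally here (copied from the definition above) since the import path for this module is not shown; is_health_issue_kind is a hypothetical helper.

```python
from typing import Literal, get_args

# Copied from the HealthIssueKind definition above.
HealthIssueKind = Literal[
    "Encrypted", "EmptyContent", "DecodeFailed", "ApproximatePagination",
    "HeuristicStructure", "PartialExtraction", "MissingPart", "UnresolvedStyle",
    "UnresolvedRelationship", "MissingMedia", "TruncatedContent",
    "MalformedContent", "FilteredArtifact", "SuspectedArtifact", "OtherWarnings",
]

# get_args exposes the allowed values at runtime, in declaration order.
ALL_KINDS = get_args(HealthIssueKind)


def is_health_issue_kind(value: str) -> bool:
    """Hypothetical helper: True if value is one of the declared kinds."""
    return value in ALL_KINDS


print(len(ALL_KINDS))  # 15
```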
HealthIssue (module attribute)
HealthIssue = Union[HealthIssueSimple, HealthIssueApproximatePagination, HealthIssueHeuristicStructure, HealthIssueCounted]
HealthIssueSimple
Bases: TypedDict
Health issue variants that carry no extra payload.
HealthIssueApproximatePagination
Bases: TypedDict
Page boundaries are approximate: the engine inferred them heuristically.
HealthIssueHeuristicStructure
Bases: TypedDict
Document structure was reconstructed heuristically on the pages listed in its pages field.
HealthIssueCounted
Bases: TypedDict
Every health variant that carries an occurrence count.
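HealthIssue is a union of TypedDicts, so consumers typically branch on a tag to tell the variants apart. This sketch assumes a hypothetical "kind" tag key and hypothetical "pages" and "count" fields; which kinds belong to which variant is also a guess, so treat the shapes below as illustrative rather than the real schema.

```python
from typing import List, Literal, TypedDict, Union


# Hypothetical shapes: the tag key "kind", the fields "pages" and "count",
# and the kind-to-variant assignment are illustrative assumptions.
class Simple(TypedDict):
    kind: Literal["Encrypted", "EmptyContent"]


class HeuristicStructure(TypedDict):
    kind: Literal["HeuristicStructure"]
    pages: List[int]


class Counted(TypedDict):
    kind: Literal["MissingMedia", "FilteredArtifact"]
    count: int


Issue = Union[Simple, HeuristicStructure, Counted]


def summarize(issue: Issue) -> str:
    # Comparing the Literal tag lets a type checker narrow the union.
    if issue["kind"] == "HeuristicStructure":
        return f"heuristic structure on pages {issue['pages']}"
    # Runtime key check to pick out the counted variants.
    if "count" in issue:
        return f"{issue['kind']} x{issue['count']}"
    return issue["kind"]


print(summarize({"kind": "MissingMedia", "count": 3}))  # MissingMedia x3
```

Branching on a Literal-typed tag is the "tagged union" pattern that mypy documents for TypedDict unions, which keeps the branches type-safe under --strict.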