Page

A single page inside a :class:`Document`.

Pages are 1-indexed. Obtain them via :meth:`Document.pages` (all pages) or :meth:`Document.page` (a specific page number).

Example

>>> page = doc.page(1)
>>> page.number
1
>>> page.text()[:40]
'Quarterly revenue report — 2024 Q4 ...'

number property

number: int

1-based page number within the parent document.

Returns:

Type  Description
int   An integer greater than or equal to 1.
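Because `number` is 1-based while Python sequences are 0-based, indexing into a list of pages needs an offset of one. A minimal sketch, assuming the pages have been collected into a list in document order; `StubPage` and `page_by_number` are hypothetical names used only for illustration and are not part of the library:

```python
from dataclasses import dataclass


@dataclass
class StubPage:
    """Hypothetical stand-in exposing only the documented `number` property."""
    number: int


def page_by_number(pages, number):
    """Return the page at 1-based position `number` in a list of pages."""
    if number < 1:
        # Without this guard, number=0 would index pages[-1] (the last page).
        raise IndexError("page numbers start at 1")
    return pages[number - 1]  # 1-based number -> 0-based list index


pages = [StubPage(n) for n in (1, 2, 3)]
first = page_by_number(pages, 1)
```

The explicit guard matters because a negative list index is valid Python and would silently return a page from the end of the list.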

dimensions property

dimensions: PageDimensions | None

Effective physical dimensions of the page, when available.

PDF pages expose physical geometry; HTML and XLSX do not.

Returns:

Type                   Description
PageDimensions | None  A :class:`PageDimensions` dict, or None if the format does not carry page geometry.
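Since `dimensions` can be None, callers need a guard before reading any geometry. A minimal sketch that partitions pages by whether geometry is available; `StubPage` and `split_by_geometry` are hypothetical, and the dict contents are placeholders because the keys of `PageDimensions` are not specified here:

```python
class StubPage:
    """Hypothetical stand-in exposing the documented `number` and `dimensions`."""

    def __init__(self, number, dimensions=None):
        self.number = number
        self.dimensions = dimensions


def split_by_geometry(pages):
    """Return (page numbers with geometry, page numbers without geometry)."""
    with_geometry, without_geometry = [], []
    for page in pages:
        target = without_geometry if page.dimensions is None else with_geometry
        target.append(page.number)
    return with_geometry, without_geometry


pages = [
    StubPage(1, {"placeholder": "geometry"}),  # e.g. a PDF page
    StubPage(2, None),                         # e.g. an HTML or XLSX page
]
with_geometry, without_geometry = split_by_geometry(pages)
```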

text

text() -> str

Plain-text rendering of the page.

Returns:

Type  Description
str   The page's text as a UTF-8 string, potentially empty for blank pages or image-only pages without OCR.
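An empty result is how blank or image-only pages show up, so a caller that wants to queue such pages for OCR can test for it. A minimal sketch; `StubPage` and `pages_without_text` are hypothetical, and treating whitespace-only text as empty is an assumption:

```python
class StubPage:
    """Hypothetical stand-in exposing the documented `number` and `text()`."""

    def __init__(self, number, text):
        self.number = number
        self._text = text

    def text(self):
        return self._text


def pages_without_text(pages):
    """Return the numbers of pages whose text is empty or whitespace-only."""
    return [page.number for page in pages if not page.text().strip()]


pages = [
    StubPage(1, "Quarterly revenue report"),
    StubPage(2, ""),      # blank page
    StubPage(3, "  \n"),  # whitespace only
]
empty = pages_without_text(pages)
```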

markdown

markdown() -> str

Markdown rendering of the page with headings and list structure.

Returns:

Type  Description
str   The page's content as GitHub-flavoured markdown.

images

images() -> list[ExtractedImage]

Every raster image that lives on this page.

Returns:

Type                  Description
list[ExtractedImage]  A list of :class:`ExtractedImage` dicts.

image_count

image_count() -> int

Number of images on this page.

Returns:

Type  Description
int   Same as len(page.images()), without materialising the list.
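The count methods make it possible to skip a page before building any list. A minimal sketch, assuming `image_count()` is cheaper than `images()` as the description implies; `StubPage` is a hypothetical stand-in that records whether `images()` was called, and the image dicts are placeholders:

```python
class StubPage:
    """Hypothetical stand-in exposing `image_count()` and `images()`."""

    def __init__(self, images):
        self._images = list(images)
        self.images_called = False  # lets the example show which pages were materialised

    def image_count(self):
        return len(self._images)

    def images(self):
        self.images_called = True
        return list(self._images)


def collect_images(pages):
    """Gather images from all pages, skipping pages that report zero images."""
    collected = []
    for page in pages:
        if page.image_count() == 0:
            continue  # never build the list for image-free pages
        collected.extend(page.images())
    return collected


blank = StubPage([])
illustrated = StubPage([{"placeholder": "image-1"}, {"placeholder": "image-2"}])
all_images = collect_images([blank, illustrated])
```

The same pattern applies to the other count methods on this class.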

links

links() -> list[Link]

Hyperlinks anchored on this page.

Returns:

Type        Description
list[Link]  A list of :class:`Link` dicts.

link_count

link_count() -> int

Number of hyperlinks on this page.

Returns:

Type  Description
int   Same as len(page.links()), without materialising the list.

tables

tables() -> list[Table]

Reconstructed tables whose first page is this one.

Cross-page tables are anchored on their first page; inspect each table's is_cross_page key to detect them.

Returns:

Type         Description
list[Table]  A list of :class:`Table` dicts.
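Separating cross-page tables from single-page ones comes down to reading the `is_cross_page` key on each returned dict. A minimal sketch; `StubPage` and `split_tables` are hypothetical, `label` is a placeholder key, and defaulting a missing `is_cross_page` to False is an assumption:

```python
class StubPage:
    """Hypothetical stand-in exposing only the documented `tables()` method."""

    def __init__(self, tables):
        self._tables = list(tables)

    def tables(self):
        return list(self._tables)


def split_tables(page):
    """Return (single-page tables, cross-page tables) anchored on `page`."""
    single, cross = [], []
    for table in page.tables():
        # A missing key is treated as "not cross-page" (assumption).
        (cross if table.get("is_cross_page", False) else single).append(table)
    return single, cross


page = StubPage([
    {"is_cross_page": False, "label": "A"},  # "label" is a placeholder key
    {"is_cross_page": True, "label": "B"},
])
single, cross = split_tables(page)
```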

table_count

table_count() -> int

Number of tables anchored on this page.

Returns:

Type  Description
int   Same as len(page.tables()), without materialising the list.

search

search(query: str) -> list[SearchHit]

Search for a literal substring inside this page's text.

The match is case-insensitive and substring-based.

Parameters:

Name   Type  Description                                             Default
query  str   The text to look for. An empty string returns no hits.  required

Returns:

Type             Description
list[SearchHit]  A list of :class:`SearchHit` dicts.
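The matching rule can be illustrated with a standalone function. This is a sketch of the documented semantics (literal, case-insensitive, empty query yields nothing), not the library's implementation; `find_offsets` is a hypothetical helper that returns character offsets rather than `SearchHit` dicts, and whether the library reports overlapping hits is not specified here:

```python
import re


def find_offsets(text, query):
    """Return start offsets of non-overlapping, case-insensitive literal matches."""
    if not query:
        return []  # an empty query returns no hits
    pattern = re.escape(query)  # treat the query as literal text, not a regex
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


offsets = find_offsets("Revenue grew; revenue fell", "REVENUE")
# offsets == [0, 14]
```

Escaping the query keeps characters such as `.` or `*` from being read as pattern syntax, which is what "literal substring" implies.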

chunk

chunk() -> Chunk | None

Text chunk produced by the default chunker for this page.

Returns:

Type          Description
Chunk | None  A :class:`Chunk` dict, or None when the page is empty.
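Because `chunk()` returns None for empty pages, code that gathers chunks across a document should filter those out. A minimal sketch; `StubPage` and `collect_chunks` are hypothetical, and the chunk dicts are placeholders because the keys of `Chunk` are not specified here:

```python
class StubPage:
    """Hypothetical stand-in exposing only the documented `chunk()` method."""

    def __init__(self, chunk):
        self._chunk = chunk

    def chunk(self):
        return self._chunk


def collect_chunks(pages):
    """Return the chunks of all non-empty pages, in page order."""
    chunks = []
    for page in pages:
        chunk = page.chunk()
        if chunk is not None:  # empty pages yield None
            chunks.append(chunk)
    return chunks


pages = [
    StubPage({"placeholder": "chunk-1"}),
    StubPage(None),  # empty page
    StubPage({"placeholder": "chunk-3"}),
]
chunks = collect_chunks(pages)
```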