Processability

Health report for an opened Document.

A Processability instance tells you, before the rest of your pipeline starts spending money, whether the document actually carries native, extractable text and how cleanly Olga can process it. It distinguishes blockers, which stop processing outright (most commonly EmptyContent on scanned PDFs that need OCR upstream, plus Encrypted and DecodeFailed), from degradations, which still allow processing but reduce fidelity.

Example
report = doc.processability()
report.health              # -> 'degraded'
report.is_processable      # -> True
[i["kind"] for i in report.degradations]
# -> ['HeuristicStructure', 'PartialExtraction']
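
The blocker/degradation split lends itself to a small gate in front of an expensive pipeline. The following is a minimal sketch, not part of Olga: the gate function and its return labels are hypothetical, and a SimpleNamespace stands in for the report that doc.processability() would return, assuming only the attributes documented on this page.

```python
from types import SimpleNamespace


def gate(report):
    """Decide what to do with a document from its processability report.

    Hypothetical helper: it only reads the documented attributes
    is_processable, blockers and degradations.
    """
    if not report.is_processable:
        # Blocked: report which issues stopped processing.
        return ("skip", [issue["kind"] for issue in report.blockers])
    if report.degradations:
        # Processable, but with reduced fidelity.
        return ("process_with_caution",
                [issue["kind"] for issue in report.degradations])
    return ("process", [])


# Stand-in for doc.processability(); a real report comes from an opened Document.
fake_report = SimpleNamespace(
    health="blocked",
    is_processable=False,
    blockers=[{"kind": "EmptyContent"}],
    degradations=[],
)

print(gate(fake_report))
```

Returning the issue kinds alongside the decision keeps the reason for skipping a document available for logging.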

health property

health: HealthLabel

The overall health verdict.

Returns:

Type Description
HealthLabel

One of "ok", "degraded" or "blocked".

is_processable property

is_processable: bool

Whether the document can be processed at all.

Returns:

Type Description
bool

False only when health is "blocked".

pages_total property

pages_total: int

Total number of pages detected in the document.

Returns:

Type Description
int

Page count, greater than or equal to zero.

pages_with_content property

pages_with_content: int

Number of pages that carry non-empty text after extraction.

Returns:

Type Description
int

Page count, bounded above by pages_total.
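
Taken together, the two page counts give a rough coverage figure. The helper below is a sketch under the assumption that both values are the plain integers described above; content_coverage is a hypothetical name, not an Olga function, and it guards against the zero-page case that pages_total allows.

```python
def content_coverage(pages_total: int, pages_with_content: int) -> float:
    """Fraction of pages that carry text, in the range 0.0 to 1.0.

    Hypothetical helper. Returns 0.0 for a document with no pages,
    since pages_total may legitimately be zero.
    """
    if pages_total == 0:
        return 0.0
    return pages_with_content / pages_total


# Example with made-up counts: 2 of 8 pages carry extractable text.
print(content_coverage(8, 2))
```

With a real report, the call would be content_coverage(report.pages_total, report.pages_with_content).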

warning_count property

warning_count: int

Total number of warnings emitted while loading the document.

Returns:

Type Description
int

Warning count. 0 for a clean document.

blockers property

blockers: list[HealthIssue]

Issues that prevent processing outright.

Returns:

Type Description
list[HealthIssue]

A list of {"kind": str, ...} dicts. Empty when the document is processable.
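
Because each blocker is a dict with a "kind" key, routing on the kind is straightforward. This sketch assumes only that shape; needs_ocr and blocker_kinds are hypothetical helpers, and the sample list is made up for illustration using kind names mentioned in the class description.

```python
def blocker_kinds(blockers):
    """Return the distinct blocker kinds, sorted for stable output."""
    return sorted({issue["kind"] for issue in blockers})


def needs_ocr(blockers):
    """True when any blocker is EmptyContent.

    The class description associates EmptyContent with scanned PDFs
    that need OCR upstream.
    """
    return any(issue.get("kind") == "EmptyContent" for issue in blockers)


# Made-up sample in the documented {"kind": str, ...} shape.
sample = [{"kind": "EmptyContent"}, {"kind": "Encrypted"}]

print(blocker_kinds(sample))
print(needs_ocr(sample))
```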

degradations property

degradations: list[HealthIssue]

Issues that allow processing but reduce extraction fidelity.

Returns:

Type Description
list[HealthIssue]

A list of {"kind": str, ...} dicts.

is_ok

is_ok() -> bool

Whether the document is fully processable with no degradations.

Returns:

Type Description
bool

True when health is "ok".

is_degraded

is_degraded() -> bool

Whether the document is processable but has at least one degradation.

Returns:

Type Description
bool

True when health is "degraded".

is_blocked

is_blocked() -> bool

Whether the document cannot be processed.

Returns:

Type Description
bool

True when health is "blocked".
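
The three predicates and is_processable are all described in terms of health, so exactly one predicate holds for any report. The class below is an illustrative model of those documented relationships only; ReportModel is a hypothetical stand-in and says nothing about how Olga implements Processability.

```python
from dataclasses import dataclass


@dataclass
class ReportModel:
    """Illustrative stand-in, not Olga's Processability class."""

    health: str  # one of "ok", "degraded" or "blocked"

    @property
    def is_processable(self) -> bool:
        # Documented as False only when health is "blocked".
        return self.health != "blocked"

    def is_ok(self) -> bool:
        return self.health == "ok"

    def is_degraded(self) -> bool:
        return self.health == "degraded"

    def is_blocked(self) -> bool:
        return self.health == "blocked"


for label in ("ok", "degraded", "blocked"):
    model = ReportModel(label)
    print(label, model.is_processable,
          model.is_ok(), model.is_degraded(), model.is_blocked())
```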