Processability
Health report for an opened Document.
A Processability instance tells you — before the rest of your
pipeline starts spending money — whether the document actually
carries native, extractable text, and how cleanly Olga can process
it. It distinguishes blockers (issues that stop processing outright
— most commonly EmptyContent on scanned PDFs that need OCR
upstream, plus Encrypted and DecodeFailed) from
degradations (issues that still allow processing but reduce
fidelity).
Example
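The snippet below is a minimal sketch of gating a pipeline on the report, not the library's own example. StubReport is a hypothetical stand-in that mirrors only attribute names documented on this page; how a real Processability is obtained from a Document is not shown here and is not assumed. Issues are plain strings in the stub, which may not match the real HealthIssue type.

```python
from dataclasses import dataclass, field


@dataclass
class StubReport:
    """Hypothetical stand-in mirroring a few documented Processability attributes."""
    is_processable: bool
    pages_total: int
    pages_with_content: int
    warning_count: int = 0
    blockers: list = field(default_factory=list)      # plain strings, for illustration only
    degradations: list = field(default_factory=list)  # plain strings, for illustration only


def gate(report) -> tuple[bool, str]:
    """Decide whether to continue, and explain the decision in one line."""
    if not report.is_processable:
        reasons = ", ".join(str(b) for b in report.blockers) or "unknown blocker"
        return False, f"skipped: {reasons}"
    note = f"{report.pages_with_content}/{report.pages_total} pages with text"
    if report.degradations:
        note += f", {len(report.degradations)} degradation(s)"
    return True, f"processing: {note}"


scanned = StubReport(False, pages_total=3, pages_with_content=0, blockers=["EmptyContent"])
native = StubReport(True, pages_total=3, pages_with_content=3)

print(gate(scanned))  # (False, 'skipped: EmptyContent')
print(gate(native))   # (True, 'processing: 3/3 pages with text')
```

Because gate only reads attributes, the same function should work on any object that exposes these names, which keeps the check cheap to run before costlier pipeline stages.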
health property
health: HealthLabel
The overall health verdict.
Returns:
| Type | Description |
|---|---|
| HealthLabel | One of |
is_processable property
Whether the document can be processed at all.
Returns:
| Type | Description |
|---|---|
| bool | |
pages_total property
Total number of pages detected in the document.
Returns:
| Type | Description |
|---|---|
| int | Page count, greater than or equal to zero. |
pages_with_content property
Number of pages that carry non-empty text after extraction.
Returns:
| Type | Description |
|---|---|
| int | Page count, bounded above by :attr: |
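The two page counts combine naturally into a coverage ratio, since the number of pages with text cannot exceed the total. The helper below is a hypothetical convenience, not part of the documented API; it takes the two integers directly and guards against a document with zero pages.

```python
def text_coverage(pages_with_content: int, pages_total: int) -> float:
    """Fraction of pages that carry text; 0.0 for a document with no pages."""
    if pages_total <= 0:
        return 0.0  # avoid dividing by zero on an empty document
    return pages_with_content / pages_total


print(text_coverage(7, 10))  # 0.7
print(text_coverage(0, 0))   # 0.0
```

A low ratio on a document that is still processable may indicate a partly scanned file, where only some pages carry native text.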
warning_count property
Total number of warnings emitted while loading the document.
Returns:
| Type | Description |
|---|---|
| int | Warning count. |
blockers property
blockers: list[HealthIssue]
Issues that prevent processing outright.
Returns:
| Type | Description |
|---|---|
| list[HealthIssue] | A list of |
| list[HealthIssue] | is processable. |
degradations property
degradations: list[HealthIssue]
Issues that allow processing but reduce extraction fidelity.
Returns:
| Type | Description |
|---|---|
| list[HealthIssue] | A list of |
is_ok
Whether the document is fully processable with no degradations.
Returns:
| Type | Description |
|---|---|
| bool | |
is_degraded
Whether the document is processable but has at least one degradation.
Returns:
| Type | Description |
|---|---|
| bool | |
is_blocked
Whether the document cannot be processed.
Returns:
| Type | Description |
|---|---|
| bool | |
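Read together, the descriptions of is_ok, is_degraded, and is_blocked suggest three mutually exclusive states. The sketch below spells out one consistent reading of them. It assumes that a blocked document is simply one that is not processable, which the descriptions imply but do not state. The classify function and its string labels are illustrative only and are not the values of HealthLabel.

```python
def classify(is_processable: bool, degradation_count: int) -> str:
    """Map the two underlying facts to exactly one of three states."""
    if not is_processable:
        return "blocked"    # the case is_blocked describes
    if degradation_count > 0:
        return "degraded"   # the case is_degraded describes
    return "ok"             # the case is_ok describes


for processable, n in [(True, 0), (True, 2), (False, 0)]:
    print(processable, n, classify(processable, n))
```

Under this reading, exactly one of the three checks holds for any document, so a caller can branch on them without handling overlapping cases.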