Reader

Note

You must install the reader, pdf and codes extras listed in the Optional dependencies section.

Aiviro Reader allows you to process PDF files and extract crucial information. Whether it’s vendor IDs, customer IDs, total amounts, or more, it simplifies data extraction from invoices.

Invoice Processing

Supported field extraction
Name	Type	Description
language	`Language`	Language of the invoice
customer	`InvoiceCustomer`	Customer details
vendor	`InvoiceVendor`	Vendor details
invoice_id	`str`	Invoice number
invoice_date	`datetime.date`	Date the invoice was issued
due_date	`datetime.date`	Date payment for this invoice is due
tax_date	`datetime.date`	Date the tax was applied to the invoice
order_number	`list`, `str`	Order reference number
primary_total	`InvoiceTotals`	Details about primary amounts (total_amount, total_amount_without_tax, total_tax, amount_due) and currency
secondary_total	`InvoiceTotals`	Details about secondary amounts (total_amount, total_amount_without_tax, total_tax, amount_due) and currency
exchange_rate	`decimal.Decimal`	Exchange rate between primary and secondary currency
variable_symbol	`str`	Variable symbol of the invoice
payment_term	`str`	The terms of payment for the invoice
bank_accounts	`list`, `InvoiceBankAccount`	List of bank accounts
items	`list`, `InvoiceItem`	List of invoice items, filtered by total_amount and total_amount_without_tax
raw_items	`list`, `InvoiceItem`	List of unfiltered invoice items

Invoice Address
Name	Type	Description
house_number	`str`	House number of the address
road	`str`	Road of the address
city	`str`	Name of the city
postal_code	`str`	Postal code of the city
street_address	`str`	Full street address (road combined with house_number)
country_code_A3	`str`	Country code as alpha-3 (3 letters), e.g., CZE, DEU, AUT, SVK, etc.

Invoice totals
Name	Type	Description
total_amount	`decimal.Decimal`	Total amount of the invoice
total_amount_without_tax	`decimal.Decimal`	Total amount of the invoice without tax
total_tax	`decimal.Decimal`	Total tax amount of the invoice
amount_due	`decimal.Decimal`	Amount due for the invoice
currency	`str`	Currency of the invoice

Customer
Name	Type	Description
id	`str`	Customer reference ID
ico	`str`	The Czech company ID number of the customer
tax_id	`str`	The taxpayer number associated with the customer
address	`InvoiceAddress`	Mailing address for the customer
address_recipient	`str`	Name associated with the customer address
name	`str`	Name of the customer

Vendor
Name	Type	Description
id	`str`	Vendor reference ID
ico	`str`	The Czech company ID number of the vendor
tax_id	`str`	The taxpayer number associated with the vendor
address	`InvoiceAddress`	Vendor mailing address
address_recipient	`str`	Name associated with the vendor address
name	`str`	Name of the vendor
is_not_tax_payer	`bool`	Whether the vendor is marked as not being a tax payer

Invoice Bank Account
Name	Type	Description
iban	`str`	IBAN of the bank account
swift	`str`	SWIFT (BIC) code of the bank account
bank_name	`str`	Name of the bank
country_code	`str`	Country code of the bank account, e.g., CZ, DE, etc.
local_account	`LocalBankAccount`	Contains bank code, account number and account prefix

Invoice items
Name	Type	Description
index	`int`	Line item index, starting from 0
item	`str`	Full string text line of the line item
description	`str`	The text description for the invoice line item
quantity	`decimal.Decimal`	The quantity of the item
unit_price	`decimal.Decimal`	The net or gross price of one unit of this item, primarily net
unit	`str`	The unit of the line item, e.g., kg, lb, etc.
product_code	`str`	Product code, product number, SKU, etc.
tax	`decimal.Decimal`	Tax associated with the line item
tax_rate	`decimal.Decimal`	Tax rate associated with the line item
amount	`decimal.Decimal`	Total gross amount of the line item
amount_without_tax	`decimal.Decimal`	Total net amount of the line item
identifier	`str`	Identifier of the item, found in product code, description or item’s content
tags	`list`, `str`	Tags assigned to the item based on the identifier

class aiviro.modules.reader.Language(value)

Supported languages for invoice reader.

CZ = 'cs': Czech

SK = 'sk': Slovak

EN = 'en': English

DE = 'de': German

PL = 'pl': Polish
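The members above map to two-letter language codes. A minimal sketch of turning such a code into a member, for instance before setting document_language, is shown below. It uses a stand-in enum that mirrors the documented values rather than importing the Aiviro class, and `language_from_code` is a hypothetical helper, not part of the library.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Stand-in mirroring the documented members; not the Aiviro class itself."""

    CZ = "cs"
    SK = "sk"
    EN = "en"
    DE = "de"
    PL = "pl"


def language_from_code(code: str) -> Optional[Language]:
    """Return the member for a language code, or None if it is not supported."""
    try:
        # Enum lookup by value raises ValueError for unknown codes.
        return Language(code.strip().lower())
    except ValueError:
        return None


print(language_from_code("CS"))
print(language_from_code("fr"))
```

Returning None for an unsupported code lets the caller fall back to automatic language detection.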

class aiviro.modules.reader.InvoiceReader(pdf_r: PDFRobot, reader_config: ReaderConfig | None = None, global_context: GlobalContext | None = None)

Reads a PDF invoice file and extracts data from it.

Parameters:

pdf_r – PDFRobot instance.
reader_config – Configuration for InvoiceReader.

Note

If you receive a 401 Unauthorized Error, please contact our support team. This error typically indicates that you may be missing the necessary permissions for the API.

Example:

>>> from aiviro.modules.reader import InvoiceReader
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader = InvoiceReader(pdf_r)
...     extracted_data = reader.parse()
...
...     # print value of invoice-id
...     print(extracted_data.invoice_id.value)
...     # '123456789'
...
...     # print items
...     for item in extracted_data.items:
...         print(item.product_code.value)
...         print(item.amount.value)
...     # "ACD-123"
...     # Decimal('100.00')
...     # "DC-456"
...     # Decimal('157.23')

>>> from aiviro.modules.reader import InvoiceReader, ReaderConfig
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader_config = ReaderConfig([r"(\d{4,6})", r"([A-Z]{2}\d{4})"])
...     reader = InvoiceReader(pdf_r, reader_config)
...     extracted_data = reader.parse()
...
...     # print value of order-numbers
...     for order_number in extracted_data.order_number:
...         print(order_number.value)
...     # '4654'
...     # '123456'
...     # 'AC1234'
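The patterns passed to ReaderConfig in the example above are ordinary Python regular expressions, so they can be checked with the standard `re` module before they are handed to the reader. The sketch below does not use Aiviro. How the reader applies the patterns internally (full match or search, keyword handling) is not stated here, so `matching_candidates` is a hypothetical helper that assumes a full match.

```python
import re

# Same patterns as in the ReaderConfig example above.
ORDER_NUMBER_PATTERNS = [r"(\d{4,6})", r"([A-Z]{2}\d{4})"]


def matching_candidates(candidates: list[str], patterns: list[str]) -> list[str]:
    """Return candidates fully matched by at least one pattern (hypothetical helper)."""
    compiled = [re.compile(p) for p in patterns]
    return [c for c in candidates if any(rx.fullmatch(c) for rx in compiled)]


candidates = ["4654", "123456", "AC1234", "12", "ABC", "1234567"]
accepted = matching_candidates(candidates, ORDER_NUMBER_PATTERNS)
print(accepted)
```

With a full match, the seven-digit candidate is rejected. A search-based match would instead accept its first six digits, which is worth keeping in mind when writing patterns.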

property invoice_data: InvoiceData

Return merged extracted data from all processors. Call parse() first to extract data.

Returns:

Extracted data from invoice.

parse(offline_cloud_data: DocumentInvoiceV2 | None = None, isdoc_file: Path | None = None) → InvoiceData

Parse invoice and return extracted data.

Parameters:

offline_cloud_data – If set, cloud processor will not be used and this data will be used instead.
isdoc_file – Path to .isdoc file, which is not included in the pdf.

Returns:

Extracted data from invoice.

add_items_tag(items: list[InvoiceItem], item_identifiers: set[str] | dict[str, set[str]], overwrite: bool = False) → list[InvoiceItem]

Add identifier & tag to items based on their identifiers.

Parameters:

items – List of InvoiceItems.
item_identifiers – Set of identifiers, or a dictionary where the key is a tag name and the value is a set of identifiers.
overwrite – If True, tags will be overwritten, otherwise they will be appended.

Returns:

List of items with added identifiers & tags.
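The tagging idea can be sketched without Aiviro. The sketch assumes an identifier counts as found when it occurs as a substring of the product code, description or full item text; the actual matching rule is not documented here. `SimpleItem` and `tag_items` are hypothetical stand-ins that hold plain strings instead of InvoiceField objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimpleItem:
    """Stand-in for InvoiceItem, holding plain strings."""

    product_code: str
    description: str
    item: str
    identifier: Optional[str] = None
    tags: list[str] = field(default_factory=list)


def tag_items(
    items: list[SimpleItem],
    item_identifiers: set[str] | dict[str, set[str]],
    overwrite: bool = False,
) -> list[SimpleItem]:
    """Assign identifier and tags by substring search (assumed matching rule)."""
    # A plain set carries no tag names, so it is stored under the key None.
    if isinstance(item_identifiers, dict):
        mapping = item_identifiers
    else:
        mapping = {None: set(item_identifiers)}
    for it in items:
        if overwrite:
            it.tags = []  # drop existing tags instead of appending to them
        haystacks = (it.product_code, it.description, it.item)
        for tag, identifiers in mapping.items():
            for ident in sorted(identifiers):
                if any(ident in text for text in haystacks):
                    it.identifier = ident
                    if tag is not None and tag not in it.tags:
                        it.tags.append(tag)
                    break
    return items


items = [
    SimpleItem("ACD-123", "Steel bolts", "ACD-123 Steel bolts 10 pcs"),
    SimpleItem("DC-456", "Transport fee", "DC-456 Transport fee"),
]
tagged = tag_items(items, {"hardware": {"ACD-123"}, "logistics": {"Transport"}})
print([(i.identifier, i.tags) for i in tagged])
```

Passing a plain set instead of a dictionary fills only the identifier and leaves the tags untouched.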

class aiviro.modules.reader.ISDOCProcessor(reader_config: ReaderConfig)

Processor for isdoc files. If you’re processing Invoices, we recommend using InvoiceReader. If for some reason you need to process only isdoc files, use this class.

Example:

>>> import pathlib
>>> from collections import OrderedDict
>>> from aiviro.modules.reader import ISDOCProcessor, ReaderConfig, InvoiceData
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> # Process .isdoc file directly, without PDFRobot
>>> def process_isdoc(isdoc_path: pathlib.Path) -> InvoiceData:
...     processor = ISDOCProcessor(ReaderConfig())
...     processor.isdoc_path = isdoc_path
...     return processor.process(None, OrderedDict())
...
>>> # Process .isdoc file from PDF, file is included as pdf attachment
>>> def process_isdoc_pdf(pdf_path: pathlib.Path) -> InvoiceData:
...     pdf_robot = create_pdf_robot(pdf_path)
...     processor = ISDOCProcessor(ReaderConfig())
...     return processor.process(pdf_robot, OrderedDict())

class aiviro.modules.reader.ReaderConfig(order_number_formats: list[str] = <factory>, order_number_ignore_keywords: bool = False, document_language: ~aiviro.modules.reader.common.keywords.Language | None = None, item_identifiers: set[str] | dict[str, set[str]] = <factory>, known_customer_tax_id: str | None = None, known_vendor_tax_id: str | None = None)

Configuration for InvoiceReader.

Parameters:

order_number_formats – List of regex patterns for the order number. If not provided, default patterns are used.
order_number_ignore_keywords – If set, keywords for the order number are ignored, so the reader tries to find the order number on every page. A keyword is a word or phrase that marks where the order number is located, e.g. “Reference: OD1234”: here “Reference” is the keyword and “OD1234” is the order number.
document_language – If set, auto-detection of the language is skipped.
item_identifiers – If provided, the reader tries to find the identifier in the item’s product_code, description or content. If the identifier is found, it is stored in the item’s identifier field. If a dictionary is provided, the key is the tag name and the value is a set of possible identifiers.
known_customer_tax_id – If provided, the reader detects when vendor and customer data are swapped and corrects them.
known_vendor_tax_id – If provided, the reader detects when vendor and customer data are swapped and corrects them.

Example:

>>> from aiviro.modules.reader import OrderNumberFormats, ReaderConfig
>>>
>>> if __name__ == "__main__":
...     # use predefined patterns (helios pattern '123456789/123')
...     reader_config = ReaderConfig(
...         order_number_formats=OrderNumberFormats.PRIMARY_REGEX_9SLASH3,
...         order_number_ignore_keywords=True,
...         item_identifiers={"CODE1234", "DESC1234", "CONTENT1234"},
...         known_customer_tax_id="CZ12345678"
...     )

class aiviro.modules.reader.InvoiceData(language: Optional[aiviro.modules.reader.common.keywords.Language] = None, customer: aiviro.modules.reader.storage.InvoiceCustomer = InvoiceCustomer(id=InvoiceField(value=None, bound_box=None, page_index=-1), tax_id=InvoiceField(value=None, bound_box=None, page_index=-1), address=InvoiceField(value=None, bound_box=None, page_index=-1), address_recipient=InvoiceField(value=None, bound_box=None, page_index=-1), ico=InvoiceField(value=None, bound_box=None, page_index=-1), name=InvoiceField(value=None, bound_box=None, page_index=-1)), vendor: aiviro.modules.reader.storage.InvoiceVendor = InvoiceVendor(id=InvoiceField(value=None, bound_box=None, page_index=-1), tax_id=InvoiceField(value=None, bound_box=None, page_index=-1), address=InvoiceField(value=None, bound_box=None, page_index=-1), address_recipient=InvoiceField(value=None, bound_box=None, page_index=-1), ico=InvoiceField(value=None, bound_box=None, page_index=-1), name=InvoiceField(value=None, bound_box=None, page_index=-1), is_not_tax_payer=InvoiceField(value=None, bound_box=None, page_index=-1)), shipping_address: aiviro.modules.reader.storage.InvoiceField[aiviro.modules.reader.storage.InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), shipping_address_recipient: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), invoice_id: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), invoice_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), due_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), tax_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), order_number: list[aiviro.modules.reader.storage.InvoiceField[str]] = <factory>, 
primary_total: aiviro.modules.reader.storage.InvoiceTotals = InvoiceTotals(total_amount=InvoiceField(value=None, bound_box=None, page_index=-1), total_amount_without_tax=InvoiceField(value=None, bound_box=None, page_index=-1), total_tax=InvoiceField(value=None, bound_box=None, page_index=-1), amount_due=InvoiceField(value=None, bound_box=None, page_index=-1), currency=InvoiceField(value=None, bound_box=None, page_index=-1)), secondary_total: aiviro.modules.reader.storage.InvoiceTotals = InvoiceTotals(total_amount=InvoiceField(value=None, bound_box=None, page_index=-1), total_amount_without_tax=InvoiceField(value=None, bound_box=None, page_index=-1), total_tax=InvoiceField(value=None, bound_box=None, page_index=-1), amount_due=InvoiceField(value=None, bound_box=None, page_index=-1), currency=InvoiceField(value=None, bound_box=None, page_index=-1)), exchange_rate: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), variable_symbol: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), payment_term: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), bank_accounts: list[aiviro.modules.reader.storage.InvoiceBankAccount] = <factory>, items: list[aiviro.modules.reader.storage.InvoiceItem] = <factory>, raw_items: list[aiviro.modules.reader.storage.InvoiceItem] = <factory>)

get_page_boxes(exclude_fields: set[str] | None = None) → dict[int, list[BoundBox]]

Returns a dictionary of page index and list of BoundBox objects on that page.

Parameters:

exclude_fields – Set of field names to exclude from box extraction.
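The shape of the returned dictionary can be illustrated with a small sketch. It assumes that fields without a bounding box are skipped and that boxes are grouped by the page_index of their field; `Field` and `page_boxes` are hypothetical stand-ins, and boxes are plain tuples rather than BoundBox objects.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional

Box = tuple  # (x_min, y_min, x_max, y_max), a stand-in for BoundBox


@dataclass
class Field:
    """Stand-in for InvoiceField: a value, its box and the page it was found on."""

    value: Any = None
    bound_box: Optional[Box] = None
    page_index: int = -1


def page_boxes(
    fields: dict[str, Field], exclude_fields: Optional[set[str]] = None
) -> dict[int, list[Box]]:
    """Group boxes of located fields by page index, skipping excluded names."""
    excluded = exclude_fields or set()
    grouped: dict[int, list[Box]] = defaultdict(list)
    for name, f in fields.items():
        if name in excluded or f.bound_box is None:
            continue
        grouped[f.page_index].append(f.bound_box)
    return dict(grouped)


fields = {
    "invoice_id": Field("123456789", (10, 10, 90, 30), 0),
    "invoice_date": Field("2024-01-01", (100, 10, 180, 30), 0),
    "due_date": Field(),  # not located, so it has no box
    "variable_symbol": Field("987", (10, 40, 60, 60), 1),
}
boxes = page_boxes(fields, exclude_fields={"invoice_date"})
print(boxes)
```

A result in this shape is convenient for drawing or masking all extracted regions page by page.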

class aiviro.modules.reader.InvoiceAddress(house_number: str, road: str, city: str, postal_code: str, street_address: str, country_code_A3: str)

class aiviro.modules.reader.InvoiceBankAccount(iban: aiviro.modules.reader.storage.InvoiceField[str], swift: aiviro.modules.reader.storage.InvoiceField[str], bank_name: aiviro.modules.reader.storage.InvoiceField[str], country_code: aiviro.modules.reader.storage.InvoiceField[str], local_account: aiviro.modules.reader.storage.LocalBankAccount)

class aiviro.modules.reader.LocalBankAccount(bank_code: aiviro.modules.reader.storage.InvoiceField[str], account_code: aiviro.modules.reader.storage.InvoiceField[str], account_prefix: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1))

property full_account_number: str

Returns full account number in format ‘prefix-account_code/bank_code’.

classmethod from_acc_number(account_number: str, bank_code: str = '') → LocalBankAccount

Creates LocalBankAccount object from account number and bank code.

Parameters:

account_number – usually in the form ‘123456-1234567890/1234’ (prefix-account_code/bank_code), or just ‘123456-1234567890’ (prefix-account_code), or ‘1234567890/1234’ (account_code/bank_code)
bank_code – code of the bank (might be provided already in account_number) - e.g. ‘1234’, ‘ABCD’
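The three accepted shapes of account_number can be illustrated with a regular expression. This is a sketch of the format only, not the library's parser. `split_account_number` is a hypothetical helper, and it assumes that a bank code embedded in the account number takes precedence over the separate bank_code argument.

```python
import re

# Optional "prefix-", mandatory account code, optional "/bank_code".
ACCOUNT_RE = re.compile(
    r"(?:(?P<prefix>\d+)-)?(?P<account_code>\d+)(?:/(?P<bank_code>\w+))?"
)


def split_account_number(account_number: str, bank_code: str = "") -> dict[str, str]:
    """Split an account number into prefix, account code and bank code."""
    match = ACCOUNT_RE.fullmatch(account_number.strip())
    if match is None:
        raise ValueError(f"Unrecognized account number: {account_number!r}")
    return {
        "prefix": match.group("prefix") or "",
        "account_code": match.group("account_code"),
        # Assumption: the embedded bank code wins over the argument.
        "bank_code": match.group("bank_code") or bank_code,
    }


print(split_account_number("123456-1234567890/1234"))
print(split_account_number("123456-1234567890", bank_code="ABCD"))
print(split_account_number("1234567890/1234"))
```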

class aiviro.modules.reader.InvoiceItem(index: int, item: aiviro.modules.reader.storage.InvoiceField[str], description: aiviro.modules.reader.storage.InvoiceField[str], quantity: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], unit_price: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], unit: aiviro.modules.reader.storage.InvoiceField[str], product_code: aiviro.modules.reader.storage.InvoiceField[str], tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], tax_rate: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], amount: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], amount_without_tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), identifier: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), tags: list[aiviro.modules.reader.storage.InvoiceField[str]] = <factory>)

class aiviro.modules.reader.InvoiceTotals(total_amount: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), total_amount_without_tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), total_tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), amount_due: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), currency: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1))

property contains_totals: bool

Returns True if all totals are not empty, otherwise False. The currency can be missing.

property is_valid: bool

Returns True if totals exist and their values are valid towards each other.
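The exact rule behind is_valid is not documented here. A minimal sketch follows, assuming that “valid towards each other” means total_amount_without_tax plus total_tax equals total_amount within a small rounding tolerance. `Totals`, `is_consistent` and the 0.01 tolerance are hypothetical stand-ins that use plain Decimal values instead of InvoiceField objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Totals:
    """Stand-in for InvoiceTotals with plain values."""

    total_amount: Optional[Decimal] = None
    total_amount_without_tax: Optional[Decimal] = None
    total_tax: Optional[Decimal] = None
    amount_due: Optional[Decimal] = None
    currency: Optional[str] = None

    @property
    def contains_totals(self) -> bool:
        # All four amounts must be present; the currency may be missing.
        amounts = (
            self.total_amount,
            self.total_amount_without_tax,
            self.total_tax,
            self.amount_due,
        )
        return all(a is not None for a in amounts)

    def is_consistent(self, tolerance: Decimal = Decimal("0.01")) -> bool:
        """Assumed check: net + tax must equal gross within the tolerance."""
        if not self.contains_totals:
            return False
        difference = self.total_amount_without_tax + self.total_tax - self.total_amount
        return abs(difference) <= tolerance


good = Totals(Decimal("121.00"), Decimal("100.00"), Decimal("21.00"), Decimal("121.00"))
bad = Totals(Decimal("121.00"), Decimal("100.00"), Decimal("20.00"), Decimal("121.00"))
print(good.is_consistent(), bad.is_consistent(), Totals().contains_totals)
```

Using Decimal rather than float keeps the comparison exact for monetary amounts.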

class aiviro.modules.reader.InvoiceField(value: T | None = None, bound_box: aiviro.core.utils.bound_box.bbox.BoundBox | None = None, page_index: int = -1)

class aiviro.modules.reader.InvoiceCustomer(id: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), tax_id: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), address: InvoiceField[InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), address_recipient: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), ico: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), name: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1))

Represents customer data obtained from the invoice.

class aiviro.modules.reader.InvoiceVendor(id: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), tax_id: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), address: InvoiceField[InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), address_recipient: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), ico: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), name: InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), is_not_tax_payer: InvoiceField[bool] = InvoiceField(value=None, bound_box=None, page_index=-1))

Represents vendor data obtained from the invoice.