Reader
Warning
The previous implementation of the Reader has moved to the Universal Parser section. This page describes the new, improved Reader; the old version is deprecated and will be removed in the future.
Note
It’s required to install reader, pdf and codes from the Optional dependencies section.
Aiviro Reader processes PDF invoices and extracts their key fields, such as vendor IDs, customer IDs, and total amounts.
Invoice Processing
InvoiceData

Name | Type | Description |
---|---|---|
language | Optional[Language] | Language of the invoice |
customer_id | InvoiceField[str] | Customer reference ID |
customer_ico | InvoiceField[str] | The Czech company ID number of the customer |
customer_tax_id | InvoiceField[str] | The taxpayer number associated with the customer |
customer_address | InvoiceField[InvoiceAddress] | Mailing address for the customer |
customer_address_recipient | InvoiceField[str] | Name associated with the customer address |
customer_name | InvoiceField[str] | Name of the customer |
vendor_id | InvoiceField[str] | Vendor reference ID |
vendor_ico | InvoiceField[str] | The Czech company ID number of the vendor |
vendor_tax_id | InvoiceField[str] | The taxpayer number associated with the vendor |
vendor_address | InvoiceField[InvoiceAddress] | Vendor mailing address |
vendor_address_recipient | InvoiceField[str] | Name associated with the vendor address |
vendor_name | InvoiceField[str] | Name of the vendor |
vendor_is_not_tax_payer | InvoiceField[bool] | Check for vendor tax payer status |
invoice_id | InvoiceField[str] | Invoice number |
invoice_date | InvoiceField[datetime.date] | Date the invoice was issued |
due_date | InvoiceField[datetime.date] | Date payment for this invoice is due |
tax_date | InvoiceField[datetime.date] | Date the tax was applied to the invoice |
order_number | List[InvoiceField[str]] | Order reference number |
primary_total | InvoiceTotals | Details about primary amounts (total_amount, total_amount_without_tax, total_tax, amount_due) and currency |
secondary_total | InvoiceTotals | Details about secondary amounts (total_amount, total_amount_without_tax, total_tax, amount_due) and currency |
exchange_rate | InvoiceField[decimal.Decimal] | Exchange rate between primary and secondary currency |
variable_symbol | InvoiceField[str] | Variable symbol of the invoice |
payment_term | InvoiceField[str] | The terms of payment for the invoice |
bank_accounts | List[InvoiceBankAccount] | List of bank accounts |
items | List[InvoiceItem] | List of invoice items, filtered by total_amount and total_amount_without_tax |
raw_items | List[InvoiceItem] | List of unfiltered invoice items |
InvoiceAddress

Name | Type | Description |
---|---|---|
house_number | str | House number of the address |
road | str | Road of the address |
city | str | Name of the city |
postal_code | str | Postal code of the city |
street_address | str | Full street address (combined as road and house_number) |
country_code_A3 | str | Country code as alpha-3 (3 letters), e.g., CZE, DEU, AUT, SVK, etc. |
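InvoiceTotals

Name | Type | Description |
---|---|---|
total_amount | InvoiceField[decimal.Decimal] | Total amount of the invoice |
total_amount_without_tax | InvoiceField[decimal.Decimal] | Total amount of the invoice without tax |
total_tax | InvoiceField[decimal.Decimal] | Total tax amount of the invoice |
amount_due | InvoiceField[decimal.Decimal] | Amount due for the invoice |
currency | InvoiceField[str] | Currency of the invoice |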
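The totals fields lend themselves to a quick sanity check after parsing. The sketch below does not import aiviro: `Totals` is a hypothetical stand-in that holds plain `Decimal` values instead of `InvoiceField` objects. It assumes that `total_amount` should equal `total_amount_without_tax + total_tax`; both that relation and the rounding tolerance are assumptions, not documented behaviour of the reader.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Totals:
    """Hypothetical stand-in for InvoiceTotals, holding plain values."""
    total_amount: Optional[Decimal] = None
    total_amount_without_tax: Optional[Decimal] = None
    total_tax: Optional[Decimal] = None
    amount_due: Optional[Decimal] = None
    currency: Optional[str] = None


def totals_are_consistent(totals: Totals, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when net + tax matches the gross total within `tolerance`.

    Returns False if any of the three amounts is missing.
    """
    parts = (totals.total_amount, totals.total_amount_without_tax, totals.total_tax)
    if any(p is None for p in parts):
        return False
    gross, net, tax = parts
    return abs(gross - (net + tax)) <= tolerance


good = Totals(Decimal("121.00"), Decimal("100.00"), Decimal("21.00"), currency="CZK")
bad = Totals(Decimal("120.00"), Decimal("100.00"), Decimal("21.00"), currency="CZK")
print(totals_are_consistent(good))
print(totals_are_consistent(bad))
```

With real data, the same check could be fed from `extracted_data.primary_total` by reading each field's `.value`.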
InvoiceBankAccount

Name | Type | Description |
---|---|---|
iban | InvoiceField[str] | IBAN of the bank account |
swift | InvoiceField[str] | SWIFT (BIC) code of the bank account |
bank_name | InvoiceField[str] | Name of the bank |
country_code | InvoiceField[str] | Country code of the bank account, e.g., CZ, DE, etc. |
local_account | LocalBankAccount | Contains bank code, account number and account prefix |
InvoiceItem

Name | Type | Description |
---|---|---|
index | int | Line item index, starting from 0 |
item | InvoiceField[str] | Full string text line of the line item |
description | InvoiceField[str] | The text description for the invoice line item |
quantity | InvoiceField[decimal.Decimal] | The quantity of the item |
unit_price | InvoiceField[decimal.Decimal] | The net or gross price of one unit of this item, primarily net |
unit | InvoiceField[str] | The unit of the line item, e.g., kg, lb, etc. |
product_code | InvoiceField[str] | Product code, product number, SKU, etc. |
tax | InvoiceField[decimal.Decimal] | Tax associated with the line item |
tax_rate | InvoiceField[decimal.Decimal] | Tax rate associated with the line item |
amount | InvoiceField[decimal.Decimal] | Total gross amount of the line item |
amount_without_tax | InvoiceField[decimal.Decimal] | Total net amount of the line item |
identifier | InvoiceField[str] | Identifier of the item, found in product code, description or item’s content |
tags | List[InvoiceField[str]] | Tags assigned to the item based on its identifier |
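Once items carry tags, a typical follow-up is to aggregate amounts per tag. The sketch below is a minimal illustration that does not import aiviro: `Item` is a hypothetical stand-in with plain values, its `tags` list follows the `InvoiceItem` signature, and the `"untagged"` bucket is an invented convention for items without any tag.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for InvoiceItem, holding plain values."""
    index: int
    product_code: Optional[str]
    amount: Optional[Decimal]
    tags: List[str] = field(default_factory=list)


def amount_per_tag(items: List[Item]) -> Dict[str, Decimal]:
    """Sum gross amounts per tag; items without an amount are skipped."""
    sums: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is zero
    for item in items:
        if item.amount is None:
            continue
        # Items with no tag go into the invented "untagged" bucket.
        for tag in item.tags or ["untagged"]:
            sums[tag] += item.amount
    return dict(sums)


items = [
    Item(0, "ACD-123", Decimal("100.00"), ["hardware"]),
    Item(1, "DC-456", Decimal("157.23"), ["service"]),
    Item(2, "ACD-999", Decimal("50.00"), ["hardware"]),
    Item(3, None, None),
]
print(amount_per_tag(items))
```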
- class aiviro.modules.reader.Language(value)
Supported languages for invoice reader.
- CZ = 'cs'
Czech
- SK = 'sk'
Slovak
- EN = 'en'
English
- DE = 'de'
German
- PL = 'pl'
Polish
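Note that the member names do not always match their values: the Czech member is `CZ`, while its value is the language code `'cs'`. The sketch below mirrors the documented members in a local enum (it does not import aiviro) and shows a lookup by value; `language_from_code` is a hypothetical helper, not part of the library.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Local mirror of the documented members; not imported from aiviro."""
    CZ = "cs"
    SK = "sk"
    EN = "en"
    DE = "de"
    PL = "pl"


def language_from_code(code: str) -> Optional[Language]:
    """Return the member whose value equals `code`, or None if unsupported."""
    try:
        return Language(code.strip().lower())
    except ValueError:
        return None


print(language_from_code("CS"))  # Language.CZ
print(language_from_code("fr"))  # None
```

A value resolved this way could then be passed as `document_language` in `ReaderConfig` to skip auto-detection.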
- class aiviro.modules.reader.InvoiceReader(pdf_r: PDFRobot, reader_config: ReaderConfig | None = None)
Reads a PDF invoice file and extracts data from it.
- Parameters:
pdf_r – PDFRobot instance.
reader_config – Configuration for InvoiceReader.
Note
If you receive a 401 Unauthorized Error, please contact our support team. This error typically indicates that you may be missing the necessary permissions for the API.
- Example:
>>> from aiviro.modules.reader import InvoiceReader
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader = InvoiceReader(pdf_r)
...     extracted_data = reader.parse()
...
...     # print value of invoice-id
...     print(extracted_data.invoice_id.value)
...     # '123456789'
...
...     # print items
...     for item in extracted_data.items:
...         print(item.product_code.value)
...         print(item.amount.value)
...     # "ACD-123"
...     # Decimal('100.00')
...     # "DC-456"
...     # Decimal('157.23')

>>> from aiviro.modules.reader import InvoiceReader, ReaderConfig
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader_config = ReaderConfig([r"(\d{4,6})", r"([A-Z]{2}\d{4})"])
...     reader = InvoiceReader(pdf_r, reader_config)
...     extracted_data = reader.parse()
...
...     # print value of order-numbers
...     for order_number in extracted_data.order_number:
...         print(order_number.value)
...     # '4654'
...     # '123456'
...     # 'AC1234'
- property invoice_data: InvoiceData
Return merged extracted data from all processors. Call parse() first to extract data.
- Returns:
Extracted data from invoice.
- parse(offline_cloud_data: Dict | None = None, isdoc_file: Path | None = None) -> InvoiceData
Parse invoice and return extracted data.
- Parameters:
offline_cloud_data – If set, cloud processor will not be used and this data will be used instead.
isdoc_file – Path to a separate .isdoc file, for cases where it is not embedded in the PDF.
- Returns:
Extracted data from invoice.
- add_items_tag(items: List[InvoiceItem], item_identifiers: Set[str] | Dict[str, Set[str]], overwrite: bool = False) -> List[InvoiceItem]
Add identifier & tag to items based on their identifiers.
- Parameters:
items – List of InvoiceItems.
item_identifiers – Set of identifiers, or a dictionary where the key is the tag name and the value is a set of identifiers.
overwrite – If True, tags will be overwritten, otherwise they will be appended.
- Returns:
List of items with added identifiers & tags.
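To make the identifier and tag semantics concrete, here is a minimal sketch of one way such matching could work. It is not the library's implementation: `Item` and `tag_items` are hypothetical stand-ins using plain strings, and the plain substring matching is an assumption. It follows the documented behaviour in that a set yields identifiers only, a dictionary also yields tags, and `overwrite` decides whether existing tags are replaced or extended.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Union


@dataclass
class Item:
    """Hypothetical stand-in for InvoiceItem, holding plain strings."""
    product_code: str = ""
    description: str = ""
    item: str = ""
    identifier: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def tag_items(
    items: List[Item],
    item_identifiers: Union[Set[str], Dict[str, Set[str]]],
    overwrite: bool = False,
) -> List[Item]:
    """Illustrative substring matching; the real rules may differ."""
    # Normalise a plain set into a single group without a tag name.
    groups: Dict[Optional[str], Set[str]] = (
        dict(item_identifiers)
        if isinstance(item_identifiers, dict)
        else {None: set(item_identifiers)}
    )
    for it in items:
        if overwrite:
            it.tags = []  # drop existing tags instead of appending
        haystacks = (it.product_code, it.description, it.item)
        for tag, identifiers in groups.items():
            for ident in sorted(identifiers):
                if any(ident in text for text in haystacks):
                    it.identifier = ident
                    if tag is not None and tag not in it.tags:
                        it.tags.append(tag)
                    break  # one identifier per tag group is enough
    return items


items = [
    Item(product_code="CODE1234", description="Laptop"),
    Item(description="Transport DESC1234", tags=["old"]),
    Item(description="Other"),
]
tag_items(items, {"hardware": {"CODE1234"}, "shipping": {"DESC1234"}})
for it in items:
    print(it.identifier, it.tags)
```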
- class aiviro.modules.reader.ISDOCProcessor(config: GlobalConfig, reader_config: ReaderConfig)
Processor for isdoc files. If you’re processing invoices, we recommend using InvoiceReader. If for some reason you need to process only isdoc files, use this class.
- Example:
>>> import pathlib
>>> from collections import OrderedDict
>>> from aiviro.modules.reader import ISDOCProcessor, ReaderConfig, InvoiceData
>>> from aiviro.core.utils.configuration import get_global_config
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> # Process .isdoc file directly, without PDFRobot
>>> def process_isdoc(isdoc_path: pathlib.Path) -> InvoiceData:
...     processor = ISDOCProcessor(get_global_config(), ReaderConfig())
...     processor.isdoc_path = isdoc_path
...     return processor.process(None, OrderedDict())
...
>>> # Process .isdoc file from PDF, file is included as pdf attachment
>>> def process_isdoc_pdf(pdf_path: pathlib.Path) -> InvoiceData:
...     pdf_robot = create_pdf_robot(pdf_path)
...     processor = ISDOCProcessor(get_global_config(), ReaderConfig())
...     return processor.process(pdf_robot, OrderedDict())
- class aiviro.modules.reader.ReaderConfig(order_number_formats: List[str] = <factory>, order_number_ignore_keywords: bool = False, document_language: Language | None = None, item_identifiers: Set[str] | Dict[str, Set[str]] = <factory>)
Configuration for InvoiceReader.
- Parameters:
order_number_formats – List of regex patterns for the order number. If not provided, default patterns are used.
order_number_ignore_keywords – If set, keywords for the order number are ignored, so the reader searches for the order number on every page. A keyword is a word or phrase that marks where the order number is located, e.g. in “Reference: OD1234”, “Reference” is the keyword and “OD1234” is the order number.
document_language – If set, auto-detection of the language is skipped.
item_identifiers – If provided, the reader tries to find each identifier in the item’s product_code, description or content. If an identifier is found, it is stored in the item’s identifier field. If a dictionary is provided, the key is the tag name and the value is a set of possible identifiers.
- Example:
>>> from aiviro.modules.reader import OrderNumberFormats, ReaderConfig
>>>
>>> if __name__ == "__main__":
...     # use predefined patterns (helios pattern '123456789/123')
...     reader_config = ReaderConfig(
...         order_number_formats=OrderNumberFormats.PRIMARY_REGEX_9SLASH3,
...         order_number_ignore_keywords=True,
...         item_identifiers={"CODE1234", "DESC1234", "CONTENT1234"},
...     )
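The interplay between regex patterns and keywords can be tried out with the standard `re` module before configuring the reader. The sketch below only illustrates the idea and is not how the reader works internally: `find_order_numbers` and the `KEYWORDS` list are hypothetical, and the patterns are the two from the InvoiceReader example. With keywords, only text following a keyword on the same line is searched; without them, the whole page text is searched.

```python
import re
from typing import List, Optional, Sequence

PATTERNS = [r"(\d{4,6})", r"([A-Z]{2}\d{4})"]  # from the InvoiceReader example
KEYWORDS = ["Reference", "Order"]              # hypothetical keyword list


def find_order_numbers(
    text: str,
    patterns: Sequence[str],
    keywords: Optional[Sequence[str]] = None,
) -> List[str]:
    """Collect regex matches, optionally only from text following a keyword."""
    if keywords:
        # Keep only the remainder of each line after a keyword.
        kw = "|".join(re.escape(k) for k in keywords)
        chunks = re.findall(rf"(?:{kw})\s*:?\s*(.*)", text)
    else:
        chunks = [text]
    found: List[str] = []
    for chunk in chunks:
        for pattern in patterns:
            for match in re.findall(pattern, chunk):
                if match not in found:
                    found.append(match)
    return found


page = "Invoice 2024\nReference: OD1234\nTotal 987654"
print(find_order_numbers(page, PATTERNS, KEYWORDS))  # ['1234', 'OD1234']
print(find_order_numbers(page, PATTERNS))  # ['2024', '1234', '987654', 'OD1234']
```

In this sketch, ignoring keywords picks up unrelated numbers such as the year and the total, and the loose `\d{4,6}` pattern also matches the digits inside `OD1234`. Tighter patterns, for example with word boundaries, reduce such false positives.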
- class aiviro.modules.reader.InvoiceData(
    language: Optional[Language] = None,
    customer_id: InvoiceField[str] = InvoiceField(),
    customer_tax_id: InvoiceField[str] = InvoiceField(),
    customer_address: InvoiceField[InvoiceAddress] = InvoiceField(),
    customer_address_recipient: InvoiceField[str] = InvoiceField(),
    customer_ico: InvoiceField[str] = InvoiceField(),
    customer_name: InvoiceField[str] = InvoiceField(),
    vendor_id: InvoiceField[str] = InvoiceField(),
    vendor_tax_id: InvoiceField[str] = InvoiceField(),
    vendor_address: InvoiceField[InvoiceAddress] = InvoiceField(),
    vendor_address_recipient: InvoiceField[str] = InvoiceField(),
    vendor_ico: InvoiceField[str] = InvoiceField(),
    vendor_name: InvoiceField[str] = InvoiceField(),
    vendor_is_not_tax_payer: InvoiceField[bool] = InvoiceField(),
    shipping_address: InvoiceField[InvoiceAddress] = InvoiceField(),
    shipping_address_recipient: InvoiceField[str] = InvoiceField(),
    invoice_id: InvoiceField[str] = InvoiceField(),
    invoice_date: InvoiceField[datetime.date] = InvoiceField(),
    due_date: InvoiceField[datetime.date] = InvoiceField(),
    tax_date: InvoiceField[datetime.date] = InvoiceField(),
    order_number: List[InvoiceField[str]] = <factory>,
    primary_total: InvoiceTotals = InvoiceTotals(),
    secondary_total: InvoiceTotals = InvoiceTotals(),
    exchange_rate: InvoiceField[decimal.Decimal] = InvoiceField(),
    variable_symbol: InvoiceField[str] = InvoiceField(),
    payment_term: InvoiceField[str] = InvoiceField(),
    bank_accounts: List[InvoiceBankAccount] = <factory>,
    items: List[InvoiceItem] = <factory>,
    raw_items: List[InvoiceItem] = <factory>,
  )
- class aiviro.modules.reader.InvoiceAddress(house_number: str, road: str, city: str, postal_code: str, street_address: str, country_code_A3: str)
- class aiviro.modules.reader.InvoiceBankAccount(
    iban: InvoiceField[str],
    swift: InvoiceField[str],
    bank_name: InvoiceField[str],
    country_code: InvoiceField[str],
    local_account: LocalBankAccount,
  )
- class aiviro.modules.reader.LocalBankAccount(
    bank_code: InvoiceField[str],
    account_code: InvoiceField[str],
    account_prefix: InvoiceField[str] = InvoiceField(),
  )
- property full_account_number: str
Returns full account number in format ‘prefix-account_code/bank_code’.
- classmethod from_acc_number(account_number: str, bank_code: str = '') -> LocalBankAccount
Creates LocalBankAccount object from account number and bank code.
- Parameters:
account_number – usually in the form ‘123456-1234567890/1234’ (prefix-account_code/bank_code), or just ‘123456-1234567890’ (prefix-account_code), or ‘1234567890/1234’ (account_code/bank_code).
bank_code – code of the bank (may already be provided in account_number), e.g. ‘1234’, ‘ABCD’.
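The three accepted input forms can be illustrated with a small parser. This is a sketch under stated assumptions, not the library's code: `LocalAccount` and `split_account_number` are hypothetical stand-ins using plain strings, and dropping the dash or slash when the prefix or bank code is empty is an assumption about how the full number would be rendered.

```python
from typing import NamedTuple


class LocalAccount(NamedTuple):
    """Hypothetical stand-in for LocalBankAccount, holding plain strings."""
    prefix: str
    account_code: str
    bank_code: str

    @property
    def full_account_number(self) -> str:
        # Format 'prefix-account_code/bank_code', omitting empty parts.
        number = f"{self.prefix}-{self.account_code}" if self.prefix else self.account_code
        return f"{number}/{self.bank_code}" if self.bank_code else number


def split_account_number(account_number: str, bank_code: str = "") -> LocalAccount:
    """Split 'prefix-account_code/bank_code'; prefix and bank code are optional."""
    number, _, embedded_bank = account_number.strip().partition("/")
    prefix, sep, account = number.partition("-")
    if not sep:  # no dash, so the whole number is the account code
        prefix, account = "", number
    # A bank code embedded in the number wins over the separate argument.
    return LocalAccount(prefix, account, embedded_bank or bank_code)


for raw in ("123456-1234567890/1234", "123456-1234567890", "1234567890/1234"):
    parsed = split_account_number(raw, bank_code="ABCD")
    print(parsed, parsed.full_account_number)
```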
- class aiviro.modules.reader.InvoiceItem(
    index: int,
    item: InvoiceField[str],
    description: InvoiceField[str],
    quantity: InvoiceField[decimal.Decimal],
    unit_price: InvoiceField[decimal.Decimal],
    unit: InvoiceField[str],
    product_code: InvoiceField[str],
    tax: InvoiceField[decimal.Decimal],
    tax_rate: InvoiceField[decimal.Decimal],
    amount: InvoiceField[decimal.Decimal],
    amount_without_tax: InvoiceField[decimal.Decimal] = InvoiceField(),
    identifier: InvoiceField[str] = InvoiceField(),
    tags: List[InvoiceField[str]] = <factory>,
  )
- class aiviro.modules.reader.InvoiceTotals(
    total_amount: InvoiceField[decimal.Decimal] = InvoiceField(),
    total_amount_without_tax: InvoiceField[decimal.Decimal] = InvoiceField(),
    total_tax: InvoiceField[decimal.Decimal] = InvoiceField(),
    amount_due: InvoiceField[decimal.Decimal] = InvoiceField(),
    currency: InvoiceField[str] = InvoiceField(),
  )
- class aiviro.modules.reader.InvoiceField(value: T | None = None, bound_box: aiviro.core.utils.bound_box.bbox.BoundBox | None = None, page_index: int = -1)
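Because every field defaults to `value=None` with `page_index=-1`, code that consumes extracted data should expect missing values. The sketch below shows one defensive access pattern; `Field`, `value_or` and `was_found` are hypothetical stand-ins written for this example (aiviro is not imported), and treating a `None` value as "not found" is an assumption based on the documented defaults.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Field(Generic[T]):
    """Hypothetical stand-in mirroring the documented InvoiceField defaults."""
    value: Optional[T] = None
    bound_box: Optional[object] = None
    page_index: int = -1


def value_or(field: Field[T], default: T) -> T:
    """Return the extracted value, or `default` when nothing was found."""
    return default if field.value is None else field.value


def was_found(field: Field) -> bool:
    """Treat a field as found when it carries a value (an assumption)."""
    return field.value is not None


invoice_id = Field(value="123456789", page_index=0)
exchange_rate: Field[Decimal] = Field()  # all defaults: nothing extracted

print(value_or(invoice_id, "unknown"))
print(value_or(exchange_rate, Decimal("1")))
print(was_found(invoice_id), was_found(exchange_rate))
```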