Reader
Warning
The previous implementation of the Reader was moved to the Universal Parser section. This is a new, improved version of the Reader; the old version is deprecated and will be removed in the future.
Note
Installing reader, pdf and codes from the Optional dependencies section is required.
Aiviro Reader processes PDF invoices and extracts key fields such as vendor IDs, customer IDs, and total amounts.
Invoice Processing
Name | Type | Description
---|---|---
language | Optional[Language] | Language of the invoice
customer_id | InvoiceField[str] | Customer reference ID
customer_ico | InvoiceField[str] | The Czech company ID number of the customer
customer_tax_id | InvoiceField[str] | The taxpayer number associated with the customer
customer_address | InvoiceField[InvoiceAddress] | Mailing address for the customer
customer_address_recipient | InvoiceField[str] | Name associated with the customer address
customer_name | InvoiceField[str] | Name of the customer
vendor_id | InvoiceField[str] | Vendor reference ID
vendor_ico | InvoiceField[str] | The Czech company ID number of the vendor
vendor_tax_id | InvoiceField[str] | The taxpayer number associated with the vendor
vendor_address | InvoiceField[InvoiceAddress] | Vendor mailing address
vendor_address_recipient | InvoiceField[str] | Name associated with the vendor address
vendor_name | InvoiceField[str] | Name of the vendor
invoice_id | InvoiceField[str] | Invoice number
invoice_date | InvoiceField[datetime.date] | Date the invoice was issued
due_date | InvoiceField[datetime.date] | Date payment for this invoice is due
tax_date | InvoiceField[datetime.date] | Date the tax was applied to the invoice
order_number | List[InvoiceField[str]] | Order reference numbers
total_amount | InvoiceField[decimal.Decimal] | Total amount of the invoice
total_amount_without_tax | InvoiceField[decimal.Decimal] | Total amount of the invoice without tax
total_tax | InvoiceField[decimal.Decimal] | Total tax amount of the invoice
amount_due | InvoiceField[decimal.Decimal] | Amount due for the invoice
currency | InvoiceField[str] | Currency of the invoice
variable_symbol | InvoiceField[str] | Variable symbol of the invoice
payment_terms | InvoiceField[str] | The terms of payment for the invoice
bank_accounts | List[InvoiceBankAccount] | List of bank accounts (IBAN, SWIFT)
items | List[InvoiceItem] | List of invoice items, filtered by total_amount and total_amount_without_tax
raw_items | List[InvoiceItem] | List of unfiltered invoice items

Fields of each entry in items and raw_items (InvoiceItem):

Name | Type | Description
---|---|---
index | int | Line item index, starting from 0
item | InvoiceField[str] | Full text of the line item
description | InvoiceField[str] | The text description of the invoice line item
quantity | InvoiceField[decimal.Decimal] | The quantity of the item
unit_price | InvoiceField[decimal.Decimal] | The net or gross price of one unit of this item
unit | InvoiceField[str] | The unit of the line item, e.g. kg, lb
product_code | InvoiceField[str] | Product code, product number, SKU, etc.
tax | InvoiceField[decimal.Decimal] | Tax associated with the line item
tax_rate | InvoiceField[decimal.Decimal] | Tax rate associated with the line item
amount | InvoiceField[decimal.Decimal] | Total amount of the line item (can refer to net or gross amount)
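
Because the amount of a line item can be net or gross, one simple user-side sanity check is to compare the sum of the item amounts with both invoice totals. The sketch below is only an illustration: `Field` and `Item` are hypothetical stand-ins for InvoiceField and InvoiceItem so that it runs without Aiviro, and it does not claim to reproduce how the reader itself filters items.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Field(Generic[T]):
    """Hypothetical stand-in for InvoiceField; only `value` is modelled."""
    value: Optional[T] = None


@dataclass
class Item:
    """Hypothetical stand-in for InvoiceItem, reduced to two fields."""
    index: int
    amount: Field[Decimal]


def sum_item_amounts(items: List[Item]) -> Decimal:
    """Add up item amounts, skipping items whose amount was not extracted."""
    return sum(
        (i.amount.value for i in items if i.amount.value is not None),
        Decimal("0"),
    )


def matches_any_total(items: List[Item], *totals: Optional[Decimal]) -> bool:
    """Return True if the item sum equals at least one of the given totals."""
    item_sum = sum_item_amounts(items)
    return any(t is not None and t == item_sum for t in totals)


items = [
    Item(0, Field(Decimal("100.00"))),
    Item(1, Field(Decimal("157.23"))),
    Item(2, Field(None)),  # amount not recognised on this line
]
print(sum_item_amounts(items))  # 257.23
# Compare against a gross total and a net total (both values are made up).
print(matches_any_total(items, Decimal("311.25"), Decimal("257.23")))  # True
```

With real data, the two totals would come from total_amount.value and total_amount_without_tax.value, either of which may be None.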
- class aiviro.modules.reader.Language(value)
Supported languages for invoice reader.
- CZ = 'cs'
Czech
- SK = 'sk'
Slovak
- EN = 'en'
English
- DE = 'de'
German
- PL = 'pl'
Polish
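
Note that member names and values differ (CZ has the value 'cs'), so a lookup by value needs the language code, not the member name. The following is a minimal sketch that assumes Language behaves like a standard Python Enum; the `Language` class here is a local stand-in mirroring the members listed above, and `language_from_code` is a hypothetical helper, not part of Aiviro.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Stand-in mirroring the documented members; not the real Aiviro class."""
    CZ = "cs"
    SK = "sk"
    EN = "en"
    DE = "de"
    PL = "pl"


def language_from_code(code: str) -> Optional[Language]:
    """Map a language code such as 'cs' to a member, or None if unsupported."""
    try:
        return Language(code.strip().lower())
    except ValueError:
        return None


print(language_from_code("CS"))  # Language.CZ
print(language_from_code("cz"))  # None ('cz' is a member name, not a value)
print(language_from_code("fr"))  # None
```

A result of None could then be treated as "leave document_language unset and rely on auto-detection".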
- class aiviro.modules.reader.InvoiceReader(pdf_r: PDFRobot, reader_config: ReaderConfig | None = None)
Reads a PDF invoice file and extracts data from it.
- Parameters:
pdf_r – PDFRobot instance.
reader_config – Configuration for InvoiceReader.
- Example:
>>> from aiviro.modules.reader import InvoiceReader
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader = InvoiceReader(pdf_r)
...     extracted_data = reader.parse()
...
...     # print value of invoice-id
...     print(extracted_data.invoice_id.value)
...     # '123456789'
...
...     # print items
...     for item in extracted_data.items:
...         print(item.product_code.value)
...         print(item.amount.value)
...     # "ACD-123"
...     # Decimal('100.00')
...     # "DC-456"
...     # Decimal('157.23')

>>> from aiviro.modules.reader import InvoiceReader, ReaderConfig
>>> from aiviro.modules.pdf import create_pdf_robot
>>>
>>> if __name__ == "__main__":
...     pdf_r = create_pdf_robot("path/to/invoice.pdf")
...     reader_config = ReaderConfig([r"(\d{4,6})", r"([A-Z]{2}\d{4})"])
...     reader = InvoiceReader(pdf_r, reader_config)
...     extracted_data = reader.parse()
...
...     # print value of order-numbers
...     for order_number in extracted_data.order_number:
...         print(order_number.value)
...     # '4654'
...     # '123456'
...     # 'AC1234'
- property invoice_data: InvoiceData
Return merged extracted data from all processors. Call parse() first to extract data.
- Returns:
Extracted data from invoice.
- parse(offline_cloud_data: Dict | None = None) InvoiceData
Parse invoice and return extracted data.
- Parameters:
offline_cloud_data – If set, the cloud processor is not used and this data is used instead.
- Returns:
Extracted data from invoice.
- class aiviro.modules.reader.ReaderConfig(order_number_formats: ~typing.List[str] = <factory>, document_language: ~aiviro.modules.reader.common.keywords.Language | None = None)
Configuration for InvoiceReader.
- Parameters:
order_number_formats – List of regex patterns for order numbers. If not provided, default patterns are used.
document_language – If set, automatic language detection is skipped.
- Example:
>>> from aiviro.modules.reader import OrderNumberFormats, ReaderConfig
>>>
>>> if __name__ == "__main__":
...     # use predefined patterns (helios pattern '123456789/123')
...     reader_config = ReaderConfig(OrderNumberFormats.PRIMARY_REGEX_9SLASH3)
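
Before passing custom patterns to order_number_formats, it can help to try them on sample text with the standard `re` module. The sketch below is an assumption-laden illustration: `find_candidates` is a hypothetical helper, and how InvoiceReader applies the patterns internally is not described here, so results from the reader may differ. It mainly shows that an unanchored digit pattern can match inside a longer token.

```python
import re
from typing import List


def find_candidates(text: str, patterns: List[str]) -> List[str]:
    """Collect capture group 1 of every match, in pattern order, no duplicates."""
    found: List[str] = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            value = match.group(1)
            if value not in found:
                found.append(value)
    return found


sample = "Order: 4654, ref 123456, contract AC1234"

# Patterns from the InvoiceReader example above: with plain re.finditer the
# digit pattern also matches the "1234" inside "AC1234".
loose = [r"(\d{4,6})", r"([A-Z]{2}\d{4})"]
print(find_candidates(sample, loose))
# ['4654', '123456', '1234', 'AC1234']

# Word boundaries keep the digit pattern from matching inside longer tokens.
strict = [r"\b(\d{4,6})\b", r"\b([A-Z]{2}\d{4})\b"]
print(find_candidates(sample, strict))
# ['4654', '123456', 'AC1234']

# Hypothetical pattern for the '123456789/123' shape; the actual value of
# OrderNumberFormats.PRIMARY_REGEX_9SLASH3 is not shown in this documentation.
nine_slash_three = [r"\b(\d{9}/\d{3})\b"]
print(find_candidates("Objednavka c. 123456789/123", nine_slash_three))
# ['123456789/123']
```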
- class aiviro.modules.reader.InvoiceData(language: Optional[aiviro.modules.reader.common.keywords.Language] = None, customer_id: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), customer_tax_id: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), customer_address: aiviro.modules.reader.storage.InvoiceField[aiviro.modules.reader.storage.InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), customer_address_recipient: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), customer_ico: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), customer_name: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_id: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_tax_id: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_address: aiviro.modules.reader.storage.InvoiceField[aiviro.modules.reader.storage.InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_address_recipient: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_ico: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), vendor_name: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), shipping_address: aiviro.modules.reader.storage.InvoiceField[aiviro.modules.reader.storage.InvoiceAddress] = InvoiceField(value=None, bound_box=None, page_index=-1), shipping_address_recipient: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), invoice_id: 
aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), invoice_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), due_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), tax_date: aiviro.modules.reader.storage.InvoiceField[datetime.date] = InvoiceField(value=None, bound_box=None, page_index=-1), order_number: List[aiviro.modules.reader.storage.InvoiceField[str]] = <factory>, total_amount: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), total_amount_without_tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), total_tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), amount_due: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal] = InvoiceField(value=None, bound_box=None, page_index=-1), currency: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), variable_symbol: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), payment_terms: aiviro.modules.reader.storage.InvoiceField[str] = InvoiceField(value=None, bound_box=None, page_index=-1), bank_accounts: List[aiviro.modules.reader.storage.InvoiceBankAccount] = <factory>, items: List[aiviro.modules.reader.storage.InvoiceItem] = <factory>, raw_items: List[aiviro.modules.reader.storage.InvoiceItem] = <factory>)
- class aiviro.modules.reader.InvoiceAddress(house_number: str, road: str, city: str, postal_code: str, street_address: str)
- class aiviro.modules.reader.InvoiceBankAccount(iban: aiviro.modules.reader.storage.InvoiceField[str], swift: aiviro.modules.reader.storage.InvoiceField[str])
- class aiviro.modules.reader.InvoiceItem(index: int, item: aiviro.modules.reader.storage.InvoiceField[str], description: aiviro.modules.reader.storage.InvoiceField[str], quantity: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], unit_price: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], unit: aiviro.modules.reader.storage.InvoiceField[str], product_code: aiviro.modules.reader.storage.InvoiceField[str], tax: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], tax_rate: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal], amount: aiviro.modules.reader.storage.InvoiceField[decimal.Decimal])
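
The signatures above show that every InvoiceField defaults to value=None, bound_box=None, page_index=-1, so code that consumes the result should expect missing values. The sketch below shows one way to collect only the fields that were found. It assumes dataclass-like containers; `InvoiceField` here is a local stand-in with the three documented attributes, and `MiniInvoiceData` and `found_values` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class InvoiceField(Generic[T]):
    """Stand-in with the three attributes visible in the signatures above."""
    value: Optional[T] = None
    bound_box: Any = None
    page_index: int = -1


@dataclass
class MiniInvoiceData:
    """Hypothetical, heavily reduced stand-in for InvoiceData."""
    invoice_id: InvoiceField[str] = field(default_factory=InvoiceField)
    due_date: InvoiceField[date] = field(default_factory=InvoiceField)
    currency: InvoiceField[str] = field(default_factory=InvoiceField)


def found_values(data: Any) -> Dict[str, Any]:
    """Return {field name: value} for each InvoiceField whose value is set."""
    result: Dict[str, Any] = {}
    for f in fields(data):
        candidate = getattr(data, f.name)
        if isinstance(candidate, InvoiceField) and candidate.value is not None:
            result[f.name] = candidate.value
    return result


data = MiniInvoiceData(
    invoice_id=InvoiceField("123456789", page_index=0),
    due_date=InvoiceField(date(2024, 1, 31), page_index=0),
)
print(found_values(data))
# {'invoice_id': '123456789', 'due_date': datetime.date(2024, 1, 31)}
```

List-valued attributes such as order_number, bank_accounts and items are skipped by this helper and would need their own handling.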