Excel

Note

The reader must be installed first, as described in the Optional dependencies section.

class aiviro.modules.excel.ExcelReader(file_path: Path | str)

Reader that loads an Excel file and extracts tables from it. Supports the same file extensions as the openpyxl package.

Parameters: file_path – Path to the Excel file
Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> ws = excel_r.active_sheet  # access active sheet

set_worksheet(sheet_id: int | str | None) → Worksheet

Sets a new active worksheet.

Parameters: sheet_id – Identifier of the worksheet (index or name)
Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> excel_r.set_worksheet(0)  # set sheet by index
>>> excel_r.set_worksheet("Sheet2")  # set sheet by its name

property sheet_names: list[str]: Returns the names of all sheets

property active_sheet: Worksheet: Returns the active sheet

read_table(header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False, filter_condition: Callable[[int, list[Any]], bool] | None = None, header_fields: list[str] | None = None, table_name: str = '') → Table

Returns extracted table.

Parameters:

header_row – Index of the row that contains header
end_row – Index of the last row to read, if None read until end of the file
start_col – Index of the first column to read, if None read from the first column
end_col – Index of the last column to read, if None read until the last column
sheet – Sheet to use for reading
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
filter_condition – Rows that don’t meet this condition are filtered out
header_fields – Custom names for header that can be used for data identification
table_name – Name of the table

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> t1 = excel_r.read_table(
...     header_row=2,
...     end_row=23,
...     sheet="Super Sheet Name",
...     mapping_key=("item_name", "item_category"),
...     filter_condition=lambda i, values: i != 20,  # ignore 20th row
... )
>>> r1 = t1[("item1", "small")]  # returns row by specified mapping_key
>>> r1_amount = r1["amount"]  # returns value from column "amount"

>>> t2 = excel_r.read_table(
...     header_row=5,
...     mapping_key="a",  # map based on values from the column "a"
...     filter_condition=lambda _, values: len(values[2]) != 3,
...     header_fields=["a", "b", "c", "d"],  # custom header names
... )
>>> r2 = t2["name"]["c"]
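The interaction of mapping_key, skip_same_mapping_keys and filter_condition can be illustrated without the library. The following is a minimal pure-Python sketch, not aiviro's implementation: build_table and MappingKey are hypothetical names, rows are passed as (row_index, values) pairs, and raising on a duplicate key when skip_same_mapping_keys is False is an assumption made only for this example.

```python
from typing import Any, Callable, Optional, Sequence, Union

# Hypothetical stand-in for MAPPING_KEY_TYPE: one column name, several, or None.
MappingKey = Union[str, Sequence[str], None]


def build_table(
    header: list[str],
    rows: list[tuple[int, list[Any]]],
    mapping_key: MappingKey = None,
    skip_same_mapping_keys: bool = False,
    filter_condition: Optional[Callable[[int, list[Any]], bool]] = None,
) -> dict[Any, dict[str, Any]]:
    """Index rows by their mapping key, dropping rows that fail the filter."""
    table: dict[Any, dict[str, Any]] = {}
    for index, values in rows:
        # A row is kept only when the condition returns True.
        if filter_condition is not None and not filter_condition(index, values):
            continue
        record = dict(zip(header, values))
        if mapping_key is None:
            key: Any = index  # fall back to the row index
        elif isinstance(mapping_key, str):
            key = record[mapping_key]  # single-column key
        else:
            key = tuple(record[name] for name in mapping_key)  # composite key
        if key in table:
            if skip_same_mapping_keys:
                continue  # keep the first row, ignore later duplicates
            # Assumption for this sketch only: duplicates are an error.
            raise KeyError(f"duplicate mapping key: {key!r}")
        table[key] = record
    return table


header = ["item_name", "item_category", "amount"]
rows = [
    (3, ["item1", "small", 10]),
    (4, ["item1", "large", 25]),
    (5, ["item2", "small", 7]),
]
table = build_table(
    header,
    rows,
    mapping_key=("item_name", "item_category"),
    filter_condition=lambda i, values: i != 5,  # drop the row with index 5
)
amount = table[("item1", "small")]["amount"]
```

A tuple mapping_key produces tuple keys, which is why the lookup uses ("item1", "small"); with a single column name the key would be that column's value.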

read_line(start_row, end_row, start_col, end_col, sheet, filter_condition)

Reads the sheet line by line.

Parameters:

start_row – Index of the first row to read, if None read from the first row
end_row – Index of the last row to read, if None read until end of the file
start_col – Index of the first column to read, if None read from the first column
end_col – Index of the last column to read, if None read until the last column
sheet – Sheet to use for reading
filter_condition – Rows that don’t meet this condition are filtered out

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> for row in excel_r.read_line(start_row=5, start_col=2, end_col=8, sheet=0):
...     print(row)  # process row
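The row and column bounds described above can be pictured as a generator over a grid of cell values. This is a rough sketch under stated assumptions, not the library's code: iter_lines is a hypothetical helper, indices are assumed to be 1-based and inclusive (the openpyxl convention), and the filter is assumed to receive the already-sliced values.

```python
from typing import Any, Callable, Iterator, Optional


def iter_lines(
    grid: list[list[Any]],
    start_row: Optional[int] = None,
    end_row: Optional[int] = None,
    start_col: Optional[int] = None,
    end_col: Optional[int] = None,
    filter_condition: Optional[Callable[[int, list[Any]], bool]] = None,
) -> Iterator[list[Any]]:
    """Yield the selected slice of each row lazily, one row at a time."""
    first_row = 1 if start_row is None else start_row
    last_row = len(grid) if end_row is None else min(end_row, len(grid))
    for row_index in range(first_row, last_row + 1):
        values = grid[row_index - 1]  # convert the 1-based index to a list index
        first_col = 1 if start_col is None else start_col
        last_col = len(values) if end_col is None else end_col
        selected = values[first_col - 1:last_col]
        # Rows that do not meet the condition are skipped.
        if filter_condition is None or filter_condition(row_index, selected):
            yield selected


grid = [
    ["a", "b", "c"],
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
lines = list(
    iter_lines(grid, start_row=2, start_col=2, end_col=3,
               filter_condition=lambda i, values: i != 3)
)
```

Because the helper is a generator, rows are produced on demand, which suits large sheets that are processed in a single pass.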

read_auto(expected_header: list[str], sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False) → list[Table]

Automatically detects multiple tables based on the expected_header argument.

Parameters:

expected_header – Header that each detected table should have
sheet – Sheet to use for reading
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> detected_tables = excel_r.read_auto(
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet1",
...     mapping_key="Identifier",
... )
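How header-based detection might work can be sketched on a plain grid of cell values. The documentation does not describe the actual detection rules, so everything here is an assumption: find_header and detect_tables are hypothetical helpers, a header must match expected_header exactly as a contiguous run of cells, and a table ends at the first row that is empty in those columns.

```python
from typing import Any, Optional


def find_header(row: list[Any], expected_header: list[str]) -> Optional[int]:
    """Return the column offset where the header starts, or None if absent."""
    width = len(expected_header)
    for start in range(len(row) - width + 1):
        if row[start:start + width] == expected_header:
            return start
    return None


def detect_tables(
    grid: list[list[Any]], expected_header: list[str]
) -> list[list[dict[str, Any]]]:
    """Collect every block of rows that follows a matching header row."""
    tables: list[list[dict[str, Any]]] = []
    width = len(expected_header)
    current: Optional[list[dict[str, Any]]] = None  # table being collected
    offset = 0
    for row in grid:
        start = find_header(row, expected_header)
        if start is not None:
            current, offset = [], start  # a header row opens a new table
            tables.append(current)
            continue
        if current is None:
            continue  # still outside any table
        cells = row[offset:offset + width]
        if all(cell is None for cell in cells):
            current = None  # assumed rule: a blank row closes the table
        else:
            current.append(dict(zip(expected_header, cells)))
    return tables


grid = [
    ["Report", None, None],
    ["Name", "Identifier", "Amount"],
    ["bolt", "B-1", 4],
    ["nut", "N-7", 9],
    [None, None, None],
    ["Name", "Identifier", "Amount"],
    ["gear", "G-2", 1],
]
tables = detect_tables(grid, ["Name", "Identifier", "Amount"])
```

Each detected table is returned as a list of header-keyed rows, mirroring the list[Table] return type of read_auto only loosely.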

class aiviro.modules.excel.Table(title: str, header: TableHeader, mapping_key: MAPPING_KEY_TYPE)

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> for map_key, row in table:
...     print(map_key, row)  # iterates through all rows
>>> row = table[0]  # access row at index 0

get_column(name: str | int) → list[Any]: Returns all values from the specified column

as_pretty_table() → str: Returns table as string in pretty format

class aiviro.modules.excel.TableRow(index: int, header: TableHeader | None, _values: list[Any])

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> row_value = table[0]["b"]

property values: list[Any]: Returns all values of the row