Excel

Note

It’s required to install reader from Optional dependencies section.

class aiviro.modules.excel.ExcelReader(file_path: Path | str)

Reader to loads Excel file and extract tables from it. Supports reading same file extensions as openpyxl package.

Parameters:

file_path – Path to the Excel file

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> ws = excel_r.active_sheet  # access active sheet
set_worksheet(sheet_id: int | str | None) Worksheet

Sets new active worksheet.

Parameters:

sheet_id – Identifier of the worksheet

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> excel_r.set_worksheet(0)  # set sheet by index
>>> excel_r.set_worksheet("Sheet2")  # set sheet by its name
property sheet_names: List[str]

Returns name of all sheets

property active_sheet: Worksheet

Returns active sheet

read_table(header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False, filter_condition: Callable[[int, List[Any]], bool] | None = None, header_fields: List[str] | None = None, table_name: str = '') Table

Returns extracted table.

Parameters:
  • header_row – Index of the row that contains header

  • end_row – Index of the last row to read, if None read until end of the file

  • start_col – Index of the first column to read, if None read from the first column

  • end_col – Index of the last column to read, if None read until end of the file

  • sheet – Sheet to use for reading

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with same mapping_key

  • filter_condition – Rows that don’t meet this condition are filtered out

  • header_fields – Custom names for header that can be used for data identification

  • table_name – Name of the table

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> t1 = excel_r.read_table(
...     header_row=2,
...     end_row=23,
...     sheet="Super Sheet Name",
...     mapping_key=("item_name", "item_category"),
...     filter_condition=lambda i, values: i == 20, # ignore 20th row
... )
>>> r1 = t1[("item1", "small")]  # returns row by specified mapping_key
>>> r1_amount = r1["amount"]  # returns value from column "amount"
>>> t2 = excel_r.read_table(
...     header_row=5,
...     mapping_key="a",  # map based on values from the column "a"
...     filter_condition=lambda _, values: len(values[2]) != 3,
...     header_fields=["a", "b", "c", "d"],  # custom header names
... )
>>> r2 = t2["name"]["c"]
read_line(start_row: int | None = None, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: int | str | None = None, filter_condition: Callable[[int, List[Any]], bool] | None = None) Iterator[TableRow]

Reads sheet, line by line.

Parameters:
  • start_row – Index of the first row to read, if None read from the first row

  • end_row – Index of the last row to read, if None read until end of the file

  • start_col – Index of the first column to read, if None read from the first column

  • end_col – Index of the last column to read, if None read until end of the file

  • sheet – Sheet to use for reading

  • filter_condition – Rows that don’t meet this condition are filtered out

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> for row in excel_r.read_line(start_row=5, start_col=2, end_col=8, sheet=0):
...     # process row
read_auto(expected_header: List[str], sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False) List[Table]

Automatically detects multiple tables based on the expected_header argument.

Parameters:
  • expected_header – Header that table should have

  • sheet – Sheet to use for reading

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with same mapping_key

Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.pdf")
>>> detected_tables = excel_r.read_auto(
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet1",
...     mapping_key="Identifier",
... )
class aiviro.modules.excel.Table(title: str, header: TableHeader, mapping_key: MAPPING_KEY_TYPE)
Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> for map_key, row in table:
...     # iterates through all rows
>>> row = table[0]  # access row at index 0
get_column(name: str | int) List[Any]

Returns all values from the specified column

as_pretty_table() str

Returns table as string in pretty format

class aiviro.modules.excel.TableRow(index: int, header: TableHeader | None, _values: List[Any])
Example:

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> row_value = table[0]["b"]
property values: List[Any]

Returns all values of the row