Excel

Note

It’s required to install excel from Optional dependencies section.

class aiviro.modules.excel.ExcelReader(file_path: Union[pathlib.Path, str])

Reader to loads Excel file and extract tables from it. Supports reading same file extensions as openpyxl package.

Parameters

file_path – Path to the Excel file

Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> ws = excel_r.active_sheet  # access active sheet
set_worksheet(sheet_id: Optional[Union[int, str]]) openpyxl.worksheet.worksheet.Worksheet

Sets new active worksheet.

Parameters

sheet_id – Identifier of the worksheet

Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> excel_r.set_worksheet(0)  # set sheet by index
>>> excel_r.set_worksheet("Sheet2")  # set sheet by its name
property sheet_names: List[str]

Returns name of all sheets

property active_sheet: openpyxl.worksheet.worksheet.Worksheet

Returns active sheet

read_table(header_row: int, end_row: Optional[int] = None, start_col: Optional[int] = None, end_col: Optional[int] = None, sheet: Optional[Union[int, str]] = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False, filter_condition: Optional[Callable[[int, List[Any]], bool]] = None, header_fields: Optional[List[str]] = None, table_name: str = '') aiviro.modules.excel.common.Table

Returns extracted table.

Parameters
  • header_row – Index of the row that contains header

  • end_row – Index of the last row to read, if None read until end of the file

  • start_col – Index of the first column to read, if None read from the first column

  • end_col – Index of the last column to read, if None read until end of the file

  • sheet – Sheet to use for reading

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with same mapping_key

  • filter_condition – Rows that don’t meet this condition are filtered out

  • header_fields – Custom names for header that can be used for data identification

  • table_name – Name of the table

Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> t1 = excel_r.read_table(
...     header_row=2,
...     end_row=23,
...     sheet="Super Sheet Name",
...     mapping_key=("item_name", "item_category"),
...     filter_condition=lambda i, values: i == 20, # ignore 20th row
... )
>>> r1 = t1[("item1", "small")]  # returns row by specified mapping_key
>>> r1_amount = r1["amount"]  # returns value from column "amount"
>>> t2 = excel_r.read_table(
...     header_row=5,
...     mapping_key="a",  # map based on values from the column "a"
...     filter_condition=lambda _, values: len(values[2]) != 3,
...     header_fields=["a", "b", "c", "d"],  # custom header names
... )
>>> r2 = t2["name"]["c"]
read_line(start_row: Optional[int] = None, end_row: Optional[int] = None, start_col: Optional[int] = None, end_col: Optional[int] = None, sheet: Optional[Union[int, str]] = None, filter_condition: Optional[Callable[[int, List[Any]], bool]] = None) Iterator[aiviro.modules.excel.common.TableRow]

Reads sheet, line by line.

Parameters
  • start_row – Index of the first row to read, if None read from the first row

  • end_row – Index of the last row to read, if None read until end of the file

  • start_col – Index of the first column to read, if None read from the first column

  • end_col – Index of the last column to read, if None read until end of the file

  • sheet – Sheet to use for reading

  • filter_condition – Rows that don’t meet this condition are filtered out

Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> for row in excel_r.read_line(start_row=5, start_col=2, end_col=8, sheet=0):
...     # process row
read_auto(expected_header: List[str], sheet: Optional[Union[int, str]] = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False) List[aiviro.modules.excel.common.Table]

Automatically detects multiple tables based on the expected_header argument.

Parameters
  • expected_header – Header that table should have

  • sheet – Sheet to use for reading

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with same mapping_key

Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.pdf")
>>> detected_tables = excel_r.read_auto(
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet1",
...     mapping_key="Identifier",
... )
class aiviro.modules.excel.Table(title: str, header: aiviro.modules.excel.common.TableHeader, mapping_key: MAPPING_KEY_TYPE)
Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> for map_key, row in table:
...     # iterates through all rows
>>> row = table[0]  # access row at index 0
get_column(name: Union[str, int]) List[Any]

Returns all values from the specified column

as_pretty_table() str

Returns table as string in pretty format

class aiviro.modules.excel.TableRow(index: int, header: Optional[aiviro.modules.excel.common.TableHeader], _values: List[Any])
Example

>>> from aiviro.modules.excel import ExcelReader
>>> excel_r = ExcelReader("path/to/file.xlsx")
>>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"])
>>> row_value = table[0]["b"]
property values: List[Any]

Returns all values of the row