Excel
Note
It’s required to install reader
from Optional dependencies section.
- class aiviro.modules.excel.ExcelReader(file_path: Path | str)
Reader to loads Excel file and extract tables from it. Supports reading same file extensions as openpyxl package.
- Parameters:
file_path – Path to the Excel file
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> ws = excel_r.active_sheet # access active sheet
- set_worksheet(sheet_id: int | str | None) Worksheet
Sets new active worksheet.
- Parameters:
sheet_id – Identifier of the worksheet
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> excel_r.set_worksheet(0) # set sheet by index >>> excel_r.set_worksheet("Sheet2") # set sheet by its name
- property active_sheet: Worksheet
Returns active sheet
- read_table(header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False, filter_condition: Callable[[int, List[Any]], bool] | None = None, header_fields: List[str] | None = None, table_name: str = '') Table
Returns extracted table.
- Parameters:
header_row – Index of the row that contains header
end_row – Index of the last row to read, if None read until end of the file
start_col – Index of the first column to read, if None read from the first column
end_col – Index of the last column to read, if None read until end of the file
sheet – Sheet to use for reading
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
filter_condition – Rows that don’t meet this condition are filtered out
header_fields – Custom names for header that can be used for data identification
table_name – Name of the table
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> t1 = excel_r.read_table( ... header_row=2, ... end_row=23, ... sheet="Super Sheet Name", ... mapping_key=("item_name", "item_category"), ... filter_condition=lambda i, values: i == 20, # ignore 20th row ... ) >>> r1 = t1[("item1", "small")] # returns row by specified mapping_key >>> r1_amount = r1["amount"] # returns value from column "amount"
>>> t2 = excel_r.read_table( ... header_row=5, ... mapping_key="a", # map based on values from the column "a" ... filter_condition=lambda _, values: len(values[2]) != 3, ... header_fields=["a", "b", "c", "d"], # custom header names ... ) >>> r2 = t2["name"]["c"]
- read_line(start_row: int | None = None, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: int | str | None = None, filter_condition: Callable[[int, List[Any]], bool] | None = None) Iterator[TableRow]
Reads sheet, line by line.
- Parameters:
start_row – Index of the first row to read, if None read from the first row
end_row – Index of the last row to read, if None read until end of the file
start_col – Index of the first column to read, if None read from the first column
end_col – Index of the last column to read, if None read until end of the file
sheet – Sheet to use for reading
filter_condition – Rows that don’t meet this condition are filtered out
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> for row in excel_r.read_line(start_row=5, start_col=2, end_col=8, sheet=0): ... # process row
- read_auto(expected_header: List[str], sheet: int | str | None = None, mapping_key: MAPPING_KEY_TYPE = None, skip_same_mapping_keys: bool = False) List[Table]
Automatically detects multiple tables based on the expected_header argument.
- Parameters:
expected_header – Header that table should have
sheet – Sheet to use for reading
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.pdf") >>> detected_tables = excel_r.read_auto( ... expected_header=["Name", "Identifier", "Amount", "km"], ... sheet="sheet1", ... mapping_key="Identifier", ... )
- class aiviro.modules.excel.Table(title: str, header: TableHeader, mapping_key: MAPPING_KEY_TYPE)
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"]) >>> for map_key, row in table: ... # iterates through all rows >>> row = table[0] # access row at index 0
- class aiviro.modules.excel.TableRow(index: int, header: TableHeader | None, _values: List[Any])
- Example:
>>> from aiviro.modules.excel import ExcelReader >>> excel_r = ExcelReader("path/to/file.xlsx") >>> table = excel_r.read_table(header_row=1, header_fields=["a", "b", "c"]) >>> row_value = table[0]["b"]