Excel & CSV
There are multiple readers available in Aiviro for processing ‘.xlsx’ and ‘.csv’ files, for the full list see below.
Excel Actions
Note
It’s required to install reader
from Optional dependencies section.
- class aiviro.actions.documents.tables.ExcelAutoReader(file_path: Path | str, expected_header: list[str], sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False)
Automatically reads the loaded Excel file and extracts tables from it, based on the argument expected_header. If an empty file is provided, an empty list is returned. Supports reading the same file extensions as openpyxl package.
- Parameters:
file_path – Path to the Excel file
expected_header – Header which should be found in the table
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with the same mapping_key
- Example:
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> # read multiple tables from the same file >>> excel_auto_r = ExcelAutoReader( ... file_path="path/to/file.xlsx", ... expected_header=["Name", "Identifier", "Amount", "km"], ... sheet="sheet1", ... mapping_key="Identifier", ... ) >>> detected_tables = excel_auto_r()
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> # when reading an empty xlsx file, an empty list is returned >>> excel_auto_r = ExcelAutoReader( ... file_path="path/to/empty-file.xlsx", ... expected_header=["Name", "Identifier", "Amount", "km"], ... sheet="sheet2", ... ) >>> tables = excel_auto_r() >>> print(tables) # will print []
- class aiviro.actions.documents.tables.ExcelTableReader(file_path: Path | str, header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False, header_fields: list[str] | None = None, table_name: str = '')
Returns extracted table from the required file’s sheet.
- Parameters:
file_path – Path to the Excel file
header_row – Row that contains header, starting from 1.
end_row – The last row to read, if None read until end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
header_fields – Custom names for header that can be used for data identification
table_name – Name of the table
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)() >>> for row in table: ... # process row
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> excel_table_r = ExcelTableReader( ... file_path="path/to/file.xlsx", ... header_row=1, ... end_row=23, ... start_col=5, ... sheet="Super Sheet Name", ... mapping_key=("item_name", "item_category"), ... ) >>> t1 = excel_table_r() >>> r1 = t1[("item1", "small")] # returns row by specified mapping_key >>> r1_amount = r1["amount"] # returns row's value from column "amount" ... >>> another_excel_table_r = ExcelTableReader( ... file_path="path/to/file.xlsx", ... header_row=5, ... mapping_key="a", # map based on values from the column "a" ... header_fields=["a", "b", "c", "d"], # custom header names ... ) >>> t2 = excel_table_r() >>> r2 = t2["name"]["c"]
- class aiviro.actions.documents.tables.ExcelLineReader(file_path: Path | str, start_row: int | None = None, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None)
Reads provided sheet, line by line. If the provided file is empty or the coordinates are set incorrectly, it returns an iterator with empty TableRow objects’ values.
- Parameters:
file_path – Path to the Excel file
start_row – The first row to read, if None read from the first row. Starting from 1.
end_row – The last row to read, if None read until the end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until the end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
- Example:
>>> from aiviro.actions.documents.tables import ExcelLineReader >>> excel_line_r = ExcelLineReader( ... file_path="path/to/file.xlsx", ... start_row=5, ... start_col=1, ... end_col=8, ... sheet=3, ... ) >>> for row in excel_line_r(): ... # process row
>>> from aiviro.actions.documents.tables import ExcelLineReader >>> # iterator can be transformed into the list >>> excel_line_r = ExcelLineReader( ... file_path="path/to/some-file.xlsx", ... start_row=3, ... end_row=8, ... start_col=1, ... end_col=5, ... sheet="my-sheet", ... ) >>> rows = list(excel_line_r()) >>> print(len(rows)) # print number of rows >>> # or process rows here
- class aiviro.actions.documents.tables.ExcelGetSheetNames(file_path: Path | str)
Simple action that returns names of all sheets in the provided Excel file.
- Example:
>>> from aiviro.actions.documents.tables import ExcelGetSheetNames >>> sheet_names = ExcelGetSheetNames(file_path="path/to/file.xlsx")() >>> print(sheet_names[0]))
CSV Actions
- class aiviro.actions.documents.tables.csv_reader.CSVTableReader(file_path: Path | str, expected_header: list[str], delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)
Reads the provided CSV file and extracts tables from it, based on the argument expected_header, which is expected to be in the first row.
- Parameters:
file_path – Path to the CSV file
expected_header – Header that csv file should have in its first row
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function
- Raises:
FileNotFoundError – If the file does not exist.
TableHeaderNotProvided – If the expected_header is empty.
- Example:
>>> from aiviro.actions.documents import CSVTableReader >>> # gets data from the csv file, where the first row is taken as a header >>> table = CSVTableReader(file_path="path/to/your/file.csv")() >>> # print the table extracted from the csv file >>> print(table)
>>> from aiviro.actions.documents import CSVTableReader >>> table = CSVTableReader( ... file_path="path/to/file.csv", expected_header=["Name", "Amount"], delimiter=";", encoding="utf-16" ... )() >>> # print data from the third row of column 'Amount' >>> print(table[2]["Amount"])
- class aiviro.actions.documents.tables.csv_reader.CSVLineReader(file_path: Path | str, delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)
Yields rows from the provided CSV file, line by line.
- Parameters:
file_path – Path to the CSV file
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function
- Raises:
FileNotFoundError – If the file does not exist.
>>> from aiviro.actions.documents import CSVLineReader >>> res = CSVLineReader(file_path="path/to/your/file.csv")() >>> # process each row one by one >>> for row in res: >>> # process row
>>> from aiviro.actions.documents import CSVLineReader >>> # can be processed as a list >>> res = list(CSVLineReader( ... file_path="path/to/file.csv", ... delimiter="-", ... skipinitialspace=True, # any other kwargs for csv.reader can be provided here ... )()) >>> # prints out values from the third row in a list format >>> print(res[2].values)
Data Schemas
- pydantic model aiviro.actions.documents.tables.schemas.TableHeader
- pydantic model aiviro.actions.documents.tables.schemas.TableRow
- Example:
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> excel_auto_r = ExcelAutoReader(file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"]) >>> table = excel_auto_r() >>> row_value = table[0]["b"]
- field header: TableHeader | None [Required]
- pydantic model aiviro.actions.documents.tables.schemas.Table
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader( ... file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"] ... )() >>> for map_key, row in table: ... # iterates through all rows >>> row = table[0] # access row at index 0
- field header: TableHeader [Required]
- dump_by_columns(use_normalized_keys: bool = True) dict[str, list[Any]]
- Parameters:
use_normalized_keys – if True, the normalized keys are used. Otherwise, the keys are the same as in the header.
- Returns:
dictionary where each key represents a column name and its value is a list of values from that column.
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=3)() >>> res = table.dump_by_columns() >>> print(res) ... # { ... # "columnA": [10, 11], # columnA with row 1 and 2 values ... # "columnB": [20, 21], # columnB with row 1 and 2 values ... # } ... ... # print out value of the 'columnB' from the second row >>> print(res["columnB"][1]) # prints 21
- dump_by_rows(use_normalized_keys: bool = True) list[dict[str, Any]]
The first dictionary in the list represents the row with index 0.
- Parameters:
use_normalized_keys – If True, the normalized keys are used. Otherwise, the keys are the same as in the header.
- Returns:
list of dictionaries where each dictionary represents a row with column name as its key.
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)() >>> result = table.dump_by_rows() >>> print(result) ... # [ ... # {"columnA": 10, "columnB": 20, "columnC": 30}, # row 1 ... # {"columnA": 11, "columnB": 21, "columnC": 31}, # row 2 ... # ] ... ... # print out value from the second row and second column ('columnB') >>> print(result[1]["columnB"]) # prints 21