Excel & CSV

There are multiple readers available in Aiviro for processing ‘.xlsx’ and ‘.csv’ files, for the full list see below.

Excel Actions

Note

It’s required to install reader from Optional dependencies section.

Automatically reads the loaded Excel file and extracts tables from it, based on the argument expected_header. If an empty file is provided, an empty list is returned. Supports reading the same file extensions as openpyxl package.

Parameters:

file_path – Path to the Excel file
expected_header – Header which should be found in the table
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with the same mapping_key

Example:

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> # read multiple tables from the same file
>>> excel_auto_r = ExcelAutoReader(
...     file_path="path/to/file.xlsx",
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet1",
...     mapping_key="Identifier",
... )
>>> detected_tables = excel_auto_r()

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> # when reading an empty xlsx file, an empty list is returned
>>> excel_auto_r = ExcelAutoReader(
...     file_path="path/to/empty-file.xlsx",
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet2",
... )
>>> tables = excel_auto_r()
>>> print(tables)  # will print []

execute() → list[Table]

If the file is empty or the header is not found, an empty list is returned.

Returns:: List of tables extracted from the Excel file.

Returns extracted table from the required file’s sheet.

Parameters:

file_path – Path to the Excel file
header_row – Row that contains header, starting from 1.
end_row – The last row to read, if None read until end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
header_fields – Custom names for header that can be used for data identification
table_name – Name of the table

Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)()
>>> for row in table:
...     # process row

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> excel_table_r = ExcelTableReader(
...     file_path="path/to/file.xlsx",
...     header_row=1,
...     end_row=23,
...     start_col=5,
...     sheet="Super Sheet Name",
...     mapping_key=("item_name", "item_category"),
... )
>>> t1 = excel_table_r()
>>> r1 = t1[("item1", "small")]  # returns row by specified mapping_key
>>> r1_amount = r1["amount"]  # returns row's value from column "amount"
...
>>> another_excel_table_r = ExcelTableReader(
...     file_path="path/to/file.xlsx",
...     header_row=5,
...     mapping_key="a",  # map based on values from the column "a"
...     header_fields=["a", "b", "c", "d"],  # custom header names
... )
>>> t2 = excel_table_r()
>>> r2 = t2["name"]["c"]

execute() → Table

Returns:: Table object with extracted data.
Raises:: pydantic.ValidationError – If the table is empty or the coordinates of the table are set incorrectly.

Reads provided sheet, line by line. If the provided file is empty or the coordinates are set incorrectly, it returns an iterator with empty TableRow objects’ values.

Parameters:

file_path – Path to the Excel file
start_row – The first row to read, if None read from the first row. Starting from 1.
end_row – The last row to read, if None read until the end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until the end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active

Example:

>>> from aiviro.actions.documents.tables import ExcelLineReader
>>> excel_line_r = ExcelLineReader(
...     file_path="path/to/file.xlsx",
...     start_row=5,
...     start_col=1,
...     end_col=8,
...     sheet=3,
... )
>>> for row in excel_line_r():
...     # process row

>>> from aiviro.actions.documents.tables import ExcelLineReader
>>> # iterator can be transformed into the list
>>> excel_line_r = ExcelLineReader(
...     file_path="path/to/some-file.xlsx",
...     start_row=3,
...     end_row=8,
...     start_col=1,
...     end_col=5,
...     sheet="my-sheet",
... )
>>> rows = list(excel_line_r())
>>> print(len(rows))  # print number of rows
>>> # or process rows here

execute() → Iterator[TableRow]

Returns:: Iterator of TableRow objects with extracted data. If the provided file is empty or the coordinates of the table are set incorrectly, it returns an iterator with empty TableRow objects’ values.

class aiviro.actions.documents.tables.ExcelGetSheetNames(file_path: Path | str)

Simple action that returns names of all sheets in the provided Excel file.

Example:

>>> from aiviro.actions.documents.tables import ExcelGetSheetNames
>>> sheet_names = ExcelGetSheetNames(file_path="path/to/file.xlsx")()
>>> print(sheet_names[0]))

execute() → list[str]

Returns:: List of sheet names found in the Excel file.

CSV Actions

class aiviro.actions.documents.tables.csv_reader.CSVTableReader(file_path: Path | str, expected_header: list[str], delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)

Reads the provided CSV file and extracts tables from it, based on the argument expected_header, which is expected to be in the first row.

Parameters:

file_path – Path to the CSV file
expected_header – Header that csv file should have in its first row
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function

Raises:

FileNotFoundError – If the file does not exist.
TableHeaderNotProvided – If the expected_header is empty.

Example:

>>> from aiviro.actions.documents import CSVTableReader
>>> # gets data from the csv file, where the first row is taken as a header
>>> table = CSVTableReader(file_path="path/to/your/file.csv")()
>>> # print the table extracted from the csv file
>>> print(table)

>>> from aiviro.actions.documents import CSVTableReader
>>> table = CSVTableReader(
...     file_path="path/to/file.csv", expected_header=["Name", "Amount"], delimiter=";", encoding="utf-16"
... )()
>>> # print data from the third row of column 'Amount'
>>> print(table[2]["Amount"])

execute() → Table

Processes CSV file and returns a table. The first row is expected to be a header and is compared with the expected header.

Returns:: Tables extracted from the CSV file or None if the file is empty.
Raises:: TablesException – If the expected header does not match the actual header.

class aiviro.actions.documents.tables.csv_reader.CSVLineReader(file_path: Path | str, delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)

Yields rows from the provided CSV file, line by line.

Parameters:

file_path – Path to the CSV file
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function

Raises:

FileNotFoundError – If the file does not exist.

>>> from aiviro.actions.documents import CSVLineReader
>>> res = CSVLineReader(file_path="path/to/your/file.csv")()
>>> # process each row one by one
>>> for row in res:
>>>     # process row

>>> from aiviro.actions.documents import CSVLineReader
>>> # can be processed as a list
>>> res = list(CSVLineReader(
...     file_path="path/to/file.csv",
...     delimiter="-",
...     skipinitialspace=True,  # any other kwargs for csv.reader can be provided here
... )())
>>> # prints out values from the third row in a list format
>>> print(res[2].values)

execute() → Iterable[TableRow]

Processes the CSV file by reading its lines one by one.

Returns:: Iterable of TableRow. The rows of the CSV file represented as TableRow objects with their respective index and values.

Data Schemas

pydantic model aiviro.actions.documents.tables.schemas.TableHeader

field end_column: int [Required]

field keys: list[str] [Required]

field normalized_keys: list[str] [Optional]

field start_column: int [Required]

classmethod create_from_row(row_data: list[Any], custom_keys: list[str]) → TableHeader

pydantic model aiviro.actions.documents.tables.schemas.TableRow

Example:

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> excel_auto_r = ExcelAutoReader(file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"])
>>> table = excel_auto_r()
>>> row_value = table[0]["b"]

field header: TableHeader | None [Required]

field index: int [Required]

field values: list[Any] [Required]

pydantic model aiviro.actions.documents.tables.schemas.Table

Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(
...     file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"]
... )()
>>> for map_key, row in table:
...     # iterates through all rows
>>> row = table[0]  # access row at index 0

field header: TableHeader [Required]

field mapping_key: str | int | tuple[str | int, ...] | None [Required]

field title: str [Required]

add_row_raw(index: int, values: list[Any], skip_duplicates: bool = False) → str | int

dump_by_columns(use_normalized_keys: bool = True) → dict[str, list[Any]]

Parameters:: use_normalized_keys – if True, the normalized keys are used. Otherwise, the keys are the same as in the header.
Returns:: dictionary where each key represents a column name and its value is a list of values from that column.
Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=3)()
>>> res = table.dump_by_columns()
>>> print(res)
... # {
... #   "columnA": [10, 11],  # columnA with row 1 and 2 values
... #   "columnB": [20, 21],  # columnB with row 1 and 2 values
... # }
...
... # print out value of the 'columnB' from the second row
>>> print(res["columnB"][1])  # prints 21

dump_by_rows(use_normalized_keys: bool = True) → list[dict[str, Any]]

The first dictionary in the list represents the row with index 0.

Parameters:: use_normalized_keys – If True, the normalized keys are used. Otherwise, the keys are the same as in the header.
Returns:: list of dictionaries where each dictionary represents a row with column name as its key.
Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)()
>>> result = table.dump_by_rows()
>>> print(result)
... # [
... #   {"columnA": 10, "columnB": 20, "columnC": 30},  # row 1
... #   {"columnA": 11, "columnB": 21, "columnC": 31},  # row 2
... # ]
...
... # print out value from the second row and second column ('columnB')
>>> print(result[1]["columnB"])  # prints 21

get_column(name: str | int) → list[Any]: Returns all values from the specified column

property rows: dict[str | int, TableRow]