Excel & CSV

There are multiple readers available in Aiviro for processing ‘.xlsx’ and ‘.csv’ files, for the full list see below.

Excel Actions

Note

It’s required to install reader from Optional dependencies section.

class aiviro.actions.documents.tables.ExcelAutoReader(file_path: Path | str, expected_header: list[str], sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False)

Automatically reads the loaded Excel file and extracts tables from it, based on the argument expected_header. If an empty file is provided, an empty list is returned. Supports reading the same file extensions as openpyxl package.

Parameters:
  • file_path – Path to the Excel file

  • expected_header – Header which should be found in the table

  • sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with the same mapping_key

Example:

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> # read multiple tables from the same file
>>> excel_auto_r = ExcelAutoReader(
...     file_path="path/to/file.xlsx",
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet1",
...     mapping_key="Identifier",
... )
>>> detected_tables = excel_auto_r()
>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> # when reading an empty xlsx file, an empty list is returned
>>> excel_auto_r = ExcelAutoReader(
...     file_path="path/to/empty-file.xlsx",
...     expected_header=["Name", "Identifier", "Amount", "km"],
...     sheet="sheet2",
... )
>>> tables = excel_auto_r()
>>> print(tables)  # will print []
execute() list[Table]

If the file is empty or the header is not found, an empty list is returned.

Returns:

List of tables extracted from the Excel file.

class aiviro.actions.documents.tables.ExcelTableReader(file_path: Path | str, header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False, header_fields: list[str] | None = None, table_name: str = '')

Returns extracted table from the required file’s sheet.

Parameters:
  • file_path – Path to the Excel file

  • header_row – Row that contains header, starting from 1.

  • end_row – The last row to read, if None read until end of the file

  • start_col – The first column to read, if None read from the first column. Starting from 1.

  • end_col – The last column to read, if None read until end of the file

  • sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active

  • mapping_key – Key used for unique identification of the loaded rows, if None row_index is used

  • skip_same_mapping_keys – Ignore rows with same mapping_key

  • header_fields – Custom names for header that can be used for data identification

  • table_name – Name of the table

Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)()
>>> for row in table:
...     # process row
>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> excel_table_r = ExcelTableReader(
...     file_path="path/to/file.xlsx",
...     header_row=1,
...     end_row=23,
...     start_col=5,
...     sheet="Super Sheet Name",
...     mapping_key=("item_name", "item_category"),
... )
>>> t1 = excel_table_r()
>>> r1 = t1[("item1", "small")]  # returns row by specified mapping_key
>>> r1_amount = r1["amount"]  # returns row's value from column "amount"
...
>>> another_excel_table_r = ExcelTableReader(
...     file_path="path/to/file.xlsx",
...     header_row=5,
...     mapping_key="a",  # map based on values from the column "a"
...     header_fields=["a", "b", "c", "d"],  # custom header names
... )
>>> t2 = excel_table_r()
>>> r2 = t2["name"]["c"]
execute() Table
Returns:

Table object with extracted data.

Raises:

pydantic.ValidationError – If the table is empty or the coordinates of the table are set incorrectly.

class aiviro.actions.documents.tables.ExcelLineReader(file_path: Path | str, start_row: int | None = None, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None)

Reads provided sheet, line by line. If the provided file is empty or the coordinates are set incorrectly, it returns an iterator with empty TableRow objects’ values.

Parameters:
  • file_path – Path to the Excel file

  • start_row – The first row to read, if None read from the first row. Starting from 1.

  • end_row – The last row to read, if None read until the end of the file

  • start_col – The first column to read, if None read from the first column. Starting from 1.

  • end_col – The last column to read, if None read until the end of the file

  • sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active

Example:

>>> from aiviro.actions.documents.tables import ExcelLineReader
>>> excel_line_r = ExcelLineReader(
...     file_path="path/to/file.xlsx",
...     start_row=5,
...     start_col=1,
...     end_col=8,
...     sheet=3,
... )
>>> for row in excel_line_r():
...     # process row
>>> from aiviro.actions.documents.tables import ExcelLineReader
>>> # iterator can be transformed into the list
>>> excel_line_r = ExcelLineReader(
...     file_path="path/to/some-file.xlsx",
...     start_row=3,
...     end_row=8,
...     start_col=1,
...     end_col=5,
...     sheet="my-sheet",
... )
>>> rows = list(excel_line_r())
>>> print(len(rows))  # print number of rows
>>> # or process rows here
execute() Iterator[TableRow]
Returns:

Iterator of TableRow objects with extracted data. If the provided file is empty or the coordinates of the table are set incorrectly, it returns an iterator with empty TableRow objects’ values.

class aiviro.actions.documents.tables.ExcelGetSheetNames(file_path: Path | str)

Simple action that returns names of all sheets in the provided Excel file.

Example:

>>> from aiviro.actions.documents.tables import ExcelGetSheetNames
>>> sheet_names = ExcelGetSheetNames(file_path="path/to/file.xlsx")()
>>> print(sheet_names[0]))
execute() list[str]
Returns:

List of sheet names found in the Excel file.

CSV Actions

class aiviro.actions.documents.tables.csv_reader.CSVTableReader(file_path: Path | str, expected_header: list[str], delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)

Reads the provided CSV file and extracts tables from it, based on the argument expected_header, which is expected to be in the first row.

Parameters:
  • file_path – Path to the CSV file

  • expected_header – Header that csv file should have in its first row

  • delimiter – Delimiter used in the CSV file (by default “,”)

  • encoding – Encoding used in the CSV file (by default “utf-8”)

  • kwargs – Additional keyword arguments passed to the csv.reader function

Raises:
  • FileNotFoundError – If the file does not exist.

  • TableHeaderNotProvided – If the expected_header is empty.

Example:

>>> from aiviro.actions.documents import CSVTableReader
>>> # gets data from the csv file, where the first row is taken as a header
>>> table = CSVTableReader(file_path="path/to/your/file.csv")()
>>> # print the table extracted from the csv file
>>> print(table)
>>> from aiviro.actions.documents import CSVTableReader
>>> table = CSVTableReader(
...     file_path="path/to/file.csv", expected_header=["Name", "Amount"], delimiter=";", encoding="utf-16"
... )()
>>> # print data from the third row of column 'Amount'
>>> print(table[2]["Amount"])
execute() Table

Processes CSV file and returns a table. The first row is expected to be a header and is compared with the expected header.

Returns:

Tables extracted from the CSV file or None if the file is empty.

Raises:

TablesException – If the expected header does not match the actual header.

class aiviro.actions.documents.tables.csv_reader.CSVLineReader(file_path: Path | str, delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)

Yields rows from the provided CSV file, line by line.

Parameters:
  • file_path – Path to the CSV file

  • delimiter – Delimiter used in the CSV file (by default “,”)

  • encoding – Encoding used in the CSV file (by default “utf-8”)

  • kwargs – Additional keyword arguments passed to the csv.reader function

Raises:

FileNotFoundError – If the file does not exist.

>>> from aiviro.actions.documents import CSVLineReader
>>> res = CSVLineReader(file_path="path/to/your/file.csv")()
>>> # process each row one by one
>>> for row in res:
>>>     # process row
>>> from aiviro.actions.documents import CSVLineReader
>>> # can be processed as a list
>>> res = list(CSVLineReader(
...     file_path="path/to/file.csv",
...     delimiter="-",
...     skipinitialspace=True,  # any other kwargs for csv.reader can be provided here
... )())
>>> # prints out values from the third row in a list format
>>> print(res[2].values)
execute() Iterable[TableRow]

Processes the CSV file by reading its lines one by one.

Returns:

Iterable of TableRow. The rows of the CSV file represented as TableRow objects with their respective index and values.

Data Schemas

pydantic model aiviro.actions.documents.tables.schemas.TableHeader
field end_column: int [Required]
field keys: list[str] [Required]
field normalized_keys: list[str] [Optional]
field start_column: int [Required]
classmethod create_from_row(row_data: list[Any], custom_keys: list[str]) TableHeader
pydantic model aiviro.actions.documents.tables.schemas.TableRow
Example:

>>> from aiviro.actions.documents.tables import ExcelAutoReader
>>> excel_auto_r = ExcelAutoReader(file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"])
>>> table = excel_auto_r()
>>> row_value = table[0]["b"]
field header: TableHeader | None [Required]
field index: int [Required]
field values: list[Any] [Required]
pydantic model aiviro.actions.documents.tables.schemas.Table
Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(
...     file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"]
... )()
>>> for map_key, row in table:
...     # iterates through all rows
>>> row = table[0]  # access row at index 0
field header: TableHeader [Required]
field mapping_key: str | int | tuple[str | int, ...] | None [Required]
field title: str [Required]
add_row_raw(index: int, values: list[Any], skip_duplicates: bool = False) str | int
dump_by_columns(use_normalized_keys: bool = True) dict[str, list[Any]]
Parameters:

use_normalized_keys – if True, the normalized keys are used. Otherwise, the keys are the same as in the header.

Returns:

dictionary where each key represents a column name and its value is a list of values from that column.

Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=3)()
>>> res = table.dump_by_columns()
>>> print(res)
... # {
... #   "columnA": [10, 11],  # columnA with row 1 and 2 values
... #   "columnB": [20, 21],  # columnB with row 1 and 2 values
... # }
...
... # print out value of the 'columnB' from the second row
>>> print(res["columnB"][1])  # prints 21
dump_by_rows(use_normalized_keys: bool = True) list[dict[str, Any]]

The first dictionary in the list represents the row with index 0.

Parameters:

use_normalized_keys – If True, the normalized keys are used. Otherwise, the keys are the same as in the header.

Returns:

list of dictionaries where each dictionary represents a row with column name as its key.

Example:

>>> from aiviro.actions.documents.tables import ExcelTableReader
>>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)()
>>> result = table.dump_by_rows()
>>> print(result)
... # [
... #   {"columnA": 10, "columnB": 20, "columnC": 30},  # row 1
... #   {"columnA": 11, "columnB": 21, "columnC": 31},  # row 2
... # ]
...
... # print out value from the second row and second column ('columnB')
>>> print(result[1]["columnB"])  # prints 21
get_column(name: str | int) list[Any]

Returns all values from the specified column

property rows: dict[str | int, TableRow]