Module file_catalog.

FileCatalog

With file_catalog, you can set up a mapping of file names to their paths. An application can then use the catalog to retrieve a path by file name. By keeping several catalogs that use the same file names but different paths, each run of the application can use whichever catalog is appropriate. For example, one catalog could be used for testing and another for normal production.
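
The idea of swapping catalogs between runs can be sketched without the library itself. This is a minimal sketch, not part of file_catalog: plain dicts stand in for FileCatalog instances, and a hypothetical APP_ENV environment variable selects which catalog a run uses.

```python
import os
from pathlib import Path
from typing import Dict

# Plain dicts standing in for FileCatalog instances: the same file
# name maps to a different path in each environment.
CATALOGS: Dict[str, Dict[str, Path]] = {
    'prod': {'file1': Path('/prod_files/file1.csv')},
    'test': {'file1': Path('/test_files/test_file1.csv')},
}


def select_catalog(env_var: str = 'APP_ENV') -> Dict[str, Path]:
    """Return the catalog named by a hypothetical environment variable.

    Falls back to the production catalog when the variable is unset.
    """
    return CATALOGS[os.environ.get(env_var, 'prod')]


os.environ['APP_ENV'] = 'test'  # e.g. set by a test runner
print(select_catalog()['file1'].as_posix())  # /test_files/test_file1.csv
```

The application code only ever asks for 'file1'; which physical file it gets is decided by the catalog chosen at startup.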

Example:

instantiate production and test catalogs for one file

>>> from scottbrian_utils.file_catalog import FileCatalog
>>> from pathlib import Path
>>> prod_cat = FileCatalog({'file1': Path('/prod_files/file1.csv')})
>>> print(prod_cat.get_path('file1'))
/prod_files/file1.csv
>>> test_cat = FileCatalog(
...     {'file1': Path('/test_files/test_file1.csv')})
>>> print(test_cat.get_path('file1'))
/test_files/test_file1.csv

Example:

instantiate a catalog for two files:

>>> a_cat = FileCatalog({'sales': Path('/home/T/files/file1.csv'),
...                      'inventory': Path('/home/T/files/file2.csv')})
>>> print(a_cat)
FileCatalog({'sales': Path('/home/T/files/file1.csv'),
             'inventory': Path('/home/T/files/file2.csv')})
>>> print(a_cat.get_path('inventory'))
/home/T/files/file2.csv
>>> from os import fspath
>>> print(fspath(a_cat.get_path('sales')))
/home/T/files/file1.csv

The file_catalog module contains:

  1. FileCatalog class with add_paths, del_paths, get_path, save_catalog, and load_catalog methods

  2. FileSpec, FileSpecs type aliases that you can use for type hints

  3. Error exception classes:

    1. FileNameNotFound

    2. FileSpecIncorrect

    3. IllegalAddAttempt

    4. IllegalDelAttempt

class file_catalog.FileCatalog(file_specs=None)

Provides a mapping of file names to paths.

This is useful for cases where an application is to be used in various environments with files that are in different places. Another use is where one set of files is used for normal processing and another set is used for testing purposes.

Store the input file specs in a data frame.

Parameters:

file_specs (Optional[Dict[str, Path]]) – A dictionary of one or more entries. The key is the file name and the value is the path. The file name must be a string and the path must be a pathlib Path.

Example:

instantiate a catalog with two files

>>> from scottbrian_utils.file_catalog import FileCatalog
>>> from pathlib import Path
>>> a_catalog = FileCatalog(
...     {'file_1': Path('/run/media/file1.csv'),
...      'file_2': Path('/run/media/file2.pdf')})
>>> print(a_catalog.get_path('file_2'))
/run/media/file2.pdf

add_paths(file_specs)

Add one or more paths to the catalog.

Parameters:

file_specs (Dict[str, Path]) – A dictionary of one or more entries. The key is the file name and the value is the path. The file name must be a string and the path must be a pathlib Path.

Raises:
  • FileSpecIncorrect – The input path is not a pathlib Path

  • IllegalAddAttempt – Entry already exists with different path

Return type:

None

The entries to be added are specified in the file_specs argument. For each file_spec, the specified file name is used to determine whether the entry already exists in the catalog. If the entry already exists, the specified path is compared against the found entry. If they do not match, an IllegalAddAttempt exception is raised and no entries for the add_paths request will be added. Otherwise, if the path matches, there is no need to add it again so processing continues with the next file_spec. If no errors are detected for any of the file_specs, any file names that do not yet exist in the catalog are added.
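
The all-or-nothing rule described above (validate every file_spec first, then add) can be illustrated with a plain dict. This is a minimal sketch, not the library's implementation: a dict stands in for the catalog, and IllegalAddAttempt is a locally defined stand-in for the module's exception.

```python
from pathlib import Path
from typing import Dict


class IllegalAddAttempt(Exception):
    """Stand-in for file_catalog.IllegalAddAttempt (defined locally here)."""


def add_paths(catalog: Dict[str, Path], file_specs: Dict[str, Path]) -> None:
    """Add entries to catalog all-or-nothing, following the rules above."""
    # Pass 1: validate every spec before touching the catalog.
    new_entries: Dict[str, Path] = {}
    for file_name, path in file_specs.items():
        existing = catalog.get(file_name)
        if existing is None:
            new_entries[file_name] = path  # not yet cataloged: queue it
        elif existing != path:
            # Same name, different path: reject the whole request.
            raise IllegalAddAttempt(
                f'{file_name!r} already maps to {existing}, not {path}')
        # Otherwise the identical entry already exists; nothing to do.
    # Pass 2: no errors were found, so commit the queued entries.
    catalog.update(new_entries)


cat: Dict[str, Path] = {'file1': Path('/run/media/file1.csv')}
add_paths(cat, {'file1': Path('/run/media/file1.csv'),  # duplicate, skipped
                'file2': Path('/run/media/file2.csv')})
try:
    add_paths(cat, {'file3': Path('path3'),
                    'file1': Path('/elsewhere/file1.csv')})  # conflict
except IllegalAddAttempt:
    pass
print(sorted(cat))  # ['file1', 'file2']  (file3 was not added)
```

Collecting the new entries first and applying them in a single update is what keeps a failed request from leaving the catalog half-modified.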

Example:

add some paths to the catalog

>>> from scottbrian_utils.file_catalog import FileCatalog
>>> from pathlib import Path
>>> a_catalog = FileCatalog()
>>> a_catalog.add_paths({'file1': Path('/run/media/file1.csv')})
>>> print(a_catalog)
FileCatalog({'file1': Path('/run/media/file1.csv')})
>>> a_catalog.add_paths({'file2': Path('/run/media/file2.csv'),
...                      'file3': Path('path3')})
>>> print(a_catalog)
FileCatalog({'file1': Path('/run/media/file1.csv'),
             'file2': Path('/run/media/file2.csv'),
             'file3': Path('path3')})

del_paths(file_specs)

Delete one or more paths from the catalog.

Parameters:

file_specs (Dict[str, Path]) – A dictionary of one or more entries. The key is the file name and the value is the path. The file name must be a string and the path must be a pathlib Path.

Raises:
  • FileSpecIncorrect – The input path is not a pathlib Path

  • IllegalDelAttempt – Attempt to delete entry with different path

Return type:

None

The entries to be deleted are specified in the file_specs argument. For each file_spec, the specified file name is used to find the entry in the catalog. If it is not found, processing continues with the next file_spec. If the entry is found, the path in the file_spec is compared against the path in the found entry. If they differ, an IllegalDelAttempt exception is raised and no entries for the del_paths request are deleted. If the paths match, the entry is deleted, provided no errors are detected for any of the preceding or remaining file_specs.
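
The deletion rules can be sketched the same way. This is a minimal illustration under stated assumptions, not the library's code: a plain dict plays the catalog, and IllegalDelAttempt is a locally defined stand-in exception. Missing names are skipped, and a path mismatch anywhere cancels every deletion in the request.

```python
from pathlib import Path
from typing import Dict, List


class IllegalDelAttempt(Exception):
    """Stand-in for file_catalog.IllegalDelAttempt (defined locally here)."""


def del_paths(catalog: Dict[str, Path], file_specs: Dict[str, Path]) -> None:
    """Delete entries from catalog all-or-nothing, following the rules above."""
    to_delete: List[str] = []
    for file_name, path in file_specs.items():
        existing = catalog.get(file_name)
        if existing is None:
            continue  # name not in the catalog: skip it
        if existing != path:
            # Name found but path differs: reject the whole request.
            raise IllegalDelAttempt(
                f'{file_name!r} maps to {existing}, not {path}')
        to_delete.append(file_name)
    # Every spec was validated, so the deletions can now be applied.
    for file_name in to_delete:
        del catalog[file_name]


cat: Dict[str, Path] = {'file1': Path('/run/media/file1.csv'),
                        'file2': Path('/run/media/file2.csv'),
                        'file3': Path('path3')}
del_paths(cat, {'file1': Path('/run/media/file1.csv'),
                'file9': Path('missing')})  # file9 is silently skipped
try:
    del_paths(cat, {'file2': Path('/run/media/file2.csv'),
                    'file3': Path('wrong')})  # mismatch on file3
except IllegalDelAttempt:
    pass
print(sorted(cat))  # ['file2', 'file3']  (file2 survived the failed request)
```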

Example:

add and then delete paths from the catalog

>>> from scottbrian_utils.file_catalog import FileCatalog
>>> from pathlib import Path
>>> a_catalog = FileCatalog()
>>> a_catalog.add_paths({'file1': Path('/run/media/file1.csv'),
...                      'file2': Path('/run/media/file2.csv'),
...                      'file3': Path('path3'),
...                      'file4': Path('path4')})
>>> print(a_catalog)
FileCatalog({'file1': Path('/run/media/file1.csv'),
             'file2': Path('/run/media/file2.csv'),
             'file3': Path('path3'),
             'file4': Path('path4')})
>>> a_catalog.del_paths({'file1': Path('/run/media/file1.csv'),
...                      'file3': Path('path3')})
>>> print(a_catalog)
FileCatalog({'file2': Path('/run/media/file2.csv'),
             'file4': Path('path4')})

get_path(file_name)

Obtain a path given a file name.

Parameters:

file_name (str) – The name of the file whose path is needed

Return type:

Path

Returns:

A pathlib Path object for the input file name

Raises:

FileNameNotFound – The input file name is not in the catalog

Example:

instantiate a catalog with two files and get their paths

>>> from scottbrian_utils.file_catalog import FileCatalog
>>> from pathlib import Path
>>> a_catalog = FileCatalog(
...    {'file1': Path('/run/media/file1.csv'),
...     'file2': Path('/run/media/file2.pdf')})
>>> path1 = a_catalog.get_path('file1')
>>> print(path1)
/run/media/file1.csv
>>> from os import fspath
>>> fspath(a_catalog.get_path('file2'))
'/run/media/file2.pdf'
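
The lookup described for get_path amounts to a dictionary access whose missing-key error is reported as FileNameNotFound. The sketch below illustrates that translation with a plain dict and a locally defined FileNameNotFound; the exception message is illustrative and not the library's wording.

```python
from pathlib import Path
from typing import Dict


class FileNameNotFound(Exception):
    """Stand-in for file_catalog.FileNameNotFound (defined locally here)."""


def get_path(catalog: Dict[str, Path], file_name: str) -> Path:
    """Return the path cataloged for file_name or raise FileNameNotFound."""
    try:
        return catalog[file_name]
    except KeyError:
        # Report the dict's KeyError as the documented exception type.
        raise FileNameNotFound(
            f'Catalog has no entry for file name {file_name!r}') from None


cat: Dict[str, Path] = {'file1': Path('/run/media/file1.csv')}
print(get_path(cat, 'file1').as_posix())  # /run/media/file1.csv
try:
    get_path(cat, 'file2')
except FileNameNotFound as exc:
    print(exc)
```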

classmethod load_catalog(saved_cat_path)

Load catalog from a csv file.

Parameters:

saved_cat_path (Path) – The path from where the catalog is to be loaded

Return type:

FileCatalog

Returns:

A FileCatalog instance

save_catalog(saved_cat_path)

Save catalog as a csv file.

Parameters:

saved_cat_path (Path) – The path to where the catalog is to be saved

Return type:

None
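
save_catalog and load_catalog are documented only as writing and reading a CSV file; the column layout is not specified here. The sketch below illustrates a save/load round trip with the standard csv module, assuming a hypothetical two-column layout (file_name, path) with a header row. The library's actual file format may differ, and plain dicts stand in for FileCatalog instances.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def save_catalog(catalog: Dict[str, Path], saved_cat_path: Path) -> None:
    """Write catalog entries as (file_name, path) rows under a header."""
    with saved_cat_path.open('w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['file_name', 'path'])  # assumed header row
        for file_name, path in catalog.items():
            writer.writerow([file_name, str(path)])


def load_catalog(saved_cat_path: Path) -> Dict[str, Path]:
    """Read rows written by save_catalog back into a name -> Path dict."""
    with saved_cat_path.open(newline='') as f:
        reader = csv.DictReader(f)  # uses the header row for field names
        return {row['file_name']: Path(row['path']) for row in reader}


original = {'sales': Path('/home/T/files/file1.csv'),
            'inventory': Path('/home/T/files/file2.csv')}
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'catalog.csv'
    save_catalog(original, csv_path)
    restored = load_catalog(csv_path)
print(restored == original)  # True
```

Converting each path to str on save and back to Path on load is what lets the reloaded entries compare equal to the originals.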