SoLake

Extractor class for the Azure Data Lake Container Storage.

__init__()

Extractor for Azure Data Lake Container Storage data.

This class handles downloading data from and uploading data to Azure storage containers (also known as blob containers).

delete_all_blobs_in_folder(container, folder_prefix)

Delete all blobs within a given folder in an Azure Blob container.

Parameters:

Name Type Description Default
container str

The name of the container where the folder exists. Example: 'pqs'

required
folder_prefix str

The "folder" prefix (path) within the container to delete. Example: '2024/'

required
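
A minimal sketch of a guarded call, assuming `lake` is an already constructed SoLake instance (the import path and credential setup are not shown in this reference). The helper name `delete_folder` and the empty-prefix guard are illustrative additions, not part of the class.

```python
def delete_folder(lake, container, folder_prefix):
    """Delete everything under folder_prefix, refusing an empty prefix.

    `lake` is assumed to be a SoLake instance. The guard is a precaution
    added for this sketch: how the method treats an empty prefix is not
    stated in this reference.
    """
    if not folder_prefix.strip("/"):
        raise ValueError("folder_prefix must name a folder, e.g. '2024/'")
    lake.delete_all_blobs_in_folder(container, folder_prefix)
```

With the documented example values this would be called as `delete_folder(lake, 'pqs', '2024/')`.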

download_from_blob_storage(container, file_path, location_to_save_file)

Function to download a file from the Azure storage account. The file is saved at the path formed by location_to_save_file + file_path.

Parameters:

Name Type Description Default
container str

Required. Name of the container. Ex: 'so-connect-migration-data'

required
file_path str or file

Required. Path of the file you want to download. Must include directory/file (Ex: 'Test/test.rtf'), or, if the file is within your Python environment, the Python object name (ex: df).

required
location_to_save_file str

Required. Local location where you want to save the file. Must end with a /. Ex: '/Users/user/Desktop/'. However, if the location is 'python', the data is saved as a Python DataFrame and not to a file.

required
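
The path rule above (location_to_save_file + file_path) can be sketched as a small wrapper. This assumes `lake` is a SoLake instance; `download_to_dir` is a hypothetical helper name, and the returned path is only the location the description implies, not something the method is documented to return.

```python
def download_to_dir(lake, container, file_path, local_dir):
    """Download one blob and return the local path it is expected to land at."""
    if local_dir == "python":
        # 'python' is the special value for DataFrame output; this helper
        # only covers the save-to-disk case.
        raise ValueError("use download_from_blob_storage directly for 'python'")
    if not local_dir.endswith("/"):
        # The reference requires the save location to end with a slash.
        local_dir += "/"
    lake.download_from_blob_storage(container, file_path, local_dir)
    # Expected location per the description: location_to_save_file + file_path.
    return local_dir + file_path
```

For example, `download_to_dir(lake, 'so-connect-migration-data', 'Test/test.rtf', '/Users/user/Desktop')` would be expected to produce '/Users/user/Desktop/Test/test.rtf'.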

get_excel_file_from_blob_storage(container, file_path)

Function to download an Excel file from the Azure storage account. This method is specific to Excel because it uses blob_data.content_as_bytes(); you can then read the result like a normal Excel file with pd.read_excel().

Parameters:

Name Type Description Default
container str

Required. Name of the container. Ex: 'she-data'

required
file_path str

Required. Location of the file you want to download. Must include directory/file. Example: 'Test/test.rtf'

required
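
A hedged sketch of preparing the result for pd.read_excel(). It assumes the method returns the workbook as raw bytes (suggested by the content_as_bytes() remark, but not confirmed here) and that `lake` is a SoLake instance; `excel_buffer_from_blob` is a hypothetical helper name.

```python
import io


def excel_buffer_from_blob(lake, container, file_path):
    """Return a file-like object that pd.read_excel() can consume."""
    raw = lake.get_excel_file_from_blob_storage(container, file_path)
    if isinstance(raw, (bytes, bytearray)):
        # Assumed case: raw bytes, wrapped so they behave like an open file.
        return io.BytesIO(raw)
    # Anything else (for example an already file-like object) is passed through.
    return raw
```

The buffer could then be read with `pd.read_excel(excel_buffer_from_blob(lake, 'she-data', 'Test/report.xlsx'))`, where pandas is imported as `pd` and the file name is a made-up example.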

list_files_in_blob(container, container_folder_path='', print_files=False)

Function to get a list of the file names located within the Azure storage container.

Parameters:

Name Type Description Default
container str

Required. Name of the container. Ex: 'she-data'

required
container_folder_path str

Optional. Folder path within directory. Ex: 'Metrics Charts'

''
print_files bool

Optional. If True, prints the file names; if False, does not print them.

False
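
A short sketch of filtering the listing by file extension. It assumes the method returns an iterable of file-name strings and that `lake` is a SoLake instance; `list_by_suffix` is a hypothetical helper name.

```python
def list_by_suffix(lake, container, folder="", suffix=".csv"):
    """Return the sorted file names in a folder that end with `suffix`."""
    names = lake.list_files_in_blob(
        container,
        container_folder_path=folder,
        print_files=False,  # keep the call quiet; results are returned instead
    )
    # Case-insensitive match so 'A.CSV' and 'a.csv' are both kept.
    return sorted(n for n in names if n.lower().endswith(suffix.lower()))
```

With the documented example values, `list_by_suffix(lake, 'she-data', 'Metrics Charts', '.png')` would return only the PNG files in that folder.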

rename_blob(container, old_blob_name, new_blob_name)

Rename a blob.

Parameters:

Name Type Description Default
container str

Required. Name of the container with folder path. Ex: 'pqs/2024 File Lists'

required
old_blob_name str

Old blob name, including the file extension (for example .csv)

required
new_blob_name str

New blob name, including the file extension (for example .csv)

required
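
Because both names must carry their file extension, a small check before renaming can catch a dropped or changed ending. This is a sketch assuming `lake` is a SoLake instance; `rename_keeping_extension` and the file names in the usage note are hypothetical.

```python
import os


def rename_keeping_extension(lake, container, old_blob_name, new_blob_name):
    """Rename a blob, refusing names whose extension is missing or differs."""
    old_ext = os.path.splitext(old_blob_name)[1]
    new_ext = os.path.splitext(new_blob_name)[1]
    if not old_ext or old_ext.lower() != new_ext.lower():
        raise ValueError("both names must include the same file extension")
    lake.rename_blob(container, old_blob_name, new_blob_name)
```

For example, `rename_keeping_extension(lake, 'pqs/2024 File Lists', 'files_old.csv', 'files_new.csv')`.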

upload_folder_to_blob_storage(container, toplevel_localpath, toplevel_azurepath)

Function to upload a folder, including all of its subfolder paths, to the Azure storage account.

Parameters:

Name Type Description Default
container str

Required. Name of the container. Ex: 'so-connect-migration-data'

required
toplevel_localpath str

Required. Location of the top level of the local folder you want to upload. Example: '/Users/username/Desktop/Test'

required
toplevel_azurepath str

Required. Location of the top level of the folder that you want to upload the data to in Azure. It will create a folder on Azure with that name if the folder does not already exist. Example: 'Test'

required
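
To preview what a folder upload might produce, the local tree can be walked and each file mapped to a target path. This is a sketch under an assumption: that relative paths below toplevel_localpath are kept beneath toplevel_azurepath. The reference does not spell out the exact mapping, and `plan_folder_upload` is a hypothetical helper that uploads nothing.

```python
import os


def plan_folder_upload(toplevel_localpath, toplevel_azurepath):
    """Return sorted (local_file, assumed_azure_path) pairs for a local tree."""
    plan = []
    for root, _dirs, files in os.walk(toplevel_localpath):
        rel_dir = os.path.relpath(root, toplevel_localpath)
        prefix = [toplevel_azurepath]
        if rel_dir != os.curdir:
            # Subfolders become extra path segments under the Azure folder.
            prefix += rel_dir.split(os.sep)
        for name in files:
            plan.append((os.path.join(root, name), "/".join(prefix + [name])))
    return sorted(plan)
```

Calling `plan_folder_upload('/Users/username/Desktop/Test', 'Test')` lists the files that `upload_folder_to_blob_storage(container, '/Users/username/Desktop/Test', 'Test')` would be expected to handle.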

upload_to_blob_storage(container, file_path, file_name, container_folder_path='', use_path=True, df=None)

Function to upload a single file to the Azure storage account.

Parameters:

Name Type Description Default
container str

Required. Name of the container. Ex: 'so-connect-migration-data'

required
file_path str

Required. Location path to file you want to upload. Ex: '/Users/user/Desktop/test.rtf'

required
file_name str

Required. Desired file name. Ex: 'test.rtf'

required
container_folder_path str

Optional. Additional folder path within directory. Ex: 'RawData', 'FormattedData' or 'Images'

''
use_path bool

Optional. Whether to upload the file at file_path (True) or the object provided under df (False). Default is True.

True
df obj

Optional. Python object to upload to storage blob. Only referenced if use_path=False.

None
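
The two upload modes can be sketched as thin wrappers, assuming `lake` is a SoLake instance. Both helper names are hypothetical. What file_path should contain when use_path=False is not stated in this reference, so the second helper leaves that value to the caller and defaults it to an empty string as a labeled assumption.

```python
import os


def upload_file(lake, container, local_path, folder=""):
    """Upload a local file, reusing its base name as the blob file name."""
    file_name = os.path.basename(local_path)
    lake.upload_to_blob_storage(
        container, local_path, file_name, container_folder_path=folder
    )
    return file_name


def upload_object(lake, container, obj, file_name, folder="", file_path=""):
    """Upload an in-memory object (for example a DataFrame) via df=."""
    # Assumption: file_path is ignored when use_path=False; the empty
    # string default is a placeholder, not documented behavior.
    lake.upload_to_blob_storage(
        container,
        file_path,
        file_name,
        container_folder_path=folder,
        use_path=False,
        df=obj,
    )
```

For example, `upload_file(lake, 'so-connect-migration-data', '/Users/user/Desktop/test.rtf', 'RawData')` uploads the file under the name 'test.rtf'.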