SoLake
Extractor class for the Azure Data Lake Container Storage.
__init__()
Extractor for Azure Data Lake Container Storage data.
This class handles downloading data from and uploading data to Azure storage containers (blob storage).
delete_all_blobs_in_folder(container, folder_prefix)
Delete all blobs within a given folder in an Azure Blob container.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | The name of the container where the folder exists. Example: 'pqs' | required |
folder_prefix | str | The "folder" prefix (path) within the container to delete. Example: '2024/' | required |
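The "folder" here is a name prefix, so the exact prefix string decides what gets deleted. The following is a minimal sketch of prefix-based deletion, using an in-memory dict as a stand-in for a container. `delete_by_prefix` and the sample names are hypothetical and not part of SoLake; the sketch only illustrates why the trailing `/` in `folder_prefix` matters.

```python
def delete_by_prefix(blobs: dict[str, bytes], folder_prefix: str) -> list[str]:
    """Delete every entry whose name starts with folder_prefix; return the deleted names."""
    doomed = sorted(name for name in blobs if name.startswith(folder_prefix))
    for name in doomed:
        del blobs[name]
    return doomed


# In-memory stand-in for a container (assumption, for illustration only).
container = {
    "2024/jan.csv": b"a",
    "2024/feb.csv": b"b",
    "2024-archive/old.csv": b"c",
    "2023/dec.csv": b"d",
}

deleted = delete_by_prefix(container, "2024/")
```

With the prefix '2024' (no trailing slash), '2024-archive/old.csv' would match as well, which is probably not what is wanted.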
download_from_blob_storage(container, file_path, location_to_save_file)
Function to download a file from the Azure storage account. The file is saved locally at the path formed by location_to_save_file + file_path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container. Example: 'so-connect-migration-data' | required |
file_path | str or file | Required. Path of the file you want to download. Must include directory/file (Example: 'Test/test.rtf'), or, if the file is within your Python environment, the Python object name (Example: df). | required |
location_to_save_file | str | Required. Location where you want to save the file locally. Must end with a /. Example: '/Users/user/Desktop/'. However, if the location is 'python', the data is saved as a Python DataFrame and not to a file. | required |
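Because the destination is a plain concatenation of the two arguments, a missing trailing `/` silently produces the wrong path. A small sketch of that composition follows. `local_destination` is a hypothetical helper, and the explicit check is an assumption added for illustration, not documented SoLake behavior.

```python
import os


def local_destination(location_to_save_file: str, file_path: str) -> str:
    """Build the local path as location_to_save_file + file_path (plain concatenation)."""
    # Assumption: reject a missing trailing slash up front instead of writing to a wrong path.
    if not location_to_save_file.endswith("/"):
        raise ValueError("location_to_save_file must end with '/'")
    return location_to_save_file + file_path


dest = local_destination("/Users/user/Desktop/", "Test/test.rtf")
parent = os.path.dirname(dest)  # parent directory of the destination file
```

Without the trailing slash, the same inputs would concatenate to '/Users/user/DesktopTest/test.rtf'.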
get_excel_file_from_blob_storage(container, file_path)
Function to download an Excel file from the Azure storage account. This function is specific to Excel files because it relies on blob_data.content_as_bytes(); you can then read the result like a normal Excel file with pd.read_excel().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container. Example: 'she-data' | required |
file_path | str | Required. Location of the file you want to download. Must include directory/file. Example: 'Test/test.rtf' | required |
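The description above suggests a bytes-then-parse pattern. As a rough sketch, assuming the method hands back the raw bytes of the workbook, the bytes can be wrapped in an in-memory file object and passed to pd.read_excel(). Both helper names below are hypothetical; pandas and an Excel engine such as openpyxl are needed only for the second helper.

```python
import io


def as_file_like(blob_bytes: bytes) -> io.BytesIO:
    """Wrap raw blob bytes in a seekable in-memory file object."""
    return io.BytesIO(blob_bytes)


def read_excel_from_bytes(blob_bytes: bytes):
    """Parse Excel bytes into a DataFrame."""
    import pandas as pd  # imported lazily so as_file_like works without pandas installed

    return pd.read_excel(as_file_like(blob_bytes))
```

A call would then look like `df = read_excel_from_bytes(excel_bytes)`, where `excel_bytes` is whatever the download step produced.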
list_files_in_blob(container, container_folder_path='', print_files=False)
Function to get a list of the file names located within the Azure container storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container. Example: 'she-data' | required |
container_folder_path | str | Optional. Folder path within the container. Example: 'Metrics Charts' | '' |
print_files | bool | Optional. False does not print the file names; True prints them. | False |
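To illustrate how the two optional arguments interact, here is a small sketch that filters a list of names by folder path and optionally prints them. `list_names` and the sample names are hypothetical; the real method queries the container rather than a Python list.

```python
def list_names(
    all_names: list[str],
    container_folder_path: str = "",
    print_files: bool = False,
) -> list[str]:
    """Return the names under container_folder_path; an empty path matches everything."""
    matches = [name for name in all_names if name.startswith(container_folder_path)]
    if print_files:
        for name in matches:
            print(name)
    return matches


names = ["Metrics Charts/q1.png", "Metrics Charts/q2.png", "Reports/summary.xlsx"]
charts = list_names(names, "Metrics Charts")
everything = list_names(names)  # default '' keeps every name
```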
rename_blob(container, old_blob_name, new_blob_name)
Rename a blob.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container with folder path. Example: 'pqs/2024 File Lists' | required |
old_blob_name | str | Old blob name (include file extensions such as .csv) | required |
new_blob_name | str | New blob name (include file extensions such as .csv) | required |
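How rename_blob is implemented is not stated here. One common way to rename an object in blob-style storage is copy-then-delete, sketched below on an in-memory dict. `rename_entry` and the sample names are hypothetical, and the existence checks are assumptions added for safety.

```python
def rename_entry(blobs: dict[str, bytes], old_blob_name: str, new_blob_name: str) -> None:
    """Copy the data to the new name, then remove the old name."""
    if old_blob_name not in blobs:
        raise KeyError(f"no blob named {old_blob_name!r}")
    if new_blob_name in blobs:
        raise ValueError(f"{new_blob_name!r} already exists")
    blobs[new_blob_name] = blobs[old_blob_name]  # copy step
    del blobs[old_blob_name]  # delete step


# In-memory stand-in for a container folder (assumption, for illustration only).
folder = {"file_list_old.csv": b"id,name\n1,a\n"}
rename_entry(folder, "file_list_old.csv", "file_list_new.csv")
```

Doing the copy first means a failure partway through leaves the original in place rather than losing data.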
upload_folder_to_blob_storage(container, toplevel_localpath, toplevel_azurepath)
Function to upload a folder, including all of its subfolder paths, to the Azure storage account.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container. Example: 'so-connect-migration-data' | required |
toplevel_localpath | str | Required. Location of the top level of the local folder you want to upload. Example: '/Users/username/Desktop/Test' | required |
toplevel_azurepath | str | Required. Location of the top level of the folder that you want to upload the data to in Azure. A folder with that name is created on Azure if it does not already exist. Example: 'Test' | required |
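A folder upload of this kind boils down to walking the local tree and mapping each file to a destination name under toplevel_azurepath. The sketch below shows one way to compute that mapping with os.walk. `plan_folder_upload` is a hypothetical helper that performs no upload; the actual naming used by SoLake may differ.

```python
import os
import tempfile


def plan_folder_upload(toplevel_localpath: str, toplevel_azurepath: str) -> dict[str, str]:
    """Map each local file under toplevel_localpath to a blob name under toplevel_azurepath."""
    plan = {}
    for dirpath, _dirnames, filenames in os.walk(toplevel_localpath):
        for filename in filenames:
            local_file = os.path.join(dirpath, filename)
            relative = os.path.relpath(local_file, toplevel_localpath)
            # Use forward slashes in the destination name regardless of the local OS separator.
            blob_name = "/".join([toplevel_azurepath, *relative.split(os.sep)])
            plan[local_file] = blob_name
    return plan


# Build a small throwaway folder tree to demonstrate the mapping.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("demo")
    blob_names = sorted(plan_folder_upload(root, "Test").values())
```

Each local subfolder becomes part of the destination name, so the folder structure is preserved under 'Test'.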
upload_to_blob_storage(container, file_path, file_name, container_folder_path='', use_path=True, df=None)
Function to upload a single file to the Azure storage account.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
container | str | Required. Name of the container. Example: 'so-connect-migration-data' | required |
file_path | str | Required. Location path to the file you want to upload. Example: '/Users/user/Desktop/test.rtf' | required |
file_name | str | Required. Desired file name. Example: 'test.rtf' | required |
container_folder_path | str | Optional. Additional folder path within the container. Example: 'RawData', 'FormattedData' or 'Images' | '' |
use_path | bool | Optional. Whether to upload the file at file_path (True) or to upload a provided Python object instead (False). | True |
df | obj | Optional. Python object to upload to storage blob. Only referenced if use_path=False. | None |
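The parameters above imply two decisions: where the file lands inside the container, and whether the payload comes from disk or from memory. The following sketch models both under stated assumptions. `build_blob_name` and `pick_payload` are hypothetical helpers, the way the folder path and file name are joined is an assumption, and serialization of the in-memory object is deliberately left out.

```python
import os
import tempfile


def build_blob_name(file_name: str, container_folder_path: str = "") -> str:
    """Join the optional folder path and the file name with a single '/'."""
    if not container_folder_path:
        return file_name
    return container_folder_path.strip("/") + "/" + file_name


def pick_payload(file_path: str, use_path: bool = True, df=None):
    """Return file bytes when use_path is True, otherwise the in-memory object."""
    if use_path:
        with open(file_path, "rb") as handle:
            return handle.read()
    if df is None:
        raise ValueError("df must be provided when use_path=False")
    return df  # serialization of the object is left out of this sketch


# Demonstrate both branches with a throwaway file and a stand-in object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.rtf")
    with open(path, "wb") as handle:
        handle.write(b"hello")
    from_file = pick_payload(path)

from_object = pick_payload("", use_path=False, df={"col": [1, 2]})
```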