
Overview

SoFresh is the base class for SoClean and SoMatch

Modules

  • soclean : Class of cleaned program data
  • somatch : Class of matched program data
  • credentials : set credentials in environment & get stored credentials
  • connection : create connection to servers
  • log : creates log for connecting to server & executing queries

Methods

stout.soclean.SoClean methods:

  • clean_text : General text cleaning
  • clean_name : Function to clean first and last names; uses general text cleaning in addition to stripping out name suffixes
  • clean_gender : Standardizes gender names
  • clean_bdate : Converts dates to birthdates
  • clean_date : Standardizes date formatting
  • clean_program : Cleans text and removes 'usa_' prefix from text
  • clean_program_for_census : Aligns program names with census program names. Note: To be used after clean_program(x)
  • clean_event : General text cleaning for event names
  • clean_entrytype : Converts entrytype to 'game' or 'event'
  • clean_entryrole : Converts entryrole to a grouping category: athlete, unified partner, coach, volunteer, media, medical, official, staff, other
  • clean_has_role : Cleans text and converts it to a standardized role: athlete, coach, unified athlete, unified partner, volunteer, other
  • detect_test_data : Cleans text and searches it for terms that indicate test data. Note: To be used in HAS tables with fields FirstName, LastName, or EventName
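To give a feel for what "general text cleaning" and prefix removal could involve, here is a minimal sketch. It is not the SoClean implementation: the function names clean_text_sketch and clean_program_sketch are hypothetical, and the rules (lowercase, replace runs of non-alphanumeric characters with a single space, trim) are assumptions chosen only because they reproduce the outputs shown under Example Usage.

```python
import re


def clean_text_sketch(text):
    """Lowercase, turn runs of non-alphanumerics into one space, and trim.

    Assumption: ASCII-only rules. Accented letters would be replaced by
    spaces here; the real clean_text may treat them differently.
    """
    lowered = str(text).lower()
    return re.sub(r"[^a-z0-9]+", " ", lowered).strip()


def clean_program_sketch(text):
    """Drop a leading 'usa_' prefix, then apply the general cleaning."""
    lowered = str(text).strip().lower()
    if lowered.startswith("usa_"):
        lowered = lowered[len("usa_"):]
    return clean_text_sketch(lowered)


print(clean_text_sketch("Oh-oh, spaghetti-o"))
print(clean_program_sketch("usa_minesota"))
```

Note that the prefix is removed before the general cleaning runs; doing it the other way around would turn the underscore into a space and the prefix would no longer match.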

stout.somatch.SoMatch methods:

  • match_program : Matches text to a state or country. Best when used with the results of clean_program()
  • match_eventprogram : Extracts state or country name from event name. Best when used with the results of clean_event()
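The two matching methods can be pictured with a small sketch built on the standard library. This is an illustration under stated assumptions, not how SoMatch works internally: the KNOWN_PROGRAMS lookup, the function names, and the 0.8 similarity cutoff are all hypothetical. The first function does fuzzy matching on a cleaned program string; the second looks for a known name as a whole word inside a cleaned event name.

```python
import difflib
import re

# Hypothetical, deliberately tiny lookup: cleaned key -> display name.
KNOWN_PROGRAMS = {
    "minnesota": "Minnesota",
    "montana": "Montana",
    "michigan": "Michigan",
    "mexico": "Mexico",
}


def match_program_sketch(text, cutoff=0.8):
    """Return the display name of the closest known program, or None."""
    candidates = difflib.get_close_matches(
        text.lower().strip(), list(KNOWN_PROGRAMS), n=1, cutoff=cutoff
    )
    return KNOWN_PROGRAMS[candidates[0]] if candidates else None


def match_eventprogram_sketch(event_name):
    """Return the first known program that appears as a whole word or phrase."""
    lowered = event_name.lower()
    for key, display in KNOWN_PROGRAMS.items():
        if re.search(r"\b" + re.escape(key) + r"\b", lowered):
            return display
    return None


print(match_program_sketch("minesota"))
print(match_eventprogram_sketch("2013 young athletes minnesota state university"))
```

Both sketches expect already-cleaned input, which is why the method descriptions recommend running clean_program() or clean_event() first.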

stout.credentials.Credential methods:

  • set_credential : Helper method for setting credentials in the system environment. Requires a username, a password, and the prefix name of the db (defined in stout.connection.Connection().connection_map())
  • get_credential : Returns single credential (_USR or _PWD) for specified db
  • get_cred_set : Returns credential set (_USR and _PWD) for specified db
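The _USR / _PWD naming suggests a simple environment-variable scheme. The sketch below shows one way such helpers could be written with os.environ. It is an assumption-laden illustration, not the Credential class: the *_sketch function names are hypothetical, and the real methods may store or validate credentials differently.

```python
import os


def set_credential_sketch(name, value):
    """Store one credential in this process's environment."""
    os.environ[name] = value


def get_credential_sketch(name):
    """Return one credential, or None if it has not been set."""
    return os.environ.get(name)


def get_cred_set_sketch(prefix):
    """Return the (username, password) pair stored under <prefix>_USR / <prefix>_PWD."""
    return (
        get_credential_sketch(prefix + "_USR"),
        get_credential_sketch(prefix + "_PWD"),
    )


set_credential_sketch("Dev_v01xx_USR", "username_here")
set_credential_sketch("Dev_v01xx_PWD", "password_here")
print(get_cred_set_sketch("Dev_v01xx"))
```

Values written to os.environ are visible only to the current Python process and the child processes it starts, so under this scheme they would need to be set again in each new session.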

stout.connection.Connection methods:

  • create_default_engine : Method for creating one of the default engines listed in connection_map
  • execute_query : Helper method for executing a query from a pandas df
  • write_to_target : Helper method for writing data from a pandas df

stout.solake.SoLake methods:

  • list_files_in_blob : Function to get a list of file names located within the Azure container storage
  • get_excel_file_from_blob_storage : Function to download an Excel file from the Azure storage account
  • upload_to_blob_storage : Function to upload a single file to the Azure storage account
  • upload_folder_to_blob_storage : Function to upload a folder with all folder paths to the Azure storage account
  • download_from_blob_storage : Function to download a file from the Azure storage account
  • rename_blob : Rename a blob
  • delete_all_blobs_in_folder : Delete all blobs within a given folder in an Azure Blob container

stout.sosharepoint.SharepointClass methods:

  • get_access_token : Get the access token
  • get_site_id : Get Site ID from the SharePoint site URL
  • get_drive_id : Get Drive ID from the Site ID
  • check_if_file_exists : Check if a file exists in the SharePoint folder
  • delete_existing_file : Delete the existing file if it exists
  • upload_file_to_sharepoint : Upload a file to SharePoint
  • rename_existing_file_to_prior_file : Rename the current file to prior_ + file_name
  • get_file_id : Get the file ID from SharePoint
  • download_file_and_load_as_dataframe : Download the file from SharePoint and return it as a pandas DataFrame
  • list_items_in_sharepoint_folder : List either files or folders (not both) in a specified folder and its subfolders in SharePoint
  • list_all_items_with_paths_in_sharepoint_folder : List all files with their full paths in a specified folder and its subfolders in SharePoint
  • clear_sharepoint_folder : Clear all files and folders in a given folder on SharePoint, including subfolders, and delete empty folders
  • remove_text_before_second_underscore : Removes everything before and including the second underscore ('_') in the filename
  • split_paths_to_dataframe : Convert a list of file paths into a DataFrame where each path component becomes a separate column
  • get_uploaded_sharepoint_files_df : Get a dataframe of all files uploaded in a SharePoint folder
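Two of the filename helpers described above are pure string operations, so they are easy to sketch without any SharePoint access. The functions below are hypothetical stand-ins, not the SharepointClass methods. In particular, returning the name unchanged when it has fewer than two underscores is an assumption, and the sample filenames are made up.

```python
def remove_text_before_second_underscore_sketch(filename):
    """Drop everything up to and including the second underscore.

    Assumption: names with fewer than two underscores are returned as-is.
    """
    parts = filename.split("_", 2)
    if len(parts) < 3:
        return filename
    return parts[2]


def prior_name_sketch(file_name):
    """Build the 'prior_' + file_name form used when keeping the previous file."""
    return "prior_" + file_name


print(remove_text_before_second_underscore_sketch("2024_Q1_program_counts.xlsx"))
print(prior_name_sketch("program_counts.xlsx"))
```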

How to run on local machine

cd into stout/

pip install -r requirements.txt
python setup.py install

Example Usage

>>> from stout.soclean import SoClean
>>> from stout.somatch import SoMatch

>>> so_clean = SoClean()
>>> so_match = SoMatch()

>>> so_clean.clean_text("Oh-oh, spaghetti-o")
'oh oh spaghetti o'

>>> so_clean.clean_name("Leigh-Cheri")
'leigh cheri'

>>> so_clean.clean_program("usa_minesota")
'minesota'

>>> so_match.match_program("minesota")
'Minnesota'

>>> so_clean.clean_event("2013 Young Athletes Minnesota State University")
'2013 young athletes minnesota state university'

>>> so_match.match_program("2013 young athletes minnesota state university")
'Minnesota'

>>> from stout.credentials import Credential
>>> from stout.connection import Connection
>>> from stout.log import _log

>>> cred = Credential()
>>> conn = Connection()

# Set credentials for dbs -- db prefixes defined in conn.connection_map()
cred.set_credential(name = 'Dev_v01xx_USR', value = 'username_here')
cred.set_credential(name = 'Dev_v01xx_PWD', value = 'password_here')

# Return a single credential
cred.get_credential(name = 'Dev_v01xx_USR')
cred.get_credential(name = 'Dev_v01xx_PWD')

# Return a credential set
cred.get_cred_set(prefix = 'Dev_v01xx')

# Create engine to connect to db
conn.create_default_engine('Dev_v01xx')

# Execute a query
conn.execute_query('SELECT TOP 100 * FROM SCHEMA.TABLE')

# Execute a query from a string -- allows DROP and CREATE
conn.execute_query_from_str('DROP TABLE IF EXISTS schema.table')

# Write results to a db
conn.write_to_target(df, table_name, schema_name, chunk = True, chunksize = 5000, time_delay = 0)
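The chunk, chunksize, and time_delay arguments above suggest that rows are written in batches with an optional pause between them. The sketch below illustrates that idea with the standard library's sqlite3 module and a plain list of rows. It is not the Connection implementation: the function names, the two-column table, and the in-memory database are hypothetical, and the real method works on a pandas df against the configured server.

```python
import sqlite3
import time


def iter_chunks(rows, chunksize):
    """Yield consecutive slices of at most `chunksize` rows."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


def write_in_chunks(conn, table_name, rows, chunksize=5000, time_delay=0):
    """Insert two-column rows batch by batch; return the number written.

    `table_name` is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    written = 0
    for chunk in iter_chunks(rows, chunksize):
        conn.executemany(f"INSERT INTO {table_name} VALUES (?, ?)", chunk)
        conn.commit()  # commit per batch so a failure loses at most one chunk
        written += len(chunk)
        if time_delay:
            time.sleep(time_delay)  # optional pause to ease load on the server
    return written


demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE athletes (id INTEGER, name TEXT)")
demo_rows = [(i, f"athlete_{i}") for i in range(12)]
print(write_in_chunks(demo_conn, "athletes", demo_rows, chunksize=5))
```

Smaller chunks keep each transaction short at the cost of more round trips; the right size depends on the target database and network.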