Overview
SoFresh is the base class for SoClean and SoMatch
Modules
- soclean : Class of cleaned program data
- somatch : Class of matched program data
- credentials : Sets credentials in the environment and gets stored credentials
- connection : Creates connections to servers
- log : Creates a log for server connections and query executions
Methods
stout.soclean.SoClean methods:
- clean_text : General text cleaning
- clean_name : Cleans first and last names; applies general text cleaning and strips name suffixes
- clean_gender : Standardizes gender names
- clean_bdate : Converts dates to birthdates
- clean_date : Standardizes date formatting
- clean_program : Cleans text and removes 'usa_' prefix from text
- clean_program_for_census : Aligns program names with census program names. Note: To be used after clean_program(x)
- clean_event : General text cleaning for event names
- clean_entrytype : Converts entrytype to 'game' or 'event'
- clean_entryrole : Converts entryrole to one of the grouping categories:
  - athlete, unified partner, coach, volunteer, media, medical, official, staff, other
- clean_has_role : Cleans text and converts it to one of the standardized roles:
  - athlete, coach, unified athlete, unified partner, volunteer, other
- detect_test_data : Cleans text and searches it for terms that indicate test data
  - Note: To be used in HAS tables with fields: FirstName, LastName, or EventName
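The general cleaning steps can be illustrated with a minimal standalone sketch. It is not the SoClean implementation: the function names `simple_clean_text` and `simple_clean_program` are hypothetical, and the rules (lowercase, replace punctuation with spaces, strip a leading 'usa_') are assumptions inferred from the example outputs in this README.

```python
import re


def simple_clean_text(text):
    """Lowercase the text and turn every run of non-alphanumerics into one space.

    Assumption: ASCII-only input; accented characters would be dropped here.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def simple_clean_program(text):
    """Remove a leading 'usa_' prefix, then apply the general text cleaning."""
    text = text.lower()
    if text.startswith("usa_"):
        text = text[len("usa_"):]
    return simple_clean_text(text)


cleaned_text = simple_clean_text("Oh-oh, spaghetti-o")
cleaned_program = simple_clean_program("usa_minesota")
```

The prefix is removed before the general cleaning because the cleaning step would otherwise turn the underscore into a space and leave 'usa' in the result.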
stout.somatch.SoMatch methods:
- match_program : Matches text to a state or country. Best when used with the results of clean_program()
- match_eventprogram : Extracts state or country name from event name. Best when used with the results of clean_event()
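One way to picture the two matching steps is the sketch below. It is an assumption about the general approach, not the SoMatch implementation: it swaps in `difflib` fuzzy matching from the standard library, and `PROGRAMS`, `simple_match_program`, and `simple_match_eventprogram` are hypothetical names with a deliberately short lookup list.

```python
import difflib
import re

# Hypothetical, truncated lookup list; the real program list is not shown here.
PROGRAMS = ["Minnesota", "Michigan", "Mississippi", "Montana"]


def simple_match_program(text, cutoff=0.8):
    """Return the program name closest to `text`, or None if nothing is close enough."""
    lookup = {program.lower(): program for program in PROGRAMS}
    hits = difflib.get_close_matches(text.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def simple_match_eventprogram(event_name):
    """Return the first program name that appears as a whole word in the event name."""
    text = event_name.lower()
    for program in PROGRAMS:
        if re.search(r"\b" + re.escape(program.lower()) + r"\b", text):
            return program
    return None


program = simple_match_program("minesota")
event_program = simple_match_eventprogram(
    "2013 young athletes minnesota state university"
)
```

Fuzzy matching tolerates misspellings such as "minesota", while the event version only looks for an exact program name inside longer text, which is why cleaned input works best for both.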
stout.credentials.Credential methods:
- set_credential : Helper method for setting credentials in the system environment.
  - Requires username, password, and prefix name of the db (defined in stout.connection.Connection().connection_map())
- get_credential : Returns single credential (_USR or _PWD) for specified db
- get_cred_set : Returns credential set (_USR and _PWD) for specified db
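The _USR / _PWD naming convention can be sketched with plain environment variables. This is an assumption about how such helpers might work, not the Credential class itself; the functions below are hypothetical standalone stand-ins, and the prefix and values are placeholders.

```python
import os


def set_credential(name, value):
    """Store one credential in the environment of the current process."""
    os.environ[name] = value


def get_credential(name):
    """Return one stored credential, or None if it has not been set."""
    return os.environ.get(name)


def get_cred_set(prefix):
    """Return the (_USR, _PWD) pair stored for a db prefix."""
    return get_credential(prefix + "_USR"), get_credential(prefix + "_PWD")


set_credential("Dev_v01xx_USR", "username_here")
set_credential("Dev_v01xx_PWD", "password_here")
user, password = get_cred_set("Dev_v01xx")
```

Values written to `os.environ` are visible only to the current process and its child processes, so a sketch like this does not persist credentials between sessions.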
stout.connection.Connection methods:
- create_default_engine : Creates a default engine for a db listed in connection_map
- execute_query : Helper method for executing a query from a pandas df
- write_to_target : Helper method for writing data from a pandas df
How to run on local machine
cd into stout/
Example Usage
>>> from stout.soclean import SoClean
>>> from stout.somatch import SoMatch
>>> so_clean = SoClean()
>>> so_match = SoMatch()
>>> so_clean.clean_text("Oh-oh, spaghetti-o")
'oh oh spaghetti o'
>>> so_clean.clean_name("Leigh-Cheri")
'leigh cheri'
>>> so_clean.clean_program("usa_minesota")
'minesota'
>>> so_match.match_program("minesota")
'Minnesota'
>>> so_clean.clean_event("2013 Young Athletes Minnesota State University")
'2013 young athletes minnesota state university'
>>> so_match.match_eventprogram("2013 young athletes minnesota state university")
'Minnesota'
>>> from stout.credentials import Credential
>>> from stout.connection import Connection
>>> from stout.log import _log
>>> cred = Credential()
>>> conn = Connection()
>>> # Set credentials for dbs -- db prefixes defined in conn.connection_map()
>>> cred.set_credential(name='Dev_v01xx_USR', value='username_here')
>>> cred.set_credential(name='Dev_v01xx_PWD', value='password_here')
>>> # Return a single credential
>>> cred.get_credential(name='Dev_v01xx_USR')
>>> cred.get_credential(name='Dev_v01xx_PWD')
>>> # Return a credential set
>>> cred.get_cred_set(prefix='Dev_v01xx')
>>> # Create engine to connect to db
>>> conn.create_default_engine('Dev_v01xx')
>>> # Execute a query
>>> conn.execute_query('SELECT TOP 100 * FROM SCHEMA.TABLE')
>>> # Execute a query from a string -- allows DROP and CREATE
>>> conn.execute_query_from_str('DROP TABLE IF EXISTS schema.table')
>>> # Write results to a db
>>> conn.write_to_target(df, table_name, schema_name, chunk=True, chunksize=5000, time_delay=0)
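The chunk, chunksize, and time_delay arguments suggest batched writes with an optional pause between batches. The sketch below shows that idea under stated assumptions: it is not the write_to_target implementation, it uses the standard-library `sqlite3` module and a list of tuples in place of a pandas df and a server connection, and `write_in_chunks` is a hypothetical name.

```python
import sqlite3
import time


def write_in_chunks(conn, table_name, rows, chunksize=5000, time_delay=0):
    """Insert rows in batches of `chunksize`, sleeping `time_delay` seconds between batches.

    Assumption: `table_name` comes from trusted code, since it is interpolated
    into the SQL statement.
    """
    written = 0
    for start in range(0, len(rows), chunksize):
        batch = rows[start:start + chunksize]
        placeholders = ", ".join("?" for _ in batch[0])
        conn.executemany(
            f"INSERT INTO {table_name} VALUES ({placeholders})", batch
        )
        conn.commit()
        written += len(batch)
        # Pause only when another batch is still waiting to be written.
        if time_delay and start + chunksize < len(rows):
            time.sleep(time_delay)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes (id INTEGER, name TEXT)")
rows = [(i, f"athlete {i}") for i in range(12)]
written = write_in_chunks(conn, "athletes", rows, chunksize=5)
count = conn.execute("SELECT COUNT(*) FROM athletes").fetchone()[0]
```

Committing after each batch keeps individual transactions small, and the delay gives a busy server room between writes.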