Overview

SoFresh is the base class for SoClean and SoMatch

Modules

  • soclean : Class of cleaned program data
  • somatch : Class of matched program data
  • credentials : Sets credentials in the environment and gets stored credentials
  • connection : Creates connections to servers
  • log : Creates a log for connecting to servers and executing queries

Methods

stout.soclean.SoClean methods:

  • clean_text : General text cleaning
  • clean_name : Function to clean first and last names; uses general text cleaning in addition to stripping out name suffixes
  • clean_gender : Standardizes gender names
  • clean_bdate : Converts dates to birthdates
  • clean_date : Standardizes date formatting
  • clean_program : Cleans text and removes 'usa_' prefix from text
  • clean_program_for_census : Aligns program names with census program names. Note: To be used after clean_program(x)
  • clean_event : General text cleaning for event names
  • clean_entrytype : Converts entrytype to 'game' or 'event'
  • clean_entryrole : Converts entryrole to one of the grouping categories: athlete, unified partner, coach, volunteer, media, medical, official, staff, other
  • clean_has_role : Cleans text and converts it to one of the standardized roles: athlete, coach, unified athlete, unified partner, volunteer, other
  • detect_test_data : Cleans text and searches for terms that indicate it contains test data. Note: To be used in HAS tables with fields FirstName, LastName, or EventName
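
The cleaning steps described above can be illustrated with a minimal sketch. This is not the stout implementation: the function names (sketch_clean_text, sketch_clean_name, sketch_clean_program) and the suffix list are assumptions made for illustration, chosen only so the results agree with the outputs shown under Example Usage.

```python
import re

# Hypothetical helpers that mimic the documented behaviour; not stout's code.
_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}  # assumed list of name suffixes


def sketch_clean_text(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def sketch_clean_name(name):
    """Apply general cleaning, then drop trailing name suffixes."""
    tokens = sketch_clean_text(name).split()
    while len(tokens) > 1 and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def sketch_clean_program(program):
    """Strip a leading 'usa_' prefix, then apply general cleaning."""
    program = program.strip().lower()
    if program.startswith("usa_"):
        program = program[len("usa_"):]
    return sketch_clean_text(program)
```

Building the name and program cleaners on top of one general text cleaner mirrors the way clean_name is described as using general text cleaning plus suffix removal.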

stout.somatch.SoMatch methods:

  • match_program : Matches text to a state or country. Best when used with the results of clean_program()
  • match_eventprogram : Extracts state or country name from event name. Best when used with the results of clean_event()
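
One way such matching could work is fuzzy string comparison against a lookup table. The sketch below uses difflib from the standard library; the PROGRAMS table, the function names, and the 0.8 cutoff are all assumptions for illustration, and stout may use a different technique entirely.

```python
import difflib

# Tiny illustrative lookup; a real table would cover every state and country.
PROGRAMS = {
    "minnesota": "Minnesota",
    "montana": "Montana",
    "mississippi": "Mississippi",
    "canada": "Canada",
}


def sketch_match_program(text, cutoff=0.8):
    """Return the closest known program name, or None if nothing is close."""
    hits = difflib.get_close_matches(
        text.lower().strip(), list(PROGRAMS), n=1, cutoff=cutoff
    )
    return PROGRAMS[hits[0]] if hits else None


def sketch_match_eventprogram(event_name):
    """Return the first known program name found as whole words in an event name."""
    padded = f" {event_name.lower()} "
    for key, label in PROGRAMS.items():
        if f" {key} " in padded:
            return label
    return None
```

Both helpers expect already-cleaned, lowercase input, which is why the method descriptions recommend running clean_program() or clean_event() first.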

stout.credentials.Credential methods:

  • set_credential : Helper method for setting credentials in the system environment. Requires a username, a password, and the prefix name of the db (defined in stout.connection.Connection().connection_map())
  • get_credential : Returns single credential (_USR or _PWD) for specified db
  • get_cred_set : Returns credential set (_USR and _PWD) for specified db
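
The _USR / _PWD naming convention can be sketched with os.environ. The function names below are hypothetical stand-ins, not stout's code, and the sketch assumes credentials live only in the current process's environment; how stout actually stores them is not stated here.

```python
import os


def sketch_set_credential(name, value):
    """Store one credential in this process's environment."""
    os.environ[name] = value


def sketch_get_credential(name):
    """Return one stored credential, or None when it is not set."""
    return os.environ.get(name)


def sketch_get_cred_set(prefix):
    """Return the (username, password) pair stored under <prefix>_USR and <prefix>_PWD."""
    return (
        sketch_get_credential(f"{prefix}_USR"),
        sketch_get_credential(f"{prefix}_PWD"),
    )
```

Values written to os.environ are visible to the current process and any child processes it starts, but they do not persist after the process exits.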

stout.connection.Connection methods:

  • create_default_engine : Creates a default engine for a db listed in connection_map
  • execute_query : Helper method for executing a query from a pandas df
  • execute_query_from_str : Executes a query from a string; allows DROP and CREATE
  • write_to_target : Helper method for writing data from a pandas df

How to run on local machine

cd into stout/

pip install -r requirements.txt
python setup.py install

Example Usage

>>> from stout.soclean import SoClean
>>> from stout.somatch import SoMatch

>>> so_clean = SoClean()
>>> so_match = SoMatch()

>>> so_clean.clean_text("Oh-oh, spaghetti-o")
'oh oh spaghetti o'

>>> so_clean.clean_name("Leigh-Cheri")
'leigh cheri'

>>> so_clean.clean_program("usa_minesota")
'minesota'

>>> so_match.match_program("minesota")
'Minnesota'

>>> so_clean.clean_event("2013 Young Athletes Minnesota State University")
'2013 young athletes minnesota state university'

>>> so_match.match_eventprogram("2013 young athletes minnesota state university")
'Minnesota'

>>> from stout.credentials import Credential
>>> from stout.connection import Connection
>>> from stout.log import _log

>>> cred = Credential()
>>> conn = Connection()

# Set credentials for dbs -- db prefixes defined in conn.connection_map()
cred.set_credential(name='Dev_v01xx_USR', value='username_here')
cred.set_credential(name='Dev_v01xx_PWD', value='password_here')

# Return a single credential
cred.get_credential(name='Dev_v01xx_USR')
cred.get_credential(name='Dev_v01xx_PWD')

# Return a credential set
cred.get_cred_set(prefix='Dev_v01xx')

# Create engine to connect to db
conn.create_default_engine('Dev_v01xx')

# Execute a query
conn.execute_query('SELECT TOP 100 * FROM SCHEMA.TABLE')

# Execute a query from a string -- allows DROP and CREATE
conn.execute_query_from_str('DROP TABLE IF EXISTS schema.table')

# Write a pandas df to a db -- df, table_name, and schema_name must already be defined
conn.write_to_target(df, table_name, schema_name, chunk=True, chunksize=5000, time_delay=0)
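
The chunk, chunksize, and time_delay arguments suggest that rows are written in batches with an optional pause between them. The sketch below shows that general pattern with the standard-library sqlite3 module and a plain list of tuples instead of a pandas df. The function name sketch_write_in_chunks and the demo table are hypothetical, and the behaviour of write_to_target itself is assumed, not confirmed.

```python
import sqlite3
import time


def sketch_write_in_chunks(conn, table_name, rows, chunksize=5000, time_delay=0):
    """Insert rows in batches of `chunksize`, pausing `time_delay` seconds between batches."""
    if not rows:
        return 0
    placeholders = ", ".join("?" for _ in rows[0])
    # table_name is interpolated directly, so it must come from trusted code.
    sql = f"INSERT INTO {table_name} VALUES ({placeholders})"
    written = 0
    for start in range(0, len(rows), chunksize):
        batch = rows[start:start + chunksize]
        conn.executemany(sql, batch)
        conn.commit()  # commit per batch so a failure loses at most one chunk
        written += len(batch)
        if time_delay and start + chunksize < len(rows):
            time.sleep(time_delay)
    return written


# Usage against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")
rows = [(i, f"name_{i}") for i in range(12)]
written = sketch_write_in_chunks(conn, "demo", rows, chunksize=5)
count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

Smaller chunks keep each transaction short, and a non-zero delay can reduce load on a shared server at the cost of a longer total write time.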