SoClean
Bases: SoFresh
Class for cleaning program data.
Parameters
input : object. An object to be manipulated. Must be a string, date, or numeric value.
clean_address(s1, address_type)
Extracts address information from dbo.Address.Addresses.
Keyword arguments:
s1 : string from which address information will be extracted.
address_type : address field to be extracted; used as a regex pattern (e.g., 'addr0', 'addr1', 'country').
clean_bdate(s1)
Cleans birthdate; converts to date and coerces errors.
Keyword arguments: s1 : date to be cleaned.
clean_columns(df, case, keep_nums=False)
Cleans column names. Output column names can be in camel, pascal, or snake case.
Keyword arguments:
df : pandas.DataFrame
case : type of case to convert column names to.
keep_nums : whether to keep numeric values in column names.
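To make the three case options concrete, here is a minimal sketch of one way such a conversion could work. It is not SoClean's implementation: the function name `convert_names`, the word-splitting rules, and the choice to drop digits when `keep_nums` is false are all assumptions made for illustration. It works on a plain list of names so that it runs without pandas.

```python
import re
from typing import List


def convert_names(names: List[str], case: str, keep_nums: bool = False) -> List[str]:
    """Hypothetical helper: convert names to snake, camel, or pascal case."""
    converted = []
    for name in names:
        # Insert a space at lower-to-upper boundaries: "birthDate" -> "birth Date".
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
        if not keep_nums:
            # Assumption: "not keeping" numbers means removing digit runs.
            spaced = re.sub(r"\d+", " ", spaced)
        # Split on anything that is not a letter or digit and lowercase each word.
        words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]
        if case == "snake":
            converted.append("_".join(words))
        elif case == "pascal":
            converted.append("".join(w.capitalize() for w in words))
        elif case == "camel":
            converted.append("".join(words[:1] + [w.capitalize() for w in words[1:]]))
        else:
            raise ValueError("case must be 'camel', 'pascal', or 'snake'")
    return converted
```

With a DataFrame, the same idea could be applied as `df.columns = convert_names(list(df.columns), "snake")`.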
clean_date(s1)
General date formatting.
Keyword arguments: s1 : date to be formatted; converts to str and parses into YYYY-MM-DD.
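The "convert to str, then parse into YYYY-MM-DD" behaviour can be sketched as follows. The function name `format_date`, the list of accepted input formats, and the choice to return None for unparseable values are assumptions; the formats SoClean actually accepts are not documented here.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats, tried in order; not taken from SoClean itself.
ASSUMED_INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def format_date(value: object) -> Optional[str]:
    """Hypothetical helper: return the value as 'YYYY-MM-DD', or None if unparseable."""
    text = str(value).strip()  # convert to str first, as the description states
    for fmt in ASSUMED_INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next assumed format
    return None
```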
clean_entryrole(s1)
Infers role type.
Use is limited to GMS.Entries.clean_entryrole. Keyword arguments: s1 : string to be assigned a role type (e.g., athlete, partner, coach).
clean_entrytype(s1)
Infers whether an entry is of type 'game' or 'event'.
Use is limited to GMS.Entries.EntryRole. Keyword arguments: s1 : string to be assigned as either 'game' or 'event'.
clean_event(s1)
General text cleaning plus removal of 'so_' prefixes.
Keyword arguments: s1 : string to be cleaned.
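As an illustration of text cleaning followed by prefix removal, the sketch below normalizes a string and then strips a leading 'so_'. The function name `strip_prefix_sketch` is hypothetical, and the normalization shown (lowercasing and collapsing non-alphanumeric runs into underscores) is only an assumption about what "general text cleaning" involves.

```python
import re


def strip_prefix_sketch(s1: object, prefix: str = "so_") -> str:
    """Hypothetical helper: normalize text, then drop a leading prefix."""
    # Assumed normalization: lowercase, then replace every run of
    # non-alphanumeric characters with one underscore and trim the ends.
    cleaned = re.sub(r"[^a-z0-9]+", "_", str(s1).lower()).strip("_")
    # Remove the prefix only when it appears at the very start.
    if cleaned.startswith(prefix):
        cleaned = cleaned[len(prefix):]
    return cleaned
```

Because the prefix includes the underscore, a value such as "Soccer" is left intact rather than being truncated.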
clean_gender(s1)
Gender category standardization.
Keyword arguments: s1 : string to be cleaned (male; female; unknown/other).
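A lookup table is one straightforward way to standardize categories like these. In the sketch below, the function name `standardize_gender`, the accepted raw spellings, and the single fallback label 'unknown/other' are assumptions; the actual mapping used by SoClean is not shown in this reference.

```python
# Assumed raw spellings mapped to the standardized categories.
ASSUMED_GENDER_MAP = {
    "m": "male",
    "male": "male",
    "f": "female",
    "female": "female",
}


def standardize_gender(s1: object) -> str:
    """Hypothetical helper: map a raw value to male, female, or unknown/other."""
    key = str(s1).strip().lower()
    # Anything not recognized falls into the catch-all category.
    return ASSUMED_GENDER_MAP.get(key, "unknown/other")
```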
clean_has_role(s1)
Cleans the Role field in HAS data. Categories of interest are: athlete, coach, unified partner, volunteer, other, and unified athlete.
Keyword arguments: s1 : string to be cleaned.
clean_name(s1)
General text cleaning plus removal of suffixes and junk indicators.
Keyword arguments: s1 : string to be cleaned.
clean_program(s1)
General text cleaning plus removal of 'usa_' prefixes.
Keyword arguments: s1 : string to be cleaned.
clean_program_for_census(s1)
Text cleaning to be run after clean_program. Maps programs specifically to how they are written in the census.
Keyword arguments: s1 : string to be cleaned.
clean_text(s1)
General text cleaning.
Keyword arguments: s1 : string to be cleaned.
clean_zips(s1)
Searches for zips longer than 10 digits and pares down to 10 if needed.
Keyword arguments: s1 : string to be cleaned.
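The paring-down step can be sketched in a few lines. This assumes the limit is counted in characters (a ZIP+4 code such as '12345-6789' is exactly 10 characters) and that the leading characters are the ones kept; the function name `pare_zip` is hypothetical.

```python
def pare_zip(s1: object, max_len: int = 10) -> str:
    """Hypothetical helper: keep at most the first max_len characters of a zip."""
    text = str(s1).strip()
    # Only values longer than the limit are shortened; others pass through.
    return text[:max_len] if len(text) > max_len else text
```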
detect_test_data(s1)
Text cleaning to detect test data in a string column (created for HAS FirstName, LastName, or EventName). Looks for occurrences of 'test', 'user', and 'patient', and replaces any matching value with 'test'.
Keyword arguments: s1 : string to be cleaned.
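One possible way to implement this kind of detection is a case-insensitive regular expression search. The sketch below is an assumption-laden illustration, not SoClean's code: the function name `flag_test_value` is hypothetical, and matching anywhere in the string (rather than on whole words only) is a guess about the intended behaviour.

```python
import re

# Assumed pattern: any of the three keywords, anywhere in the value.
TEST_PATTERN = re.compile(r"test|user|patient", re.IGNORECASE)


def flag_test_value(s1: object) -> str:
    """Hypothetical helper: return 'test' if the value looks like test data."""
    text = str(s1)
    # Values without a keyword are returned unchanged.
    return "test" if TEST_PATTERN.search(text) else text
```

Substring matching is deliberately broad and would also flag a real surname such as "Testa"; anchoring the pattern with word boundaries (`\b`) would be a stricter alternative.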