SoClean

Bases: SoFresh

Class for cleaning program data.

Parameters

input : object
    An object to be manipulated. Must be a string, date, or numeric value.

clean_address(s1, address_type)

Extracts address information from dbo.Address.Addresses.

Keyword arguments:
    s1 : string from which address information will be extracted.
    address_type : address field to extract; used as a regex pattern (e.g., 'addr0', 'addr1', 'country').
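A minimal sketch of regex-based field extraction follows. It assumes, hypothetically, that each Addresses value is a semicolon-separated list of "key: value" pairs; the real storage format of dbo.Address.Addresses is not described here. extract_address_field is an illustrative name, not part of SoClean.

```python
import re
from typing import Optional


def extract_address_field(s1: str, address_type: str) -> Optional[str]:
    """Hypothetical sketch: pull one field out of a 'key: value; key: value' string.

    address_type is interpolated into the regex unescaped, mirroring the
    documented behavior that it is used as a regex pattern.
    """
    # ASSUMPTION: values look like "addr0: 123 Main St; addr1: Apt 4; country: US"
    match = re.search(rf"\b{address_type}\s*:\s*([^;]*)", s1)
    return match.group(1).strip() if match else None


record = "addr0: 123 Main St; addr1: Apt 4; country: US"
print(extract_address_field(record, "addr1"))    # Apt 4
print(extract_address_field(record, "country"))  # US
```

Because address_type is treated as a pattern, a caller could also pass something like 'addr[01]' to match either of the first two address lines.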

clean_bdate(s1)

Cleans a birthdate: converts it to a date and coerces values that cannot be parsed to missing.

Keyword arguments: s1 : date to be cleaned.
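"Coerces errors" maps naturally onto pandas' errors="coerce" option. The sketch below assumes a pandas Series as input; clean_bdate_sketch is a hypothetical name, and SoClean may apply extra rules (such as rejecting future dates) that are not shown.

```python
import pandas as pd


def clean_bdate_sketch(s1: pd.Series) -> pd.Series:
    """Hypothetical sketch: parse birthdates, turning unparseable values into NaT."""
    # errors="coerce" replaces values that cannot be parsed with NaT instead of raising
    return pd.to_datetime(s1, errors="coerce")


bdates = pd.Series(["2001-05-17", "not a date", None])
print(clean_bdate_sketch(bdates))
```

If plain datetime.date objects are wanted instead of timestamps, .dt.date can be applied to the result.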

clean_columns(df, case, keep_nums=False)

Cleans column names. Output names can be in camel, pascal, or snake case.

Keyword arguments:
    df : pandas.DataFrame whose column names will be cleaned.
    case : case to convert column names to (camel, pascal, or snake).
    keep_nums : whether to keep numeric characters in column names (default False).
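One way to implement this is to split each name into lowercase words and rejoin them in the requested case. The sketch below is hypothetical (clean_columns_sketch is not SoClean's code), and its word-splitting rules (break on non-alphanumerics and on lower-to-upper transitions) are assumptions.

```python
import re

import pandas as pd


def clean_columns_sketch(df: pd.DataFrame, case: str, keep_nums: bool = False) -> pd.DataFrame:
    """Hypothetical sketch: rename columns to camel, pascal, or snake case."""

    def convert(name) -> str:
        name = str(name)
        if not keep_nums:
            # Replace digits with a space so the words on either side stay separate
            name = re.sub(r"\d+", " ", name)
        # Insert a break at lower->upper transitions, e.g. "EventID" -> "Event ID"
        name = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
        # Split on any run of non-alphanumeric characters (spaces, underscores, dashes)
        words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", name) if w]
        if case == "snake":
            return "_".join(words)
        if case == "pascal":
            return "".join(w.capitalize() for w in words)
        if case == "camel":
            return "".join(words[:1] + [w.capitalize() for w in words[1:]])
        raise ValueError(f"unsupported case: {case!r}")

    out = df.copy()
    out.columns = [convert(c) for c in df.columns]
    return out


df = pd.DataFrame(columns=["First Name", "EventID 2", "zip_code"])
print(list(clean_columns_sketch(df, "snake").columns))                   # ['first_name', 'event_id', 'zip_code']
print(list(clean_columns_sketch(df, "camel").columns))                   # ['firstName', 'eventId', 'zipCode']
print(list(clean_columns_sketch(df, "pascal", keep_nums=True).columns))  # ['FirstName', 'EventId2', 'ZipCode']
```

With keep_nums=False, names that differ only by digits (e.g., 'score1' and 'score2') collapse to the same name, so a real implementation may need to check for duplicates.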

clean_date(s1)

General date formatting: converts the value to str and parses it into YYYY-MM-DD.

Keyword arguments: s1 : date to be formatted.
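A minimal single-value sketch of "convert to str, then parse into YYYY-MM-DD", assuming pandas does the parsing. clean_date_sketch is a hypothetical name, and returning None for unparseable input is an assumption.

```python
import datetime

import pandas as pd


def clean_date_sketch(s1):
    """Hypothetical sketch: format a single value as a YYYY-MM-DD string, or None."""
    # Convert to str first, as the description says, then let pandas parse it
    parsed = pd.to_datetime(str(s1), errors="coerce")
    return None if pd.isna(parsed) else parsed.strftime("%Y-%m-%d")


print(clean_date_sketch("2021/03/04"))               # 2021-03-04
print(clean_date_sketch(datetime.date(2021, 3, 4)))  # 2021-03-04
print(clean_date_sketch("not a date"))               # None
```

For a pandas Series, the same function can be applied element-wise with Series.map. Ambiguous strings such as "03/04/2021" are read month-first by default.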

clean_entryrole(s1)

Infers role type.

Use is limited to GMS.Entries.clean_entryrole.

Keyword arguments: s1 : string to be assigned a role type (e.g., athlete, partner, coach).

clean_entrytype(s1)

Infers whether an entry is of 'game' or 'event' type.

Use is limited to GMS.Entries.EntryRole.

Keyword arguments: s1 : string to be assigned as either 'game' or 'event'.

clean_event(s1)

General text cleaning plus removal of 'so_' prefixes.

Keyword arguments: s1 : string to be cleaned.
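The sketch below shows prefix removal applied after a normalization step. What "general text cleaning" does is not specified here; lowercasing and joining words with underscores is an assumption suggested by the underscore in 'so_'. clean_event_sketch is a hypothetical name.

```python
import re


def clean_event_sketch(s1) -> str:
    """Hypothetical sketch: normalize text, then drop a leading 'so_' prefix."""
    # ASSUMED "general text cleaning": trim, lowercase, join words with underscores
    text = re.sub(r"[\W_]+", "_", str(s1).strip().lower()).strip("_")
    # Anchor at the start so only a true prefix is removed
    return re.sub(r"^so_", "", text)


print(clean_event_sketch("SO Summer Games 2024"))  # summer_games_2024
print(clean_event_sketch("Soccer"))                # soccer
```

Matching "so_" rather than "so" keeps words like "soccer" intact; clean_program's 'usa_' prefix could be handled the same way.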

clean_gender(s1)

Gender category standardization.

Keyword arguments: s1 : string to be standardized into one of male, female, or unknown/other.
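A minimal lookup-based sketch of the three-way standardization. The accepted spellings are illustrative assumptions, and clean_gender_sketch is a hypothetical name.

```python
def clean_gender_sketch(s1) -> str:
    """Hypothetical sketch: collapse free-text gender values into three categories."""
    # ASSUMPTION: these spellings are illustrative; the real mapping may differ
    value = str(s1).strip().lower() if s1 is not None else ""
    if value in {"m", "male", "man"}:
        return "male"
    if value in {"f", "female", "woman"}:
        return "female"
    # Blank, missing, or any other value falls into the catch-all category
    return "unknown/other"


print(clean_gender_sketch("M "))      # male
print(clean_gender_sketch("Female"))  # female
print(clean_gender_sketch(None))      # unknown/other
```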

clean_has_role(s1)

Cleans the Role field in HAS data. Categories of interest are: athlete, coach, unified partner, volunteer, other, and unified athlete.

Keyword arguments: s1 : string to be cleaned.
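A keyword-matching sketch that maps free text onto the six categories listed above. Substring matching and the check order are assumptions; clean_has_role_sketch is a hypothetical name.

```python
def clean_has_role_sketch(s1) -> str:
    """Hypothetical sketch: map a free-text HAS Role to one category of interest."""
    value = str(s1).strip().lower()
    # Order matters: check the two-word "unified" roles before the bare "athlete"
    if "unified" in value and "partner" in value:
        return "unified partner"
    if "unified" in value and "athlete" in value:
        return "unified athlete"
    for role in ("athlete", "coach", "volunteer"):
        if role in value:
            return role
    return "other"


print(clean_has_role_sketch("UNIFIED ATHLETE"))  # unified athlete
print(clean_has_role_sketch("Head Coach"))       # coach
print(clean_has_role_sketch("Family member"))    # other
```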

clean_name(s1)

General text cleaning plus removal of suffixes and junk indicators.

Keyword arguments: s1 : string to be cleaned.
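The sketch below illustrates word-level removal of suffixes and junk indicators. Both word lists are hypothetical, since the actual suffixes and junk markers are not documented here, and lowercasing as part of "general text cleaning" is an assumption. clean_name_sketch is an illustrative name.

```python
import re

# HYPOTHETICAL lists: the suffixes and junk markers SoClean removes are not documented here
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
JUNK_WORDS = {"duplicate", "dup"}


def clean_name_sketch(s1) -> str:
    """Hypothetical sketch: lowercase a name, then drop suffix and junk words."""
    # Keep word characters (including accented letters), apostrophes, and hyphens
    words = re.findall(r"[\w'-]+", str(s1).lower())
    # Filtering whole words avoids damaging names that merely contain a marker, e.g. "Dupont"
    return " ".join(w for w in words if w not in SUFFIXES and w not in JUNK_WORDS)


print(clean_name_sketch("Smith, Jr."))             # smith
print(clean_name_sketch("Mary-Ann O'Neil (DUP)"))  # mary-ann o'neil
```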

clean_program(s1)

General text cleaning plus removal of 'usa_' prefixes.

Keyword arguments: s1 : string to be cleaned.

clean_program_for_census(s1)

Text cleaning to run after clean_program. Maps program names to how they are written in the census.

Keyword arguments: s1 : string to be cleaned.

clean_text(s1)

General text cleaning.

Keyword arguments: s1 : string to be cleaned.
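The exact steps of "general text cleaning" are not listed. The sketch below assumes trimming, lowercasing, and collapsing punctuation and whitespace into single underscores, which is consistent with the 'so_' and 'usa_' prefixes mentioned for clean_event and clean_program. clean_text_sketch is a hypothetical name.

```python
import re


def clean_text_sketch(s1) -> str:
    """Hypothetical sketch of 'general text cleaning'."""
    text = str(s1).strip().lower()
    # Collapse every run of non-alphanumeric characters (and underscores) into one "_"
    text = re.sub(r"[\W_]+", "_", text)
    # Drop separators left at either end, e.g. from trailing punctuation
    return text.strip("_")


print(clean_text_sketch("  Special Olympics -- Texas! "))  # special_olympics_texas
```

Using \W rather than an ASCII-only class keeps accented letters such as "é" intact.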

clean_zips(s1)

Searches for ZIP codes longer than 10 digits and truncates them to 10 if needed.

Keyword arguments: s1 : string to be cleaned.
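A minimal truncation sketch. It measures length in characters, so a hyphenated ZIP+4 such as 12345-6789 is exactly 10 and is left alone; counting characters rather than digits is an assumption. clean_zip_sketch is a hypothetical name.

```python
def clean_zip_sketch(s1) -> str:
    """Hypothetical sketch: truncate ZIP values longer than 10 characters."""
    value = str(s1).strip()
    # Keep only the first 10 characters when the value is too long
    return value[:10] if len(value) > 10 else value


print(clean_zip_sketch("12345-67890"))  # 12345-6789
print(clean_zip_sketch("02134"))        # 02134
```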

detect_test_data(s1)

Detects test data in a string column (created for the HAS FirstName, LastName, and EventName columns). Looks for occurrences of 'test', 'user', and 'patient' and replaces matching values with 'test'.

Keyword arguments: s1 : string to be cleaned.
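A sketch of the described behavior using a case-insensitive substring search for the three markers. Whether SoClean matches substrings or whole words is not stated; detect_test_data_sketch and TEST_PATTERN are hypothetical names.

```python
import re

# The three markers named in the description; matching is a case-insensitive substring search
TEST_PATTERN = re.compile(r"test|user|patient", re.IGNORECASE)


def detect_test_data_sketch(s1):
    """Hypothetical sketch: replace values that look like test records with 'test'."""
    if isinstance(s1, str) and TEST_PATTERN.search(s1):
        return "test"
    # Non-matching strings and non-string values (e.g. None) pass through unchanged
    return s1


print(detect_test_data_sketch("Patient Zero"))  # test
print(detect_test_data_sketch("Maria"))         # Maria
print(detect_test_data_sketch("Houser"))        # test
```

The last call shows the trade-off of substring matching: a real surname such as "Houser" contains "user" and is flagged. Matching whole words with \b boundaries would avoid that but would miss values like "TestUser".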