
Extractors

Census

CensusExtractor

Bases: BaseExtractor

Extractor class for Census data.

__init__(connection)

Extractor for Census data.

This class handles pulling Census data.

Parameters:

`connection` (Connection, required): The SQLAlchemy connection and engine to Census.

pull_data(table)

Method for pulling data from Census based on connection.

Parameters:

`table` (str, required): The Census table from which to begin extraction.

Returns:

DataFrame: The extract result.

CensusPublic

Bases: BaseExtractor

Class to create a Census table to connect to the Census dashboard.

__init__(connection)

Extractor for the CensusPublic table.

get_data()

Function to get Census data for the dashboard.

Common

CommonExtractor

Bases: BaseExtractor

Extractor class for Common data.

__init__()

Extractor for Common data.

This class handles pulling the Common tables.

pull_data(tableLocation)

Method for pulling data from the common csv files.

Parameters:

`tableLocation` (str, required): The location of the Common table.

Returns:

DataFrame: The extract result.
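Reading one of the Common CSV files plausibly comes down to a single `pandas.read_csv` call. The sketch below assumes `tableLocation` is something `read_csv` accepts (a local path, a URL, or an open buffer); the function name `pull_common_table` and the header-stripping step are hypothetical additions, and the real CommonExtractor may use different options.

```python
import io

import pandas as pd


def pull_common_table(table_location) -> pd.DataFrame:
    """Hypothetical helper: load a Common CSV table into a DataFrame.

    `table_location` may be anything pandas.read_csv accepts, such as a
    local path, a URL, or an open text buffer.
    """
    df = pd.read_csv(table_location)
    # Assumed cleanup step: strip stray whitespace from header names.
    df.columns = df.columns.str.strip()
    return df


if __name__ == "__main__":
    # An in-memory buffer stands in for a real Common CSV file.
    sample = io.StringIO(" id , name\n1,a\n2,b\n")
    print(pull_common_table(sample).columns.tolist())  # ['id', 'name']
```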

GMS

GMSExtractor

Bases: BaseExtractor

Extractor class for GMS.

__init__(msi)

Extractor for the GMS pipeline.

pull_data(source_dict, table)

Method to pull the entries table from GMS.

OpenMRS

OpenMRSExtractor

Bases: BaseExtractor

Extractor class for OpenMRS data.

__init__(source_connection, target_connection, msi)

Extractor for OpenMRS data in Postgres.

This class handles pulling and normalizing data from OpenMRS. In the future, we expect the normalization procedures to be pulled out and called as an external service in the pipeline, à la soclean.

Parameters:

`connection` (Connection, required): The SQLAlchemy connection and engine to OpenMRS.

Attributes:

`column_map` (dict): A mapping from OpenMRS header fields to warehouse fields.

`header_column_order` (list): The order in which the header columns are arranged.
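The two attributes suggest a rename-then-reorder normalization step. The sketch below shows one way `column_map` and `header_column_order` could be applied to a pulled DataFrame; the mapping values and the helper `normalize_headers` are hypothetical, since the real values on OpenMRSExtractor are not listed here.

```python
import pandas as pd

# Hypothetical values: the real mapping and ordering are defined on
# OpenMRSExtractor and are not reproduced here.
column_map = {"patient_id": "athlete_id", "encounter_datetime": "event_date"}
header_column_order = ["athlete_id", "event_date"]


def normalize_headers(df, column_map, header_column_order):
    """Rename source fields to warehouse names, then fix the column order."""
    renamed = df.rename(columns=column_map)
    # Indexing with a list orders the columns and drops any not listed;
    # it raises KeyError if an expected warehouse column is missing.
    return renamed[header_column_order]


if __name__ == "__main__":
    raw = pd.DataFrame(
        {"encounter_datetime": ["2023-01-01"], "patient_id": [7], "extra": [0]}
    )
    print(normalize_headers(raw, column_map, header_column_order).columns.tolist())
    # ['athlete_id', 'event_date']
```

Selecting by the ordering list doubles as a schema check: a missing warehouse column fails loudly instead of silently producing a short table.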

pull_data(table, isLegacyData)

Method for pulling data from OpenMRS based on connection.

Parameters:

`table` (str, required): The HAS table from which to begin extraction.

Returns:

DataFrame: The extract result.

SHE

SHEExtractor

Bases: BaseExtractor

Extractor class for SHE (Single Health Evaluation) data.

__init__(connection, msi)

Extractor for SHE data.

This class handles pulling and normalizing the SHE data, which is formatted as an Excel file. The file is downloaded from the data lake location here: https://portal.azure.com/#view/Microsoft_Azure_Storage/ContainerMenuBlade/~/overview/storageAccountId/%2Fsubscriptions%2F89d34fcc-ea62-4ebf-b49e-dcd915469196%2FresourceGroups%2FSO-DATA-ANALYTICS%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fsoidatalake01xx/path/she-data/etag/%220x8DB9DBA518E795C%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride~/false/defaultId//publicAccessVal/None

create_long_summary_table(df)

Function to create a long version of the SHE summary table.
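A "long" version of a wide summary table is commonly produced with `pandas.DataFrame.melt`, which turns one column per measure into (identifier, measure, value) rows. The sketch below assumes a wide summary table with a single identifier column; the helper `to_long` and the column names `program`, `measure`, and `value` are hypothetical and not taken from the SHE data.

```python
import pandas as pd


def to_long(summary: pd.DataFrame, id_col: str = "program") -> pd.DataFrame:
    """Hypothetical helper: reshape a wide summary table to long format."""
    # Every column except id_col becomes a (measure, value) pair per row.
    long_df = summary.melt(id_vars=[id_col], var_name="measure", value_name="value")
    # Sort so each identifier's measures sit together.
    return long_df.sort_values([id_col, "measure"], ignore_index=True)


if __name__ == "__main__":
    wide = pd.DataFrame({"program": ["A", "B"], "score_1": [1, 2], "score_2": [3, 4]})
    print(to_long(wide))
```

Long format is usually easier for dashboards to filter and pivot, at the cost of one row per measure.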

create_summary_table(df)

Function to create a SHE table with summary calculations.

get_she_data()

Function to iterate through all SHE tables and combine them into a DataFrame.

Returns:

DataFrame: The SHE data.
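Iterating over several tables and combining them into one DataFrame is typically done with `pandas.concat`. The sketch below assumes the SHE tables are already loaded into a dict keyed by table or sheet name, which is the shape `pandas.read_excel(path, sheet_name=None)` returns; taking DataFrames directly avoids needing an Excel engine such as openpyxl. The helper `combine_tables` and the `source_table` column are hypothetical.

```python
from typing import Mapping

import pandas as pd


def combine_tables(tables: Mapping[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: stack per-table DataFrames into one DataFrame."""
    if not tables:
        raise ValueError("no tables to combine")
    # Tag each row with the table it came from before stacking.
    frames = [df.assign(source_table=name) for name, df in tables.items()]
    # Columns absent from some tables are filled with NaN in the result.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    sheets = {"t1": pd.DataFrame({"a": [1]}), "t2": pd.DataFrame({"a": [2], "b": ["x"]})}
    print(combine_tables(sheets))
```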

make_yn_binary(df)

Function to convert Yes/No columns into 1/0 columns.
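The sketch below shows one way to do the Yes/No to 1/0 conversion that `make_yn_binary` describes. It is not the method itself: the name `yes_no_to_binary` is hypothetical, and it assumes answers are spelled exactly "Yes" and "No" and that a column qualifies only when every non-blank value is one of those two.

```python
import pandas as pd

# Assumes the answers are spelled exactly "Yes" and "No".
YES_NO = {"Yes": 1, "No": 0}


def yes_no_to_binary(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: convert columns that hold only Yes/No (plus blanks) to 1/0."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        values = set(out[col].dropna().unique())
        # Skip empty columns and any column with a value other than Yes/No.
        if values and values.issubset(YES_NO):
            # Blanks stay missing (NaN), so such columns become float.
            out[col] = out[col].map(YES_NO)
    return out


if __name__ == "__main__":
    survey = pd.DataFrame({"q1": ["Yes", "No", None], "q2": ["Yes", "Maybe", "No"]})
    print(yes_no_to_binary(survey))
```

Requiring every non-blank value to be Yes or No keeps free-text columns that merely contain the word "Yes" from being partially converted.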

SportPartnership

SportPartnershipExtractor

Extractor class for the Sport Partnership data.

__init__(connection, msi)

Extractor for the Sport Partnership data.

This class handles pulling and normalizing data from the Sport Partnership Excel files. The files are downloaded from the data lake location here: https://portal.azure.com/#view/Microsoft_Azure_Storage/ContainerMenuBlade/~/overview/storageAccountId/%2Fsubscriptions%2F89d34fcc-ea62-4ebf-b49e-dcd915469196%2FresourceGroups%2FSO-DATA-ANALYTICS%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fsoidatalake01xx/path/sport-partnership-survey/etag/%220x8DBBDED322A498F%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride~/false/defaultId//publicAccessVal/None

get_sport_partnership_data()

Function to clean the Sport Partnership data and create a DataFrame.

Returns:

DataFrame: The Sport Partnership data.