Extractors
Census
CensusExtractor
Bases: BaseExtractor
Extractor class for Census data.
__init__(connection)
Extractor for Census data.
This class handles pulling Census data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`connection` | Connection | The sqlalchemy connection and engine to Census. | required |
pull_data(table)
Method for pulling data from Census based on connection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`table` | str | The Census table to begin extraction. | required |
Returns:
Type | Description |
---|---|
DataFrame | The extract result. |
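The body of `pull_data` is not shown in this reference. As a minimal sketch of the documented contract (a connection goes in, a table name is passed, a DataFrame comes out), the example below uses a hypothetical stand-in class, `SketchCensusExtractor`, and assumes the whole table is read with `pandas.read_sql_query`. The in-memory SQLite database and its `program` and `athletes` columns are illustrative only, not the real Census schema.

```python
import sqlite3

import pandas as pd


class SketchCensusExtractor:
    """Hypothetical stand-in for CensusExtractor; not the project's code."""

    def __init__(self, connection):
        # Any DB-API or SQLAlchemy connection that pandas can read from.
        self.connection = connection

    def pull_data(self, table: str) -> pd.DataFrame:
        # Assumption: the whole table is read in a single query.
        return pd.read_sql_query(f'SELECT * FROM "{table}"', self.connection)


# In-memory SQLite keeps the example runnable without a real Census database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (program TEXT, athletes INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?)",
    [("Program A", 120), ("Program B", 85)],
)
conn.commit()

census_df = SketchCensusExtractor(conn).pull_data("census")
```

With the real class, the connection would presumably be the sqlalchemy connection described in the parameter table above.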
CensusPublic
Bases: BaseExtractor
Class to create a census table to connect to the census dashboard.
__init__(connection)
Extractor for the CensusPublic table.
get_data()
Function to get census data for the dashboard.
Common
CommonExtractor
Bases: BaseExtractor
Extractor class for Common data.
__init__()
Extractor for Common data.
This class handles pulling the Common tables.
pull_data(tableLocation)
Method for pulling data from the common csv files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`tableLocation` | str | The location of the Common table | required |
Returns:
Type | Description |
---|---|
DataFrame | The extract result. |
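Since the Common tables are described as csv files, reading one most likely comes down to a single pandas call. The sketch below uses a hypothetical helper, `pull_common_table`, and assumes a comma-separated file with a header row; the sample columns are made up for illustration.

```python
import io

import pandas as pd


def pull_common_table(table_location) -> pd.DataFrame:
    """Hypothetical helper mirroring pull_data(tableLocation)."""
    # Assumption: Common tables are comma-separated files with a header row.
    return pd.read_csv(table_location)


# An in-memory buffer stands in for a CSV path so the sketch runs anywhere.
sample_csv = io.StringIO("code,name\nA1,Alpha\nB2,Beta\n")
common_df = pull_common_table(sample_csv)
```

In practice `tableLocation` would be a path string rather than a buffer; `pandas.read_csv` accepts both.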
GMS
GMSExtractor
Bases: BaseExtractor
Extractor class for GMS.
__init__(msi)
Extractor for GMS pipeline.
pull_data(source_dict, table)
Method to pull entries table from GMS.
OpenMRS
OpenMRSExtractor
Bases: BaseExtractor
Extractor class for OpenMRS data.
__init__(source_connection, target_connection, msi)
Extractor for OpenMRS data in Postgres.
This class handles pulling and normalizing data from OpenMRS. In
the future, we would expect the normalization procedures to be pulled
out and called as an external service in the pipeline a la soclean.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`connection` | Connection | The sqlalchemy connection and engine to OpenMRS. | required |
Attributes:
Name | Type | Description |
---|---|---|
`column_map` | dict | A mapping from OpenMRS header fields to warehouse fields. |
`header_column_order` | list | Ordering list for the header columns. |
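The two attributes above suggest a rename-then-reorder step during normalization. The sketch below shows one way that could look with pandas; the keys, values, and sample rows are hypothetical, since the actual contents of `column_map` and `header_column_order` are not listed in this reference.

```python
import pandas as pd

# Hypothetical values; the real column_map and header_column_order are not shown here.
column_map = {"person_id": "athlete_id", "birthdate": "date_of_birth"}
header_column_order = ["athlete_id", "date_of_birth", "gender"]

raw = pd.DataFrame(
    {
        "gender": ["F", "M"],
        "birthdate": ["2001-04-02", "1998-11-30"],
        "person_id": [17, 42],
    }
)

# Rename source headers to warehouse names, then put the columns in a fixed order.
normalized = raw.rename(columns=column_map)[header_column_order]
```

Selecting with the ordering list also drops any column not named in it, which may or may not match what the extractor does.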
pull_data(table, isLegacyData)
Method for pulling data from OpenMRS based on connection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`table` | str | The HAS table to begin extraction. | required |
Returns:
Type | Description |
---|---|
DataFrame | The extract result. |
SHE
SHEExtractor
Bases: BaseExtractor
Extractor class for SHE (Single Health Evaluation) data.
__init__(connection, msi)
Extractor for SHE data.
This class handles pulling and normalizing SHE data that is stored in an Excel file. The file is downloaded from the data lake location here: https://portal.azure.com/#view/Microsoft_Azure_Storage/ContainerMenuBlade/~/overview/storageAccountId/%2Fsubscriptions%2F89d34fcc-ea62-4ebf-b49e-dcd915469196%2FresourceGroups%2FSO-DATA-ANALYTICS%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fsoidatalake01xx/path/she-data/etag/%220x8DB9DBA518E795C%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride~/false/defaultId//publicAccessVal/None
create_long_summary_table(df)
Function to create a long version of the SHE summary table.
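A "long version" of a summary table usually means a wide-to-long reshape, where each measure column becomes rows. The sketch below illustrates that idea with `DataFrame.melt`; the column names and values are invented, and the real function may reshape the SHE summary differently.

```python
import pandas as pd

# Illustrative wide summary: one row per event, one column per measure.
wide = pd.DataFrame(
    {
        "event": ["E1", "E2"],
        "screened": [40, 25],
        "referred": [6, 3],
    }
)

# Wide-to-long: each (event, measure) pair becomes its own row.
long = wide.melt(id_vars="event", var_name="measure", value_name="value")
```

The long layout tends to be easier to filter and chart in dashboard tools, which is a plausible reason for keeping both versions.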
create_summary_table(df)
Function to create a SHE table with summary calculations
get_she_data()
Function to iterate through all SHE tables and create a DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | The SHE data. |
make_yn_binary(df)
Function to convert Yes/No columns into 1/0 columns
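The Yes/No conversion can be sketched as a column-wise mapping. The function below, `make_yn_binary_sketch`, is a hypothetical re-implementation, not the project's code: it assumes a column qualifies only when all of its non-missing values are exactly "Yes" or "No", and it leaves missing values missing. The sample columns are made up.

```python
import pandas as pd


def make_yn_binary_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-implementation: map Yes/No columns to 1/0."""
    out = df.copy()
    for col in out.columns:
        values = set(out[col].dropna().unique())
        # Only touch columns whose non-missing values are exclusively Yes/No.
        if values and values <= {"Yes", "No"}:
            out[col] = out[col].map({"Yes": 1, "No": 0})
    return out


answers = pd.DataFrame(
    {
        "athlete": ["A", "B", "C"],
        "wears_glasses": ["Yes", "No", "Yes"],
        "needs_referral": ["No", "No", None],
    }
)
binary = make_yn_binary_sketch(answers)
```

Working on a copy keeps the input frame unchanged; whether the real method modifies `df` in place is not stated here.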
SportPartnership
SportPartnershipExtractor
Extractor class for the Sport Partnership data.
__init__(connection, msi)
Extractor for the Sport Partnership data.
This class handles pulling and normalizing data from the Sport
Partnership Excel files. The files are
downloaded from the data lake location here:
https://portal.azure.com/#view/Microsoft_Azure_Storage/ContainerMenuBlade/~/overview/storageAccountId/%2Fsubscriptions%2F89d34fcc-ea62-4ebf-b49e-dcd915469196%2FresourceGroups%2FSO-DATA-ANALYTICS%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fsoidatalake01xx/path/sport-partnership-survey/etag/%220x8DBBDED322A498F%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride~/false/defaultId//publicAccessVal/None
get_sport_partnership_data()
Function to clean the Sport Partnership data and create a DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | The Sport Partnership data. |