Data Warehouse Description
Our warehouse is running in Azure as a SQL Server with PROD and DEV databases.
- Subscription:
DATAMANAGEMENT
- Resource Group:
SO-DATA-ANALYTICS
- Server:
soidatawarehouse
- Database:
Prod_v01xx
,Dev_v01xx
- Schema: See below
All code related to the warehouse is housed in the soi-dwh repository.
Schemas
- Census
- Common
- Core
- GMS
- HAS
- HASPublic
- LegacyHAS
- SHE
- SoFlow
- Sport
- Stage
Core data model
The core data model captures critical business entities and is designed to balance interpretability, normalization, and generalization. It is interoperable with schemas for major data assets as of 2022-04-19, including HAS and GMS. The core schema is roughly analogous to a fact constellation but is better understood as a graph with vertices and edges where relationships manifest inter- and intra-table.
Data sources
Below we list incoming data sources and their statuses.
- HAS Legacy
- Format: excel, CSV, flat files
- HAS OpenMRS
- Format: Postgres DB
- GMS Backups
- Format: Azure SQL DB
- Census / Accreditation
- Format: Azure SQL DB
- Sport Partnership
- Format: Azure Data Lake
- SHE
- Format: Azure Data Lake
- SO Connect
- Format: Azure Blob Storage
Common data elements
CreatedDateTime
- Type: datetime
- Descr: The timestamp when the row was created.
AssetId
- Type: varchar
- Descr: Id of pipeline run which can be tied back to SoFLow.Log
-
*Id
- Type: bigint
- Descr:
SourceId
: The identifier for the source of the row, e.g. a Healthy Athletes file or a GMS database.LocationId
: The identifier for the Location concept, uniquely defined based on locational attributes. Null is converted to 'No Location'.PersonId
: The unique identifer for a person based on matching against normalized first name, last name, gender, date of birth, and LocationId.OrganizationId
: The identifier for an Organization based on name and type, e.g. delegation, program, address. Matched on organization name, region, and LocationId.EventId
: The identifier for the Event concept, uniquely defined based on event name, OrganizationId, and LocationId.RoleId
: The identifier for the person role for an event.DisciplineId
: The identifier for the specific discipline, e.g. Sport (Basketball) or Health (FunFitness).MetricId
: The identifier for the event metric, e.g., 'Participated', 'bmi'.ObservationId
: The identifier for an instance of observation for a role, e.g. a HAS screening or a sport competition entry for an athlete.MeasureId
: The identifier for an individual instance of measured participation, e.g., Bowling participation for a specific event.
-
*Uuid
- Type: varchar
- Descr:
PersonUuid
: Uuid for resolved person profile. Person profiles are compared against first name, lastname, gender, birthdate, and Program. Person profiles from [Core].[Person] are cleaned, normalized via Soundex, and stored in [Soler].[PersonClean], then compared and resolved and stored in [Soler].[PersonResolved]. Results are joined back to the originalPersonId
and stored in [Core].[PersonResolved].OrganizationUuid
: Uuid for resolved Program profile. Organization names with OrganzationType = 'Program' are fuzzy matched to the standard SOI Program name list from [Soler].['SOIPrograms']. Results are stored in [Core].[ProgramResolved].
Views
Views are read-only prepared queries for our PowerBI dashboards. You will see information about this for each schema within their respective pages, if available. It looks like the same format as the normal tables - just with the changes that are bullet listed.
Example
Find a person's roles over time
SELECT
pr.PersonUuid,
p.FirstName,
p.LastName,
r.RoleName,
MIN(meas.MeasuredDate) as StartDate,
d.DisciplineName AS DisciplineType,
d.SubType AS DisciplineSubType
FROM
[Core].[Observation] obs
JOIN [Core].[Person] p on obs.PersonId = p.PersonId
JOIN [Core].[PersonResolved] pr on p.PersonId = pr.PersonId
JOIN [Core].[Measure] meas on obs.ObservationId = meas.ObservationId
JOIN [Core].[Role] r on meas.RoleId = r.RoleId
JOIN [Core].[Discipline] d ON meas.DisciplineId = d.DisciplineId
GROUP BY
pr.PersonUuid,
p.FirstName,
p.LastName,
r.RoleName,
d.DisciplineName,
d.SubType
;