Skip to content

Data Warehouse Description

Our warehouse is running in Azure as a SQL Server with PROD and DEV databases.

  • Subscription: DATAMANAGEMENT
  • Resource Group: SO-DATA-ANALYTICS
  • Server: soidatawarehouse
  • Database: Prod_v01xx, Dev_v01xx
  • Schema: See below

All code related to the warehouse is housed in the soi-dwh repository.

Schemas

  • Census
  • Common
  • Core
  • GMS
  • HAS
  • HASPublic
  • LegacyHAS
  • SHE
  • SoFlow
  • Sport
  • Stage

Core data model

The core data model captures critical business entities and is designed to balance interpretability, normalization, and generalization. It is interoperable with schemas for major data assets as of 2022-04-19, including HAS and GMS. The core schema is roughly analogous to a fact constellation but is better understood as a graph with vertices and edges where relationships manifest inter- and intra-table.

Data Model

Data sources

Below we list incoming data sources and their statuses.

  • HAS Legacy
    • Format: excel, CSV, flat files
  • HAS OpenMRS
    • Format: Postgres DB
  • GMS Backups
    • Format: Azure SQL DB
  • Census / Accreditation
    • Format: Azure SQL DB
  • Sport Partnership
    • Format: Azure Data Lake
  • SHE
    • Format: Azure Data Lake
  • SO Connect
    • Format: Azure Blob Storage

Common data elements

  • CreatedDateTime
    • Type: datetime
    • Descr: The timestamp when the row was created.
  • AssetId
    • Type: varchar
    • Descr: Id of pipeline run which can be tied back to SoFLow.Log
  • *Id

    • Type: bigint
    • Descr:
      • SourceId: The identifier for the source of the row, e.g. a Healthy Athletes file or a GMS database.
      • LocationId: The identifier for the Location concept, uniquely defined based on locational attributes. Null is converted to 'No Location'.
      • PersonId: The unique identifer for a person based on matching against normalized first name, last name, gender, date of birth, and LocationId.
      • OrganizationId: The identifier for an Organization based on name and type, e.g. delegation, program, address. Matched on organization name, region, and LocationId.
      • EventId: The identifier for the Event concept, uniquely defined based on event name, OrganizationId, and LocationId.
      • RoleId: The identifier for the person role for an event.
      • DisciplineId: The identifier for the specific discipline, e.g. Sport (Basketball) or Health (FunFitness).
      • MetricId: The identifier for the event metric, e.g., 'Participated', 'bmi'.
      • ObservationId: The identifier for an instance of observation for a role, e.g. a HAS screening or a sport competition entry for an athlete.
      • MeasureId: The identifier for an individual instance of measured participation, e.g., Bowling participation for a specific event.
  • *Uuid

    • Type: varchar
    • Descr:
      • PersonUuid: Uuid for resolved person profile. Person profiles are compared against first name, lastname, gender, birthdate, and Program. Person profiles from [Core].[Person] are cleaned, normalized via Soundex, and stored in [Soler].[PersonClean], then compared and resolved and stored in [Soler].[PersonResolved]. Results are joined back to the original PersonId and stored in [Core].[PersonResolved].
      • OrganizationUuid: Uuid for resolved Program profile. Organization names with OrganzationType = 'Program' are fuzzy matched to the standard SOI Program name list from [Soler].['SOIPrograms']. Results are stored in [Core].[ProgramResolved].

Views

Views are read-only prepared queries for our PowerBI dashboards. You will see information about this for each schema within their respective pages, if available. It looks like the same format as the normal tables - just with the changes that are bullet listed.

Example

Find a person's roles over time

SELECT
    pr.PersonUuid,
    p.FirstName,
    p.LastName,
    r.RoleName,
    MIN(meas.MeasuredDate) as StartDate,
    d.DisciplineName AS DisciplineType,
    d.SubType AS DisciplineSubType

FROM
    [Core].[Observation] obs
    JOIN [Core].[Person] p on obs.PersonId = p.PersonId
    JOIN [Core].[PersonResolved] pr on p.PersonId = pr.PersonId
    JOIN [Core].[Measure] meas on obs.ObservationId = meas.ObservationId
    JOIN [Core].[Role] r on meas.RoleId = r.RoleId
    JOIN [Core].[Discipline] d ON meas.DisciplineId = d.DisciplineId
GROUP BY 
    pr.PersonUuid,
    p.FirstName,
    p.LastName,
    r.RoleName,
    d.DisciplineName,
    d.SubType
;