Overview

Soler is a class that resolves SO program data.

  • Soler can be run in bulk or on streaming data.

Included Functions

Modules:

  • soler : class that takes a list of dictionaries and outputs a pandas DataFrame.

Example Using Defaults

>>> from soler.pipelines import *

# Soler comes with a default indexing dictionary, default comparison dictionary, and default conversion list

# TO RUN IN BULK:
# Set msi to True when running in the VM with managed secure identity; set it to False when running locally.
>>> soler = SolerPersonPipeline(environment = 'Prod_v01xx', msi = True)

>>> soler.run(spawn_personclean = True, spawn_personresolved = True, 
                     load_to_core = True, load_personclean = True, 
                     update_personresolved = False, bulk_personresolved=True)


# TO RUN ON STREAMING DATA:

>>> soler = SolerPersonPipeline(environment = 'Prod_v01xx', msi=True)

>>> soler.run(spawn_personclean = False, spawn_personresolved = False, 
                     load_to_core = True, load_personclean = True, 
                     update_personresolved = True, bulk_personresolved=False)
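
The bulk and streaming calls above differ in four of the six flags. One way to keep the two flag sets in a single place is sketched below. `person_run_flags` is a hypothetical helper, not part of soler; it only returns the keyword arguments shown above for each mode.

```python
# Hypothetical helper (not part of soler): returns the run() keyword
# arguments documented above for each mode.
BULK_FLAGS = {
    "spawn_personclean": True,
    "spawn_personresolved": True,
    "load_to_core": True,
    "load_personclean": True,
    "update_personresolved": False,
    "bulk_personresolved": True,
}

STREAMING_FLAGS = {
    "spawn_personclean": False,
    "spawn_personresolved": False,
    "load_to_core": True,
    "load_personclean": True,
    "update_personresolved": True,
    "bulk_personresolved": False,
}


def person_run_flags(mode):
    """Return a copy of the documented flag set for 'bulk' or 'streaming'."""
    flags = {"bulk": BULK_FLAGS, "streaming": STREAMING_FLAGS}
    if mode not in flags:
        raise ValueError(f"mode must be 'bulk' or 'streaming', got {mode!r}")
    # Copy so callers can adjust a flag without changing the defaults.
    return dict(flags[mode])


# Assumed usage, given a SolerPersonPipeline instance named `soler`:
#     soler.run(**person_run_flags("streaming"))
```

Returning a copy keeps the module-level defaults unchanged if a caller overrides a single flag for a one-off run.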

Example Using Customized Indexing, Comparisons, and Conversions

>>> from soler.pipelines import *

""" Soler comes with a default indexing dictionary, default comparison dictionary, and default conversion list

If you want to change the columns that are compared and indexed on, you can create another set of indexing and comparison methods.

Tip: If you create a set of comparisons with the same column labels, they'll be concatenated at the end, so you only need to handle one set of record pairs for matching.

Creating multiple sets of indexes/comparisons can be beneficial when you want to check for things like name reversal, e.g. FirstName = LastName.

If you want to split a column and compare its first part and last part against rdf, append '_Split' to the end of the column name; the column name will be adjusted.
"""

custom_blockers = {
    1: {'type': 'Block',
        'left': {'col_name': 'BlockLeftColumn'},
        'right': {'col_name': 'BlockRightColumn'}
        },
    2: {'type': 'Sorted Neighborhood',
        'left': {'col_name': 'LastName'},
        'right': {'col_name': 'LastName'}
        },
    3: {'type': 'Sorted Neighborhood',
        'left': {'col_name': 'FirstName'},
        'right': {'col_name': 'LastName'}
        },
    4: {'type': 'Sorted Neighborhood',
        'left': {'col_name': 'LastName_Split'},
        'right': {'col_name': 'LastName'}
        }}

custom_comps = {
  1: {'LastName_Score':{
      'weight': 0.3, 'type': 'string', 'left': 'LastName', 'right': 'LastName'
      }, 'FirstName_Score': {
      'weight': 0.3, 'type': 'string', 'left': 'FirstName',
      'right': 'FirstName'
      }, 'BirthDate_Score': {
      'weight': 0.25, 'type': 'birthdate', 'left': 'BirthDate',
      'right': 'BirthDate'
      }, 'Gender_Score': {
      'weight': 0.15, 'type': 'exact', 'left': 'Gender', 'right': 'Gender'
      }},

  2: {'LastName_Score':{
      'weight': 0.3, 'type': 'string', 'left': 'LastName', 'right': 'FirstName'
      }, 'FirstName_Score': {
      'weight': 0.3, 'type': 'string', 'left': 'FirstName',
      'right': 'LastName'
      }, 'BirthDate_Score': {
      'weight': 0.25, 'type': 'birthdate', 'left': 'BirthDate',
      'right': 'BirthDate'
      }, 'Gender_Score': {
      'weight': 0.15, 'type': 'exact', 'left': 'Gender', 'right': 'Gender'
      }}
}

custom_conversion = ['LastName', 'FirstName']

>>> soler = SolerPersonPipeline(
                    blockers = custom_blockers, 
                    comparators = custom_comps, 
                    conversion = custom_conversion, 
                    environment = 'Prod_v01xx', 
                    msi=True)

>>> soler.run(spawn_personclean = True, spawn_personresolved = True, 
                     load_to_core = True, load_personclean = True, 
                     update_personresolved = False, bulk_personresolved=True)
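
Because comparison sets with the same column labels are concatenated, it can help to check a custom comparison dictionary before passing it to the pipeline. The sketch below uses a hypothetical helper, `check_comparison_sets`, which is not part of soler. It assumes that every set should use the same score labels and that the weights in each set should sum to 1.0, as they do in the example above; soler itself may not require either condition.

```python
import math


def check_comparison_sets(comps):
    """Return a list of problems found in a comparison dictionary.

    Hypothetical check (not part of soler). Assumes each set should share
    the same score labels and have weights that sum to 1.0.
    """
    problems = []
    expected_labels = None
    for set_id, comparisons in comps.items():
        labels = set(comparisons)
        if expected_labels is None:
            # The first set defines the labels every later set should match.
            expected_labels = labels
        elif labels != expected_labels:
            problems.append(f"set {set_id}: labels differ from the first set")
        total = sum(spec["weight"] for spec in comparisons.values())
        if not math.isclose(total, 1.0):
            problems.append(f"set {set_id}: weights sum to {total:.2f}, not 1.0")
    return problems


# Small illustrative dictionary with the same shape as custom_comps above;
# set 2 swaps the right-hand column to look for reversed names.
example_comps = {
    1: {"LastName_Score": {"weight": 0.6, "type": "string",
                           "left": "LastName", "right": "LastName"},
        "Gender_Score": {"weight": 0.4, "type": "exact",
                         "left": "Gender", "right": "Gender"}},
    2: {"LastName_Score": {"weight": 0.6, "type": "string",
                           "left": "LastName", "right": "FirstName"},
        "Gender_Score": {"weight": 0.4, "type": "exact",
                         "left": "Gender", "right": "Gender"}},
}

print(check_comparison_sets(example_comps))
```

An empty list means no problems were found under these assumptions. The same call could be made on `custom_comps` before constructing `SolerPersonPipeline`.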

Program Resolution

# To conduct a full run on organizations and create Core.ProgramResolved

from soler.pipelines import *

soler = SolerProgramPipeline('Dev_v01xx', msi = True)

soler.run(spawn_programresolved = True, load_to_core = True, 
            load_soi_programs = True)


# To update the current Core.ProgramResolved with new programs
soler = SolerProgramPipeline('Dev_v01xx', msi = True)

soler.run(spawn_programresolved = False, load_to_core = True, 
            load_soi_programs = False)