6/1/23 CELR Data Lake Migration Meeting notes

Date

Jun 1, 2023

Attendees:

APHL

CDC

Peraton


Dari Shirazi:

Megan Light:

Erroll Rosser:

Vanessa Holley: X

Teresa Jue: --

Kristin Peterson:

Brooke Beaulieu: X

Cheri Gatland-Lightener:

Tom Russell: --

Mel Kourbage:

Norris Kpamegan:

Marcelo Caldas:

Gretl Glick: X

Ryan Harrison: X

Don Lindsay: X

Geo Miller: X

 

Leslyn Mcnabb:

Alissa McShane:

 

Marion

Goals

  • Update on Data Lake Migration Status

  • DEX Overview and Status Update

Discussion topics

Item

Notes


Overall Status Updates

 

 

Metadata values required by DEX

  • meta_destination_id: unique identifier used to indicate the program associated with the upload.

  • meta_ext_event: unique identifier used to indicate the event type within the program that this file belongs to (e.g., routineImmunization).

Optional for all:

  • meta_schema_version

  • How is optional metadata for a specific meta_destination_id/meta_ext_event/meta_schema_version defined?

  • Does this need to be defined before sending metadata values or will DEX allow any additional optional metadata values to be sent?
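To make the required/optional split concrete, here is a minimal Python sketch of a client-side check an uploader might run before sending. It assumes only the three field names listed in these notes; the function name `validate_metadata` and the `allow_extra` flag are hypothetical, and `allow_extra` stands in for the open question of whether DEX accepts additional optional metadata. It is an illustration, not DEX's own validation logic.

```python
from typing import Dict, List

# Field names taken from the meeting notes; everything else is illustrative.
REQUIRED_FIELDS = ("meta_destination_id", "meta_ext_event")
OPTIONAL_FIELDS = ("meta_schema_version",)


def validate_metadata(metadata: Dict[str, str], allow_extra: bool = True) -> List[str]:
    """Return a list of problems; an empty list means the metadata passes.

    allow_extra is a hypothetical switch: it is still open whether DEX
    accepts optional fields beyond those defined for a destination/event.
    """
    problems = []
    # Required fields must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing required field: {field}")
    # Optionally reject anything that is neither required nor known-optional.
    if not allow_extra:
        known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
        for field in sorted(set(metadata) - known):
            problems.append(f"unexpected field: {field}")
    return problems


if __name__ == "__main__":
    # Placeholder values, not real DEX identifiers.
    example = {"meta_destination_id": "example_program", "meta_ext_event": "example_event"}
    print(validate_metadata(example))  # prints []
```

Flipping `allow_extra` to `False` models the stricter answer to the second question, where only predefined metadata would be accepted.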

Mtg Notes:

  • Option 1: Prototyped download to bucket:

    • No issues using the upload API

    • Metadata questions

    • Viable path forward

  • Option 2: AWS DataSync

    • Runs on a schedule

    • Cost implications and open questions; blob storage time period

    • Requires EC2 configuration, without monitoring and auditing services

    • Within S3: when the copy is done, are the same S3 metadata keys present in Azure? No; S3 metadata is not copied over to blob metadata. Not impossible, but a solution would need to be created

      • Would need either to …option 1 write custom connector or

      • Questions on long-term monitoring and cost

  • Option 3:

    • Requires EC2 configuration, without monitoring and auditing services

    • Within S3: when the copy is done, are the same S3 metadata keys present in Azure? No; S3 metadata is not copied over to blob metadata. Not impossible, but a solution would need to be created

      • Would need either to …option 1 write custom connector or

      • Questions on long-term monitoring and cost

      • CDC is pulling data

  • Option 4:

    • Almost the same as Option 1

    • Instead of pushing to an HTTP endpoint, DEX could pull from AIMS

  • Pros/cons of all options:

    • Either using the upload API or having DEX pull data from the S3 bucket on AIMS is most viable

      • DEX: What component does the pull? AIMS: Would provide code to DEX as a set of library code

      • AIMS: This would be similar to the initial data pull for EIP+

      • Could create a new code base, shared with DEX, that could be used for other integration projects

  • Decision Point:

    • Load/performance test using metadata

      • Option 1: AIMS will need to decide where to host, but could start next week

      • Option 4: Needs a bit of refinement, but also viable

      • Options 2 and 3: Would need additional developer research/support to implement

    • Uncertain whether metadata comes across with the DataSync option (Option 2); the scheduling feature and data growth would need to be configured/monitored

    • AIMS/DS: The choice should be weighed on long-term viability and reuse for other programs/use cases

      • GM: The bucket would use a project prefix, and the metadata would be mapped to the upload API object, so the approach could scale

      • DS: Option 4 is better for AIMS, since less code is maintained on the AIMS side; CDC polls. The queue can be monitored, but is available for CDC to poll at any time

        • Open question: who monitors the queue/polling?

    • DEX: In Option 4, how is metadata received? AIMS: The S3 get object command would make the metadata available. DEX: Metadata would be persisted in Azure blob storage

  • Leslyn: Are there any programs for which APHL would need to pull data back from CDC? Dari: Currently, EIP is the only program that pushes data from CDC, but it would be migrated to the CDC Platform in the intermediate term

  • Leslyn: Could total daily volumes be provided per program?

    • AIMS: Can provide an overview of volume for DLs, and potentially other programs

      • Action item: Provide counts to Leslyn and Ryan

  • AIMS: Prefers Option 4; Option 1 would need long-term coordination with DEX, so a higher LOE

    • DEX: Coordination would not be needed; DEX would

    • DEX: Could we test Option 1?
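In Option 4, DEX pulls objects from the S3 bucket on AIMS, reads the metadata that the S3 get object call returns, and persists it as Azure blob metadata. The Python sketch that follows illustrates that flow under several assumptions. The clients are passed in and assumed to behave like a boto3 S3 client (`get_object(Bucket=..., Key=...)` returning `Body` and `Metadata`) and an azure-storage-blob `BlobServiceClient` (`get_blob_client(...).upload_blob(..., metadata=..., overwrite=True)`). The helper names, the bucket/key/container names, and the in-memory fakes are hypothetical. Key normalization reflects Azure's rule that metadata names must be valid C# identifiers. This is not the library code AIMS offered to provide.

```python
import io
import re
from typing import Dict

# Required metadata fields as listed in these notes.
REQUIRED_KEYS = ("meta_destination_id", "meta_ext_event")


def to_azure_metadata(s3_metadata: Dict[str, str]) -> Dict[str, str]:
    """Rename S3 user-metadata keys so they are valid Azure metadata names.

    Azure metadata names follow C# identifier rules, so any character other
    than ASCII letters, digits and '_' becomes '_', and a leading digit gets
    a '_' prefix.
    """
    result = {}
    for key, value in s3_metadata.items():
        name = re.sub(r"[^0-9A-Za-z_]", "_", key)
        if name[:1].isdigit():
            name = "_" + name
        result[name] = value
    return result


def pull_object(s3_client, blob_service, bucket: str, key: str, container: str) -> Dict[str, str]:
    """Copy one S3 object into Azure blob storage, carrying its metadata along.

    s3_client is assumed to behave like a boto3 S3 client and blob_service
    like azure.storage.blob.BlobServiceClient (hypothetical wiring).
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    metadata = to_azure_metadata(response.get("Metadata", {}))
    missing = [k for k in REQUIRED_KEYS if k not in metadata]
    if missing:
        raise ValueError(f"{key}: missing required metadata {missing}")
    body = response["Body"].read()
    blob_client = blob_service.get_blob_client(container=container, blob=key)
    blob_client.upload_blob(body, metadata=metadata, overwrite=True)
    return metadata


# In-memory stand-ins so the sketch runs without AWS or Azure credentials.
class FakeS3:
    def __init__(self, objects):
        self.objects = objects  # {(bucket, key): (bytes, metadata dict)}

    def get_object(self, Bucket, Key):
        data, meta = self.objects[(Bucket, Key)]
        return {"Body": io.BytesIO(data), "Metadata": dict(meta)}


class FakeBlobClient:
    def __init__(self, store, container, blob):
        self.store, self.container, self.blob = store, container, blob

    def upload_blob(self, data, metadata=None, overwrite=False):
        # Simplified: always writes; the real client raises if the blob
        # exists and overwrite is False.
        self.store[(self.container, self.blob)] = (data, metadata)


class FakeBlobService:
    def __init__(self):
        self.store = {}

    def get_blob_client(self, container, blob):
        return FakeBlobClient(self.store, container, blob)
```

For example, an object whose S3 metadata includes `meta-schema-version` would land in Azure with the key `meta_schema_version`. Which side runs this pull, and who monitors the queue, remains the open question recorded in the notes.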

DRAFT Data Flow Diagram

 

Questions

 

Next Steps & Action Items

  • Next steps:

    •  

Action items

Continue researching options and decide on one; AIMS will create a data flow diagram @Geoffery Miller
Provide Geo with a paid Azure subscription for prototyping/testing @dari.shirazi@aphl.org or @Alissa McShane
@Gretl Glick Next Meeting: Schedule: June 1, 2023 at 1pm ET
