CELR Decommission Process

Date

Jul 3, 2024

Attendees:

APHL/AIMS: Geo Miller, Gretl Glick, Vanessa Holley, Laura Carlton, Dari Shirazi, Eddie, John Reaves

Peraton: Erroll Rosser, Kristin Peterson

CDC/MVPS: Tricia Aden, Teresa Jue, Megan Mueller

Goals

Discussion topics


Agenda/Objectives:

  1. Introductions

  2. CELR Decommission: Communication and Timeline – Tricia Aden

    1. Objective: APHL, CDC, and Peraton teams understand the next steps for COVID-19 ELR data and the communicated timelines.

  3. Discuss CELR decommission process – Team Q&A - All

    1. Objective: APHL and Peraton teams understand the high-level requirements to decommission the CELR pipelines, including data reconciliation, export, and pipeline retirement.

    2. Not using CELR data for surveillance

    3. The PRA expires on 9/30/24, driving the decision to retire CELR

    4. Communications will be sent only to sending jurisdictions in mid-to-late July

    5. The drop-dead date is 9/30

      1. Is AIMS continuing to send Covid ELR data to DEX?

        1. CDC: Data cannot be received by CDC after 9/30, but the decommissioning process can continue after that date

        2. If data is received after 10/1, CDC cannot accept it; CDC would need to contact the sending jurisdiction to inform them that the data cannot be received

      2. Does all decommissioning on AIMS need to be completed by 9/30?

    6. CDC has also communicated with the HHS Protect team and will work with them on archiving the data

  4. Identify CELR decommission team participants and resources – All

    1. Objective: Identify decommission team members, team meeting cadence, action items, and next steps


 

· This is a good question to ask, as there will be some effort required to identify, archive, disable, and delete the resources.

· High Level:

o Decommission Mirth configurations for CELR

  • Geo: Disable endpoints to process data in CELR on AIMS and send to DEX

  • Would also need to disable CELR Portal/login

  • CDC: PHLs are not likely to login to view historic data

  • Raw data from 2020--what happens to it?

    • CDC is archiving processed historic data in Amazon Glacier

    • ER: From the pipeline, data is saved in an S3 bucket, then Redshift, then sent to HHS (data is redacted/de-identified)

    • Raw data received may have PHI

    • AIMS: Can store data in Glacier, but can probably remove the Redshift data

    • CDC: Don’t see a need to retain the data--there will be 2 copies (CDC and HHS), plus the AIMS data has PHI, so there is no need to retain the AIMS data--CDC can put this in writing

    • Does APHL need to retain these records?

      • Dari: Data in Redshift is NOT reproducible (the program improved over time--the same input will produce different output in Redshift--it is about 95% reproducible)


o Archive and Remove CELR AWS components 

o Archive and Remove CELR services deployed to k8s

o Archive and remove Redshift cluster ('onboard' environment only)

· More specific for each active environment:

o Disable and archive mirth destination to AIMS hosted CELR

o Identify all S3 buckets used for CELR

o Archive S3 bucket data (discuss when we might also delete)

o Delete CELR pods from k8s cluster

o Delete route53 entries for CELR k8s services (portal, etc)

o Archive and remove CELR services docker images

o Delete any infrastructure specific to CELR used by the CELR services in k8s

§ We will need to review if any k8s clusters can be downsized or removed as a result of this work

o Delete CELR (named as alice) cloudformation stacks

o Archive and delete ECS/Fargate tasks for S3/Redshift integration

o Archive redshift data (discuss when we might also delete)

o Turn off redshift cluster

o Archive gitlab repositories

o Final review of AIMS environments to validate we have not missed items not listed here.
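The per-environment checklist above can be sketched as a handful of AWS CLI and kubectl calls. The following is a dry-run sketch, not the actual runbook: the bucket, namespace, cluster, and snapshot names are placeholders (only the "alice" stack name comes from the notes), and each command is printed rather than executed until DRY_RUN is unset.

```shell
#!/bin/sh
# Dry-run sketch of the per-environment teardown steps.
# All resource names here are placeholders, not the real CELR resources.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

BUCKET="example-celr-bucket"      # placeholder: enumerate the real CELR buckets first
NAMESPACE="celr"                  # placeholder k8s namespace for CELR pods
STACK="alice"                     # the CELR cloudformation stacks are named "alice"

# Archive S3 bucket data by moving objects to the Glacier storage class
run aws s3 cp "s3://$BUCKET/" "s3://$BUCKET/" --recursive --storage-class GLACIER

# Delete CELR pods and services from the k8s cluster
run kubectl delete namespace "$NAMESPACE"

# Turn off the redshift cluster, keeping a final snapshot as the archive
run aws redshift delete-cluster --cluster-identifier celr-onboard \
  --final-cluster-snapshot-identifier celr-onboard-final

# Delete the CELR ("alice") cloudformation stack
run aws cloudformation delete-stack --stack-name "$STACK"
```

Route53 entries, ECS/Fargate tasks, and docker image cleanup would follow the same pattern; the point is that each checklist line maps to a concrete archive-then-delete command.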

Can AIMS reprocess 22 ½ million messages from 2022 (and some from 2023; they will be archiving all of the data)?

· I assume this is asking for a retroactive COPY from the CELR output to Redshift and a final UNLOAD from Redshift for HHS pickup.

o If this is the case, the S3 bucket CELR outputs to does have old data that could be copied to redshift

§ I would like to avoid doing this work, as it will not be easy to coordinate and takes a long time. Doing it is also likely to create more work if new questions come up after the copy about how the data changed. The theories about the time ranges they want recopied to Redshift could be wrong, leaving us doing a lot of back-and-forth investigating. The team already copied all of the pre-Redshift data out of AIMS; I'd highly suggest that copy be used moving forward.

§ It takes a very long time to ingest large blocks of historical data into Redshift, and during that time we cannot keep up with incoming data streams. At a minimum, we would have to stop exporting to HHS for a few days while we copy data into Redshift. The best option I can see is to do this only after we've turned off any new data coming into the data lake.

§ We will need very specific dates or date ranges that are desired for a re-copy.

§ An easier idea is just to provide the CELR output bucket data (before Redshift), and let the downstream teams take it from there, but I believe the team already did this.

o If this is not the case, we cannot reprocess messages further upstream due to data retention policies
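If the retroactive reload were pursued anyway, it would amount to the COPY-then-UNLOAD described above. A sketch of the SQL involved follows; the table name, date column, bucket paths, and IAM roles are all placeholders, not the real CELR values, and the script only prints the SQL rather than running it against a cluster.

```shell
#!/bin/sh
# Print (not execute) the SQL a retroactive reload would require.
# Table, columns, bucket paths, and IAM roles are placeholders.
RELOAD_SQL=$(cat <<'EOF'
-- Retroactive COPY: reload the requested 2022 date range from the CELR output bucket
COPY celr_messages
FROM 's3://example-celr-output/2022/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-load'
FORMAT AS JSON 'auto';

-- Final UNLOAD: export the reloaded range to S3 for HHS pickup
UNLOAD ('SELECT * FROM celr_messages
         WHERE received_at BETWEEN ''2022-01-01'' AND ''2022-12-31''')
TO 's3://example-hhs-pickup/celr/final_'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-unload'
ALLOWOVERWRITE;
EOF
)
echo "$RELOAD_SQL"
```

Even as a sketch, this illustrates the coordination cost noted above: the COPY has to finish before the final UNLOAD can run, and exports to HHS would have to pause in the meantime.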

 

 

 

 

Action items
