AIMS has reviewed the docs, but has not yet requested to be added to the DEX activity.
Pro: resumable upload technology looks like it solves the problem of supporting large file uploads to web APIs.
Con: it would be a client agent within AIMS that needs continual maintenance, the ability to track success/failure (queue/dead letter), and a team prepared to debug if certificates, credentials, etc. change.
AWS DataSync to Azure Blob Storage
Background: the integration for Azure Blob Storage is notably a “preview” feature on AWS’s side.
Pro: job would be configured on each side with no code to maintain.
Con: the job has to run on some cron frequency, so data will flow in chunks as opposed to a “real time” stream.
Alternative research: look at another project’s potential integration into Azure using Azure Connectors for SQS and S3 to see whether it would apply to this problem. This would match how AIMS already handles event-driven processing internally and would build on what the team already knows how to use.
Geo is working on prototyping this option and will have an update next time we meet.
Need to investigate the effort on the Azure side to make configurations and/or services to use these connectors.
Research in progress, AIMS has not requested access
With a slight modification, the SDS can run as a poller
Have built similar poller services many times; this one could be ready for performance testing in a few days if there is interest
Would run on the DEX/Azure side to poll SQS and get data from AIMS S3
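The poller described above could look roughly like the following. This is a minimal sketch, not the SDS code: it assumes the AIMS bucket publishes standard S3 event notifications to the SQS queue, and the names `poll_once`, `iter_s3_records`, and `handle` are hypothetical. The SQS and S3 clients are passed in so the loop can be exercised without AWS access; in practice they would be `boto3.client("sqs")` and `boto3.client("s3")`.

```python
import json
from typing import Any, Callable, Dict, Iterator, Tuple
from urllib.parse import unquote_plus


def iter_s3_records(message_body: str) -> Iterator[Tuple[str, str]]:
    """Yield (bucket, key) pairs from an S3 event notification body.

    Assumes the queue receives S3 event notifications directly; a different
    message format on the AIMS side would need a different parser.
    """
    event = json.loads(message_body)
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # Object keys are URL-encoded in S3 event notifications.
            yield bucket, unquote_plus(key)


def poll_once(
    sqs: Any,
    s3: Any,
    queue_url: str,
    handle: Callable[[str, bytes, Dict[str, str]], None],
) -> int:
    """Receive one batch of messages, fetch each object, and hand it off.

    A message is deleted only after every object it references has been
    handled. If `handle` raises, the message stays on the queue and becomes
    visible again after the visibility timeout.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    processed = 0
    for message in response.get("Messages", []):
        for bucket, key in iter_s3_records(message["Body"]):
            obj = s3.get_object(Bucket=bucket, Key=key)
            handle(key, obj["Body"].read(), obj.get("Metadata", {}))
            processed += 1
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return processed
```

Because unhandled messages are left on the queue, a dead letter queue configured on the AIMS side would collect repeated failures, which is one way to cover the success/failure tracking raised earlier.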
Metadata values required by DEX
meta_destination_id – unique identifier used to indicate the program associated with the upload.
meta_ext_event – unique identifier used to indicate the event type within the program that this file belongs to (e.g., routineImmunization).
optional for all:
meta_schema_version
How is optional metadata for a specific meta_destination_id/meta_ext_event/meta_schema_version defined?
Does this need to be defined before sending metadata values or will DEX allow any additional optional metadata values to be sent?
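A pre-send check on the AIMS side could be sketched as below. The key names come from the list above; the function names and the example destination value are hypothetical. Since it is still an open question whether DEX accepts undeclared optional values, the sketch only separates them out and does not decide what to do with them.

```python
from typing import Dict, List, Tuple

# Key names taken from the notes above.
REQUIRED_KEYS = ("meta_destination_id", "meta_ext_event")
OPTIONAL_KEYS = ("meta_schema_version",)


def missing_required(metadata: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or blank."""
    return [
        key for key in REQUIRED_KEYS
        if not str(metadata.get(key, "")).strip()
    ]


def split_metadata(
    metadata: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split metadata into (known, extra).

    'known' holds the required and documented optional keys; 'extra' holds
    anything else, so the caller can decide whether to send it.
    """
    known_names = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    known = {k: v for k, v in metadata.items() if k in known_names}
    extra = {k: v for k, v in metadata.items() if k not in known_names}
    return known, extra


# Example with a hypothetical destination id.
example = {
    "meta_destination_id": "example-program",
    "meta_ext_event": "routineImmunization",
    "submitter": "aims",
}
problems = missing_required(example)
known, extra = split_metadata(example)
```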
Mtg Notes:
Prototyped download to bucket:
No issues using upload API
Meta Data Questions
Viable path forward
Option 2: AWS DataSync
Runs on schedule
Cost implications/questions/ blob storage time period
Requires EC2 configuration, without monitoring and auditing services
When the copy is done, are the same keys that are in S3 present in Azure? No. S3 metadata is not copied over to blob metadata; this is not impossible, but a solution would need to be created
Would need either to …option 1 write custom connector or
Questions on long-term monitoring and cost
Option 3:
CDC is pulling data
Option 4:
Almost same as Option 1
Instead of pushing to an HTTP endpoint, could just have DEX pull from AIMS
Pros/Cons to all options:
Either using the upload API or having DEX pull data from the S3 bucket on AIMS is most viable
DEX: What is the thing doing the pull? AIMS: Would provide code to DEX, set of library code
AIMS: This would be similar to initial data pull for EIP+
Could make a new code base, would share code with DEX, used for other integration projects
Decision Point:
Load test/Performance Test: Using meta data
Option 1: AIMS will need to decide where to host, but could start that next week
Option 4: Needs a bit of refinement, but also viable
Option 2,3: Would need additional developer research/support to implement
Uncertain if meta data comes across with the Data Sync option; the scheduling feature and growth of data would need to be configured/monitored
AIMS/DS: Choice needs to be considered from long-term viability and ability to use for other programs/use cases
GM: Bucket used would prefix project, meta data used would be mapped to data upload api object, so could be scaled
DS: Option 4 is nicer for AIMS, as less code is maintained on AIMS; Polling from CDC; Queue can be monitored, but is available for CDC to poll at any time
Question is who is monitoring queue/polling
DEX: In Option 4, how is Meta Data received? AIMS: Get object command would make meta data available; DEX: Meta Data would be persisted in Azure blob
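The metadata hand-off discussed here could be sketched as a small mapping step between the S3 get-object response and the blob write. This is an illustration under assumptions, not agreed design: the function name `to_blob_metadata` is hypothetical, and the renaming rule reflects the understanding that Azure blob metadata names must be valid C# identifiers (so a character such as a hyphen would have to be replaced).

```python
import re
from typing import Dict

# Characters outside letters, digits, and underscore are replaced.
_INVALID_CHARS = re.compile(r"[^0-9A-Za-z_]")


def to_blob_metadata(s3_metadata: Dict[str, str]) -> Dict[str, str]:
    """Map S3 user metadata to names that are safe to use as blob metadata.

    `s3_metadata` is the dict found under the "Metadata" key of an S3
    get-object response. Raises ValueError if two source keys collapse to
    the same blob metadata name, instead of silently dropping one.
    """
    blob_metadata: Dict[str, str] = {}
    for key, value in s3_metadata.items():
        name = _INVALID_CHARS.sub("_", key)
        if not name or name[0].isdigit():
            # Identifiers cannot be empty or start with a digit.
            name = "_" + name
        if name in blob_metadata:
            raise ValueError(f"metadata key collision on {name!r}")
        blob_metadata[name] = value
    return blob_metadata
```

Keys such as meta_destination_id and meta_ext_event pass through unchanged. With the azure-storage-blob package, the resulting dict could be supplied as the `metadata` argument when uploading the blob.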
Leslynn: Are there any programs which APHL would need to pull data back from CDC? Dari: Currently, EIP is the only program which pushes data from CDC, but that would be migrated to CDC Platform intermediate term
Leslynn: Total daily volumes--would this be possible to provide per program?
AIMS: Can provide overview of volume for DLs, and potentially other programs
Action item: Provide counts to Leslynn and Ryan
AIMS: Preference would be Option 4; Option 1 would need to be coordinated long-term with DEX, so higher LOE
DEX: Coordination would not be needed, DEX would
DEX: Could we test Option 1?
DRAFT Data Flow Diagram
Questions
Next Steps & Action Items
Next steps:
Action items
Continue to research options, decide on option, AIMS will create data flow diagram Geoffery Miller
Gretl Glick Next Meeting: Schedule: June 1, 2023 at 1pm ET