Overall Status Updates -- DEX Development, Status Updates, and Timeline:
DEX & AIMS teams to discuss provisioning an S3 bucket.
DEX: Handling ingestion and validation services; receiving immunization data from IZ Gateway (production data), sent via API. Ask from AIMS: HL7 and CSV ingestion. The HL7 v2 pipeline is in progress; testing begins Sept 23 with HL7, consuming lab and case data, and should be ready in Q4.
AIMS: What about existing (historic) data?
ER: Redshift data for CELR, or Kafka? There is an ingress S3 bucket for CELR and an S3 bucket of parsed data for Redshift.
DS: The S3 buckets hold parsed data (both CSV and HL7). How are we sending existing data to CDC? One HL7 message at a time? Connecting to the existing S3?
ER: Need to discuss and determine historic data migration plans. Existing raw HL7 data should be ingested into DEX; will need to discuss how that data is transmitted. Egress data that has already been validated and translated/transformed should not come into DEX; need to determine whether it is migrated to EDAV instead. Different S3 buckets will be migrated into different systems.
DS: Future data will go into DEX (correct). Historic data (4.5 billion records for CELR) will need a separate migration plan; this is probably out of scope for DEX.
LM: DEX needs raw data and would need to be able to test with it; how will AIMS onboard with the Data API?
DS: Test data: could try to send it through the pipeline, but there are very few test cases and low volume, so only a limited test data set to send (CELR production data was spun up very quickly). If CDC's testing environment is secure, could try to send production data to the endpoint.
LM: Production data can be sent to the staging environment (an ATO is in place at CDC).
GM/AIMS: No comments yet.
LM/CDC: What does a test file look like? Could potentially phase testing if volume is small; would be good to have both CSV and HL7 files to test with.
AIMS: CSV processing has improved over the past three years. Do you want to re-process older files? How clean should this data be? This is a consideration for data migration plans/processes.
ER: Not certain about the file validation being built into the CSV pipeline. Lessons learned over the past few years have impacted performance and data quality, and processing has been refined. Erroll had previously provided validation requirements to CDC for review.
CDC: Ongoing discussion as to how to handle this; still trying to determine enterprise-level CSV validation services versus programmatic validation (content/data quality validation). Need to tease this out before deciding on re-processing of data.
Peraton: The balance between CSV structure validation and data quality/content validation was important for CELR, handled case by case (see the sketch below).
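For context on the structure-versus-content distinction Peraton raises, a minimal sketch with hypothetical column names and rules (not CELR's actual validation requirements):

```python
import csv

EXPECTED_COLUMNS = ["patient_id", "test_date", "result"]  # hypothetical schema

def validate_structure(path):
    """Structural check: does the file parse and match the expected layout?"""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != EXPECTED_COLUMNS:
            return [f"unexpected header: {header}"]
        # Flag rows whose column count does not match the header.
        return [f"row {i}: wrong column count"
                for i, row in enumerate(reader, start=2)
                if len(row) != len(EXPECTED_COLUMNS)]

def validate_content(path):
    """Content/data-quality check: are the individual field values plausible?"""
    errors = []
    with open(path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):
            if not row["patient_id"].strip():
                errors.append(f"row {i}: missing patient_id")
            if row["result"] not in ("positive", "negative", "inconclusive"):
                errors.append(f"row {i}: unexpected result value {row['result']!r}")
    return errors
```

A file can pass the structural check and still fail the content check, which is why the notes treat enterprise-level CSV validation and programmatic content validation as separate decisions.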
AIMS: Testing with small data streams; will need to test the size of batched HL7 files. How large a file can the Data API accept? Performance/speed considerations (CA file sizes are very large); would want to test prior to production cutover.
CDC: Data APIs: no hard limit, and uploads are resumable. Unless files are larger than 100 GB, would not expect to run into a size limit. Soft guidance: anything under 10 GB should be fine; 10-100 GB should work but would be monitored; more than 100 GB, would want to consider other transport mechanisms; terabyte size…
AIMS: File size limits have been a challenge in the past, so would want to test; some of the challenges would be around firewalls and timeouts.
CDC: In the migration plan, is there a preferred transit mechanism between AIMS and DEX? Going forward, does AIMS have a preference for sending HL7 to DEX?
ER: Not yet determined; the data currently sits in S3 buckets; would need to discuss the best avenue for transporting it.
CDC: Just discussing this as a first use case for COVID-19 ELR data, but need to consider other use cases and other data streams, and whether a Data API would be sufficient or other solutions are needed.
AIMS: Preference would be S3; it is secure.
CDC: From AIMS to CDC?
AIMS: Does CDC want us to use the Data API, or are we considering other transit mechanisms? It makes sense to concentrate on one technical solution for all data streams.
CDC: Uncertain as to the number of potential data streams coming from AIMS to CDC. Since APHL is a huge source of data streams, it may make more sense to set up a bucket-to-bucket system (would usually not do this for lower-volume senders); CDC cannot maintain bucket-to-bucket connections for ALL small-volume data senders. The Data API was identified as a potential service for lower volume; open to exploring other options for large-volume senders such as APHL. Options:
- Upload Data API (advantage: integrated with SAMS for identification/verification; a metadata minimum is enforced) -- see the upload sketch below
- S3 bucket to bucket
- Very large volume
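Since the notes describe the Upload Data API as resumable with an enforced metadata minimum, a minimal sketch of what an AIMS-side upload could look like. This assumes the API speaks the tus resumable protocol; the endpoint URL, auth header, and metadata key names are hypothetical placeholders, not confirmed values:

```python
from tusclient import client

# Hypothetical endpoint; SAMS token acquisition not shown.
tus_client = client.TusClient(
    "https://upload.example.cdc.gov/files/",           # placeholder URL
    headers={"Authorization": "Bearer <SAMS-token>"},  # assumed auth scheme
)

uploader = tus_client.uploader(
    "celr_batch.hl7",                 # local batch file to send
    chunk_size=16 * 1024 * 1024,      # upload in 16 MB chunks
    metadata={
        # The notes say a destination ID and event type are required;
        # these key names are assumptions, not a confirmed schema.
        "meta_destination_id": "celr",
        "meta_ext_event": "hl7-batch",
    },
)
uploader.upload()  # sends the file chunk by chunk over the tus protocol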
AIMS: Preference would be S3 bucket over the API; would need to review the documentation.
CDC: How many transport streams go into APHL?
APHL: Quite a few: PHINMS, Mirth, S3 (EIP+), ELIMS. Long term, would like to fully retire PHINMS.
CDC: Would you be willing to try the Upload API?
AIMS: Sure, we can try any API; would need to ensure that we retry submissions if they fail (see the retry sketch below). An advantage of S3 is that it is already working, so we would need to identify how to do the same with other API services; can review documentation if available.
DS: Let's pause on a final decision; would prefer an S3-to-S3 connection if this is an option, as it is a known commodity.
ER: Currently doing this for EIP+.
MC: It would be an S3-to-Azure connection.
CDC: How is metadata communicated over an S3 connection? Is it durable? Do you have a schema? We would be referencing S3 to Azure.
AIMS: S3 object metadata; not sure if it is durable (would it carry from S3 to the Azure blob store?).
AIMS: It may vary across projects, but believe there is a common S3 schema/metadata.
CDC: Every transmission has a destination ID and an event type (both required); additional metadata can be required based on the program. Need a way to ensure metadata is transferred from S3 to the Azure blob.
AIMS: The Azure blob store has an S3 schema.
ER: Only doing this in the test environment; not certain whether we receive the metadata. If the API is used, we may not get the jurisdiction metadata, which we would need; will need to consider these details.
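To make the open questions concrete, a minimal sketch of an S3-to-Azure transfer that carries the S3 user metadata (e.g., jurisdiction) across to the blob and retries failed submissions, per AIMS's concern above. Bucket, container, connection string, and metadata key names are hypothetical; the required-field names are assumptions, not a confirmed DEX schema:

```python
import time
import boto3
from azure.storage.blob import BlobServiceClient

s3 = boto3.client("s3")
blob_service = BlobServiceClient.from_connection_string("<azure-connection-string>")

def transfer(bucket, key, container):
    """Copy one S3 object to Azure blob storage, preserving its user metadata."""
    head = s3.head_object(Bucket=bucket, Key=key)
    user_metadata = head["Metadata"]  # S3 user-defined (x-amz-meta-*) metadata
    # The required DEX fields would ride along here if set on the S3 object;
    # the key names below are illustrative assumptions only.
    user_metadata.setdefault("meta_destination_id", "celr")
    user_metadata.setdefault("meta_ext_event", "hl7-batch")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    blob = blob_service.get_blob_client(container=container, blob=key)
    blob.upload_blob(body, metadata=user_metadata, overwrite=True)

def transfer_with_retry(bucket, key, container, attempts=3):
    """Retry failed submissions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return transfer(bucket, key, container)
        except Exception:
            if attempt == attempts - 1:
                raise  # surface the failure after the final attempt
            time.sleep(2 ** attempt)
```

Whatever mechanism is chosen, the key check is the one ER raises: confirm on the Azure side that the jurisdiction metadata actually arrives, since the copy only preserves what was set on the S3 object in the first place.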
AIMS: Need to do further research on the preferred mechanism.
ER: Has test messages available and can share with the DEX team; Erroll has previously shared test files with Marion.
Next steps: Dari/AIMS to research the S3-to-Azure mechanism and determine what options are possible. Gretl to schedule a follow-up meeting for the same timeslot in two weeks (5/11).