GAIHN-HAI - Production Review and Deployment
Association Of Public Health Laboratories (APHL) - Production Review and Deployment
- 1 Association Of Public Health Laboratories (APHL) - Production Review and Deployment
- 1.1 Revision / Review
- 1.2 APHL Informatics Messaging Service (AIMS) Overview
- 1.3 About GAIHN-HAI-PWA
- 1.4 Purpose
- 1.5 1. Hardware and 2. Networking Services
- 1.6 3. Performance
- 1.6.1 System Capacity
- 1.6.2 Performance Monitoring
- 1.6.3 Backup Policy
- 1.6.4 Data Retention
- 1.7 4. Support
- 1.8 5. System Architecture
- 1.8.1 Diagram:
- 1.9 6. Security
- 1.10 7. Testing
- 1.11 8. User Acceptance Testing (UAT)
- 1.12 9. User Guide and Training
- 1.13 10. Help Desk Documentation and Training
- 1.13.1 Helpdesk for application support documentation has been completed and distributed
- 1.13.2 Helpdesk for application support has received training and has demonstrated required competency
- 1.13.3 Helpdesk for host support (AIMS) documentation has been completed and distributed
- 1.13.4 Helpdesk for AIMS support staff has received training and has demonstrated required competency
Revision / Review
| Revision | Author | Description | Release Date |
| 1 | Paul Jankauskas, Marty Sibley | Initial Draft | 07/28/2023 |
APHL Informatics Messaging Service (AIMS) Overview
The APHL Informatics Messaging Services (AIMS) platform is a secure, cloud-based environment that accelerates the implementation of public health messaging solutions by providing shared services to aid in the transport, validation, translation, and routing of electronic data.
AIMS is hosted in the Amazon Web Services (AWS) US East (N. Virginia) Region (us-east-1) across multiple Availability Zones. AIMS leverages many cloud-native services that inherently provide high availability and scalability.
About GAIHN-HAI-PWA
The GAIHN-HAI-PWA is an information system solution in support of the Global Action in Healthcare Network (GAIHN). GAIHN is a global collaborative network consisting of countries, institutions, and partners at global, regional, national, and subnational levels instituted by the CDC. These constituents work together to address emerging threats in healthcare settings through rapid detection and response. GAIHN addresses these emerging threats through two modules:
Antimicrobial-resistant pathogens (AR)
Healthcare-associated infections (HAI)
GAIHN HAI acts to detect and prevent HAIs within healthcare systems to provide safe healthcare and protect patients, staff, and visitors. Globally, point prevalence surveys (PPS) provide a useful snapshot of HAI in hospitals and other healthcare facilities.
Additional details regarding the GAIHN-HAI program can be found at https://www.cdc.gov/infectioncontrol/global/GAIHN-HAI.html
Purpose
This document details the design and implementation of the GAIHN-HAI-PWA system: its architecture, resiliency, security, and support model.
1. Hardware and 2. Networking Services
The following compute services, configurations, and networking services define the GAIHN-HAI-PWA infrastructure:
Hardware - System Components and Configuration
Amazon Web Services Elastic Container Service (ECS) Fargate (Container as a Service)
Names:
Desired Tasks: 4; Minimum Tasks: 4; Maximum Tasks: 10
Amazon Web Services Network Load Balancer (Fully Managed Service)
Load balancer points to
Relational Database Service (RDS) (Platform as a Service)
Name: aurora-GAIHN-HAI-PWA
Instance Type and Family: db.r5.large
Encryption: Enabled
Retention period: 7 days
Multiple Availability Zone Deployment
Storage autoscaling - Enabled and inherent based on the service offering
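The ECS task counts listed above can be expressed as a small, checkable configuration. The following is a minimal sketch, assuming the bounds are enforced through ECS Service Auto Scaling; the cluster and service names are hypothetical placeholders (this document does not list them), and the dictionary only mirrors the shape of an Application Auto Scaling RegisterScalableTarget request without calling AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBounds:
    """Task-count bounds for an ECS service."""
    desired: int
    minimum: int
    maximum: int

    def __post_init__(self) -> None:
        # Reject configurations where the desired count falls outside the bounds.
        if not (0 <= self.minimum <= self.desired <= self.maximum):
            raise ValueError("expected 0 <= minimum <= desired <= maximum")

    def clamp(self, requested: int) -> int:
        """Limit a requested task count to the range [minimum, maximum]."""
        return max(self.minimum, min(requested, self.maximum))


def scalable_target_params(cluster: str, service: str, bounds: TaskBounds) -> dict:
    """Build keyword arguments shaped like a RegisterScalableTarget request."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": bounds.minimum,
        "MaxCapacity": bounds.maximum,
    }


# Task counts come from this document; both names below are hypothetical.
BOUNDS = TaskBounds(desired=4, minimum=4, maximum=10)
PARAMS = scalable_target_params("example-cluster", "example-service", BOUNDS)

if __name__ == "__main__":
    print(PARAMS)
    # Requests below the minimum or above the maximum are pulled back into range.
    print(BOUNDS.clamp(2), BOUNDS.clamp(7), BOUNDS.clamp(25))
```

Because the minimum equals the desired count, the service would never scale in below four tasks under this configuration; scaling only adds capacity, up to ten tasks.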
3. Performance
System Capacity
Performance Monitoring
Backup Policy
Data Retention
4. Support
Service Level Agreements / UpTime
Aurora - Amazon Aurora Service Level Agreement - AWS will use commercially reasonable efforts to make each Included Service available for each AWS region with a Monthly Uptime Percentage of at least 99.99%.
Noted Exclusions: The Service Commitment does not apply to any unavailability, suspension or termination of Amazon Aurora, or any other Amazon Aurora performance issues, directly or indirectly: (i) caused by factors outside of our reasonable control, including any force majeure event or Internet access or related problems beyond the demarcation point of Amazon Aurora; (ii) that result from any voluntary actions or inactions from you; (iii) that result from instances belonging to the Micro DB instance class or other instance classes which have similar CPU and memory resource limitations; (iv) that result from you not following the basic operational guidelines described in the Amazon Aurora User Guide (e.g., overloading a database instance to the point it is inoperable, creating an excessively large number of tables that significantly increases the recovery time, etc.); (v) caused by underlying database engine software that leads to repeated database crashes or an inoperable database instance; (vi) that result in long recovery time due to insufficient IO capacity for your database workload; (vii) that result from your equipment, software or other technology; or (viii) arising from our suspension or termination of your right to use Amazon Aurora in accordance with the Agreement (collectively, the "Amazon Aurora SLA Exclusions").
If availability is impacted by factors other than those explicitly used in our Monthly Uptime Percentage or Single-AZ Uptime Percentage calculation, as applicable, then we may issue a Service Credit considering such factors at our discretion.
Elastic Container Service (Fargate) - Amazon Elastic Container Service (Amazon ECS) and AWS Fargate SLA - AWS will use commercially reasonable efforts to make each Included Service available for each AWS region with a Monthly Uptime Percentage of at least 99.99%.
Elastic Load Balancer (ELB) - Amazon Elastic Load Balancing Service Level Agreement - AWS will use commercially reasonable efforts to make each Load Balancer available with a Monthly Uptime Percentage of at least 99.99%.
Noted Exclusions: The Service Commitment does not apply to any unavailability, suspension, or termination of Elastic Load Balancing, or any Load Balancer performance issues: (i) caused by factors outside of our reasonable control, including any force majeure event or Internet access or related problems beyond the demarcation point of Elastic Load Balancing; (ii) that result from any voluntary actions or inactions from you or any third party (e.g. misconfiguring security groups, VPC configurations or credential settings, disabling encryption keys or making the encryption keys inaccessible, etc.); (iii) that result from your equipment, software or other technology and/or third-party equipment, software or other technology (other than third party equipment within our direct control); (iv) that result from you not following the guidelines described in the Elastic Load Balancing User Guide on the AWS Site; or (v) arising from our suspension or termination of your right to use Elastic Load Balancing in accordance with the Agreement (collectively, the “Elastic Load Balancing SLA Exclusions”). If availability is impacted by factors other than those used in our Monthly Uptime Percentage calculation, then we may issue a Service Credit considering such factors at our discretion.
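To put the 99.99% figures above in operational terms, the arithmetic can be sketched as follows. This is an illustration only: it assumes a 30-day month, treats the three services as a serial chain with independent failures, and ignores the SLA exclusions. The AWS SLAs are service-credit thresholds for each service, not a guarantee for the combined system.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (assumed month length)


def allowed_downtime_minutes(uptime_pct: float,
                             minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime per month that still meet the given uptime percentage."""
    return minutes * (1 - uptime_pct / 100)


def serial_availability_pct(*uptime_pcts: float) -> float:
    """Combined availability when every listed service must be up at once.

    Assumes failures are independent, which is a simplification.
    """
    combined = 1.0
    for pct in uptime_pcts:
        combined *= pct / 100
    return combined * 100


if __name__ == "__main__":
    single = allowed_downtime_minutes(99.99)
    # Aurora, ECS/Fargate, and ELB, each at 99.99%.
    chain = serial_availability_pct(99.99, 99.99, 99.99)
    print(f"Single service: about {single:.2f} minutes per month")
    print(f"Serial chain: about {chain:.4f}% available, "
          f"{allowed_downtime_minutes(chain):.2f} minutes per month")
```

Under these assumptions a single service at 99.99% allows about 4.32 minutes of downtime in a 30-day month, and the three-service chain allows roughly 13 minutes.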
5. System Architecture
Diagram:
GAIHN-HAI-PWA Architectural Diagram
6. Security
Common Vulnerabilities and Exposures (CVE)
Amazon ECR image scanning helps in identifying software vulnerabilities in container images. Amazon ECR provides scanning which uses the Common Vulnerabilities and Exposures (CVEs) database from the open-source Clair project. Scanning will be enabled on the private registry where the GAIHN-HAI image resides. APHL will provide an ongoing scan and report findings to application developers on at least a quarterly basis.
Amazon ECR uses the severity for a CVE from the upstream distribution source if available or uses the Common Vulnerability Scoring System (CVSS) score. The CVSS score can be used to obtain the NVD vulnerability severity rating.
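The mapping from a CVSS score to an NVD severity rating can be sketched as below. This assumes CVSS v3.x base scores and the NVD qualitative bands for v3.x (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0); CVSS v2 uses different bands. The function name is illustrative and is not part of any Amazon ECR API.

```python
def nvd_severity(cvss_v3_score: float) -> str:
    """Return the NVD qualitative severity rating for a CVSS v3.x base score."""
    if not 0.0 <= cvss_v3_score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if cvss_v3_score == 0.0:
        return "NONE"
    if cvss_v3_score < 4.0:
        return "LOW"
    if cvss_v3_score < 7.0:
        return "MEDIUM"
    if cvss_v3_score < 9.0:
        return "HIGH"
    return "CRITICAL"


if __name__ == "__main__":
    for score in (0.0, 3.9, 4.0, 6.9, 7.0, 8.9, 9.0, 10.0):
        print(score, nvd_severity(score))
```

A rating derived this way could be used to prioritize findings in the quarterly report to application developers.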
Dynamic Application Security Testing (DAST)
Dynamic application security testing (DAST) will be performed against the GAIHN-HAI web application while it is running in order to find vulnerabilities. A DAST test uses the same techniques that an attacker would use to find potential weaknesses in an application.
A DAST test can look for a broad range of vulnerabilities, including input/output validation issues that could leave an application vulnerable to cross-site scripting or SQL injection. A DAST test can also help spot configuration mistakes and errors and identify other specific problems with applications. APHL will provide an ongoing scan and report findings to application developers on at least a quarterly basis.
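The kind of output-validation check a DAST test performs can be illustrated with a toy, in-process example. This is a simplified sketch, not the scanner APHL uses: the two render functions are hypothetical stand-ins for application responses, and a real DAST tool would send many payload classes to the running application over HTTP.

```python
import html

# A harmless marker payload of the kind used to detect reflected input.
PROBE = '<script>alert("dast-probe")</script>'


def unsafe_render(name: str) -> str:
    """Hypothetical handler that echoes input without output encoding."""
    return f"<p>Hello, {name}</p>"


def safe_render(name: str) -> str:
    """Hypothetical handler that HTML-escapes input before echoing it."""
    return f"<p>Hello, {html.escape(name)}</p>"


def reflects_unescaped(render, payload: str = PROBE) -> bool:
    """Return True if the payload comes back verbatim in the response body."""
    return payload in render(payload)


if __name__ == "__main__":
    print("unsafe handler flagged:", reflects_unescaped(unsafe_render))
    print("safe handler flagged:", reflects_unescaped(safe_render))
```

A response that returns the payload verbatim indicates a possible cross-site scripting weakness; an escaped response does not trigger the check.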
Data Classification
The GAIHN-HAI data set will be classified as sensitive information; however, it is not considered to be protected health information (PHI), as the Health Insurance Portability and Accountability Act (HIPAA) does not apply to data sets outside of the United States.
Incident Response
In the event of a security incident, APHL will follow its incident response plan, which consists of the following phases:
Alert Phase - The alert phase is the process of learning about a potential security incident and reporting it to generate a helpdesk incident ticket. Alerts may arrive from a variety of sources including monitoring of firewalls and intrusion detection systems, anti-virus software, threats received via e-mail, and media reports about new threats. The AIMS Service Desk may also directly generate incident tracking tickets while managing potential incidents.
Triage Phase - The triage phase involves the process of examining the information available about the situation to determine whether or not a security incident has occurred. If an incident has occurred, the nature of the incident is determined, the initial priority level is assigned and the documentation of all actions taken is initiated. This phase will also involve creating an Incident Response Team (IRT) to work on activities relating to incident handling. A decision to “pursue” or “protect” is made during this phase according to the sensitivity of the data and the criticality of the operational system.
Response (Containment and Eradication) Phase - The response phase is the process of limiting the scope and magnitude of an incident in order to keep the incident from getting worse. Consideration is given to factors such as system backup, the risk of continuing operations, and changing passwords or access control lists on compromised systems and data. This phase also involves determining the cause of the incident, improving system defenses, determining system vulnerabilities, and removing the cause of the incident, using the security operations procedure, to eliminate the possibility of recurrence. It may be necessary to activate the Contingency plan. The AIMS Platform Business Steward would make this determination.
Recovery Phase - The system and business processes return to full and normal operations during this phase. Actions include restoring and validating the system, deciding when to restore operations, and monitoring systems to verify normal operations without further system or data compromise.
Follow-up Phase - This phase involves developing an incident report and disseminating it to appropriate entities according to established policies; identifying lessons learned from the incident handling process including the successful and unsuccessful actions taken in response to an incident; and developing recommendations to prevent future incidents and to improve enterprise security implementation.
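The phase sequence above can be modeled as a simple state machine, for example to track the status of an incident ticket. This is a minimal sketch under stated assumptions: the `CLOSED` state and the rule that a ticket closes when triage finds no incident are illustrative additions, not part of the incident response plan as described here.

```python
from enum import Enum


class Phase(Enum):
    ALERT = "Alert"
    TRIAGE = "Triage"
    RESPONSE = "Response (Containment and Eradication)"
    RECOVERY = "Recovery"
    FOLLOW_UP = "Follow-up"
    CLOSED = "Closed"  # hypothetical terminal state, not named in the plan


_ORDER = [Phase.ALERT, Phase.TRIAGE, Phase.RESPONSE,
          Phase.RECOVERY, Phase.FOLLOW_UP, Phase.CLOSED]


def next_phase(current: Phase, incident_confirmed: bool = True) -> Phase:
    """Return the phase that follows `current`.

    Assumption: if triage determines that no security incident occurred,
    the ticket closes without entering the response phase.
    """
    if current is Phase.CLOSED:
        raise ValueError("ticket is already closed")
    if current is Phase.TRIAGE and not incident_confirmed:
        return Phase.CLOSED
    return _ORDER[_ORDER.index(current) + 1]


def walk(incident_confirmed: bool = True) -> list[Phase]:
    """List every phase a ticket passes through, starting from the alert."""
    path = [Phase.ALERT]
    while path[-1] is not Phase.CLOSED:
        path.append(next_phase(path[-1], incident_confirmed))
    return path


if __name__ == "__main__":
    print([p.value for p in walk(incident_confirmed=True)])
    print([p.value for p in walk(incident_confirmed=False)])
```

Modeling the phases explicitly makes it harder to skip a step, such as moving from triage directly to recovery, when a ticket is updated.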
7. Testing
Application test completion criteria
Authentication
Database connection
Code components and pre-rendering
Prisma database schema
Page and File routing
API routing
Production Integration Testing
8. User Acceptance Testing (UAT)
User acceptance testing (UAT)
User acceptance testing is performed to assess the user interface and user experience (UI/UX), functionality, workflow, data capture, and performance of the application. An initial UAT was performed in August 2023 using scenarios to input responses to each data collection field, including Country, Facility, Ward, and all patient forms. Testers also provided feedback on usability and their overall experience. Testing was performed primarily by the CDC GAIHN-HAI team. Results were captured and collated to address any bugs and suggested changes to the user experience, workflow, and performance of the application.
Data structure and confirmation of data captured
Open question: Do we need documentation of the data structure, the data export (CSV), and confirmation of the data captured, for example in comparison to REDCap or paper data collection forms?
Open item: Change request plan, change log, and bug fix log. No change request plan was found in the Smartsheet.
9. User Guide and Training
Training documentation for users and application support has been completed and distributed per the training communication plan
Designated staff has received training and has demonstrated required competency
10. Help Desk Documentation and Training
Helpdesk for application support documentation has been completed and distributed
Helpdesk for application support has received training and has demonstrated required competency
Helpdesk for host support (AIMS) documentation has been completed and distributed
Helpdesk for AIMS support staff has received training and has demonstrated required competency