
Improve healthcare services with patient 360: A zero-ETL approach to enable near real-time data analytics

Healthcare providers have an opportunity to improve the patient experience by collecting and analyzing broader and more diverse datasets. This includes patient medical history, allergies, immunizations, family disease history, and individuals' lifestyle data such as exercise habits. Having access to these datasets and forming a 360-degree view of patients allows healthcare providers such as claim analysts to see a broader context about each patient and personalize the care they provide for every individual. This is underpinned by building a complete patient profile that enables claim analysts to identify patterns, trends, potential gaps in care, and adherence to care plans. They can then use the result of their analysis to understand a patient's health status, treatment history, and past or upcoming doctor consultations to make more informed decisions, streamline the claim management process, and improve operational outcomes. Achieving this will also improve general public health through better and more timely interventions, identify health risks through predictive analytics, and accelerate the research and development process.

AWS has invested in a zero-ETL (extract, transform, and load) future so that developers can focus more on creating value from data, instead of having to spend time preparing data for analysis. The solution proposed in this post follows a zero-ETL approach to data integration to facilitate near real-time analytics and deliver a more personalized patient experience. The solution uses AWS services such as AWS HealthLake, Amazon Redshift, Amazon Kinesis Data Streams, and AWS Lake Formation to build a 360 view of patients. These services enable you to collect and analyze data in near real time and put a comprehensive data governance framework in place that uses granular access control to secure sensitive data from unauthorized users.

Zero-ETL refers to a set of features on the AWS Cloud that enable integrating different data sources with Amazon Redshift.

Solution overview

Organizations in the healthcare industry are currently spending a significant amount of time and money on building complex ETL pipelines for data movement and integration. This means data will be replicated across multiple data stores via bespoke and in some cases hand-written ETL jobs, resulting in data inconsistency, latency, and potential security and privacy breaches.

With support for querying cross-account Apache Iceberg tables via Amazon Redshift, you can now build a more comprehensive patient-360 analysis by querying all patient data from one place. This means you can seamlessly combine information such as clinical data stored in HealthLake with data stored in operational databases such as a patient relationship management system, together with data produced from wearable devices in near real time. Having access to all this data enables healthcare organizations to form a holistic view of patients, improve care coordination across multiple organizations, and provide highly personalized care for each individual.

The following diagram depicts the high-level solution we build to achieve these outcomes.

Deploy the solution

You can use the provided AWS CloudFormation template to deploy the solution components.

This stack creates the required resources and the necessary permissions to integrate the services.

Solution setup

AWS HealthLake

AWS HealthLake enables organizations in the health industry to securely store, transform, transact, and analyze health data. It stores data in HL7 FHIR format, which is an interoperability standard designed for quick and efficient exchange of health data. When you create a HealthLake data store, a Fast Healthcare Interoperability Resources (FHIR) data repository is made available via a RESTful API endpoint. Simultaneously, and as part of the AWS HealthLake managed service, the nested JSON FHIR data undergoes an ETL process and is stored in Apache Iceberg open table format in Amazon S3.

To create an AWS HealthLake data store, refer to Getting started with AWS HealthLake. Make sure to select the option Preload sample data when creating your data store.

In real-world scenarios, and when you use AWS HealthLake in production environments, you don't need to load sample data into your AWS HealthLake data store. Instead, you can use FHIR REST API operations to manage and search resources in your AWS HealthLake data store.

We use two tables from the sample data stored in HealthLake: patient and allergyintolerance.

Query AWS HealthLake tables with Redshift Serverless

Amazon Redshift is the data warehousing service available on the AWS Cloud that provides up to six times better price-performance than any other cloud data warehouse available, with a fully managed, AI-powered, massively parallel processing (MPP) data warehouse built for performance, scale, and availability. With continuous innovations added to Amazon Redshift, it is now more than just a data warehouse. It enables organizations of different sizes and in different industries to access all the data they have in their AWS environments and analyze it from one single location with a set of features under the zero-ETL umbrella. Amazon Redshift integrates with AWS HealthLake and data lakes through Redshift Spectrum and Amazon S3 auto-copy features, enabling you to query data directly from files on Amazon S3.

Query AWS HealthLake data with Amazon Redshift

Amazon Redshift makes it simple to query the data stored in S3-based data lakes with automatic mounting of an AWS Glue Data Catalog in the Redshift query editor v2. This means you no longer need to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog. To get started with this feature, see Querying the AWS Glue Data Catalog. After it's set up and you're connected to the Redshift query editor v2, complete the following steps:

  1. Validate that your tables are visible in the query editor v2. The Data Catalog objects are listed under the awsdatacatalog database.
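If you prefer SQL to browsing the tree view, you can also list the mounted schemas with the following quick check (a minimal sketch, assuming the Data Catalog is mounted under the default awsdatacatalog name):

    -- List the schemas mounted from the AWS Glue Data Catalog
    SHOW SCHEMAS FROM DATABASE awsdatacatalog;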

FHIR data stored in AWS HealthLake is highly nested. To learn how to un-nest semi-structured data with Amazon Redshift, see Tutorial: Querying nested data with Amazon Redshift Spectrum.
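As a minimal illustration of the un-nesting pattern the next query relies on, consider a hypothetical table t with an array column tags: joining the table to its own array column emits one row per array element.

    -- Hypothetical example: t.tags is an array column; the join un-nests it
    SELECT t.id, tag
    FROM my_schema.t
    LEFT JOIN t.tags tag ON TRUE;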

  2. Use the following query to un-nest the allergyintolerance and patient tables, join them together, and get patient details and their allergies:
    WITH patient_allergy AS
    (
        SELECT
            resourcetype,
            c AS allergy_category,
            a."patient"."reference",
            SUBSTRING(a."patient"."reference", 9, LEN(a."patient"."reference")) AS patient_id,
            a.recordeddate AS allergy_record_date,
            NVL(cd."code", 'NA') AS allergy_code,
            NVL(cd.display, 'NA') AS allergy_description

        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."allergyintolerance" a
                LEFT JOIN a.category c ON TRUE
                LEFT JOIN a.reaction r ON TRUE
                LEFT JOIN r.manifestation m ON TRUE
                LEFT JOIN m.coding cd ON TRUE
    ), patient_info AS
    (
        SELECT id,
                gender,
                g AS given_name,
                n.family AS family_name,
                pr AS prefix

        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."patient" p
                LEFT JOIN p.name n ON TRUE
                LEFT JOIN n.given g ON TRUE
                LEFT JOIN n.prefix pr ON TRUE
    )
    SELECT DISTINCT p.id AS patient_id,
            p.gender,
            p.prefix,
            p.given_name,
            p.family_name,
            pa.allergy_category,
            pa.allergy_code,
            pa.allergy_description
    FROM patient_allergy pa
        JOIN patient_info p
            ON pa.patient_id = p.id
    ORDER BY p.id, pa.allergy_code
    ;
    

To eliminate the need for Amazon Redshift to un-nest the data every time a query is run, you can create a materialized view to hold un-nested and flattened data. Materialized views are an effective mechanism to deal with complex and repeating queries. They contain a precomputed result set, based on a SQL query over one or more base tables. You can issue SELECT statements to query a materialized view, in the same way that you can query other tables or views in the database.

  3. Use the following SQL to create a materialized view. You use it later to build a complete view of patients:
    CREATE MATERIALIZED VIEW patient_allergy_info AUTO REFRESH YES AS
    WITH patient_allergy AS
    (
        SELECT
            resourcetype,
            c AS allergy_category,
            a."patient"."reference",
            SUBSTRING(a."patient"."reference", 9, LEN(a."patient"."reference")) AS patient_id,
            a.recordeddate AS allergy_record_date,
            NVL(cd."code", 'NA') AS allergy_code,
            NVL(cd.display, 'NA') AS allergy_description

        FROM
            "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."allergyintolerance" a
                LEFT JOIN a.category c ON TRUE
                LEFT JOIN a.reaction r ON TRUE
                LEFT JOIN r.manifestation m ON TRUE
                LEFT JOIN m.coding cd ON TRUE
    ), patient_info AS
    (
        SELECT id,
                gender,
                g AS given_name,
                n.family AS family_name,
                pr AS prefix

        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."patient" p
                LEFT JOIN p.name n ON TRUE
                LEFT JOIN n.given g ON TRUE
                LEFT JOIN n.prefix pr ON TRUE
    )
    SELECT DISTINCT p.id AS patient_id,
            p.gender,
            p.prefix,
            p.given_name,
            p.family_name,
            pa.allergy_category,
            pa.allergy_code,
            pa.allergy_description
    FROM patient_allergy pa
        JOIN patient_info p
            ON pa.patient_id = p.id
    ORDER BY p.id, pa.allergy_code
    ;
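
After the materialized view is created, you can query it like any other table or view, for example:

    SELECT *
    FROM patient_allergy_info
    LIMIT 10;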
    

You have now confirmed that you can query data in AWS HealthLake via Amazon Redshift. Next, you set up zero-ETL integration between Amazon Redshift and Amazon Aurora MySQL.

Set up zero-ETL integration between Amazon Aurora MySQL and Redshift Serverless

Applications such as front-desk software, which are used to schedule appointments and register new patients, store data in OLTP databases such as Aurora. To get data out of OLTP databases and have it ready for analytics use cases, data teams might have to spend a considerable amount of time building, testing, and deploying ETL jobs that are complex to maintain and scale.

With the Amazon Redshift zero-ETL integration with Amazon Aurora MySQL, you can run analytics on the data stored in OLTP databases and combine it with the rest of the data in Amazon Redshift and AWS HealthLake in near real time. In the next steps in this section, we connect to a MySQL database and set up zero-ETL integration with Amazon Redshift.

Connect to an Aurora MySQL database and set up data

Connect to your Aurora MySQL database using your editor of choice, using the AdminUsername and AdminPassword that you entered when running the CloudFormation stack. (For simplicity, they are the same for Amazon Redshift and Aurora.)

When you're connected to your database, complete the following steps:

  1. Create a new database by running the following command:
    CREATE DATABASE front_desk_app_db;

  2. Create a new table. This table simulates storing patient information as patients visit clinics and other healthcare centers. For simplicity, and to demonstrate specific capabilities, we assume that patient IDs are the same in AWS HealthLake and the front-of-office application. In real-world scenarios, this can be a hashed version of a national health care number:
    CREATE TABLE patient_appointment ( 
          patient_id varchar(250), 
          gender varchar(1), 
          date_of_birth date, 
          appointment_datetime datetime, 
          phone_number varchar(15), 
          PRIMARY KEY (patient_id, appointment_datetime) 
    );

Having a primary key in the table is important for zero-ETL integration to work.

  3. Insert new records into the source table in the Aurora MySQL database. To demonstrate the required functionality, make sure the patient_id values of the sample records inserted into the MySQL database match the ones in AWS HealthLake. Replace [PATIENT_ID_1] and [PATIENT_ID_2] in the following query with the ones from the Redshift query you ran previously (the query that joined allergyintolerance and patient):
    INSERT INTO front_desk_app_db.patient_appointment (patient_id, gender, date_of_birth, appointment_datetime, phone_number)
    
    VALUES ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-12-19 10:15:00', '0401401401'),
    ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-09-19 11:00:00', '0401401401'),
    ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-06-06 14:30:00', '0401401401'),
    ('[PATIENT_ID_2]', 'F', '1972-11-14', '2023-12-19 08:15:00', '0401401402'),
    ('[PATIENT_ID_2]', 'F', '1972-11-14', '2023-01-09 12:15:00', '0401401402');
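
Optionally, you can confirm the rows landed in the source table before moving on:

    SELECT * FROM front_desk_app_db.patient_appointment;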

Now that your source table is populated with sample records, you can set up zero-ETL integration and have the data ingested into Amazon Redshift.

Set up zero-ETL integration between Amazon Aurora MySQL and Amazon Redshift

Complete the following steps to create your zero-ETL integration:

  1. On the Amazon RDS console, choose Databases in the navigation pane.
  2. Choose the DB identifier of your cluster (not the instance).
  3. On the Zero-ETL Integration tab, choose Create zero-ETL integration.
  4. Follow the steps to create your integration.

Create a Redshift database from the integration

Next, you create a target database from the integration. You can do this by running a couple of simple SQL commands on Amazon Redshift. Log in to the query editor v2 and run the following commands:

  1. Get the integration ID of the zero-ETL integration you set up between your source database and Amazon Redshift:
    SELECT * FROM svv_integration;

  2. Create a database using the integration ID:
    CREATE DATABASE ztl_demo FROM INTEGRATION '[INTEGRATION_ID]';

  3. Query the database and validate that a new table is created and populated with data from your source MySQL database:
    SELECT * FROM ztl_demo.front_desk_app_db.patient_appointment;

It might take a few seconds for the first set of records to appear in Amazon Redshift.

This shows that the integration is working as expected. To validate it further, you can insert a new record in your Aurora MySQL database, and it will be available in Amazon Redshift for querying in near real time within a few seconds.
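
For example, the following sketch (the new appointment values are hypothetical) inserts one more record on the Aurora MySQL side; re-running the count on Amazon Redshift a few seconds later should show one additional row:

    -- Run on Aurora MySQL: add one more appointment for an existing patient
    INSERT INTO front_desk_app_db.patient_appointment (patient_id, gender, date_of_birth, appointment_datetime, phone_number)
    VALUES ('[PATIENT_ID_1]', 'F', '1988-07-04', '2024-01-15 09:30:00', '0401401401');

    -- Run on Amazon Redshift shortly after: the count should have increased by one
    SELECT COUNT(*) FROM ztl_demo.front_desk_app_db.patient_appointment;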

Set up streaming ingestion for Amazon Redshift

Another aspect of zero-ETL on AWS, for real-time and streaming data, is realized through Amazon Redshift Streaming Ingestion. It provides low-latency, high-speed ingestion of streaming data from Kinesis Data Streams and Amazon MSK. It lowers the effort required to have data ready for analytics workloads, lowers the cost of running such workloads on the cloud, and decreases the operational burden of maintaining the solution.

In the context of healthcare, understanding an individual’s exercise and movement patterns can help with overall health assessment and better treatment planning. In this section, you send simulated data from wearable devices to Kinesis Data Streams and integrate it with the rest of the data you already have access to from your Redshift Serverless data warehouse.

For step-by-step instructions, refer to Real-time analytics with Amazon Redshift streaming ingestion. Note the following steps when you set up streaming ingestion for Amazon Redshift:

  1. Select wearables_stream and use the following template when sending data to Amazon Kinesis Data Streams via Kinesis Data Generator, to simulate data generated by wearable devices. Replace [PATIENT_ID_1] and [PATIENT_ID_2] with the patient IDs you used earlier when inserting new records into your Aurora MySQL table:
    {
       "patient_id": "{{random.arrayElement(["[PATIENT_ID_1]","[PATIENT_ID_2]"])}}",
       "steps_increment": "{{random.arrayElement([0,1])}}",
       "heart_rate": {{random.number({"min":45,"max":120})}}
    }

  2. Create an external schema called from_kds by running the following query, replacing [IAM_ROLE_ARN] with the ARN of the role created by the CloudFormation stack (Patient360BlogRole):
    CREATE EXTERNAL SCHEMA from_kds
    FROM KINESIS
    IAM_ROLE '[IAM_ROLE_ARN]';

  3. Use the following SQL to create a materialized view that consumes data from the stream:
    CREATE MATERIALIZED VIEW patient_wearable_data AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
          JSON_PARSE(kinesis_data) AS data FROM from_kds."wearables_stream"
    WHERE CAN_JSON_PARSE(kinesis_data);

  4. To validate that streaming ingestion works as expected, refresh the materialized view to get the data you already sent to the data stream, and query the table to make sure data has landed in Amazon Redshift:
    REFRESH MATERIALIZED VIEW patient_wearable_data;
    
    SELECT *
    FROM patient_wearable_data
    ORDER BY approximate_arrival_timestamp DESC;

Query and analyze patient wearable data

The results in the data column of the preceding query are in JSON format. Amazon Redshift makes it simple to work with semi-structured data in JSON format. It uses the PartiQL language to offer SQL-compatible access to relational, semi-structured, and nested data. Use the following query to flatten the data:

SELECT data."patient_id"::varchar AS patient_id,
      data."steps_increment"::integer AS steps_increment,
      data."heart_rate"::integer AS heart_rate,
      approximate_arrival_timestamp
FROM patient_wearable_data
ORDER BY approximate_arrival_timestamp DESC;

The result looks like the following screenshot.

Now that you know how to flatten JSON data, you can analyze it further. Use the following query to get the number of minutes a patient has been physically active per day, based on their heart rate (higher than 80):

WITH patient_wearable_flattened AS
(
   SELECT data."patient_id"::varchar AS patient_id,
      data."steps_increment"::integer AS steps_increment,
      data."heart_rate"::integer AS heart_rate,
      approximate_arrival_timestamp,
      DATE(approximate_arrival_timestamp) AS date_received,
      extract(hour from approximate_arrival_timestamp) AS hour_received,
      extract(minute from approximate_arrival_timestamp) AS minute_received
   FROM patient_wearable_data
), patient_active_minutes AS
(
   SELECT patient_id,
      date_received,
      hour_received,
      minute_received,
      avg(heart_rate) AS heart_rate
   FROM patient_wearable_flattened
   GROUP BY patient_id,
      date_received,
      hour_received,
      minute_received
   HAVING avg(heart_rate) > 80
)
SELECT patient_id,
      date_received,
      COUNT(heart_rate) AS active_minutes_count
FROM patient_active_minutes
GROUP BY patient_id,
      date_received
ORDER BY patient_id,
      date_received;
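
The stream also carries steps_increment, which the same flattening approach can aggregate. The following sketch sums the recorded steps per patient per day:

WITH patient_wearable_flattened AS
(
   SELECT data."patient_id"::varchar AS patient_id,
      data."steps_increment"::integer AS steps_increment,
      DATE(approximate_arrival_timestamp) AS date_received
   FROM patient_wearable_data
)
SELECT patient_id,
      date_received,
      SUM(steps_increment) AS total_steps
FROM patient_wearable_flattened
GROUP BY patient_id, date_received
ORDER BY patient_id, date_received;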

Create a complete patient 360

Now that you are able to query all patient data with Redshift Serverless, you can combine the three datasets you used in this post and form a comprehensive patient 360 view with the following query:

WITH patient_appointment_info AS
(
      SELECT "patient_id",
         "gender",
         "date_of_birth",
         "appointment_datetime",
         "phone_number"
      FROM ztl_demo.front_desk_app_db.patient_appointment
),
patient_wearable_flattened AS
(
      SELECT data."patient_id"::varchar AS patient_id,
         data."steps_increment"::integer AS steps_increment,
         data."heart_rate"::integer AS heart_rate,
         approximate_arrival_timestamp,
         DATE(approximate_arrival_timestamp) AS date_received,
         extract(hour from approximate_arrival_timestamp) AS hour_received,
         extract(minute from approximate_arrival_timestamp) AS minute_received
      FROM patient_wearable_data
), patient_active_minutes AS
(
      SELECT patient_id,
         date_received,
         hour_received,
         minute_received,
         avg(heart_rate) AS heart_rate
      FROM patient_wearable_flattened
      GROUP BY patient_id,
         date_received,
         hour_received,
         minute_received
      HAVING avg(heart_rate) > 80
), patient_active_minutes_count AS
(
      SELECT patient_id,
         date_received,
         COUNT(heart_rate) AS active_minutes_count
      FROM patient_active_minutes
      GROUP BY patient_id,
         date_received
)
SELECT pai.patient_id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allergy_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth,
      ppi.appointment_datetime,
      ppi.phone_number,
      pamc.date_received,
      pamc.active_minutes_count
FROM patient_allergy_info pai
      LEFT JOIN patient_active_minutes_count pamc
            ON pai.patient_id = pamc.patient_id
      LEFT JOIN patient_appointment_info ppi
            ON pai.patient_id = ppi.patient_id
GROUP BY pai.patient_id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allergy_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth,
      ppi.appointment_datetime,
      ppi.phone_number,
      pamc.date_received,
      pamc.active_minutes_count
ORDER BY pai.patient_id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allergy_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth DESC,
      ppi.appointment_datetime DESC,
      ppi.phone_number DESC,
      pamc.date_received,
      pamc.active_minutes_count;

You can use the solution and queries shown here to expand the datasets used in your analysis. For example, you can include other tables from AWS HealthLake as needed, as sketched below.
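As one sketch of such an extension, the following query pulls each patient's recorded conditions, assuming the sample data includes a condition table that follows the standard nested FHIR layout (subject.reference and code.coding); verify the column names in your own data store before relying on it:

SELECT SUBSTRING(c."subject"."reference", 9, LEN(c."subject"."reference")) AS patient_id,
      NVL(cd."code", 'NA') AS condition_code,
      NVL(cd.display, 'NA') AS condition_description
FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."condition" c
      LEFT JOIN c."code"."coding" cd ON TRUE;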

Clean up

To clean up the resources you created, complete the following steps:

  1. Delete the zero-ETL integration between Amazon RDS and Amazon Redshift.
  2. Delete the CloudFormation stack.
  3. Delete the AWS HealthLake data store.

Conclusion

Forming a comprehensive 360 view of patients by integrating data from various different sources provides numerous benefits for organizations operating in the healthcare industry. It enables healthcare providers to gain a holistic understanding of a patient's medical journey, enhances clinical decision-making, and allows for more accurate diagnosis and tailored treatment plans. With zero-ETL features for data integration on AWS, it is simple to build a view of patients securely, cost-effectively, and with minimal effort.

You can then use visualization tools such as Amazon QuickSight to build dashboards, or use Amazon Redshift ML to enable data analysts and database developers to train machine learning (ML) models with the data integrated through Amazon Redshift zero-ETL. The result is a set of ML models that are trained with a broader view into patients, their medical history, and their lifestyle, and therefore enable you to make more accurate predictions about their upcoming health needs.


About the Authors

Saeed Barghi is a Sr. Analytics Specialist Solutions Architect specializing in architecting enterprise data platforms. He has extensive experience in the fields of data warehousing, data engineering, data lakes, and AI/ML. Based in Melbourne, Australia, Saeed works with public sector customers in Australia and New Zealand.

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 17 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.
