Healthcare Data Lake: The Key to Operating a Data Informed Organization

Tom Laughlin, AVP, Premier Client Services, Inovalon

Two decades ago, business priorities within a healthcare organization were largely driven by a select few executive visionaries. Today, the most successful healthcare organizations are using data to validate ideas and further refine them through advanced studies and predictive models.

The data-informed healthcare organization has come of age through recent advances in data technologies, a surge in artificial intelligence and machine learning capabilities, and the availability of high-compute, storage-efficient hardware through commercial cloud platforms (AWS, Azure, Google Cloud). This influx of technology and talent in the market has lowered the barrier to entry for data-informed intelligence. Market competition and large-scale innovation have reduced the learning curve as well as the cost.

The Essential Role of a Strong Data Architecture – and How to Achieve It

Becoming a data-informed healthcare organization starts with having a strong data architecture. Data must be secure, but readily available to those who need it. Data must be cheap to store in extremely large volumes, but systems must be able to search through it in seconds or less. Complex data like JSON or images must be accessible through standard query languages like SQL.

Enter the Healthcare Data Lake – a collection of datasets focused on patient claims history, analytical output from quality measurement and risk adjustment programs, clinical data from Electronic Health Record systems, and social determinants of health data. By removing the obstacles of siloed data sources in varying formats, the data lake creates one comprehensive, consolidated source of data that healthcare organizations can access on demand in support of a variety of clinical and business use cases.

Common Data Lake Misconceptions

When I first heard the term “Data Lake” and began to investigate, the overarching promise of one all-encompassing data source sounded a bit intimidating, like something that would be very big, messy, and challenging to deal with and gain value from. This is not an uncommon perception – and not totally without merit. However, when implemented properly, a data lake delivers speed, accuracy, and ease of integration with the organization’s current tools and workflows. That experience runs counter to these common data lake misconceptions:

#1 – “A data lake is complex and with this volume of data it would take weeks to update.”

Some data lakes support data refreshes in a few hours. By comparison, it can take two weeks to populate the same data in a healthcare organization’s own on-prem data warehouse.

#2 – “This massive amount of data will be too hard to work with and understand.”

The most effective data lakes are those that provide access to high levels of structured data – where all sources can be connected through common keys, with data dictionaries that describe the data elements.
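To make the idea of common keys and data dictionaries concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a data lake's SQL interface. The table names, column names, and sample rows are hypothetical and invented for illustration; a real data lake would define its own schema and dictionary.

```python
import sqlite3

# In-memory database standing in for a data lake's SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical tables from two sources that share the key "member_id".
conn.executescript("""
CREATE TABLE claims (member_id TEXT, claim_id TEXT, diagnosis_code TEXT);
CREATE TABLE ehr_observations (member_id TEXT, observation TEXT, value REAL);
INSERT INTO claims VALUES ('M001', 'C100', 'E11.9'), ('M002', 'C101', 'I10');
INSERT INTO ehr_observations VALUES ('M001', 'hba1c', 8.2), ('M003', 'hba1c', 6.1);
""")

# A tiny, made-up data dictionary describing the data elements.
data_dictionary = {
    "claims.member_id": "Member identifier shared across all sources",
    "claims.diagnosis_code": "Diagnosis code reported on the claim",
    "ehr_observations.observation": "Name of the clinical observation",
    "ehr_observations.value": "Numeric result of the observation",
}

# Because both sources carry the same key, one join connects claims to clinical data.
rows = conn.execute("""
    SELECT c.member_id, c.diagnosis_code, o.observation, o.value
    FROM claims AS c
    JOIN ehr_observations AS o ON o.member_id = c.member_id
    ORDER BY c.member_id
""").fetchall()

for row in rows:
    print(row)
```

Only members present in both sources appear in the result, which is why a consistent key across datasets matters so much for usability.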

#3 – “We’ve already spent years and millions of dollars building our own analytics data warehouse and we don’t want to throw all that work away.”

This is not an either/or proposition. Technologies powering data lakes often use data sharing and replication to push data across regions and even across clouds or into private data centers. Data lakes can be an extension and enrichment of existing data warehouses.

#4 – “If I use a third-party data lake, my team can’t connect all their analytics tools to it.”

Tools such as SageMaker, SAS, or even business applications can securely connect to the data lake. This means healthcare organizations can consider the data lake an extension of their current datasets and encourage direct connectivity when needed.

Leveraging a Healthcare Data Lake for Your Clinical and Business Initiatives

Data lakes are historically made up of raw structured and unstructured data; the more structured the data, the easier it is to understand and consume for a wide range of use cases. Some data lakes also enable the integration of supplemental data sources, meaning healthcare organizations can enrich their data to gain more comprehensive, meaningful insights to drive their clinical and business initiatives.

Let’s explore some data lake use cases for healthcare:

– Leveraging clinical data to identify populations or diagnoses that may be under-reported for risk and quality programs

– Equipping care managers with access to real-time clinical data to proactively prevent avoidable emergency department visits, hospitalizations, etc.

– Integrating meaningful clinical outcomes into provider report cards

– Monitoring opioid prescribing patterns to identify potential patient safety issues and detect potential instances of fraud, waste, and abuse

– Evaluating member care-seeking patterns for use in benefit design, network, and quality initiatives

Use Case Example: Improving Cancer Screening Rates in Older Adults

A health plan wants to understand where to focus its patient outreach campaigns to improve cancer screening rates in older adults. The data analyst logs into the data lake, pulls non-compliant patients for the relevant cancer screening measures using a basic SQL query, groups them by ZIP code, and views the results in table format. Using a visualization tool, the analyst then creates a heatmap showing where the patient-specific measure gaps are concentrated. The outreach manager can use this report to quickly identify a few locations on which to focus outreach and to inform the staffing model for interventions. As a result, a project that previously would have taken months can now be completed within days, delivering speed-to-value for both members and the organization.
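The query step in this use case might look something like the sketch below. It uses Python's built-in sqlite3 module in place of a real data lake connection. The table `measure_results`, its columns, the measure names, the age cutoff of 65, and the placeholder ZIP codes are all assumptions made for illustration, not details from an actual implementation.

```python
import sqlite3

# In-memory database standing in for the data lake (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical measure-results table; compliant = 0 means an open screening gap.
conn.executescript("""
CREATE TABLE measure_results (
    member_id TEXT, measure TEXT, zip_code TEXT, age INTEGER, compliant INTEGER
);
INSERT INTO measure_results VALUES
    ('M001', 'colorectal_screening', '11111', 67, 0),
    ('M002', 'colorectal_screening', '11111', 70, 0),
    ('M003', 'colorectal_screening', '22222', 66, 1),
    ('M004', 'breast_screening',     '22222', 68, 0),
    ('M005', 'colorectal_screening', '11111', 52, 0);
""")

# Count non-compliant older adults per ZIP code for the screening measures.
query = """
    SELECT zip_code, COUNT(*) AS open_gaps
    FROM measure_results
    WHERE compliant = 0
      AND age >= 65
      AND measure IN ('colorectal_screening', 'breast_screening')
    GROUP BY zip_code
    ORDER BY open_gaps DESC, zip_code
"""
gaps_by_zip = conn.execute(query).fetchall()

for zip_code, open_gaps in gaps_by_zip:
    print(zip_code, open_gaps)
```

A table of ZIP codes and gap counts like this one is the kind of input a visualization tool would need to draw the heatmap.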

Now Is the Time to Discover the Value of a Healthcare Data Lake

If your organization uses data to inform clinical and business decisions and you aren’t investing in a cloud-based data lake, now is the perfect time to get started. A healthcare data lake can drive speed-to-value for your organization – enabling you to confidently merge and enrich your complex, disparate data to support analytics, business intelligence, and data exploration initiatives that positively impact the delivery of care and your bottom line.

The data-informed healthcare organization is here.


About Tom Laughlin

Tom Laughlin is an expert in healthcare data management and analytics, with nearly 20 years of experience developing technology solutions that empower organizations to improve the outcomes and economics of healthcare. He currently leads Solution Engineering at Inovalon, where he and his team are focused on tailoring software solutions to meet the unique needs of health plan customers.