How To Evaluate A Healthcare Program
This article outlines a practical method for evaluating healthcare programs using standards commonly applied in CMS evaluations. The goal is to separate true program effects from background noise.
1. Establish the logic model
Before running any analysis, define the expected pathway from intervention to outcome. For example: early outreach and medication reconciliation can reduce avoidable acute events, in turn lowering inpatient utilization and total spend.
- List each operational mechanism and the expected downstream outcome.
- Define timing assumptions: how quickly should each effect appear?
- Identify which outcomes should change first and which later; the sketch below encodes this ordering.
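One way to make the logic model concrete before any statistics run is to write it down as plain data. A minimal sketch in Python; the mechanisms, outcomes, and lags below are hypothetical examples, not a required schema:

```python
# A logic model written down as plain data before any statistics are run.
# Mechanism names, outcomes, and lags are hypothetical examples only.
LOGIC_MODEL = [
    # (mechanism, expected downstream outcome, expected lag in months)
    ("early outreach",            "fewer missed follow-ups",        3),
    ("medication reconciliation", "fewer avoidable acute events",   6),
    ("fewer acute events",        "lower inpatient utilization",    9),
    ("lower utilization",         "lower total PMPM spend",        12),
]

# Sorting by lag gives the order in which outcomes should move, i.e. what
# to check at each evaluation wave.
for mechanism, outcome, lag in sorted(LOGIC_MODEL, key=lambda row: row[2]):
    print(f"{lag:>2} mo: {mechanism} -> {outcome}")
```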
2. Choose an evaluation design
In most real-world settings, where randomization is rarely feasible, a difference-in-differences (DiD) design is the strongest practical approach.
- Treatment trend: pre/post change among participants.
- Comparison trend: pre/post change among similar non-participants during the same period.
- Program effect: the difference between those two changes.
This structure helps net out system-wide effects such as policy shifts, coding changes, or pandemic-era shocks.
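A minimal sketch of the design as a regression, using statsmodels on synthetic person-period data. The column names (`y`, `treated`, `post`, `person_id`) and the synthetic effect sizes are assumptions for illustration; the coefficient on the treated-by-post interaction is the program effect defined above:

```python
# A minimal difference-in-differences sketch on synthetic person-period
# data. Column names and effect sizes are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000  # person-periods
df = pd.DataFrame({
    "person_id": rng.integers(0, 500, n),
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
# Synthetic outcome: a system-wide post-period shift, a baseline difference
# between groups, and a true -$40 PMPM program effect on the interaction.
df["y"] = (
    1000
    + 50 * df["post"]                   # system-wide post-period shift
    + 20 * df["treated"]                # baseline group difference
    - 40 * df["treated"] * df["post"]   # true program effect
    + rng.normal(0, 100, n)
)

# The coefficient on treated:post is the DiD estimate: the participants'
# pre/post change net of the comparison group's pre/post change.
did = smf.ols("y ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["person_id"]}
)
print(did.params["treated:post"])
```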
3. Build a valid comparison group
The comparison group must look like the treatment group before program launch.
- Use propensity score matching on age, sex, geography, chronic burden, and baseline utilization (see the matching sketch after this list).
- Check pre-period parallel trends in the core outcomes.
- Document balance diagnostics and unmatched exclusions.
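A minimal 1:1 matching sketch with scikit-learn. The column names and the with-replacement nearest-neighbor choice are illustrative assumptions, not the only defensible design:

```python
# A minimal 1:1 propensity-score matching sketch with scikit-learn.
# Column names and the with-replacement nearest-neighbor choice are
# illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def match_comparison(df: pd.DataFrame, covariates: list[str]) -> pd.DataFrame:
    """Pair each treated member with the nearest-propensity non-participant."""
    X = df[covariates].to_numpy()
    t = df["treated"].to_numpy()
    pscore = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    treated, pool = df[t == 1], df[t == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(pscore[t == 0].reshape(-1, 1))
    _, idx = nn.kneighbors(pscore[t == 1].reshape(-1, 1))
    matched = pool.iloc[idx.ravel()]  # comparators may repeat (with replacement)

    # Balance diagnostic: standardized mean differences, ideally |SMD| < 0.1.
    for c in covariates:
        smd = (treated[c].mean() - matched[c].mean()) / df[c].std()
        print(f"{c}: SMD after matching = {smd:+.3f}")
    return pd.concat([treated, matched])
```

Matching alone does not guarantee parallel trends; the pre-period trend check in the second bullet still has to be run on the matched sample.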
4. Measure both outcomes and implementation
Quantitative measures tell you what changed. Qualitative evidence helps explain why.
- Spending: per-member-per-month (PMPM) total cost and service-specific costs (a PMPM sketch follows this list).
- Utilization: admissions, emergency department (ED) visits, skilled nursing facility (SNF) days, and readmissions.
- Quality: potentially avoidable events and evidence-based process measures.
- Health outcomes: mortality and days at home.
- Implementation: clinician interviews, workflow assessment, and patient feedback.
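As one concrete example from the spending bullet, PMPM is total paid amount divided by member-months. A minimal sketch, assuming hypothetical `claims` and `eligibility` tables:

```python
# A minimal PMPM sketch. The schemas are hypothetical: `claims` has one
# row per claim line with `month` and `paid_amount`; `eligibility` has one
# row per member-month of enrollment with `member_id` and `month`.
import pandas as pd

def pmpm_by_month(claims: pd.DataFrame, eligibility: pd.DataFrame) -> pd.Series:
    paid = claims.groupby("month")["paid_amount"].sum()                   # numerator
    member_months = eligibility.groupby("month")["member_id"].nunique()  # denominator
    return (paid / member_months).rename("pmpm")
```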
5. Anticipate real-world data constraints
- The risk of unmeasured confounding grows with program duration, as matched groups drift apart over longer follow-up.
- Small samples can hide meaningful effects when confidence intervals are wide; the minimum-detectable-effect sketch below makes this concrete.
- Administrative claims are often the most practical data source: they are captured the same way for treatment and comparison members and support reproducible analysis.
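To make the small-sample point concrete, here is a rough minimum-detectable-effect calculation under standard two-group assumptions; the sample size and standard deviation below are illustrative numbers, not benchmarks:

```python
# A rough minimum-detectable-effect (MDE) sketch for a two-group mean
# comparison with equal arms. Sample size and SD are illustrative numbers.
from scipy import stats

def mde(n_per_group: int, sd: float, alpha: float = 0.05, power: float = 0.8) -> float:
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided test
    z_power = stats.norm.ppf(power)
    se = sd * (2 / n_per_group) ** 0.5        # SE of a difference in means
    return (z_alpha + z_power) * se

# e.g., 500 members per arm with an $800 PMPM standard deviation:
print(f"MDE is roughly ${mde(500, 800):.0f} PMPM")  # about $142
```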
6. Interpret beyond a yes/no result
- Magnitude: Is the effect practically meaningful?
- Uncertainty: What does the confidence interval imply for risk-bearing decisions?
- Probability framing: What is the chance of at least modest savings? (A sketch of this calculation follows the list.)
- Subgroups: Which populations drive effect heterogeneity?
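A sketch of the probability framing under a normal approximation of the effect estimate. The point estimate, standard error, and the $25 PMPM threshold for "modest savings" are all illustrative assumptions, not results:

```python
# Probability framing under a normal approximation of the effect estimate.
# The point estimate, SE, and the $25 PMPM savings threshold are all
# illustrative assumptions.
from scipy import stats

estimate, se = -40.0, 30.0   # e.g., a -$40 PMPM DiD estimate with SE $30
threshold = -25.0            # "at least modest savings" = $25+ PMPM saved

p_modest = stats.norm.cdf(threshold, loc=estimate, scale=se)
print(f"P(savings of at least $25 PMPM) is roughly {p_modest:.0%}")  # ~69%
```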