No-show appointment forecasting

In this use case, you will build a model that identifies the patients most likely to miss appointments, along with the factors correlated with missing them. Staff can then use this information to target outreach to those patients and to understand, and perhaps address, the associated issues.

Click here to jump directly to the notebook. Otherwise, the following several paragraphs describe the business justification and problem framing for this use case.


Canceling a doctor's appointment need not be particularly problematic, provided the office receives appropriate notice. Appointments missed without notice, on the other hand (the patients involved are known as "no-shows"), cost outpatient health centers a staggering 14% of anticipated daily revenue. Missed appointments result in lower utilization rates for doctors and nurses and also add unnecessarily to the overhead costs of running outpatient centers. In addition, patients who miss an appointment risk poorer health outcomes due to the lack of timely care.

Many outpatient centers employ solutions such as phoning patients in advance, but this high-touch investment is often not prioritized for the highest-risk no-show patients. Low-touch solutions, such as automated texts, are effective tools for mass reminders but do not offer the personalization needed for a patient at the highest risk of missing an appointment.

Key takeaways:

  • Strategy/challenge: Identify patients likely to miss appointments ("no-shows") and take action to prevent that from happening.

  • Business driver: Grow revenue, increase customer LTV, increase customer satisfaction.

  • Model solution: Rank-order patients and build an outreach call list. Using the list can minimize revenue loss by increasing attendance (thereby also improving patient outcomes) and by identifying potential overbooking opportunities to prevent downtime.

Using this notebook

Topic: Description
Use case type: Health care/No-show forecasting
Skill set: Business analyst
Desired outcomes:
  • Prevent no-shows
  • Optimally target responses
  • Reduce costs from missed appointments/add revenue from booking more appointments
Metrics/KPIs:
  • Current no-show rate is roughly 5% of all appointments.
  • Cost of a missed visit is, on average, $150 per appointment.
Sample datasets: This use case uses the following datasets:

Solution value

The purpose of this use case is to build a model that enables practice management staff to predict in advance which patients are likely to miss appointments. The model uses historical data to uncover patterns related to no-shows. In addition to identifying the patients more likely to no-show, its visualizations help explain the top reasons why. These predictions, and their explanations, help staff understand how various factors, such as a patient's distance from a clinic and the number of days they had to wait for an appointment, influence the risk of a no-show. Based on these predictions and insights, outpatient staff members can focus outreach on the patients at highest risk of missing an appointment and then offer alternatives, such as rescheduling the appointment or providing transportation.
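As a minimal sketch of how scored appointments might be turned into an outreach call list, the example below ranks appointments by predicted no-show probability and keeps the top few. The column names (`patient_id`, `appointment_id`, `no_show_probability`), the `build_call_list` helper, and the `capacity` parameter are illustrative assumptions, not part of the notebook or of any DataRobot API.

```python
import pandas as pd


def build_call_list(scored: pd.DataFrame, capacity: int) -> pd.DataFrame:
    """Return the `capacity` appointments with the highest predicted no-show risk.

    `scored` is assumed to hold one row per upcoming appointment with a
    hypothetical `no_show_probability` column produced by a model.
    """
    ranked = scored.sort_values("no_show_probability", ascending=False)
    return ranked.head(capacity).reset_index(drop=True)


# Hypothetical scored appointments (values are made up for illustration).
scored = pd.DataFrame(
    {
        "patient_id": [101, 102, 103, 104],
        "appointment_id": [1, 2, 3, 4],
        "no_show_probability": [0.12, 0.71, 0.05, 0.43],
    }
)

# Staff can call two patients today, so keep the two riskiest appointments.
call_list = build_call_list(scored, capacity=2)
print(call_list)
```

Here `capacity` stands in for however many outreach calls staff can realistically make in a day. A probability threshold could be used instead if the goal is to contact everyone above a given risk level.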

The primary issues, and corresponding opportunities, that this use case addresses include:

Patient outcome: Ensuring attendance plays a critical role in patient health, since patients may suffer if they do not get required care.
Revenue loss: A degree of certainty about an open booking slot allows for preemptive filling by:
  • Standard over-booking.
  • Contacting an alternative patient (using a "propensity for" model).
Staffing inefficiency: Correct staffing levels improve both patient and employee satisfaction.
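To put the revenue-loss issue in concrete terms, the following sketch estimates the expected daily cost of no-shows from the KPIs stated for this use case (a no-show rate of roughly 5% and an average cost of $150 per missed visit). The daily volume of 200 appointments and the `expected_no_show_cost` helper are assumptions made purely for illustration.

```python
def expected_no_show_cost(
    appointments: int,
    no_show_rate: float = 0.05,   # roughly 5% of appointments, per the KPIs
    cost_per_miss: float = 150.0,  # average cost of a missed visit, per the KPIs
) -> float:
    """Expected cost of no-shows for a given number of scheduled appointments."""
    return appointments * no_show_rate * cost_per_miss


# Hypothetical clinic with 200 scheduled appointments per day:
# 200 * 0.05 = 10 expected no-shows, and 10 * $150 = $1,500 per day.
daily_cost = expected_no_show_cost(200)
print(f"Expected daily no-show cost: ${daily_cost:,.2f}")
```

The same calculation, applied to the share of no-shows that outreach actually prevents, gives a rough upper bound on what a targeted call list could recover.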

Work with data

The primary dataset for this use case represents patient visits. Supplemental datasets allow aggregating features for more targeted responses.


The dataset granularity is one row per visit. For best results, data should cover two years of historical appointments to provide a comprehensive sample of data, accounting for seasonality and other important factors like, most recently, the impacts of COVID. Sample by patient ID, not appointment ID, so that all appointments for a particular patient (within the time window) are represented.
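Sampling by patient ID rather than by appointment ID can be sketched as follows: draw a random subset of patients, then keep every visit row belonging to those patients. The `patient_id` and `appointment_id` column names and the `sample_by_patient` helper are hypothetical and should be adapted to your own dataset.

```python
import pandas as pd


def sample_by_patient(visits: pd.DataFrame, frac: float, seed: int = 0) -> pd.DataFrame:
    """Sample a fraction of patients and keep all of their visit rows."""
    # One entry per distinct patient, so each patient is equally likely to be drawn.
    patient_ids = visits["patient_id"].drop_duplicates()
    sampled_ids = patient_ids.sample(frac=frac, random_state=seed)
    # Keep every appointment for the sampled patients.
    return visits[visits["patient_id"].isin(sampled_ids)]


# Hypothetical visit-level data: one row per appointment.
visits = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 3, 3, 3, 4],
        "appointment_id": [10, 11, 12, 13, 14, 15, 16],
    }
)

sample = sample_by_patient(visits, frac=0.5)
print(sample)
```

Sampling rows directly would instead split a patient's history across the sample boundary, which makes per-patient features such as prior no-show counts unreliable.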


To apply this use case, your dataset should contain, minimally, the following features:

  • Patient ID
  • Binary classification target that represents attendance (show/no-show, 0/1, True/False, etc.)
  • Date/time of the scheduled appointment
  • Date the appointment was made
  • Number of days between scheduling and appointment
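If your data holds the two dates but not the number of days between them, that feature can be derived. The sketch below assumes hypothetical column names (`scheduled_at` for the appointment date/time and `booked_on` for the date the appointment was made) and compares calendar dates only.

```python
import pandas as pd

# Hypothetical appointment records (values are made up for illustration).
appointments = pd.DataFrame(
    {
        "patient_id": [1, 2],
        "scheduled_at": ["2022-03-10 09:30", "2022-03-15 14:00"],
        "booked_on": ["2022-03-01", "2022-03-15"],
    }
)

# Parse the text columns into datetimes.
appointments["scheduled_at"] = pd.to_datetime(appointments["scheduled_at"])
appointments["booked_on"] = pd.to_datetime(appointments["booked_on"])

# Drop the time of day before subtracting so the result counts whole calendar days.
appointments["lead_time_days"] = (
    appointments["scheduled_at"].dt.normalize()
    - appointments["booked_on"].dt.normalize()
).dt.days

print(appointments[["patient_id", "lead_time_days"]])
```

A same-day booking yields a lead time of 0. Negative values would indicate a data-quality problem worth investigating before modeling.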

Other helpful features to include are:

  • Distance between the patient and the clinic they are visiting
  • No-show history for the patient
  • Reason for visit
  • Scheduled clinic
  • Scheduled doctor
  • Patient age
  • Patient gender
  • Other patient descriptors (hypertension, diabetes, alcoholism, etc.)
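A patient's no-show history can be derived from the visit-level data itself. The sketch below is one possible approach, not the notebook's method: for each appointment it counts only the patient's earlier visits and earlier no-shows, so the current row's outcome does not leak into its own features. The column names (`patient_id`, `scheduled_at`, `no_show` coded as 0/1) and the `add_prior_no_show_features` helper are assumptions.

```python
import pandas as pd


def add_prior_no_show_features(visits: pd.DataFrame) -> pd.DataFrame:
    """Add per-patient counts of earlier visits and earlier no-shows."""
    # Order each patient's appointments chronologically.
    visits = visits.sort_values(["patient_id", "scheduled_at"]).copy()
    # Running total of no-shows, minus the current row, leaves only prior no-shows.
    running = visits.groupby("patient_id")["no_show"].cumsum()
    visits["prior_no_shows"] = running - visits["no_show"]
    # cumcount() numbers rows 0, 1, 2, ... within each patient: the prior visit count.
    visits["prior_visits"] = visits.groupby("patient_id").cumcount()
    return visits


# Hypothetical visit history (values are made up for illustration).
visits = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2],
        "scheduled_at": pd.to_datetime(
            ["2022-01-05", "2022-02-01", "2022-03-01", "2022-01-10"]
        ),
        "no_show": [1, 0, 1, 0],
    }
)

features = add_prior_no_show_features(visits)
print(features)
```

Patients with no prior visits get zeros for both counts, so a prior no-show rate, if needed, should be computed with care to avoid dividing by zero.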


See the notebook here.

Updated September 14, 2022