Predicting COVID-19 Hospitalization Risk Using Machine Learning

Client Profile

Public health and healthcare stakeholders responsible for managing COVID-19 response, hospital capacity, and emerging disease preparedness in a large metropolitan region.

The Challenge

During the COVID-19 pandemic, healthcare systems faced significant strain due to unpredictable hospitalization demand and prolonged patient stays.

Key challenges included:

  • Limited ability to predict which patients would require extended hospitalization
  • Strain on hospital capacity and resource allocation
  • Need for early identification of high-risk patients
  • Lack of scalable, data-driven tools to support decision-making

The goal was to develop a predictive framework that identifies high-risk patients and supports proactive healthcare planning.

Key Requirements

  • Analyze hospitalization patterns using real-world data
  • Identify clinical and demographic risk factors for prolonged hospital stays
  • Compare multiple machine learning models for predictive accuracy
  • Deliver a scalable and interpretable model for public health use
  • Support decision-making for hospital capacity and preparedness

The Solution

    1. Cohort Design and Data Structuring
    Utilized Harris County COVID-19 hospitalization data to define two cohorts:

    • Short stay (<14 days)
    • Prolonged stay (≥14 days)

    2. Multivariate Risk Factor Analysis
    Conducted multivariate statistical analysis to identify clinical and demographic factors associated with prolonged (≥14-day) hospitalization.

    3. Machine Learning Model Development
    Developed and evaluated three supervised learning models:

    • Logistic Regression
    • K-Nearest Neighbor (KNN)
    • Random Forest

    4. Model Evaluation and Selection
    Assessed model performance using sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC).

    5. Interpretation and Application
    Selected the best-performing model and translated its findings into actionable insights for healthcare planning and intervention strategies.
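
Step 1 (cohort design) amounts to labeling each admission by length of stay. The following is a minimal sketch rather than the project's code. The `Admission` record, its field names, and the choice to measure length of stay as discharge date minus admission date are all assumptions, because the source does not describe the layout of the Harris County data.

```python
from dataclasses import dataclass
from datetime import date

# Cut-off from the case study: stays of 14 days or more count as "prolonged".
PROLONGED_STAY_DAYS = 14


@dataclass
class Admission:
    """Hypothetical admission record; the real dataset's fields are not described."""
    patient_id: str
    admit_date: date
    discharge_date: date

    @property
    def length_of_stay(self) -> int:
        # Assumed definition: calendar days between admission and discharge.
        return (self.discharge_date - self.admit_date).days


def assign_cohort(admission: Admission) -> str:
    """Return 'prolonged' for stays >= 14 days, otherwise 'short'."""
    if admission.length_of_stay >= PROLONGED_STAY_DAYS:
        return "prolonged"
    return "short"


def split_cohorts(admissions):
    """Group admissions into the two study cohorts."""
    cohorts = {"short": [], "prolonged": []}
    for admission in admissions:
        cohorts[assign_cohort(admission)].append(admission)
    return cohorts


records = [
    Admission("A1", date(2020, 7, 1), date(2020, 7, 8)),   # 7 days -> short
    Admission("A2", date(2020, 7, 1), date(2020, 7, 15)),  # 14 days -> prolonged
]
print({name: [a.patient_id for a in group] for name, group in split_cohorts(records).items()})
# {'short': ['A1'], 'prolonged': ['A2']}
```

Keeping the cut-off in a single named constant makes it easy to re-run the analysis with a different definition of "prolonged" if stakeholders ask for one.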
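
The case study does not name the statistical method used for the multivariate risk factor analysis in step 2. A common choice is multivariable logistic regression, which is also the first of the three models in step 3. In that model, each exponentiated coefficient is an odds ratio adjusted for the other factors. The sketch below fits such a model by plain gradient ascent in standard-library Python and runs it on simulated data. The feature names mirror the predictors reported under The Impact, but the binary encoding, the simulated patients, and the effect sizes in `TRUE_W` are hypothetical and are not the study's estimates.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Unpenalized multivariable logistic regression via batch gradient ascent.

    X: list of feature rows (floats); y: list of 0/1 outcomes (1 = prolonged stay).
    Returns (intercept, coefficients) on the log-odds scale.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between the observed outcome and the current predicted probability.
            err = yi - sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi)))
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        # Step along the average gradient of the log-likelihood.
        b0 += lr * g0 / n
        w = [wj + lr * gj / n for wj, gj in zip(w, g)]
    return b0, w


# --- Demo on SIMULATED data: hypothetical binary indicators, arbitrary effects ---
FEATURES = ["age_66_plus", "obese", "respiratory_symptoms", "male", "noise"]
TRUE_B0 = -2.0
TRUE_W = [1.5, 0.8, 1.0, 0.4, 0.0]  # NOT estimates from the study

rng = random.Random(0)
X, y = [], []
for _ in range(800):
    row = [1.0 if rng.random() < 0.4 else 0.0 for _ in FEATURES]
    p_true = sigmoid(TRUE_B0 + sum(w * x for w, x in zip(TRUE_W, row)))
    X.append(row)
    y.append(1 if rng.random() < p_true else 0)

b0, coefs = fit_logistic(X, y)
# exp(coefficient) is each factor's odds ratio, adjusted for the other factors.
odds_ratios = {name: math.exp(c) for name, c in zip(FEATURES, coefs)}
for name, ratio in odds_ratios.items():
    print(f"{name:22s} OR = {ratio:.2f}")
```

In practice, a statistics package such as statsmodels would typically be used instead. It also reports standard errors, confidence intervals, and p-values, which this sketch omits.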
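
The K-Nearest Neighbor model from step 3 scores a new patient by looking at the most similar past patients. This is a minimal sketch under stated assumptions. Numeric features are standardized to z-scores, distance is Euclidean, and the risk score is the share of the k nearest neighbors who had a prolonged stay. The toy ages, BMI values, and labels are invented for illustration. `math.dist` requires Python 3.8 or later.

```python
import heapq
import math


def standardize(rows):
    """Z-score each column; also return the means and SDs to reuse on new patients."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [scale_row(r, means, sds) for r in rows], means, sds


def scale_row(row, means, sds):
    """Apply previously computed column means and SDs to one patient."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def knn_risk(train_X, train_y, query, k=5):
    """Share of the k nearest training patients (Euclidean) with a prolonged stay."""
    neighbors = heapq.nsmallest(
        k, zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    return sum(label for _, label in neighbors) / len(neighbors)


# Toy data: [age in years, BMI]; label 1 = prolonged stay. Invented values.
train_raw = [[70, 32], [72, 35], [45, 24], [50, 26], [68, 31], [40, 22]]
train_y = [1, 1, 0, 0, 1, 0]

train_X, means, sds = standardize(train_raw)
new_patient = scale_row([69, 33], means, sds)
print(knn_risk(train_X, train_y, new_patient, k=3))  # 1.0
```

Standardizing matters here because age and BMI sit on different scales. Without it, the feature with the larger spread would dominate the distance. New patients must be scaled with the training means and SDs, not their own.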
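
Steps 4 and 5 come down to scoring each model's predictions and keeping the best one. The sketch below computes sensitivity and specificity at an assumed 0.5 risk cut-off and computes AUC from its pairwise (Mann-Whitney) definition. It then picks the model with the highest AUC. The model names and scores are toy values, not results from the study, and using AUC alone as the selection rule is an assumption.

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity (recall on prolonged stays) and specificity (recall on short stays).

    A score >= threshold counts as a 'prolonged' prediction. Assumes both
    classes are present in y_true.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_prolonged = score >= threshold
        if label == 1:
            tp += predicted_prolonged
            fn += not predicted_prolonged
        else:
            fp += predicted_prolonged
            tn += not predicted_prolonged
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (prolonged, short) pairs ranked correctly; ties count 1/2.

    O(n_pos * n_neg): fine for a sketch, slow for large cohorts.
    """
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = prolonged stay) and scores from three hypothetical models.
y_true = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.7, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.8, 0.4, 0.4, 0.6, 0.4, 0.2],
    "model_c": [0.6, 0.5, 0.3, 0.7, 0.4, 0.2],
}

for name, scores in model_scores.items():
    sens, spec = sensitivity_specificity(y_true, scores)
    print(f"{name}: AUC={auc(y_true, scores):.3f} sens={sens:.2f} spec={spec:.2f}")

# Step 5 (assumed rule): keep the model with the highest AUC.
best = max(model_scores, key=lambda name: auc(y_true, model_scores[name]))
print("selected:", best)
```

The toy scores were chosen by hand, and `model_a` comes out on top with an AUC of 8/9 (about 0.889). That ranking says nothing about which of the study's three models performed best.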

The Impact

  • Identified key predictors of prolonged hospitalization, including advanced age (66+), obesity, respiratory symptoms, and male sex
  • Found that prolonged hospitalization was strongly associated with ICU admission, intubation, and mortality
  • Achieved an AUC of 0.84 with the selected model, indicating strong discrimination between short and prolonged stays
  • Enabled early identification of high-risk patients for targeted intervention
  • Supported data-driven hospital capacity planning and resource allocation
  • Established a scalable framework for predicting healthcare burden during emerging disease outbreaks
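
The reported AUC of 0.84 has a concrete probabilistic reading. Let S+ be the model's risk score for a randomly chosen prolonged-stay patient and S− the score for a randomly chosen short-stay patient. Then:

```latex
\mathrm{AUC} = \Pr\left(S^{+} > S^{-}\right) + \tfrac{1}{2}\,\Pr\left(S^{+} = S^{-}\right)
```

Under this reading, an AUC of 0.84 means the model ranks about 84 of every 100 such patient pairs in the correct order, compared with 50 for random guessing. Unlike sensitivity and specificity, AUC does not depend on the choice of a risk cut-off.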

Key Insight

Machine learning models can effectively identify patients at risk of prolonged hospitalization, allowing healthcare systems to prioritize care, allocate resources efficiently, and improve preparedness for future outbreaks.

Outcome

This work was presented at the American Public Health Association (APHA) 2023 Annual Meeting and Expo, demonstrating the application of machine learning in real-world public health settings.