Spring 2024 - Applied Statistics II for STEM Professions - Predicting Heart Disease
Project Type: Predicting Heart Disease Using Statistical and Machine Learning Models
Date: Spring 2024
❤️ Predicting Heart Disease Using Statistical and Machine Learning Models
Course: Applied Statistics II for STEM Professions (MAT-303)
Tools Used: R, Logistic Regression, Random Forest, ROC Curves, Confusion Matrices
Project Overview
In this project, I explored multiple statistical models to analyze a heart disease dataset containing 303 patient records and 14 health-related variables (e.g., age, cholesterol, chest pain type, blood pressure, max heart rate). The goal was to identify patterns, risk factors, and relationships among these indicators to predict the likelihood of heart disease and provide insights that could support early diagnosis and prevention efforts.
The models I built and compared included:
Two logistic regression models using different predictor combinations
A random forest classification model to assess heart disease presence
A random forest regression model to predict maximum heart rate achieved
Objectives & Methodology
Clean and prepare the data for analysis
Test multiple predictors for significance in logistic regression models
Evaluate model performance using confusion matrices, ROC/AUC values, and significance testing
Use random forest models to explore non-linear relationships and improve prediction accuracy
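As a rough illustration of the confusion-matrix step in the list above, here is a minimal sketch. The project itself was done in R; this version uses plain Python, and the labels are made up for the example rather than taken from the heart disease dataset. The function names `confusion_counts` and `summarize` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = disease, 0 = no disease)."""
    pairs = Counter(zip(actual, predicted))
    return {
        "tp": pairs[(1, 1)],  # disease present, predicted present
        "fp": pairs[(0, 1)],  # disease absent, predicted present
        "tn": pairs[(0, 0)],  # disease absent, predicted absent
        "fn": pairs[(1, 0)],  # disease present, predicted absent
    }


def summarize(counts):
    """Derive accuracy, sensitivity, and specificity from the four counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true cases that were caught
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
    }


# Made-up labels for illustration only (not from the project data)
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(counts)
print(metrics)
```

In a screening context, sensitivity is often the number to watch, since a missed case (a false negative) usually costs more than a false alarm.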
Key Insights:
The second logistic regression model, which included chest pain type and maximum heart rate, showed stronger statistical significance than the first and achieved an AUC of 0.8389, indicating good discriminative ability (an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation).
The random forest classification model outperformed both logistic regressions, capturing more complex patterns and generalizing better when tested with new data.
The random forest regression model was used to predict maximum heart rate, a critical indicator in assessing cardiac strain, with the trees tuned for accuracy.
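One way to read an AUC value is as the probability that a randomly chosen patient with heart disease receives a higher predicted score than a randomly chosen patient without it. The sketch below computes AUC that way in plain Python. It is an illustration with made-up scores, not the R code or data behind the result reported above, and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities for illustration only
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

example_auc = auc(labels, scores)
print(round(example_auc, 4))
```

The pairwise loop is quadratic in the number of patients, which is fine for a dataset of a few hundred records. A rank-based formula would be the usual choice for larger data.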
What I Learned:
How to perform logistic regression and interpret coefficients, p-values, and odds ratios
The value of model evaluation tools like the Hosmer-Lemeshow test, the Wald test, and ROC curves
How to build and fine-tune random forest models for classification and regression tasks
The importance of comparing multiple models to determine which is the most reliable for decision-making
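To make the link between coefficients, odds ratios, and the Wald test concrete, here is a small sketch in plain Python. The coefficient and standard error are hypothetical values chosen for the example, not estimates from the project's models, and the function names are my own.

```python
import math


def odds_ratio(coef, std_err, z_crit=1.96):
    """Return (odds ratio, lower, upper) using a Wald interval.

    z_crit = 1.96 gives an approximate 95% confidence interval.
    """
    return (
        math.exp(coef),
        math.exp(coef - z_crit * std_err),
        math.exp(coef + z_crit * std_err),
    )


def wald_test(coef, std_err):
    """Return the Wald z statistic and its two-sided p-value."""
    z = coef / std_err
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimate: a coefficient of ln(2) doubles the odds
# of heart disease for each one-unit increase in the predictor.
coef = math.log(2)
std_err = 0.25

ratio, lower, upper = odds_ratio(coef, std_err)
z, p_value = wald_test(coef, std_err)
print(f"odds ratio = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Wald z = {z:.2f}, p = {p_value:.4f}")
```

If the confidence interval for the odds ratio excludes 1, the Wald test at the matching level will also reject the hypothesis that the predictor has no effect.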
Reflection:
This project marked my transition into more advanced statistical thinking. It deepened my appreciation for the role of data in health risk prediction and real-world decision-making. Most importantly, it taught me how to evaluate not just the accuracy of a model but also its relevance, strength, and practical utility in solving complex problems. These are skills I carry into every data project I take on.