XGBoost · Explainable AI · End-to-End · Interactive

Predict Customer Churn Before It Happens

by Noah Gallagher · Data Scientist

An end-to-end machine learning system that identifies at-risk telecom customers with 93% recall, enabling data-driven retention strategies projected to save $367K annually.

93%
Recall Rate
Catches 93 out of 100 at-risk customers
$367K
Annual Savings
Net benefit from targeted retention
432%
Campaign ROI
Return on retention investment
7,043
Customers Analyzed
IBM Telco Customer Dataset

The Churn Problem

Telecom companies lose $1,500 per churned customer in lifetime value. With a 26.5% churn rate, that translates to nearly $2.8M in annual losses for a typical customer base.

  • Month-to-month customers churn at 42%
  • New customers (<6 months) churn at 55%
  • Electronic check users churn at 45%
  • No early warning system in place

My ML Solution

Built an end-to-end ML pipeline using XGBoost with SHAP explainability that identifies at-risk customers and recommends targeted interventions.

  • 93% recall - catches nearly all churners
  • SHAP explanations for every prediction
  • 432% ROI on retention campaigns
  • Actionable business recommendations

Model Performance

Comprehensive evaluation with confidence intervals, baseline comparisons, and statistical validation

Accuracy 62.5% CI: [60.0%, 64.9%]
Precision 40.9% CI: [37.6%, 44.3%]
Recall 93.1% CI: [90.3%, 95.6%]
F1 Score 56.8% CI: [53.5%, 60.2%]
ROC-AUC 0.838 CI: [0.818, 0.860]
Why optimize for recall? In churn prediction, missing an at-risk customer (false negative) is far more costly than a false alarm. A missed churner costs $1,500 in lost CLV, while a false positive only costs $100 for an unnecessary retention offer. This 15:1 cost ratio drives our recall-first strategy.
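The 15:1 cost ratio can be stated as a per-customer decision rule. A minimal sketch, assuming the $1,500 false-negative and $100 false-positive costs above (the production threshold itself is not specified here):

```python
# Cost-based decision rule implied by the 15:1 ratio described above.
# Assumed costs: missed churner = $1,500 lost CLV; false alarm = $100 offer.
FN_COST = 1500
FP_COST = 100

def should_flag(p_churn: float) -> bool:
    """Flag when the expected cost of missing a churner exceeds
    the expected cost of an unnecessary retention offer."""
    expected_miss_cost = p_churn * FN_COST
    expected_false_alarm_cost = (1 - p_churn) * FP_COST
    return expected_miss_cost > expected_false_alarm_cost

# Break-even: p * 1500 = (1 - p) * 100  =>  p = 100 / 1600 = 0.0625
```

With these costs, any customer scoring above roughly a 6.25% churn probability is worth flagging, which is why a recall-first operating point makes economic sense.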

Confusion Matrix

ROC & PR Curves

Model Comparison

XGBoost vs baselines and alternative algorithms

Statistical Validation (Paired t-tests, 5-fold CV)

Comparison                      Mean Diff  t-statistic  p-value  Cohen's d  Result
XGBoost vs Logistic Regression  0.345      63.57        < 0.001  44.95      Significant
XGBoost vs Random Forest        0.002      2.90         0.044    0.20       Significant
XGBoost vs LightGBM             0.002      1.62         0.181    0.19       Not Significant
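The paired t-statistics and effect sizes above come from per-fold CV scores. A pure-Python sketch of the computation, using hypothetical fold scores (not the project's actual results):

```python
import math

# Paired t-test on per-fold cross-validation scores.
# These fold scores are illustrative stand-ins, not the project's data.
xgb_folds = [0.835, 0.840, 0.838, 0.842, 0.836]
lr_folds  = [0.802, 0.805, 0.799, 0.808, 0.801]

diffs = [a - b for a, b in zip(xgb_folds, lr_folds)]
n = len(diffs)
mean_diff = sum(diffs) / n
sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))  # sample std dev
t_stat = mean_diff / (sd / math.sqrt(n))    # paired t-statistic, df = n - 1
cohens_d = mean_diff / sd                   # effect size on the paired differences
```

Because both statistics divide the mean difference by the same standard deviation, t equals Cohen's d times √n; with only 5 folds, a large t implies a large d, which is why the table's effect sizes track its t-statistics.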

ROI Analysis

Total Campaigns 851
Campaign Cost $85,100
Customers Saved 226
Revenue Saved $452,400
Net Benefit $367,300
ROI 431.6%
Assumes: CLV = $2,000 | Campaign cost = $100/customer | 65% success rate
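The table's bottom-line figures follow directly from its inputs. A quick check of the arithmetic, taking the campaign count, per-campaign cost, and revenue-saved figure as given:

```python
# Reproducing the ROI arithmetic from the table above.
campaigns = 851
cost_per_campaign = 100     # assumed $100 retention offer per flagged customer
revenue_saved = 452_400     # from the table (saved customers x assumed CLV)

campaign_cost = campaigns * cost_per_campaign   # $85,100
net_benefit = revenue_saved - campaign_cost     # $367,300
roi_pct = net_benefit / campaign_cost * 100     # ~431.6%
```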

ROI Sensitivity

Bootstrap Confidence Intervals (1000 iterations)
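The confidence intervals reported throughout come from a percentile bootstrap. A minimal sketch of the procedure for recall, on synthetic labels rather than the project's test set:

```python
import numpy as np

# Percentile bootstrap CI (1000 iterations), mirroring the setup above.
# y_true / y_pred are synthetic stand-ins, not the project's test data.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_pred = np.where(rng.random(500) < 0.9, y_true, 1 - y_true)  # ~90% agreement

def recall(t, p):
    return ((t == 1) & (p == 1)).sum() / max((t == 1).sum(), 1)

boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    boot.append(recall(y_true[idx], y_pred[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% confidence interval
```

Resampling the test set with replacement and recomputing the metric each time yields a distribution whose 2.5th and 97.5th percentiles bound the 95% CI.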

Feature Importance & SHAP Analysis

Understanding what drives churn predictions using SHAP (SHapley Additive exPlanations)

Global Feature Importance (SHAP Values)

Contract Type

Two-year contracts reduce churn to 3% vs 42% for month-to-month. This is the single strongest predictor.

Internet Service

Fiber optic users churn more, possibly due to higher expectations and pricing. No internet = lowest churn.

Customer Tenure

New customers (<12 months) churn at 48%. After 4+ years, churn drops to 8%. Early engagement is critical.

Payment Method

Electronic check users churn at 45%. Automatic payments correlate with lower churn (15%).

SHAP Summary Plot

Each dot represents a customer. Color indicates feature value (red = high, blue = low). Position shows impact on churn prediction.


What-If Churn Simulator

Adjust customer attributes and see how churn risk changes in real-time

Customer Profile

Tenure: 0–72 months | Monthly charges: $18–$120 | Current prediction: HIGH RISK

Top Contributing Factors

Recommended Action
Offer a discounted annual contract with bundled online security to reduce churn risk by an estimated 35%.
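Under the hood, a what-if simulator simply re-scores the customer after changing one attribute. A toy sketch of that loop, using a hypothetical logistic scorer as a stand-in for the trained XGBoost model:

```python
import math

def churn_risk(contract_months: int, tenure: int, monthly_charges: float) -> float:
    """Toy logistic scorer; coefficients are illustrative, not the real model's."""
    z = 1.5 - 0.15 * contract_months - 0.03 * tenure + 0.02 * monthly_charges
    return 1 / (1 + math.exp(-z))  # probability of churn

# Re-score the same customer before and after switching to an annual contract.
before = churn_risk(contract_months=1, tenure=3, monthly_charges=95.0)
after = churn_risk(contract_months=12, tenure=3, monthly_charges=95.0)
```

Holding everything else fixed and varying one feature at a time is what makes the simulator's risk changes directly attributable to that feature.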

Segment Analysis

Model performance and churn patterns across customer segments

Churn Rate by Segment

Model F1 Score by Segment

Segment Performance Detail

Segment N Churn Rate Precision Recall F1 FPR

Technical Approach

Data & Features

  • IBM Telco dataset (7,043 customers)
  • 21 raw features → 36 engineered
  • Tenure bins, risk scores, ratios
  • SMOTE oversampling for class balance
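SMOTE balances the classes by interpolating between minority-class neighbors rather than duplicating rows. A minimal NumPy illustration of that interpolation (the pipeline itself uses the imbalanced-learn implementation):

```python
import numpy as np

# What SMOTE does, in miniature: each synthetic minority sample is a random
# point on the segment between a minority sample and one of its k nearest
# minority-class neighbours. Toy data, not the project's features.
rng = np.random.default_rng(42)
minority = rng.normal(loc=1.0, scale=0.2, size=(20, 3))

def smote_like_sample(X, k=5):
    i = rng.integers(len(X))
    x = X[i]
    dists = np.linalg.norm(X - x, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]  # skip x itself
    nb = X[rng.choice(neighbors)]
    u = rng.random()                        # interpolation factor in [0, 1)
    return x + u * (nb - x)

synthetic = np.array([smote_like_sample(minority) for _ in range(10)])
```

Because every synthetic point lies between two real minority samples, the new points stay inside the minority class's region of feature space.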

Model & Training

  • XGBoost (best recall among 4 models)
  • RandomizedSearchCV (20 iterations)
  • 5-fold stratified cross-validation
  • Optimized for recall (cost-sensitive)
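The training recipe above can be sketched in a few lines. This uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost so the example has no extra dependencies, with a smaller illustrative search space and iteration count than the project's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic data with roughly the churn dataset's class balance (~27% positive).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.73, 0.27], random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBClassifier
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.05, 0.1, 0.2],
    },
    n_iter=5,                  # the project used 20 iterations
    scoring="recall",          # cost-sensitive: optimize for recall
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
```

Passing `scoring="recall"` is what makes the search select hyperparameters by the cost-sensitive objective rather than accuracy.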

Explainability

  • SHAP TreeExplainer for XGBoost
  • Global + per-prediction explanations
  • Feature dependence analysis
  • Business-friendly insight generation
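What makes SHAP values trustworthy for per-prediction explanations is their additivity: the base value plus the per-feature contributions reconstructs the model's output exactly. A toy demonstration for a linear model, where the contributions have a closed form (TreeExplainer computes the analogous values for tree ensembles); all coefficients and values here are hypothetical:

```python
# For a linear model f(x) = sum(w_i * x_i), the SHAP value of feature i is
# phi_i = w_i * (x_i - E[x_i]), and the base value is f(E[x]).
weights = [0.8, -0.5, 0.3]   # hypothetical model coefficients
means = [24.0, 65.0, 2.0]    # feature means over the training data
x = [3.0, 95.0, 1.0]         # one customer's feature values

def predict(row):
    return sum(w * v for w, v in zip(weights, row))

base_value = predict(means)  # expected prediction (exact for a linear model)
shap_values = [w * (v - m) for w, v, m in zip(weights, x, means)]

# Additivity: base value + contributions == this customer's prediction.
reconstructed = base_value + sum(shap_values)
```

This additivity is what lets each customer's risk score be decomposed into business-readable "pushes" toward or away from churn.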

Engineering

  • Modular Python pipeline (src/)
  • Comprehensive logging & error handling
  • Type hints + Google-style docstrings
  • Static site + Netlify Functions
Python · XGBoost · scikit-learn · SHAP · Pandas · Plotly · SMOTE · HTML/CSS/JS · Netlify Functions · Gemini API
Model v1.0 | Trained on IBM Telco dataset | XGBoost with 36 engineered features | Last updated Nov 2025

About This Project

This project demonstrates an end-to-end machine learning workflow: from data exploration and feature engineering through model training, evaluation, and deployment as an interactive portfolio piece.

The goal was not just to build a model, but to tell a clear business story: who is churning, why, and what actionable steps reduce risk — backed by rigorous statistical validation and transparent model explanations.

Key Outcomes

93% of at-risk customers correctly identified
$367K estimated annual savings from targeted retention
432% ROI on retention campaigns
4 models compared with statistical validation

Note: Business impact metrics are based on standard industry CLV assumptions applied to the IBM Telco sample dataset.
