Insurance Pricing Forecast Using XGBoost Regressor

This project builds an XGBoost Regressor to predict healthcare costs, giving insurers a data-driven basis for pricing policies profitably. We'll compare it with a linear regression baseline and learn to communicate the results effectively to non-technical stakeholders.
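The baseline comparison can be illustrated with a minimal pure-Python sketch. It is not the project's code: the records, the cost rule, and the helpers `encode_features`, `solve`, `fit_ols`, `predict`, and `rmse` are all hypothetical. The sketch fits ordinary least squares through the normal equations and shows that when costs contain a smoker × BMI interaction, a purely linear model is left with a nonzero RMSE. That residual gap is the kind of structure a tree ensemble such as XGBoost is meant to capture.

```python
import math

def encode_features(rows):
    """Turn hypothetical (age, bmi, smoker) records into numeric feature rows.

    A leading 1.0 lets the model learn an intercept; 'smoker' becomes 1.0 for
    "yes" and 0.0 otherwise (a simple binary encoding of a categorical column).
    """
    return [[1.0, float(age), float(bmi), 1.0 if smoker == "yes" else 0.0]
            for age, bmi, smoker in rows]

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, X):
    """Linear prediction: dot product of the weights with each feature row."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def rmse(y_true, y_pred):
    """Root-mean-square error between actual and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical (age, bmi, smoker) records, made up for illustration only.
records = [(19, 27.9, "yes"), (18, 33.8, "no"), (28, 33.0, "no"), (33, 22.7, "no"),
           (32, 28.9, "no"), (31, 25.7, "no"), (46, 33.4, "no"), (37, 27.7, "no"),
           (60, 25.8, "yes"), (25, 26.2, "no"), (52, 30.8, "yes"), (23, 34.4, "no")]
X = encode_features(records)
# Made-up cost rule with a smoker x BMI interaction that a linear model cannot represent.
y = [250 * age + 300 * bmi + (800 * bmi if s == "yes" else 0) for age, bmi, s in records]
w = fit_ols(X, y)
baseline_rmse = rmse(y, predict(w, X))
print(f"Baseline RMSE: {baseline_rmse:.2f}")
```

In practice a library solver such as scikit-learn's `LinearRegression` would normally replace the hand-written elimination; the point here is only to show what the baseline computes and why its RMSE cannot reach zero on interaction-driven costs.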


Project Outcomes

  • Performed Exploratory Data Analysis (EDA) to understand the distribution and relationships within the data.
  • Handled missing values, encoded categorical variables, and scaled numerical features.
  • Identified significant correlations between healthcare costs and features such as age, BMI, and smoking status.
  • Implemented and evaluated a Linear Regression model.
  • Validated assumptions of Linear Regression, including linearity, independence, homoscedasticity, and normality of residuals.
  • Established the Linear Regression RMSE as a baseline benchmark for comparison.
  • Developed an XGBoost Regressor and tuned its hyperparameters with BayesSearchCV (Bayesian optimization from scikit-optimize).
  • Built a scikit-learn Pipeline to chain preprocessing and model training into a single workflow.
  • Achieved a substantially lower RMSE than the Linear Regression model, demonstrating XGBoost's ability to handle non-linear relationships and complex feature interactions.
  • Evaluated both models using RMSE and other regression metrics (MAE, R²).
  • XGBoost outperformed the baseline with a lower RMSE and a higher R², indicating a better fit and greater predictive accuracy.
  • Translated technical metrics into understandable insights for non-technical stakeholders.
  • Provided visualizations to illustrate model performance and feature importance, aiding stakeholders in understanding key drivers of healthcare costs.
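The evaluation and communication steps above can be sketched with a few lines of standard-library Python. The numbers are made up, and the helper names `regression_report` and `stakeholder_summary`, along with the sentence template, are hypothetical; the sketch only shows how RMSE, MAE, and R² are computed and how they might be restated in dollars and percentages for a non-technical audience.

```python
import math

def regression_report(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired lists of actual and predicted costs."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)                  # sum of squared errors
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total variation around the mean
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - sse / ss_tot,
    }

def stakeholder_summary(name, report):
    """Restate the metrics in plain language (hypothetical wording template)."""
    return (f"{name}: predictions miss actual costs by ${report['mae']:,.0f} per "
            f"policyholder on average (RMSE ${report['rmse']:,.0f}), and the model "
            f"explains {report['r2']:.0%} of the variation in costs.")

# Made-up actual costs and predictions from two hypothetical models.
actual = [1000, 2000, 3000, 4000]
baseline_pred = [1500, 1500, 3500, 3500]
boosted_pred = [1100, 1900, 3100, 3900]

base = regression_report(actual, baseline_pred)
boosted = regression_report(actual, boosted_pred)
print(stakeholder_summary("Linear baseline", base))
print(stakeholder_summary("XGBoost", boosted))
print(f"RMSE reduction vs. baseline: {1 - boosted['rmse'] / base['rmse']:.0%}")
```

Reporting MAE alongside RMSE gives stakeholders a plain "average miss" figure, while RMSE, which squares each error before averaging, signals whether a few policyholders are badly mispriced.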
