Insurance Pricing Forecast Using XGBoost Regressor
This project builds an XGBoost Regressor to predict healthcare costs, which supports profitable insurance pricing. It compares the model against a Linear Regression baseline and shows how to communicate the results effectively to non-technical stakeholders.

Project Outcomes
- Performed Exploratory Data Analysis (EDA) to understand the distribution and relationships within the data.
- Handled missing values, encoded categorical variables, and scaled numerical features.
- Found significant correlations between healthcare costs and features such as age, BMI, and smoking status.
- Implemented and evaluated a Linear Regression model.
- Validated assumptions of Linear Regression, including linearity, independence, homoscedasticity, and normality of residuals.
- Recorded the Linear Regression RMSE as the baseline against which the XGBoost model is compared.
- Developed and optimized an XGBoost Regressor model using BayesSearchCV for hyperparameter tuning.
- Built a pipeline using scikit-learn's Pipeline class to streamline preprocessing and model training.
- Significantly improved RMSE compared to the Linear Regression model, demonstrating the effectiveness of XGBoost in handling complex interactions and non-linear relationships.
- Evaluated both models using RMSE and other regression metrics (MAE, R^2).
- The XGBoost model showed superior performance with a lower RMSE and higher R^2, indicating better fit and predictive accuracy.
- Translated technical metrics into understandable insights for non-technical stakeholders.
- Provided visualizations to illustrate model performance and feature importance, aiding stakeholders in understanding key drivers of healthcare costs.
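
To make the evaluation metrics named above concrete, here is a small, dependency-free sketch of how RMSE, MAE, and R^2 are computed and how they might be phrased for a non-technical audience. The cost figures, the function names, and the wording of the summary are made up for illustration and do not come from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }


def plain_language_summary(model_name, metrics):
    """Phrase the metrics as a sentence a non-technical reader can follow."""
    return (
        f"{model_name}: predictions are off by about ${metrics['mae']:,.0f} "
        f"per policyholder on average, and the model accounts for "
        f"{metrics['r2']:.0%} of the variation in costs."
    )


if __name__ == "__main__":
    # Made-up annual costs and predictions, in dollars.
    actual = [1000, 2000, 3000, 4000]
    predicted = [1100, 1900, 3200, 3800]
    metrics = regression_metrics(actual, predicted)
    print(plain_language_summary("Example model", metrics))
```

MAE is often the easiest figure to present to stakeholders because it is in dollars and is not inflated by a few large misses, while RMSE penalizes large errors more heavily and is the metric the models are compared on.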