This project implements a credit risk modeling solution using logistic regression to predict loan default risk. It utilizes customer, loan, and credit bureau data to build a predictive model, incorporating data preprocessing, feature engineering, model training, and evaluation.
- Notebook:
  - `credit_risk_model_codebasics.ipynb`: the main Jupyter notebook containing the code for data loading, preprocessing, modeling, and evaluation.
- Dataset:
  - `dataset/customers.csv`: customer demographic information (e.g., age, gender, income).
  - `dataset/loans.csv`: loan details (e.g., loan amount, tenure, default status).
  - `dataset/bureau_data.csv`: credit bureau data (e.g., open accounts, credit utilization).
- Artifacts:
  - `artifacts/model_data.joblib`: saved model file containing the trained logistic regression model, feature names, scaler, and columns to scale.
To run this project, ensure you have the following Python libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- joblib
You can install them using pip:

```shell
pip install pandas numpy matplotlib seaborn scikit-learn joblib
```

The dataset consists of three CSV files:
- `customers.csv`: customer details such as:
  - `cust_id`: unique customer identifier
  - `age`, `gender`, `marital_status`, `employment_status`, `income`, etc.
- `loans.csv`: loan details such as:
  - `loan_id`, `cust_id`, `loan_purpose`, `loan_type`, `sanction_amount`, `default`, etc.
- `bureau_data.csv`: credit bureau data such as:
  - `cust_id`, `number_of_open_accounts`, `credit_utilization_ratio`, `delinquent_months`, etc.
Each dataset contains 50,000 records, and the three are merged on `cust_id` for analysis.
- Data Loading and Merging:
  - Load the three datasets using pandas.
  - Merge `customers.csv` and `loans.csv` on `cust_id`, then merge the result with `bureau_data.csv` to create a unified dataset.
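The load-and-merge step can be sketched as follows. The tiny in-memory frames below are illustrative stand-ins for the real files, which the notebook loads from the `dataset/` directory with `pd.read_csv`:

```python
import pandas as pd

# Illustrative stand-ins for dataset/customers.csv, dataset/loans.csv,
# and dataset/bureau_data.csv (each real file has 50,000 rows)
customers = pd.DataFrame({"cust_id": [1, 2], "age": [35, 42], "income": [50_000, 72_000]})
loans = pd.DataFrame({"loan_id": [10, 11], "cust_id": [1, 2],
                      "sanction_amount": [20_000, 15_000], "default": [False, True]})
bureau = pd.DataFrame({"cust_id": [1, 2], "number_of_open_accounts": [3, 5]})

# customers + loans on cust_id, then the result + bureau data on cust_id
df = customers.merge(loans, on="cust_id").merge(bureau, on="cust_id")
print(df.shape)
```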
- Feature Engineering:
  - Create new features such as `loan_to_income` and `avg_dpd_per_delinquency`.
  - Encode categorical variables (e.g., `residence_type`, `loan_purpose`, `loan_type`) using one-hot encoding.
  - Scale numerical features using a scaler (saved in `model_data.joblib`).
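A minimal sketch of these steps on a toy frame. The exact scaler the notebook uses is stored inside `model_data.joblib`; `MinMaxScaler` here is an assumption, not confirmed by the notebook:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"sanction_amount": [20_000, 15_000],
                   "income": [50_000, 75_000],
                   "loan_purpose": ["Home", "Auto"]})

# Ratio feature: loan amount relative to income
df["loan_to_income"] = df["sanction_amount"] / df["income"]

# One-hot encode a categorical column
df = pd.get_dummies(df, columns=["loan_purpose"], drop_first=True)

# Scale the numeric columns; keep the fitted scaler and column list for reuse
cols_to_scale = ["sanction_amount", "income", "loan_to_income"]
scaler = MinMaxScaler()
df[cols_to_scale] = scaler.fit_transform(df[cols_to_scale])
```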
- Model Training:
  - Use logistic regression to predict the `default` column (True/False).
  - Split the data into training and testing sets using `train_test_split`.
  - Train the model and evaluate feature importance based on the model coefficients.
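The training step might look like the sketch below. The synthetic feature matrix and the rule generating the toy labels are illustrative, not the project's data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged, preprocessed feature matrix
rng = np.random.default_rng(42)
X = pd.DataFrame({"loan_to_income": rng.uniform(0, 1, 200),
                  "credit_utilization_ratio": rng.uniform(0, 1, 200)})
y = (X["credit_utilization_ratio"] > 0.7).astype(int)  # toy default label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

# "Feature importance" read off the fitted coefficients
importance = pd.Series(model.coef_[0], index=X.columns)
print(importance.sort_values(ascending=False))
print("test accuracy:", model.score(X_test, y_test))
```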
- Model Saving:
  - Save the trained model, feature names, scaler, and columns to scale in `artifacts/model_data.joblib`.
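Bundling all of these into one joblib file could be done as below; the dictionary key names are an assumed layout, not confirmed by the notebook:

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Toy model and scaler standing in for the real trained objects
X = np.array([[0.1, 0.2], [0.8, 0.9], [0.3, 0.4], [0.9, 0.7]])
y = np.array([0, 1, 0, 1])
scaler = MinMaxScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

# Bundle everything needed to reproduce preprocessing at prediction time
model_data = {
    "model": model,
    "features": ["loan_to_income", "credit_utilization_ratio"],  # assumed key names
    "scaler": scaler,
    "cols_to_scale": ["loan_to_income", "credit_utilization_ratio"],
}
joblib.dump(model_data, "model_data.joblib")  # the notebook writes to artifacts/

loaded = joblib.load("model_data.joblib")
```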
- Clone the repository or download the project files.
- Ensure the dataset files are in the `dataset/` directory.
- Open and run the `credit_risk_model_codebasics.ipynb` notebook in a Jupyter environment.
- The notebook will:
  - Load and preprocess the data.
  - Train the logistic regression model.
  - Display feature importance using a bar plot.
  - Save the model to `artifacts/model_data.joblib`.
- The logistic regression model is trained to predict loan defaults.
- Feature importance is visualized to show which features (e.g., `credit_utilization_ratio`, `loan_to_income`) most influence the prediction.
- The model and preprocessing components are saved for future use.
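Reusing the saved components to score a new applicant could look like the sketch below. The artifact built here is a toy; the dictionary keys and feature names are assumptions about the saved layout:

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Build a toy artifact in the assumed layout
# (in practice you would load the saved artifacts/model_data.joblib directly)
train = pd.DataFrame({"loan_to_income": [0.1, 0.8, 0.3, 0.9],
                      "credit_utilization_ratio": [0.2, 0.9, 0.4, 0.7]})
labels = [0, 1, 0, 1]
scaler = MinMaxScaler().fit(train)
model = LogisticRegression().fit(scaler.transform(train), labels)
joblib.dump({"model": model, "scaler": scaler, "features": list(train.columns)},
            "model_data.joblib")

# Later: load the bundle and score a new applicant with the same preprocessing
data = joblib.load("model_data.joblib")
new_row = pd.DataFrame({"loan_to_income": [0.5], "credit_utilization_ratio": [0.95]})
scaled = data["scaler"].transform(new_row[data["features"]])
prob_default = data["model"].predict_proba(scaled)[0, 1]
print(f"probability of default: {prob_default:.3f}")
```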
- Experiment with other algorithms (e.g., Random Forest, XGBoost) for better performance.
- Perform hyperparameter tuning to optimize the logistic regression model.
- Add cross-validation to ensure robust model evaluation.
- Include additional feature engineering to capture more complex patterns.
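Of these, cross-validation is the smallest change; a sketch with scikit-learn on synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 1] > 0.6).astype(int)  # toy default label

# 5-fold CV (stratified by default for classifiers) instead of a single split
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```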