Credit Risk Assessment with Machine Learning
To demonstrate how SHAP values aid in interpreting machine learning predictions, we examine credit risk assessment using a synthetic dataset. Our simulated credit applications include standard features like income, age, employment status, and SCHUFA scores1. We train a random forest model to predict loan repayment success and use SHAP values to explain its predictions.
Data Generation
We first import some standard libraries.
import pandas as pd
import numpy as np
Next, we define our data-generating function, which simulates several socioeconomic variables, other plausible determinants of creditworthiness, and the outcome; see Table 1. The credit approval decision (ok) depends on a scoring system that awards up to 40 points for credit score (scr), 20 points for income (inc), and 30 points for employment duration (emp_d), while deducting up to 16 points for debt ratio (debt) and 25 points for existing loans (loans). Additional modifiers are +10 points for civil servants and -10 points for students (emp), and +5 points for homeowners (home). Approval requires a total score above 50 points.
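As a worked example of the scoring rule just described, the following minimal sketch computes the points for a single applicant. The function name credit_points and the applicant's values are hypothetical and chosen only for illustration; the weights follow the description above.

```python
def credit_points(applicant):
    """Total points under the scoring rule described above (illustrative)."""
    points = 0.0
    points += (applicant["scr"] - 50) / 50 * 40   # credit score
    points += applicant["inc"] / 5000 * 20        # monthly income
    points -= applicant["debt"] * 100 * 0.2       # debt-to-income ratio
    points += applicant["emp_d"] / 10 * 10        # employment duration
    points -= applicant["loans"] * 5              # existing loans
    if applicant["emp"] == "civil_servant":
        points += 10
    elif applicant["emp"] == "student":
        points -= 10
    if applicant["home"] == "own":
        points += 5
    return points


# Hypothetical applicant, not drawn from the simulated data
applicant = {"scr": 90, "inc": 3000, "debt": 0.3, "emp_d": 5,
             "loans": 1, "emp": "employed", "home": "rent"}

points = credit_points(applicant)
approved = points > 50
print(f"points: {points:.1f}, approved: {approved}")

# The same applicant as a home-owning civil servant gains 15 points
upgraded = dict(applicant, emp="civil_servant", home="own")
upgraded_points = credit_points(upgraded)
print(f"points: {upgraded_points:.1f}, approved: {upgraded_points > 50}")
```

Under these assumed values the applicant collects 38 points and is rejected, whereas the same profile as a home-owning civil servant reaches 53 points and is approved.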
Feature | Type | Description | Notes |
---|---|---|---|
age | Numeric | Age in years | Mean: 42, Range: 18–75 |
inc | Numeric | Monthly income | Mean: €3,500, Range: €1k–10k |
emp_d | Numeric | Employment duration (years) | Mean: 8, Range: 0–30 |
cred | Numeric | Credit amount requested | Mean: €25k, Range: €5k–100k |
term | Categorical | Loan term (months) | Values: {12, 24, 36, … , 96} |
debt | Numeric | Debt-to-income ratio | Mean: 0.3, Range: 0–0.8 |
loans | Numeric | Number of existing loans | Mean: ~1.5, Max: 5 |
scr | Numeric | Credit score (SCHUFA) | Mean: 95, Range: 50–100 |
emp | Categorical | Employment type | Includes civil servants, etc. |
home | Categorical | Housing situation | Rent, own, shared, with parents |
ok | Binary | Credit approval decision | Based on a weighted scoring |
Table 1: Features in simulated credit approval dataset
The function has the call signature create_modern_credit_data(n_samples, seed). Its definition is given in the code chunk below.
def create_modern_credit_data(n_samples=1000, seed=1337):
    np.random.seed(seed)
    # Numeric features
    data = pd.DataFrame({
        "age": np.random.normal(42, 12, n_samples).clip(18, 75),
        "inc": np.random.normal(3500, 1500, n_samples).clip(1000, 10000),
        "emp_d": np.random.normal(8, 5, n_samples).clip(0, 30),
        "cred": np.random.normal(25000, 15000, n_samples).clip(5000, 100000),
        "term": np.random.choice([12, 24, 36, 48, 60, 72, 84, 96], n_samples),
        "debt": np.random.normal(0.3, 0.15, n_samples).clip(0, 0.8),
        "yrs": np.random.normal(7, 4, n_samples).clip(0, 30),
        "loans": np.random.poisson(1.5, n_samples).clip(0, 5),
        # SCHUFA
        "scr": np.random.normal(95, 15, n_samples).clip(50, 100)
    })
    # Categorical features
    data["emp"] = np.random.choice(
        ["employed", "self_employed",
         "civil_servant", "retiree", "student"],
        n_samples,
        p=[0.6, 0.1, 0.15, 0.1, 0.05],
    )
    data["home"] = np.random.choice(
        ["rent", "own", "shared", "with_parents"],
        n_samples, p=[0.5, 0.3, 0.1, 0.1]
    )
    # Introduce some correlations
    data["scr"] += (data["inc"] - data["inc"].mean()) * 0.004
    data["scr"] -= data["loans"] * 2.5
    data["debt"] += data["loans"] * 0.1

    # Creditworthiness
    def calculate_credit_worthiness(row):
        score = 0
        score += (row["scr"] - 50) / 50 * 40  # max 40 points
        score += (row["inc"] / 5000) * 20  # max 20 points
        score -= (row["debt"] * 100) * 0.2  # max -16 points
        score += (row["emp_d"] / 10) * 10  # max 30 points
        score -= row["loans"] * 5  # max -25 points
        if row["emp"] == "civil_servant":
            score += 10
        elif row["emp"] == "student":
            score -= 10
        if row["home"] == "own":
            score += 5
        return score > 50

    data["ok"] = data.apply(calculate_credit_worthiness, axis=1)
    return data
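Because of the clipping and the later correlation adjustments, the realised feature distributions can differ from the nominal parameters passed to the random number generator. The sketch below illustrates the clipping effect for a normal draw with the same parameters and bounds as scr, before any correlation adjustment. The generator rng, its seed, and the sample size are assumptions made for illustration; the snippet does not reproduce the simulated dataset.

```python
import numpy as np

# Assumption: a fresh generator with an arbitrary seed, independent of the
# np.random.seed(...) call used in create_modern_credit_data().
rng = np.random.default_rng(0)

raw = rng.normal(95, 15, 100_000)   # nominal mean 95, standard deviation 15
clipped = raw.clip(50, 100)         # same bounds as the scr feature

share_at_upper_bound = (clipped == 100).mean()

print(f"mean before clipping: {raw.mean():.2f}")
print(f"mean after clipping:  {clipped.mean():.2f}")
print(f"share at upper bound: {share_at_upper_bound:.2%}")
```

Since the upper bound lies only a third of a standard deviation above the nominal mean, a sizeable share of draws is truncated and the realised mean falls below 95.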
Next, we generate the data and examine relationships between key numerical credit features through a correlation matrix visualization, see Figure 1.
# Generate the data
data = create_modern_credit_data()
numerical_features = [
    "age",
    "inc",
    "scr",
    "debt",
    "loans",
    "cred",
]
import matplotlib.pyplot as plt
import seaborn as sns
# Plot correlation matrix
plt.figure(figsize=(5, 4.5), dpi = 200)
sns.heatmap(data[numerical_features].corr().round(2), annot=True, cmap="coolwarm", center=0)
plt.title("Correlation Matrix of Credit Features")
plt.tight_layout()
plt.show()
The correlation matrix in Figure 1 shows that the data-generating process (DGP) in create_modern_credit_data() induces some plausible dependence between the covariates, e.g., positive correlations between income (inc) and SCHUFA score (scr) as well as between the number of existing loans (loans) and the debt-to-income ratio (debt).
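To see how a single line in the DGP creates such a dependence, the following sketch draws loans and debt independently and then applies the same adjustment as the function (adding 0.1 per existing loan to the debt ratio). The generator, seed, and sample size are assumptions for illustration, so the numbers will not match Figure 1 exactly.

```python
import numpy as np

# Assumption: arbitrary seed and sample size, chosen for illustration only.
rng = np.random.default_rng(1)
n = 10_000

loans = rng.poisson(1.5, n).clip(0, 5)
debt = rng.normal(0.3, 0.15, n).clip(0, 0.8)

# Independent draws: correlation close to zero
corr_before = np.corrcoef(debt, loans)[0, 1]

# Same adjustment as in the data-generating function
debt_adj = debt + loans * 0.1
corr_after = np.corrcoef(debt_adj, loans)[0, 1]

print(f"correlation before adjustment: {corr_before:.2f}")
print(f"correlation after adjustment:  {corr_after:.2f}")
```

The adjustment turns two unrelated variables into clearly positively correlated ones, which is the pattern visible in the heatmap.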
The share of creditworthy individuals is roughly 27%.
# share of creditworthy individuals
print(data["ok"].mean())
0.268
Model Training and Evaluation
To prepare the data for modelling, we apply one-hot encoding to the categorical features and then split the dataset into training and test sets. Using the scikit-learn package, we train a random forest classifier to predict the target variable ok.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
# Define variables for model fitting
categorical_features = ["emp", "home"]
X = pd.get_dummies(
    data.drop("ok", axis=1),
    columns=categorical_features
)
y = data["ok"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2,
    random_state=42,
)
# Train the RF
model = RandomForestClassifier(
    n_estimators=500,
    random_state=1337
)
fit = model.fit(X_train, y_train)
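To illustrate what the one-hot encoding step produces, here is a small sketch on a hypothetical three-row sample. The sample values are invented; only the category labels are taken from the simulated data.

```python
import pandas as pd

# Hypothetical three-row sample using category labels from the simulated data
sample = pd.DataFrame({
    "inc": [3200.0, 4100.0, 1500.0],
    "emp": ["employed", "civil_servant", "student"],
    "home": ["rent", "own", "rent"],
})

encoded = pd.get_dummies(sample, columns=["emp", "home"])
encoded_columns = encoded.columns.tolist()
print(encoded_columns)

# Each applicant has exactly one active indicator per categorical feature
emp_cols = [c for c in encoded.columns if c.startswith("emp_")]
active_per_row = encoded[emp_cols].sum(axis=1).tolist()
print(active_per_row)
```

Note that pd.get_dummies() only creates indicator columns for categories that occur in the data it receives. Encoding the full dataset before splitting, as done above, therefore keeps the training and test columns aligned.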
To assess classification performance, we evaluate the model on the test set and summarise the results using classification_report().
# predictions on test set
y_pred = model.predict(X_test)
# classification metrics
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support
       False       0.96      0.98      0.97       160
        True       0.92      0.82      0.87        40
    accuracy                           0.95       200
   macro avg       0.94      0.90      0.92       200
weighted avg       0.95      0.95      0.95       200
With an accuracy of 0.95 and, for the positive class, a precision of 0.92 and a recall of 0.82, the trained random forest classifier performs well on the test set (at the default 50% probability threshold for classifying an observation as positive).
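For readers who want to see how these metrics arise, the sketch below recomputes them from confusion-matrix counts. The counts are an assumption: the confusion matrix itself is not printed above, and the values were chosen only so that the rounded results agree with the report. The class supports (160 and 40) are taken from the report.

```python
# Illustrative counts (an assumption): the confusion matrix is not shown in
# the text; these values are merely consistent with the rounded report.
tn, fp = 157, 3    # actual False: predicted False / predicted True
fn, tp = 7, 33     # actual True:  predicted False / predicted True
total = tn + fp + fn + tp

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

# Accuracy of always predicting the majority class (False)
baseline_accuracy = (tn + fp) / total

print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"f1-score:  {f1:.3f}")
print(f"accuracy:  {accuracy:.3f}")
print(f"majority-class baseline: {baseline_accuracy:.3f}")
```

Because 160 of the 200 test cases are negative, a classifier that always predicts False would already reach 80% accuracy, so precision and recall for the positive class are the more informative figures here.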
Understanding Predictions with SHAP Values
To better understand how our model’s predictions relate to the covariate values, we’ll use SHAP values. SHAP values provide a unified measure of feature importance that shows how much each feature contributes to pushing a prediction higher or lower. For binary classification problems like ours, SHAP values are calculated for the class membership probabilities. We now obtain SHAP values for the positive class (creditworthy).
import shap
# Create a SHAP TreeExplainer using the trained model
explainer = shap.TreeExplainer(model)
# Compute SHAP values for the test dataset X_test
shap_values = explainer.shap_values(X_test)
# select the positive class SHAP values
shap_values = shap_values[:, :, 1]
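To make the notion of a feature contribution concrete, the following sketch computes exact Shapley values by brute force for a hypothetical three-feature toy model. This is not the algorithm used by TreeExplainer, and it replaces "absent" features by a single baseline point rather than averaging over data; the names shapley_values, toy_model, and value are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value` over feature indices."""
    features = range(n_features)
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                gain = value(set(subset) | {i}) - value(set(subset))
                phi[i] += weight * gain
    return phi


# Hypothetical toy model with an interaction between features 0 and 1
def toy_model(x):
    return 0.2 + 0.1 * x[0] + 0.3 * x[1] + 0.2 * x[0] * x[1] - 0.1 * x[2]


x = [1.0, 1.0, 1.0]          # observation to explain
baseline = [0.0, 0.0, 0.0]   # reference point standing in for "feature absent"


def value(subset):
    # Features in `subset` take their observed value, the rest the baseline
    mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return toy_model(mixed)


phi = shapley_values(value, n_features=3)
base_value = value(set())

print("Shapley values:", [round(p, 3) for p in phi])
print("base + sum:    ", round(base_value + sum(phi), 3))
print("prediction:    ", round(toy_model(x), 3))
```

The interaction term is split evenly between the two features involved, and the base value plus the sum of all contributions recovers the prediction. This additivity is what makes the decompositions in the following plots possible.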
To visualize the feature importance for creditworthiness based on the SHAP values, we generate a beeswarm plot, see Figure 2.
# SHAP summary plot (beeswarm)
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_test, show=False)
plt.title("SHAP Summary: Feature Importance for Creditworthiness")
plt.tight_layout()
plt.show()
The vertical line at x=0 marks a SHAP value of zero, i.e., no shift relative to the model’s baseline (average) prediction. The SHAP values show each feature’s impact across observations: positive values (right) increase the predicted probability, negative values (left) decrease it.
Note how the distributions of SHAP values reveal the key drivers of our model’s creditworthiness predictions: credit score (scr) emerges as the dominant predictor, followed by monthly income (inc), debt-to-income ratio (debt), and number of existing loans (loans). Higher credit scores and incomes consistently boost creditworthiness predictions, as shown by their positive SHAP values and purple-red coloring. Conversely, renter status (home_rent) and civil service employment (emp_civil_servant) typically carry negative SHAP values, suggesting reduced creditworthiness (blue-ish coloring).
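By default, the ordering of features in a beeswarm plot reflects the overall magnitude of their SHAP values. A minimal sketch of such a ranking, using a small hypothetical SHAP matrix (the numbers are invented for illustration and are not taken from the model above):

```python
import numpy as np

# Hypothetical SHAP matrix: 4 observations (rows) x 3 features (columns)
feature_names = ["scr", "inc", "home_rent"]
shap_matrix = np.array([
    [0.12, -0.08, 0.01],
    [-0.20, 0.05, -0.02],
    [0.15, 0.10, 0.01],
    [-0.05, -0.03, -0.02],
])

# Global importance: average magnitude of each feature's contributions
mean_abs = np.abs(shap_matrix).mean(axis=0)
order = np.argsort(mean_abs)[::-1]
ranking = [feature_names[idx] for idx in order]

for idx in order:
    print(f"{feature_names[idx]:>10}: {mean_abs[idx]:.3f}")
```

Taking absolute values matters: a feature whose contributions are large but of mixed sign would otherwise average out to roughly zero and appear unimportant.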
Analyzing Individual Credit Decisions
When a loan applicant gets rejected by our model, they deserve to understand why. SHAP values serve that purpose. Let’s look at person #18 in our test set as an example. Her predicted probability of creditworthiness is 44%, and so we would reject the application with a decision rule based on a 50% threshold.2 Using SHAP values, we can break down exactly how each covariate contributed to this rejection, starting from the base prediction rate of the random forest on the training data (28.5%) and showing how each feature pushes the predicted probability up or down. The waterfall plot in Figure 3 visualizes this decomposition.
Figure 3 shows the 10 features with the largest impact on the prediction for test observation 18. The excellent credit score provides the strongest positive contribution, increasing the creditworthiness probability by about 11 percentage points. However, the below-average monthly income lowers the prediction by about 8 percentage points. Positive factors such as stable employment, only one outstanding loan, and a manageable debt-to-income ratio are not enough to overcome the effect of the low income.
# test observation data
observation = X_test.iloc[[18]]
# SHAP values for this specific observation
shap_values = explainer.shap_values(observation)
# SHAP values of the positive class
shap_values_positive = shap_values[0, :, 1]
fig = plt.figure()
# Create the waterfall plot
shap.plots.waterfall(
    shap.Explanation(
        values=shap_values_positive,
        base_values=explainer.expected_value[1],
        data=observation.values[0],
        feature_names=observation.columns,
    ),
    show=False
)
plt.gcf().set_size_inches(8, 4.5)
plt.show()
Let’s create a function to output individual credit decisions along with a SHAP value analysis for a single observation…
def analyze_single_case(obs):
    # Predicted probability of class 0 (ok == False), read as default probability
    predicted_prob = model.predict_proba(obs)[0, 0]
    # SHAP values for class 0, i.e. contributions to the default probability
    shap_values = explainer.shap_values(obs)
    shap_values_default = shap_values[0, :, 0]
    shap_df = pd.DataFrame({
        "value": obs.iloc[0],
        "shap value": shap_values_default
    }).sort_values(by="shap value", ascending=False)
    print(
        f"Decision: {'reject' if predicted_prob >= 0.5 else 'accept'}\n"
    )
    print(f"Predicted Default Probability: {predicted_prob:.4f}")
    print("\nVariable importance for default probability prediction:")
    print(shap_df)
… and run it on the test observation:
analyze_single_case(observation)
Decision: reject

Predicted Default Probability: 0.5600

Variable importance for default probability prediction:
                        value  shap value
inc                3223.34577    0.077640
emp_civil_servant       False    0.024718
cred                   5000.0    0.019227
age                 29.709429    0.004882
emp_employed             True    0.004744
term                       60    0.001934
emp_retiree             False    0.001312
home_with_parents       False    0.000925
emp_self_employed       False   -0.001133
yrs                       0.0   -0.001504
emp_student             False   -0.002198
home_shared             False   -0.003228
home_rent               False   -0.015577
debt                 0.361093   -0.029929
loans                       1   -0.031195
home_own                 True   -0.043133
emp_d                9.237825   -0.048764
scr                 95.956185   -0.113474
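The printed values can be used to check the additivity of SHAP values by hand: the base value plus the sum of all contributions should equal the predicted probability. The sketch below copies the SHAP values and the predicted default probability from the output above and backs out the implied base value. It assumes that additivity holds exactly and ignores the rounding of the printed numbers.

```python
# SHAP values for the default class, copied from the output above
shap_default = {
    "inc": 0.077640, "emp_civil_servant": 0.024718, "cred": 0.019227,
    "age": 0.004882, "emp_employed": 0.004744, "term": 0.001934,
    "emp_retiree": 0.001312, "home_with_parents": 0.000925,
    "emp_self_employed": -0.001133, "yrs": -0.001504,
    "emp_student": -0.002198, "home_shared": -0.003228,
    "home_rent": -0.015577, "debt": -0.029929, "loans": -0.031195,
    "home_own": -0.043133, "emp_d": -0.048764, "scr": -0.113474,
}
predicted_default = 0.5600   # printed predicted default probability

total = sum(shap_default.values())

# Additivity: prediction = base value + sum of SHAP values
implied_base_default = predicted_default - total
implied_base_creditworthy = 1 - implied_base_default

print(f"sum of SHAP values:               {total:.6f}")
print(f"implied base value (default):      {implied_base_default:.4f}")
print(f"implied base value (creditworthy): {implied_base_creditworthy:.4f}")
```

The implied base value for the creditworthy class is about 0.285, in line with the 28.5% base prediction rate mentioned earlier. Flipping the signs gives the contributions to the creditworthiness probability, e.g. roughly +0.11 for scr and -0.08 for inc.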
Conclusion
Our analysis demonstrates how SHAP values transform “black box” machine learning models into interpretable decision-making tools for credit assessment. By decomposing individual predictions into feature-level contributions, SHAP values enable lenders to explain credit decisions clearly. This is especially valuable when regulatory requirements call for models that are not only accurate but also interpretable.