Credit Risk Assessment with Machine Learning

To demonstrate how SHAP values aid in interpreting machine learning predictions, we examine credit risk assessment using a synthetic dataset. Our simulated credit applications include standard features like income, age, employment status, and SCHUFA scores [1]. We train a random forest model to predict loan repayment success and use SHAP values to explain its predictions.

Data Generation

We first import some standard libraries.

import pandas as pd
import numpy as np

Next, we define our data-generating function, which simulates several socioeconomic variables and other plausible determinants of creditworthiness together with the outcome; see Table 1. The credit approval decision (ok) depends on a scoring system that awards up to 40 points for credit score (scr), 20 points per €5,000 of monthly income (inc), and up to 30 points for employment duration (emp_d), while deducting 16 points at a debt ratio (debt) of 0.8 and up to 25 points for existing loans (loans). Additional modifiers are +10 points for civil servants and -10 for students (emp), and +5 for homeowners (home). The caps are approximate because the correlation adjustments in the function can push scr and debt outside their nominal ranges. Approval requires a total score above 50 points.

| Feature | Type | Description | Notes |
|---|---|---|---|
| age | Numeric | Age in years | Mean: 42, Range: 18–75 |
| inc | Numeric | Monthly income | Mean: €3,500, Range: €1k–10k |
| emp_d | Numeric | Employment duration (years) | Mean: 8, Range: 0–30 |
| cred | Numeric | Credit amount requested | Mean: €25k, Range: €5k–100k |
| term | Categorical | Loan term (months) | Values: {12, 24, 36, … , 96} |
| debt | Numeric | Debt-to-income ratio | Mean: 0.3, Range: 0–0.8 |
| loans | Numeric | Number of existing loans | Mean: ~1.5, Max: 5 |
| scr | Numeric | Credit score (SCHUFA) | Mean: 95, Range: 50–100 |
| emp | Categorical | Employment type | Includes civil servants, etc. |
| home | Categorical | Housing situation | Rent, own, shared, with parents |
| ok | Binary | Credit approval decision | Based on a weighted scoring |

Table 1: Features in simulated credit approval dataset

The function has the call signature create_modern_credit_data(n_samples, seed); its definition is given in the code chunk below.

def create_modern_credit_data(n_samples=1000, seed=1337):
    np.random.seed(seed)

    # numeric features
    data = pd.DataFrame({
        "age": np.random.normal(42, 12, n_samples).clip(18, 75),
        "inc": np.random.normal(3500, 1500, n_samples).clip(1000, 10000),
        "emp_d": np.random.normal(8, 5, n_samples).clip(0, 30),
        "cred": np.random.normal(25000, 15000, n_samples).clip(5000, 100000),
        "term": np.random.choice([12, 24, 36, 48, 60, 72, 84, 96], n_samples),
        "debt": np.random.normal(0.3, 0.15, n_samples).clip(0, 0.8),
        "yrs": np.random.normal(7, 4, n_samples).clip(0, 30),
        "loans": np.random.poisson(1.5, n_samples).clip(0, 5),
        #SCHUFA
        "scr": np.random.normal(95, 15, n_samples).clip(50, 100)
    })

    # Categorical features
    data["emp"] = np.random.choice(
        ["employed", "self_employed", 
        "civil_servant", "retiree", "student"],
        n_samples,
        p=[0.6, 0.1, 0.15, 0.1, 0.05],
    )

    data["home"] = np.random.choice(
        ["rent", "own", "shared", "with_parents"], 
        n_samples, p=[0.5, 0.3, 0.1, 0.1]
    )

    # Introduce some correlations
    data["scr"] += (data["inc"] - data["inc"].mean()) * 0.004
    data["scr"] -= data["loans"] * 2.5
    data["debt"] += data["loans"] * 0.1

    # creditworthiness
    def calculate_credit_worthiness(row):
        score = 0
        score += (row["scr"] - 50) / 50 * 40  # 40 points at scr = 100
        score += (row["inc"] / 5000) * 20  # 20 points per 5,000 of income (40 at the 10,000 cap)
        score -= (row["debt"] * 100) * 0.2  # -16 points at debt = 0.8
        score += (row["emp_d"] / 10) * 10  # 1 point per year, max 30 points
        score -= row["loans"] * 5  # max -25 points

        if row["emp"] == "civil_servant":
            score += 10
        elif row["emp"] == "student":
            score -= 10

        if row["home"] == "own":
            score += 5

        return score > 50

    data["ok"] = data.apply(calculate_credit_worthiness, axis=1)

    return data
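
The row-wise scoring rule can also be written with column operations. The following is a minimal sketch of that alternative, not part of the original function: the helper name credit_points and the two made-up applicants are assumptions for illustration, while the point weights are copied from calculate_credit_worthiness().

```python
import numpy as np
import pandas as pd


def credit_points(df):
    """Vectorised point total per applicant (hypothetical helper, same weights as above)."""
    points = (
        (df["scr"] - 50) / 50 * 40      # credit score
        + df["inc"] / 5000 * 20         # monthly income
        - df["debt"] * 100 * 0.2        # debt-to-income ratio
        + df["emp_d"] / 10 * 10         # employment duration
        - df["loans"] * 5               # existing loans
    )
    # Categorical modifiers
    points = points + np.where(df["emp"] == "civil_servant", 10, 0)
    points = points - np.where(df["emp"] == "student", 10, 0)
    points = points + np.where(df["home"] == "own", 5, 0)
    return points


# Two made-up applicants for illustration only
toy = pd.DataFrame({
    "scr": [100.0, 75.0],
    "inc": [5000.0, 2500.0],
    "debt": [0.25, 0.5],
    "emp_d": [10.0, 2.0],
    "loans": [0, 2],
    "emp": ["civil_servant", "student"],
    "home": ["own", "rent"],
})

points = credit_points(toy)
approved = points > 50
print(points.round(1).tolist(), approved.tolist())
```

The first applicant collects 40 + 20 - 5 + 10 + 10 + 5 = 80 points and is approved; the second ends up at 20 + 10 - 10 + 2 - 10 - 10 = 2 points and is rejected.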

Next, we generate the data and examine relationships between key numerical credit features through a correlation matrix visualization, see Figure 1.

# Generate the data
data = create_modern_credit_data()

numerical_features = [
    "age",
    "inc",
    "scr",
    "debt",
    "loans",
    "cred",
]

import matplotlib.pyplot as plt
import seaborn as sns

# Plot correlation matrix
plt.figure(figsize=(5, 4.5), dpi = 200)
sns.heatmap(data[numerical_features].corr().round(2), annot=True, cmap="coolwarm", center=0)

plt.title("Correlation Matrix of Credit Features")
plt.tight_layout()
plt.show()

Figure 1: Correlation Matrix of Credit Features

The correlation matrix in Figure 1 shows that the data-generating process (DGP) in create_modern_credit_data() produces plausible dependence between the covariates, e.g., positive correlations between income (inc) and SCHUFA score (scr) as well as between the number of existing loans (loans) and the debt-to-income ratio (debt).
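
To see where such a dependence comes from, the loans/debt adjustment can be replayed in isolation. This is a rough sketch under assumptions: it uses a separate random generator with an arbitrary seed and a larger sample than the article, so the resulting correlation will not match the value in Figure 1 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, not the article's np.random.seed(1337)
n = 100_000                     # larger sample than in the article, for a stable estimate

loans = rng.poisson(1.5, n).clip(0, 5)
debt_raw = rng.normal(0.3, 0.15, n).clip(0, 0.8)

# Same adjustment as in create_modern_credit_data()
debt = debt_raw + loans * 0.1

corr_before = np.corrcoef(loans, debt_raw)[0, 1]
corr_after = np.corrcoef(loans, debt)[0, 1]
print(f"before adjustment: {corr_before:.2f}, after adjustment: {corr_after:.2f}")
```

Before the adjustment the two variables are drawn independently, so their sample correlation is close to zero; adding 0.1 per existing loan is what induces the clearly positive correlation.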

The share of creditworthy individuals is roughly 27%.

# share of creditworthy individuals
print(data["ok"].mean())
0.268

Model Training and Evaluation

As preprocessing, we apply one-hot encoding to the categorical features and then split the dataset into training and test sets. Using the scikit-learn package, we train a random forest classifier to predict the target variable ok.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Define variables for model fitting
categorical_features = ["emp", "home"]
X = pd.get_dummies(
    data.drop("ok", axis=1), 
    columns=categorical_features
)
y = data["ok"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, 
    random_state=42,
)

# Train the RF
model = RandomForestClassifier(
    n_estimators=500, 
    random_state=1337
)
fit = model.fit(X_train, y_train)
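
The column names that appear in the SHAP plots further below (for example emp_civil_servant or home_rent) are produced by the one-hot encoding step. A small sketch on a made-up three-row frame shows the naming pattern; the toy data is an assumption for illustration and not part of the simulated dataset.

```python
import pandas as pd

# Made-up mini dataset with one numeric and two categorical columns
toy = pd.DataFrame({
    "inc": [3200.0, 4100.0, 2500.0],
    "emp": ["employed", "civil_servant", "student"],
    "home": ["own", "rent", "rent"],
})

# Each category becomes its own indicator column named <feature>_<category>
encoded = pd.get_dummies(toy, columns=["emp", "home"])
print(encoded.columns.tolist())

# Exactly one indicator per original feature is set in every row
emp_indicators_per_row = encoded.filter(like="emp_").sum(axis=1)
print(emp_indicators_per_row.tolist())
```

Untouched columns such as inc keep their place at the front, followed by the indicator columns in alphabetical order of their categories.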

To get some metrics on classification performance, we evaluate the model on the test set and summarise the results using classification_report().

# predictions on test set 
y_pred = model.predict(X_test)

# classification metrics
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       False       0.96      0.98      0.97       160
        True       0.92      0.82      0.87        40

    accuracy                           0.95       200
   macro avg       0.94      0.90      0.92       200
weighted avg       0.95      0.95      0.95       200

With high accuracy (0.95) and good precision (0.92) and recall (0.82) for the positive class, the trained random forest classifier performs well on the test set at the default 50% probability threshold for classifying an applicant as creditworthy.
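
The rounded figures in the report can be traced back to confusion counts. The counts below are inferred from the supports (160 and 40) and the rounded metrics; they are an assumption for illustration, not output of the original code, although confusion_matrix(y_test, y_pred) could be used to confirm them.

```python
# Confusion counts inferred from the rounded report (assumption, not source output)
tn, fp = 157, 3   # true class False: 160 test cases
fn, tp = 7, 33    # true class True: 40 test cases

precision_true = tp / (tp + fp)
recall_true = tp / (tp + fn)
f1_true = 2 * precision_true * recall_true / (precision_true + recall_true)

precision_false = tn / (tn + fn)
recall_false = tn / (tn + fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"True : precision={precision_true:.3f} recall={recall_true:.3f} f1={f1_true:.3f}")
print(f"False: precision={precision_false:.3f} recall={recall_false:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Under these counts, 7 of the 40 creditworthy applicants in the test set are rejected, which is the main source of error at the 50% threshold.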

Understanding Predictions with SHAP Values

To better understand how our model’s predictions relate to the covariate values, we use SHAP values. SHAP values provide a unified measure of feature importance that shows how much each feature contributes to pushing a prediction higher or lower. For binary classification problems like ours, SHAP values are calculated for class membership probabilities. We now obtain SHAP values for the positive class (creditworthy).

import shap
# Create a SHAP TreeExplainer using the trained model
explainer = shap.TreeExplainer(model)
# Compute SHAP values for the test dataset X_test
shap_values = explainer.shap_values(X_test)

# select the positive class SHAP values
shap_values = shap_values[:, :, 1]
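
The indexing shap_values[:, :, 1] relies on shap_values() returning a single array of shape (samples, features, classes). To our knowledge, older shap releases returned a list with one array per class instead, so the following defensive helper may be useful. It is a sketch: select_class is a hypothetical name, and the arrays below are random stand-ins rather than real SHAP output.

```python
import numpy as np


def select_class(shap_values, class_index):
    """Return an (n_samples, n_features) array for one class (hypothetical helper)."""
    if isinstance(shap_values, list):
        # List layout: one (n_samples, n_features) array per class
        return np.asarray(shap_values[class_index])
    values = np.asarray(shap_values)
    if values.ndim == 3:
        # Array layout: (n_samples, n_features, n_classes)
        return values[:, :, class_index]
    # Already two-dimensional, e.g. for a single-output model
    return values


# Random stand-ins (not real SHAP output) to show that both layouts agree
rng = np.random.default_rng(0)
stacked = rng.normal(size=(4, 3, 2))
as_list = [stacked[:, :, 0], stacked[:, :, 1]]

from_array = select_class(stacked, 1)
from_list = select_class(as_list, 1)
print(from_array.shape, np.array_equal(from_array, from_list))
```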

To visualize the feature importance for creditworthiness based on the SHAP values, we generate a beeswarm plot, see Figure 2.

# SHAP summary plot (beeswarm)
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_test, show=False)
plt.title("SHAP Summary: Feature Importance for Creditworthiness")
plt.tight_layout()
plt.show()

Figure 2: SHAP values: Beeswarm Plot

The vertical line at x=0 marks a SHAP value of zero, i.e. no contribution relative to the explainer’s base value (the average predicted probability from which every explanation starts). The SHAP values show each feature’s impact across observations: positive values (right) increase predictions, negative values (left) decrease them.
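
The idea of a base value plus signed per-feature contributions can be made concrete with an exact Shapley computation on a tiny example. This sketch is a simplification and not what TreeExplainer does internally: it uses a hypothetical three-feature function (toy_model) and replaces "absent" features by a single baseline input instead of averaging over data.

```python
from itertools import combinations
from math import factorial


def toy_model(z):
    # Hypothetical scoring function with one interaction term (not the random forest)
    return 0.25 + 0.2 * z[0] - 0.1 * z[1] + 0.1 * z[0] * z[2]


baseline = [0.0, 0.0, 0.0]  # reference input that defines the base value
x = [1.0, 1.0, 1.0]         # observation to explain


def value(coalition):
    # Features in the coalition take the observation's value, the rest stay at baseline
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(z)


def shapley_values(n):
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


phi = shapley_values(len(x))
base_value = value(set())
print("base value:", base_value)
print("contributions:", [round(p, 3) for p in phi])
print("base + contributions:", round(base_value + sum(phi), 3))
print("model prediction:", round(toy_model(x), 3))
```

The contributions add up to the difference between the prediction and the base value, and the interaction term is split evenly between the two features involved. The same additivity is what the beeswarm and waterfall plots rely on.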

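Note how the distributions of SHAP values reveal the key drivers of our model’s creditworthiness predictions. Credit score (scr) emerges as the dominant predictor, followed by monthly income (inc), debt-to-income ratio (debt), and the number of existing loans (loans). Higher credit scores and incomes consistently boost creditworthiness predictions, as shown by the purple-red points on the positive side of the axis. Conversely, the dummy variables for renter status (home_rent) and civil service employment (emp_civil_servant) mostly carry small negative SHAP values in blue-ish coloring. Since the color encodes the feature value (blue for low, i.e. False for a dummy), these points need to be read with care: for emp_civil_servant, for example, they indicate that not being a civil servant slightly lowers the predicted creditworthiness, in line with the +10 bonus for civil servants in the scoring rule.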

Analyzing Individual Credit Decisions

When a loan applicant gets rejected by our model, they deserve to understand why, and SHAP values serve that purpose. Let’s look at person #18 in our test set as an example. Her predicted probability of creditworthiness is 44%, so a decision rule based on a 50% threshold rejects the application [2]. Using SHAP values, we can break down exactly how each covariate contributed to this rejection, starting from the base value of the random forest (28.5%, its average prediction on the training data) and showing how each feature pushes the predicted probability up or down. The waterfall plot in Figure 3 visualizes this decomposition.

# test observation data
observation = X_test.iloc[[18]]

# SHAP values for this specific observation
shap_values = explainer.shap_values(observation)

# SHAP values of the positive class
shap_values_positive = shap_values[0, :, 1]
fig = plt.figure()

# Create the waterfall plot
shap.plots.waterfall(
    shap.Explanation(
        values=shap_values_positive,
        base_values=explainer.expected_value[1],
        data=observation.values[0],
        feature_names=observation.columns,
    ), 
    show = False
)

plt.gcf().set_size_inches(8, 4.5)
plt.show()

Figure 3: SHAP-based decomposition of model prediction for test observation 18

Figure 3 shows the 10 features with the largest impact on the prediction for test observation 18. Her excellent credit score provides the strongest positive contribution, raising the predicted creditworthiness probability by about 11 percentage points. The below-average monthly income, however, lowers the prediction by about 8 percentage points. Positive factors such as stable employment, only one outstanding loan, and a manageable debt-to-income ratio lift the prediction above the base value, but not above the 50% threshold.
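
Footnote 2 notes that the true label of this observation is True. As a rough cross-check, the scoring rule of the data-generating function can be applied by hand. The feature values below are copied, with their printed rounding, from the output of analyze_single_case() further below; the dictionary layout is our own and only serves this illustration.

```python
# Feature values of test observation 18, copied from the printed output (rounded)
applicant = {
    "scr": 95.956185,
    "inc": 3223.34577,
    "debt": 0.361093,
    "emp_d": 9.237825,
    "loans": 1,
    "emp": "employed",  # emp_employed is True
    "home": "own",      # home_own is True
}

# Point contributions, using the weights from calculate_credit_worthiness()
points = {
    "credit score": (applicant["scr"] - 50) / 50 * 40,
    "income": applicant["inc"] / 5000 * 20,
    "debt ratio": -applicant["debt"] * 100 * 0.2,
    "employment duration": applicant["emp_d"] / 10 * 10,
    "existing loans": -applicant["loans"] * 5,
    "employment type": {"civil_servant": 10, "student": -10}.get(applicant["emp"], 0),
    "home ownership": 5 if applicant["home"] == "own" else 0,
}

total = sum(points.values())
for name, value in points.items():
    print(f"{name:>20}: {value:+.2f}")
print(f"{'total':>20}: {total:.2f} -> ok = {total > 50}")
```

The total comes out slightly above the approval bar of 50 points, so the applicant is a borderline case in the simulated data, which fits a predicted probability close to the 50% threshold.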

Let’s create a function to output individual credit decisions along with a SHAP value analysis for a single observation…

def analyze_single_case(obs):
    # Probability of class 0 (ok == False), reported here as the default probability
    predicted_prob = model.predict_proba(obs)[0, 0]

    # SHAP values for the same class (index 0), so signs refer to the default probability
    shap_values = explainer.shap_values(obs)
    shap_values_default = shap_values[0, :, 0]

    # Pair each feature value with its SHAP value, largest contribution first
    shap_df = pd.DataFrame({
        "value": obs.iloc[0],
        "shap value": shap_values_default
    }).sort_values(by="shap value", ascending=False)

    decision = "reject" if predicted_prob >= 0.5 else "accept"
    print(f"Decision: {decision}\n")
    print(f"Predicted Default Probability: {predicted_prob:.4f}")
    print("\nVariable importance for default probability prediction:")
    print(shap_df)

… and run it on the test observation:

analyze_single_case(observation)
Decision: reject

Predicted Default Probability: 0.5600

Variable importance for default probability prediction:
                        value  shap value
inc                3223.34577    0.077640
emp_civil_servant       False    0.024718
cred                   5000.0    0.019227
age                 29.709429    0.004882
emp_employed             True    0.004744
term                       60    0.001934
emp_retiree             False    0.001312
home_with_parents       False    0.000925
emp_self_employed       False   -0.001133
yrs                       0.0   -0.001504
emp_student             False   -0.002198
home_shared             False   -0.003228
home_rent               False   -0.015577
debt                 0.361093   -0.029929
loans                       1   -0.031195
home_own                 True   -0.043133
emp_d                9.237825   -0.048764
scr                 95.956185   -0.113474
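
The printed SHAP values can be checked against the predicted probability, since base value and contributions should add up to the prediction. The sketch below copies the values from the output above and assumes that the base value for the default class is one minus the base value of 28.5% quoted earlier for the creditworthy class; because that figure is rounded, the match is only approximate.

```python
# SHAP values for the default class, copied from the printed output above
shap_default = {
    "inc": 0.077640,
    "emp_civil_servant": 0.024718,
    "cred": 0.019227,
    "age": 0.004882,
    "emp_employed": 0.004744,
    "term": 0.001934,
    "emp_retiree": 0.001312,
    "home_with_parents": 0.000925,
    "emp_self_employed": -0.001133,
    "yrs": -0.001504,
    "emp_student": -0.002198,
    "home_shared": -0.003228,
    "home_rent": -0.015577,
    "debt": -0.029929,
    "loans": -0.031195,
    "home_own": -0.043133,
    "emp_d": -0.048764,
    "scr": -0.113474,
}

base_creditworthy = 0.285             # base value quoted in the text (rounded)
base_default = 1 - base_creditworthy  # assumption: the two class base values sum to one

reconstructed = base_default + sum(shap_default.values())
print(f"sum of SHAP values: {sum(shap_default.values()):+.4f}")
print(f"reconstructed default probability: {reconstructed:.4f}")
```

The reconstructed value of about 0.56 agrees with the predicted default probability of 0.5600 up to the rounding of the base value.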

Conclusion

Our analysis demonstrates how SHAP values transform “black box” machine learning models into interpretable decision-making tools for credit assessment. By decomposing individual predictions into feature-level contributions, SHAP values enable lenders to explain credit decisions clearly. This is especially valuable when regulatory requirements call for models that are not only accurate but also interpretable.


  1. SCHUFA (Schutzgemeinschaft für allgemeine Kreditsicherung) is Germany’s main credit reporting agency; it provides credit scores for individuals.

  2. Since this is a test observation, we can compare the prediction with the true label: y_test.iloc[[18]] is True, so the model is actually wrong here.