Join us for a seminar presented by Ed Stokes from PwC Australia.
• 5.45–6.15pm, Refreshments in the staff room of the Peter Hall building (https://maps.unimelb.edu.au/parkville/building/160)
• 6.15–7.15pm, Talk, Evan Williams Theatre (https://studentvip.com.au/unimelb/parkville/maps/63455), Peter Hall building
• 7.30pm, Dinner at University Cafe (http://www.universitycafe.com.au/)
Credit scoring: should greater predictability come at the cost of model interpretation?
Credit scoring is the science of estimating the future likelihood of an account, customer or application entering hard arrears (typically, but not always, defined as money owed remaining unpaid more than 90 days after the due date).
Credit scoring has been a well-defined and mature process for the best part of two decades, where generalized linear models (GLMs) fitted to a binomial outcome have been the go-to technique. Building a GLM, or more specifically the data preparation involved, has been a lengthy process: data cleansing, feature engineering and stratification of the portfolio can take the best part of 6 months. Following the data preparation, the model build, implementation and sign-off can take another 6 months, so it takes approximately 12 months to build and implement models for a given portfolio.
Due to the elapsed time, the data that the model is trained on may not be relevant to the current performance of the portfolio. This presents a pertinent problem for a credit provider, as they are not using the most recent data to make informed credit decisions on the future quality of the portfolio.
Methods popular in the machine learning literature, such as gradient boosted trees (GBTs), may provide a way to expedite a model build and allow for quick retraining of credit decision models. A GBT by nature requires less feature engineering and, being non-parametric, allows a more streamlined model development process. If the right environment is provided, a more recently trained model could be implemented, allowing credit decisions to be made on more recent data, which in turn should improve prediction.
It is important to note that the output of methods such as GBTs can be hard to interpret. GBTs produce a myriad of trees with multiple interactions, making it difficult to gauge the relationship between individual predictor variables and the response variable. In contrast, a GLM can be easily interpreted, making it easier to explain to regulators and to the user of the model.
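To make the interpretability point concrete, here is a minimal Python sketch of how a binomial GLM with a logit link can be read directly from its coefficients. The intercept, the coefficient values and the predictor names are hypothetical and chosen only for illustration; they do not come from the talk or from any fitted model.

```python
import math

# Hypothetical coefficients for illustration only; not from any real scorecard.
INTERCEPT = -3.0
COEFFICIENTS = {
    "missed_payments_last_12m": 0.8,  # assumed predictor name
    "utilisation_ratio": 1.5,         # assumed predictor name
}


def linear_predictor(features):
    """Log-odds of hard arrears under a logit-link binomial GLM."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


def probability_of_arrears(features):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features)))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio(name):
    """Multiplicative change in the odds per one-unit increase in a predictor."""
    return math.exp(COEFFICIENTS[name])


if __name__ == "__main__":
    base = {"missed_payments_last_12m": 1, "utilisation_ratio": 0.5}
    worse = dict(base, missed_payments_last_12m=2)
    print("P(arrears), base: ", probability_of_arrears(base))
    print("P(arrears), worse:", probability_of_arrears(worse))
    print("Odds ratio per extra missed payment:",
          odds_ratio("missed_payments_last_12m"))
```

Each coefficient has a single, global reading: one extra unit of a predictor multiplies the odds of arrears by exp(coefficient), whatever the other predictors are. A GBT has no equivalent single number per predictor, because a predictor's effect is spread across many trees and interactions.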