
Micro-clustering of bank transaction data
A new approach to understanding consumers’ personal budget configurations & financial habits with unsupervised learning
Following our participation in the latest Credit Scoring and Credit Control Conference, we are glad to introduce our new approach to automating affordability assessment with Open Banking data.
The underlying algorithm, “Micro-clustering of bank transaction data”, is relevant for all types of credit, and especially for consumer loans and BNPL, where lenders need to combine a short time-to-yes with responsible lending.
As you know, we are in an Open Banking era
- 87% of countries have some form of Open Banking API
- >470 third-party providers are registered with a National Competent Authority in Europe
- 800m Open Banking API calls are made each month in the UK
This creates a major opportunity for credit risk assessment
For the end-customer
- Shorter “time-to-yes”
- Improved customer journey
- Better alignment between amount granted and needs/affordability
- Fairer decision / better access to credit
For the lender
- Less costly processes
- Higher acceptance rate
- More accurate affordability assessment
- Better risk management
However, the use of bank transaction data to assess affordability is currently limited
- Transaction descriptions are not always self-explanatory (e.g. the word “salary” rarely appears in the description of a salary payment).
- Even when relevant patterns can be identified in a transaction description, they generally say little about how regularly that transaction occurs.
- Whilst the above issues can be successfully mitigated by specific approaches (e.g. consensus-based annotation of ambiguous transactions to facilitate a supervised learning approach, and the inclusion of some element of recurrence), these can be labour-, time- and/or data-intensive.
- Lastly, the categorisation approach does not easily cross borders: a categorisation engine is country- and language-specific and cannot be generalised.
Algoan has developed another approach that encompasses the time dimension and provides clear answers to the limitations listed above.
Rather than looking at bank transaction data as a sequential series of payments, we have considered a projection of such data into a multi-dimensional space (with the amount, the weekday, the day in month, the category, the textual information, etc. each being a distinct dimension) and used state-of-the-art clustering methods to highlight key relationships between transactions. The main challenge we have addressed is combining heterogeneous information (e.g. dates, amounts, textual information).
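
To make the idea more concrete, here is a minimal Python sketch of how heterogeneous dimensions could be combined into a single distance and then grouped. It is not Algoan's algorithm: the weights, the threshold, the character-trigram text similarity and the simple threshold-based single-linkage grouping are all illustrative assumptions standing in for the clustering methods mentioned above, and the sample transactions are invented.

```python
from datetime import date
from typing import NamedTuple


class Transaction(NamedTuple):
    day: date
    amount: float       # positive = credit, negative = debit
    description: str


# Hypothetical weights, one per dimension; they sum to 1.
WEIGHTS = {"amount": 0.35, "text": 0.35, "day_of_month": 0.20, "weekday": 0.10}


def trigrams(text: str) -> set:
    """Character trigrams of the description, ignoring digits and punctuation."""
    kept = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    kept = " ".join(kept.split())
    return {kept[i:i + 3] for i in range(len(kept) - 2)}


def cyclic_gap(a: int, b: int, period: int) -> float:
    """Gap between two positions on a cycle, scaled by half the period."""
    gap = abs(a - b) % period
    return min(gap, period - gap) / (period / 2)


def distance(t1: Transaction, t2: Transaction) -> float:
    """Weighted mix of per-dimension distances, each between 0 and 1."""
    if t1.amount * t2.amount < 0:
        amount_d = 1.0  # a credit and a debit are treated as maximally different
    else:
        largest = max(abs(t1.amount), abs(t2.amount))
        amount_d = abs(t1.amount - t2.amount) / largest if largest else 0.0
    g1, g2 = trigrams(t1.description), trigrams(t2.description)
    union = g1 | g2
    text_d = 1 - len(g1 & g2) / len(union) if union else 0.0
    # Simplification: every month is treated as a 31-day cycle.
    dom_d = cyclic_gap(t1.day.day, t2.day.day, 31)
    weekday_d = cyclic_gap(t1.day.weekday(), t2.day.weekday(), 7)
    return (WEIGHTS["amount"] * amount_d + WEIGHTS["text"] * text_d
            + WEIGHTS["day_of_month"] * dom_d + WEIGHTS["weekday"] * weekday_d)


def micro_clusters(transactions, threshold=0.25, min_size=3):
    """Link transactions closer than `threshold`; keep groups of at least `min_size`."""
    parent = list(range(len(transactions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(transactions)):
        for j in range(i + 1, len(transactions)):
            if distance(transactions[i], transactions[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(transactions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)


# Invented sample data: three salary credits, three rent debits, two one-off purchases.
SAMPLE = [
    Transaction(date(2023, 1, 27), 2100.00, "VIR ACME CORP 0123"),
    Transaction(date(2023, 2, 27), 2100.00, "VIR ACME CORP 0224"),
    Transaction(date(2023, 3, 27), 2150.00, "VIR ACME CORP 0327"),
    Transaction(date(2023, 1, 3), -800.00, "PRLV SEPA LANDLORD SARL"),
    Transaction(date(2023, 2, 3), -800.00, "PRLV SEPA LANDLORD SARL"),
    Transaction(date(2023, 3, 3), -800.00, "PRLV SEPA LANDLORD SARL"),
    Transaction(date(2023, 1, 12), -23.50, "CB SUPERMARKET 12/01"),
    Transaction(date(2023, 2, 19), -61.20, "CB FUEL STATION"),
]

clusters = micro_clusters(SAMPLE)
print(clusters)
```

On this toy data the three salary-like credits and the three rent-like debits each end up in their own group without the words “salary” or “rent” appearing anywhere, while the two one-off card payments are left out. In practice the weights and threshold would need tuning on real data, and a dedicated clustering library would likely replace the pairwise loop.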