Internationalisation – How to scale a data model?
Imagine a business that offers credit in France but plans to expand into other countries, particularly in Europe. An immediate reaction would be: that’s not going to be easy! Even within the EU, contexts vary greatly. With non-homogeneous data and credit systems operating in different ways, the business might get a nasty surprise and need to rebuild its scoring model from scratch.
Far from being an imaginary scenario, this was the status quo up until recently. Designing a data model that works in every country is complicated, but not impossible. Particularly given the availability of a powerful resource – Open Banking data.
Since Algoan’s beginnings, we have worked hard to build a credit scoring API that is internationally scalable through the use of Open Banking data. How have we done this? Camille Charreaux, our Head of Data Science, gives the lowdown on the data choices that let us create a product that works across different countries without being overly complicated or difficult to use.
Accessible but disparate data
Do you remember the last time you applied for credit? If it was over five years ago, it would probably have been something like this:
“Traditionally in France, credit establishments ask loan applicants to fill out an online form. There are 10–20 questions to answer about their family situation, income, outgoings, any other credit, and so on.”
And if you have contacted several banks and credit organisations, you will know that the questionnaires are all different, as are the credit decisions. Now, imagine this across different countries!
This approach has its problems:
- The data are declaratory, so they can contain errors and omissions. There is often a gap between income, which is overestimated, and outgoings, which are underestimated, and there is a tendency not to declare other ongoing credit commitments in order to improve the chances of approval.
- The data are not financial. Or at least not uniquely so: a lot of the data are demographic and socio-professional (age, family situations, category of job, etc.). To find out if an individual is in a position to repay the credit, financial data are the most reliable.
It isn’t a perfect scoring model, in terms of either the collection or the nature of the data. Add to this a further layer of complexity: the differing ways in which things work in different countries, notably the presence of credit agencies.
“We don’t have this in France, but it is common in many other countries. Credit agencies collect information on the credit held by consumers. They provide this to credit establishments so they can enjoy an overarching view of the individual’s circumstances.”
The data are no longer just declaratory, which solves part of the problem. However, other issues appear:
- The data collected varies from country to country, even where credit agencies exist.
- Depending on the country, not all credit may be logged with credit agencies, creating disparities.
- The harvested data is not as granular as banking data.
- Only citizens who have already taken out credit appear in the database. It can be difficult to obtain a credit score – and so gain access to credit – with a first application. In the USA, for example, it can be difficult to get a loan without a FICO score.
Even when credit organisations use credit agencies, they have to adapt their scoring models according to the data harvested by each one.
The good news is things have evolved in the past few years.
- At the outset, post-2010, consumer banking data was collected using web scraping, a method that gathers financial data to evaluate an individual’s potential ability to repay credit.
- This approach was lacking in terms of security until PSD2, the Revised Payment Services Directive, came into force in 2019. It secured access to banking data by requiring banks to put secure APIs in place with strict authentication mechanisms. This is the famous ‘Open Banking’. It allows aggregators to develop secure, robust connections and provide granular, universal banking data.
This Open Banking data creates a solid foundation for internationally scalable models.
Open Banking – an undeniable data processing opportunity
Why does Open Banking data radically change things for data processing?
“Open Banking data use formats that are well known in the data science world. They are digital (transaction amounts) and textual. We have some insight into the model and architecture types that work with these data categories.”
Their very nature resolves many problems encountered in traditional credit scoring:
- They are always the same data in a similar format.
- The data source is unique and cannot be forged. It is not declaratory: the data are obtained directly from users’ bank accounts.
To sum up, the data are representative of the financial situation of individuals applying for a loan. They are a perfect basis for building a data model that can be replicated on an international basis.
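To make this concrete, here is a minimal sketch of what a uniform Open Banking transaction record might look like. The field names are illustrative assumptions, not any particular aggregator’s schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical transaction record, sketching the kind of granular,
# uniform data an Open Banking aggregator can return. Field names
# are assumptions for illustration only.
@dataclass
class Transaction:
    booked_on: date    # when the transaction was booked
    amount: float      # negative = outgoing, positive = incoming
    description: str   # raw label from the bank statement
    currency: str = "EUR"

# The same structure holds for any country or bank; only the
# descriptions and the currency vary.
salary = Transaction(date(2024, 1, 28), 2100.0, "VIR SALAIRE ACME")
rent = Transaction(date(2024, 2, 2), -750.0, "PRLV LOYER AGENCE X")
```

Because every record shares one shape, the same downstream processing can run on accounts from any country.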
“Far from being niche, Open Banking has a profound impact on how credit is granted. At Algoan, we have chosen to work with several aggregators connected directly to banking APIs to collect data. This means we avoid having to develop connectors and we remain focused on developing our Credit Scoring API.”
Open Banking represents a paradigm shift in the credit journey and opens up possibilities to develop products that are scalable on an international basis.
Scaling a data-driven product
Open Banking data represents a powerful resource. The next step is to build data models that can scale with them.
“We always knew we wanted to offer a global product. That matters, because it has been integrated natively into the construction of our data models: the way data is collected, processed, and so on.”
This is the method we’ve adopted at Algoan:
1st Phase → Designing functionalities:
This first phase aims to list the functionalities needed, in order to understand which data to collect. In our case, we knew the steps that lead to credit being granted, so we reviewed each stage of the credit decision to select the most relevant variables. These enable an accurate consumer banking profile to be established (income, spending volatility, overdraft use, incidents, etc.).
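As a sketch, profile variables like the ones mentioned above could be computed from account aggregates. The function names and exact definitions here are assumptions for illustration, not Algoan’s actual variables:

```python
from statistics import mean, pstdev

# Illustrative banking-profile variables (income, spending
# volatility, overdraft use), computed from simple aggregates.
# Definitions are simplified assumptions, not a real scoring model.

def monthly_income(credits_by_month):
    """Average incoming amount per month."""
    return mean(credits_by_month)

def spending_volatility(debits_by_month):
    """Std-dev of monthly outgoings: higher means less stable spending."""
    return pstdev(debits_by_month)

def overdraft_days(daily_balances):
    """Number of days the account balance was below zero."""
    return sum(1 for balance in daily_balances if balance < 0)

profile = {
    "income": monthly_income([2100, 2100, 2300]),
    "volatility": spending_volatility([1800, 2500, 1900]),
    "overdraft_days": overdraft_days([120, -30, -5, 400, 250]),
}
```

Each variable depends only on transactions and balances, so the same profile can be computed for a consumer in any country.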
2nd Phase → Building the generic algorithm outline:
The aim is to build an outline adapted for every context, which is possible with bank data. Once the architecture is defined, the real data work begins, with strategies for collection and labeling. With Open Banking data, pre-processing is similar in every country (cleaning and simplifying the data to be injected into the algorithms). Then the algorithms need to be trained: they learn thanks to the labels given to them, and their objective is to predict those labels on future data.
3rd Phase → Product personalisation using specific data:
Once the universal base layer is built, more specific data can be considered in the context of each country (behaviours, social habits, lifestyle, etc.) with the implementation of a complementary labeling strategy.
“To scale a data-driven product, having an international vision from the very beginning is pretty much unavoidable. It saves precious time down the line, as algorithms are designed with this in mind. So it is necessary to think universally at the beginning and then add a layer of specialisation that responds more precisely to the local context.”
The good news is that with data, improvements are continuous. Algorithms are always getting better. The more data you have, the better it gets. The better it gets, the more data there is. This positive cycle is made possible only by the implementation of good collection strategies and data scaling.
Credit has become more straightforward thanks to Open Banking. It is a solid foundation for the value proposition Algoan develops: improving access to credit. With Open Banking data, we can offer a product that works better, and that works everywhere.