How Algoan Scales Open Banking-based Credit Risk Models with Large Language Models

From Transaction Chaos to Categorized Insights
At Algoan, we have spent the last few years building cutting-edge tools to help lenders make smarter and fairer decisions based on open banking data (i.e., bank account transactional data shared with the consent of the credit applicant).
Credit Insights, one of our key products, transforms raw bank transaction data into actionable credit risk indicators, estimating incomes, assessing affordability, detecting incidents on accounts, etc. By automatically categorizing income, expenses, credit charges and financial behaviors, Credit Insights gives lenders an up-to-date, highly granular view of a borrower's financial situation, allowing them to evaluate repayment capacity, detect early signs of financial stress, and provide a more accurate, real-time picture of creditworthiness.
As open banking reshapes the credit industry, we have entered a new era where access to raw transactional data is no longer the bottleneck. The real challenge now lies in interpreting this data. Interpreting transactional data is far from trivial: bank transactions are not designed for automated processing. Descriptions are often truncated, inconsistent, full of abbreviations, typos, or bank-specific formatting rules. The same merchant or payment type may appear under dozens of different labels. Even two transactions that look similar on the surface may reflect very different financial behaviors depending on their context. And on top of that, every country, bank, and customer segment brings its own data quirks. Extracting consistent, high-quality features from this messy raw input is one of the biggest challenges in building robust, reliable credit decisioning systems.
For years, our goal has been to build precise categorization models, at scale, with nuance, and with the ability to adapt across markets and learn fast on new data, in a cost-efficient way. This is where Large Language Models (LLMs) come into play.
Continuous Improvement Without Exploding Resources
When it comes to categorizing transactions, we do not serve LLMs directly in production pipelines. While LLMs are flexible, they are expensive to serve at scale and introduce variability in outputs. Thus our production models remain supervised and purpose-built for transaction data. This gives us full control over cost, output stability, and explainability, all essential for regulatory compliance and credit decisioning. These specialized models deliver excellent precision across the core categories we track, which is a must when you are building trust with lenders and regulators.
But our algorithms need labeled data for both training and monitoring, and high-quality annotations remain a major bottleneck when you are working with noisy transactional data. Annotating financial transactions is not only labor-intensive but also highly sensitive to interpretation. This is where LLMs enter the loop: we leverage locally hosted LLMs, verticalized on transactional data, to label transactions.

LLMs allow us to scale up annotation without scaling our team linearly. We now combine the following steps for annotation (a simplified sketch of how they fit together follows this list):
- Automatic labeling via pattern recognition: this matters most for early categorization models. Once the models are mature, these patterns contribute only marginally to active learning (mainly for new organizations or new bank conventions).
- Automatic annotation with LLMs verticalized on transactional data, including retrieval-augmented generation (mainly for precise annotation of rare or small organizations): annotations are accepted only if confidence exceeds a high threshold.
- Feedback from clients.
- Human-in-the-loop review on much smaller data volumes than before, thanks to the LLMs: we have divided the volume of manually labeled data by three.
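To make this concrete, here is a minimal sketch of how such a cascading annotation flow could look. The function names, rules, and confidence threshold below are illustrative placeholders, not our production code:

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.95  # assumed: LLM labels accepted only above this level

@dataclass
class Transaction:
    description: str
    amount: float
    day_of_month: int

@dataclass
class Annotation:
    label: Optional[str]
    source: str  # "pattern" | "llm" | "human"

def label_with_patterns(tx: Transaction) -> Optional[str]:
    """Rule-based labeling, e.g. known merchant prefixes (toy rules for illustration)."""
    rules = {"EDF": "utilities", "SNCF": "transport"}
    for prefix, label in rules.items():
        if tx.description.upper().startswith(prefix):
            return label
    return None

def label_with_llm(tx: Transaction) -> tuple[Optional[str], float]:
    """Placeholder for a call to a transaction-verticalized LLM
    (optionally with retrieval-augmented context for rare organizations)."""
    return None, 0.0  # stub: a real implementation would return (label, confidence)

def annotate(tx: Transaction) -> Annotation:
    # 1. Cheap pattern matching first
    if (label := label_with_patterns(tx)) is not None:
        return Annotation(label, "pattern")
    # 2. LLM-based labeling, accepted only above a high confidence threshold
    label, confidence = label_with_llm(tx)
    if label is not None and confidence >= CONFIDENCE_THRESHOLD:
        return Annotation(label, "llm")
    # 3. Everything else goes to the human-in-the-loop review queue
    return Annotation(None, "human")
```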
Using clustering techniques that we specifically designed for transactional data, we group transactions with similar descriptions, amounts, and days of the month into homogeneous groups that should share consistent labels. These algorithms operate at the end of the pipeline to flag inconsistencies between human and/or machine annotations. Consensus mechanisms or human-in-the-loop reviews are then used to validate the final labels in these cases.
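As a simplified illustration, such a consistency check could look like the following, with a basic cluster key standing in for our dedicated clustering algorithms:

```python
from collections import defaultdict

def cluster_key(description: str, amount: float, day_of_month: int) -> tuple:
    """Toy grouping key: normalized description prefix, amount bucket, day of month."""
    normalized = "".join(c for c in description.upper() if c.isalpha())
    return (normalized[:12], round(amount, -1), day_of_month)

def flag_inconsistent_clusters(labeled):
    """labeled: iterable of (description, amount, day_of_month, label) tuples.
    Returns the keys of clusters whose members received more than one distinct label."""
    labels_by_cluster = defaultdict(set)
    for description, amount, day, label in labeled:
        labels_by_cluster[cluster_key(description, amount, day)].add(label)
    return [key for key, labels in labels_by_cluster.items() if len(labels) > 1]

# Flagged clusters are routed to consensus or human review rather than accepted as-is.
```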
As a result, we now have a semi-automated, self-correcting pipeline where humans validate edge cases while machines handle the bulk. The LLMs accelerate our data labeling, but the final models deployed remain optimized for precision, consistency, and auditability. LLMs are not the classifier, they are the teacher’s assistant.
Our active learning process leverages this pipeline, ensuring that our systems learn continuously from new data. We don’t just ship a model: we monitor, retrain, and adapt. With LLM-empowered pipelines, we now better detect anomalies and explain predictions. Typically, a significant divergence between the LLM's labels and our production model's predictions triggers a review, helping us spot model drift earlier and avoid performance degradation.
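As an illustration, this divergence signal can be sketched as a simple disagreement rate over a monitoring sample. The threshold below is an assumed placeholder, not our actual monitoring configuration:

```python
DISAGREEMENT_ALERT_THRESHOLD = 0.05  # assumed: 5% divergence triggers a review

def disagreement_rate(model_labels: list[str], llm_labels: list[str]) -> float:
    """Fraction of transactions where the production model and the LLM disagree."""
    assert len(model_labels) == len(llm_labels)
    if not model_labels:
        return 0.0
    mismatches = sum(a != b for a, b in zip(model_labels, llm_labels))
    return mismatches / len(model_labels)

def needs_review(model_labels: list[str], llm_labels: list[str]) -> bool:
    """Flag a monitoring window for human review when divergence is too high."""
    return disagreement_rate(model_labels, llm_labels) > DISAGREEMENT_ALERT_THRESHOLD
```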
In short, LLMs are not replacing our models: they are amplifying our ability to build them. By combining smart automation with human expertise, we are scaling faster, improving quality, and keeping control.
Camille Charreaux, Head of Data Science at Algoan.