Agentic AI in Retail and Consumer Lending

Lending is the part of retail banking under the most pressure right now. PSD3 collapses consent and aggregation timelines and brings the FIDA framework with it - broader data classes, longer-lived dashboards, an obligation to provide an interface even where the bank would rather not. AMLR adds material KYC weight to retail onboarding, not just corporate. The EBA loan origination guidelines push affordability assessment squarely into a creditworthiness-and-cashflow joint exercise. And the AI Act explicitly classes creditworthiness assessment as Annex III high-risk, which means the things a bank gets to do with a model in this domain are now bounded by transparency, fairness, human-oversight, and recourse obligations that did not previously bind in this shape.

Underneath all of that, the underwriting model itself is straining. A credit-score-led, bureau-centred picture of a customer fits one segment of the population well - the segment with a thick file. It fits a growing segment poorly: younger customers with thin files, gig-economy workers with irregular but stable cashflow, recent immigrants, customers who have moved away from credit cards on principle. The information needed to underwrite these customers exists. It just doesn't sit in the bureau. It sits in their bank account, where PSD3 now makes it accessible with consent in a structured form.

The work is agentic-shaped, the same way KYC was in the previous case. It is embarrassingly parallel - bank A's PSD2 endpoint, the document scanner, and the bureau lookup can run independently. It is multi-modal - open-banking JSON, scanned payslips, free-text policy, structured bureau returns. It is iterative - the affordability output drives further document requests in non-trivial cases. And it is retry-heavy - bank APIs go down, OCR returns garbage, the customer uploads a holiday photo where a payslip should be. A single-shot prompt to "underwrite this loan" is the wrong shape. A coordinated fan-out across specialised agents, consolidated back into a decision artefact with provenance, is the right one.

One honest note before the architecture. Nexus does not run a live consumer-lending pipeline. The components named below all run in production for adjacent purposes - Newton's research swarm, Leonardo's vision agent, Hermes's financial-time-series reasoning, the dual-delivery audit trail to Notion and Telegram, the cron-driven scheduling layer - but I have not pointed the assembled pipeline at a bank's loan queue. What follows is an architectural composition built from components Nexus actively runs, plus the specific additional controls a regulated lender would need on top. That line matters especially in this domain. AI Act Annex III makes overclaiming on a credit decision the kind of mistake a lender's risk function flags within a five-minute read. The components are real. The composition is architectural. Both statements live in the same paragraph for a reason.

The five strands

A loan application arrives at the orchestrator as a customer record - name, identifier, requested product and amount, declared income - and the orchestrator decomposes it into five strands. Aggregation gathers the customer's bank-side picture under PSD2/3 consent. Documents ingests and validates the application bundle. Affordability scores the customer on a cashflow axis the bureau cannot see. Decisioning consolidates the strands into a structured decision under a hard latency budget. Servicing - back-book repricing, onward monitoring, and right-to-recourse - picks up where origination ends, reusing the same models against a different schedule.

Each strand maps onto a specific Nexus component, with a model chosen for the shape of the work and a fallback that survives a provider outage along the lines of the previous case. The orchestrator runs on glm-5.1:cloud; aggregation and affordability share kimi-k2.5:cloud; documents run on gemma4:31b-cloud; the decisioning and audit layer is dull Python rather than an agent. The pipeline is meant to be readable end-to-end by the operator - and, more importantly, by an external reviewer who needs to trace any single decision back to the inputs that produced it.

[Figure: Pipeline shape - fan-out across four specialised agents, consolidation into decision plus audit plus servicing]

Aggregation - PSD2/3 across consented sources

The aggregation strand is Newton's research-swarm pattern, transposed onto a much narrower fan-out. Where Newton at full stretch can spawn up to a hundred parallel sub-agents to cover a long-tail research question, consumer aggregation typically needs five to fifteen - the customer's actual bank list. The swarm shape is overkill for the count, but earns its keep on the shape of the work: each consented bank API is hit independently under its own session, the slowest counterparty does not block the chain, and partial results come back with a freshness tag rather than blocking on completeness.
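The fan-out described above can be sketched with the standard-library asyncio module. This is a minimal illustration under stated assumptions, not Newton's actual implementation: `fetch_source` is a hypothetical stand-in for one consented bank API call, and the source names, delays, and per-source budget are invented for the example.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceResult:
    source: str
    payload: Optional[dict]  # None when the source timed out
    fetched_at: float        # freshness tag (epoch seconds)
    status: str              # "ok" or "timeout"


async def fetch_source(source: str, delay: float) -> dict:
    """Hypothetical stand-in for one consented bank API call."""
    await asyncio.sleep(delay)
    return {"source": source, "accounts": 1}


async def fetch_with_budget(source: str, delay: float, budget: float) -> SourceResult:
    """Run one source under its own time budget so a slow bank cannot block the rest."""
    try:
        payload = await asyncio.wait_for(fetch_source(source, delay), timeout=budget)
        return SourceResult(source, payload, time.time(), "ok")
    except asyncio.TimeoutError:
        return SourceResult(source, None, time.time(), "timeout")


async def aggregate(sources: dict, budget: float) -> list:
    """Fan out to every source concurrently and return partial results."""
    tasks = [fetch_with_budget(name, delay, budget) for name, delay in sources.items()]
    return await asyncio.gather(*tasks)


def run_demo() -> list:
    # Simulated delays in seconds; "bank_slow" exceeds the 0.1 s budget.
    sources = {"bank_a": 0.01, "bank_b": 0.01, "bank_slow": 0.5}
    return asyncio.run(aggregate(sources, budget=0.1))
```

Calling `run_demo()` returns three results, with the slow source marked as a timeout rather than holding up the two that answered in time.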

Consent is the architecturally interesting part. PSD3 with FIDA shifts the model from per-call consent to longer-lived dashboards under the customer's control - which means the system holds something closer to a standing arrangement, but the standing arrangement has to be checkable on every call against the customer's current consent surface. The simplest defensible implementation is a consent vault holding tokens with explicit TTLs, no caching of aggregated payloads beyond the consent window, and every aggregation request paired against a live consent check. The Nexus pattern of dual-delivery audit applies cleanly: every aggregation call writes to an immutable trail with the consent state at the time of the call, so any later customer challenge - "I didn't authorise that read" - can be answered from the trail rather than reconstructed.
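A consent vault with explicit TTLs and a logged live check might look roughly like the sketch below. The class and field names are hypothetical, the in-memory list stands in for an immutable trail, and a real vault would sit behind the bank's own token and revocation services.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConsentToken:
    customer_id: str
    source: str
    expires_at: float  # explicit TTL, stored as an absolute epoch time
    revoked: bool = False


@dataclass
class ConsentVault:
    tokens: dict = field(default_factory=dict)       # (customer_id, source) -> ConsentToken
    audit_trail: list = field(default_factory=list)  # append-only in this sketch

    def grant(self, customer_id, source, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self.tokens[(customer_id, source)] = ConsentToken(customer_id, source, now + ttl_seconds)

    def revoke(self, customer_id, source):
        token = self.tokens.get((customer_id, source))
        if token is not None:
            token.revoked = True

    def check(self, customer_id, source, now=None) -> bool:
        """Live consent check; every call records the consent state it saw."""
        now = time.time() if now is None else now
        token = self.tokens.get((customer_id, source))
        if token is None:
            state = "missing"
        elif token.revoked:
            state = "revoked"
        elif now >= token.expires_at:
            state = "expired"
        else:
            state = "valid"
        self.audit_trail.append(
            {"customer_id": customer_id, "source": source,
             "consent_state": state, "checked_at": now}
        )
        return state == "valid"
```

The point of the design is that the check and the audit write are one operation: there is no code path that reads consent without leaving a record of what the consent state was at that moment.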

The model choice is deliberate and slightly counter-intuitive. kimi-k2.5:cloud is the swarm's primary because the consolidation step at the end of aggregation needs entity resolution across slightly-different account names and counterparty strings - the kind of work where a smaller model returns confident but wrong matches. The aggregation layer itself is dull JSON-over-HTTPS; the model only enters at the consolidation seam.

Documents - ingestion at application time

The document strand is Leonardo, the same vision agent that runs the daily-charts cron at 08:00 CEST. The model is gemma4:31b-cloud. The discipline that makes this strand defensible - rather than another "the AI read my payslip" demo - is schema validation. Every document type the lender accepts has a fixed extraction schema. A payslip extracts to {employer, period, gross, net, deductions[]}. A bank statement to {account_holder, period, transactions[]}. An ID document to {type, name, dob, expiry, mrz_check}. Leonardo's job is to populate the schema. Leonardo is graded on whether the output validates, not on the fluency of the prose it can produce around the document.

This matters for one reason. An LLM that writes "I can see this is a payslip from XYZ Corp showing approximately 38,000 SEK net for March" is not extracting. It is hallucinating with the costume of an extraction on. Schema validation is the discipline that separates the two. If the model returns a payload that doesn't validate, the document is routed to a human queue with the specific failed field flagged. It is not retried into confidence. It is not partially trusted. It is held until a human resolves it, and the resolution is logged.
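As a rough sketch of that routing rule, the payslip schema can be checked with plain Python type tests. The schema dictionary and the function names here are illustrative assumptions, not Leonardo's actual code; a production service would more likely use a dedicated schema library.

```python
# Field names follow the payslip schema named in the text; the type mapping is assumed.
PAYSLIP_SCHEMA = {
    "employer": str,
    "period": str,
    "gross": (int, float),
    "net": (int, float),
    "deductions": list,
}


def validate(payload: dict, schema: dict) -> list:
    """Return the names of fields that are missing or have the wrong type."""
    failed = []
    for field_name, expected in schema.items():
        value = payload.get(field_name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is None or isinstance(value, bool) or not isinstance(value, expected):
            failed.append(field_name)
    return failed


def route(payload: dict, schema: dict) -> dict:
    """Valid payloads go downstream; anything else goes to a human with the failed fields flagged."""
    failed = validate(payload, schema)
    if failed:
        return {"route": "human_queue", "failed_fields": failed}
    return {"route": "aggregation_consolidator", "failed_fields": []}
```

A payload whose `net` field is the string "approximately 38,000 SEK" fails validation on that field and lands in the human queue. There is deliberately no retry branch in `route`.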

# Document strand - schema-first extraction contract

document_type:      "payslip"
required_fields:    [employer, period, gross, net, deductions]
extractor:          "leonardo"  # gemma4:31b-cloud
validation:
  on_success:        emit_to_aggregation_consolidator
  on_schema_fail:    route_to_human_queue
  on_low_confidence: route_to_human_queue
audit:
  written_to:        [notion_audit_db, telegram_owner_dm]
  retains:           [raw_image_hash, model_version, prompt_version]

The audit retention is not optional. Under the AI Act's transparency obligations (Article 13 for the system itself, alongside the Article 86 right of an affected person to an explanation of an individual decision), the lender has to be able to tell the customer - in plain language - how an automated decision used their documents. The retained model_version and prompt_version let the operator answer that question precisely six months later, when the customer asks. Without them, the answer is an honest "we don't know any more," which is not an answer the regulator accepts.
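One way to picture the retained fields is a small record builder that hashes the raw image instead of storing it. This is a sketch only: `build_audit_record`, the `decision_id` field, and the prompt version string used in the usage note are hypothetical; the three retained fields are the ones listed in the contract.

```python
import hashlib
import json


def build_audit_record(raw_image: bytes, model_version: str,
                       prompt_version: str, decision_id: str) -> dict:
    """Capture what is needed to explain an extraction later, without keeping the image itself."""
    return {
        "decision_id": decision_id,  # hypothetical identifier linking back to the decision
        "raw_image_hash": hashlib.sha256(raw_image).hexdigest(),
        "model_version": model_version,
        "prompt_version": prompt_version,
    }


def serialise(record: dict) -> str:
    """Stable serialisation so the same record always produces the same bytes."""
    return json.dumps(record, sort_keys=True)
```

For example, `build_audit_record(image_bytes, "gemma4:31b-cloud", "payslip-v3", "d-001")` yields a record that can be written to both audit destinations. Hashing keeps the trail verifiable against the stored document without duplicating personal data into the log.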

Affordability - beyond the credit score

The affordability strand is the architecturally interesting one. It is a Hermes-pattern agent - financial-time-series reasoning, on kimi-k2.5:cloud - pointed at twelve to twenty-four months of transaction-level cashflow data delivered by the aggregation strand. Hermes today does this shape of work on market data: classify, detect regime, project under stress, surface a recommendation that's auditable per-input. Cashflow affordability is the same shape, on a different dataset, with a different output schema.

The crucial framing: this is not a replacement for the bureau score. Replacing the score would be both unwise and unfeasible - the bureau score is a strong signal, hard-won over decades, and it carries information about repayment behaviour the aggregation can't see. The affordability layer is a second axis, complementary. The customer with a thin file but stable cashflow gets approved where they'd have been declined. The customer with a thick file but cashflow stress gets caught where they'd have slipped through. Two axes, joint policy, both auditable, neither replacing the other.
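A joint policy over the two axes can be expressed as a plain lookup table. The band names and every outcome in the table below are invented for illustration; a real lender's policy grid would come from its credit risk function, not from this sketch.

```python
# Hypothetical policy grid: (bureau band, cashflow band) -> outcome.
JOINT_POLICY = {
    ("thick_good", "stable"): "approve",
    ("thick_good", "stressed"): "refer",    # thick file, cashflow stress: caught
    ("thin", "stable"): "approve",          # thin file, stable cashflow: no longer declined
    ("thin", "stressed"): "decline",
    ("thick_poor", "stable"): "refer",
    ("thick_poor", "stressed"): "decline",
}


def joint_decision(bureau_band: str, cashflow_band: str) -> dict:
    """Return the outcome together with both axes, so neither input is hidden from the audit."""
    return {
        "outcome": JOINT_POLICY[(bureau_band, cashflow_band)],
        "bureau_axis": bureau_band,
        "cashflow_axis": cashflow_band,
    }
```

Keeping both bands in the returned artefact is what makes the policy auditable on each axis separately.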

A bureau score is a number. A cashflow affordability finding is a number with provenance. "Estimated stable monthly income: 38,500 SEK" is backed by an array of twenty-four transactions the customer can be shown. AI Act Annex III explainability is achievable here in a way it is not for a closed-form scorecard.

The output schema enforces the provenance discipline. Every claim the affordability agent makes - stable income, variable income, recurring obligations, discretionary spend, stress-tested free cashflow - is paired with the input transactions that supported it. A free-text rationale is not the artefact. The structured per-claim attribution is the artefact. The free text, if generated, is for the customer-facing explanation only.
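The per-claim attribution can be sketched as a small data structure plus a check that rejects any claim without supporting transactions. The `Claim` type and `unsupported_claims` helper are hypothetical names, and the transaction identifiers are assumed to come from the aggregation strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    name: str                          # e.g. "stable_monthly_income"
    value: float
    currency: str
    supporting_transaction_ids: tuple  # the inputs the customer can be shown


def unsupported_claims(claims, known_transaction_ids) -> list:
    """Names of claims that cite no transactions, or cite ones absent from the input set."""
    bad = []
    for claim in claims:
        ids = claim.supporting_transaction_ids
        if not ids or any(tx not in known_transaction_ids for tx in ids):
            bad.append(claim.name)
    return bad
```

An affordability output would only be accepted when `unsupported_claims` returns an empty list. A claim that cites a transaction the aggregation strand never delivered is treated the same as a claim with no evidence at all.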

Decisioning - real-time orchestration

The decisioning strand is the orchestrator's consolidation step, with one variable that materially changes the shape compared with the AFC pipeline in the previous case: the latency budget. Corporate KYC can run for an hour and nobody minds. Application-time consumer decisioning needs an end-to-end answer in something like thirty seconds, often less, or the customer abandons. So the orchestrator is shaped around the critical path explicitly. Aggregation and document strands fire concurrently from the start. Affordability waits on aggregation but not on documents. Decisioning waits on the three primary strands; the AFC overlay (sanctions, PEP, fraud screen) runs in parallel and joins. The architectural critical path is twelve to eighteen seconds in the design; non-critical-path strands run alongside and join late where they're fast enough, or write into the trail asynchronously where they're not.
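The critical path follows from the dependency graph: a strand can start only when everything it waits on has finished. The sketch below encodes the dependencies described above; the per-strand durations are assumed values chosen for illustration, not measurements.

```python
# Dependencies as described in the text; the AFC overlay runs in parallel and joins at decisioning.
DEPENDS_ON = {
    "aggregation": [],
    "documents": [],
    "afc_overlay": [],
    "affordability": ["aggregation"],
    "decisioning": ["aggregation", "documents", "affordability", "afc_overlay"],
}

# Assumed durations in seconds (illustrative only).
DURATION_S = {
    "aggregation": 6.0,
    "documents": 8.0,
    "afc_overlay": 4.0,
    "affordability": 5.0,
    "decisioning": 1.0,
}


def finish_time(strand: str, durations=DURATION_S, deps=DEPENDS_ON) -> float:
    """Earliest finish time of a strand: latest dependency finish plus its own duration."""
    start = max((finish_time(d, durations, deps) for d in deps[strand]), default=0.0)
    return start + durations[strand]
```

With these assumed numbers the chain aggregation, then affordability, then decisioning is the critical path, so speeding up the document strand would not shorten the end-to-end time.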

Graceful degradation is the part most pilots get wrong. When an upstream agent times out - and they will time out - the orchestrator does not hard-fail. It records the timeout in the audit trail, marks the affected strand as unknown rather than substituting an optimistic default, and either escalates the application to human review or returns a more conservative decision under a clearly-named policy. The audit trail records which strand was unknown, why, and which fallback policy applied. A decline with an unknown affordability is a different kind of decline from a decline with a known affordability, and the trail has to make that distinguishable.
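A minimal sketch of that consolidation rule, assuming each strand reports a status and a pass flag. The policy name, field names, and outcome labels are hypothetical.

```python
FALLBACK_POLICY = "REFER_ON_UNKNOWN_STRAND"  # hypothetical policy name


def consolidate(strands: dict) -> dict:
    """strands maps a strand name to {"status": "ok" or "timeout", "passed": bool or None}."""
    # A timed-out strand is unknown; it is never replaced with an optimistic default.
    unknown = sorted(name for name, s in strands.items() if s["status"] != "ok")
    if unknown:
        return {
            "outcome": "refer_to_human",
            "unknown_strands": unknown,
            "fallback_policy": FALLBACK_POLICY,
        }
    passed = all(s["passed"] for s in strands.values())
    return {
        "outcome": "approve" if passed else "decline",
        "unknown_strands": [],
        "fallback_policy": None,
    }
```

Because the returned artefact always carries `unknown_strands` and `fallback_policy`, a decision reached with full information is distinguishable in the trail from one reached under degradation.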

The dual-delivery audit pattern from the previous case carries over directly. Every decision writes to the immutable audit and to a human-visible operator channel. The absence of an audit entry is itself a signal - a missing decision record is the kind of failure that has to be findable within hours, not weeks. In production at a bank, the audit destination is the regulator-readable log; for Nexus's home-lab analogue, it's Notion plus Telegram. The pattern is the same.
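Detecting a missing audit entry reduces to a set difference between the decisions issued and the entries in each destination. The function and argument names below are illustrative assumptions.

```python
def reconcile(decision_ids, immutable_log_ids, operator_channel_ids) -> dict:
    """Report decisions that are absent from either audit destination."""
    issued = set(decision_ids)
    return {
        "missing_from_immutable_log": sorted(issued - set(immutable_log_ids)),
        "missing_from_operator_channel": sorted(issued - set(operator_channel_ids)),
    }
```

Run on a schedule, a non-empty result from `reconcile` is the signal that a decision was made without a complete record, which is what makes the gap findable within hours.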

Servicing - back-book repricing as scheduled batch

The servicing strand is where the cron-driven scheduling layer earns its keep. A back-book repricing event - base-rate change, new affordability rule, new fairness audit, new product policy - needs every customer in the affected segment re-scored against the new criteria. That's a population scan, scheduled overnight, running exactly the same affordability agent that ran at origination. Newton's swarm carries the parallelism; the dual-delivery audit carries the explainability. Every customer whose status changes generates a per-customer decision artefact written to the audit trail. Every customer whose status doesn't change generates an explicit "considered, no change" record, because the absence of a record is not the same as a positive consideration.
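The population scan can be sketched as a loop that writes one record per customer regardless of outcome. The `rescore` callable stands in for the affordability agent, and the record fields are hypothetical.

```python
def reprice_scan(customers, rescore, policy_version: str) -> list:
    """Re-score every customer and emit exactly one audit record each, changed or not."""
    records = []
    for customer in customers:
        new_status = rescore(customer)
        changed = new_status != customer["status"]
        records.append({
            "customer_id": customer["id"],
            "policy_version": policy_version,
            "previous_status": customer["status"],
            "new_status": new_status,
            "outcome": "changed" if changed else "considered_no_change",
        })
    return records
```

The invariant worth testing in any real implementation is that the number of records equals the number of customers in the segment.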

The servicing strand is also where AI Act Annex III's right-to-recourse obligations land in concrete form. An automated change in credit terms is a decision the customer can challenge, and the audit trail has to support that challenge - what was decided, on which inputs, against which policy version, and where the human-review escalation lives. The architectural posture is straightforward: recourse is a first-class capacity-planned channel, not a fallback nobody expects to exercise. A repricing batch that tips three percent of a portfolio onto a different rate is producing three percent of a portfolio's worth of potential recourse cases. The recourse channel has to be sized for that, not for the percentage that will actually escalate.
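The sizing argument is simple arithmetic, sketched here with integer math to avoid rounding surprises. Only the three percent figure comes from the text; the portfolio size, reviewer throughput, and SLA window in the usage note are assumed for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def potential_recourse_cases(portfolio_size: int, tipped_bps: int) -> int:
    """Customers whose terms changed; tipped_bps is in basis points (300 = 3 percent)."""
    return ceil_div(portfolio_size * tipped_bps, 10_000)


def reviewers_needed(cases: int, cases_per_reviewer_per_day: int, sla_days: int) -> int:
    """Reviewers required to clear every potential case inside the promised SLA."""
    return ceil_div(cases, cases_per_reviewer_per_day * sla_days)
```

For an assumed back-book of 200,000 customers, a three percent tip gives `potential_recourse_cases(200_000, 300)` = 6,000 potential cases. At an assumed 20 cases per reviewer per day and a 10-day SLA, `reviewers_needed(6_000, 20, 10)` comes to 30 reviewers if every affected customer were to challenge.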

What translates to a bank

Every strand in this case study has a direct analogue in what a bank deploying AI into a regulated lending journey will need to put in place. The mapping is not metaphorical; it is one-for-one.

Nexus strand	Bank translation
Aggregation - Newton swarm under PSD2/3 consent	PSD3/FIDA-aligned aggregation broker; consent token vault with explicit TTL enforcement; per-call live consent re-check; no caching of aggregated payloads past the consent window; aggregation event logged to an immutable audit so any later customer challenge can be answered from the trail rather than reconstructed.
Documents - Leonardo schema-validated extraction	Document-understanding service with a schema policy under change control; extraction failures routed to human queue, never retried-into-confidence; OCR + LLM hybrid rather than LLM-alone for safety-relevant fields; per-field provenance, model version, and prompt version retained for AI Act Article 13 transparency.
Affordability - cashflow-axis reasoning with per-claim provenance	Cashflow model treated as augmentation, not replacement, of the bureau-centred decisioning model; per-claim input attribution to satisfy the EBA loan-origination guidelines explainability bar; model-risk-management treatment as a Tier 1 model under SS1/23 (UK), SR 11-7 (US), or the European equivalent.
Decisioning - orchestrated under a sub-30s latency budget	API-fronted decision service with a published end-to-end latency SLA; graceful degradation under upstream timeouts with the degraded path explicitly named in the audit, not silently substituted; decision output an artefact the customer can be shown under the right-to-explanation.
Servicing - same models, scheduled batch, recourse channel	Back-book repricing event-sourced into an auditable population scan; the same model serves origination and servicing so divergence does not creep in; right-to-human-review path explicit, capacity-planned, and exercised against synthetic load before the batch is run in production.

Why this matters under AI Act Annex III

Creditworthiness assessment is a designated high-risk AI system. The compliance bar is not a paperwork exercise - it is a documented design requirement covering transparency, fairness, human oversight, recourse, and ongoing monitoring. An agentic lending pipeline that cannot trace any single decision back to its inputs, name its model and prompt versions at the time of decision, and serve a right-to-recourse on demand has not met the bar. The architecture above is shaped to that bar from day one rather than retrofitted afterwards.

None of this is exotic. All of it is what a regulated lender's risk and compliance functions would expect to see in the control narrative for an AI-driven lending journey. A surprising number of in-flight bank pilots have a working aggregation strand and not much else - the document layer is a demo, the affordability axis is aspirational, the audit trail is partial, and the recourse path is documented but not exercised. That is the gap this case study exists to point at.

What I would do differently at bank scale

Three things. First, the model-risk-management envelope. A consumer-lending decision under AI Act Annex III is a high-risk AI system with named obligations on validation, monitoring, and human oversight. That means a discipline stricter than Nexus's: independent model validation against the SS1/23 or SR 11-7 framework, ongoing performance monitoring sliced by demographic and product segment, override and recourse processes documented and exercised on a calendar. The Nexus pattern - one operator, one config file, one audit trail - is defensible for a personal system; it is not defensible for a regulated decisioning estate where the model risk function and the operations function are properly separated.

Second, the fairness layer would be a continuous offline pipeline, not a bullet point. A nightly job re-scores the live model against demographic and product slices, alerts on drift, and gates production deployments on a fairness audit alongside the performance audit. The "is the model fair this morning" question has to be a question the system answers itself and signals about, not a question the regulator asks before the answer can be produced. This is the strand most pilots quietly drop in version one and then never put back in version two. It belongs in version one or it does not belong at all.
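A nightly fairness gate could be as small as the sketch below, which compares approval rates per slice against a reference slice. The 0.8 ratio threshold, the slice labels, and the choice of approval-rate ratio as the metric are assumptions for illustration; a real lender would pick metrics and thresholds with its model risk and legal functions.

```python
def approval_rates(decisions) -> dict:
    """decisions is an iterable of (slice_name, approved) pairs."""
    totals, approved = {}, {}
    for slice_name, was_approved in decisions:
        totals[slice_name] = totals.get(slice_name, 0) + 1
        approved[slice_name] = approved.get(slice_name, 0) + (1 if was_approved else 0)
    return {name: approved[name] / totals[name] for name in totals}


def fairness_gate(decisions, reference_slice: str, min_ratio: float = 0.8) -> dict:
    """Block deployment when any slice's approval rate falls too far below the reference."""
    rates = approval_rates(decisions)
    reference = rates[reference_slice]
    failing = sorted(
        name for name, rate in rates.items()
        if reference > 0 and rate / reference < min_ratio
    )
    return {"rates": rates, "failing_slices": failing, "deploy_allowed": not failing}
```

Wiring `deploy_allowed` into the release pipeline is what turns the fairness check from a report into a gate.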

Third, the recourse path would be capacity-planned and load-tested. Recourse is the layer that fails silently in most early bank deployments - documented but not load-tested, expected to be a trickle, swamped the first time an automated repricing batch tips a meaningful slice of a back-book onto different terms. The right architecture treats recourse as a first-class capacity-planned channel from day one, with synthetic load tests against the same workflow software the human reviewers will actually use, on the same SLA the customer has been promised. Anything less is a paper recourse path, and a paper recourse path is what the regulator finds first.

Those three changes are what the gap between "an architectural composition built from components Nexus runs" and "a production lending pipeline a bank runs" actually looks like. Everything else - the five-strand structure, the schema-first document extraction, the cashflow-axis affordability with per-claim provenance, the latency-budgeted orchestration with named graceful degradation, the same-model servicing with capacity-planned recourse - generalises cleanly.

Next case study

Cron-driven autonomy - eleven jobs that keep a multi-agent system honest

How Nexus uses scheduled work to do three different things at once - self-healing (sentinel, zombie cleanup, weekly reviews), scheduled intelligence (morning briefing, Newton autoresearch, Leonardo daily charts, Hermes nightly backtests), and housekeeping (backup, log rotation, maintenance). The overnight-batch layer reimagined as an agent layer - and the form banks' compliance batch is already quietly taking under DORA. Publishing next.
