How I built an AI that reads bank contracts the way bankers do (not the way customers do)
The problem started in 2009. I was a banker. I watched loan officers use internal scoring grids that customers never saw. The information asymmetry wasn't illegal — it was just never shared.
Fifteen years later, the asymmetry got worse. Banks now run LLMs on customer data before any human reviews it. The customer still signs without understanding what they're signing.
So I built the reverse.
The core insight: bankers read contracts differently than customers
A customer reads a loan contract linearly — page by page, looking for the monthly payment.
A banker reads it dimensionally — simultaneously scanning for:
- **Covenant triggers** (what makes the loan callable)
- **Cross-default clauses** (what other contracts could trigger this one)
- **Margin ratchets** (how the rate changes under specific conditions)
- **Termination asymmetries** (who can exit and under what conditions)
These aren't hidden in fine print. They're just never explained. An LLM trained to scan for these patterns — in the order a banker would — surfaces what a linear read misses.
The architecture
The system runs four specialized agents in parallel rather than one general-purpose model. This is borrowed from the 4D analytical framework we use at WASA Confidence — the principle being that parallel agents surfacing contradictions are more reliable than a single agent producing a confident answer.
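One way to wire that up is sketched below, assuming Python's asyncio for the concurrency. The agent bodies are placeholders; in the real system each would wrap an LLM call with its own system prompt, and all function and field names here are mine, not the production system's.

```python
import asyncio

# Placeholder agent bodies; in the real system each wraps an LLM call.
async def clause_extractor(doc):
    return {"clauses": []}  # structured clause map (Agent 1)

async def risk_scanner(clause_map):
    return {"flags": []}  # per-clause risk scores (Agent 2)

async def cross_contract_analyzer(clause_map, other_contracts):
    return {"links": []}  # cross-contract exposure (Agent 3)

async def contradiction_detector(upstream_outputs, intake_form):
    return {"conflicts": []}  # belief-vs-contract mismatches (Agent 4)

async def analyze(doc, other_contracts, intake_form):
    # Agent 1 runs first: everything downstream consumes its clause map.
    clause_map = await clause_extractor(doc)
    # Agents 2 and 3 both read the clause map and can run concurrently.
    risks, links = await asyncio.gather(
        risk_scanner(clause_map),
        cross_contract_analyzer(clause_map, other_contracts),
    )
    # Agent 4 looks for disagreements across everything produced so far.
    return await contradiction_detector([clause_map, risks, links], intake_form)

report = asyncio.run(analyze("...contract text...", [], {}))
```

The point of the structure is that Agents 2 and 3 never see each other's answers, so any disagreement Agent 4 finds is a genuine signal, not an echo.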
Agent 1 — Clause Extractor
Parses the document structure. Identifies clause types, cross-references, and defined terms. Does not interpret — only maps.
```python
system_prompt = """You are a legal document parser. Your only task is to:
- List every clause by type (payment, covenant, default, termination, rate)
- Flag every cross-reference between clauses
- Flag every defined term that appears in a clause but is defined elsewhere
Output JSON only. No interpretation. No summary."""
```
Agent 2 — Risk Scanner
Takes the clause map from Agent 1. Scores each clause against a library of 340 known adverse patterns — built from 15 years of banking experience.
```python
system_prompt = """You are a senior credit analyst. You receive a structured clause map. For each clause, return:
- risk_level: none / low / medium / high / critical
- pattern_match: which known adverse pattern this matches (if any)
- plain_language: one sentence explaining what this means for the borrower
Do not summarize the document. Score each clause independently."""
```
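For illustration, here is one record Agent 2 might emit for a single clause, with the fields named in the prompt above, plus a guard that rejects malformed records before anything downstream consumes them. The values and the `valid` helper are invented, not the production schema.

```python
# Hypothetical example of one per-clause record from Agent 2.
RISK_LEVELS = ("none", "low", "medium", "high", "critical")

record = {
    "clause_id": "7.2",
    "risk_level": "high",
    "pattern_match": "margin_ratchet_on_rating_downgrade",
    "plain_language": "Your rate rises automatically if your credit rating drops.",
}

def valid(rec):
    # Reject anything that drifts from the schema before Agent 4 sees it.
    return rec.get("risk_level") in RISK_LEVELS and isinstance(
        rec.get("plain_language"), str
    )
```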
Agent 3 — Cross-Contract Analyzer
This is the one customers never run. It takes the flagged clauses and checks them against the borrower's other contracts — insurance policies, supplier agreements, other loans.
A cross-default clause in a bank loan that triggers on a supplier payment delay is invisible if you only read the bank contract.
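A toy version of that check makes the point. The clause and contract structures here are invented for illustration; in the real system the references come from Agent 1's clause map, not string matching.

```python
# Invented example data: one cross-default clause, two other contracts.
loan_clauses = [
    {"id": "9.1", "type": "cross_default", "triggers_on": "supplier_payment_default"},
]
other_contracts = [
    {"name": "supplier agreement", "events": ["supplier_payment_default"]},
    {"name": "property lease", "events": ["early_termination"]},
]

def exposure(clauses, contracts):
    # A cross-default clause only matters if another contract can fire it.
    hits = []
    for clause in clauses:
        for contract in contracts:
            if clause["triggers_on"] in contract["events"]:
                hits.append((clause["id"], contract["name"]))
    return hits
```

Run against only the loan document, `exposure` finds nothing; run against the borrower's full contract set, the supplier-agreement link surfaces.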
Agent 4 — Contradiction Detector
Runs against the outputs of Agents 1, 2 and 3. Looks for contradictions between what the contract says and what the borrower believes (captured in a short intake form).
The contradictions between agents are often more informative than any single agent's output. This is the core principle behind the WASA Confidence 4D methodology — parallel analysis surfaces what sequential analysis misses.
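A stripped-down sketch of the belief-vs-contract comparison. The intake fields and findings below are hypothetical; the real agent works over free-text outputs, not booleans.

```python
# Hypothetical intake form: what the borrower believes they signed.
intake = {
    "rate_is_fixed": True,
    "loan_is_callable": False,
}
# What Agents 1-3 actually found in the contract.
contract_findings = {
    "rate_is_fixed": False,    # margin ratchet flagged by Agent 2
    "loan_is_callable": True,  # covenant trigger flagged by Agent 2
}

def contradictions(beliefs, findings):
    # Every mismatch is a question the borrower should ask before signing.
    return [k for k in beliefs if k in findings and beliefs[k] != findings[k]]
```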
What it finds in practice
Across a sample of 47 SME loan contracts:
| Finding | Count |
| --- | --- |
| Margin ratchet clause the borrower was unaware of | 31 / 47 |
| Cross-default linking the loan to unrelated supplier contracts | 19 / 47 |
| Callable provisions triggered by unmonitored financial ratios | 8 / 47 |
| Termination asymmetries giving the bank unilateral exit rights | 3 / 47 |
None of these were illegal. None were hidden. All were unread.
The technical limit worth being honest about
LLMs hallucinate on numerical conditions. If a covenant says "ratio must remain above 1.35x adjusted EBITDA" — the model will extract the clause correctly but may misinterpret what counts as adjusted EBITDA without the definition section.
The fix: Agent 1 explicitly maps every defined term before Agent 2 interprets any condition. You cannot let a model interpret a covenant before it has resolved every defined term in that covenant.
This sounds obvious. It isn't how most people prompt document analysis.
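A minimal sketch of that ordering constraint, using the EBITDA covenant from above. The inline resolution format is mine, not the production prompt; the idea is only that interpretation refuses to run on a covenant containing a bare defined term.

```python
# Definitions mapped by Agent 1 before any interpretation happens.
definitions = {
    "Adjusted EBITDA": "EBITDA excluding one-off restructuring costs (clause 1.4)",
}
covenant = "ratio must remain above 1.35x Adjusted EBITDA"

def resolve_terms(text, defs):
    # Agent 1's job: attach each definition where the term is used.
    for term, definition in defs.items():
        text = text.replace(term, f"{term} [defined as: {definition}]")
    return text

def interpret(text, defs):
    # Agent 2's guard: refuse any covenant containing an unresolved term.
    for term in defs:
        if term in text and f"{term} [defined as:" not in text:
            raise ValueError(f"unresolved defined term: {term}")
    return text  # only now does the risk scanner see the clause
```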
Where this goes
The same architecture applies to insurance contracts, supplier agreements, and lease terms. Anywhere a professional on one side of the table reads dimensionally and a non-professional on the other side reads linearly.
The full service — contract analysis, banking condition audit, transaction data room — is at mainstreetbrigade.org. The underlying 4D analytical framework is documented at wasaconf.org.
The code above is simplified but the architecture is production. Happy to discuss the prompt engineering for the contradiction detection agent in the comments — that's where most of the interesting edge cases live.