Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models

arXiv, March 31, 2026

Authors: Arundhathi Dev, Justin Zhan

Abstract: Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU training. Our approach decouples lightweight visual character detection (domain-agnostic) from domain-specific linguistic correction using pretrained sequence models including T5, ByT5, and BART. By training the correctors entirely on synthetic noise, we enable annotation-free domain adaptation without requiring labeled target images. Evaluating across modern clean handwriting, cursive script, and historical documents, we identify a critical "Pareto frontier" in architecture selection: T5-Base excels on modern text with standard vocabulary, whereas ByT5-Base dominates on historical documents by reconstructing archaic spellings at the byte level. Our results demonstrate that this decoupled paradigm matches end-to-end transformer accuracy while reducing compute by approximately 95%, establishing a viable, resource-efficient alternative to monolithic OCR architectures.
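The abstract does not describe the noise model used to train the correctors, so the following is only a minimal sketch of the general idea: corrupt clean target-domain text to produce (noisy input, clean target) pairs that a sequence model could be fine-tuned on without any labeled images. The confusion table, the edit operations, and the function names `add_synthetic_noise` and `make_training_pairs` are all assumptions made for illustration, not the authors' method.

```python
import random

# Hypothetical table of visually similar characters a detector might confuse.
# These entries are illustrative assumptions, not taken from the paper.
CONFUSIONS = {"l": "1", "o": "0", "e": "c", "m": "rn", "s": "5"}


def add_synthetic_noise(text, rate=0.1, seed=None):
    """Corrupt `text` with character-level edits at roughly the given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        # Leave most characters untouched.
        if rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(["substitute", "delete", "duplicate"])
        if op == "substitute":
            # Falls back to the original character if no confusion is listed.
            out.append(CONFUSIONS.get(ch, ch))
        elif op == "duplicate":
            out.append(ch + ch)
        # "delete": append nothing.
    return "".join(out)


def make_training_pairs(lines, rate=0.1, seed=0):
    """Return (noisy_input, clean_target) pairs for corrector fine-tuning."""
    rng = random.Random(seed)
    return [
        (add_synthetic_noise(line, rate, rng.randrange(2**32)), line)
        for line in lines
    ]


if __name__ == "__main__":
    clean = ["the quick brown fox", "ye olde booke shoppe"]
    for noisy, target in make_training_pairs(clean, rate=0.2):
        print(repr(noisy), "->", repr(target))
```

Because the clean side of each pair is just unlabeled text from the target domain, adapting to a new domain under this sketch only requires swapping the text corpus; no target images are involved.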

Comments: Accepted to the International Conference on Machine Intelligence Theory and Applications (MiTA 2026)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2603.28028 [cs.CV]

(or arXiv:2603.28028v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28028

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Arundhathi Dev
[v1] Mon, 30 Mar 2026 04:39:26 UTC (125 KB)
