
HypeLoRA: Hyper-Network-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning

arXiv, March 31, 2026

Authors: Bartosz Trojan, Filip Gębala


Abstract: Modern Transformer-based models frequently suffer from miscalibration, producing overconfident predictions that do not reflect true empirical frequencies. This work investigates the calibration dynamics of Low-Rank Adaptation (LoRA) and a novel hyper-network-based adaptation framework as parameter-efficient alternatives to full fine-tuning for RoBERTa. Evaluating across the GLUE benchmark, we demonstrate that LoRA-based adaptation consistently achieves calibration parity with (and in specific tasks exceeds) full fine-tuning, while maintaining significantly higher parameter efficiency. We further explore a dynamic approach where a shared hyper-network generates LoRA factors (A and B matrices) to induce structural coupling across layers. This approach produced results similar to standard LoRA fine-tuning, and even achieved a better Matthews correlation coefficient (MCC) on the CoLA dataset. Our study also reveals a critical trade-off: constraining the adaptation space (e.g., freezing matrices A) acts as a powerful regularizer that improves (lowers) Expected Calibration Error (ECE), but necessitates a carefully balanced sacrifice in downstream task accuracy. To support future research, we provide a unified and reproducible implementation of contemporary calibration metrics, including ECE, MCE, and ACE. Our findings clarify the relationship between parameter efficiency and probabilistic reliability, positioning structured low-rank updates as a viable foundation for uncertainty-aware Transformer architectures. Code available at: this https URL
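To make the calibration metrics named in the abstract concrete, here is a minimal sketch of ECE and MCE (maximum calibration error) in plain Python. It is not the authors' implementation, which is not reproduced on this page. The function names, the choice of 10 equal-width confidence bins, and the toy data are all assumptions made for illustration. ACE is left out because it depends on a different binning scheme.

```python
from typing import List, Sequence, Tuple


def bin_stats(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> List[Tuple[int, float, float]]:
    """Group predictions into equal-width confidence bins (an assumed scheme).

    Returns one (count, mean_confidence, accuracy) tuple per non-empty bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    sums = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        # Clamp the index so that conf == 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        sums[idx][0] += 1
        sums[idx][1] += conf
        sums[idx][2] += 1.0 if hit else 0.0
    return [(n, c / n, a / n) for n, c, a in sums if n > 0]


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Average |accuracy - confidence| over bins, weighted by bin size."""
    total = len(confidences)
    stats = bin_stats(confidences, correct, n_bins)
    return sum(n / total * abs(acc - conf) for n, conf, acc in stats)


def maximum_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Largest |accuracy - confidence| gap over the non-empty bins."""
    stats = bin_stats(confidences, correct, n_bins)
    return max(abs(acc - conf) for _, conf, acc in stats)


# Hypothetical toy data: top-class confidence and whether the prediction was right.
toy_conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
toy_correct = [True, True, True, False, True, False]
ece = expected_calibration_error(toy_conf, toy_correct)
mce = maximum_calibration_error(toy_conf, toy_correct)
```

In the toy data, four predictions at 0.95 confidence are 75% accurate (a gap of 0.20) and two at 0.55 confidence are 50% accurate (a gap of 0.05). The size-weighted average gives an ECE of about 0.15, and the largest gap gives an MCE of about 0.20. An overconfident model shows up as bins where mean confidence exceeds accuracy, which is the pattern the abstract describes.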

Comments: 12 pages, 2 figures, 2 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.19278 [cs.CL]

(or arXiv:2603.19278v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.19278

arXiv-issued DOI via DataCite

Submission history

From: Bartosz Trojan
[v1] Sun, 1 Mar 2026 15:53:49 UTC (98 KB)
[v2] Sun, 29 Mar 2026 14:35:38 UTC (97 KB)



