
LLM Benchmark-User Need Misalignment for Climate Change

arXiv · March 30, 2026 · 10 min read

Authors: Oucheng Liu, Lexing Xie, Jing Jiang


Abstract: Climate change is a major socio-scientific issue that shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLMs in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different human-human and human-AI knowledge-seeking and knowledge-provision behaviors. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data representing different knowledge behaviors. Our results reveal a substantial mismatch between current benchmarks and real-world user needs, while knowledge interaction patterns between humans and LLMs closely resemble those in human-human interactions. These findings provide actionable guidance for benchmark design, RAG system development, and LLM training. Code is available at this https URL.
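One way to make the idea of a benchmark-versus-user mismatch concrete is to compare how often each taxonomy category appears in the two data sources. The sketch below is illustrative only: the abstract does not say which metric the authors use, so Jensen-Shannon divergence is an assumption here, and the intent labels and counts are hypothetical toy values, not data from the paper.

```python
import math
from collections import Counter


def distribution(labels, categories):
    """Return the relative frequency of each category, in a fixed order."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical intent categories and toy annotations (assumptions, not paper data).
INTENTS = ["fact_lookup", "explanation", "advice"]
benchmark_labels = ["fact_lookup"] * 8 + ["explanation"] * 2
user_labels = ["fact_lookup"] * 2 + ["explanation"] * 4 + ["advice"] * 4

p = distribution(benchmark_labels, INTENTS)
q = distribution(user_labels, INTENTS)
mismatch = js_divergence(p, q)
print(f"benchmark={p} users={q} JS divergence={mismatch:.3f} bits")
```

A value near 0 would mean the benchmark covers intents in roughly the same proportions as real user queries, while a value near 1 would mean the two barely overlap. The same comparison could be repeated along the Topic and Form dimensions.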

Comments: 37 pages (8 main), 31 figures, 14 tables

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.26106 [cs.CL] (or arXiv:2603.26106v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26106

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Oucheng Liu
[v1] Fri, 27 Mar 2026 06:32:30 UTC (2,603 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26106
