LLM Benchmark-User Need Misalignment for Climate Change
arXiv:2603.26106v1 Announce Type: new Abstract: Climate change is a major socio-scientific issue shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLM in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different human-human and human-AI knowledge seeking and provision behaviors. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data repre — Oucheng Liu, Lexing Xie, Jing Jiang
View PDF HTML (experimental)
Abstract:Climate change is a major socio-scientific issue shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLM in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different human-human and human-AI knowledge seeking and provision behaviors. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data representing different knowledge behaviors. Our results reveal a substantial mismatch between current benchmarks and real-world user needs, while knowledge interaction patterns between humans and LLMs closely resemble those in human-human interactions. These findings provide actionable guidance for benchmark design, RAG system development, and LLM training. Code is available at this https URL.
Comments: 37 pages (8 main), 31 figures, 14 tables
Subjects:
Computation and Language (cs.CL)
Cite as: arXiv:2603.26106 [cs.CL]
(or arXiv:2603.26106v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.26106
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Oucheng Liu [view email] [v1] Fri, 27 Mar 2026 06:32:30 UTC (2,603 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Assessing Pause Thresholds for empirical Translation Process Research
arXiv:2604.01410v1 Announce Type: new Abstract: Text production (and translations) proceeds in the form of stretches of typing, interrupted by keystroke pauses. It is often assumed that fast typing reflects unchallenged/automated translation production while long(er) typing pauses are indicative of translation problems, hurdles or difficulties. Building on a long discussion concerning the determination of pause thresholds that separate automated from presumably reflective translation processes (O'Brien, 2006; Alves and Vale, 2009; Timarova et al., 2011; Dragsted and Carl, 2013; Lacruz et al., 2014; Kumpulainen, 2015; Heilmann and Neumann 2016), this paper compares three recent approaches for computing these pause thresholds, and suggest and evaluate a novel method for computing Production

Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models
arXiv:2604.01404v1 Announce Type: new Abstract: Language models can answer many entity-centric factual questions, but it remains unclear which internal mechanisms are involved in this process. We study this question across multiple language models. We localize entity-selective MLP neurons using templated prompts about each entity, and then validate them with causal interventions on PopQA-based QA examples. On a curated set of 200 entities drawn from PopQA, localized neurons concentrate in early layers. Negative ablation produces entity-specific amnesia, while controlled injection at a placeholder token improves answer retrieval relative to mean-entity and wrong-cell controls. For many entities, activating a single localized neuron is sufficient to recover entity-consistent predictions once
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Assessing Pause Thresholds for empirical Translation Process Research
arXiv:2604.01410v1 Announce Type: new Abstract: Text production (and translations) proceeds in the form of stretches of typing, interrupted by keystroke pauses. It is often assumed that fast typing reflects unchallenged/automated translation production while long(er) typing pauses are indicative of translation problems, hurdles or difficulties. Building on a long discussion concerning the determination of pause thresholds that separate automated from presumably reflective translation processes (O'Brien, 2006; Alves and Vale, 2009; Timarova et al., 2011; Dragsted and Carl, 2013; Lacruz et al., 2014; Kumpulainen, 2015; Heilmann and Neumann 2016), this paper compares three recent approaches for computing these pause thresholds, and suggest and evaluate a novel method for computing Production
![[R], 31 MILLIONS High frequency data, Light GBM worked perfectly](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-neural-network-P6fqXULWLNUwjuxqUZnB3T.webp)
[R], 31 MILLIONS High frequency data, Light GBM worked perfectly
We just published a paper on predicting adverse selection in high-frequency crypto markets using LightGBM , and I wanted to share it here because the findings are directly relevant to anyone dealing high frequency data and machine learning The core problem we solved: Every market maker's nightmare — getting picked off by informed traders right before a big move. We built a model that flags those toxic seconds before they wreck you. The data: - 31,081,463 second-level observations of BTC/USDT perpetual futures on Bybit - February 2025 → February 2026 (381 raw daily files) - Strict walk-forward regime, zero lookahead bias The key results (this is the part that shocked us): Our TailScore metric — which combines predicted toxicity probability with predicted price move severity — flags the top


![[D] ACL 2026 Decision](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-satellite-bUaXYHZsoMZjyA4XgfFqkD.webp)
Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!