
To Write or to Automate Linguistic Prompts, That Is the Question

[Submitted on 26 Mar 2026]

Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk


Abstract: LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment (LQA), evaluating five model configurations. Results are task-dependent. In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality. In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization. Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the comparison is asymmetric: GEPA optimization searches programmatically over gold-standard splits, whereas expert prompts require, in principle, no labeled data, relying instead on domain expertise and iterative refinement.
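The abstract reports that most expert-versus-optimized comparisons show no statistically significant difference, but it does not name the test used. As an illustration only, the sketch below shows one common way to compare two prompting approaches scored on the same segments: a paired bootstrap test on the mean score difference. The function name, the per-segment scores, and the choice of test are all assumptions for this example, not details taken from the paper.

```python
import random
from statistics import mean


def paired_bootstrap_pvalue(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (observed mean difference, two-sided p-value).

    scores_a and scores_b are hypothetical per-segment quality scores for
    two prompting approaches evaluated on the same segments, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per segment")
    if not scores_a:
        raise ValueError("at least one paired score is required")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)

    # Centre the differences so that resampling simulates the null
    # hypothesis of no mean difference between the two approaches.
    centred = [d - observed for d in diffs]

    rng = random.Random(seed)
    n = len(diffs)
    extreme = 0
    for _ in range(n_resamples):
        sample = [centred[rng.randrange(n)] for _ in range(n)]
        if abs(mean(sample)) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)
```

Under these assumptions, calling `paired_bootstrap_pvalue(expert_scores, gepa_scores)` on two equal-length score lists returns the mean difference and a p-value; a large p-value would correspond to the "statistically indistinguishable" outcome described above.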

Comments: 10 pages, to be submitted for EAMT 2026

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25169 [cs.CL]

(or arXiv:2603.25169v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25169

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Marina Sánchez-Torrón
[v1] Thu, 26 Mar 2026 08:42:06 UTC (31 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25169v1
