ProText: A benchmark dataset for measuring (mis)gendering in long-form texts
arXiv:2603.27838v1 Announce Type: new
Authors: Hadas Kotek, Margit Bowler, Patrick Sonnenberg, Yu'an Yang
Abstract: We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the gender binary. We validated ProText through a mini case study, showing that even with just two prompts and two models, we can draw nuanced insights regarding gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, particularly when inputs contain no explicit gender cues or when models default to heteronormative assumptions.
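The three dimensions named in the abstract can be pictured as a small grid of conditions. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes the dimensions are fully crossed, which the abstract does not state. It does not reflect the dataset's actual file format or schema.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ThemeNoun(Enum):
    # Noun types listed in the abstract.
    NAME = "name"
    OCCUPATION = "occupation"
    TITLE = "title"
    KINSHIP_TERM = "kinship term"


class ThemeCategory(Enum):
    # Stereotype categories listed in the abstract.
    STEREOTYPICALLY_MALE = "stereotypically male"
    STEREOTYPICALLY_FEMALE = "stereotypically female"
    NEUTRAL = "gender-neutral/non-gendered"


class PronounCategory(Enum):
    # Pronoun categories listed in the abstract.
    MASCULINE = "masculine"
    FEMININE = "feminine"
    GENDER_NEUTRAL = "gender-neutral"
    NONE = "none"


@dataclass(frozen=True)
class Condition:
    """One hypothetical cell of the design grid (not the real schema)."""
    theme_noun: ThemeNoun
    theme_category: ThemeCategory
    pronoun_category: PronounCategory


def all_conditions() -> list[Condition]:
    # ASSUMPTION: every combination of the three dimensions occurs.
    return [
        Condition(noun, theme, pronoun)
        for noun, theme, pronoun in product(ThemeNoun, ThemeCategory, PronounCategory)
    ]


if __name__ == "__main__":
    print(len(all_conditions()))  # 4 * 3 * 4 combinations
```

Under the full-crossing assumption this gives 4 x 3 x 4 = 48 conditions; the actual dataset may cover fewer or weight them unevenly.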
Comments: 13 pages, 10 figures, 6 tables
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.27838 [cs.CL]
(or arXiv:2603.27838v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.27838
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yu'an Yang
[v1] Sun, 29 Mar 2026 19:45:31 UTC (766 KB)