
STCALIR: Semi-Synthetic Test Collection for Algerian Legal Information Retrieval

arXiv cs.IR
by M'hamed Amine Hatem, Sofiane Batata, Amine Mammasse, Faiçal Azouaou
April 2, 2026



Abstract: Test collections are essential for evaluating retrieval and re-ranking models. However, constructing such collections is challenging due to the high cost of manual annotation, particularly in specialized domains like Algerian legal texts, where high-quality corpora and relevance judgments are scarce. To address this limitation, we propose STCALIR, a framework for generating semi-synthetic test collections directly from raw legal documents. The pipeline follows the Cranfield paradigm, maintaining its core components of topics, corpus, and relevance judgments, while significantly reducing manual effort through automated multi-stage retrieval and filtering, achieving a 99% reduction in annotation workload. We validate STCALIR using the Mr. TyDi benchmark, demonstrating that the resulting semi-synthetic relevance judgments yield retrieval effectiveness comparable to human-annotated evaluations (Hit@10 ≈ 0.785). Furthermore, system-level rankings derived from these labels exhibit strong concordance with human-based evaluations, as measured by Kendall's τ (0.89) and Spearman's ρ (0.92). Overall, STCALIR offers a reproducible and cost-efficient solution for constructing reliable test collections in low-resource legal domains.
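The abstract reports two kinds of measurement: Hit@10 for per-query retrieval effectiveness, and Kendall's τ and Spearman's ρ for how closely system rankings under the semi-synthetic labels track rankings under human labels. The Python sketch below illustrates both, assuming the common definition of Hit@k (the share of queries with at least one relevant document in the top k) and a tie-free setting for the correlations. The dict-based run/qrels layout, the function names, and the system names and scores are hypothetical illustrations, not STCALIR's code or the paper's data.

```python
from itertools import combinations


def hit_at_k(run, qrels, k=10):
    """Fraction of queries with at least one relevant document in the top k.

    run:   query id -> ranked list of document ids (best first)
    qrels: query id -> set of relevant document ids (hypothetical layout)
    """
    if not run:
        return 0.0
    hits = sum(
        1
        for qid, ranking in run.items()
        if any(doc in qrels.get(qid, set()) for doc in ranking[:k])
    )
    return hits / len(run)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a: (concordant - discordant) / number of system pairs.

    Both dicts map system name -> score and must share the same keys.
    With no tied scores, tau-a coincides with tau-b.
    """
    systems = list(scores_a)
    n_pairs = len(systems) * (len(systems) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign of the score difference under both judgments => concordant pair
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


def _ranks(scores):
    """Rank systems by descending score (1 = best); assumes no tied scores."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {system: rank for rank, system in enumerate(order, start=1)}


def spearman_rho(scores_a, scores_b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    ra, rb = _ranks(scores_a), _ranks(scores_b)
    n = len(ra)
    if n < 2:
        raise ValueError("need at least two systems")
    d_sq = sum((ra[s] - rb[s]) ** 2 for s in ra)
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))


# Toy per-query run and relevance judgments (hypothetical)
run = {"q1": ["d3", "d1"], "q2": ["d5", "d6"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}}
print(hit_at_k(run, qrels, k=10))  # 0.5: only q1 has a relevant doc in its top 10

# Toy system-level scores under human vs. semi-synthetic qrels (hypothetical)
human = {"bm25": 0.71, "dense": 0.80, "hybrid": 0.84, "rerank": 0.88}
synthetic = {"bm25": 0.69, "dense": 0.83, "hybrid": 0.81, "rerank": 0.90}
print(round(kendall_tau(human, synthetic), 3))  # 0.667: 5 concordant, 1 discordant pair
print(spearman_rho(human, synthetic))           # 0.8: dense and hybrid swap places
```

With tied scores, tie-aware variants are the safer choice; for instance, SciPy's `scipy.stats.kendalltau` (tau-b by default) and `scipy.stats.spearmanr` (average ranks for ties) handle them directly.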

Subjects: Information Retrieval (cs.IR)

Cite as: arXiv:2604.00731 [cs.IR]

(or arXiv:2604.00731v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2604.00731

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: M'hamed Amine Hatem
[v1] Wed, 1 Apr 2026 10:50:28 UTC (4,222 KB)
