
The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation

arXiv cs.CL · by Pavel Braslavski, Dmitrii Iarosh, Nikita Sushko, Andrey Sakhovskiy, Vasily Konovalov, Elena Tutubalina, Alexander Panchenko · April 2, 2026


Abstract: We present a configurable pipeline for generating multilingual sets of entities with specified characteristics, such as domain, geographical location and popularity, using data from Wikipedia and Wikidata. These datasets are intended for evaluating the factuality of LLMs' long-form generation, thereby complementing evaluation based on short-form QA datasets. We present the RiDiC dataset as an example of this approach. RiDiC contains 3,000 entities from three domains -- rivers, natural disasters, and car models -- spanning different popularity tiers. Each entity is accompanied by its geographical location, English and Chinese names (if available) and relevant English and Chinese Wikipedia content, which is used to evaluate LLMs' responses. Generations about RiDiC entities were obtained from three LLMs in English and Chinese. These were then evaluated using a third-party factuality checker, which showed that entities from our dataset caused even frontier models to hallucinate. To facilitate the evaluation of LLMs' long-form factuality in multiple languages, the code, data, and generation/evaluation scripts have been released.
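The idea of selecting entities across popularity tiers can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how popularity is measured or where tier boundaries lie, so the `popularity` score, the `domain` field, the `boundaries` thresholds, and both function names are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict


def assign_tier(popularity, boundaries):
    """Return the index of the first tier whose upper bound exceeds `popularity`.

    `boundaries` is an ascending list of upper bounds (hypothetical values);
    scores at or above the last bound fall into the final tier.
    """
    for tier, upper in enumerate(boundaries):
        if popularity < upper:
            return tier
    return len(boundaries)


def sample_by_tier(entities, boundaries, per_tier, seed=0):
    """Draw up to `per_tier` entities from every (domain, tier) bucket."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    buckets = defaultdict(list)
    for entity in entities:
        tier = assign_tier(entity["popularity"], boundaries)
        buckets[(entity["domain"], tier)].append(entity)

    sample = []
    for key in sorted(buckets):
        pool = buckets[key]
        # Take the whole bucket when it holds fewer than `per_tier` entities.
        sample.extend(rng.sample(pool, min(per_tier, len(pool))))
    return sample
```

Calling `sample_by_tier(entities, boundaries=[10, 50], per_tier=5)` on a list of dicts with `domain` and `popularity` keys would return at most five entities for each domain and tier combination, which keeps rare entities from being crowded out by popular ones.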

Comments: Accepted to LREC 2026

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2604.00019 [cs.CL]

(or arXiv:2604.00019v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.00019

arXiv-issued DOI via DataCite

Submission history

From: Andrey Sakhovskiy

[v1] Wed, 11 Mar 2026 01:02:55 UTC (1,200 KB)

Original source

arXiv cs.CL

https://arxiv.org/abs/2604.00019
