
Introducing MELI: the Mandarin-English Language Interview Corpus

arXiv, March 31, 2026

Authors: Suyuan Liu, Molly Babel

Abstract: We introduce the Mandarin-English Language Interview (MELI) Corpus, an open-source resource of 29.8 hours of speech from 51 Mandarin-English bilingual speakers. MELI combines matched sessions in Mandarin and English with two speaking styles: read sentences and spontaneous interviews about language varieties, standardness, and learning experiences. Audio was recorded at 44.1 kHz (16-bit, stereo). Interviews were fully transcribed, force-aligned at word and phone levels, and anonymized. Descriptively, the Mandarin component totals ~14.7 hours (mean duration 17.3 minutes) and the English component ~15.1 hours (mean duration 17.8 minutes). We report token/type statistics for each language and document code-switching patterns (frequent in Mandarin sessions; more limited in English sessions). The corpus design supports within-/cross-speaker and within-/cross-language acoustic comparison, and links acoustics to speakers' stated language attitudes, enabling both quantitative and qualitative analyses. The MELI Corpus will be released with transcriptions, alignments, metadata, scans of labelled maps, and documentation under a CC BY-NC 4.0 license.
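The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that each speaker contributes roughly one session per language, and that the audio is stored as uncompressed PCM (the actual release format is not specified). The function names are hypothetical.

```python
# Approximate figures as quoted in the abstract.
MANDARIN_HOURS, MANDARIN_MEAN_MIN = 14.7, 17.3
ENGLISH_HOURS, ENGLISH_MEAN_MIN = 15.1, 17.8
SPEAKERS = 51


def implied_sessions(total_hours: float, mean_minutes: float) -> float:
    """Number of sessions implied by a total duration and a mean duration."""
    return total_hours * 60 / mean_minutes


def pcm_bytes(hours: float, sample_rate: int = 44_100,
              bit_depth: int = 16, channels: int = 2) -> int:
    """Raw size of audio at the stated recording settings.

    Assumes uncompressed PCM, which the abstract does not confirm.
    """
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return round(hours * 3600 * bytes_per_second)


total_hours = MANDARIN_HOURS + ENGLISH_HOURS
mandarin_sessions = implied_sessions(MANDARIN_HOURS, MANDARIN_MEAN_MIN)
english_sessions = implied_sessions(ENGLISH_HOURS, ENGLISH_MEAN_MIN)

print(f"Total duration: {total_hours:.1f} h")
print(f"Implied Mandarin sessions: {mandarin_sessions:.1f} (speakers: {SPEAKERS})")
print(f"Implied English sessions: {english_sessions:.1f} (speakers: {SPEAKERS})")
print(f"Raw PCM size estimate: {pcm_bytes(total_hours) / 1e9:.1f} GB")
```

Under these assumptions, the two component durations sum to the reported total, and each language's total divided by its mean session length comes out close to the number of speakers.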

Comments: Accepted at LREC 2026 (14th International Conference on Language Resources and Evaluation), to appear in the conference proceedings

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.27043 [cs.CL]

(or arXiv:2603.27043v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.27043

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Suyuan Liu
[v1] Fri, 27 Mar 2026 23:15:30 UTC (1,897 KB)
