The Hidden Audio Bias Inside Audio-Visual Speech Recognition
Shapley analysis reveals why AVSR models keep trusting corrupted audio, exposing a hidden bias in multimodal speech recognition.
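The kind of analysis the summary describes can be sketched with the classic two-player Shapley value, treating audio and video as the "players" and ablating each modality in turn. This is a minimal illustration, not the paper's method: the `score` interface and the toy accuracy numbers below are hypothetical assumptions.

```python
# Minimal sketch: two-player Shapley values over the audio and video
# modalities of an AVSR model. `score(use_audio, use_video)` is a
# hypothetical evaluation function assumed to return task accuracy
# (e.g. 1 - WER) with the named modalities enabled.

def modality_shapley(score):
    """Return (phi_audio, phi_video) for a two-modality model."""
    v_none = score(False, False)
    v_a = score(True, False)
    v_v = score(False, True)
    v_av = score(True, True)
    # Shapley value = marginal contribution averaged over both
    # orders in which the modalities can join the coalition.
    phi_audio = 0.5 * ((v_a - v_none) + (v_av - v_v))
    phi_video = 0.5 * ((v_v - v_none) + (v_av - v_a))
    return phi_audio, phi_video

# Toy numbers illustrating an audio-dominant model under noise:
# audio alone scores 0.55, video alone 0.30, both together 0.70.
toy_scores = {(False, False): 0.0, (True, False): 0.55,
              (False, True): 0.30, (True, True): 0.70}
phi_a, phi_v = modality_shapley(lambda a, v: toy_scores[(a, v)])
# phi_a = 0.475, phi_v = 0.225; they sum to the full-model score
# (0.70), so the attribution is "efficient" in the Shapley sense.
```

With only two modalities the exact Shapley value needs just four evaluations, which is why modality-level (rather than feature-level) attribution is tractable here; a large gap between `phi_a` and `phi_v` under corrupted audio is the kind of signal the article's "hidden audio bias" refers to.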
by aimodels44 (@aimodels44) · April 5th, 2026
Among other things, launching AIModels.fyi ... Find the right AI model for your project - https://aimodels.fyi
TOPICS
machine-learning, artificial-intelligence, software-architecture, data-science, performance, design, audio-visual-speech-ai, avsr-interpretability, speech-recognition-bias
Source: https://hackernoon.com/the-hidden-audio-bias-inside-audio-visual-speech-recognition