
Why not to use Cosine Similarity between Label Representations

arXiv cs.LG · Beatrix M. G. Nielsen · April 1, 2026


Abstract: Cosine similarity is often used to measure the similarity of vectors. These vectors might be the representations of neural network models. However, it is not guaranteed that cosine similarity of model representations will tell us anything about model behaviour. In this paper we show that when using a softmax classifier, be it an image classifier or an autoregressive language model, measuring the cosine similarity between label representations (called unembeddings in the paper) does not give any information about the probabilities assigned by the model. Specifically, we prove that for any softmax classifier model, given two label representations, it is possible to construct another model which gives the same probabilities for all labels and inputs, but where the cosine similarity between the representations is now either 1 or -1. We give specific examples of models with very high or low cosine similarity between representations and show how to construct equivalent models where the cosine similarity is now -1 or 1. This translation ambiguity can be fixed by centering the label representations; however, labels whose representations have low cosine similarity can still have high probability for the same inputs. Fixing the length of the representations still does not guarantee that high (or low) cosine similarity will correspond to high (or low) probability for the labels on the same inputs. This means that when working with softmax classifiers, cosine similarity values between label representations should not be used to explain model probabilities.
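The translation ambiguity the abstract describes can be sketched in a few lines of numpy. The idea: a softmax over logits u_i · h is invariant to adding the same vector c to every unembedding u_i, since that shifts every logit by the same amount c · h, yet the translation can change cosine similarities between unembeddings arbitrarily. This is a minimal illustrative sketch, not the paper's construction; all variable names and the random setup are ours.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
d, n_labels = 8, 5
U = rng.normal(size=(n_labels, d))   # label representations (unembeddings)
h = rng.normal(size=d)               # a hidden/input representation

p = softmax(U @ h)

# Translate every unembedding by the same large vector c. Each logit
# shifts by the same constant c @ h, so softmax probabilities are
# unchanged for every input h -- but now every row of U_shift is
# dominated by c, driving pairwise cosine similarities toward 1.
c = 100.0 * rng.normal(size=d)
U_shift = U + c
p_shift = softmax(U_shift @ h)

print(np.allclose(p, p_shift))                       # probabilities identical
print(cosine(U[0], U[1]), cosine(U_shift[0], U_shift[1]))
```

The two models assign identical probabilities to all labels, yet the cosine similarity between the first two label representations changes from whatever it was to nearly 1, which is why centering the unembeddings is needed before cosine similarity is even well defined here.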

Subjects:

Machine Learning (cs.LG)

Cite as: arXiv:2603.29488 [cs.LG]

(or arXiv:2603.29488v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.29488

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Beatrix Miranda Ginn Nielsen [v1] Tue, 31 Mar 2026 09:33:12 UTC (177 KB)
