Zero-shot Cross-domain Knowledge Distillation: A Case study on YouTube Music
Authors: Srivaths Ranganathan, Nikhil Khani, Shawn Andrews, Chieh Lo, Li Wei, Gergo Varady, Jochen Klingenhoefer, Tim Steele, Bernardo Cunha, Aniruddh Nath, Yanwei Song
Abstract: Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly. We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a (100x) large-scale video recommendation platform (YouTube) to a music recommendation application with significantly lower traffic. We share offline and live experiment results and present findings evaluating different KD techniques in this setting across two ranking models on the music app. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improve the performance of ranking models on low-traffic surfaces.
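The abstract does not state which distillation loss the authors use. As a rough illustration only, the sketch below shows one common KD formulation for a multi-task ranking student: each task mixes the usual hard-label loss with a loss against the teacher's prediction, treated as a soft label. The task names, the `alpha` mixing weight, and the fallback for tasks the teacher does not cover are all assumptions made for this example, not details from the paper.

```python
import math


def binary_cross_entropy(target: float, pred: float, eps: float = 1e-7) -> float:
    """Cross-entropy between a (possibly soft) target in [0, 1] and a predicted probability."""
    pred = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(labels, student_preds, teacher_preds, alpha=0.5):
    """Per-example multi-task loss mixing hard labels with teacher soft labels.

    All three arguments are dicts keyed by task name (hypothetical names).
    `alpha` is an assumed mixing weight. Tasks with no teacher prediction
    fall back to the hard-label loss alone.
    """
    total = 0.0
    for task, y in labels.items():
        hard = binary_cross_entropy(y, student_preds[task])
        if task in teacher_preds:
            # The teacher's probability acts as a soft target for the student.
            soft = binary_cross_entropy(teacher_preds[task], student_preds[task])
            total += (1.0 - alpha) * hard + alpha * soft
        else:
            total += hard
    return total


# Hypothetical example: the teacher covers "click" but has no "like" prediction.
labels = {"click": 1.0, "like": 0.0}
student = {"click": 0.6, "like": 0.3}
teacher = {"click": 0.9}
loss = distillation_loss(labels, student, teacher, alpha=0.5)
```

In a zero-shot setup like the one the abstract describes, `teacher_preds` would come from a source-domain teacher applied to target-domain examples without retraining; how features and tasks are mapped between the two domains is beyond what this sketch covers.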
Subjects: Information Retrieval (cs.IR)
Cite as: arXiv:2603.28994 [cs.IR]
(or arXiv:2603.28994v1 [cs.IR] for this version)
https://doi.org/10.48550/arXiv.2603.28994
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Nikhil Khani [v1] Mon, 30 Mar 2026 20:47:58 UTC (742 KB)