
Zero-shot Cross-domain Knowledge Distillation: A Case study on YouTube Music

arXiv cs.IR

By Srivaths Ranganathan, Nikhil Khani, Shawn Andrews, Chieh Lo, Li Wei, Gergo Varady, Jochen Klingenhoefer, Tim Steele, Bernardo Cunha, Aniruddh Nath, Yanwei Song

April 1, 2026





Abstract: Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly. We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a large-scale (100x) video recommendation platform (YouTube) to a music recommendation application with significantly lower traffic. We share offline and live experiment results and present findings evaluating different KD techniques in this setting across two ranking models on the music app. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improving the performance of ranking models on low-traffic surfaces.
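The abstract does not state the training objective. As a minimal sketch, assuming a standard soft-label distillation loss (not necessarily the authors' method), a multi-task student can blend each task's hard-label binary cross-entropy with a cross-entropy against the teacher's predicted probability. Here "zero-shot" is taken to mean that the source-domain teacher scores target-domain examples without being retrained on them. Because the prediction tasks differ across domains, each student task is matched to a teacher head through a mapping. The task names, `TASK_MAP`, and the `alpha` weight are hypothetical.

```python
import math
from typing import Dict, Mapping


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(target: float, prob: float, eps: float = 1e-7) -> float:
    # Binary cross-entropy with a hard (0/1) or soft (teacher) target in [0, 1].
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Hypothetical mapping from student (music-app) tasks to teacher (video-platform) heads.
# Assumption: the real task alignment between domains is not described in the abstract.
TASK_MAP: Dict[str, str] = {"music_play": "video_watch", "music_like": "video_like"}


def multitask_kd_loss(
    student_logits: Mapping[str, float],
    teacher_logits: Mapping[str, float],
    labels: Mapping[str, float],
    alpha: float = 0.5,  # hypothetical weight on the distillation term
) -> float:
    """Sum over student tasks of (1 - alpha) * hard loss + alpha * distillation loss.

    Student tasks with no mapped teacher head use the hard-label loss only.
    """
    total = 0.0
    for task, s_logit in student_logits.items():
        p_student = sigmoid(s_logit)
        hard = bce(labels[task], p_student)
        teacher_head = TASK_MAP.get(task)
        if teacher_head is None or teacher_head not in teacher_logits:
            total += hard
            continue
        # The frozen teacher's probability acts as a soft target for the student.
        soft_target = sigmoid(teacher_logits[teacher_head])
        soft = bce(soft_target, p_student)
        total += (1.0 - alpha) * hard + alpha * soft
    return total
```

For example, `multitask_kd_loss({"music_play": 0.3, "music_skip": 0.0}, {"video_watch": 1.1}, {"music_play": 1.0, "music_skip": 0.0})` distills only `music_play`. The `music_skip` task has no teacher head, so it is trained on its label alone. Falling back to the hard-label loss for unmapped tasks is one simple way to handle tasks the source domain lacks. Any real mapping and weighting would need to be validated offline and in live experiments.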

Subjects: Information Retrieval (cs.IR)

Cite as: arXiv:2603.28994 [cs.IR]

(or arXiv:2603.28994v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2603.28994

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nikhil Khani

[v1] Mon, 30 Mar 2026 20:47:58 UTC (742 KB)

Original source

arXiv cs.IR

https://arxiv.org/abs/2603.28994
