
Zero-shot Cross-domain Knowledge Distillation: A Case study on YouTube Music

arXiv cs.IR | by Srivaths Ranganathan, Nikhil Khani, Shawn Andrews, Chieh Lo, Li Wei, Gergo Varady, Jochen Klingenhoefer, Tim Steele, Bernardo Cunha, Aniruddh Nath, Yanwei Song | April 1, 2026


Authors: Srivaths Ranganathan, Nikhil Khani, Shawn Andrews, Chieh Lo, Li Wei, Gergo Varady, Jochen Klingenhoefer, Tim Steele, Bernardo Cunha, Aniruddh Nath, Yanwei Song


Abstract: Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly. We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a large-scale video recommendation platform (YouTube, with roughly 100x the traffic) to a music recommendation application with significantly lower traffic. We share offline and live experiment results and present findings evaluating different KD techniques in this setting across two ranking models on the music app. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improving the performance of ranking models on low-traffic surfaces.
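
The abstract does not say which distillation loss the authors use, so the following is only a minimal sketch of one common KD formulation for a single binary ranking task, not the paper's method. It assumes the teacher (trained on the source domain) has already scored the target-domain examples without any fine-tuning, and that its scores are added as soft labels next to the observed hard labels. The function names and the `kd_weight` parameter are hypothetical.

```python
import math


def binary_cross_entropy(target, pred, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    # Clamp the prediction so log() never receives 0 or 1 exactly.
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(labels, student_preds, teacher_preds, kd_weight=0.5):
    """Average of hard-label loss plus kd_weight times soft-label loss.

    labels:        observed 0/1 outcomes on the target domain
    student_preds: student probabilities for the same examples
    teacher_preds: teacher probabilities, used as soft labels (assumed to
                   come from a source-domain teacher applied as-is)
    kd_weight:     hypothetical knob trading off the two terms
    """
    total = 0.0
    for y, s, t in zip(labels, student_preds, teacher_preds):
        hard = binary_cross_entropy(y, s)   # fit the observed outcome
        soft = binary_cross_entropy(t, s)   # match the teacher's score
        total += hard + kd_weight * soft
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0]
    teacher = [0.9, 0.2]
    print(distillation_loss(labels, [0.9, 0.2], teacher))
    print(distillation_loss(labels, [0.5, 0.5], teacher))
```

In a multi-task ranker, a term like this would typically be computed per task and summed; setting `kd_weight` to 0 recovers ordinary training on hard labels alone.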

Subjects: Information Retrieval (cs.IR)

Cite as: arXiv:2603.28994 [cs.IR]

(or arXiv:2603.28994v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2603.28994

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nikhil Khani
[v1] Mon, 30 Mar 2026 20:47:58 UTC (742 KB)
