
trunk/1781f63f19fdfb119cc58a9a9b6c6ef0650c7cc4: Codegen backward prologue subclass unwrapping (#178927)

PyTorch Releases · by pytorch · April 3, 2026


Generate a codegen'd identity unwrap function for the backward prologue's non-tangent args (saved tensors, symints, opaque objects). In AOT dispatch the compiled forward operates on unwrapped inner tensors, so these args never arrive as wrapper subclasses and need no unwrapping. The codegen'd function is therefore an identity:

    def unwrap_fn(args):
        return list(args)

This eliminates the per-element is_traceable_wrapper_subclass checks from runtime_unwrap_tensor_subclasses that ran on every backward call.
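
To make the contrast concrete, here is a minimal sketch in plain Python. It does not use PyTorch: `WrapperStandIn`, `is_wrapper_stand_in`, `runtime_unwrap`, and `make_identity_unwrap` are hypothetical names chosen for illustration, and building the function from a source string with `exec` is only one plausible way to do codegen; the note above does not say how the PR generates its function.

```python
class WrapperStandIn:
    """Hypothetical stand-in for a traceable wrapper subclass (not a PyTorch type)."""

    def __init__(self, *inner):
        self.inner = list(inner)


def is_wrapper_stand_in(x):
    # Stand-in for the per-element check; the real predicate lives in PyTorch.
    return isinstance(x, WrapperStandIn)


def runtime_unwrap(args):
    """Old style: inspect every element on every call."""
    out = []
    for a in args:
        if is_wrapper_stand_in(a):
            out.extend(a.inner)  # flatten the wrapper into its inner values
        else:
            out.append(a)
    return out


def make_identity_unwrap():
    """New style: when no element can be a wrapper, emit an identity function once."""
    source = "def unwrap_fn(args):\n    return list(args)\n"
    namespace = {}
    exec(source, namespace)  # compile the generated source a single time
    return namespace["unwrap_fn"]


unwrap_fn = make_identity_unwrap()
plain_args = (1.0, 2, "opaque")
# With only plain args, both paths agree; the generated one skips the checks.
same_result = runtime_unwrap(plain_args) == unwrap_fn(plain_args)
```

The generated function is only valid under the stated invariant: if a wrapper could appear among the args, the identity would pass it through without flattening it.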

Unwrap step in isolation (us/call):

| Case | Before (isinstance loop) | After (codegen identity) | Speedup |
|---|---|---|---|
| 5 args | 0.78 us | 0.14 us | 5.5x |
| 10 args | 1.43 us | 0.15 us | 9.5x |
| 20 args | 2.70 us | 0.16 us | 16.9x |
| 50 args | 6.61 us | 0.20 us | 33.1x |
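
A comparison of this shape can be approximated with `timeit`. The sketch below is an assumption-laden stand-in, not the benchmark behind the table: `WrapperStandIn`, `loop_unwrap`, `identity_unwrap`, `us_per_call`, and `compare` are hypothetical names, the check is a plain `isinstance` rather than PyTorch's `is_traceable_wrapper_subclass`, and absolute timings will differ by machine and Python version.

```python
import timeit


class WrapperStandIn:
    """Hypothetical wrapper type; none of the benchmark args are instances of it."""

    def __init__(self, inner):
        self.inner = list(inner)


def loop_unwrap(args):
    # Per-element check on every call, mirroring the "before" column in spirit.
    out = []
    for a in args:
        if isinstance(a, WrapperStandIn):
            out.extend(a.inner)
        else:
            out.append(a)
    return out


def identity_unwrap(args):
    # The "after" path: a single list copy with no per-element Python checks.
    return list(args)


def us_per_call(fn, args, number):
    # Average wall-clock time per call, in microseconds.
    total_seconds = timeit.timeit(lambda: fn(args), number=number)
    return total_seconds / number * 1e6


def compare(sizes=(5, 10, 20, 50), number=20_000):
    rows = []
    for n in sizes:
        args = tuple(range(n))  # plain values only
        before = us_per_call(loop_unwrap, args, number)
        after = us_per_call(identity_unwrap, args, number)
        rows.append((n, before, after))
    return rows


if __name__ == "__main__":
    for n, before, after in compare():
        print(f"{n:>3} args  {before:.2f} us  {after:.2f} us  {before / after:.1f}x")
```

The lambda adds the same call overhead to both columns, so the ratio is more meaningful than either absolute number.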

The codegen'd identity (list(args)) removes the per-element isinstance / is_traceable_wrapper_subclass checks from every backward call. The speedup grows roughly linearly with the number of non-tangent args: the old loop paid a Python-level check per element, while the new path is a single list copy whose cost is nearly flat over the measured range (0.14 us at 5 args, 0.20 us at 50 args).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/178927
Approved by: https://github.com/aorenste
ghstack dependencies: #178675
