
trunk/f2ebf8ce7c86a061a181e8da9e5c0a6150955e0a: [xpu][fix] Fix DeviceOpOverrides registered incorrectly (#178959)

PyTorch Releases · by pytorch · April 3, 2026

# Motivation

The current initialization logic for DeviceOpOverrides relies on checking whether device_op_overrides_dict is empty:

```python
def get_device_op_overrides(device: str) -> DeviceOpOverrides:
    assert isinstance(device, str), type(device)

    if not device_op_overrides_dict:
        from . import (  # noqa: F401  # noqa: F401
            cpu_device_op_overrides,
            mps_device_op_overrides,
        )
        from .cuda import device_op_overrides  # noqa: F401
        from .mtia import device_op_overrides as mtia_op_overrides  # noqa: F401
        from .xpu import device_op_overrides as xpu_op_overrides  # noqa: F401

    if device not in device_op_overrides_dict:
        # For backends like TPU that only need no-op overrides (Pallas handles codegen)
        from .cpu_device_op_overrides import CpuDeviceOpOverrides

        register_device_op_overrides(device, CpuDeviceOpOverrides())

    return device_op_overrides_dict[device]
```

This approach is fragile because it assumes no overrides are registered prior to calling get_device_op_overrides. However, if register_device_op_overrides is invoked independently (e.g., in tests or other modules), the dictionary may become partially populated before full initialization occurs.

In such cases, the lazy initialization block is skipped, and some backends (e.g., XPU) never register their corresponding DeviceOpOverrides. As a result, the system silently falls back to CpuDeviceOpOverrides, leading to incorrect behavior.
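The failure mode described here can be reproduced with a toy registry. This is a minimal sketch, not PyTorch code: `registry`, `register`, `_register_all_backends`, and `get_overrides_fragile` are hypothetical stand-ins, and plain strings take the place of the real DeviceOpOverrides objects.

```python
# Toy stand-in for device_op_overrides_dict; all names here are hypothetical.
registry: dict[str, str] = {}


def register(device: str, overrides: str) -> None:
    registry[device] = overrides


def _register_all_backends() -> None:
    # Stands in for the side-effect imports of the per-backend modules.
    for dev in ("cpu", "cuda", "xpu"):
        register(dev, f"{dev}-overrides")


def get_overrides_fragile(device: str) -> str:
    if not registry:  # lazy init keyed on emptiness, as in the current logic
        _register_all_backends()
    if device not in registry:
        register(device, "cpu-overrides")  # silent CPU fallback
    return registry[device]


# An early, independent registration makes the dict non-empty...
register("cuda", "cuda-overrides")
# ...so the lazy block is skipped and "xpu" receives the CPU fallback.
result = get_overrides_fragile("xpu")
print(result)  # cpu-overrides
```

No exception is raised at any point, which is why the misregistration only shows up later as incorrect behavior.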

This issue has already caused multiple failures in XPU CI, particularly in test_gpu_cpp_wrapper.py. For example, PR #175385 unintentionally registers CUDADeviceOpOverrides early, making device_op_overrides_dict non-empty and preventing XPU overrides from being registered.

The silent fallback to CPU overrides makes these issues difficult to detect and debug.

# Solution

To make initialization robust and deterministic, introduce a dedicated flag, _device_op_overrides_initialized, to explicitly track whether all device overrides have been fully registered.

# Additional Context

fix https://github.com/pytorch/pytorch/issues/178857
fix https://github.com/pytorch/pytorch/issues/178761
fix https://github.com/pytorch/pytorch/issues/178753
fix https://github.com/pytorch/pytorch/issues/178855, etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/178959
Approved by: https://github.com/jansel
