trunk/9589e5796da98dfff1519ebb0cd5be9794cf7302: Fix int64 indexing with >65k M/N size (#172925)
# Summary
Fixes: https://github.com/pytorch/pytorch/issues/171389
Interesting one; two bugs: one for not using the index dtype, and another where M*N overflows and the early return kicks us out before doing any work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/172925
Approved by: https://github.com/eellison
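The overflow half of the bug can be illustrated with plain integer arithmetic. This is a minimal sketch, not the actual kernel code from the PR; the sizes `M` and `N` are hypothetical values just past the 65k threshold from the linked issue, chosen so that their product exceeds what 32 bits can hold:

```python
# Sketch: why a 32-bit M*N product goes wrong once M and N pass ~65k.
# (Illustrative only; the real fix lives in the PyTorch kernel, PR #172925.)

M = N = 70_000                  # > 65_536, so M*N exceeds 2**32
true_numel = M * N              # Python ints don't overflow: 4_900_000_000

# Simulate what a 32-bit multiplication would keep of that product:
wrapped = true_numel & 0xFFFFFFFF   # low 32 bits only
if wrapped >= 2**31:                # reinterpret as a signed int32
    wrapped -= 2**32

print(true_numel)  # 4900000000
print(wrapped)     # 605032704 -- far smaller than the real element count
```

A bounds check of the shape `if index >= numel: return` that uses the wrapped value would send most threads down the early-return path before touching any data, which matches the "early return kicks us out before doing any work" behavior described above.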
PyTorch Releases
https://github.com/pytorch/pytorch/releases/tag/trunk%2F9589e5796da98dfff1519ebb0cd5be9794cf7302
