
b8603

llama.cpp Releases, by ggml-org, March 31, 2026


CANN: fix multi-thread set_tensor race conditions (#20151)

  • CANN: fix multi-thread set_tensor race conditions

When ollama calls ggml_backend_tensor_set from multiple threads (each writing a different chunk of the same tensor), the CANN backend had three concurrency issues:

  1. Quantized tensors (Q4_0/Q8_0) require a full-tensor format transform before uploading to device. Per-chunk transforms produced corrupt data.

  2. ND-to-NZ weight conversion requires complete tensor data on device. Per-chunk conversion operated on incomplete data.

  3. The global g_nz_workspaces array had unprotected concurrent access.

Fix by introducing a TensorSetTracker that accumulates write progress per tensor. For quantized tensors, raw data is staged in a host buffer and the transform + upload is deferred until all chunks arrive. For NZ weights, chunks are uploaded directly but conversion is deferred. The tracker and its staging buffer are released immediately after post-processing completes.

Add per-device mutex to g_nz_workspaces to prevent data races.
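The deferred-transform idea can be modelled in a few lines. The following is a minimal Python sketch of the pattern only, not the CANN backend code: the class layout, the `set_chunk` method, and the `on_complete` callback are hypothetical names chosen for illustration, and it assumes chunks never overlap and each chunk is written exactly once.

```python
import threading


class TensorSetTracker:
    """Illustrative model only: stage chunks on the host, post-process once."""

    def __init__(self, total_bytes, on_complete):
        self.total_bytes = total_bytes
        self.staging = bytearray(total_bytes)  # host staging buffer
        self.bytes_written = 0                 # accumulated write progress
        self.on_complete = on_complete         # stands in for transform + upload
        self.lock = threading.Lock()

    def set_chunk(self, offset, data):
        # Record the chunk and update progress under the lock so that
        # concurrent writers cannot corrupt the byte counter.
        with self.lock:
            self.staging[offset:offset + len(data)] = data
            self.bytes_written += len(data)
            done = self.bytes_written == self.total_bytes
        if done:
            # Exactly one writer observes completion: run the full-tensor
            # step once, then release the staging buffer immediately.
            self.on_complete(bytes(self.staging))
            self.staging = None


# Four threads each write a different 256-byte chunk, in arbitrary order.
payload = bytes(range(256)) * 4
results = []
tracker = TensorSetTracker(len(payload), results.append)
threads = [
    threading.Thread(target=tracker.set_chunk, args=(off, payload[off:off + 256]))
    for off in (768, 0, 512, 256)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this model the callback fires a single time with the complete tensor bytes, regardless of which thread finishes last, which is the property the per-chunk transform lacked.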

  • CANN: fix L2_NORM ignoring eps parameter

The L2_NORM implementation was not using the eps parameter from op_params, causing incorrect results when eps is large (e.g. 10.0). The CPU reference computes scale = 1/fmaxf(norm, eps), so add a Clamp step to clamp the norm to at least eps before dividing.
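As a worked example of the formula above, here is a small pure-Python sketch of scale = 1/fmaxf(norm, eps) applied to one row. The function names are hypothetical; this mirrors the stated CPU reference formula, not the CANN kernel itself.

```python
import math


def l2_norm(row, eps):
    """Scale a row by 1 / max(||row||_2, eps)."""
    norm = math.sqrt(sum(x * x for x in row))
    scale = 1.0 / max(norm, eps)  # clamp the norm to at least eps
    return [x * scale for x in row]


def l2_norm_ignoring_eps(row):
    """Model of the old behaviour: eps is never consulted."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


row = [3.0, 4.0]                     # Euclidean norm is 5.0
with_eps = l2_norm(row, eps=10.0)    # divides by 10.0, not 5.0
without_eps = l2_norm_ignoring_eps(row)
```

With eps = 10.0 the clamped version yields roughly [0.3, 0.4], while ignoring eps yields roughly [0.6, 0.8]. For a small eps such as 1e-6 the two agree, which is why the bug only shows up when eps exceeds the norm.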

  • ggml/cann: compare op_params for POOL_2D in ACL graph cache matching

When ACL graph mode is enabled, the graph LRU cache checks whether a cached graph matches the current computation graph. Previously, GGML_OP_POOL_2D was not included in the op_params comparison, so two POOL_2D nodes with different pooling parameters (kernel size, stride, padding) but identical tensor shapes and addresses could incorrectly reuse a cached graph, leading to wrong results or aclnn errors.

Add GGML_OP_POOL_2D to the list of ops that require op_params matching in ggml_graph_node_properties::has_matching_properties().

  • cann: fix ACL graph cache matching by adding tensor type and unconditional op_params comparison

The ACL graph LRU cache was incorrectly reusing cached graphs for operations with different tensor types or op_params, causing test failures for CPY (f16 vs bf16), POOL_2D, L2_NORM, NORM_MUL_ADD, RMS_NORM_MUL_ADD, and ADD_RMS_NORM.

Changes:

  • Add node_type and src_type[] fields to ggml_graph_node_properties so the cache can distinguish tensors with different types but identical ne/nb (e.g. f16 and bf16 both have 2-byte elements)

  • Compare op_params unconditionally for all ops instead of only for SCALE/UNARY/GLU/ROPE/POOL_2D
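To illustrate why both changes matter, the sketch below models the cache key in Python. It is an assumption-laden simplification: the real check lives in the C++ method ggml_graph_node_properties::has_matching_properties(), the field layout here is hypothetical, tensor addresses are omitted, and the op_params values are made-up placeholders.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class NodeProperties:
    """Simplified stand-in for a cached graph node's properties."""
    op: str
    node_type: str               # element type of the node's output
    src_types: Tuple[str, ...]   # element types of the source tensors
    ne: Tuple[int, ...]          # number of elements per dimension
    nb: Tuple[int, ...]          # stride in bytes per dimension
    op_params: Tuple[int, ...]

    def has_matching_properties(self, other):
        return (
            self.op == other.op
            and self.node_type == other.node_type    # added: output type
            and self.src_types == other.src_types    # added: source types
            and self.ne == other.ne
            and self.nb == other.nb
            and self.op_params == other.op_params    # now checked for every op
        )


# f16 and bf16 are both 2 bytes wide, so ne and nb are identical.
cpy_f16 = NodeProperties("CPY", "f16", ("f32",), (4, 4, 1, 1), (2, 8, 32, 32), ())
cpy_bf16 = replace(cpy_f16, node_type="bf16")

# Same shapes, different pooling parameters (placeholder values).
pool_a = NodeProperties("POOL_2D", "f32", ("f32",), (8, 8, 1, 1), (4, 32, 256, 256), (0, 2, 2, 2, 2, 0, 0))
pool_b = replace(pool_a, op_params=(0, 3, 3, 1, 1, 1, 1))

reuse_cpy = cpy_f16.has_matching_properties(cpy_bf16)
reuse_pool = pool_a.has_matching_properties(pool_b)
```

Both comparisons come out false in this model, so neither pair would share a cached graph. Dropping the type fields or the op_params comparison would make them compare equal, which corresponds to the incorrect reuse described above.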
