b8603
CANN: fix multi-thread set_tensor race conditions (#20151)
- CANN: fix multi-thread set_tensor race conditions
When ollama calls ggml_backend_tensor_set from multiple threads (each writing a different chunk of the same tensor), the CANN backend had three concurrency issues:
- Quantized tensors (Q4_0/Q8_0) require a full-tensor format transform before uploading to device. Per-chunk transforms produced corrupt data.
- ND-to-NZ weight conversion requires complete tensor data on device. Per-chunk conversion operated on incomplete data.
- The global g_nz_workspaces array had unprotected concurrent access.
Fix by introducing a TensorSetTracker that accumulates write progress per tensor. For quantized tensors, raw data is staged in a host buffer and the transform and upload are deferred until all chunks arrive. For NZ weights, chunks are uploaded directly but conversion is deferred. The tracker and its staging buffer are released immediately after post-processing completes.
Add per-device mutex to g_nz_workspaces to prevent data races.
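The deferred-upload idea above can be sketched as follows. This is an illustrative miniature, not the actual CANN backend code: the names `TensorSetTracker`, `tensor_set_chunk`, and `flush_to_device` are hypothetical, and the real backend defers a quantization transform or ND-to-NZ conversion rather than the stub shown here.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <mutex>
#include <vector>

// Records what reached the "device" (test stub standing in for the real
// full-tensor transform + upload, e.g. Q4_0 repack or ND->NZ conversion).
static std::vector<char> g_flushed;
static void flush_to_device(const void *, const char *data, size_t size) {
    g_flushed.assign(data, data + size);
}

// Per-tensor write-progress tracker: chunks are staged host-side, and the
// expensive full-tensor post-processing runs exactly once, after the last
// chunk lands.
struct TensorSetTracker {
    std::vector<char> staging;   // host-side staging buffer for raw chunks
    size_t written = 0;          // bytes accumulated so far
    size_t total   = 0;          // full tensor size in bytes
};

static std::mutex g_tracker_mutex;
static std::map<const void *, TensorSetTracker> g_trackers;

void tensor_set_chunk(const void *tensor, const void *chunk,
                      size_t offset, size_t size, size_t tensor_size) {
    std::lock_guard<std::mutex> lock(g_tracker_mutex);  // protect shared state
    TensorSetTracker &t = g_trackers[tensor];
    if (t.staging.empty()) {
        t.staging.resize(tensor_size);
        t.total = tensor_size;
    }
    std::memcpy(t.staging.data() + offset, chunk, size);
    t.written += size;
    if (t.written == t.total) {
        // All chunks arrived: run the full-tensor transform once, then
        // release the tracker and its staging buffer immediately.
        flush_to_device(tensor, t.staging.data(), t.total);
        g_trackers.erase(tensor);
    }
}
```

The same mutex discipline applies to the `g_nz_workspaces` fix: any globally shared per-device array touched by concurrent `set_tensor` calls needs a lock around access.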
- CANN: fix L2_NORM ignoring eps parameter
The L2_NORM implementation was not using the eps parameter from op_params, causing incorrect results when eps is large (e.g. 10.0). The CPU reference computes scale = 1/fmaxf(norm, eps), so add a Clamp step to clamp the norm to at least eps before dividing.
- ggml/cann: compare op_params for POOL_2D in ACL graph cache matching
When ACL graph mode is enabled, the graph LRU cache checks whether a cached graph matches the current computation graph. Previously, GGML_OP_POOL_2D was not included in the op_params comparison, so two POOL_2D nodes with different pooling parameters (kernel size, stride, padding) but identical tensor shapes and addresses could incorrectly reuse a cached graph, leading to wrong results or aclnn errors.
Add GGML_OP_POOL_2D to the list of ops that require op_params matching in ggml_graph_node_properties::has_matching_properties().
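A miniature of the cache-matching check makes the failure mode concrete: two POOL_2D nodes with identical shapes can still differ in kernel/stride/padding, which live only in op_params. The struct and enum below are illustrative stand-ins, not the actual ggml definitions.

```cpp
#include <cstdint>
#include <cstring>

enum op_kind { OP_ADD, OP_POOL_2D, OP_SCALE };

// Hypothetical miniature of ggml_graph_node_properties: shape alone is not
// enough to prove two POOL_2D nodes compute the same thing.
struct node_properties {
    op_kind op;
    int64_t ne[4];              // tensor shape
    int32_t op_params[16];      // kernel size, stride, padding, ...

    bool has_matching_properties(const node_properties &other) const {
        if (op != other.op) return false;
        if (std::memcmp(ne, other.ne, sizeof(ne)) != 0) return false;
        // The fix: POOL_2D joins the ops whose op_params must match exactly.
        const bool needs_params = (op == OP_POOL_2D || op == OP_SCALE);
        if (needs_params &&
            std::memcmp(op_params, other.op_params, sizeof(op_params)) != 0) {
            return false;
        }
        return true;
    }
};
```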
- cann: fix ACL graph cache matching by adding tensor type and unconditional op_params comparison
The ACL graph LRU cache was incorrectly reusing cached graphs for operations with different tensor types or op_params, causing test failures for CPY (f16 vs bf16), POOL_2D, L2_NORM, NORM_MUL_ADD, RMS_NORM_MUL_ADD, and ADD_RMS_NORM.
Changes:
- Add node_type and src_type[] fields to ggml_graph_node_properties so the cache can distinguish tensors with different types but identical ne/nb (e.g. f16 and bf16 both have 2-byte elements)
- Compare op_params unconditionally for all ops instead of only for SCALE/UNARY/GLU/ROPE/POOL_2D
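Both changes can be sketched together: the type fields catch f16-vs-bf16 aliasing (same element size, hence identical ne/nb), and op_params participate in every comparison rather than a whitelist. Field and enum names below are illustrative, not the actual ggml definitions.

```cpp
#include <cstdint>
#include <cstring>

enum tensor_type { TYPE_F32, TYPE_F16, TYPE_BF16 };

// Hypothetical sketch of the strengthened match.
struct graph_node_properties {
    tensor_type node_type;       // destination tensor type
    tensor_type src_type[2];     // source tensor types
    int64_t     ne[4];           // shape (identical for f16 and bf16 tensors)
    int32_t     op_params[16];

    bool has_matching_properties(const graph_node_properties &o) const {
        return node_type == o.node_type &&
               std::memcmp(src_type, o.src_type, sizeof(src_type)) == 0 &&
               std::memcmp(ne, o.ne, sizeof(ne)) == 0 &&
               // unconditional: every op's params participate in the match
               std::memcmp(op_params, o.op_params, sizeof(op_params)) == 0;
    }
};
```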