b8629
sycl : fix llama_kv_cache hang when kv_cache is huge: 5GB (#21283)

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)
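For scale, a KV cache in the gigabytes is easy to reach at long contexts, since it grows linearly with both layer count and context length. A minimal back-of-the-envelope sketch in C++; the formula is the standard per-layer K/V estimate, and the model numbers are illustrative (chosen to land near the 5 GB mentioned above), not taken from the fix itself:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // One K and one V tensor per layer, each n_ctx x n_embd_kv elements.
    // Real allocations also depend on type_k/type_v quantization and padding.
    const uint64_t n_layer   = 40;     // illustrative layer count
    const uint64_t n_ctx     = 65536;  // 64k context
    const uint64_t n_embd_kv = 512;    // per-layer KV width (n_head_kv * head_dim)
    const uint64_t bytes_el  = 2;      // f16 cache elements

    const uint64_t kv_bytes = 2 /* K and V */ * n_layer * n_ctx * n_embd_kv * bytes_el;
    printf("estimated KV cache: %.2f GiB\n", kv_bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```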

More about llama
Gemma 4 E4B + E2B Uncensored (Aggressive) — GGUF + K_P Quants (Multimodal: Vision, Video, Audio)
My first Gemma 4 uncensors are out. Two models are dropping today: the E4B (4B) and the E2B (2B). Both are Aggressive variants, and both are fully multimodal. Aggressive means no refusals; I don't do any personality changes or alterations. It's the ORIGINAL Google release, just uncensored.

Gemma 4 E4B (4B): https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive
Gemma 4 E2B (2B): https://huggingface.co/HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive

0/465 refusals* on both. Fully unlocked with zero capability loss. These are natively multimodal, so text, image, video, and audio are all in one model. The mmproj file is included for vision/audio support.

What's included:
- E4B: Q8_K_P, Q6_K_P, Q5_K_P, Q5_K_M, Q4_K_P, Q4_K_M, IQ4_XS, Q3_K_P, Q3_K_M, IQ3_M, Q2_K_P + mmproj
- E2B: Q8_K_P, Q6_K_P, Q5_K_P,
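For anyone wiring these quants up programmatically rather than through a frontend, here is a minimal sketch of loading a GGUF with the llama.cpp C API. It assumes current function names (llama_model_load_from_file, llama_init_from_model); the filename is hypothetical, and multimodal input goes through the separate mmproj file via llama.cpp's mtmd tooling, which is not shown:

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // assumption: offload as many layers as fit

    // hypothetical local filename for one of the quants listed above
    llama_model * model = llama_model_load_from_file(
        "Gemma-4-E2B-Uncensored-Q4_K_M.gguf", mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 8192; // assumption: context budget

    llama_context * ctx = llama_init_from_model(model, cparams);
    // ... tokenize, llama_decode, sample ...

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```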

I Built a Vision-Based Desktop Agent That Navigates by Screenshot. Here's What Actually Works.
DOM-based automation requires you to reverse-engineer someone else's frontend and pray they don't change it. They always change it.

Last month, I spent a couple of weeks attempting to build a testing framework for an app that includes a web app, a Slack app, and connections to multiple external sources, which required testing interface elements on external web interfaces. I managed to vibe-engineer a Playwright-based test suite that "sort of™" worked. Until it didn't. One of the external sites had updated its dashboard. Not a redesign, just a CSS class rename on a table component. Three automations targeting that table stopped working simultaneously. A
b8640
tests : add unit test coverage for llama_tensor_get_type (#20112)

- Add unit test coverage for llama_tensor_get_type
- Fix merge conflicts, add more schemas
- clang formatter changes
- Trailing whitespace
- Update name
- Start rebase
- Updating files with upstream changes prior to rebase
- Changes needed from rebase
- Update attn_qkv schema, change throw behaviour
- Fix merge conflicts
- White space
- Update with latest changes to state counters
- Revert accidental personal CLAUDE.md changes
- Change quotation mark
- Reuse metadata.name since we have it
- Move test-only stuff out of llama-quant.cpp
- Hide the regex functionality back in llama-quant.cpp, use a unique pointer to a new struct 'compiled_tensor_type_patterns' which contains the patterns
- cont : inital deslop guidelines
- Cleanup based on review comments
- Continue
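The commit log above sketches the final shape: the regexes stay hidden in llama-quant.cpp behind a unique_ptr to a 'compiled_tensor_type_patterns' struct that holds the compiled patterns. A hedged reconstruction of what such a table might look like; the names, enum, and layout here are guesses based on the commit messages, not the actual llama.cpp internals:

```cpp
#include <cstdio>
#include <memory>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for ggml's real tensor-type enum.
enum mock_ggml_type { TYPE_Q4_K, TYPE_Q6_K, TYPE_Q8_0 };

// Pre-compiled name patterns, each mapped to an override type.
struct compiled_tensor_type_patterns {
    std::vector<std::pair<std::regex, mock_ggml_type>> patterns;
};

// First matching pattern wins; otherwise fall back to the default type.
static mock_ggml_type tensor_get_type(
        const compiled_tensor_type_patterns & ctp,
        const std::string & name,
        mock_ggml_type fallback) {
    for (const auto & [re, type] : ctp.patterns) {
        if (std::regex_search(name, re)) {
            return type;
        }
    }
    return fallback;
}

int main() {
    auto ctp = std::make_unique<compiled_tensor_type_patterns>();
    ctp->patterns.emplace_back(std::regex("attn_qkv"),          TYPE_Q6_K);
    ctp->patterns.emplace_back(std::regex("ffn_(up|down|gate)"), TYPE_Q4_K);

    // Matches the attn_qkv pattern, so the override (TYPE_Q6_K) applies.
    printf("%d\n", tensor_get_type(*ctp, "blk.0.attn_qkv.weight", TYPE_Q8_0));
    return 0;
}
```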
More in Models

Maybe a party-pooper but: A dozen 120B models later, and GPTOSS-120B is still king
Never consumes the entire context walking in place. Never fails at tool calling. Never runs slow regardless of the back-end. Never misses a piece of context in its entire window. Never slows down no matter how long the prompt is. As much as I despise OpenAI, I believe they've done something exceptional with that model. This is the Toyota Tacoma of open models, and I see myself using it for 500K more miles. submitted by /u/ParaboloidalCrest

One of the most sensible reasons I can think of to have an LLM downloaded on my cell phone would be emergency advice.
It seems like in every conversation about derestricted models, everyone treats you like a pervert. The fact is, you can be sensible and a pervert 😂. submitted by /u/RedParaglider

