b8668

llama.cpp Releases, by github-actions[bot], April 5, 2026

server : fix logging of build + system info (#21460)

This PR changes the logging that occurs at startup of llama-server. Currently, it is redundant (including CPU information twice) and it is missing the build + commit info.

macOS/iOS:
macOS Apple Silicon (arm64)
macOS Intel (x64)
iOS XCFramework

Linux:
Ubuntu x64 (CPU)
Ubuntu arm64 (CPU)
Ubuntu s390x (CPU)
Ubuntu x64 (Vulkan)
Ubuntu arm64 (Vulkan)
Ubuntu x64 (ROCm 7.2)
Ubuntu x64 (OpenVINO)

Windows:
Windows x64 (CPU)
Windows arm64 (CPU)
Windows x64 (CUDA 12) - CUDA 12.4 DLLs
Windows x64 (CUDA 13) - CUDA 13.1 DLLs
Windows x64 (Vulkan)
Windows x64 (SYCL)
Windows x64 (HIP)

openEuler:
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

Original source

llama.cpp Releases

https://github.com/ggml-org/llama.cpp/releases/tag/b8668
