b8668
server : fix logging of build + system info (#21460)

This PR changes the logging that occurs at startup of llama-server. Currently, it is redundant (including CPU information twice) and it is missing the build + commit info.

macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Intel (x64)
- iOS XCFramework

Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)

Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)

openEuler:
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)
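The fix amounts to logging the build identity once and each piece of system information once. The Python sketch below illustrates only that idea; it is not llama-server's code. `BuildInfo`, `startup_log_lines`, the `build: N (commit) with compiler for target` line format, and the sample commit hash `abc1234` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildInfo:
    """Hypothetical container for build identity (number, commit, toolchain)."""
    number: int
    commit: str
    compiler: str
    target: str


def startup_log_lines(build: BuildInfo, system_sections: list[tuple[str, str]]) -> list[str]:
    """Emit build info first, then each named system-info section exactly once."""
    lines = [f"build: {build.number} ({build.commit}) with {build.compiler} for {build.target}"]
    seen: set[str] = set()
    for name, value in system_sections:
        if name in seen:
            continue  # skip a section that was already logged (e.g. CPU info)
        seen.add(name)
        lines.append(f"system_info: {name}: {value}")
    return lines


# Hypothetical inputs: the CPU section is reported twice, as in the old behavior.
build = BuildInfo(8668, "abc1234", "gcc 13", "x86_64-linux-gnu")
sections = [("CPU", "AVX2 = 1"), ("GPU", "none"), ("CPU", "AVX2 = 1")]
for line in startup_log_lines(build, sections):
    print(line)
```

Keying deduplication on the section name, rather than on the full text, also suppresses a repeated section whose formatting differs slightly between the two call sites.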

Why APEX Matters for MoE Coding Models and why it's NOT the same as K quants
I posted about my APEX quantization of Qwen Coder 80B Next yesterday and got a ton of great questions. Some people loved it, some people were skeptical, and one person asked, "What exactly is the point of this when K-quants already do mixed precision?" It's a great question. I've spent the last few days running APEX on my own hardware, and I want to break down what I've learned, because I think most people are missing the bigger picture here. So yes, K-quants like Q4_K_M already apply different precision to different layers. Attention gets higher precision, feed-forward gets lower. That's been in llama.cpp for a while, and it works. But here's the thing nobody is talking about: MoE models have a coherence problem. I was reading this article last night and it clicked for me. When
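To make the "mixed precision" point concrete: a quantized model's size is driven mostly by the precision of whichever tensors hold most of its parameters, and in MoE models those are typically the routed experts. The sketch below computes the parameter-weighted bits per weight (bpw) for an allocation. The nominal bpw values match llama.cpp's Q4_K, Q5_K, Q6_K, and Q8_0 block layouts. The tensor groups and parameter counts are made-up illustrations. This sketch does not model APEX, nor the exact per-tensor rules Q4_K_M applies.

```python
# Nominal bits per weight of llama.cpp block formats (bytes per block * 8 / weights per block).
NOMINAL_BPW = {
    "Q4_K": 4.5,      # 144 bytes per 256 weights
    "Q5_K": 5.5,      # 176 bytes per 256 weights
    "Q6_K": 6.5625,   # 210 bytes per 256 weights
    "Q8_0": 8.5,      # 34 bytes per 32 weights
}


def total_bits(plan: dict[str, tuple[int, str]]) -> float:
    """Sum of n_params * bpw over a {group: (n_params, quant_type)} plan."""
    return sum(n * NOMINAL_BPW[qtype] for n, qtype in plan.values())


def effective_bpw(plan: dict[str, tuple[int, str]]) -> float:
    """Parameter-weighted average bits per weight across all groups."""
    total_params = sum(n for n, _ in plan.values())
    return total_bits(plan) / total_params


def weight_gigabytes(plan: dict[str, tuple[int, str]]) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring file metadata."""
    return total_bits(plan) / 8 / 1e9


# Hypothetical MoE-style allocation: routed experts dominate the parameter count.
plan = {
    "attention": (1_000_000_000, "Q6_K"),
    "shared_ffn": (1_000_000_000, "Q5_K"),
    "routed_experts": (8_000_000_000, "Q4_K"),
}

print(f"{effective_bpw(plan):.3f} bpw, {weight_gigabytes(plan):.2f} GB")  # 4.806 bpw, 6.01 GB
```

Because the experts carry 80% of the parameters in this made-up plan, bumping attention from Q6_K to Q8_0 barely moves the total, while any change to the expert precision moves it substantially.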


Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models
arXiv:2411.10656v2 Announce Type: replace Abstract: Context: Large Language Models (LLMs) like GPT-5 and LLaMA-405b exhibit advanced code generation abilities, but their deployment demands substantial computation resources and energy. Quantization can reduce memory footprint and hardware requirements, yet may degrade code quality. Objective: This study investigates the code generation performance of smaller LLMs, examines the effect of quantization, and identifies common code quality issues as a proof of concept (PoC). Method: Four open-source LLMs are evaluated on Python benchmarks using code similarity metrics, with an analysis of 8-bit and 4-bit quantization, alongside static code quality assessment. Results: While smaller LLMs can generate functional code, benchmark performance is limited
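The trade-off in the abstract (a smaller memory footprint at the risk of lower quality) comes from rounding error, which grows as the bit width shrinks. Below is a minimal sketch, assuming plain symmetric absmax quantization over a single block of weights. The abstract does not say which scheme its 8-bit and 4-bit models used, and both the weight values and the 7B parameter count are made up.

```python
def quantize_absmax(values: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric absmax quantization: map floats to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(v) for v in values)
    if absmax == 0:
        return [0] * len(values), 1.0  # all-zero block: any scale works
    scale = absmax / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate floats."""
    return [x * scale for x in q]


def max_abs_error(values: list[float], bits: int) -> float:
    """Worst-case round-trip error; bounded by scale / 2 for this scheme."""
    q, scale = quantize_absmax(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale)))


def weight_bytes(n_params: int, bits: int) -> int:
    """Raw weight storage, ignoring per-block scales and other overhead."""
    return n_params * bits // 8


weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41]  # hypothetical block of weights
for bits in (8, 4):
    err = max_abs_error(weights, bits)
    gb = weight_bytes(7_000_000_000, bits) / 1e9  # hypothetical 7B-parameter model
    print(f"{bits}-bit: max error {err:.4f}, ~{gb:.1f} GB of weights")
```

Halving the bits halves the storage but makes each quantization step roughly 18 times coarser (127 vs. 7 levels per sign), which is why 4-bit models are the ones whose output quality needs checking.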

![[PokeClaw] First working app that uses Gemma 4 to autonomously control an Android phone. Fully on-device, no cloud.](https://preview.redd.it/56hbny8rrjtg1.png?width=640&crop=smart&auto=webp&s=26d91255bcdd942aea5255c7d3ac259db5bebf23)
[PokeClaw] First working app that uses Gemma 4 to autonomously control an Android phone. Fully on-device, no cloud.
PokeClaw - A Pocket Version of OpenClaw

Most "private" AI assistants are private because the company says so. PokeClaw is private because there's literally no server component. The AI model runs on your phone's CPU. There's no cloud endpoint. You can block the app from the internet entirely and it works the same. It runs Gemma 4 on-device using LiteRT and controls your phone through Android Accessibility. You type a command, the AI reads the screen, decides what to tap, and executes. It works with any app. I built this because I wanted a phone assistant that couldn't spy on me even if it wanted to: not because of a privacy policy, but because of architecture. There's nowhere for the data to go. It's the first app I've found that does fully local LLM phone control; every other option I checked either
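The command flow in the post (read the screen, decide what to tap, execute) is an observe-decide-act loop. This is a hypothetical Python illustration, not PokeClaw's code. `UiElement`, `Action`, `run_agent`, `FakePhone`, and `keyword_decide` are made-up names. The keyword matcher stands in for the on-device Gemma 4 model, and `FakePhone` stands in for the Android Accessibility layer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class UiElement:
    element_id: str
    label: str


@dataclass(frozen=True)
class Action:
    kind: str                         # "tap" or "done"
    element_id: Optional[str] = None  # target element for "tap"


def run_agent(
    command: str,
    read_screen: Callable[[], list[UiElement]],
    decide: Callable[[str, list[UiElement], list[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> list[Action]:
    """Observe the screen, ask the decider for an action, execute it; repeat until done."""
    history: list[Action] = []
    for _ in range(max_steps):
        screen = read_screen()
        action = decide(command, screen, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


class FakePhone:
    """Hypothetical phone: tapping anything opens a settings page."""

    def __init__(self) -> None:
        self.screen = [UiElement("a1", "Camera"), UiElement("a2", "Settings")]
        self.taps: list[str] = []

    def read_screen(self) -> list[UiElement]:
        return list(self.screen)

    def execute(self, action: Action) -> None:
        self.taps.append(action.element_id)
        self.screen = [UiElement("s1", "Wi-Fi"), UiElement("s2", "Bluetooth")]


def keyword_decide(command: str, screen: list[UiElement], history: list[Action]) -> Action:
    """Stand-in for the model: tap the first element whose label appears in the command."""
    for element in screen:
        if element.label.lower() in command.lower():
            return Action("tap", element.element_id)
    return Action("done")


phone = FakePhone()
print(run_agent("open settings", phone.read_screen, keyword_decide, phone.execute))
```

The `max_steps` cap keeps a confused decider from tapping forever, and passing `history` back into `decide` is one way to give the model context about what it has already done.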

Silverback AI Chatbot Introduces AI Assistant Feature to Support Structured Digital Communication and Intelligent Workflow Automation - Daytona Beach News-Journal

