Google releases Gemma 4, a family of open models built off of Gemini 3 - Engadget

Conversation starters
Distributed 1-bit LLM inference over P2P - 50 nodes validated, 100% shard discovery, CPU-only
There are roughly 4 billion CPUs on Earth. Most of them sit idle 70% of the time. Meanwhile, the AI industry is burning $100B+ per year on GPU clusters to run models that 95% of real-world tasks don't actually need. ARIA Protocol is an attempt to flip that equation.

It's a peer-to-peer distributed inference system built specifically for 1-bit quantized models (ternary weights: -1, 0, +1). No GPU. No cloud. No central server. Nodes discover each other over a Kademlia DHT, shard model layers across contributors, and pipeline inference across the network. Think Petals meets BitNet, minus the GPU requirement.

This isn't Ollama or llama.cpp; those are great tools, but they're single-machine. ARIA distributes inference across multiple CPUs over the internet so that no single node needs to hold…
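The ternary trick is what makes CPU-only inference plausible here: with every weight restricted to -1, 0, or +1, a matrix-vector product needs no multiplies at all, only adds and subtracts. A minimal sketch of that arithmetic (NumPy, with dense int8 storage for readability; real 1-bit runtimes pack weights far more tightly, and `ternary_matvec` is an illustrative name, not ARIA's actual API):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiply a ternary weight matrix W (entries in {-1, 0, +1}) by x.

    Because every weight is -1, 0, or +1, the "multiply" reduces to
    adds and subtracts -- no floating-point multiplies -- which is what
    makes 1-bit-style inference attractive on plain CPUs.
    """
    pos = x @ (W == 1).T.astype(x.dtype)   # sum of inputs where w = +1
    neg = x @ (W == -1).T.astype(x.dtype)  # sum of inputs where w = -1
    return pos - neg

# Illustrative shard: in a pipeline like the one described above, a node
# would hold only a slice of the layers and forward activations onward.
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(8, 16)).astype(np.int8)  # ternary weights
x = rng.standard_normal(16).astype(np.float32)
print(ternary_matvec(W, x))
```

A production kernel would pack several ternary weights per byte and vectorize the accumulation; the point here is only that the inner loop is multiplication-free.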

B70: Quick and Early Benchmarks & Backend Comparison
llama.cpp: f1f793ad0 (8657)

This is a quick attempt to just get it up and running. Much of the oneAPI runtime is still the "stable" release from Intel's repo. Kernel is 6.19.8+deb13-amd64 with updated xe firmware built. Vulkan is the Debian build, but with the latest Mesa compiled from source. OpenVINO is 2026.0. Feels like everything is "barely on the brink of working" (which is to be expected).

sycl:

$ build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -p 512,16384 -n 128,512

| model                    |      size |  params | backend | ngl |    test |           t/s |
| ------------------------ | --------: | ------: | ------- | --: | ------: | ------------: |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL    |  99 |   pp512 | 798.07 ± 2.72 |
| qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL    |  99 | pp16384 |             … |
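Since llama-bench already emits markdown tables like the one above, comparing backends mostly comes down to collecting those rows across runs. A rough sketch of a parser for that output (assuming the default table layout shown above; column names can shift between llama.cpp revisions, so treat this as illustrative rather than robust):

```python
import sys

def parse_llama_bench(text: str) -> list[dict]:
    """Parse llama-bench's markdown table into dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        if not line.strip().startswith("|"):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if header is None:
            header = cells                      # first pipe line is the header
        elif not set(cells[0]) <= set("-: "):   # skip the |---|---:| separator
            rows.append(dict(zip(header, cells)))
    return rows

# Usage: pipe one or more runs through, e.g.
#   build/bin/llama-bench ... | python compare.py
for row in parse_llama_bench(sys.stdin.read()):
    print(f"{row['backend']:>8}  {row['test']:>8}  {row['t/s']}")
```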

Closed model providers change behavior between API versions with no real changelog. Building anything on top of them is a gamble.
This is one of the reasons I keep gravitating back to local models even when the closed API ones are technically stronger. I had a production pipeline running on a major closed API for about four months. Stable, tested, working. Then one day the outputs started drifting. Not breaking errors, just subtle behavioral changes. Format slightly different, refusals on things it used to handle fine, confidence on certain task types quietly degraded. No changelog. No notification.

Support ticket response was essentially "models are updated periodically to improve quality." There is no way to pin to a specific checkpoint. You signed up for a service that reserves the right to change what the service does at any time.

The thing that gets me is how normalized this is. If a database provider silently c…
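Short of checkpoint pinning (which the provider doesn't offer), the practical mitigation is a drift canary: replay a fixed prompt set on a schedule and diff cheap structural fingerprints of the outputs against fixtures captured when the pipeline was known-good. A minimal sketch; `call_model` is a hypothetical stand-in for whatever closed-API client the pipeline wraps, and the fingerprint fields are just examples:

```python
import hashlib
import json
from pathlib import Path

GOLDEN = Path("golden_outputs.json")  # fingerprints captured when known-good

def call_model(prompt: str) -> str:
    """Hypothetical wrapper around the closed API the pipeline depends on."""
    raise NotImplementedError

def fingerprint(output: str) -> dict:
    """Cheap structural fingerprint: catches format drift and refusal drift
    without requiring byte-identical text (sampling makes that unrealistic)."""
    return {
        "starts_with_json": output.lstrip().startswith("{"),
        "line_count_bucket": len(output.splitlines()) // 5,
        "refused": any(p in output.lower() for p in ("i can't", "i cannot")),
    }

def check_drift(prompts: list[str]) -> list[str]:
    golden = json.loads(GOLDEN.read_text())
    drifted = []
    for prompt in prompts:
        key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
        if key in golden and fingerprint(call_model(prompt)) != golden[key]:
            drifted.append(prompt)
    return drifted
```

It won't catch every behavioral change, but it turns "outputs quietly degraded for weeks" into an alert the same day the provider swaps the model.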