
OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)

Hacker News AI Top · by hamtun24 · April 3, 2026 · 5 min read
🧒 Explain Like I'm 5 (simple language)

Imagine your computer is like a toy box, and inside are two friends: a super-smart brain (that's your computer's CPU) and a super-fast artist (that's your computer's iGPU).

Usually, they have their own separate piles of toys. But this new magic trick, called OpenUMA, lets them share one big pile of toys! 🎉

This means when the artist needs a toy, the brain doesn't have to send it over slowly. They can both grab toys from the same pile super fast! This helps your computer play smart AI games, like making up stories or drawing pictures, much quicker and smoother. It's like they're working together perfectly!

Article URL: https://github.com/hamtun24/openuma
Comments URL: https://news.ycombinator.com/item?id=47624865
Points: 1 · Comments: 0

OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.
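On Linux, the detection step described above can be done by reading PCI vendor IDs out of sysfs. The sketch below is not OpenUMA's actual code, just a minimal illustration of the approach: `/sys/class/drm/card*/device/vendor` holds the PCI vendor ID (`0x1002` for AMD, `0x8086` for Intel).

```rust
use std::fs;

/// Map a PCI vendor ID, as read from sysfs, to a GPU vendor name.
fn vendor_from_id(id: &str) -> Option<&'static str> {
    match id.trim() {
        "0x1002" => Some("AMD"),
        "0x8086" => Some("Intel"),
        _ => None,
    }
}

/// Walk /sys/class/drm and report the vendor of each GPU device node.
fn probe_gpus() -> Vec<String> {
    let mut found = Vec::new();
    if let Ok(entries) = fs::read_dir("/sys/class/drm") {
        for entry in entries.flatten() {
            // Each cardN/ directory symlinks its PCI device under device/.
            let vendor_file = entry.path().join("device/vendor");
            if let Ok(id) = fs::read_to_string(&vendor_file) {
                if let Some(name) = vendor_from_id(&id) {
                    found.push(format!("{}: {name}", entry.file_name().to_string_lossy()));
                }
            }
        }
    }
    found
}

fn main() {
    for gpu in probe_gpus() {
        println!("{gpu}");
    }
}
```

Distinguishing an iGPU from a discrete card (and identifying the APU generation) takes more than the vendor ID, but this is the general shape of a sysfs-based probe.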

```
┌──────────────────────────────────────────────┐
│ OpenUMA v0.6.2                               │
│ Unified Memory Abstraction for AI Inference  │
└──────────────────────────────────────────────┘
```

Key Features

  • Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs

  • Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference

  • Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU

  • Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers

  • Interactive TUI - Full terminal UI for hardware monitoring and configuration

  • Benchmarking - Real inference benchmarks with llama.cpp
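The Memory Partitioning feature above boils down to splitting a usable RAM budget between the iGPU and CPU. A toy sketch, assuming a fixed fraction (OpenUMA derives the real split from the detected hardware profile and model size):

```rust
/// Split a usable RAM budget (MB) between the iGPU and CPU.
/// `igpu_fraction` is illustrative; the real tool computes it per profile.
fn partition(total_mb: u64, igpu_fraction: f64) -> (u64, u64) {
    let igpu = (total_mb as f64 * igpu_fraction).round() as u64;
    (igpu, total_mb - igpu)
}

fn main() {
    // The 35/65 split from the dashboard screenshot, on a 20 GB budget:
    let (igpu, cpu) = partition(20_480, 0.35);
    println!("iGPU: {igpu} MB, CPU: {cpu} MB"); // iGPU: 7168 MB, CPU: 13312 MB
}
```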

Supported Hardware

| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS, Ryzen AI 9 HX 370 |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |

Quick Start

```shell
# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch interactive TUI
./target/release/openuma tui

# Generate config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf
```

Terminal UI

```
┌─────────────────────────────────────────────────────────────────────────┐
│ [D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ╔═══════════════════════════════════════════════════════════════════╗ │
│  ║ Hardware Overview                                                 ║ │
│  ╠═══════════════════════════════════════════════════════════════════╣ │
│  ║ CPU   AMD Ryzen 5 5600G (Cezanne)                                 ║ │
│  ║       6 cores (12 threads), AVX2, 16MB L3                         ║ │
│  ║ iGPU  AMD Vega7 (Raven Ridge)                                     ║ │
│  ║       7 CUs, 512MB / 16384MB shared VRAM                          ║ │
│  ║       Vulkan ✓  OpenCL ✓  Zero-copy ✓                             ║ │
│  ║ RAM   32GB DDR4-3200 (Dual-channel)                               ║ │
│  ║       51.2 GB/s theoretical, 46.8 GB/s measured                   ║ │
│  ╠═══════════════════════════════════════════════════════════════════╣ │
│  ║ ✓ Unified Memory Available              Tier: CONSUMER_UMA        ║ │
│  ╚═══════════════════════════════════════════════════════════════════╝ │
│                                                                         │
│  Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)       │
│  [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%                │
│                                                                         │
│  Strategy: HybridIgpu          Zero-copy: Available                     │
│                                                                         │
│  [r] Refresh  [q] Quit                                                  │
└─────────────────────────────────────────────────────────────────────────┘
```

Commands

| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model <model>` | Show memory partition for model |
| `openuma configure --engine <engine> --model <model>` | Generate engine config |
| `openuma benchmark --model <model>` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |

(The `<model>` and `<engine>` placeholders are restored here; the web rendering stripped the angle brackets.)

Supported Inference Engines

llama.cpp

```shell
openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf
```

Ollama

```shell
openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf
```

KTransformers (MoE models)

```shell
openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf
```

How It Works

Memory Model

```
┌────────────────────────────────────────────────────────────────┐
│                      Unified Memory Pool                       │
│                                                                │
│  ┌──────────────┐                   ┌──────────────────────┐   │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──► │     System RAM       │   │
│  │  (Shared)    │      DMA-BUF      │     (DDR4/DDR5)      │   │
│  └──────────────┘                   └──────────────────────┘   │
│                                                                │
│  Attention layers benefit from iGPU                            │
│  MoE experts stay on CPU                                       │
└────────────────────────────────────────────────────────────────┘
```

Key Insight

For LLM inference on APUs:

  • Attention layers → benefit from iGPU (parallel matrix ops)

  • MoE expert layers → should stay on CPU (sparse activation)

  • KV cache → benefits from unified memory zero-copy
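The placement rules above can be encoded as a small decision table. This is a toy encoding, not OpenUMA's actual types; it just makes the heuristic explicit:

```rust
/// Where a layer should live on an APU, per the placement heuristic above.
#[derive(Debug, PartialEq)]
enum Placement {
    Igpu,   // dense, parallel matrix work
    Cpu,    // sparsely activated weights
    Shared, // zero-copy unified memory
}

enum Layer {
    Attention,
    MoeExpert,
    KvCache,
}

fn place(layer: &Layer) -> Placement {
    match layer {
        Layer::Attention => Placement::Igpu, // parallel matrix ops
        Layer::MoeExpert => Placement::Cpu,  // only a few experts fire per token
        Layer::KvCache => Placement::Shared, // both sides read it; avoid copies
    }
}

fn main() {
    assert_eq!(place(&Layer::Attention), Placement::Igpu);
    assert_eq!(place(&Layer::MoeExpert), Placement::Cpu);
    assert_eq!(place(&Layer::KvCache), Placement::Shared);
    println!("placement heuristic holds");
}
```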

Benchmarking

```shell
# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full
```

```
╔════════════════════════════════════════════════════════════════════╗
║                      OpenUMA Benchmark Report                      ║
╠════════════════════════════════════════════════════════════════════╣
║ Best Backend: vulkan (12.5 t/s)              Average TPS: 8.2      ║
╠════════════════════════════════════════════════════════════════════╣
║ Test 1: model.gguf [vulkan]                                        ║
║   └── 12.5 tokens/sec | 8000 ms                                    ║
║ Test 2: model.gguf [opencl]                                        ║
║   └── 10.2 tokens/sec | 9800 ms                                    ║
║ Test 3: model.gguf [cpu]                                           ║
║   └── 4.8 tokens/sec | 20800 ms                                    ║
╠════════════════════════════════════════════════════════════════════╣
║ Recommendations:                                                   ║
║   • Best performing backend: vulkan (~12.5 tokens/sec)             ║
║   • GPU acceleration provides 2.6x speedup over CPU                ║
╚════════════════════════════════════════════════════════════════════╝
```

Architecture

```
openuma/
├── crates/
│   ├── hw_probe/      # Hardware detection
│   ├── mem_mgr/       # Memory partitioning + zero-copy
│   ├── config_gen/    # Model metadata (GGUF)
│   ├── profile_db/    # Hardware profile database
│   ├── benchmark/     # Inference benchmarking
│   ├── api_server/    # REST API
│   ├── cli/           # CLI interface
│   └── tui/           # Terminal UI
└── profiles/          # Hardware profiles
```

Installation

Option A — Download Binary (Linux x86_64)

```shell
# Download latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz \
  | tar xz

# Run it
./openuma probe
```

Option B — Build from Source

```shell
# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe
```

System Requirements

| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |

Install Vulkan Drivers (if missing)

```shell
# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers
```

Install llama.cpp (optional)

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"
```

Real World Results

OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.

Example: AMD Ryzen 5 5600G + 32GB DDR4

| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| **Improvement** | | **+132%** |

What OpenUMA changed:

  • Enabled Vulkan backend (default is CPU)

  • Set correct --n-gpu-layers for available shared VRAM

  • Configured dual-channel memory-aware thread count

  • Disabled mmap in favor of zero-copy DMA-BUF path
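Picking the `--n-gpu-layers` value mentioned above amounts to dividing the iGPU's shared-VRAM budget by the per-layer weight size. A rough sketch; the `layer_size_mb` figure here is a ballpark assumption, while OpenUMA reads the real sizes from GGUF metadata:

```rust
/// Estimate a `--n-gpu-layers` value from the iGPU's shared-VRAM budget.
/// `layer_size_mb` is a rough per-layer weight size, capped at the
/// model's actual layer count.
fn n_gpu_layers(vram_budget_mb: u64, layer_size_mb: u64, total_layers: u32) -> u32 {
    ((vram_budget_mb / layer_size_mb) as u32).min(total_layers)
}

fn main() {
    // 7168 MB iGPU budget, ~140 MB per Q4 layer, 32-layer model: all fit.
    println!("--n-gpu-layers {}", n_gpu_layers(7168, 140, 32)); // 32
    // A tighter 2 GB budget offloads only part of the model.
    println!("--n-gpu-layers {}", n_gpu_layers(2048, 140, 32)); // 14
}
```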

Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.

Community Benchmarks

This section will grow as users submit hardware profiles. Submit your results →

Contributing

Contributions welcome! Open issues and pull requests.

License

MIT License - see LICENSE for details.

OpenUMA - Making every x86 machine a first-class AI citizen.

Original source

Hacker News AI Top

https://github.com/hamtun24/openuma