OpenUMA – bring Apple-style unified memory to x86 AI inference (Rust, Linux)
OpenUMA (Unified Memory Abstraction) is a Rust middleware for detecting shared memory hardware (AMD APUs, Intel iGPUs), configuring unified memory pools, and generating optimal configs for AI inference engines.
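On Linux, iGPU vendor detection of this kind typically comes down to reading PCI vendor IDs from sysfs (`0x1002` for AMD, `0x8086` for Intel). A minimal sketch of the idea — this is an illustration, not OpenUMA's actual API:

```rust
use std::fs;

#[derive(Debug, PartialEq)]
enum GpuVendor {
    Amd,
    Intel,
    Other,
}

/// Map a PCI vendor ID string (as found in
/// /sys/class/drm/card*/device/vendor) to a GPU vendor.
fn vendor_from_pci_id(id: &str) -> GpuVendor {
    match id.trim() {
        "0x1002" => GpuVendor::Amd,   // AMD/ATI
        "0x8086" => GpuVendor::Intel, // Intel
        _ => GpuVendor::Other,
    }
}

fn main() {
    // Probe the first DRM device, if present (Linux only).
    if let Ok(id) = fs::read_to_string("/sys/class/drm/card0/device/vendor") {
        println!("card0 vendor: {:?}", vendor_from_pci_id(&id));
    }
}
```

A real probe would also need to distinguish an iGPU from a discrete card (e.g. via the PCI device ID or the presence of shared system memory), which is the part a profile database helps with.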
```
┌─────────────────────────────────────────────┐
│               OpenUMA v0.6.2                │
│ Unified Memory Abstraction for AI Inference │
└─────────────────────────────────────────────┘
```

Key Features
- Hardware Detection - Automatic detection of AMD APUs and Intel iGPUs
- Memory Partitioning - Intelligent iGPU/CPU memory allocation for LLM inference
- Zero-Copy DMA-BUF - Direct memory transfers between CPU and iGPU
- Multiple Engines - Generate configs for llama.cpp, Ollama, and KTransformers
- Interactive TUI - Full terminal UI for hardware monitoring and configuration
- Benchmarking - Real inference benchmarks with llama.cpp
Supported Hardware
| Vendor | Series | Examples |
|--------|--------|----------|
| AMD | Zen 3 (Cezanne, Renoir) | Ryzen 5 5600G, Ryzen 7 5700G |
| AMD | Zen 4 (Phoenix, Hawk Point) | Ryzen 7 7840HS, Ryzen 7 8845HS |
| AMD | Zen 5 (Strix Point) | Ryzen AI 9 HX 370, Ryzen AI 7 350 |
| Intel | Alder Lake, Raptor Lake | Core i5-1240P, Core i7-12700H |
| Intel | Meteor Lake, Lunar Lake | Core Ultra 5 125H, Core Ultra 7 258V |
Quick Start
```bash
# Build
cargo build --release

# Detect hardware
./target/release/openuma probe

# Launch interactive TUI
./target/release/openuma tui

# Generate config for llama.cpp
./target/release/openuma configure --engine llamacpp --model model.gguf
```
Terminal UI
```
┌─────────────────────────────────────────────────────────────────────────┐
│ [D]ashboard  [M]emory  [B]enchmark  [P]rofiles  [C]onfigure  [S]ettings │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│ ╔═══════════════════════════════════════════════════════════════════╗  │
│ ║ Hardware Overview                                                 ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ CPU   AMD Ryzen 5 5600G (Cezanne)                                 ║  │
│ ║       6 cores (12 threads), AVX2, 16MB L3                         ║  │
│ ║ iGPU  AMD Vega7 (Raven Ridge)                                     ║  │
│ ║       7 CUs, 512MB / 16384MB shared VRAM                          ║  │
│ ║       Vulkan ✓  OpenCL ✓  Zero-copy ✓                             ║  │
│ ║ RAM   32GB DDR4-3200 (Dual-channel)                               ║  │
│ ║       51.2 GB/s theoretical, 46.8 GB/s measured                   ║  │
│ ╠═══════════════════════════════════════════════════════════════════╣  │
│ ║ ✓ Unified Memory Available              Tier: CONSUMER_UMA        ║  │
│ ╚═══════════════════════════════════════════════════════════════════╝  │
│                                                                         │
│ Memory Partition:  iGPU: 7168 MB (35.0%)   CPU: 13312 MB (65.0%)        │
│ [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 35%                 │
│                                                                         │
│ Strategy: HybridIgpu        Zero-copy: Available                        │
│                                                                         │
│ [r] Refresh  [q] Quit                                                   │
└─────────────────────────────────────────────────────────────────────────┘
```

Commands
| Command | Description |
|---------|-------------|
| `openuma probe` | Detect hardware profile |
| `openuma tui` | Launch interactive terminal UI |
| `openuma partition --model <model>` | Show memory partition for model |
| `openuma configure --engine <engine> --model <model>` | Generate engine config |
| `openuma benchmark --model <model>` | Run inference benchmark |
| `openuma zerocopy --test` | Test DMA-BUF zero-copy |
| `openuma serve` | Start REST API server |
| `openuma profile list` | List known hardware profiles |
Supported Inference Engines
llama.cpp

```bash
openuma configure --engine llamacpp --model llama3-8b-q4_k_m.gguf
```

Ollama

```bash
openuma configure --engine ollama --model llama3-8b-q4_k_m.gguf
```

KTransformers (MoE models)

```bash
openuma configure --engine ktransformers --model deepseek-v3-q4km.gguf
```
How It Works
Memory Model
```
┌────────────────────────────────────────────────────────────────┐
│                      Unified Memory Pool                       │
│                                                                │
│  ┌──────────────┐                    ┌──────────────────────┐  │
│  │  iGPU VRAM   │ ◄── Zero-Copy ──►  │      System RAM      │  │
│  │   (Shared)   │      DMA-BUF       │     (DDR4/DDR5)      │  │
│  └──────────────┘                    └──────────────────────┘  │
│                                                                │
│  Attention layers benefit from iGPU                            │
│  MoE experts stay on CPU                                       │
└────────────────────────────────────────────────────────────────┘
```

Key Insight
For LLM inference on APUs:
- Attention layers → benefit from iGPU (parallel matrix ops)
- MoE expert layers → should stay on CPU (sparse activation)
- KV cache → benefits from unified memory zero-copy
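As a rough illustration of this split (hypothetical sizes and function names, not OpenUMA's real partitioning code), a layer-placement heuristic can be sketched as: greedily offload dense layers to the iGPU until its shared-VRAM budget is exhausted, while sparse MoE expert layers always stay on the CPU.

```rust
#[derive(Debug, PartialEq)]
enum Placement {
    Igpu,
    Cpu,
}

/// Greedily place dense (attention/FFN) layers on the iGPU until the
/// shared-VRAM budget is exhausted; MoE expert layers always stay on
/// the CPU. All sizes are in MiB.
fn place_layers(layer_sizes: &[(u64, bool)], igpu_budget_mib: u64) -> Vec<Placement> {
    let mut used = 0u64;
    layer_sizes
        .iter()
        .map(|&(size, is_moe_expert)| {
            if !is_moe_expert && used + size <= igpu_budget_mib {
                used += size;
                Placement::Igpu
            } else {
                Placement::Cpu
            }
        })
        .collect()
}

fn main() {
    // Four 200 MiB dense layers and one 800 MiB expert layer, with a
    // 500 MiB iGPU budget: only the first two dense layers fit.
    let layers = [(200, false), (200, false), (200, false), (200, false), (800, true)];
    println!("{:?}", place_layers(&layers, 500));
}
```

A production partitioner would also weigh memory bandwidth and the KV cache, but the ordering principle — dense compute first, sparse experts last — is the same one the bullets above describe.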
Benchmarking
```bash
# Quick benchmark
openuma benchmark --model llama3-8b-q4_k_m.gguf

# Full multi-backend comparison
openuma benchmark --model model.gguf --full
```
Architecture
```
openuma/
├── crates/
│   ├── hw_probe/     # Hardware detection
│   ├── mem_mgr/      # Memory partitioning + zero-copy
│   ├── config_gen/   # Model metadata (GGUF)
│   ├── profile_db/   # Hardware profile database
│   ├── benchmark/    # Inference benchmarking
│   ├── api_server/   # REST API
│   ├── cli/          # CLI interface
│   └── tui/          # Terminal UI
└── profiles/         # Hardware profiles
```

Installation
Option A — Download Binary (Linux x86_64)
```bash
# Download latest release
curl -L https://github.com/hamtun24/openuma/releases/latest/download/openuma-linux-x86_64.tar.gz \
  | tar xz

# Run it
./openuma probe
```
Option B — Build from Source
```bash
# Prerequisites: Rust 1.70+
git clone https://github.com/hamtun24/openuma.git
cd openuma
cargo build --release
./target/release/openuma probe
```

System Requirements
| Requirement | Notes |
|-------------|-------|
| OS | Linux (kernel 5.10+) |
| CPU | Any x86_64 with AMD APU or Intel iGPU |
| RAM | 16GB minimum, 32GB recommended |
| Optional | llama.cpp in PATH for real benchmarks |
| Optional | Vulkan drivers for iGPU acceleration |
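Memory bandwidth matters more than capacity for inference speed, which is why the TUI reports it. The "51.2 GB/s theoretical" figure in the dashboard follows directly from the memory configuration: each DDR4 channel is 64 bits (8 bytes) wide, so dual-channel DDR4-3200 gives 2 × 8 B × 3200 MT/s = 51.2 GB/s. As a sanity check:

```rust
/// Theoretical DDR memory bandwidth in GB/s:
/// channels × bus width (8 bytes per channel) × transfer rate (MT/s) / 1000.
fn theoretical_bandwidth_gbs(channels: u64, mts: u64) -> f64 {
    (channels * 8 * mts) as f64 / 1000.0
}

fn main() {
    // Dual-channel DDR4-3200, as in the hardware overview above.
    println!("{} GB/s", theoretical_bandwidth_gbs(2, 3200)); // prints "51.2 GB/s"
}
```

Measured bandwidth (46.8 GB/s in the example dashboard) is always somewhat lower than theoretical due to refresh cycles and controller overhead.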
Install Vulkan Drivers (if missing)
```bash
# AMD iGPU
sudo apt install mesa-vulkan-drivers

# Intel iGPU
sudo apt install intel-media-va-driver mesa-vulkan-drivers
```
Install llama.cpp (optional)
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON
make -j$(nproc)
export PATH="$PATH:$(pwd)/bin"
```

Real World Results
OpenUMA's value is in the configuration it generates — not just detecting hardware, but knowing the exact flags that extract maximum performance from it.
Example: AMD Ryzen 5 5600G + 32GB DDR4
| Setup | Command | Tokens/sec |
|-------|---------|------------|
| llama.cpp defaults | `llama-cli -m model.gguf` | ~3.1 t/s |
| OpenUMA-configured | `openuma configure --engine llamacpp --model model.gguf` | ~7.2 t/s |
| Improvement | | +132% |
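The improvement figure is just the ratio of the two throughputs: 7.2 / 3.1 ≈ 2.32×, i.e. about +132%:

```rust
fn main() {
    let (baseline, tuned) = (3.1_f64, 7.2_f64);
    let pct = (tuned - baseline) / baseline * 100.0;
    println!("{:.0}% improvement", pct); // prints "132% improvement"
}
```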
What OpenUMA changed:
- Enabled Vulkan backend (default is CPU)
- Set correct --n-gpu-layers for available shared VRAM
- Configured dual-channel memory-aware thread count
- Disabled mmap in favor of zero-copy DMA-BUF path
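The `--n-gpu-layers` value in particular can be derived from the shared-VRAM budget and the per-layer weight size. A hedged sketch of that calculation — `--n-gpu-layers` is a real llama.cpp flag, but the sizing function and the numbers below are illustrative assumptions, not OpenUMA's actual code:

```rust
/// Estimate how many transformer layers fit in the iGPU's shared-VRAM
/// budget, after reserving headroom for the KV cache and scratch
/// buffers. All sizes are in MiB.
fn n_gpu_layers(budget_mib: u64, layer_mib: u64, reserve_mib: u64) -> u64 {
    budget_mib.saturating_sub(reserve_mib) / layer_mib.max(1)
}

fn main() {
    // 7168 MiB iGPU partition (as in the dashboard above), an assumed
    // ~140 MiB per layer for an 8B Q4 model, 1024 MiB reserved.
    let n = n_gpu_layers(7168, 140, 1024);
    println!("--n-gpu-layers {}", n); // prints "--n-gpu-layers 43"
}
```

Getting this number right matters in both directions: too low leaves iGPU compute idle, too high forces the driver to evict buffers and can be slower than CPU-only.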
Note: Numbers above are estimates from the profile database for this hardware. Run openuma benchmark --model your-model.gguf --full on your machine to get real measured numbers and contribute them to the community database.
Community Benchmarks
This section will grow as users submit hardware profiles. Submit your results →
Contributing
Contributions welcome! Open issues and pull requests.
License
MIT License - see LICENSE for details.
OpenUMA - Making every x86 machine a first-class AI citizen.