Gemma 4 and Qwen3.5 on shared benchmarks
Read on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/1saoyj7/gemma_4_and_qwen35_on_shared_benchmarks/

Advanced Compact Patterns for Web3 Developers
Introduction If you've spent years building on EVM chains, Midnight's architecture might feel like a paradigm shift. On Ethereum, you push computation onto the blockchain itself. On Midnight, you do the opposite: you move computation off-chain and prove its correctness using zero-knowledge proofs. This isn't just a different implementation detail. It fundamentally changes how you think about state management, data disclosure, and circuit design. Samantha's foundational guide introduced the three-part structure of Midnight contracts: the public ledger, zero-knowledge circuits, and local computation. But understanding the basics and architecting production systems are two different challenges. This guide dives into the patterns that separate working prototypes from robust systems. We'll explore
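The "compute off-chain, verify against public state" idea the excerpt describes can be sketched generically. This is a minimal Python illustration, not Midnight's actual Compact or proof APIs: the `Ledger` class and `commit` function are hypothetical names, and a hash commitment stands in for a real zero-knowledge proof (a hash reveals nothing only if the nonce stays secret, and unlike a ZK proof it cannot prove arbitrary predicates).

```python
import hashlib
import json

def commit(value: dict, nonce: bytes) -> str:
    # Hash-based commitment: a stand-in for a real ZK proof system.
    payload = json.dumps(value, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest()

class Ledger:
    """Public on-chain state: stores only commitments, never raw data."""
    def __init__(self):
        self.commitments = []

    def submit(self, commitment: str):
        self.commitments.append(commitment)

    def verify(self, value: dict, nonce: bytes) -> bool:
        # Recompute the commitment locally and check it against public state.
        return commit(value, nonce) in self.commitments

# Local (off-chain) computation over private data.
private_state = {"balance": 42}
nonce = b"random-nonce"
ledger = Ledger()
ledger.submit(commit(private_state, nonce))

# Later, selectively disclose the value and check it against the commitment.
assert ledger.verify({"balance": 42}, nonce)
assert not ledger.verify({"balance": 99}, nonce)
```

The design point the excerpt makes survives even in this toy form: the chain only ever sees an opaque commitment, so disclosure is a deliberate, local decision rather than a side effect of posting state.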

Thoughts on causal isolation of AI evaluation benchmarks
AI benchmarks seem to saturate quite quickly. One sentiment I've heard a lot is that AI companies optimize their training for the most popular benchmarks. In the best case, that could mean focusing more on getting better at the topics that are benchmarked the most, which is still somewhat suboptimal, as benchmarks tend to be proxies for the real skill, and now the AI is trained for the proxy. In the worst case, the AI training is iterated directly against the benchmark, causing overfitting and inflated benchmark results. And avoiding this completely is not that easy. The training dataset is essentially the whole internet. When someone publishes a benchmark, the training set includes that. And people post benchmark solutions online too; those will be in the training data as well. Filtering al
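The filtering problem the excerpt raises is usually attacked with n-gram overlap decontamination: flag any training document that shares long token sequences with a benchmark item. A rough sketch of one such filter (the function names and the overlap threshold here are illustrative choices, not any lab's actual pipeline):

```python
def ngrams(text: str, n: int = 8) -> set:
    """Set of word-level n-grams in a lowercased text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(document: str, benchmark_items: list,
                 n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a training document if it reproduces a large enough fraction
    of any benchmark item's n-grams."""
    doc_grams = ngrams(document, n)
    for item in benchmark_items:
        item_grams = ngrams(item, n)
        if item_grams and len(item_grams & doc_grams) / len(item_grams) >= threshold:
            return True
    return False

bench = ["the quick brown fox jumps over the lazy dog"]
print(contaminated(
    "a page that quotes the quick brown fox jumps over the lazy dog verbatim",
    bench, n=3))  # → True
print(contaminated(
    "completely unrelated text about kernels and compilers",
    bench, n=3))  # → False
```

Note this only catches verbatim or near-verbatim leakage; paraphrased solutions, translations, and discussion of the benchmark's ideas slip through, which is part of why the excerpt's skepticism is warranted.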
Can We Secure AI With Formal Methods? January-March 2026
In the month or so around the previous new year, as 2024 became 2025, we were saying "2025: year of the agent". MCP was taking off, the inspect-ai and pydantic-ai Python packages were becoming the standards, and products were branching out from chatbots to heavy, autonomous use of tool calls. While much of the product engineering scene may have underdelivered (in the sense that "planning a vacation" isn't entirely something most people do with agents yet), the field of FMxAI, I think, was right on target. It feels like there's an agentic component to everything I read these days. What is 2026 the year of? Besides "year of investors pressure all the math companies to pivot to program synthesis"? I'm declaring it now: the number of blogposts relating to secure program synthesis went exponential sin
More in Models
b8638
tests: allow exporting graph ops from HF file without downloading weights (#21182)
- tests: allow exporting graph ops from HF file without downloading weights
- use unique_ptr for llama_context in HF metadata case
- fix missing non-required tensors falling back to type f32
- use unique pointers where possible
- use no_alloc instead of fixing f32 fallback
- fix missing space

Build artifacts:
macOS/iOS: macOS Apple Silicon (arm64), macOS Intel (x64), iOS XCFramework
Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64 (OpenVINO)
Windows: Windows x64 (CPU), Windows arm64 (CPU), Windows x64 (CUDA 12) - CUDA 12.4 DLLs, Windows x64 (CUDA 13) - CUDA 13.1 DLLs, Windows x64 (Vulkan), Windows x64 (SYCL), Windows x64 (HIP)
openEuler: openEuler x86 (310

The AI That Actually Builds Unreal Engine Blueprints
A few weeks ago I sat down with a simple but slightly insane thought: What if I could type "Build me a third-person character blueprint with a follow camera and a basic mesh"… and an AI just did it? No ChatGPT spitting out code I have to paste. No manual drag-and-drop in the editor. Just set a goal and walk away while it works inside Unreal. That's Cipher. It's not another "AI for game dev" wrapper. It's an autonomous agent that lives in your Unreal project, reads your high-level goal, plans its own steps, executes them through the Python API, checks its work, and keeps going until the job is done.

Why this actually feels different

Most AI tools I've tried are fancy autocomplete or chatbots. They hand you a blueprint graph screenshot or a wall of nodes and say "good luck implementing this." C
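Cipher's internals aren't shown in the excerpt, but the plan-execute-check loop it describes is a standard agent pattern that can be sketched generically. Everything below is illustrative: `run_agent` and the toy callables are hypothetical, and a real implementation would have an LLM produce the plan and drive Unreal's Python API in `execute`.

```python
def run_agent(goal, plan, execute, check, max_retries=2):
    """Generic plan-execute-verify loop: plan steps for the goal, run each
    step, and re-run a step (up to max_retries) if its check fails."""
    results = []
    for step in plan(goal):
        for _attempt in range(1 + max_retries):
            result = execute(step)
            if check(step, result):  # verify the step before moving on
                break
        results.append((step, result))
    return results

# Toy demo: "steps" are numbers to double; the check verifies the doubling.
plan = lambda goal: [1, 2, 3]
execute = lambda step: step * 2
check = lambda step, result: result == step * 2
print(run_agent("double some numbers", plan, execute, check))
# → [(1, 2), (2, 4), (3, 6)]
```

The "checks its work" part is what distinguishes this from fire-and-forget code generation: each step is verified against the actual state of the project before the agent proceeds.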


