Gemma 4 and Qwen3.5 on shared benchmarks
Read on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/1saoyj7/gemma_4_and_qwen35_on_shared_benchmarks/

Advanced Compact Patterns for Web3 Developers
Introduction If you've spent years building on EVM chains, Midnight's architecture might feel like a paradigm shift. On Ethereum, you push computation onto the blockchain itself. On Midnight, you do the opposite: you move computation off-chain and prove its correctness using zero-knowledge proofs. This isn't just a different implementation detail. It fundamentally changes how you think about state management, data disclosure, and circuit design. Samantha's foundational guide introduced the three-part structure of Midnight contracts: the public ledger, zero-knowledge circuits, and local computation. But understanding the basics and architecting production systems are two different challenges. This guide dives into the patterns that separate working prototypes from robust systems. We'll explore
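The "compute off-chain, verify against public state" idea the excerpt describes can be sketched generically. This is a minimal Python illustration, not Midnight's actual Compact or proof APIs: the `Ledger` class and `commit` function are hypothetical names, and a hash commitment stands in for a real zero-knowledge proof (a hash reveals nothing only if the nonce stays secret, and unlike a ZK proof it cannot prove arbitrary predicates).

```python
import hashlib
import json

def commit(value: dict, nonce: bytes) -> str:
    # Hash-based commitment: a stand-in for a real ZK proof system.
    payload = json.dumps(value, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest()

class Ledger:
    """Public on-chain state: stores only commitments, never raw data."""
    def __init__(self):
        self.commitments = []

    def submit(self, commitment: str):
        self.commitments.append(commitment)

    def verify(self, value: dict, nonce: bytes) -> bool:
        # Recompute the commitment locally and check it against public state.
        return commit(value, nonce) in self.commitments

# Local (off-chain) computation over private data.
private_state = {"balance": 42}
nonce = b"random-nonce"
ledger = Ledger()
ledger.submit(commit(private_state, nonce))

# Later, selectively disclose the value and check it against the commitment.
assert ledger.verify({"balance": 42}, nonce)
assert not ledger.verify({"balance": 99}, nonce)
```

The design point the excerpt makes survives even in this toy form: the chain only ever sees an opaque commitment, so disclosure is a deliberate, local decision rather than a side effect of posting state.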

Thoughts on causal isolation of AI evaluation benchmarks
AI benchmarks seem to saturate quite quickly. One sentiment I've heard a lot is that AI companies optimize their training for the most popular benchmarks. In the best case, that could mean focusing more on getting better at the topics that are benchmarked the most, which is still somewhat suboptimal, as benchmarks tend to be proxies for the real skill, and now the AI is trained for the proxy. In the worst case, the AI training is iterated directly against the benchmark, causing overfitting and inflated benchmark results. And avoiding this completely is not that easy. The training dataset is essentially the whole internet. When someone publishes a benchmark, the training set includes that. And people post benchmark solutions online too; those will be in the training data as well. Filtering al
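The filtering problem the excerpt raises is usually attacked with n-gram overlap decontamination: flag any training document that shares long token sequences with a benchmark item. A rough sketch of one such filter (the function names and the overlap threshold here are illustrative choices, not any lab's actual pipeline):

```python
def ngrams(text: str, n: int = 8) -> set:
    """Set of word-level n-grams in a lowercased text."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(document: str, benchmark_items: list,
                 n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a training document if it reproduces a large enough fraction
    of any benchmark item's n-grams."""
    doc_grams = ngrams(document, n)
    for item in benchmark_items:
        item_grams = ngrams(item, n)
        if item_grams and len(item_grams & doc_grams) / len(item_grams) >= threshold:
            return True
    return False

bench = ["the quick brown fox jumps over the lazy dog"]
print(contaminated(
    "a page that quotes the quick brown fox jumps over the lazy dog verbatim",
    bench, n=3))  # → True
print(contaminated(
    "completely unrelated text about kernels and compilers",
    bench, n=3))  # → False
```

Note this only catches verbatim or near-verbatim leakage; paraphrased solutions, translations, and discussion of the benchmark's ideas slip through, which is part of why the excerpt's skepticism is warranted.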
Can We Secure AI With Formal Methods? January-March 2026
In the month or so around the previous new year, as 2024 became 2025, we were saying "2025: year of the agent". MCP was taking off, the inspect-ai and pydantic-ai Python packages were becoming the standards, and products were branching out from chatbots to heavy, autonomous use of tool calls. While much of the product engineering scene may have underdelivered (in the sense that "planning a vacation" isn't entirely something most people do with agents yet), the field of FMxAI, I think, was right on target. It feels like there's an agentic component to everything I read these days. What is 2026 the year of? Besides "year of investors pressure all the math companies to pivot to program synthesis"? I'm declaring it now: the number of blogposts relating to secure program synthesis went exponential sin
More in Models
b8638
tests: allow exporting graph ops from HF file without downloading weights (#21182)
- tests: allow exporting graph ops from HF file without downloading weights
- use unique_ptr for llama_context in HF metadata case
- fix missing non-required tensors falling back to type f32
- use unique pointers where possible
- use no_alloc instead of fixing f32 fallback
- fix missing space

Build artifacts:
macOS/iOS: macOS Apple Silicon (arm64), macOS Intel (x64), iOS XCFramework
Linux: Ubuntu x64 (CPU), Ubuntu arm64 (CPU), Ubuntu s390x (CPU), Ubuntu x64 (Vulkan), Ubuntu arm64 (Vulkan), Ubuntu x64 (ROCm 7.2), Ubuntu x64 (OpenVINO)
Windows: Windows x64 (CPU), Windows arm64 (CPU), Windows x64 (CUDA 12) - CUDA 12.4 DLLs, Windows x64 (CUDA 13) - CUDA 13.1 DLLs, Windows x64 (Vulkan), Windows x64 (SYCL), Windows x64 (HIP)
openEuler: openEuler x86 (310

The AI That Actually Builds Unreal Engine Blueprints
A few weeks ago I sat down with a simple but slightly insane thought: What if I could type "Build me a third-person character blueprint with a follow camera and a basic mesh"… and an AI just did it? No ChatGPT spitting out code I have to paste. No manual drag-and-drop in the editor. Just set a goal and walk away while it works inside Unreal. That's Cipher. It's not another "AI for game dev" wrapper. It's an autonomous agent that lives in your Unreal project, reads your high-level goal, plans its own steps, executes them through the Python API, checks its work, and keeps going until the job is done.

Why this actually feels different

Most AI tools I've tried are fancy autocomplete or chatbots. They hand you a blueprint graph screenshot or a wall of nodes and say "good luck implementing this." C
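Cipher's internals aren't shown in the excerpt, but the plan-execute-check loop it describes is a standard agent pattern that can be sketched generically. Everything below is illustrative: `run_agent` and the toy callables are hypothetical, and a real implementation would have an LLM produce the plan and drive Unreal's Python API in `execute`.

```python
def run_agent(goal, plan, execute, check, max_retries=2):
    """Generic plan-execute-verify loop: plan steps for the goal, run each
    step, and re-run a step (up to max_retries) if its check fails."""
    results = []
    for step in plan(goal):
        for _attempt in range(1 + max_retries):
            result = execute(step)
            if check(step, result):  # verify the step before moving on
                break
        results.append((step, result))
    return results

# Toy demo: "steps" are numbers to double; the check verifies the doubling.
plan = lambda goal: [1, 2, 3]
execute = lambda step: step * 2
check = lambda step, result: result == step * 2
print(run_agent("double some numbers", plan, execute, check))
# → [(1, 2), (2, 4), (3, 6)]
```

The "checks its work" part is what distinguishes this from fire-and-forget code generation: each step is verified against the actual state of the project before the agent proceeds.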


