Extended NYT Connections Benchmark scores: MiniMax-M2.7 34.4, Gemma 4 31B 30.1, Arcee Trinity Large Thinking 29.5

Reddit r/LocalLLaMA, by /u/zero0_one1 (https://www.reddit.com/user/zero0_one1), April 4, 2026

More info: github.com/lechmazur/nyt-connections/

Original source: https://www.reddit.com/r/LocalLLaMA/comments/1scl7pl/extended_nyt_connections_benchmark_scores/
