How we turned a small open-source model into the world's best AI forecaster

Reddit r/LocalLLaMA · by /u/LightningRodLabs (https://www.reddit.com/user/LightningRodLabs) · April 3, 2026 · 2 min read

tldr: Our model Foresight V3 is #1 on Prophet Arena, beating every frontier model. The base model is gpt-oss-120b, and the training data was auto-generated from public news.

Benchmark

Prophet Arena is a live forecasting benchmark from UChicago's SIGMA Lab. Every model receives identical context, so the leaderboard reflects the model's reasoning ability. OpenAI's Head of Applied Research called it "the only benchmark that can't be hacked." We lead both the Overall and Sports categories, ahead of every frontier model including GPT-5.2, Gemini 3 Pro, and Claude Opus 4.5.

Data Generation Pipeline

Real-world data is messy, unstructured, and doesn't have labels. But it does have timestamps. We turn those timestamps into labeled training data using an approach we call future-as-label. We start with a so

Could not retrieve the full article text.
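
The post is cut off before the pipeline is described, so the following is only a minimal sketch of the general idea that timestamps can stand in for labels, not the authors' actual method. It assumes a cutoff date: text published on or before the cutoff becomes the model's context, and outcomes that became known after the cutoff become the labels. Every name here (`Article`, `Resolution`, `Example`, `build_examples`) is hypothetical and does not come from the post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """A timestamped piece of text (hypothetical schema)."""
    published: date
    text: str


@dataclass(frozen=True)
class Resolution:
    """A question whose answer became known on a given date (hypothetical)."""
    question: str
    resolved_on: date
    outcome: bool


@dataclass(frozen=True)
class Example:
    """One training example: question, pre-cutoff context, future outcome."""
    question: str
    context: tuple
    label: bool


def build_examples(articles, resolutions, cutoff):
    """Pair pre-cutoff context with outcomes that resolved after the cutoff.

    Assumption: anything published after `cutoff` is excluded from the
    context so the label cannot leak into the input.
    """
    # Context: only text that existed on or before the cutoff, oldest first.
    visible = sorted(
        (a for a in articles if a.published <= cutoff),
        key=lambda a: a.published,
    )
    context = tuple(a.text for a in visible)

    # Labels: only questions that were still open at the cutoff.
    return [
        Example(question=r.question, context=context, label=r.outcome)
        for r in resolutions
        if r.resolved_on > cutoff
    ]
```

The one design choice worth noting in this sketch is the strict split at the cutoff: the context filter uses `<=` and the label filter uses `>`, so no item can appear on both sides.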

Original source

Reddit r/LocalLLaMA

https://www.reddit.com/r/LocalLLaMA/comments/1sbd0rc/how_we_turned_a_small_opensource_model_into_the/
