Yann LeCun: Meta ‘fudged a little bit’ when benchmark-testing Llama 4 model - Fast Company

GNews AI Llama · January 6, 2026 · 1 min read


Could not retrieve the full article text.


Original source

GNews AI Llama

https://news.google.com/rss/articles/CBMigwFBVV95cUxOanFleEQxSVRYdTBhRjVjRTF0bUlsSTc3Y0QyMzllYjJxRzFnSjRMbXlOQXZjT05ObnMyUWtpQkEtRHduR211MnE4Q2sxTXVTZjdobXB1MWRoQUtfTy1IelA2WnNSNjFWSVlVQVo2ZFh5am5ZMFk4LVVtdTJPd1Q3X01PVQ?oc=5




