Yann LeCun: Meta ‘fudged a little bit’ when benchmark-testing Llama 4 model - Fast Company
<a href="https://news.google.com/rss/articles/CBMigwFBVV95cUxOanFleEQxSVRYdTBhRjVjRTF0bUlsSTc3Y0QyMzllYjJxRzFnSjRMbXlOQXZjT05ObnMyUWtpQkEtRHduR211MnE4Q2sxTXVTZjdobXB1MWRoQUtfTy1IelA2WnNSNjFWSVlVQVo2ZFh5am5ZMFk4LVVtdTJPd1Q3X01PVQ?oc=5" target="_blank">Yann LeCun: Meta ‘fudged a little bit’ when benchmark-testing Llama 4 model</a> <font color="#6f6f6f">Fast Company</font>

Claude Code slash commands: the complete reference with custom examples
If you've been using Claude Code for more than a week, you've probably typed /help and seen a list of slash commands. But most developers only use /clear and /exit. Here's everything else, and how to build your own.

Built-in slash commands

Command            What it does
/help              Show all commands
/clear             Clear conversation context
/compact           Summarize and compress context
/memory            Show what Claude remembers
/review            Request code review
/init              Initialize CLAUDE.md in current dir
/exit or /quit     Exit Claude Code
/model             Switch between Claude models
/cost              Show token usage and cost
/doctor            Check your setup

The ones you're probably not using

/compact vs /clear

Most people use /clear when the context gets long. But /compact is usually better:

Interpreting Gradient Routing’s Scalable Oversight Experiment
TL;DR: We discuss the setting that the Gradient Routing (GR) paper uses to model Scalable Oversight (SO). The first part suggests an improved naive baseline using early stopping, which performs on par with GR. In the second part, we compare GR’s setting to SO and Weak-to-Strong Generalization (W2SG), discuss how it might be useful in combination, argue that it’s closer to semi-supervised reinforcement learning (SSRL), and point to some other possible baselines. We think this post will be useful for interpreting Gradient Routing’s SO experiment and for readers who are trying to build intuition about what modern Scalable Oversight work does and does not assume. This post is mainly about two things. First, it’s about the importance of simple baselines. Second, it's about different ways of mode

Research note on selective inoculation
Introduction. Inoculation Prompting is a technique to improve test-time alignment by introducing a contextual cue (like a system prompt) to steer the model's behavior away from unwanted traits at inference time. Prior work on inoculation prompting applies the inoculation prompt globally to every training example during SFT or RL, primarily in settings where the undesired behavior is present in all examples. This raises two main concerns: the impact on learned positive traits, and the fact that we need to know about the behavior beforehand in order to craft the prompt. We study more realistic scenarios using broad persona-level trait datasets from Persona Vectors and construct dataset variants where a positive trait and a negative trait coexist, with the negative behavior present in

Cheaper/faster/easier makes for step changes (and that's why even current-level LLMs are transformative)
We already knew there's nothing new under the sun. Thanks to advances in telescopes, orbital launch, satellites, and space vehicles, we now know there's nothing new above the sun either, but there is rather a lot of energy! For many phenomena, I think it's a matter of convenience and utility whether you model them as discrete or continuous, aka qualitative vs. quantitative. On one level, nukes are simply a bigger explosion, and we already had explosions. On another level, they're sufficiently bigger as to have reshaped global politics and rewritten the decision theory of modern war. Perhaps the key thing is remembering that sufficiently large quantitative changes can make for qualitative macro effects. For example, basic elements of modern life include transport, communication, energy, comput


