What Can Language Models Actually Do?

Chain of Thought (Every.to) · by Dan Shipper · February 19, 2025 · 5 min read

The world has changed considerably since our last “think week” five months ago, and so has Every. We’ve added new business units, launched new products, and brought on new teammates. So we’ve been taking this week to come up with new ideas and products that can help us improve how we do our work and, more importantly, your experience as a member of our community. In the meantime, we’re re-upping four pieces by Dan Shipper that cover basic, powerful questions about AI. (Dan hasn’t been publishing at his regular cadence because he’s working on a longer piece. Look out for that in Q2.) Yesterday we re-published his jargon-free explainer of how language models work. Today we’re re-upping his piece about how language models function as compressors, or summarizers, of text. - Kate Lee

I want to help save our idea of human creativity. Artificial intelligence can write, illustrate, design, code, and much more. But rather than eliminating the need for human creativity, these new powers can help us redefine and expand it.

We need to do a technological dissection of language models, defining what they can do well—and what they can’t. By doing so, we can isolate our own role in the creative process.

If we can do that, we’ll be able to wield language models for creative work—and still call it creativity.

To start, let’s talk about what language models can do.

The psychology and behavior of language models

The current generation of language models is called transformers, and in order to understand what they do, we need to take that word seriously. What kind of transformations can transformers do?

Mathematically, language models are recursive next-token predictors. They are given a sequence of text and predict the next bit of text in the sequence. This process runs over and over in a loop, building upon its previous outputs self-referentially until it reaches a stopping point. It’s sort of like a snowball rolling downhill and picking up more and more snow along the way.
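The loop described above can be sketched in a few lines of Python. This is only a toy illustration, not how a real transformer is implemented: the lookup table `TOY_MODEL`, the `<stop>` marker, and the function names `predict_next` and `generate` are all made up for this example, and the "model" looks only at the most recent token, whereas a real language model conditions on the whole sequence and learns its probabilities from data.

```python
# Hypothetical toy "model": for each token, made-up probabilities for the next token.
TOY_MODEL = {
    "the": {"snowball": 0.7, "hill": 0.3},
    "snowball": {"rolls": 0.9, "melts": 0.1},
    "rolls": {"downhill": 0.8, "<stop>": 0.2},
    "downhill": {"<stop>": 1.0},
}


def predict_next(tokens):
    """Return the most likely next token given the sequence so far.

    This toy version only inspects the last token; unknown tokens lead to "<stop>".
    """
    distribution = TOY_MODEL.get(tokens[-1], {"<stop>": 1.0})
    return max(distribution, key=distribution.get)


def generate(prompt_tokens, max_new_tokens=10):
    """Repeatedly predict the next token and feed the output back in as input."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<stop>":  # the stopping point
            break
        tokens.append(next_token)  # the snowball picks up more snow
    return tokens


print(" ".join(generate(["the"])))  # prints "the snowball rolls downhill"
```

Each pass through the loop appends one token, and the next prediction is made on the longer sequence. Picking the single most likely token is a simplification chosen to keep the sketch deterministic; production systems typically sample from the predicted distribution instead.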

But this question is best asked at a higher level than simply mathematical possibility. Instead, what are the inputs and outputs we observe from today’s language models? And what can we infer about how they think?

In essence, we need to study LLMs’ behavior and psychology, rather than their biology and physics.

This is a sketch based on experience. It’s a framework I’ve built for the purposes of doing great creative work with AI.

A framework for what language models do

Language models transform text in the following ways:

Compression: They compress a big prompt into a short response.
Expansion: They expand a short prompt into a long response.
Translation: They convert a prompt in one form into a response in another form.

These are manifestations of their outward behavior. From there, we can infer a property of their psychology—the underlying thinking process that creates their behavior:

Remixing: They mix two or more texts (or learned representations of texts) together and interpolate between them.

I’m going to break down these elements in successive parts of this series over the next few weeks. None of these answers are final, so consider this a public exploration that’s open for critique. Today, I want to talk to you about the first operation: compression.

Language models as compressors

Language models can take any piece of text and make it smaller:

Source: All images courtesy of the author.

This might seem simple, but, in fact, it’s a marvel. Language models can take a big chunk of text and smush it down like a foot crushing a can of Coke. Except it doesn’t come out crushed—it comes out as a perfectly packaged and proportional mini-Coke. And it’s even drinkable! This is a Willy Wonka-esque magic trick, without the Oompa Loompas.

Language model compression comes in many different flavors. A common one is what I’ll call comprehensive compression, or summarization.

Language models are comprehensive compressors

Humans comprehensively compress things all the time—it’s called summarization. Language models are good at it in the same way a fifth grader summarizes a children’s novel for a book report, or the app Blinkist summarizes nonfiction books for busy professionals.

This kind of summarizing is intended to take a source text, pick out the ideas that explain its main points for a general reader, and reconstitute those into a compressed form for faster consumption:

These summaries are intended to be both comprehensive (they note all the main ideas) and helpful for the average reader (they express the main ideas at a high level with little background knowledge assumed).

In the same way, a language model like Anthropic’s Claude, given the text of the Ursula K. Le Guin classic A Wizard of Earthsea, will easily output a comprehensive summary of the book’s main plot points:

Original source

Chain of Thought (Every.to)

https://every.to/chain-of-thought/what-can-language-models-actually-do-371b969e-d470-4639-a9fa-f873c133c19b
