
Academic Proof-of-Work in the Age of LLMs

LessWrong AI · by LawrenceC · April 5, 2026 · 5 min read


Written quickly as part of the Inkhaven Residency.

Related: Bureaucracy as active ingredient, pain as active ingredient

A widely known secret in academia is that many of the formalities serve in large part as proof of work. That is, the reason expensive procedures exist is that some way of filtering must exist, and the amount of effort invested can often be a good proxy for the quality of the work. Specifically, the pool of research is vast, and good research can often be hard to identify. Even engaging with research enough to understand its quality can be expensive. As a result, people look toward signs of visible, expensive effort in order to determine whether to engage with the research at all.

Why do people insist on only reading research that’s published in well-formatted, well-written papers, as opposed to looking at random blog posts? Part of the answer is that good writing and formatting make the research easier to digest, and another part is that investing the time to properly write up your results often causes the results to improve. But part of the answer is proof-of-work: surely, if your research is good, you’d be willing to put in the 30–40 hours to do the required experiments and format it nicely as a paper?

Similarly, why do fields often insist on experiments beyond their scientific value? For example, why does machine learning often insist that people do expensive empirical experiments even for theory papers? Of course, part of the answer is that it’s easy to generate theoretical results that have no connection to reality. But another part of the answer is that doing the empirical experiments serves as the required proof of work; implementing anything on even a medium-sized open-source LLM is hard, but surely you’d invest the effort if you believed enough in your idea? (This helps explain the otherwise baffling observation that many of the empirical results in theoretical papers have little relevance to the correctness or even the applicability of the theoretical results.)

Other aspects of ML academia – the beautifully polished figures[1], the insistence on citing the relevant papers to show knowledge of the field, and so forth – also exist in part to serve as a proof-of-work filter for quality.

In a sense, this is one of the reasons academia is great. In the absence of a proof-of-work system, the default would be something closer to proof-of-stake: that is, some form of reputational system based on known, previously verified accomplishments. While proof-of-work filters can be wasteful, they nonetheless allow new, unknown researchers to enter the field and contribute (assuming they invest the requisite effort).

An obvious problem with this entire setup is that LLMs exist, and what was once expensive is now cheap. While previously, good writing was expensive, LLMs allow anyone to produce seemingly coherent, well-argued English text. While it was once quite expensive to produce ML code, current LLMs produce seemingly correct code for experiments quickly. And the same is true for most of the proof-of-work signifiers that academia used to depend on: any frontier LLM can produce beautifully formatted figures in matplotlib, cite relevant work (or at least convincingly hallucinate citations), and produce long mathematical arguments.

I’ve observed this myself in actual ML conference contexts. In the past, crackpot papers were relatively easy to identify. But in the last year, I’ve seen at least one crackpot paper get past other peer reviewers through a combination of dense mathematical jargon and an expansive codebase that was hardcoded to produce the desired results. Specifically, while the reviewers knew that they didn't fully understand the mathematical results, they assumed that this was due to their own lack of knowledge, rather than the results themselves being wrong. And since the codebase passed the cursory review given to it by the other reviewers, they did not investigate it deeply enough to notice the hardcoding.[2]

In a sense, this is no different from the problems introduced by AI in other contexts, and I’m not sure there’s a better solution than to fall back to previous proof-of-stake–like reputation systems.[3] At the very least, I find it hard not to view new, seemingly-exciting results from unknown researchers with a high degree of skepticism.

This makes me sad, but I'm not sure there's a real solution here.

[1] Especially the proliferation of beautiful "figure one"s that encapsulate the paper's core ideas and results in a single figure.

[2] In fact, it took me about an hour to decide that the paper's results were simply wrong, as opposed to merely confusing. Thankfully, in this case, the paper's problems were obvious enough that I could point the other reviewers to, e.g., specific hardcoded results (and the paper was not accepted for publication), but there's no guarantee that this would always be the case.

[3] Of course, there are other possibilities that less pessimistic people would no doubt point to: for example, there could be a shift toward proof-of-work setups that are LLM-resistant, or we could rely on LLMs to do the filtering instead. But insofar as LLMs are good at replicating all cognitively shallow human effort, I don't imagine there will be any proof-of-work setups that continue to work as LLMs get better. And I personally feel pretty sad about delegating all of my input to Claude.
