
Academic Proof-of-Work in the Age of LLMs

LessWrong AI · by LawrenceC · April 5, 2026 · 5 min read


Written quickly as part of the Inkhaven Residency.

Related: Bureaucracy as active ingredient, pain as active ingredient

A widely known secret in academia is that many of the formalities serve in large part as proof of work. That is, the reason expensive procedures exist is that some way of filtering must exist, and the amount of effort invested can often be a good proxy for the quality of the work. Specifically, the pool of research is vast, and good research can often be hard to identify. Even engaging with research enough to understand its quality can be expensive. As a result, people look toward signs of visible, expensive effort in order to determine whether to engage with the research at all.

Why do people insist on only reading research that’s published in well-formatted, well-written papers, as opposed to looking at random blog posts? Part of the answer is that good writing and formatting make the research easier to digest, and another part is that investing the time to properly write up your results often causes the results to improve. But part of the answer is proof-of-work: surely, if your research is good, you’d be willing to put in the 30–40 hours to do the required experiments and format it nicely as a paper?

Similarly, why do fields often insist on experiments beyond their scientific value? For example, why does machine learning often insist that people do expensive empirical experiments even for theory papers? Of course, part of the answer is that it’s easy to generate theoretical results that have no connection to reality. But another part of the answer is that doing the empirical experiments serves as the required proof of work; implementing anything on even a medium-sized open-source LLM is hard, but surely you’d invest the effort if you believed enough in your idea? (This helps explain the otherwise baffling observation that many of the empirical results in theoretical papers have little relevance to the correctness or even the applicability of the theoretical results.)

Other aspects of ML academia – the beautifully polished figures[1], the insistence on citing the relevant papers to show knowledge of the field, and so forth – also exist in part to serve as a proof-of-work filter for quality.

In a sense, this is one of the reasons academia is great. In the absence of a proof-of-work system, the default would be something closer to proof-of-stake: that is, some form of reputational system based on known, previously verified accomplishments. While proof-of-work filters can be wasteful, they nonetheless allow new, unknown researchers to enter the field and contribute (assuming they invest the requisite effort).

An obvious problem with this entire setup is that LLMs exist, and what was once expensive is now cheap. While previously, good writing was expensive, LLMs allow anyone to produce seemingly coherent, well-argued English text. While it was once quite expensive to produce ML code, current LLMs produce seemingly correct code for experiments quickly. And the same is true for most of the proof-of-work signifiers that academia used to depend on: any frontier LLM can produce beautifully formatted figures in matplotlib, cite relevant work (or at least convincingly hallucinate citations), and produce long mathematical arguments.

I’ve observed this myself in actual ML conference contexts. In the past, crackpot papers were relatively easy to identify. But in the last year, I’ve seen at least one crackpot paper get past other peer reviewers through a combination of dense mathematical jargon and an expansive codebase that was hardcoded to produce the desired results. Specifically, while the reviewers knew that they didn't fully understand the mathematical results, they assumed that this was due to their own lack of knowledge, rather than the results themselves being wrong. And since the codebase passed the cursory review given to it by the other reviewers, they did not investigate it deeply enough to notice the hardcoding.[2]

In a sense, this is no different from the problems introduced by AI in other contexts, and I’m not sure there’s a better solution than to fall back to previous proof-of-stake–like reputation systems.[3] At the very least, I find it hard not to view new, seemingly-exciting results from unknown researchers with a high degree of skepticism.

This makes me sad, but I'm not sure there's a real solution here.

[1] Especially the proliferation of beautiful "figure one"s that encapsulate the paper's core ideas and results in a single figure.

[2] In fact, it took me about an hour to decide that the paper's results were simply wrong, as opposed to merely confusing. Thankfully, in this case, the paper's problems were obvious enough that I could point the other reviewers to, e.g., specific hardcoded results (and the paper was not accepted for publication), but there's no guarantee that this would always be the case.

[3] Of course, there are other possibilities that less pessimistic people would no doubt point to: for example, there could be a shift toward proof-of-work setups that are LLM-resistant, or we could rely on LLMs to do the filtering instead. But insofar as LLMs are good at replicating all cognitively shallow human effort, I don't imagine there will be any proof-of-work setups that continue to work as LLMs get better. And I personally feel pretty sad about delegating all of my input to Claude.
