Activation Function Ablation
An ablation of activation functions in GPT-like autoregressive language models.
This was an ablation of activation functions on GPT-like models of ~100M params that I ran ages ago. Each model was trained for 10k iters, which isn't very long. My original goal was to show that the choice of activation function doesn't matter that much, but to do so I'd need many more runs to estimate the variance and show that the differences aren't statistically significant, and I don't plan on running a more exhaustive version of this experiment any time soon. So I'm just dumping these results here in case anyone has any use for them. All the activation definitions are here.
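For readers who don't have the definitions handy, here is a minimal sketch of a few of the better-known activations from this ablation, written as plain scalar Python. These are the common textbook forms and are an assumption on my part: the definitions actually used in the runs may differ (for example a tanh-approximated GELU or a different ELU alpha), and the less standard entries (spike2, seagull, and so on) are deliberately left out rather than guessed at.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x: float) -> float:
    """log(1 + exp(x)), in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x: float) -> float:
    return max(x, 0.0)


def elu(x: float, alpha: float = 1.0) -> float:
    # alpha = 1.0 is the usual default; the value used in the runs is unknown.
    return x if x > 0 else alpha * math.expm1(x)


def gelu(x: float) -> float:
    # Exact (erf-based) GELU; the tanh approximation is another common choice.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def swish(x: float) -> float:
    # Swish with beta fixed to 1, also known as SiLU.
    return x * sigmoid(x)


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def softsign(x: float) -> float:
    return x / (1.0 + abs(x))


def tanhshrink(x: float) -> float:
    return x - math.tanh(x)


def lisht(x: float) -> float:
    return x * math.tanh(x)


# Names here mirror the table rows they are meant to correspond to.
ACTIVATIONS = {
    "ReLU": relu, "ELU": elu, "GELU": gelu, "Swish": swish, "Mish": mish,
    "softsign": softsign, "softplus": softplus, "sigmoid": sigmoid,
    "tanh": math.tanh, "tanhshrink": tanhshrink, "lisht": lisht, "abs": abs,
}

if __name__ == "__main__":
    for name, fn in ACTIVATIONS.items():
        print(f"{name:>10}: f(-1) = {fn(-1.0):+.4f}, f(1) = {fn(1.0):+.4f}")
```

In a real model these would be applied elementwise to tensors inside the MLP block; the scalar versions are only meant to pin down the shapes of the functions.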
Name             Pile Validation BPB   LAMBADA acc   LAMBADA ppl
softsign         1.1485                34.3          81.32
ReLU             1.1482                34.3          82.01
spike2           1.1480                34.4          83.13
selu             1.1485                34.5          83.32
elish            1.1492                33.9          84.04
tanhexp          1.1474                33.7          84.06
sigmoid          1.1484                33.9          85.20
tanhshrink       1.1483                33.9          85.42
maxtanh          1.1479                33.7          85.53
roottanh         1.1485                33.4          86.00
softplusmone     1.1488                34.1          86.21
logsoftmax       1.1492                34.2          86.29
ELU              1.1496                33.8          86.37
Swish            1.1482                33.7          86.42
softmax          1.1491                33.2          86.74
square_relax     1.1484                33.5          86.92
lisht            1.1500                33.8          87.17
GELU             1.1453                34.0          87.84
abs              1.1489                33.5          87.96
tanh             1.1481                33.2          89.28
Mish             1.1482                33.6          89.84
triangle_relax   1.1502                33.7          89.91
seagull          1.1487                33.3          90.08
maxsig           1.1480                33.3          90.23
softplus         1.1460                33.1          90.74
minsin           1.1498                33.3          91.18
snake            1.1484                33.1          91.93
cosid            1.1490                33.3          92.99
spike            1.1498                33.3          93.78
bipolarsigmoid   1.1513                32.8          96.73
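To get a feel for how close these numbers are, here is a small sketch that loads the table above and reports the best and worst row per metric, plus the spread of Pile validation BPB. The helper name `summarize` is my own. Note that the spread across activations from single runs is not an estimate of run-to-run variance, so this says nothing about statistical significance; that would still need repeated runs per activation.

```python
import statistics

# (name, Pile validation BPB, LAMBADA acc, LAMBADA ppl), copied from the table.
RESULTS = [
    ("softsign", 1.1485, 34.3, 81.32), ("ReLU", 1.1482, 34.3, 82.01),
    ("spike2", 1.1480, 34.4, 83.13), ("selu", 1.1485, 34.5, 83.32),
    ("elish", 1.1492, 33.9, 84.04), ("tanhexp", 1.1474, 33.7, 84.06),
    ("sigmoid", 1.1484, 33.9, 85.20), ("tanhshrink", 1.1483, 33.9, 85.42),
    ("maxtanh", 1.1479, 33.7, 85.53), ("roottanh", 1.1485, 33.4, 86.00),
    ("softplusmone", 1.1488, 34.1, 86.21), ("logsoftmax", 1.1492, 34.2, 86.29),
    ("ELU", 1.1496, 33.8, 86.37), ("Swish", 1.1482, 33.7, 86.42),
    ("softmax", 1.1491, 33.2, 86.74), ("square_relax", 1.1484, 33.5, 86.92),
    ("lisht", 1.1500, 33.8, 87.17), ("GELU", 1.1453, 34.0, 87.84),
    ("abs", 1.1489, 33.5, 87.96), ("tanh", 1.1481, 33.2, 89.28),
    ("Mish", 1.1482, 33.6, 89.84), ("triangle_relax", 1.1502, 33.7, 89.91),
    ("seagull", 1.1487, 33.3, 90.08), ("maxsig", 1.1480, 33.3, 90.23),
    ("softplus", 1.1460, 33.1, 90.74), ("minsin", 1.1498, 33.3, 91.18),
    ("snake", 1.1484, 33.1, 91.93), ("cosid", 1.1490, 33.3, 92.99),
    ("spike", 1.1498, 33.3, 93.78), ("bipolarsigmoid", 1.1513, 32.8, 96.73),
]


def summarize(results):
    """Return best/worst rows per metric and the spread of validation BPB."""
    bpbs = [row[1] for row in results]
    return {
        # Lower is better for BPB and perplexity, higher is better for accuracy.
        "best_bpb": min(results, key=lambda r: r[1])[0],
        "worst_bpb": max(results, key=lambda r: r[1])[0],
        "best_acc": max(results, key=lambda r: r[2])[0],
        "best_ppl": min(results, key=lambda r: r[3])[0],
        "worst_ppl": max(results, key=lambda r: r[3])[0],
        "bpb_range": max(bpbs) - min(bpbs),
        "bpb_stdev": statistics.stdev(bpbs),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

The rows are listed in order of increasing LAMBADA perplexity, and the ranking by validation BPB is quite different, which fits with the differences being small relative to whatever noise there is between runs.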