Finetuning Models on Downstream Tasks

EleutherAI Blog, May 24, 2021

We finetuned GPT-Neo on eval harness tasks to see how finetuning would change its performance.

The GPT-3 paper didn't explore finetuning on downstream tasks, so I decided to tune Neo 2.7B for 1.1k iterations on all the tasks in the eval harness that have a train set (all at once, because tuning one model per task would have taken ages). I was quite surprised that the tuned model didn't completely destroy the untuned 2.7B across the board; from eyeballing the results, it looks more like a tossup. Interestingly, the tuned model beats the untuned 2.7B by quite a lot on anli (zero-shot anli_r1 accuracy goes from 0.332 to 0.418), which is especially notable given that anli is one of the tasks the models in the GPT-3 paper struggled on.

Lambada and pubmedqa are also included in these tables even though they don't have a training set (at least in the eval harness implementation, which uses the OpenAI version of lambada). I included them to look at effects on tasks left out of tuning and to potentially observe some catastrophic forgetting. Sure enough, both are significantly worse on the tuned model: zero-shot lambada accuracy drops from 0.622 to 0.387 (perplexity rises from 5.626 to 27.796), and pubmedqa accuracy drops from 0.565 to 0.496.
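The post doesn't say how the task data was combined for the single tuning run. A minimal sketch, assuming simple concatenate-and-shuffle mixing, might look like the following. `build_multitask_mixture` and the toy example strings are hypothetical illustrations, not part of the eval harness API. Tasks with no train split (like lambada and pubmedqa here) are set aside as held-out tasks for the forgetting check.

```python
import random
from typing import Dict, List, Optional, Tuple


def build_multitask_mixture(
    train_sets: Dict[str, Optional[List[str]]],
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pool every task's train examples into one shuffled stream.

    Hypothetical helper: a task whose train split is None cannot be tuned on,
    so it is returned separately as a held-out task for the forgetting check.
    """
    mixture: List[Tuple[str, str]] = []
    held_out: List[str] = []
    for task, examples in train_sets.items():
        if examples is None:
            held_out.append(task)  # evaluation only, never seen in tuning
            continue
        # Keep the task name with each example so batches can be inspected later.
        mixture.extend((task, ex) for ex in examples)
    # Shuffling interleaves tasks, so one run sees all of them at once
    # instead of tuning on each task in sequence.
    random.Random(seed).shuffle(mixture)
    return mixture, sorted(held_out)


# Toy stand-ins for real task data.
train_sets = {
    "anli_r1": ["anli example 1", "anli example 2"],
    "boolq": ["boolq example 1"],
    "lambada": None,
    "pubmedqa": None,
}
mixture, held_out = build_multitask_mixture(train_sets)
print(len(mixture), held_out)  # 3 ['lambada', 'pubmedqa']
```

Under this scheme each task contributes in proportion to the size of its train set, so a task with a large train set shows up far more often than one with a few hundred examples; capping the number of examples per task is one common way to rebalance that.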

Zero-shot

Task Metric 2.7B (untuned) Tuned

anli_r1 acc 0.332 ± 0.015 0.418 ± 0.015

anli_r2 acc 0.342 ± 0.015 0.375 ± 0.015

anli_r3 acc 0.352 ± 0.014 0.392 ± 0.014

arc_challenge acc 0.275 ± 0.013 0.286 ± 0.013

acc_norm 0.301 ± 0.013 0.312 ± 0.013

arc_easy acc 0.611 ± 0.010 0.560 ± 0.010

acc_norm 0.539 ± 0.010 0.558 ± 0.010

boolq acc 0.630 ± 0.008 0.605 ± 0.008

cb acc 0.304 ± 0.062 0.411 ± 0.062

copa acc 0.800 ± 0.040 0.730 ± 0.040

ethics_cm acc 0.510 ± 0.008 0.561 ± 0.008

ethics_deontology acc 0.497 ± 0.008 0.658 ± 0.008

ethics_justice acc 0.501 ± 0.010 0.589 ± 0.010

ethics_utilitarianism acc 0.497 ± 0.007 0.498 ± 0.007

ethics_virtue acc 0.251 ± 0.006 0.800 ± 0.006

headqa acc 0.235 ± 0.008 0.233 ± 0.008

acc_norm 0.272 ± 0.008 0.265 ± 0.008

hellaswag acc 0.427 ± 0.005 0.400 ± 0.005

acc_norm 0.558 ± 0.005 0.517 ± 0.005

hendrycksTest-abstract_algebra acc 0.230 ± 0.042 0.340 ± 0.042

acc_norm 0.200 ± 0.040 0.350 ± 0.040

hendrycksTest-anatomy acc 0.252 ± 0.037 0.267 ± 0.037

acc_norm 0.222 ± 0.036 0.252 ± 0.036

hendrycksTest-astronomy acc 0.250 ± 0.035 0.309 ± 0.035

acc_norm 0.362 ± 0.039 0.309 ± 0.039

hendrycksTest-business_ethics acc 0.360 ± 0.048 0.340 ± 0.048

acc_norm 0.280 ± 0.045 0.310 ± 0.045

hendrycksTest-clinical_knowledge acc 0.291 ± 0.028 0.370 ± 0.028

acc_norm 0.287 ± 0.028 0.374 ± 0.028

hendrycksTest-college_biology acc 0.250 ± 0.036 0.250 ± 0.036

acc_norm 0.222 ± 0.035 0.271 ± 0.035

hendrycksTest-college_chemistry acc 0.230 ± 0.042 0.350 ± 0.042

acc_norm 0.250 ± 0.044 0.350 ± 0.044

hendrycksTest-college_computer_science acc 0.280 ± 0.045 0.430 ± 0.045

acc_norm 0.270 ± 0.045 0.390 ± 0.045

hendrycksTest-college_mathematics acc 0.200 ± 0.040 0.370 ± 0.040

acc_norm 0.300 ± 0.046 0.350 ± 0.046

hendrycksTest-college_medicine acc 0.254 ± 0.033 0.312 ± 0.033

acc_norm 0.260 ± 0.033 0.306 ± 0.033

hendrycksTest-college_physics acc 0.225 ± 0.042 0.275 ± 0.042

acc_norm 0.245 ± 0.043 0.284 ± 0.043

hendrycksTest-computer_security acc 0.270 ± 0.045 0.290 ± 0.045

acc_norm 0.330 ± 0.047 0.290 ± 0.047

hendrycksTest-conceptual_physics acc 0.247 ± 0.028 0.315 ± 0.028

acc_norm 0.187 ± 0.026 0.319 ± 0.026

hendrycksTest-econometrics acc 0.193 ± 0.037 0.272 ± 0.037

acc_norm 0.228 ± 0.039 0.281 ± 0.039

hendrycksTest-electrical_engineering acc 0.331 ± 0.039 0.386 ± 0.039

acc_norm 0.338 ± 0.039 0.386 ± 0.039

hendrycksTest-elementary_mathematics acc 0.230 ± 0.022 0.280 ± 0.022

acc_norm 0.270 ± 0.023 0.278 ± 0.023

hendrycksTest-formal_logic acc 0.333 ± 0.042 0.310 ± 0.042

acc_norm 0.302 ± 0.041 0.278 ± 0.041

hendrycksTest-global_facts acc 0.240 ± 0.043 0.250 ± 0.043

acc_norm 0.240 ± 0.043 0.260 ± 0.043

hendrycksTest-high_school_biology acc 0.219 ± 0.024 0.335 ± 0.024

acc_norm 0.284 ± 0.026 0.329 ± 0.026

hendrycksTest-high_school_chemistry acc 0.167 ± 0.026 0.207 ± 0.026

acc_norm 0.256 ± 0.031 0.212 ± 0.031

hendrycksTest-high_school_computer_science acc 0.220 ± 0.042 0.290 ± 0.042

acc_norm 0.280 ± 0.045 0.280 ± 0.045

hendrycksTest-high_school_european_history acc 0.267 ± 0.035 0.358 ± 0.035

acc_norm 0.285 ± 0.035 0.358 ± 0.035

hendrycksTest-high_school_geography acc 0.227 ± 0.030 0.359 ± 0.030

acc_norm 0.298 ± 0.033 0.333 ± 0.033

hendrycksTest-high_school_government_and_politics acc 0.207 ± 0.029 0.301 ± 0.029

acc_norm 0.259 ± 0.032 0.311 ± 0.032

hendrycksTest-high_school_macroeconomics acc 0.262 ± 0.022 0.267 ± 0.022

acc_norm 0.267 ± 0.022 0.262 ± 0.022

hendrycksTest-high_school_mathematics acc 0.174 ± 0.023 0.248 ± 0.023

acc_norm 0.244 ± 0.026 0.270 ± 0.026

hendrycksTest-high_school_microeconomics acc 0.256 ± 0.028 0.265 ± 0.028

acc_norm 0.328 ± 0.030 0.277 ± 0.030

hendrycksTest-high_school_physics acc 0.225 ± 0.034 0.212 ± 0.034

acc_norm 0.219 ± 0.034 0.225 ± 0.034

hendrycksTest-high_school_psychology acc 0.253 ± 0.019 0.338 ± 0.019

acc_norm 0.261 ± 0.019 0.330 ± 0.019

hendrycksTest-high_school_statistics acc 0.264 ± 0.030 0.278 ± 0.030

acc_norm 0.338 ± 0.032 0.273 ± 0.032

hendrycksTest-high_school_us_history acc 0.235 ± 0.030 0.230 ± 0.030

acc_norm 0.270 ± 0.031 0.235 ± 0.031

hendrycksTest-high_school_world_history acc 0.270 ± 0.029 0.388 ± 0.029

acc_norm 0.300 ± 0.030 0.392 ± 0.030

hendrycksTest-human_aging acc 0.296 ± 0.031 0.318 ± 0.031

acc_norm 0.238 ± 0.029 0.314 ± 0.029

hendrycksTest-human_sexuality acc 0.336 ± 0.041 0.290 ± 0.041

acc_norm 0.290 ± 0.040 0.290 ± 0.040

hendrycksTest-international_law acc 0.248 ± 0.039 0.322 ± 0.039

acc_norm 0.496 ± 0.046 0.347 ± 0.046

hendrycksTest-jurisprudence acc 0.250 ± 0.042 0.269 ± 0.042

acc_norm 0.426 ± 0.048 0.296 ± 0.048

hendrycksTest-logical_fallacies acc 0.209 ± 0.032 0.258 ± 0.032

acc_norm 0.288 ± 0.036 0.264 ± 0.036

hendrycksTest-machine_learning acc 0.295 ± 0.043 0.250 ± 0.043

acc_norm 0.259 ± 0.042 0.259 ± 0.042

hendrycksTest-management acc 0.184 ± 0.038 0.311 ± 0.038

acc_norm 0.282 ± 0.045 0.330 ± 0.045

hendrycksTest-marketing acc 0.316 ± 0.030 0.432 ± 0.030

acc_norm 0.338 ± 0.031 0.440 ± 0.031

hendrycksTest-medical_genetics acc 0.300 ± 0.046 0.240 ± 0.046

acc_norm 0.370 ± 0.049 0.270 ± 0.049

hendrycksTest-miscellaneous acc 0.281 ± 0.016 0.323 ± 0.016

acc_norm 0.271 ± 0.016 0.328 ± 0.016

hendrycksTest-moral_disputes acc 0.286 ± 0.024 0.350 ± 0.024

acc_norm 0.355 ± 0.026 0.364 ± 0.026

hendrycksTest-moral_scenarios acc 0.234 ± 0.014 0.264 ± 0.014

acc_norm 0.273 ± 0.015 0.269 ± 0.015

hendrycksTest-nutrition acc 0.275 ± 0.026 0.307 ± 0.026

acc_norm 0.359 ± 0.027 0.333 ± 0.027

hendrycksTest-philosophy acc 0.270 ± 0.025 0.305 ± 0.025

acc_norm 0.315 ± 0.026 0.322 ± 0.026

hendrycksTest-prehistory acc 0.256 ± 0.024 0.361 ± 0.024

acc_norm 0.216 ± 0.023 0.364 ± 0.023

hendrycksTest-professional_accounting acc 0.248 ± 0.026 0.230 ± 0.026

acc_norm 0.259 ± 0.026 0.220 ± 0.026

hendrycksTest-professional_law acc 0.267 ± 0.011 0.275 ± 0.011

acc_norm 0.300 ± 0.012 0.284 ± 0.012

hendrycksTest-professional_medicine acc 0.246 ± 0.026 0.290 ± 0.026

acc_norm 0.232 ± 0.026 0.298 ± 0.026

hendrycksTest-professional_psychology acc 0.258 ± 0.018 0.299 ± 0.018

acc_norm 0.253 ± 0.018 0.315 ± 0.018

hendrycksTest-public_relations acc 0.300 ± 0.044 0.364 ± 0.044

acc_norm 0.164 ± 0.035 0.373 ± 0.035

hendrycksTest-security_studies acc 0.339 ± 0.030 0.343 ± 0.030

acc_norm 0.286 ± 0.029 0.286 ± 0.029

hendrycksTest-sociology acc 0.269 ± 0.031 0.403 ± 0.031

acc_norm 0.264 ± 0.031 0.423 ± 0.031

hendrycksTest-us_foreign_policy acc 0.330 ± 0.047 0.390 ± 0.047

acc_norm 0.350 ± 0.048 0.390 ± 0.048

hendrycksTest-virology acc 0.313 ± 0.036 0.325 ± 0.036

acc_norm 0.331 ± 0.037 0.343 ± 0.037

hendrycksTest-world_religions acc 0.304 ± 0.035 0.316 ± 0.035

acc_norm 0.386 ± 0.037 0.339 ± 0.037

logiqa acc 0.201 ± 0.016 0.280 ± 0.016

acc_norm 0.281 ± 0.018 0.283 ± 0.018

mathqa acc 0.247 ± 0.008 0.248 ± 0.008

acc_norm 0.246 ± 0.008 0.239 ± 0.008

mnli acc 0.339 ± 0.005 0.729 ± 0.005

mnli_mismatched acc 0.338 ± 0.005 0.742 ± 0.005

mrpc acc 0.684 ± 0.023 0.701 ± 0.023

f1 0.812 ± 0.016 0.820 ± 0.016

multirc acc 0.016 ± 0.004 0.004 ± 0.004

openbookqa acc 0.234 ± 0.019 0.248 ± 0.019

acc_norm 0.332 ± 0.021 0.318 ± 0.021

piqa acc 0.721 ± 0.010 0.713 ± 0.010

acc_norm 0.729 ± 0.010 0.708 ± 0.010

qnli acc 0.509 ± 0.007 0.761 ± 0.007

qqp acc 0.368 ± 0.002 0.843 ± 0.002

f1 0.538 ± 0.003 0.789 ± 0.003

race acc 0.353 ± 0.015 0.362 ± 0.015

record f1 0.845 ± 0.004 0.779 ± 0.004

em 0.838 ± 0.004 0.770 ± 0.004

rte acc 0.520 ± 0.030 0.729 ± 0.030

sciq acc 0.893 ± 0.010 0.919 ± 0.010

acc_norm 0.828 ± 0.012 0.913 ± 0.012

sst acc 0.789 ± 0.014 0.862 ± 0.014

webqs acc 0.016 ± 0.003 0.071 ± 0.003

wic acc 0.500 ± 0.020 0.517 ± 0.020

winogrande acc 0.575 ± 0.014 0.570 ± 0.014

wnli acc 0.310 ± 0.055 0.563 ± 0.055

wsc acc 0.365 ± 0.047 0.365 ± 0.047

lambada ppl 5.626 ± 0.139 27.796 ± 0.139

acc 0.622 ± 0.007 0.387 ± 0.007

pubmedqa acc 0.565 ± 0.016 0.496 ± 0.016

coqa f1 0.604 ± 0.018 0.598 ± 0.018

em 0.479 ± 0.020 0.480 ± 0.020

drop em 0.026 ± 0.002 0.001 ± 0.002

f1 0.083 ± 0.002 0.033 ± 0.002

math_algebra acc 0.008 ± 0.003 0.025 ± 0.003

math_geometry acc 0.002 ± 0.002 0.021 ± 0.002

math_intermediate_algebra acc 0.004 ± 0.002 0.025 ± 0.002

math_num_theory acc 0.019 ± 0.006 0.046 ± 0.006

math_prealgebra acc 0.001 ± 0.001 0.039 ± 0.001

math_precalc acc 0.005 ± 0.003 0.016 ± 0.003
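The ± figures are standard errors. For an accuracy p measured on n independent examples, the usual binomial estimate is sqrt(p(1 - p)/n); assuming the anli_r1 row was scored on the 1,000-example round-1 test set, that gives about 0.015, matching the table. A rough way to separate real changes from noise is a z-score on the difference between the two columns, sketched below. The helpers `parse_row`, `binomial_stderr`, and `diff_z_score` are hypothetical, and the sketch reuses the single ± value printed in each row for both columns. It also treats the two columns as independent samples even though both models are scored on the same examples, so it is only approximate; a paired test such as McNemar's would need per-example predictions, which the tables don't include.

```python
import math
from typing import List, Tuple


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of an accuracy measured on n independent examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


def diff_z_score(acc_a: float, se_a: float, acc_b: float, se_b: float) -> float:
    """z-score of (acc_b - acc_a), treating the two estimates as independent."""
    return (acc_b - acc_a) / math.sqrt(se_a ** 2 + se_b ** 2)


def parse_row(line: str) -> Tuple[List[str], float, float, float, float]:
    """Split a 'task metric a ± se b ± se' row; the task name may be omitted."""
    *label, a, se_a, b, se_b = line.replace("±", " ").split()
    return label, float(a), float(se_a), float(b), float(se_b)


# Assumes anli_r1 was scored on 1,000 examples.
print(round(binomial_stderr(0.332, 1000), 3))  # 0.015

for row in [
    "anli_r1 acc 0.332 ± 0.015 0.418 ± 0.015",
    "copa acc 0.800 ± 0.040 0.730 ± 0.040",
]:
    label, a, se_a, b, se_b = parse_row(row)
    z = diff_z_score(a, se_a, b, se_b)
    verdict = "beyond noise" if abs(z) > 1.96 else "within noise"
    print(label[0], round(z, 2), verdict)
# prints:
# anli_r1 4.05 beyond noise
# copa -1.24 within noise
```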

One-shot

Task Metric 2.7B (untuned) Tuned

anli_r1 acc 0.331 ± 0.015 0.443 ± 0.015

anli_r2 acc 0.307 ± 0.015 0.373 ± 0.015

anli_r3 acc 0.343 ± 0.014 0.423 ± 0.014

arc_challenge acc 0.302 ± 0.013 0.292 ± 0.013

acc_norm 0.323 ± 0.014 0.323 ± 0.014

arc_easy acc 0.634 ± 0.010 0.567 ± 0.010

acc_norm 0.622 ± 0.010 0.562 ± 0.010

boolq acc 0.536 ± 0.009 0.620 ± 0.009

cb acc 0.429 ± 0.067 0.411 ± 0.067

cola mcc 0.001 ± 0.031 0.022 ± 0.031

copa acc 0.770 ± 0.042 0.780 ± 0.042

ethics_cm acc 0.508 ± 0.008 0.625 ± 0.008

ethics_deontology acc 0.511 ± 0.008 0.683 ± 0.008

ethics_justice acc 0.515 ± 0.010 0.604 ± 0.010

ethics_utilitarianism acc 0.490 ± 0.007 0.536 ± 0.007

ethics_virtue acc 0.726 ± 0.006 0.805 ± 0.006

headqa acc 0.230 ± 0.008 0.228 ± 0.008

acc_norm 0.270 ± 0.008 0.275 ± 0.008

hellaswag acc 0.428 ± 0.005 0.386 ± 0.005

acc_norm 0.557 ± 0.005 0.494 ± 0.005

hendrycksTest-abstract_algebra acc 0.220 ± 0.042 0.270 ± 0.042

acc_norm 0.290 ± 0.046 0.260 ± 0.046

hendrycksTest-anatomy acc 0.289 ± 0.039 0.304 ± 0.039

acc_norm 0.230 ± 0.036 0.289 ± 0.036

hendrycksTest-astronomy acc 0.204 ± 0.033 0.322 ± 0.033

acc_norm 0.303 ± 0.037 0.322 ± 0.037

hendrycksTest-business_ethics acc 0.290 ± 0.046 0.320 ± 0.046

acc_norm 0.280 ± 0.045 0.280 ± 0.045

hendrycksTest-clinical_knowledge acc 0.287 ± 0.028 0.351 ± 0.028

acc_norm 0.328 ± 0.029 0.358 ± 0.029

hendrycksTest-college_biology acc 0.215 ± 0.034 0.271 ± 0.034

acc_norm 0.194 ± 0.033 0.271 ± 0.033

hendrycksTest-college_chemistry acc 0.300 ± 0.046 0.330 ± 0.046

acc_norm 0.340 ± 0.048 0.320 ± 0.048

hendrycksTest-college_computer_science acc 0.330 ± 0.047 0.390 ± 0.047

acc_norm 0.310 ± 0.046 0.360 ± 0.046

hendrycksTest-college_mathematics acc 0.200 ± 0.040 0.280 ± 0.040

acc_norm 0.220 ± 0.042 0.270 ± 0.042

hendrycksTest-college_medicine acc 0.254 ± 0.033 0.295 ± 0.033

acc_norm 0.260 ± 0.033 0.283 ± 0.033

hendrycksTest-college_physics acc 0.304 ± 0.046 0.284 ± 0.046

acc_norm 0.333 ± 0.047 0.304 ± 0.047

hendrycksTest-computer_security acc 0.320 ± 0.047 0.270 ± 0.047

acc_norm 0.320 ± 0.047 0.290 ± 0.047

hendrycksTest-conceptual_physics acc 0.268 ± 0.029 0.349 ± 0.029

acc_norm 0.255 ± 0.029 0.345 ± 0.029

hendrycksTest-econometrics acc 0.298 ± 0.043 0.272 ± 0.043

acc_norm 0.298 ± 0.043 0.263 ± 0.043

hendrycksTest-electrical_engineering acc 0.338 ± 0.039 0.324 ± 0.039

acc_norm 0.290 ± 0.038 0.303 ± 0.038

hendrycksTest-elementary_mathematics acc 0.262 ± 0.023 0.275 ± 0.023

acc_norm 0.294 ± 0.023 0.275 ± 0.023

hendrycksTest-formal_logic acc 0.310 ± 0.041 0.310 ± 0.041

acc_norm 0.294 ± 0.041 0.270 ± 0.041

hendrycksTest-global_facts acc 0.200 ± 0.040 0.290 ± 0.040

acc_norm 0.210 ± 0.041 0.290 ± 0.041

hendrycksTest-high_school_biology acc 0.265 ± 0.025 0.342 ± 0.025

acc_norm 0.287 ± 0.026 0.342 ± 0.026

hendrycksTest-high_school_chemistry acc 0.251 ± 0.031 0.232 ± 0.031

acc_norm 0.291 ± 0.032 0.227 ± 0.032

hendrycksTest-high_school_computer_science acc 0.260 ± 0.044 0.280 ± 0.044

acc_norm 0.300 ± 0.046 0.260 ± 0.046

hendrycksTest-high_school_european_history acc 0.267 ± 0.035 0.309 ± 0.035

acc_norm 0.315 ± 0.036 0.321 ± 0.036

hendrycksTest-high_school_geography acc 0.227 ± 0.030 0.348 ± 0.030

acc_norm 0.278 ± 0.032 0.354 ± 0.032

hendrycksTest-high_school_government_and_politics acc 0.290 ± 0.033 0.332 ± 0.033

acc_norm 0.290 ± 0.033 0.321 ± 0.033

hendrycksTest-high_school_macroeconomics acc 0.279 ± 0.023 0.305 ± 0.023

acc_norm 0.267 ± 0.022 0.285 ± 0.022

hendrycksTest-high_school_mathematics acc 0.252 ± 0.026 0.278 ± 0.026

acc_norm 0.296 ± 0.028 0.304 ± 0.028

hendrycksTest-high_school_microeconomics acc 0.265 ± 0.029 0.256 ± 0.029

acc_norm 0.324 ± 0.030 0.273 ± 0.030

hendrycksTest-high_school_physics acc 0.205 ± 0.033 0.205 ± 0.033

acc_norm 0.232 ± 0.034 0.212 ± 0.034

hendrycksTest-high_school_psychology acc 0.251 ± 0.019 0.328 ± 0.019

acc_norm 0.270 ± 0.019 0.325 ± 0.019

hendrycksTest-high_school_statistics acc 0.319 ± 0.032 0.241 ± 0.032

acc_norm 0.319 ± 0.032 0.245 ± 0.032

hendrycksTest-high_school_us_history acc 0.265 ± 0.031 0.221 ± 0.031

acc_norm 0.260 ± 0.031 0.230 ± 0.031

hendrycksTest-high_school_world_history acc 0.283 ± 0.029 0.371 ± 0.029

acc_norm 0.266 ± 0.029 0.380 ± 0.029

hendrycksTest-human_aging acc 0.296 ± 0.031 0.296 ± 0.031

acc_norm 0.274 ± 0.030 0.291 ± 0.030

hendrycksTest-human_sexuality acc 0.351 ± 0.042 0.290 ± 0.042

acc_norm 0.282 ± 0.039 0.290 ± 0.039

hendrycksTest-international_law acc 0.248 ± 0.039 0.322 ± 0.039

acc_norm 0.347 ± 0.043 0.331 ± 0.043

hendrycksTest-jurisprudence acc 0.269 ± 0.043 0.296 ± 0.043

acc_norm 0.370 ± 0.047 0.296 ± 0.047

hendrycksTest-logical_fallacies acc 0.202 ± 0.032 0.276 ± 0.032

acc_norm 0.270 ± 0.035 0.258 ± 0.035

hendrycksTest-machine_learning acc 0.295 ± 0.043 0.250 ± 0.043

acc_norm 0.330 ± 0.045 0.223 ± 0.045

hendrycksTest-management acc 0.282 ± 0.045 0.320 ± 0.045

acc_norm 0.272 ± 0.044 0.350 ± 0.044

hendrycksTest-marketing acc 0.303 ± 0.030 0.415 ± 0.030

acc_norm 0.329 ± 0.031 0.423 ± 0.031

hendrycksTest-medical_genetics acc 0.330 ± 0.047 0.300 ± 0.047

acc_norm 0.420 ± 0.050 0.300 ± 0.050

hendrycksTest-miscellaneous acc 0.319 ± 0.017 0.318 ± 0.017

acc_norm 0.319 ± 0.017 0.313 ± 0.017

hendrycksTest-moral_disputes acc 0.298 ± 0.025 0.341 ± 0.025

acc_norm 0.318 ± 0.025 0.344 ± 0.025

hendrycksTest-moral_scenarios acc 0.267 ± 0.015 0.240 ± 0.015

acc_norm 0.265 ± 0.015 0.238 ± 0.015

hendrycksTest-nutrition acc 0.278 ± 0.026 0.330 ± 0.026

acc_norm 0.337 ± 0.027 0.350 ± 0.027

hendrycksTest-philosophy acc 0.251 ± 0.025 0.315 ± 0.025

acc_norm 0.293 ± 0.026 0.325 ± 0.026

hendrycksTest-prehistory acc 0.244 ± 0.024 0.352 ± 0.024

acc_norm 0.250 ± 0.024 0.361 ± 0.024

hendrycksTest-professional_accounting acc 0.287 ± 0.027 0.213 ± 0.027

acc_norm 0.248 ± 0.026 0.216 ± 0.026

hendrycksTest-professional_law acc 0.273 ± 0.011 0.267 ± 0.011

acc_norm 0.269 ± 0.011 0.269 ± 0.011

hendrycksTest-professional_medicine acc 0.301 ± 0.028 0.301 ± 0.028

acc_norm 0.268 ± 0.027 0.327 ± 0.027

hendrycksTest-professional_psychology acc 0.279 ± 0.018 0.304 ± 0.018

acc_norm 0.284 ± 0.018 0.310 ± 0.018

hendrycksTest-public_relations acc 0.327 ± 0.045 0.345 ± 0.045

acc_norm 0.309 ± 0.044 0.336 ± 0.044

hendrycksTest-security_studies acc 0.265 ± 0.028 0.331 ± 0.028

acc_norm 0.208 ± 0.026 0.290 ± 0.026

hendrycksTest-sociology acc 0.269 ± 0.031 0.393 ± 0.031

acc_norm 0.249 ± 0.031 0.383 ± 0.031

hendrycksTest-us_foreign_policy acc 0.290 ± 0.046 0.320 ± 0.046

acc_norm 0.320 ± 0.047 0.320 ± 0.047

hendrycksTest-virology acc 0.289 ± 0.035 0.349 ± 0.035

acc_norm 0.265 ± 0.034 0.355 ± 0.034

hendrycksTest-world_religions acc 0.374 ± 0.037 0.345 ± 0.037

acc_norm 0.409 ± 0.038 0.351 ± 0.038

logiqa acc 0.255 ± 0.017 0.273 ± 0.017

acc_norm 0.272 ± 0.017 0.280 ± 0.017

mathqa acc 0.256 ± 0.008 0.253 ± 0.008

acc_norm 0.258 ± 0.008 0.240 ± 0.008

mnli acc 0.338 ± 0.005 0.801 ± 0.005

mnli_mismatched acc 0.362 ± 0.005 0.811 ± 0.005

mrpc acc 0.571 ± 0.025 0.750 ± 0.025

f1 0.689 ± 0.022 0.841 ± 0.022

multirc acc 0.047 ± 0.007 0.012 ± 0.007

openbookqa acc 0.222 ± 0.019 0.268 ± 0.019

acc_norm 0.346 ± 0.021 0.344 ± 0.021

piqa acc 0.726 ± 0.010 0.714 ± 0.010

acc_norm 0.736 ± 0.010 0.718 ± 0.010

qnli acc 0.504 ± 0.007 0.788 ± 0.007

qqp acc 0.534 ± 0.002 0.847 ± 0.002

f1 0.372 ± 0.004 0.793 ± 0.004

race acc 0.352 ± 0.015 0.355 ± 0.015

record f1 0.843 ± 0.004 0.778 ± 0.004

em 0.835 ± 0.004 0.771 ± 0.004

rte acc 0.491 ± 0.030 0.747 ± 0.030

sciq acc 0.930 ± 0.008 0.939 ± 0.008

acc_norm 0.938 ± 0.008 0.935 ± 0.008

sst acc 0.492 ± 0.017 0.916 ± 0.017

webqs acc 0.054 ± 0.005 0.095 ± 0.005

wic acc 0.472 ± 0.020 0.539 ± 0.020

winogrande acc 0.582 ± 0.014 0.571 ± 0.014

wnli acc 0.380 ± 0.058 0.549 ± 0.058

wsc acc 0.365 ± 0.047 0.365 ± 0.047

lambada ppl 6.423 ± 0.162 20.150 ± 0.162

acc 0.576 ± 0.007 0.394 ± 0.007

pubmedqa acc 0.529 ± 0.016 0.479 ± 0.016

coqa f1 0.606 ± 0.018 0.581 ± 0.018

em 0.484 ± 0.020 0.472 ± 0.020

drop em 0.001 ± 0.000 0.001 ± 0.000

f1 0.039 ± 0.001 0.031 ± 0.001

math_algebra acc 0.016 ± 0.004 0.024 ± 0.004

math_counting_and_prob acc 0.023 ± 0.007 0.030 ± 0.007

math_geometry acc 0.006 ± 0.004 0.021 ± 0.004

math_intermediate_algebra acc 0.020 ± 0.005 0.029 ± 0.005

math_num_theory acc 0.037 ± 0.008 0.039 ± 0.008

math_prealgebra acc 0.023 ± 0.005 0.041 ± 0.005

math_precalc acc 0.015 ± 0.005 0.022 ± 0.005
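To put a number on the "tossup" impression, one could tally result rows instead of eyeballing them. The `tally` function below is a hypothetical sketch: it counts rows rather than tasks (so the many hendrycksTest subjects dominate), assumes ppl is the only lower-is-better metric, and optionally counts any difference within `min_z` combined standard errors as a tie. It expects rows in the same plain-text layout as the tables above, including continuation rows that omit the task name.

```python
import math
from collections import Counter
from typing import Iterable

# Assumption: perplexity is the only lower-is-better metric in these tables.
LOWER_IS_BETTER = {"ppl"}


def tally(rows: Iterable[str], min_z: float = 0.0) -> Counter:
    """Count result rows won by the baseline ("base"), the tuned model ("tuned"), or tied.

    A difference of at most min_z combined standard errors counts as a tie.
    """
    counts: Counter = Counter()
    for line in rows:
        parts = line.replace("±", " ").split()
        if len(parts) < 5:
            continue  # headings, column headers, blank lines
        try:
            base, se_base, tuned, se_tuned = map(float, parts[-4:])
        except ValueError:
            continue  # not a result row
        metric = parts[-5]  # works for both full rows and continuation rows
        if metric in LOWER_IS_BETTER:
            base, tuned = -base, -tuned  # flip so that larger is always better
        diff = tuned - base
        spread = math.sqrt(se_base ** 2 + se_tuned ** 2)
        if diff == 0 or (spread > 0 and abs(diff) / spread <= min_z):
            counts["tie"] += 1
        elif diff > 0:
            counts["tuned"] += 1
        else:
            counts["base"] += 1
    return counts


rows = [
    "anli_r1 acc 0.332 ± 0.015 0.418 ± 0.015",
    "copa acc 0.800 ± 0.040 0.730 ± 0.040",
    "lambada ppl 5.626 ± 0.139 27.796 ± 0.139",
    "acc 0.622 ± 0.007 0.387 ± 0.007",
    "wsc acc 0.365 ± 0.047 0.365 ± 0.047",
]
print(tally(rows))              # raw eyeball count
print(tally(rows, min_z=1.96))  # ignore differences within ~2 standard errors
```

Run over every row of a table, `min_z=0` reproduces the raw eyeball comparison, while `min_z=1.96` discards differences smaller than about two standard errors, which is where many of the hendrycksTest rows land.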

The model can be downloaded here, though I don't recommend using it for anything.
