
[D] Make. Big. Batch. Size.

Reddit r/MachineLearning · by /u/Lines25 (https://www.reddit.com/user/Lines25) · April 2, 2026

This is somewhere between a vent and a lesson learned. I tried training an RWKV v6 model with my own code on my RTX 4050. I trained for over 50k steps with batch_size=2 and gradient_accumulation=4 (effective_batch=2*4=8). It reached 50 PPL (RWKV v6, ~192.8M model) and just wouldn't go any lower. I changed the lr, the time_decay lr (RWKV's attention replacement), etc., but it only got worse or didn't change anything at all... and then... I just tried setting gradient_accumulation to 32. After one "epoch" (these are pseudo-epochs in my code, each equal to 10k steps) it got to 40 PPL... Then I tried changing it to 64 and ran 3 epochs. My PPL dropped all the way to a freaking 20 PPL. I trained this model for over 4 FULL DAYS non-stop, and only when I did all that, after like 2-3 hours of training with effective_batch=64 (and 128), I got PPL dro

Could not retrieve the full article text.
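
For anyone unsure what the effective_batch arithmetic in the post means in practice, here is a minimal sketch in plain Python. It is not the poster's training code and has nothing to do with RWKV: the one-parameter model `y = w * x`, the toy data, and the helper names `grad_mean_sq_error` and `accumulated_grad` are all made up for illustration. It only shows that, with equal-sized micro-batches, accumulating scaled gradients over `accum_steps` micro-batches gives the same gradient as one batch of size `batch_size * accum_steps`.

```python
# Hypothetical toy setup: one-parameter model y = w * x with a
# mean squared error loss. Not the poster's code, not RWKV.

def grad_mean_sq_error(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, batch_size, accum_steps):
    """Sum micro-batch gradients, each scaled by 1/accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        lo = step * batch_size
        hi = lo + batch_size
        # Dividing by accum_steps turns the sum of micro-batch means
        # into the mean over the whole effective batch.
        total += grad_mean_sq_error(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


batch_size = 2        # micro-batch size from the post
accum_steps = 4       # gradient_accumulation from the post
effective_batch = batch_size * accum_steps  # 2 * 4 = 8

# Made-up data: 8 samples, enough for exactly one optimizer step.
xs = [float(i) for i in range(1, effective_batch + 1)]
ys = [3.0 * x + 0.5 for x in xs]
w = 1.0

g_accum = accumulated_grad(w, xs, ys, batch_size, accum_steps)
g_full = grad_mean_sq_error(w, xs, ys)

print("effective batch:", effective_batch)
print("gradients match:", abs(g_accum - g_full) < 1e-9)
```

Raising `accum_steps` to 32 or 64 with the same `batch_size=2` gives the effective batches of 64 and 128 mentioned in the post. Each optimizer step then averages over more samples, at the cost of more forward/backward passes per step.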
