[D] Make. Big. Batch. Size.
It's something between a vent and a lesson learned. I tried training an RWKV v6 model with my own code on my RTX 4050. I trained for over 50k steps with batch_size=2 and gradient_accumulation=4 (effective_batch=2*4=8). It got down to 50 PPL (RWKV v6, ~192.8M parameters) and just wouldn't go any lower. I changed the lr, the time_decay lr (RWKV's replacement for attention), etc., but it either got worse or didn't change at all... and then...

I just tried setting gradient_accumulation to 32. After one "epoch" (my code uses pseudo-epochs, each equal to 10k steps) it got to 40 PPL... Then I changed it to 64 and trained for 3 epochs. My PPL dropped to a freaking 20 PPL. I had trained this model for over 4 FULL DAYS non-stop, and only when I did all that stuff, after like 2-3 hours of training with effective_batch=64 (and 128), I got PPL dro
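The post's arithmetic (effective_batch = batch_size * gradient_accumulation) and its PPL numbers can be made concrete with a small sketch. This is a pure-Python toy, not the author's RWKV training code: it assumes a hypothetical one-parameter linear model with mean-squared-error loss, used only to show that summing the gradients of `gradient_accumulation` micro-batches of size `batch_size`, each scaled by `1 / gradient_accumulation`, gives the same gradient as one batch of size `batch_size * gradient_accumulation`. The helper names (`grad_mse`, `effective_batch`, `accumulated_grad`, `perplexity`) are made up for illustration.

```python
import math

# Toy stand-in for a real model (hypothetical, for illustration only):
# a one-parameter linear model y_hat = w * x trained with mean squared error.


def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def effective_batch(batch_size, gradient_accumulation):
    """Samples contributing to one optimizer step, e.g. 2 * 4 = 8."""
    return batch_size * gradient_accumulation


def accumulated_grad(w, xs, ys, batch_size, gradient_accumulation):
    """Gradient accumulation: sum the micro-batch gradients, each scaled by
    1 / gradient_accumulation (the usual `loss / accum_steps` trick),
    before a single optimizer step would be taken."""
    needed = effective_batch(batch_size, gradient_accumulation)
    if len(xs) < needed:
        raise ValueError(f"need {needed} samples, got {len(xs)}")
    total = 0.0
    for step in range(gradient_accumulation):
        lo = step * batch_size
        hi = lo + batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / gradient_accumulation
    return total


def perplexity(mean_ce_loss):
    """Perplexity = exp(mean per-token cross-entropy measured in nats)."""
    return math.exp(mean_ce_loss)


if __name__ == "__main__":
    # Deterministic toy data: 128 samples, enough for the 2 * 64 setting.
    xs = [(i % 17) / 17.0 - 0.5 for i in range(128)]
    ys = [3.0 * x + 0.1 for x in xs]
    w = 0.5

    # The three settings from the post: accumulation 4, 32 and 64 at batch_size=2.
    for accum in (4, 32, 64):
        n = effective_batch(2, accum)
        acc = accumulated_grad(w, xs, ys, 2, accum)
        ref = grad_mse(w, xs[:n], ys[:n])  # one real batch of size n
        print(f"effective_batch={n:3d}  accumulated={acc:.6f}  single_batch={ref:.6f}")

    # Translate the reported PPL values back into mean cross-entropy loss.
    for ppl in (50, 40, 20):
        print(f"PPL {ppl} <-> mean CE loss {math.log(ppl):.3f} nats")
```

The equivalence holds (up to floating-point rounding) only when every micro-batch has the same size and the loss is a per-sample mean; with a variable number of tokens per micro-batch, a token-weighted average would be needed. The sketch shows what a larger effective batch computes, not why it lowered PPL in this run. `perplexity()` assumes the reported loss is mean per-token cross-entropy in nats, so 50 PPL corresponds to a loss of about 3.91 and 20 PPL to about 3.00.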
Source: Reddit r/MachineLearning, https://www.reddit.com/r/MachineLearning/comments/1salupf/d_make_big_batch_size/
