[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.


Reddit r/MachineLearning, by /u/angeletti89 (https://www.reddit.com/user/angeletti89), April 5, 2026

The problem

If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an afterthought: English-first tokenizer, English-first data, maybe some Italian sprinkled in during fine-tuning. The result: bloated token counts, poor morphology handling, and models that "speak Italian" the way a tourist orders coffee in Rome. I decided to fix this from the ground up.

What is Dante-2B

A 2.1B parameter, decoder-only, dense transformer. Trained from scratch: no fine-tune of Llama, no adapter on Mistral. Random init to coherent Italian in 16 days on 2× H200 GPUs.

Architecture:

LLaMA-style with GQA (20 query heads, 4 KV heads, a 5:1 ratio)
SwiGLU FFN, RMSNorm, RoPE
d_model=2560, 28 layers, d_head=128 (optimized for Flash Attention on H200)
Weight
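To make the attention numbers in the excerpt concrete, here is a minimal sketch that checks they are mutually consistent and estimates what the 5:1 GQA ratio saves in KV-cache memory. The class and method names are hypothetical, and two things are assumptions not stated in the post: bias-free projection matrices and a 2-byte (bf16/fp16) cache dtype. It is not the author's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical container for the figures quoted in the post."""
    d_model: int = 2560
    n_layers: int = 28
    n_q_heads: int = 20
    n_kv_heads: int = 4
    d_head: int = 128

    def attn_params_per_layer(self) -> int:
        # Assumes bias-free projections (an assumption, not from the post).
        q_dim = self.n_q_heads * self.d_head    # width of W_q output and W_o input
        kv_dim = self.n_kv_heads * self.d_head  # width of W_k and W_v outputs
        return 2 * self.d_model * q_dim + 2 * self.d_model * kv_dim

    def kv_cache_bytes_per_token(self, bytes_per_value: int = 2) -> int:
        # One K and one V vector per KV head per layer.
        # bytes_per_value=2 assumes a bf16/fp16 cache (an assumption).
        return 2 * self.n_layers * self.n_kv_heads * self.d_head * bytes_per_value


gqa = AttentionConfig()
# Same shape with one KV head per query head, i.e. plain multi-head attention.
mha = AttentionConfig(n_kv_heads=20)

# The query heads exactly tile the model width: 20 * 128 == 2560.
assert gqa.n_q_heads * gqa.d_head == gqa.d_model

ratio = mha.kv_cache_bytes_per_token() // gqa.kv_cache_bytes_per_token()
print(f"attention params per layer (GQA): {gqa.attn_params_per_layer():,}")
print(f"KV cache per token (GQA): {gqa.kv_cache_bytes_per_token():,} bytes")
print(f"KV cache per token (MHA): {mha.kv_cache_bytes_per_token():,} bytes")
print(f"KV cache reduction from GQA: {ratio}x")
```

Under these assumptions the KV cache shrinks by the same factor as the head ratio, since cache size scales linearly with the number of KV heads. A full parameter count is not attempted here because the excerpt gives neither the vocabulary size nor the FFN hidden dimension.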

Original source

Reddit r/MachineLearning

https://www.reddit.com/r/MachineLearning/comments/1sdh08w/p_dante2b_im_training_a_21b_bilingual_fully_open/
