
Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.

Reddit r/LocalLLaMA, by /u/angeletti89 (https://www.reddit.com/user/angeletti89), April 5, 2026

The problem

If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an afterthought: English-first tokenizer, English-first data, maybe some Italian sprinkled in during fine-tuning. The result: bloated token counts, poor morphology handling, and models that "speak Italian" the way a tourist orders coffee in Rome.

I decided to fix this from the ground up.

What is Dante-2B

A 2.1B parameter, decoder-only, dense transformer. Trained from scratch: no fine-tune of Llama, no adapter on Mistral. Random init to coherent Italian in 16 days on 2× H200 GPUs.

Architecture:

- LLaMA-style with GQA (20 query heads, 4 KV heads, 5:1 ratio)
- SwiGLU FFN, RMSNorm, RoPE
- d_model=2560, 28 layers, d_head=128 (optimized for Flash Attention on H200)
- Weight

Could not retrieve the full article text.
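
To make the GQA numbers quoted above concrete, here is a minimal sketch of the head arithmetic. It is not the author's code: the class name `DanteConfig`, the helper names, and the bf16 cache (2 bytes per element) are assumptions for illustration, as is the contiguous query-to-KV-head grouping, which is a common convention. Only the five hyperparameters come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DanteConfig:
    """Hypothetical container for the hyperparameters quoted in the post."""
    d_model: int = 2560
    n_layers: int = 28
    n_q_heads: int = 20
    n_kv_heads: int = 4
    d_head: int = 128

    @property
    def q_per_kv(self) -> int:
        # GQA requires the query heads to split evenly across KV heads.
        assert self.n_q_heads % self.n_kv_heads == 0
        return self.n_q_heads // self.n_kv_heads

    def kv_head_for(self, q_head: int) -> int:
        # Assumed grouping: consecutive query heads share one KV head.
        return q_head // self.q_per_kv

    def kv_cache_bytes_per_token(
        self, bytes_per_elem: int = 2, kv_heads: Optional[int] = None
    ) -> int:
        # One K and one V vector per KV head, per layer, per token.
        # bytes_per_elem=2 assumes a bf16/fp16 cache.
        heads = self.n_kv_heads if kv_heads is None else kv_heads
        return 2 * self.n_layers * heads * self.d_head * bytes_per_elem


cfg = DanteConfig()

# 20 query heads of width 128 tile d_model exactly.
assert cfg.n_q_heads * cfg.d_head == cfg.d_model

gqa_bytes = cfg.kv_cache_bytes_per_token()
mha_bytes = cfg.kv_cache_bytes_per_token(kv_heads=cfg.n_q_heads)

print("query heads per KV head:", cfg.q_per_kv)
print("KV cache per token with GQA (bytes):", gqa_bytes)
print("KV cache per token with full MHA (bytes):", mha_bytes)
print("KV head used by each query head:",
      [cfg.kv_head_for(h) for h in range(cfg.n_q_heads)])
```

Under these assumptions, sharing 4 KV heads across 20 query heads shrinks the per-token KV cache by the same 5:1 ratio the post mentions, compared with giving every query head its own K and V.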
