
I built a self-hosted RAG system that actually works — here's how to run it in one command

DEV Community · by Francesco Marchetti · April 1, 2026 · 5 min read


I'll be honest: I spent weeks trying to make existing RAG tools work for my use case. AnythingLLM kept needing cloud APIs. RAGFlow was hard to self-host cleanly. Perplexity-style tools were completely off the table for anything with sensitive documents.

So I built my own.

RAG Enterprise is a 100% local RAG system — no data leaves your server, no external APIs, no hidden telemetry. It runs on your hardware with a single setup script. Here's how to get it running.

Why another RAG tool?

Because my clients have real constraints:

  • Legal documents that can't touch US servers (hello, GDPR)

  • IT departments that won't approve "just use OpenAI"

  • Budgets that don't include $500/month SaaS subscriptions

I needed something that runs on-prem, handles PDFs and DOCX files well, supports multiple users with proper roles, and doesn't require a PhD to install.

After building and iterating on this for a few months, it now handles 10,000+ documents comfortably, supports 29 languages, and the whole stack is containerized.

What's under the hood

The architecture is pretty standard but well-wired:

```
React Frontend (Port 3000)
         │
         │ REST API
         ▼
FastAPI Backend (Port 8000)
  • LangChain RAG pipeline
  • JWT auth + RBAC
  • Apache Tika + Tesseract OCR
  • BAAI/bge-m3 embeddings
    ┌────┴────┐
    ▼         ▼
 Qdrant     Ollama
(vectors) (LLM inference)
```


The LLM runs via Ollama locally — by default Mistral 7B Q4 or Qwen2.5:14b depending on your VRAM. Embeddings use BAAI/bge-m3 which is multilingual and genuinely good.
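The default model depends on how much VRAM you have. A rough way to express that choice in code (note: the 12 GB cutoff here is my own assumption for illustration, not necessarily the exact logic in `setup.sh`):

```python
def pick_model(vram_gb: int) -> str:
    """Pick a default Ollama model tag by available VRAM.

    Heuristic sketch: Mistral 7B Q4 for smaller GPUs,
    Qwen2.5 14B Q4 when there is more headroom.
    The 12 GB boundary is an assumption, not the script's actual rule.
    """
    if vram_gb < 12:
        return "mistral:7b-instruct-q4_K_M"
    return "qwen2.5:14b-instruct-q4_K_M"
```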

Everything is Docker containers. No dependency hell.

Prerequisites

Before you start, make sure you have:

  • Ubuntu 20.04+ (22.04 recommended)

  • NVIDIA GPU with 8-16GB VRAM, drivers installed

  • 16GB RAM minimum (32GB recommended)

  • 50GB+ free disk space

  • A decent internet connection for the initial download (~80 Mbit/s or faster)

The setup downloads Docker images, the LLM model, and the embedding model. On a fast connection it takes 15-20 minutes. On a slower one, about an hour. You do it once.

Installation

```bash
# 1. Clone the repo
git clone https://github.com/I3K-IT/RAG-Enterprise.git
cd RAG-Enterprise/rag-enterprise-structure

# 2. Run the setup script
./setup.sh standard
```


The script handles everything:

  • Docker Engine + Docker Compose

  • NVIDIA Container Toolkit

  • Ollama with your chosen LLM

  • Qdrant vector database

  • Backend + frontend services

At one point during setup it'll ask you to log out and back in (for Docker group permissions). Just do it and re-run the script — it picks up where it left off.

First startup

After setup completes, the backend downloads the embedding model on first run. This takes a few minutes. Check progress with:

```bash
docker compose logs backend -f
```

When you see `Application startup complete`, open http://localhost:3000 in your browser.

Get your admin password from the logs:

```bash
docker compose logs backend | grep "Password:"
```

Log in as `admin` with that password.

Uploading documents

The role system works like this:

  • User → can query, can't upload

  • Super User → can upload and delete documents

  • Admin → full access including user management
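The three-tier model above boils down to a permission lookup. Here is a minimal sketch of that idea (illustrative only; the actual backend enforces roles via JWT claims, and the permission names below are my own, not the project's API):

```python
# Illustrative three-tier RBAC check; permission names are hypothetical.
ROLE_PERMISSIONS = {
    "user": {"query"},
    "super_user": {"query", "upload", "delete_documents"},
    "admin": {"query", "upload", "delete_documents", "manage_users"},
}

def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```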

Log in as Admin, go to the admin panel, and create a Super User account. Then upload your documents.

Supported formats: PDF (with OCR), DOCX, PPTX, XLSX, TXT, MD, ODT, RTF, HTML, XML.

Processing takes 1-2 minutes per document. After that, you can start querying.

Querying your documents

Just type your question in plain language. The RAG pipeline:

  • Embeds your query with bge-m3

  • Searches Qdrant for semantically similar chunks

  • Passes relevant context to the LLM

  • Returns an answer grounded in your documents
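Conceptually, the retrieval step ranks chunk embeddings by cosine similarity against the query embedding and keeps everything above `RELEVANCE_THRESHOLD`. A toy sketch of that step in plain Python (stand-in vectors instead of real bge-m3 embeddings, and not the project's actual Qdrant query code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, threshold=0.35, top_k=4):
    """chunks: list of (text, embedding) pairs.

    Return up to top_k chunk texts scoring at or above the threshold,
    best match first — mirroring the RELEVANCE_THRESHOLD setting above.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_k]]
```

Lowering the threshold (as suggested later for sparse results) simply admits more, less-similar chunks into the LLM's context.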

Response time is 2-4 seconds, with generation at around 80-100 tokens/second on an RTX 4070.

Switching the LLM model

Edit docker-compose.yml:

```yaml
environment:
  LLM_MODEL: qwen2.5:14b-instruct-q4_K_M  # or mistral:7b-instruct-q4_K_M
  EMBEDDING_MODEL: BAAI/bge-m3
  RELEVANCE_THRESHOLD: "0.35"
```

Then restart the backend:

```bash
docker compose restart backend
```

If you're getting too few results, lower `RELEVANCE_THRESHOLD` to 0.3 or even 0.25.

Useful commands

```bash
# Check all services
docker compose ps

# Follow logs
docker compose logs -f

# Restart everything
docker compose restart

# Stop
docker compose down

# Health check
curl http://localhost:8000/health
```

If the backend shows "unhealthy" on first start, just wait — it's still downloading the embedding model.

What I'm working on next

The community edition uses Qdrant for vector search. The Pro version I'm building adds a hybrid SQL-Vector engine — combining traditional keyword search with semantic search for better precision on structured documents like contracts and regulatory texts. It also adds a 6-stage retrieval pipeline (query expansion → retrieval → reranking → fusion → filtering → generation).
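Structurally, a staged pipeline like that is just a chain of functions, each consuming the previous stage's output. A sketch of the shape (the stage names come from the list above; the bodies here are trivial placeholders, not the Pro implementation):

```python
# Structural sketch of a staged retrieval pipeline.
# Stage names match the article; bodies are placeholders.
def expand(query):     return [query, query.lower()]                 # query expansion
def retrieve(queries): return [f"doc for {q}" for q in queries]      # retrieval
def rerank(docs):      return sorted(docs)                           # reranking
def fuse(docs):        return list(dict.fromkeys(docs))              # fusion (dedupe)
def filter_(docs):     return [d for d in docs if d]                 # filtering
def generate(docs):    return " | ".join(docs)                       # generation

def pipeline(query: str) -> str:
    state = query
    for stage in (expand, retrieve, rerank, fuse, filter_, generate):
        state = stage(state)
    return state
```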

But for most use cases, the community edition is more than enough.

Try it, break it, contribute

The repo is at github.com/I3K-IT/RAG-Enterprise. It's AGPL-3.0 — free to use, modify, and self-host. If you offer it as a service you need to share modifications, which I think is fair.

If you're building something on top of this, or hit issues during setup, open an issue or drop a comment here. Happy to help.

And if you're interested in the EU sovereignty angle — keeping AI infrastructure inside European jurisdiction — check out EuLLM, a project I'm building in parallel: a Rust-based alternative to Ollama with an EU-hosted model registry and built-in AI Act compliance. RAG Enterprise will integrate with it natively.

Built by Francesco Marchetti @ I3K Technologies, Milan.
