Building Production RAG Systems in .NET 10: The Complete Guide to Embeddings
The Hallucination Problem
Your company spent $50K building an internal chatbot. It tells customers "yes, we ship internationally" when you only ship to the US. Your support team is drowning in corrections.
Sound familiar?
This happens because traditional LLMs generate responses from training data patterns, not your actual data. They hallucinate. They confidently state false information.
RAG (Retrieval-Augmented Generation) fixes this. Instead of hoping the LLM knows about your data, you explicitly feed it your documents first.
What Are Embeddings?
Think of embeddings as a way to convert text into mathematics.
The Simple Version
Text: "The quick brown fox" ↓ Embedding (float array, 1536 dimensions) [0.234, -0.156, 0.892, ..., 0.421] ↓ This vector captures semantic meaningText: "The quick brown fox" ↓ Embedding (float array, 1536 dimensions) [0.234, -0.156, 0.892, ..., 0.421] ↓ This vector captures semantic meaningEnter fullscreen mode
Exit fullscreen mode
Why Vectors Matter
Two sentences with different words can have similar embeddings if they mean the same thing:
Sentence A: "Our Q3 revenue exceeded $5 million" Embedding A: [0.234, -0.156, 0.892, ...]Sentence A: "Our Q3 revenue exceeded $5 million" Embedding A: [0.234, -0.156, 0.892, ...]Sentence B: "Q3 generated more than $5M in sales" Embedding B: [0.235, -0.154, 0.894, ...]
← Very similar! The model understands they mean the same thing.`
Enter fullscreen mode
Exit fullscreen mode
But this completely different sentence:
Sentence C: "I like coffee" Embedding C: [0.892, 0.234, -0.156, ...]Sentence C: "I like coffee" Embedding C: [0.892, 0.234, -0.156, ...]← Very different vector! Different meaning.`
Enter fullscreen mode
Exit fullscreen mode
This is how RAG systems find relevant documents by meaning, not just keyword matches.
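The "very similar" and "very different" judgments above are typically quantified with cosine similarity, the metric most vector databases use by default. Here's a minimal sketch in plain C# (the short vectors in the comments are toys for illustration; real embeddings have hundreds or thousands of dimensions):

```csharp
using System;

public static class Similarity
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // 1.0 = same direction (same meaning), 0.0 = unrelated, -1.0 = opposite.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}

// Identical toy vectors score 1.0; orthogonal ones score 0.0:
// Similarity.Cosine(new float[] {1, 0, 0}, new float[] {1, 0, 0})  → 1.0
// Similarity.Cosine(new float[] {1, 0, 0}, new float[] {0, 1, 0})  → 0.0
```

A search simply ranks stored document vectors by this score against the query vector.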
The RAG Pipeline in .NET 10
Step 1: Generate Embeddings from Your Documents
```csharp
// In .NET 10 with Microsoft.Extensions.AI
public class DocumentEmbedder
{
    private readonly EmbeddingsClient _embeddingClient;
    private readonly VectorStore _vectorStore;

    public DocumentEmbedder(EmbeddingsClient client, VectorStore store)
    {
        _embeddingClient = client;
        _vectorStore = store;
    }

    // Embed your documents once
    public async Task IndexDocumentsAsync(List<string> documents)
    {
        var embeddings = await _embeddingClient.GenerateAsync(documents);

        var vectors = embeddings.Value.Select((e, i) => new VectorDocument
        {
            Id = Guid.NewGuid().ToString(),
            Content = documents[i],
            Vector = e.Vector.ToArray(),
            Metadata = new { Source = "DocumentLibrary" }
        }).ToList();

        await _vectorStore.UpsertAsync(vectors);
    }
}
```
Key point: You embed documents once and store the vectors. For a given embedding model, the same document produces the same vector (barring minor floating-point variation with some providers), so you only re-embed when content changes.
Step 2: When User Asks, Search Semantically
```csharp
public class RAGResponseGenerator
{
    private readonly VectorStore _vectorStore;
    private readonly EmbeddingsClient _embeddingClient;
    private readonly ChatClient _chatClient;

    public async Task<string> AnswerAsync(string userQuestion)
    {
        // 1. Embed the question
        var queryEmbedding = await _embeddingClient
            .GenerateAsync(new[] { userQuestion });

        // 2. Search vector database for similar documents
        var relevantDocs = await _vectorStore.SearchAsync(
            vector: queryEmbedding.Value[0].Vector.ToArray(),
            topK: 5,
            threshold: 0.7 // Minimum similarity score
        );

        // 3. Build context from relevant documents
        var context = string.Join("\n\n", relevantDocs
            .Select(d => $"Source: {d.Metadata["Source"]}\n{d.Content}"));

        // 4. Generate response grounded in real data
        var response = await _chatClient.CompleteAsync(
            new ChatMessage(ChatRole.System,
                "You are a helpful assistant. Answer using ONLY the provided context. " +
                "If the context doesn't contain the answer, say 'I don't have that information.'"),
            new ChatMessage(ChatRole.User,
                $"Context:\n{context}\n\nQuestion: {userQuestion}")
        );

        return response.Content[0].Text;
    }
}
```
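The `SearchAsync` call hides the core retrieval step: score every stored vector against the query, drop anything below the threshold, keep the top K. Here's a self-contained sketch of that logic, with an in-memory list standing in for a real vector database (the `Doc` record and method names are illustrative, not a real API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Doc(string Content, float[] Vector);

public static class NaiveVectorSearch
{
    // Rank by cosine similarity, apply the threshold, keep the top K.
    // Real vector databases do the same conceptually, but use approximate
    // nearest-neighbor indexes (HNSW, IVF) to avoid scanning every vector.
    public static List<(Doc Doc, double Score)> Search(
        List<Doc> store, float[] query, int topK, double threshold)
    {
        return store
            .Select(d => (Doc: d, Score: Cosine(d.Vector, query)))
            .Where(x => x.Score >= threshold)
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
    }

    static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }
}
```

The brute-force scan works fine up to tens of thousands of vectors; beyond that, a real vector database earns its keep.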
Real-World Use Cases
1. Enterprise Document Search
Problem: "Find all contracts where we agreed to 30-day payment terms"
Keyword search fails. It finds "30 days" but also matches "30-day warranty" in unrelated docs.
RAG solution:
```csharp
// Semantic search understands intent
var searchResults = await _vectorStore.SearchAsync(
    query: "payment terms agreements",
    topK: 20
);

// Returns contracts actually discussing payment terms
// Not just keyword matches
```
2. Customer Support Automation
Problem: Support tickets are repetitive. Your FAQ is massive.
RAG solution:
```csharp
public class SupportChatbot
{
    public async Task<string> AnswerSupportQuestionAsync(string question)
    {
        // Search FAQ, past tickets, knowledge base
        var relevantArticles = await _vectorStore.SearchAsync(
            query: question,
            filter: new { Type = "FaqOrTicket" }
        );

        // Generate response from actual support history
        var response = await _chatClient.CompleteAsync(
            context: relevantArticles,
            prompt: $"Customer asks: {question}"
        );

        return response;
    }
}
```
Result: Consistent answers based on real support history, not hallucinated solutions.
3. Technical Documentation Assistant
Problem: Your API docs are 500 pages. Developers give up.
RAG solution:
// "How do I paginate API results?" // Search finds: Authentication docs, Pagination section, Examples // Returns: Exactly what the developer needs// "How do I paginate API results?" // Search finds: Authentication docs, Pagination section, Examples // Returns: Exactly what the developer needsvar docSearch = await vectorStore.SearchAsync( query: "pagination API results", filter: new { DocumentType = "ApiDocs" }, topK: 3 );`
Enter fullscreen mode
Exit fullscreen mode
4. Code Analysis & Documentation
Problem: Onboarding takes weeks. New devs can't find relevant code examples.
RAG solution:
```csharp
public class CodebaseAssistant
{
    // Embed your entire codebase, then ask:
    // "Show me examples of dependency injection usage"
    public async Task<List<VectorDocument>> FindExamplesAsync()
    {
        var examples = await _codeVectorStore.SearchAsync(
            query: "dependency injection usage examples",
            topK: 10
        );

        // Returns actual code from your repo
        return examples;
    }
}
```
DO's and DON'Ts for RAG in .NET
✅ DO
- Chunk documents smartly. 512-1024 token chunks work best. Too small = lost context. Too large = expensive embeddings.
```csharp
var chunks = ChunkDocument(doc, chunkSize: 512, overlap: 100);
```
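`ChunkDocument` isn't shown above, so here is one possible word-based sketch. Note it counts words rather than model tokens — a production version would count tokens with the tokenizer that matches your embedding model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Chunker
{
    // Word-based chunking sketch (tokens ≈ words for rough sizing).
    // 'overlap' words are repeated between consecutive chunks so
    // sentences spanning a boundary aren't lost.
    public static List<string> ChunkDocument(string doc, int chunkSize, int overlap)
    {
        var words = doc.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        int step = chunkSize - overlap;

        for (int start = 0; start < words.Length; start += step)
        {
            chunks.Add(string.Join(' ', words.Skip(start).Take(chunkSize)));
            if (start + chunkSize >= words.Length) break; // last chunk reached
        }
        return chunks;
    }
}
```

With `chunkSize: 4, overlap: 1`, a ten-word document yields three chunks, each sharing one word with the next.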
- Store metadata. Source, date, version - makes results traceable.
```csharp
var vector = new VectorDocument
{
    Content = text,
    Vector = embedding,
    Metadata = new { Source = "SalesReport", Date = DateTime.Now }
};
```
- Monitor similarity scores. Not all search results are good results.
```csharp
var results = await vectorStore.SearchAsync(query, topK: 5);
var confident = results.Where(r => r.SimilarityScore > 0.75);
```
- Regenerate embeddings when documents change significantly.
❌ DON'T
- Embed raw PDFs. Extract text first. Preserve structure.
```csharp
// Bad
var embedding = await client.GenerateAsync(pdfBytes);

// Good
var text = ExtractTextFromPdf(pdf);
var embedding = await client.GenerateAsync(text);
```
- Trust low similarity scores. If your search returns 0.45 relevance, it's basically random.
```csharp
// Bad: Use anything over 0.5
// Good: Use results > 0.7, fall back to "I don't know"
```
- Use outdated embeddings for new documents. Inconsistent results.
- Forget about cost. Embedding a million documents is expensive. Plan your chunk strategy.
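To make "plan your chunk strategy" concrete, embedding cost is roughly total tokens times the price per token. A back-of-the-envelope sketch — the price constant below is a placeholder, not a real quote, so substitute your provider's current rate:

```csharp
public static class EmbeddingCost
{
    // PLACEHOLDER rate, not real pricing — check your provider.
    public const double PricePerMillionTokens = 0.10;

    // Rough cost estimate: (docs × avg tokens per doc) / 1M × price.
    public static double Estimate(int documentCount, int avgTokensPerDoc)
    {
        double totalTokens = (double)documentCount * avgTokensPerDoc;
        return totalTokens / 1_000_000 * PricePerMillionTokens;
    }
}

// 1,000,000 docs × 800 tokens each = 800M tokens → $80 at the placeholder rate
```

Doubling chunk overlap, or re-embedding on every minor edit, multiplies this number, which is why the chunking strategy matters before you index.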
Vector Databases for .NET
| Database | .NET Support | Best For | Cost |
| --- | --- | --- | --- |
| Azure Cosmos DB | ✅ Native | Enterprise, serverless | |
| Azure OpenAI | ✅ Built-in | Quick start, OpenAI models | |
| pgvector (PostgreSQL) | ✅ Npgsql | Self-hosted, low cost | $ |
| Milvus | ✅ Community | Open source, scalable | $ |
| Pinecone | ✅ REST API | Managed, serverless | |
Minimal Example with Cosmos DB
```csharp
services.AddAzureOpenAIClient(endpoint, credentials);
services.AddScoped<DocumentEmbedder>();
services.AddScoped<RAGResponseGenerator>();

// Dependency injection handles the rest
```
Measuring RAG Quality
Retrieval Metrics
Precision: Of top-5 results, how many are relevant?
```csharp
var relevant = searchResults.Count(r => r.IsRelevant);
var precision = (double)relevant / searchResults.Count;
// Target: > 0.8
```
Recall: Of all relevant documents, did we find them?
```csharp
var foundRelevant = relevantDocuments
    .Count(d => searchResults.Contains(d));
var recall = (double)foundRelevant / totalRelevantDocuments;
// Target: > 0.7
```
Conclusion
RAG dramatically reduces hallucinations by grounding AI responses in your actual data.
Key Takeaways:
- Embeddings = Text as math. They capture semantic meaning.
- RAG pipeline = Search → Feed → Generate. Find relevant docs, include them, answer based on reality.
- .NET 10 + Microsoft.Extensions.AI makes this native and simple.
- Vector databases store and search embeddings at scale.
- Production readiness requires a chunking strategy, metadata, and similarity thresholds.

Next Steps:
- Review Generative AI for Beginners .NET v2 - Lesson 3 covers RAG
- Choose your vector database (start with pgvector for simplicity)
- Extract and chunk your documents
- Build your first RAG pipeline
Resources
- Microsoft.Extensions.AI Docs
- Generative AI for Beginners .NET v2
- RAG vs Fine-tuning
- pgvector for PostgreSQL
- Azure OpenAI Embeddings
What's your biggest question about RAG? Drop it below!