Building RAG-based AI Applications with DataStax and Fiddler
DataStax and Fiddler empower AI teams to deliver scalable, responsible, and helpful RAG-based AI Applications.
Over the past year, we’ve seen a tremendous uptick in enterprises adopting large language models (LLMs) to power knowledge reasoning applications like workplace assistants and chatbots. These applications span use cases from product documentation to customer service, helping end users increase productivity, streamline automation, and make better decisions. Enterprises can choose among four LLM deployment methods depending on the nature of their business use case, and they are increasingly choosing retrieval-augmented generation (RAG) because it is an efficient and cost-effective approach.
RAG enables AI teams to build applications on top of existing open-source LLMs or LLMs from providers like OpenAI, Cohere, or Anthropic. It also lets enterprises incorporate time-sensitive and private information that foundation models alone cannot access.
We’re excited to announce a partnership that enables enterprises and startups to put accurate RAG applications into production more quickly with DataStax Astra DB and the Fiddler AI Observability platform.
Why DataStax and Fiddler AI together?
Enterprises and smaller organizations need LLM observability to meet accuracy and control requirements for putting RAG applications into production. Here are some reasons:
- Enterprise-level requirements: Cost, data privacy, and lack of control will still push enterprises to host their own data in a vector database.
- Evolving complexities in LLMs: Building with LLMs is no longer a matter of a single API call that returns a text completion. LLM applications now involve complex components like retrievers, threads, prompt chains, and access to tools, all of which need to be logged and monitored.
- LLM deployment method selection: Prompt engineering, RAG, and fine-tuning can all be leveraged, but which one to use and when to use it depends on the task at hand. Enterprises might take different approaches based on whether the LLM application they are building is internal or external. What is the risk vs reward tradeoff?
- Continuous LLM monitoring: Lastly, evaluation is still pretty hard! Regardless of how you use and apply LLMs, it won’t matter much if you are not consistently evaluating performance. With many changes on the horizon, customers should always continue with LLM monitoring after they launch their AI application.
In short, getting started with RAG applications takes only minutes. However, as enterprise consumers demand more accuracy, safety, and transparency from these business-critical applications, enterprises will naturally gravitate toward the stack that provides the most control and the deepest feature set.
A Simple Recipe for Deploying RAG-based LLM Applications
What’s been so surprising about the proliferation of LLM-based applications over the past year is how powerful they have proven to be for a variety of knowledge reasoning tasks while remaining architecturally simple. The benefits of RAG-based LLM applications have been well understood for some time now.
To build these “reasoning applications” only requires a few key ingredients:
- An LLM foundation model
- Documents stored in a vector database for retrieval
- An LLM observability layer to tune the system, detect issues, and ensure proper performance
- An orchestration toolkit like LangChain or LlamaIndex to manage the workflow and data movement
Yet, as with any recipe, the final product is only as good as the quality of the ingredients we choose.
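To show how these ingredients fit together, here is a minimal sketch in Python. Everything here is a toy stand-in: the keyword-overlap retriever plays the role of a vector database query, and `fake_llm` plays the role of a hosted foundation model; a real deployment would use Astra DB and a provider API in their place.

```python
import re

# Minimal RAG pipeline sketch: retrieve context, build a prompt, call an LLM.
# DOCS, retrieve, build_prompt, and fake_llm are illustrative stand-ins only.
DOCS = [
    "Astra DB is a serverless vector database built on Apache Cassandra.",
    "Fiddler provides LLM observability: monitoring, analysis, and alerting.",
    "LangChain orchestrates retrieval, prompting, and model calls.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    ranked = sorted(docs, key=lambda d: len(words(query) & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    return "Answer using only this context:\n" + "\n".join(context) + "\nQuestion: " + query

def fake_llm(prompt: str) -> str:
    # Placeholder for a call to a real foundation model (OpenAI, Cohere, etc.).
    return "(answer grounded in: " + prompt.splitlines()[1] + ")"

context = retrieve("What is Astra DB?", DOCS)
answer = fake_llm(build_prompt("What is Astra DB?", context))
print(answer)
```

The observability layer from the ingredient list would sit around the `retrieve` and `fake_llm` calls, logging each prompt, retrieved context, and response.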
The Best-of-Breed RAG Ingredients
DataStax Astra DB
Astra DB is a database-as-a-service (DBaaS) that enables vector search, giving you real-time access to vector and non-vector data so you can quickly build accurate generative AI applications and deploy them in production. Built on Apache Cassandra®, Astra DB adds real-time vector capabilities that scale to billions of vectors and embeddings, making it a critical component of a GenAI application architecture. Real-time data reads, writes, and availability help ground responses in fresh data and prevent AI hallucinations. As a serverless, distributed database, Astra DB supports replication across wide geographic areas for extremely high availability. When ease of use and relevance at scale matter, Astra DB is the vector database of choice.
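Conceptually, the vector search Astra DB performs at scale is a nearest-neighbor lookup over embeddings. The sketch below shows that core operation in plain Python with made-up toy vectors; a production system would store the embeddings in Astra DB and let it run the similarity search server-side rather than scanning a list.

```python
import math

# Toy in-memory vector index: (document text, embedding) pairs.
# In production, Astra DB stores these and performs the search at scale.
INDEX = [
    ("How to reset your password", [0.9, 0.1, 0.0]),
    ("Monitoring LLM hallucinations", [0.1, 0.8, 0.3]),
    ("Billing and invoices", [0.0, 0.2, 0.9]),
]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def vector_search(query_emb: list[float], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(INDEX, key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

For example, a query embedding close to the second document's vector, such as `[0.2, 0.9, 0.2]`, ranks the "Monitoring LLM hallucinations" document first.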
The Fiddler AI Observability Platform
The Fiddler AI Observability platform helps customers address the concerns surrounding generative AI. Whether AI teams are launching AI applications using open-source models, in-house-built LLMs, or closed LLMs provided by OpenAI, Anthropic, or Cohere, Fiddler equips users across the organization with an end-to-end LLMOps experience, from pre-production to production. With Fiddler, users can validate, monitor, analyze, and improve RAG applications. The platform offers many out-of-the-box enrichments that produce metrics to identify safety and privacy issues like toxicity and PII leakage, as well as correctness metrics like faithfulness and hallucinations.
LLM Observability: Analyze unstructured data on the 3D UMAP to identify data patterns and problematic prompts and responses
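To make the idea of an enrichment concrete, here is a simplified sketch of a PII-leakage check, one kind of safety metric described above. The regex patterns and scoring are illustrative only and are not Fiddler's actual implementation, which would use far more robust detectors.

```python
import re

# Illustrative patterns for two common PII types. Real enrichments rely on
# stronger detectors (NER models, checksum validation, etc.), not bare regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pii_leakage(response: str) -> dict[str, int]:
    """Count PII matches per type in a model response."""
    return {name: len(pattern.findall(response)) for name, pattern in PII_PATTERNS.items()}

safe = pii_leakage("Please open a support ticket for this issue.")
leaky = pii_leakage("Contact jane.doe@example.com, SSN 123-45-6789.")
print(safe, leaky)
```

An observability platform computes metrics like this over every logged response, so spikes in any PII type can trigger alerts.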
Use Case: The Fiddler AI Documentation Chatbot
Fiddler built an AI chatbot for our documentation site to help improve the customer experience of the Fiddler AI Observability platform. The chatbot answers questions about using Fiddler for ML and LLM monitoring.
We chose Astra DB as the chatbot’s vector database and were able to quickly set up an environment with immediate access to multiple API endpoints. Using Astra’s Python libraries, we stored prompt history along with the embeddings for the documents in our dataset. Key benefits were realized right away, and we continue to monitor and improve the chatbot.
- After publishing the chatbot conversations to the Fiddler platform, we analyze chatbot performance along multiple key metrics, including cost, hallucinations, correctness, toxicity, data drift, and more.
- The Fiddler platform offers out-of-the-box dashboards that use multiple visualizations to track chatbot performance over time under different load scenarios and compare responses across different cohorts of users.
- Fiddler LLM Observability also allows chatbot operators to conduct root cause analysis when issues are detected.
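Publishing a conversation for this kind of analysis amounts to emitting one structured event per chatbot turn. The sketch below shows what such an event might contain; the field names and the per-1K-token prices are hypothetical placeholders, not Fiddler's actual event schema or any provider's real rates.

```python
import time

# Hypothetical per-1K-token prices; substitute your provider's real rates.
PRICE_PER_1K = {"prompt": 0.01, "completion": 0.03}

def make_event(prompt: str, response: str, prompt_tokens: int, completion_tokens: int) -> dict:
    """Assemble one chatbot turn as a structured event for an observability platform."""
    cost = (prompt_tokens * PRICE_PER_1K["prompt"]
            + completion_tokens * PRICE_PER_1K["completion"]) / 1000
    return {
        "timestamp": time.time(),
        "prompt": prompt,
        "response": response,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": round(cost, 6),
    }

event = make_event("How do I monitor drift?", "Use a drift dashboard.",
                   prompt_tokens=120, completion_tokens=40)
print(event["cost_usd"])
```

Enrichments like the hallucination, toxicity, and drift metrics above are then computed from the prompt and response fields of each event.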
You can learn more about Fiddler’s experience developing this chatbot in the recent AI Forward 2023 Summit session Chat on Chatbots: Tips and Tricks. You can also request a demo of the Fiddler AI Observability platform for ML and LLMOps.
LLM Observability: Monitor, report, and collaborate with technical and business teams using Dashboards and Charts
Fiddler AI Blog
https://www.fiddler.ai/blog/building-rag-based-ai-applications-with-datastax-and-fiddler
