Fiddler LLM Enrichments Framework for LLM Monitoring
Learn how the Fiddler LLM Enrichments Framework uses Trust Models to enrich prompts and responses for accurate large language model (LLM) monitoring.
We are thrilled to share major upgrades to the Fiddler AI Observability Platform to support Large Language Model Operations (LLMOps)! Effective monitoring is essential for maintaining the correctness, safety, and scalability of LLM applications, and for controlling their cost. Our team recently showcased the Fiddler LLM Enrichments Framework, which enables high-accuracy LLM monitoring.
How LLM Observability can Help Reduce AI Risks
According to a recent McKinsey & Company survey, the adoption of generative AI has increased significantly over the past year, with 65 percent of respondents indicating that their organizations regularly use generative AI (GenAI) in at least one business function. As the adoption of GenAI accelerates, enterprises must safeguard their company, employees, and customers from associated AI risks and challenges.
Consider the following real-life examples that highlight the risks posed by GenAI and LLM applications and how LLM Observability could have helped mitigate the risks:
Hallucinations in Legal Contexts
One of the most glaring issues with LLMs is their tendency to "hallucinate" — generating responses that appear plausible but are factually incorrect or entirely fabricated. A stark example of this occurred in New York, where a lawyer used ChatGPT to assist with legal research. ChatGPT provided six sources of information to support the lawyer's case. Unfortunately, it was later discovered that all six sources were entirely fabricated.
This not only jeopardized the lawyer's case but also highlighted the critical need for accurate monitoring of LLM outputs to prevent such potentially disastrous errors.
Jailbreak in Business Applications
Another example that underscores the necessity of LLM monitoring involves a chatbot deployed by a car dealership. A user manipulated the chatbot into stating that the purchase of a truck would only cost one dollar and claimed this response was legally binding. This situation, known as a "jailbreak," occurs when users exploit vulnerabilities in the LLM to generate unintended responses that negatively impact the company or customers.
Effective monitoring can detect and prevent such jailbreak attempts, protecting businesses from financial loss and reputational damage.
Bias in GenAI Applications
Bias in GenAI applications is another significant concern, as it can lead to discriminatory outcomes that affect marginalized groups. One example is a chatbot that produced biased responses, discriminating against certain protected groups. This not only poses ethical and legal challenges but also undermines trust in GenAI applications.
Monitoring LLM metrics is critical to developing inclusive and equitable GenAI/LLM applications.
LLMs operate on unstructured data like text, which is more nuanced than structured data. In addition, the accuracy of an LLM's output is highly context-dependent. What may be considered a good response in one context may be incorrect in another. As a result, monitoring the quality, correctness, and safety of outputs requires sophisticated techniques beyond simple accuracy metrics.
Fiddler’s LLM Enrichments Framework enables high-quality and highly accurate LLM monitoring
We’ve built the LLM Enrichments Framework to address this complexity. The LLM Enrichments Framework augments LLM inputs and outputs with scores (or "enrichments") for monitoring purposes.
The framework produces scores for a wide range of hallucination and safety metrics, such as faithfulness, answer relevance, toxicity, jailbreak, PII, and other LLM metrics, as seen in the diagram below. Whether it's a topic poorly covered by a chatbot's knowledge base or a vulnerability to specific prompt-injection attacks, the Fiddler LLM Enrichments Framework offers high-accuracy LLM monitoring to enhance LLM application performance, correctness, and safety.
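As an illustration of what one such enrichment might compute, an answer-relevance score can be thought of as a similarity measure between the user's prompt and the model's response. The sketch below is a deliberately simple, hypothetical stand-in (bag-of-words cosine similarity), not Fiddler's actual Trust Model implementation; the function names are placeholders:

```python
import math
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Toy bag-of-words 'embedding': lowercased token counts."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def answer_relevance(prompt: str, response: str) -> float:
    """Hypothetical enrichment: score in [0, 1], higher = more relevant."""
    return cosine_similarity(bow_vector(prompt), bow_vector(response))

on_topic = answer_relevance("What is the truck's price?",
                            "The truck's price is $45,000.")
off_topic = answer_relevance("What is the truck's price?",
                             "Our dealership opened in 1985.")
```

A production scorer would use a fine-tuned model rather than token overlap, but the contract is the same: the enrichment attaches a bounded score to each prompt/response pair so that monitoring can alert on drift or outliers.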
Fiddler monitors a comprehensive library of LLM metrics, and the Fiddler LLM Enrichments Framework is utilized to enrich hallucination and safety metrics
How the LLM Enrichments Framework Works
The Fiddler LLM Enrichments Framework generates enrichments that score prompts and responses, improving the quality of monitoring. Enrichments augment user prompts and LLM responses with supplementary information for accurate scoring.
Fiddler’s LLM Enrichments Framework augments prompts and responses by enriching them for LLM monitoring
This process involves the following steps:
- Data Ingestion: Data (user prompts and LLM responses) is ingested into the Fiddler platform
- Data Scoring: Supplementary information is added to the data through calculations specific to the LLM metric being monitored, enhancing the quality and accuracy of the data
- LLM Metrics Monitoring: Enrichments use Fiddler Trust Models to understand the context behind the prompts and responses, providing accurate scores for LLM metrics monitoring in Fiddler
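The three steps above can be sketched as a minimal pipeline. Everything here is a hypothetical illustration: `EnrichedRecord`, `ingest_and_score`, and the two toy scorers stand in for Fiddler's Trust Models and client API, which the post does not detail:

```python
from dataclasses import dataclass, field

# Hypothetical scorers standing in for Fiddler Trust Models.
def toxicity_score(text: str) -> float:
    """Toy toxicity proxy: fraction of words found in a blocklist."""
    blocklist = {"idiot", "stupid"}
    words = text.lower().split()
    return sum(w.strip(".,!?") in blocklist for w in words) / max(len(words), 1)

def pii_score(text: str) -> float:
    """Crude check for email-like tokens as a stand-in for PII detection."""
    return 1.0 if "@" in text else 0.0

@dataclass
class EnrichedRecord:
    prompt: str
    response: str
    enrichments: dict = field(default_factory=dict)

def ingest_and_score(prompt: str, response: str) -> EnrichedRecord:
    """Step 1: ingest the prompt/response pair.
    Step 2: attach enrichment scores via metric-specific calculations.
    Step 3: the enriched record is what the monitoring layer consumes."""
    record = EnrichedRecord(prompt=prompt, response=response)
    record.enrichments["toxicity"] = toxicity_score(record.response)
    record.enrichments["pii"] = pii_score(record.response)
    return record

record = ingest_and_score(
    "How do I reset my password?",
    "Email support@example.com and an agent will help you.",
)
```

In this sketch the response triggers the PII enrichment (it contains an email address) but not the toxicity one; a monitoring system would then aggregate such scores over traffic and alert on thresholds or drift.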
Fiddler Trust Models for Enterprise Scale LLM Monitoring
Behind the Fiddler LLM Enrichments Framework are the Fiddler Trust Models, our proprietary fine-tuned models that quickly calculate scores for user prompts and LLM responses.
A common approach to scoring prompts and responses is to use closed-source LLMs. However, this is a short-term solution that hinders enterprises from scaling their LLM application deployments cost-effectively. Closed-source LLMs, with their hundreds of billions of parameters, are highly expressive and designed for general purposes, which increases latency and limits their effectiveness and efficiency in scoring specific metrics.
Unlike other LLMs, the Fiddler Trust Models are optimized for speed, safety, cost, and task-specific accuracy, delivering near real-time calculations. This efficiency helps enterprises scale their GenAI and LLM deployments across their organization effectively.
Fiddler’s Trust Models are fast, safe, scalable, and cost-effective
Key advantages of the Trust Models are:
- Fast: Detect hallucinations, toxicity, PII leakage, prompt-injection attacks, and other metrics in near real time
- Safe: Keep data secure so it never leaves the premises, even in air-gapped environments
- Scalable: Scale LLM applications as user traffic grows
- Cost-effective: Keep LLM operating costs low
We are excited to share our latest platform upgrades to help enterprises scale their production GenAI applications. Watch the full webinar to learn more about Fiddler’s LLM Enrichments Framework.