AI agents are common in customer support but still hallucinate, even with tools. One wrong answer about refunds or delays can permanently damage trust. This article demonstrates how Cleanlab scores each response in real time so you can stop bad AI agent outputs before they reach your users.
AI agents are becoming central to modern customer support, replacing traditional workflows with systems that can search, summarize, respond, and take action. But in production, an important issue remains: these agents often produce incorrect or misleading responses. Even in well-controlled environments, hallucinations still happen.
Agents are built on top of LLMs, which are fundamentally brittle and prone to unpredictable errors. Connecting an LLM to external data sources or giving it control over tool invocation does not eliminate this fragility.
When agents hallucinate or return misleading information, especially in high-stakes situations like customer support, trust erodes quickly. If your AI gives the wrong answer about refund eligibility, flight delays, or health advisories, your customer may never return. Worse, they may post the error publicly.
To avoid this, your AI agent needs more than intelligence. It needs trustworthiness.
Building a Reliable Customer Support Agent with LangGraph
This case study focuses on AI agents that are designed to gather and synthesize information before responding. Let’s consider a customer support AI agent built using LangGraph, a flexible open-source framework for managing agentic LLM workflows.
This type of AI agent typically uses tools to look up policies or fetch real-time data, then summarizes that information for the user. But when a tool returns incomplete or faulty data, or the LLM hallucinates, your agent may generate a confident-sounding but incorrect response.
Our corresponding tutorial provides a full implementation of this agent, along with the example runs shown below.
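For orientation, here is a minimal sketch of such a tool-calling agent in LangGraph. It is not the tutorial’s implementation: the get_discount_plan stub, the model choice, and the node names are illustrative placeholders for your real policy and flight-data tools.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

@tool
def get_discount_plan(plan_name: str) -> str:
    """Look up the details of a named discount plan (stubbed for illustration)."""
    return f"Returning details for the '{plan_name}' discount plan:"

tools = [get_discount_plan]
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)  # any tool-calling chat model works

def assistant(state: MessagesState):
    # The LLM decides whether to call a tool or answer the customer directly.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("assistant", assistant)
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "assistant")
builder.add_conditional_edges("assistant", tools_condition)  # route to tools or finish
builder.add_edge("tools", "assistant")
graph = builder.compile()

# Example run:
# graph.invoke({"messages": [("user", "What are the details of the 'five' discount plan?")]})
```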
AI Agent Failure Examples in Customer Support
Failure example: Incomplete Tool Output
Customer query | AI tool calls | (Incorrect) AI response | What happened |
---|---|---|---|
What are the details of the “five” discount plan from Swiss Airlines? | Tool Calls: get_discount_plan Tool Message: Returning details for the 'five' discount plan: <end of output> | Here are the details for the “five” discount plan: … | The agent calls the right tool, but the response contains very little information. The LLM fills in the gaps by inventing a discount structure that sounds legitimate but is not based on actual data. |
Failure example: Misinterpreted Tool Result
Customer query | AI tool calls | (Incorrect) AI response | What happened |
---|---|---|---|
Is there a health advisory in Basel? | Tool Calls: get_travel_advisory Tool Message: {"level": "Level 2 – Exercise Increased Caution", "notes": "Travelers to Switzerland should be aware of petty crime and take precautions."} | There is a health advisory for Switzerland, including Basel: … | The travel advisory tool returns a general notice about petty crime in Switzerland. The LLM misreads the result and generates a response about nonexistent health risks. |
Failure example: Incorrect Assumptions
Customer query | AI tool calls | (Incorrect) AI response | What happened |
---|---|---|---|
What is my arrival time in their time zone? | Tool Calls: fetch_user_flight_information Tool Message: [{"ticket_no": "7240005432906569", "book_ref": "C46E9F", "flight_no": "LX0112", "departure_airport": "CDG", "arrival_airport": "BSL", "departure": "2025-05-30 01:36:24.169593-04:00", "arrival": "2025-05-30 03:06:24.169593-04:00"}] | Your flight LX0112 from Paris Charles de Gaulle (CDG) to EuroAirport Basel-Mulhouse-Freiburg (BSL) is scheduled to arrive at 03:06 AM (local time) on May 30, 2025. | The agent retrieves UTC timestamps for the flight but assumes the origin and destination are in the same time zone. This leads to a response that appears precise but is wrong by several hours. |
In each of these scenarios, the response may appear helpful, but it is flawed. The result is an agent that erodes trust instead of building it.
How Cleanlab Catches Agent Failures in Real Time
To solve these problems, Cleanlab’s detection capability provides a real-time trustworthiness scoring system. It analyzes the full user prompt, the AI’s response, relevant tool outputs, and all internal LLM calls made during the agent’s run to generate a trust score between 0 and 1. Higher scores indicate more reliable responses.
Unlike simple heuristics or basic LLM-as-a-judge models, Cleanlab combines statistical confidence estimation, model behavior profiling, and linguistic analysis into a single metric. It also offers natural-language explanations for why a response might be risky, helping your team trace issues quickly.
Cleanlab is benchmarked as the most accurate real-time method for detecting bad responses from any LLM. It works out of the box and requires neither labeled data nor model training infrastructure.
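As a rough sketch, scoring a single prompt/response pair with the cleanlab-tlm client might look like the following; the option and response-field names may vary by package version, and a Cleanlab API key is required. (In the agent setting, the tool outputs and internal LLM calls are included in what gets scored, as the tutorial shows.)

```python
from cleanlab_tlm import TLM

# Optionally request a natural-language explanation alongside the score.
tlm = TLM(options={"log": ["explanation"]})

prompt = "Is there a health advisory in Basel?"
response = "There is a health advisory for Switzerland, including Basel: ..."

result = tlm.get_trustworthiness_score(prompt, response)
print(result["trustworthiness_score"])            # float in [0, 1]; higher means more trustworthy
print(result.get("log", {}).get("explanation"))   # why the response looks risky
```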
Using the same customer support failure examples from above, let’s see how applying trust scores can flag incorrect responses.
Preventable example: Incomplete Tool Output
Customer query | AI tool calls | (Incorrect) AI response | Cleanlab Score (0-1) |
---|---|---|---|
What are the details of the “five” discount plan from Swiss Airlines? | Tool Calls: get_discount_plan Tool Message: Returning details for the 'five' discount plan: <end of output> | Here are the details for the “five” discount plan: … | 0.43. If the tool response lacks detail and the LLM hallucinates the rest, Cleanlab flags the response with a low trust score. This signals that the answer is untrustworthy and should be intercepted. |
Preventable example: Misinterpreted Tool Result
Customer query | AI tool calls | (Incorrect) AI response | Cleanlab Score (0-1) |
---|---|---|---|
Is there a health advisory in Basel? | Tool Calls: get_travel_advisory Tool Message: {"level": "Level 2 – Exercise Increased Caution", "notes": "Travelers to Switzerland should be aware of petty crime and take precautions."} | There is a health advisory for Switzerland, including Basel: … | 0.72. When the travel advisory about petty crime is misread as a health warning, Cleanlab detects the mismatch. The response receives a middling score, low enough to trigger fallback logic in sensitive domains. |
Preventable example: Incorrect Assumptions
Customer query | AI tool calls | (Incorrect) AI response | Cleanlab Score (0-1) |
---|---|---|---|
What is my arrival time in their time zone? | Tool Calls: fetch_user_flight_information Tool Message: [{"ticket_no": "7240005432906569", "book_ref": "C46E9F", "flight_no": "LX0112", "departure_airport": "CDG", "arrival_airport": "BSL", "departure": "2025-05-30 01:36:24.169593-04:00", "arrival": "2025-05-30 03:06:24.169593-04:00"}] | Your flight LX0112 from Paris Charles de Gaulle (CDG) to EuroAirport Basel-Mulhouse-Freiburg (BSL) is scheduled to arrive at 03:06 AM (local time) on May 30, 2025. | 0.37. An incorrect time zone conversion results in an arrival time that’s off by several hours. Cleanlab flags this with a very low score, ensuring the bad answer never reaches the user. |
These examples show how Cleanlab adds a vital trust layer between your AI system and your customers.
Using Fallbacks for Safer Output
When Cleanlab assigns a low trust score (for example, below 0.9), you can route the conversation to a fallback strategy; a short routing sketch follows the options listed below.
One common fallback is a generic but safe response, such as:
“Sorry, I cannot answer that based on the available information. Please try rephrasing your question or providing more details.”
Other fallback options include:
- Escalating the case to a human agent (e.g., via LangGraph’s interrupt() human-in-the-loop capability)
- Re-running the query with a revised prompt for the agent
- Logging the incident for future fine-tuning or tool improvements
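Here is that routing sketch. The 0.9 threshold and fallback message mirror the examples above, while the guarded_response helper is purely illustrative; in practice this check sits between the scored LLM response and whatever your agent sends to the user.

```python
# Illustrative threshold and fallback text; tune both for your application.
TRUST_THRESHOLD = 0.9
FALLBACK_ANSWER = (
    "Sorry, I cannot answer that based on the available information. "
    "Please try rephrasing your question or providing more details."
)

def guarded_response(response_text: str, trust_score: float) -> str:
    """Return the agent's answer only if its trust score clears the threshold."""
    if trust_score >= TRUST_THRESHOLD:
        return response_text
    # Low-trust path: fall back to the safe answer. You could instead escalate
    # to a human (e.g., LangGraph's interrupt()) or re-run with a revised prompt.
    return FALLBACK_ANSWER

# Example: a 0.43 score (the incomplete-tool-output case) triggers the fallback.
print(guarded_response("Here are the details for the 'five' discount plan: ...", 0.43))
```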
Easy Integration with LangGraph and Other Frameworks
Adding Cleanlab to your LangGraph agent only takes a few lines of code. You wrap your existing assistant node with Cleanlab’s TrustworthyAssistant, which automatically intercepts each LLM response and calculates its trust score.
```python
from cleanlab_tlm import TLM

# TrustworthyAssistant is the wrapper class from the tutorial notebook; it
# intercepts each LLM response and attaches a Cleanlab trust score.
tlm = TLM()

trustworthy_assistant = TrustworthyAssistant(
    assistant=existing_llm_node,  # your current assistant node (e.g., the assistant function sketched earlier)
    tools=tools,
    tlm=tlm,
)
```
You then update your graph to use the trustworthy_assistant node in place of your existing LLM node. That’s it. Your LangGraph agent now scores every LLM response in real time and stores the result in the graph state.
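Building on the earlier sketch, the swap might look roughly like this; the node names and graph shape are illustrative, and trustworthy_assistant and tools come from the snippets above (the tutorial defines TrustworthyAssistant as a standard LangGraph node callable).

```python
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.prebuilt import ToolNode, tools_condition

builder = StateGraph(MessagesState)
builder.add_node("assistant", trustworthy_assistant)  # wrapped node replaces the raw LLM node
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "assistant")
builder.add_conditional_edges("assistant", tools_condition)
builder.add_edge("tools", "assistant")
graph = builder.compile()
```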
See our tutorial notebook for full implementation details.
Prevent Bad Responses from Reaching Users
Modern AI agents need to do more than generate responses. They must be safe, accurate, and reliable in real-world scenarios. Cleanlab provides this safeguard by scoring every AI response in real time and flagging those that are likely to be incorrect. It works without manual review or rule-based filters. When integrated with LangGraph or any other agentic framework, Cleanlab helps ensure that flawed responses never reach your users.