Prevent Hallucinated Responses from any AI Agent

June 30, 2025
  • Gordon Lim
  • Dave Kong

AI agents are common in customer support but still hallucinate, even with tools. One wrong answer about refunds or delays can permanently damage trust. This article demonstrates how Cleanlab scores each response in real time, to stop bad AI agent outputs before they reach your users.

AI agents are becoming central to modern customer support, replacing traditional workflows with systems that can search, summarize, respond, and take action. But in production, an important issue remains: these agents often produce incorrect or misleading responses. Even in well-controlled environments, hallucinations still happen.

Agents are built on top of LLMs, which are fundamentally brittle and prone to unpredictable errors. Connecting an LLM to external data sources or giving it control over tool invocation does not eliminate this fragility.

When agents hallucinate or return misleading information, especially in high-stakes situations like customer support, trust erodes quickly. If your AI gives the wrong answer about refund eligibility, flight delays, or health advisories, your customer may never return. Worse, they may post the error publicly.

To avoid this, your AI agent needs more than intelligence. It needs trustworthiness.

Building a Reliable Customer Support Agent with LangGraph

This case study focuses on AI agents that are designed to gather and synthesize information before responding. Let’s consider a customer support AI agent built using LangGraph, a flexible open-source framework for managing agentic LLM workflows.

This type of AI agent typically uses tools to look up policies or fetch real-time data, then summarizes that information for the user. But when issues occur in the tools/data or the LLM hallucinates, your agent may generate a confident-sounding but incorrect response.

Our corresponding tutorial shows a full implementation of this agent and the example runs below.
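
To make the setup concrete, here is a hypothetical sketch of the kind of lookup tool such an agent might expose, using LangChain's @tool decorator (the actual tools in the tutorial may look different). Note how an empty lookup produces an uninformative tool message, which is the situation behind the first failure example below.

python
from langchain_core.tools import tool

@tool
def get_discount_plan(plan_name: str) -> str:
    """Look up details for a named Swiss Airlines discount plan."""
    # In a real agent this would query a policy database or internal API.
    plans: dict[str, str] = {}  # e.g., loaded from your policy store
    details = plans.get(plan_name, "")
    # An empty lookup yields a tool message with almost no information,
    # as in the first failure example below.
    return f"Returning details for the '{plan_name}' discount plan:\n{details}"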

AI Agent Failure Examples in Customer Support

Failure example: Incomplete Tool Output

Customer query: What are the details of the “five” discount plan from Swiss Airlines?

AI tool calls:
Tool Calls: get_discount_plan
Tool Message: Returning details for the 'five' discount plan:
<end of output>

(Incorrect) AI response:
Here are the details for the “five” discount plan:
  • Discount Level: 5% off on all flights.
  • Validity: The discount can be applied to any flight booked through Swiss Airlines.
  • Usage: Can be used multiple times, but must be specified at the time of booking.

What happened: The agent calls the right tool, but the tool output contains very little information. The LLM fills in the gaps by inventing a discount structure that sounds legitimate but is not based on actual data.

Failure example: Misinterpreted Tool Result

Customer query: Is there a health advisory in Basel?

AI tool calls:
Tool Calls: get_travel_advisory
Tool Message: {"level": "Level 2 – Exercise Increased Caution", "notes": "Travelers to Switzerland should be aware of petty crime and take precautions."}

(Incorrect) AI response:
There is a health advisory for Switzerland, including Basel:
  • Advisory Level: Level 2 – Exercise Increased Caution
  • Notes: Travelers are advised to be aware of petty crime and take necessary precautions.

What happened: The travel advisory tool returns a general notice about petty crime in Switzerland. The LLM misreads the result and generates a response about nonexistent health risks.

Failure example: Incorrect Assumptions

Customer query: What is my arrival time in their time zone?

AI tool calls:
Tool Calls: fetch_user_flight_information
Tool Message: [{"ticket_no": "7240005432906569", "book_ref": "C46E9F", "flight_no": "LX0112", "departure_airport": "CDG", "arrival_airport": "BSL", "departure": "2025-05-30 01:36:24.169593-04:00", "arrival": "2025-05-30 03:06:24.169593-04:00"}]

(Incorrect) AI response:
Your flight LX0112 from Paris Charles de Gaulle (CDG) to EuroAirport Basel-Mulhouse-Freiburg (BSL) is scheduled to arrive at 03:06 AM (local time) on May 30, 2025.

What happened: The agent retrieves UTC timestamps for the flight but assumes the origin and destination are in the same time zone. This leads to a response that appears precise but is wrong by several hours.

In each of these scenarios, the response may appear helpful, but it is flawed. The result is an agent that erodes trust instead of building it.

How Cleanlab Catches Agent Failures in Real Time

To solve these problems, Cleanlab’s detection capability provides a real-time trustworthiness scoring system. It analyzes the full user prompt, the AI’s response, relevant tool outputs, and all internal LLM calls made during the agent’s run to generate a trust score between 0 and 1. Higher scores indicate more reliable responses.

Unlike simple heuristics or basic LLM-as-a-judge models, Cleanlab combines statistical confidence estimation, model behavior profiling, and linguistic analysis into a single metric. It also offers natural-language explanations for why a response might be risky, helping your team trace issues quickly.

Cleanlab is benchmarked as the most accurate real-time method for detecting bad responses from any LLM. It works out of the box and requires neither labeled data nor model training infrastructure.
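
For a sense of the underlying API, you can also score a single prompt/response pair directly with the cleanlab_tlm client. The sketch below is illustrative and assumes your Cleanlab API key is already configured; the exact return format may vary by version.

python
from cleanlab_tlm import TLM

tlm = TLM()  # assumes your Cleanlab API key is configured (e.g., via environment variable)

# Score how well the response is supported for the given prompt (higher = more trustworthy).
result = tlm.get_trustworthiness_score(
    prompt="Is there a health advisory in Basel?",
    response="There is a health advisory for Switzerland, including Basel.",
)
print(result["trustworthiness_score"])  # a value between 0 and 1

In the agent examples that follow, the same kind of scoring is applied to the full run, including tool outputs and intermediate LLM calls.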

Using the same customer support failure examples from above, let’s see how applying trust scores can flag incorrect responses.

Preventable example: Incomplete Tool Output

Customer query: What are the details of the “five” discount plan from Swiss Airlines?

AI tool calls:
Tool Calls: get_discount_plan
Tool Message: Returning details for the 'five' discount plan:
<end of output>

(Incorrect) AI response:
Here are the details for the “five” discount plan:
  • Discount Level: 5% off on all flights.
  • Validity: The discount can be applied to any flight booked through Swiss Airlines.
  • Usage: Can be used multiple times, but must be specified at the time of booking.

Cleanlab Score (0-1): 0.43

If the tool response lacks detail and the LLM hallucinates the rest, Cleanlab flags the response with a low trust score. This signals that the answer is untrustworthy and should be intercepted.

Preventable example: Misinterpreted Tool Result

Customer query: Is there a health advisory in Basel?

AI tool calls:
Tool Calls: get_travel_advisory
Tool Message: {"level": "Level 2 – Exercise Increased Caution", "notes": "Travelers to Switzerland should be aware of petty crime and take precautions."}

(Incorrect) AI response:
There is a health advisory for Switzerland, including Basel:
  • Advisory Level: Level 2 – Exercise Increased Caution
  • Notes: Travelers are advised to be aware of petty crime and take necessary precautions.

Cleanlab Score (0-1): 0.72

When the travel advisory about petty crime is misread as a health warning, Cleanlab detects the mismatch. The response receives a middling score, low enough to trigger fallback logic in sensitive domains.

Preventable example: Incorrect Assumptions

Customer query: What is my arrival time in their time zone?

AI tool calls:
Tool Calls: fetch_user_flight_information
Tool Message: [{"ticket_no": "7240005432906569", "book_ref": "C46E9F", "flight_no": "LX0112", "departure_airport": "CDG", "arrival_airport": "BSL", "departure": "2025-05-30 01:36:24.169593-04:00", "arrival": "2025-05-30 03:06:24.169593-04:00"}]

(Incorrect) AI response:
Your flight LX0112 from Paris Charles de Gaulle (CDG) to EuroAirport Basel-Mulhouse-Freiburg (BSL) is scheduled to arrive at 03:06 AM (local time) on May 30, 2025.

Cleanlab Score (0-1): 0.37

An incorrect time zone conversion results in an arrival time that’s off by several hours. Cleanlab flags this with a very low score, ensuring the bad answer never reaches the user.

These examples show how Cleanlab adds a vital trust layer between your AI system and your customers.

Using Fallbacks for Safer Output

When Cleanlab assigns a low trust score (below 0.9), you can route the conversation to a fallback strategy.

One common fallback is a generic but safe response, such as:
“Sorry, I cannot answer that based on the available information. Please try rephrasing your question or providing more details.”

Other fallback options include:

  • Escalating the case to a human agent (e.g., via LangGraph’s interrupt() human-in-the-loop capability)
  • Re-running the query with a revised prompt for the agent
  • Logging the incident for future fine-tuning or tool improvements
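
As a rough illustration, the routing logic can be a simple threshold check. The sketch below reuses the 0.9 threshold and the generic fallback message from above; the helper name is hypothetical.

python
TRUST_THRESHOLD = 0.9  # tune per application; sensitive domains may warrant stricter values

FALLBACK_MESSAGE = (
    "Sorry, I cannot answer that based on the available information. "
    "Please try rephrasing your question or providing more details."
)

def route_response(ai_response: str, trust_score: float) -> str:
    """Return the AI response if it is trustworthy enough, otherwise a safe fallback."""
    if trust_score < TRUST_THRESHOLD:
        # Alternatively: escalate to a human agent, re-run with a revised prompt,
        # or log the incident for later analysis.
        return FALLBACK_MESSAGE
    return ai_response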

Easy Integration with LangGraph and Other Frameworks

Adding Cleanlab to your LangGraph agent only takes a few lines of code. You wrap your existing assistant node with Cleanlab’s TrustworthyAssistant, which automatically intercepts each LLM response and calculates its trust score.

python
from cleanlab_tlm import TLM

# TrustworthyAssistant is the wrapper shown in the accompanying tutorial notebook.
tlm = TLM()
trustworthy_assistant = TrustworthyAssistant(
    assistant=existing_llm_node,  # your existing LangGraph assistant node
    tools=tools,
    tlm=tlm,
)

You then update your graph to use the trustworthy_assistant node in place of your existing LLM node. That’s it. Your LangGraph agent now scores every LLM response in real time and stores the result in the graph state.
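
As a rough sketch of that wiring (assuming the common LangGraph pattern of a prebuilt ToolNode plus a tools_condition router; your state schema and node names may differ), the updated graph might look like this:

python
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class State(TypedDict):
    # Minimal state schema; your agent's state may carry additional fields
    # (e.g., the trust score stored by the wrapper).
    messages: Annotated[list, add_messages]

builder = StateGraph(State)
builder.add_node("assistant", trustworthy_assistant)  # the wrapped node from above
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "assistant")
builder.add_conditional_edges("assistant", tools_condition)  # route to tools when the LLM calls one
builder.add_edge("tools", "assistant")
graph = builder.compile()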

See our tutorial notebook for full implementation details.

Prevent bad responses from reaching users

Modern AI agents need to do more than generate responses. They must be safe, accurate, and reliable in real-world scenarios. Cleanlab provides this safeguard by scoring every AI response in real time and flagging those that are likely to be incorrect. It works without manual review or rule-based filters. When integrated with LangGraph or any other agentic framework, Cleanlab helps ensure that flawed responses never reach your users.
