The Cleanlab Studio Audit uses AI to auto-detect problems in given data. Here we report issues found in the Food-101N image classification dataset, which can impair downstream modeling efforts and business analytics.

Whisking Away Errors: How Cleanlab Studio Served Up Fixes for the Food-101N Computer Vision Dataset

The Cleanlab Studio Audit uses AI to auto-detect problems in given data. Here we report issues found in the famous Fashion MNIST image classification dataset, which can impair product identification and other business intelligence efforts.

The Fashion MNIST Dataset (cited in 2,200+ papers) contains Hundreds of Miscategorized Items

The Cleanlab Studio Audit uses AI to auto-detect problems in given data. Here we report issues found in the Stanford Cars196 image classification dataset, which can impair product categorization, product identification, and other business intelligence efforts.

The Stanford Cars Dataset aka Cars196 (cited in 1000+ papers) contains many Fine-Grained Errors

The Cleanlab Studio Audit uses AI to auto-detect problems in given data -- here we report findings for the Office-Home image classification dataset.

The Office-Home Dataset (cited by 600+ papers) contains hundreds of incorrect labels and outliers.

The Cleanlab Studio Audit uses AI to auto-detect problems in given data -- here we report findings for a popular Reinforcement Learning from Human Feedback dataset.

Use Cleanlab to Improve LLMs: Find Errors in Human Feedback in the Anthropic RLHF Dataset

How enterprises can use LLMs to reliably catch compliance violations like GDPR from log files.

Safeguard Customer Data via Log Compliance Monitoring with the Trustworthy Language Model

A personal perspective on the importance of clean data as Cleanlab announces $30M in funding to bring automated data curation to enterprise AI.

Letter from the CEO: Announcing our Series A and Cleanlab's Trustworthy Language Model

With cleanlab v2.6, the most popular library for Data-Centric AI now offers more comprehensive data audits including new checks for underperforming groups, null values, imbalanced classes, and more.

An open-source platform to catch all sorts of issues in all sorts of datasets

Understanding cleanlab's new methods for multi-annotator data and what makes them effective.

CROWDLAB: Simple and effective algorithms to handle data labeled by multiple annotators

In this tutorial, learn how to use Cleanlab Studio to automatically correct multi-label classification data for image and document tagging, content curation, NLP, and more!

Automatically Find and Fix Issues in Image/Document Tags and other Multi-Label Datasets

Understanding cleanlab's new methods for text-based token classification tasks.

Detecting Label Errors in Entity Recognition Data

Introducing an automated solution to ensure high-quality image data, for both content moderation and boosting engagement. Easily curate any product/content catalog or photo gallery to delight your customers.

How to Filter Unsafe and Low-Quality Images from any Dataset: A Product Catalog Case Study

Benchmarking LLM trustworthiness scoring mechanisms to improve LLM abstention and response-generation.

Automatically Reduce Incorrect LLM Responses across OpenAI's SimpleQA Benchmark via Trustworthiness Scoring

Introducing new methods for estimating labeling quality in image segmentation datasets.

Detecting Annotation Errors in Semantic Segmentation Data

Cleanlab Studio for Enterprise launches to automate data curation for LLMs and the modern AI stack with $5 million in seed funding from Bain Capital Ventures.

Letter from the CEO: Announcing Our Seed Funding and the Launch of Cleanlab Studio for Enterprise

Exploring new ways to identify outliers based on probabilistic predictions from a trained classifier.

A Simple Adjustment Improves Out-of-Distribution Detection for Any Classifier

You may choose suboptimal prompts for your LLM (or make other suboptimal choices via model evaluation) unless you clean your test data.

Beware of Unreliable Data in Model Evaluation: A LLM Prompt Selection case study with Flan-T5

Demonstrating how the Trustworthy Language Model system can produce better responses from a wide variety of LLMs

Automatically boost the accuracy of any LLM, without changing your prompts or the model

How we built an in-browser visualization of Cleanlab's Confident Learning algorithm.

How we built Cleanlab Vizzy

Introducing an open-source Python package to automatically identify common issues in image datasets.

CleanVision: Audit your Image Data for better Computer Vision

Benchmarking hallucination detection via the Trustworthy Language Model, now using the newest models from OpenAI and Anthropic.

Automatically detecting LLM hallucinations with models like GPT-4o and Claude

Catch issues in your data/labels. This unified audit uses your ML model to automatically detect various problems in real-world datasets that can be fixed to produce a better model.

Datalab: A Linter for ML Datasets

Introducing new data quality algorithms for multi-label classification in cleanlab v2.2

Automatic Error Detection for Image/Text Tagging and Multi-label Datasets

Generate AI, not headaches. Automate annotation with AI.

Reduce Your Data Annotation Costs by 80% with Cleanlab Studio

Announcing Auto-Labeling Agent: Your Assistant for Rapid and High Quality Labeling

Introducing cleanlab's two new methods to detect outliers and how they perform on real image data.

Out-of-Distribution Detection via Embeddings or Predictions

A comprehensive benchmark of evaluation models to automatically catch incorrect responses across five RAG applications.

Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?

Use ActiveLab to efficiently choose which data to (re)label to train the best Transformer model.

Effectively Annotate Text Data for Transformers via Active Learning + Re-labeling

An open-source module to detect spurious correlations between dataset labels and features that will not generalize to real-world deployment.

Automatically catching spurious correlations in ML datasets

Use AI to measure the quality of LLM-generated data, automatically detecting unrealistic synthetic examples and underrepresented tails of the real data distribution.

Assessing the Quality of Synthetic Data with Cleanlab Studio

A case study on a reliable Customer Support Agent built with LangGraph and automated trustworthiness scoring

Prevent Hallucinated Responses from any AI Agent

Learn how to easily make any TensorFlow/Keras model compatible with scikit-learn.

Training Transformer Networks in Scikit-Learn?!

Highlighting new features available in cleanlab 2.1

cleanlab 2.1 adds Multi-Annotator Analysis and Outlier Detection: toward a broad framework for Data-Centric AI

Systematically evaluate synthetic datasets via quantitative scores. Use these scores to guide prompt engineering and other synthetic data generator optimizations.

How to Generate Better Synthetic Image Datasets with Stable Diffusion

A legal sector case study using Cleanlab Studio to produce better models for making predictions (e.g., of final judgements) based on court case documents.

Improving Legal Judgement Prediction with Data-Centric AI

Learn data-centric techniques for better few-shot prompting when applying LLMs to noisy real-world data.

Ensuring Reliable Few-Shot Prompt Selection for LLMs

Introducing an entirely automated solution to: train cutting-edge ML models on raw data, use these models to detect various issues in the data, correct these issues, train better models on the improved data, and deploy them to serve reliable predictions in applications.

How To Train and Deploy Reliable Models on Messy Real-World Data With a Few Clicks

See results from using the Trustworthy Language Model to detect hallucinations/errors from the o1 model and improve its response accuracy.

OpenAI's o1 surpassed using the Trustworthy Language Model

What's the next-generation platform for Data Science? A data-centric AI system that can automatically: find and fix data issues, label data, and train/deploy reliable models.

Comparing tools for Data Science, Data Quality, Data Annotation, and AI/ML

Data is the fuel for AI (and Analytics), but is messy in real enterprise applications. Here’s how to use AI to also refine it, allowing your company to build a Data Engine as powerful as those at the heart of today’s biggest tech companies.

Most AI & Analytics are impaired by data issues. Now AI can help you fix them.

Use AI to measure the quality of satellite imagery data, automatically detecting mislabeled examples, outliers, ambiguous examples, and (near) duplicate examples.

Automated Correction of Satellite Imagery Data

Introducing cleanlab v2.5, the long-awaited release that adds support for practicing Data-Centric AI in ML tasks requested by the most users.

cleanlab now supports all major ML tasks — including Regression, Object Detection, and Image Segmentation

Learn how to automatically find label issues in any image classification dataset.


Finding Label Issues in Image Classification Datasets

TLM Lite allows you to generate high-quality responses using advanced LLMs while employing smaller models for fast and cost-effective trustworthiness scoring.

TLM Lite: High-Quality LLM Responses with Efficient Trust Scores

Generate AI, not headaches. Automate heterogeneous data source curation with Cleanlab document support.

Curate large scale document collections - Cleanlab

Don’t Let Your Messy Documents Run You RAG-Ged. Announcing Document Curation in Cleanlab Studio

Announcing cleanlab 2.0: an open-source framework for machine learning and analytics with messy, real-world data.

cleanlab 2.0: Automatically Find Errors in ML Datasets

A simple method to determine if a dataset violates the IID assumption in common ways (e.g. temporal drift, or interaction between almost adjacent datapoints).

Detecting Dataset Drift and Non-IID Sampling: A k-Nearest Neighbors approach that works for Image/Text/Audio/Numeric Data

Learn how to reduce prediction errors by 70% using data-centric techniques with cleanlab.

Handling Mislabeled Tabular Data to Improve Your XGBoost Model

Ensure reliable answers in Retrieval-Augmented Generation, while also ensuring that latency and compute costs do not exceed the processing needed to accurately respond to complex queries.

Reliable Agentic RAG with LLM Trustworthiness Estimates

New algorithms to identify values in a numerical data column that are likely incorrect (e.g., due to noise from erroneous sensors, data entry/processing mistakes, imperfect human estimates).

Detecting Errors in Numerical Data via any Regression Model

CROWDLAB improves your team's LLM Evals process by automatically producing reliable ratings and flagging which outputs need further review.

CROWDLAB: The Right Way to Combine Humans and AI for LLM Evaluation

Using AI to analyze product listings for errors, and how this boosts the accuracy of product categorization and analytics efforts.

Enhancing Product Analytics and E-commerce with Data-Centric AI

Introducing AI text audits for automated content moderation and curation, including the detection of: toxic, non-English, and informal language, as well as personally identifiable information.

Automatically Detect Problematic Content in any Text Dataset

cleanlab 2.3 adds support for Active Learning, Tensorflow/Keras models made sklearn-compatible, and highly scalable Label Error Detection

Introducing new data quality algorithms to systematically detect errors in object detection datasets.

Automated Quality Assurance for Object Detection Datasets

How automated quality assurance can help data annotation teams ensure accurate data with less work.

Ensure high-quality data quickly via AI validation of which data is Well Labeled

ActiveLab helps you optimally choose which data to (re)label, lowering the cost to train an accurate ML model.

ActiveLab: Active Learning with Data Re-Labeling

How an MIT grad student project became a company with tech used by Google, Amazon, Tesla, Uber, Facebook, and companies around the world.

Cleanlab: The History, Present, and Future

Learn how to find label issues in text datasets and improve NLP models.


Handling Label Errors in Text Classification Datasets

Overview of automated tools for catching: low-quality responses, incomplete/vague prompts, and other problematic text (toxic language, PII, informal writing, bad grammar/spelling) lurking in an instruction-response dataset. Here we reveal findings for the Dolly dataset.

How to detect bad data in your instruction tuning dataset (for better LLM fine-tuning)

How law firms have adopted data-centric AI software to catch miscategorized legal documents and instantly obtain accurate relevance determinations for every case.

Reduce Legal Discovery Work by 10x with AI that Curates Documents and Fixes Errors

Evaluating state-of-the-art tools to automatically catch incorrect responses from a RAG system.

Benchmarking Hallucination Detection Methods in RAG

Accelerate Time Series Modeling with Cleanlab Studio AutoML. Predictable results in a few clicks.

Robust and Accurate AutoML for Time Series in Quick Production Deployment | Cleanlab Studio

Accelerate Time Series Modeling with Cleanlab Studio AutoML: Train and Deploy in Minutes

Reduce LLM prediction error by 37% via data-centric AI.

Improving any OpenAI Language Model by Systematically Improving its Data

Learn how to find label issues in any audio classification dataset.


Finding Label Issues in Audio Classification Datasets

Announcing Cleanlab's Trustworthy Language Model. TLM overcomes hallucinations, the biggest barrier to productionizing GenAI, by adding a trust score to every LLM output.

Overcoming Hallucinations with Cleanlab

A fully-automated analysis of errors in the ImageNet training set.

