How it Works
Cleanlab Open-Source
Our Cleanlab Open-Source package is the most popular software framework for practicing Data-Centric AI today. Most of our open-source functionality comes from novel data quality algorithms invented by our team and published in research papers for transparency and scientific rigor. At a high level, Cleanlab Open-Source is used like this:
  • You provide an ML model trained in a reasonable manner on your dataset.
  • Cleanlab Open-Source runs data quality algorithms on the outputs from your model to automatically detect various common issues in your dataset (label errors, outliers, near duplicates, drift, etc.).
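To make the two steps above concrete, here is a minimal sketch in plain Python of how model outputs can point to likely label errors. It is a simplified illustration in the spirit of confident learning, not Cleanlab's actual implementation: the function names `class_thresholds` and `flag_label_issues` and the toy data are hypothetical, and the real package handles many more cases (see the quickstart guide for the supported API).

```python
from collections import defaultdict


def class_thresholds(labels, pred_probs):
    """Per-class threshold: the mean predicted probability of class k
    over the examples whose given label is k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    return {k: sums[k] / counts[k] for k in counts}


def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks wrong,
    ordered from least to most confidence in the given label.

    An example is flagged when the model is confident in some other
    class (probability at or above that class's threshold) and not
    confident in the given label (probability below its threshold).
    """
    thresholds = class_thresholds(labels, pred_probs)
    flagged = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        confident_others = [
            k for k, t in thresholds.items() if k != label and probs[k] >= t
        ]
        if confident_others and probs[label] < thresholds[label]:
            flagged.append((probs[label], i))
    return [i for _, i in sorted(flagged)]


# Toy data (hypothetical): six examples, two classes.
# pred_probs would normally be out-of-sample predictions from your model.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

print(flag_label_issues(labels, pred_probs))  # prints [2]
```

The sketch only needs the given labels and the model's predicted class probabilities, which is why any reasonably trained model can be plugged in.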
Read the quickstart guide
Community Slack
Ask questions, get support, and see how your peers, scientists, and engineers are practicing Data-Centric AI.
Join the conversation

features
Cleanlab Open-Source capabilities
Cleanlab helps you improve the quality and reliability of your data by automatically detecting and fixing issues in your ML dataset. To facilitate machine learning with messy, real-world data, Cleanlab Open-Source uses your existing models to estimate dataset problems that can be fixed to train even better models.
Cleanlab Open-Source includes:
  • API Access
  • Data and label issue detection
  • Support for image, text, and tabular data
  • Support for audio and PDF
  • Image segmentation and object detection
Compare plans
Cleanlab Open-Source Resources
GitHub
Documentation
Explore Examples
Blogs
Loved by Data Scientists and ML Engineers
Cleanlab is being used by individuals and enterprises across industries to turn unreliable data into reliable models, and find and fix errors for LLMs and the modern AI stack.
Sanjeev Suresh
Machine Learning Engineer at UberAI
“Recently took part in a new kind of ML competition based on Andrew Ng’s idea of shifting focus from model-centric to data-centric AI. Found cleanlab, a useful package in supporting this data-centric movement. It is based on the field of confident learning and helps to detect and learn in the presence of noisy real world labels.”
Travis Tang
Data Scientist at Gojek, in Towards AI
“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”
NorthyFN
Cleanlab Open-Source user in a discussion on GitHub
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Karl Schliep
Senior Data Scientist at Berkeley Research Group
“We’re looking into making Cleanlab a standard processing step whenever we get labels.”
Daniel Vila Suero
Co-founder of Recognai
“Improving your training data is more important than using the latest ‘state-of-the-art’ model. Here’s a very simple trick: use cleanlab, a Python package for machine learning with noisy labels and finding mislabeled data.”
Madhava Jay
Engineer at OpenMined, posted on Twitter
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Explore the blog
Read and contribute to our community blog featuring projects, tutorials, industry applications, company news, and more.
Automated Quality Assurance for Object Detection Datasets
Ulyana Tkachenko, Aditya Thyagarajan, and Jonas Mueller | September 26, 2023
Most AI & Analytics are impaired by data issues. Now AI can help you fix them.
Jonas Mueller, Curtis Northcutt, and Anish Athalye | July 31, 2023
Browse all blog posts

Let’s make Cleanlab better, together!

A huge shoutout to all our vocal users and supporters! Your questions, feedback, and suggestions make Cleanlab better every day. If you find Cleanlab useful in a particular project, please tell us and your coworkers. If you find anything unclear, just ask about it. Let’s continue to build and grow together.
Contact us