How it Works
Cleanlab Open-Source
Our Cleanlab Open-Source package is the most popular software framework for practicing Data-Centric AI today. Most of our open-source functionality comes from novel data quality algorithms developed by our team and published in research papers for transparency and scientific rigor.
What Cleanlab Open-Source offers:
  • Limited Python API Access: Use the Cleanlab open-source library with your existing pipelines and models to detect issues in your data.
  • Support for Popular Data Types: Works with image, text, tabular, audio, and PDF data formats.
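As a rough illustration of what "detecting issues in your data" can mean for labels, here is a minimal, dependency-free sketch of a confident-learning style check. It is a simplified heuristic written for this page, not Cleanlab's actual implementation or API: the function name `find_label_issues_sketch`, its arguments, and the toy data are all hypothetical. It assumes you already have out-of-sample predicted class probabilities from any model of your choice.

```python
def find_label_issues_sketch(labels, pred_probs):
    """Flag examples whose given label looks wrong (hypothetical helper).

    labels:     list of integer class labels, one per example
    pred_probs: list of per-class probability lists from any trained model
    Returns indices of suspected label errors, least trusted first.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the model's average confidence in class k
    # over the examples that were labeled k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [
        sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)
    ]

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            # A confident prediction that disagrees with the given label
            # marks the example as a suspected label error.
            if best != y:
                issues.append(i)

    # Rank by the probability assigned to the given label (lowest first).
    issues.sort(key=lambda i: pred_probs[i][labels[i]])
    return issues


# Toy data (made up): example 2 is labeled 0, but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
suspects = find_label_issues_sketch(labels, pred_probs)
print(suspects)
```

Flagged examples would then be reviewed or re-labeled before retraining. For the library's real functions and parameters, see the Documentation link under Cleanlab Open-Source Resources.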
Upgrade to Cleanlab Studio for advanced automated data quality improvements and analysis, seamless integration into your enterprise-level data storage and cloud platforms, instant AutoML/deployment scaled to massive datasets, full Python API access and more. Request a demo from our engineers or sign up for a free trial.
Community Slack
Ask questions, get support, and see how your peers, fellow scientists, and engineers are practicing Data-Centric AI.
Join the conversation
Cleanlab Open-Source Resources
GitHub
Documentation
Explore Examples
Blogs
Loved by Data Scientists and ML Engineers
Cleanlab is being used by individuals and enterprises across industries to turn unreliable data into reliable models, and find and fix errors for LLMs and the modern AI stack.
Sanjeev Suresh
ML Engineer at UberAI
“Recently took part in a new kind of ML competition based on Andrew Ng’s idea of shifting focus from model-centric to data-centric AI. Found cleanlab, a useful package in supporting this data-centric movement. It is based on the field of confident learning and helps to detect and learn in the presence of noisy real world labels.”
Travis Tang
Data Scientist at Gojek, in Towards AI
“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”
NorthyFN
Cleanlab Open-Source user in a discussion on GitHub
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Karl Schliep
Senior Data Scientist at Berkeley Research Group
“We’re looking into making Cleanlab a standard processing step whenever we get labels.”
Daniel Vila Suero
Co-founder of Recognai
“Improving your training data is more important than using the latest ‘state-of-the-art’ model. Here’s a very simple trick: use cleanlab, a Python package for machine learning with noisy labels and finding mislabeled data.”
Madhava Jay
Engineer at OpenMined, posted on Twitter
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Explore the blog
Read and contribute to our community blog featuring projects, tutorials, industry applications, company news, and more.
Automated Quality Assurance for Object Detection Datasets
Ulyana Tkachenko, Aditya Thyagarajan and Jonas Mueller | September 26, 2023
Most AI & Analytics are impaired by data issues. Now AI can help you fix them.
Jonas Mueller, Curtis Northcutt and Anish Athalye | July 31, 2023
Browse all blog posts

Let’s make Cleanlab better, together!

A huge shoutout to all our vocal users and supporters! Your questions, feedback, and suggestions make Cleanlab better every day. If you find Cleanlab useful in a particular project, please tell us and your coworkers. If you find anything unclear, just ask about it. Let’s continue to build and grow together.
Contact us