How it Works
Cleanlab Open-Source
Our Cleanlab Open-Source package is the most popular software framework for practicing Data-Centric AI today. Most of our open-source functionality comes from novel algorithms invented by our team and published in research papers for transparency and scientific rigor. At a high level, Cleanlab Open-Source is used like this:
- You provide an ML model trained in a reasonable manner on your dataset.
- Cleanlab Open-Source runs data quality algorithms on the outputs from your model to automatically detect various common issues in your dataset (label errors, outliers, near duplicates, drift, etc.).
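The two steps above can be illustrated with a small, self-contained sketch. This is not Cleanlab's actual implementation or API: it is a simplified illustration in the spirit of confident learning, using only the Python standard library. The function name `find_likely_label_issues` and the toy data are hypothetical, and the per-class threshold rule is an assumption chosen to keep the example short.

```python
from collections import defaultdict


def find_likely_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    Hypothetical helper for illustration only (not part of the cleanlab package).

    labels: list of integer class ids, as given in your dataset.
    pred_probs: list of per-class probability lists produced by your trained
        model (ideally out-of-sample, e.g. from cross-validation).
    """
    # Per-class threshold: the model's average confidence in class k,
    # computed over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    issues = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above that class's threshold.
        confident = [k for k in thresholds if probs[k] >= thresholds[k]]
        # Flag the example when the model is confident in some class,
        # but the given label is not one of those classes.
        if confident and label not in confident:
            issues.append(i)

    # Rank flagged examples so the least-supported given labels come first.
    return sorted(issues, key=lambda i: pred_probs[i][labels[i]])


# Toy data (made up for this sketch): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

flagged = find_likely_label_issues(labels, pred_probs)
print(flagged)
```

In this toy run only the third example (index 2) is flagged. In practice you would review flagged examples before relabeling or removing them, and you would use the library's own documented functions rather than this sketch.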
Community Slack
Ask questions, get support, and see how your peers, scientists, and engineers are practicing Data-Centric AI.
Join the conversation
Cleanlab Open-Source capabilities
Cleanlab helps you improve the quality and reliability of your data by automatically detecting and fixing issues in your ML dataset. To facilitate machine learning with messy, real-world data, Cleanlab Open-Source uses your existing models to estimate dataset problems that can be fixed to train even better models.
Cleanlab Open-Source includes:
- API Access
- Data and label issue detection
- Support for image, text, and tabular data
- Support for audio and PDF
- Image segmentation and object detection
Loved by Data Scientists and ML Engineers
Cleanlab is being used by individuals and enterprises across industries to turn unreliable data into reliable models, and find and fix errors for LLMs and the modern AI stack.
Sanjeev Suresh
ML Learning Engineer at UberAI
“Recently took part in a new kind of ML competition based on Andrew Ng’s idea of shifting focus from model-centric to data-centric AI. Found cleanlab, a useful package in supporting this data-centric movement. It is based on the field of confident learning and helps to detect and learn in the presence of noisy real world labels.”
Travis Tang
Data Scientist at Gojek, in Towards AI
“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”
NorthyFN
Cleanlab Open-Source user in a discussion on GitHub
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Karl Schliep
Senior Data Scientist at Berkeley Research Group
“We’re looking into making Cleanlab a standard processing step whenever we get labels.”
Daniel Vila Suero
Co-founder of Recognai
“Improving your training data is more important than using the latest ‘state-of-the-art’ model. Here’s a very simple trick: use cleanlab, a Python package for machine learning with noisy labels and finding mislabeled data.”
Madhava Jay
Engineer at OpenMined, posted on Twitter
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Discover
Deep dive into resources to learn more.
Documentation
Get started, learn about capabilities, and follow tutorials to improve your own Data and Models.
Introduction to Data-Centric AI Course
The first-ever course on data-centric AI. Learn how you can train better ML models by improving the data.
Solutions
See how to use cleanlab with specific real-world models and datasets.
Cleanlab Slack Community
Join the conversation on Slack.
Tutorials and Videos
Learn how to use Cleanlab with tutorial and demo videos on YouTube.
Community Guidelines
Explore the rules and guidelines for participating in Cleanlab community spaces.
Explore the blog
Read and contribute to our blog, featuring projects, tutorials, industry applications, company news, and more.
Automated Quality Assurance for Object Detection Datasets
Ulyana Tkachenko, Aditya Thyagarajan and Jonas Mueller | September 26, 2023
Most AI & Analytics are impaired by data issues. Now AI can help you fix them.
Jonas Mueller, Curtis Northcutt and Anish Athalye | July 31, 2023
Improving any OpenAI Language Model by Systematically Improving its Data
Chris Mauck and Jonas Mueller | June 1, 2023
Browse all blog posts
Let’s make Cleanlab better, together!
A huge shoutout to all our vocal users and supporters! Your questions, feedback, and suggestions make Cleanlab better every day. If you find Cleanlab useful in a particular project, please tell us and your coworkers. If you find anything unclear, just ask about it. Let’s continue to build and grow together.
Contact us