How We Started
Open-source since the beginning
The three founders of Cleanlab (Curtis Northcutt, Anish Athalye, and Jonas Mueller) honed their expertise while pursuing PhDs at MIT, renowned for its robust open-source community. When they established Cleanlab, they committed to maintaining their contributions to the open-source ecosystem even as the company grows. One way they honor this commitment is by teaching. Additionally, they continuously invest in fostering an open-source culture within Cleanlab, exemplified by open-sourcing aspects of the company's own work.
“Part of the reason we've chosen to continue to invest in our open-source users at Cleanlab is because they believed in us before anyone else did.”
Star us on GitHub
How it Works
Cleanlab Open-Source
Our Cleanlab Open-Source package is the most popular software framework for practicing Data-Centric AI today. Most of our open-source functionality comes from novel methods developed by our team and published for transparency and scientific rigor.
What Cleanlab Open-Source offers:
- Limited Python API Access: Use the Cleanlab open-source library with your existing pipelines and models to detect issues in your data.
- Support for Popular Data Types: Works with image, text, tabular, audio, and PDF data formats.
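To give a feel for what "detecting issues in your data" can mean for labels, here is a minimal, self-contained sketch in the spirit of confident learning. It is an illustration only, not the Cleanlab API: the function name `flag_label_issues_sketch`, the toy data, and the simple thresholding rule are all assumptions made for this example, and the library's actual algorithms are more sophisticated.

```python
from statistics import mean


def flag_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    Hypothetical helper for illustration (not part of the cleanlab package).
    labels: list of integer class labels, one per example.
    pred_probs: list of per-class predicted probabilities, one row per example.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the average predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # If no example carries label k, use 1.0 so the class is rarely "confident".
        thresholds.append(mean(probs_k) if probs_k else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            # Flag the example when the confident class disagrees with its label.
            if best != y:
                issues.append(i)
    return issues


# Toy data (assumed for this sketch): example 2 is labeled 0,
# but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = flag_label_issues_sketch(labels, pred_probs)
print(issues)  # prints [2]
```

In practice the predicted probabilities would come from a model you have already trained, ideally evaluated out-of-sample, and flagged examples would be reviewed rather than discarded automatically.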
Upgrade to Cleanlab Studio for advanced automated data quality improvements and analysis, seamless integration into your enterprise-level data storage and cloud platforms, instant AutoML/deployment scaled to massive datasets, full Python API access and more.
Community Slack
Ask questions, get support, and see how your peers, fellow scientists, and engineers are practicing Data-Centric AI.
Join the conversation
Loved by Data Scientists and ML Engineers
Cleanlab is being used by individuals and enterprises across industries to turn unreliable data into reliable models, and find and fix errors for LLMs and the modern AI stack.
Sanjeev Suresh
ML Engineer at UberAI
“Recently took part in a new kind of ML competition based on Andrew Ng’s idea of shifting focus from model-centric to data-centric AI. Found cleanlab, a useful package in supporting this data-centric movement. It is based on the field of confident learning and helps to detect and learn in the presence of noisy real world labels.”
Travis Tang
Data Scientist at Gojek, in Towards AI
“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”
NorthyFN
Cleanlab Open-Source user in a discussion on GitHub
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Karl Schliep
Senior Data Scientist at Berkeley Research Group
“We’re looking into making Cleanlab a standard processing step whenever we get labels.”
Daniel Vila Suero
Co-founder of Recognai
“Improving your training data is more important than using the latest ‘state-of-the-art’ model. Here’s a very simple trick: use cleanlab, a Python package for machine learning with noisy labels and finding mislabeled data.”
Madhava Jay
Engineer at OpenMined, posted on Twitter
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Discover
Deep dive into resources to learn more.
Documentation
Get started, learn about capabilities, and follow tutorials to improve your own data and models.
Introduction to Data-Centric AI Course
The first-ever course on data-centric AI. Learn how you can train better ML models by improving the data.
Solutions
See how to use cleanlab with specific real-world models and datasets.
Cleanlab Slack Community
Join the conversation on Slack.
Tutorials and Videos
Learn how to use Cleanlab with tutorial and demo videos on YouTube.
Community Guidelines
Explore the rules and guidelines for participating in Cleanlab community spaces.
Explore the blog
Read and contribute to our blog, featuring projects, tutorials, industry applications, company news, and more.
CROWDLAB: The Right Way to Combine Humans and AI for LLM Evaluation
Nelson Auner | August 6, 2024
Improving any OpenAI Language Model by Systematically Improving its Data
Chris Mauck and Jonas Mueller | June 1, 2023
Datalab: A Linter for ML Datasets
Elías Snorrason, Sanjana Garg, Hui Wen Goh, Jesse Cummings, and Jonas Mueller | May 16, 2023
Browse all blog posts
Let’s make Cleanlab better, together!
A huge shoutout to all our vocal users and supporters! Your questions, feedback, and suggestions make Cleanlab better every day. If you find Cleanlab useful in a particular project, please tell us and your coworkers. If you find anything unclear, just ask about it. Let’s continue to build and grow together.
Contact us