How We Started
Open-source since the beginning
The three founders of Cleanlab—Curtis Northcutt, Anish Athalye, and Jonas Mueller—honed their expertise while pursuing PhDs at MIT, renowned for its robust open-source community. When they established Cleanlab, they committed to maintaining their contributions to the open-source ecosystem even as the company grows. One way they honor this commitment is by teaching MIT's course on Data-centric AI and open-sourcing the course content. Additionally, they continuously invest in fostering an open-source culture within Cleanlab, exemplified by open-sourcing aspects of the company's own culture.
“Part of the reason we've chosen to continue to invest in our open-source users at Cleanlab is because they believed in us before anyone else did.”
Curtis Northcutt, CEO of Cleanlab, in a LinkedIn post.
Star us on GitHub
How it Works
Cleanlab Open-Source
Our Cleanlab Open-Source package is the most popular software framework for practicing Data-Centric AI today. Most of our open-source functionality comes from novel data quality algorithms developed by our team and published in research papers for transparency and scientific rigor.
What Cleanlab Open-Source offers:
  • Limited Python API Access: Use the Cleanlab open-source library with your existing pipelines and models to detect issues in your data.
  • Support for Popular Data Types: Works with image, text, tabular, audio, and PDF data formats.
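To give a feel for what "detecting issues in your data" can mean in practice, here is a minimal, standard-library-only sketch of a confident-learning-style check for suspicious labels. It is a simplified illustration, not the cleanlab API or its exact algorithm: the function names `class_thresholds` and `find_suspect_labels` and the toy data are hypothetical, and the predicted probabilities are assumed to come from a model you have already trained (ideally out-of-sample).

```python
from collections import defaultdict


def class_thresholds(labels, pred_probs):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    return {k: sums[k] / counts[k] for k in counts}


def find_suspect_labels(labels, pred_probs):
    """Return indices of examples whose given label disagrees with the
    class the model is confident about (simplified heuristic)."""
    thresholds = class_thresholds(labels, pred_probs)
    suspects = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k, t in thresholds.items() if probs[k] >= t]
        if not confident:
            continue  # the model is not confident about any class here
        best = max(confident, key=lambda k: probs[k])
        if best != label:
            suspects.append(i)
    return suspects


# Toy data (hypothetical): six examples, two classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly predicts class 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

suspects = find_suspect_labels(labels, pred_probs)
print(suspects)  # prints [2]
```

Using per-class thresholds rather than a single global cutoff means a class the model is generally less sure about is not flagged more often simply because its probabilities run lower.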
Upgrade to Cleanlab Studio for advanced automated data quality improvements and analysis, seamless integration into your enterprise-level data storage and cloud platforms, instant AutoML/deployment scaled to massive datasets, full Python API access and more. Request a demo from our engineers or sign up for a free trial.
Community Slack
Ask questions, get support, and see how your peers, fellow scientists, and engineers are practicing Data-Centric AI.
Join the conversation
Cleanlab Open-Source Resources
GitHub
Documentation
Explore Examples
Blogs
Loved by Data Scientists and ML Engineers
Cleanlab is being used by individuals and enterprises across industries to turn unreliable data into reliable models, and find and fix errors for LLMs and the modern AI stack.
Sanjeev Suresh
ML Learning Engineer at UberAI
“Recently took part in a new kind of ML competition based on Andrew Ng’s idea of shifting focus from model-centric to data-centric AI. Found cleanlab, a useful package in supporting this data-centric movement. It is based on the field of confident learning and helps to detect and learn in the presence of noisy real world labels.”
Travis Tang
Data Scientist at Gojek, in Towards AI
“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”
NorthyFN
Cleanlab Open-Source user in a discussion on GitHub
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Karl Schliep
Senior Data Scientist at Berkeley Research Group
“We’re looking into making Cleanlab a standard processing step whenever we get labels.”
Daniel Vila Suero
Co-founder of Recognai
“Improving your training data is more important than using the latest ‘state-of-the-art’ model. Here’s a very simple trick: use cleanlab, a Python package for machine learning with noisy labels and finding mislabeled data.”
Madhava Jay
Engineer at OpenMined, posted on Twitter
“I’m just starting to get the hang of this and read on how it works. But right now from the first results it looks like pure black magic... So thank you for this!!”
Explore the blog
Read and contribute to our community blog featuring projects, tutorials, industry applications, company news, and more.
Most AI & Analytics are impaired by data issues. Now AI can help you fix them.
Jonas Mueller, Curtis Northcutt, and Anish Athalye | July 31, 2023
Datalab: A Linter for ML Datasets
Elías Snorrason, Sanjana Garg, Hui Wen Goh, Jesse Cummings, and Jonas Mueller | May 16, 2023
Browse all blog posts

Let’s make Cleanlab better, together!

A huge shoutout to all our vocal users and supporters! Your questions, feedback, and suggestions make Cleanlab better every day. If you find Cleanlab useful in a particular project, please tell us and your coworkers. If you find anything unclear, just ask about it. Let’s continue to build and grow together.
Contact us