Why Users Love Cleanlab

Testimonials from Google, Tencent, and others:


Resources:

Learn how companies (and you!) use Cleanlab:

Google Senior Software Engineer uses Cleanlab to find data errors at scale:

“Cleanlab is well-designed, scalable and theoretically grounded: it accurately finds data errors, even on well-known and established datasets. After using it for a successful pilot project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”
- Patrick Violette, Senior Software Engineer, Google

Berkeley Research Group increases ML model accuracy by 15% and reduces time spent by 1/3 using Cleanlab Studio


Banco Bilbao Vizcaya Argentaria (BBVA)

BBVA, one of the largest financial institutions in the world, used Cleanlab in:

an update of one of the functionalities offered by the BBVA app: the categorization of financial transactions. These categories allow users to group their transactions to better control their income and expenses, and to understand the overall evolution of their finances. This service is available to all users in Spain, Mexico, Peru, Colombia, and Argentina.
We used AL [Active Learning] in combination with Cleanlab

This was necessary because, although we had defined and unified annotation criteria for transactions, some could be linked to several subcategories depending on the annotator’s interpretation. To reduce the impact of having different subcategories for similar transactions, we used Cleanlab for discrepancy detection.

With the current model, we were able to improve accuracy by 28%, while reducing the number of labeled transactions required to train the model by more than 98%

CleanLab assimilates input from annotators and corrects any discrepancies between similar samples.

CleanLab helped us reduce the uncertainty of noise in the tags. This process enabled us to train the model, update the training set, and optimize its performance. The goal was to reduce the number of labeled transactions and make the model more efficient, requiring less time and dedication. This allows data scientists to focus on tasks that generate greater value for customers and organizations.

The Stakeholder Company (Singapore) saves 8x time with Cleanlab!

“We used Cleanlab to quickly validate one of our classifier models’ predictions for a dataset. This is typically a very time-consuming task since we would have to check thousands of examples by hand. However, since Cleanlab helped us identify the data points that were most likely to have label errors, we only had to inspect an eighth of our dataset to see that our model was problematic. We later realized that this was due to a post-processing error in the dataset — something that would otherwise have taken a much longer time to notice.”
- Seah Bei Ying, Data Analyst, The Stakeholder Company

Anaconda, OpenTeams, NumPy, SciPy Founder Travis Oliphant uses Cleanlab

“We use this successfully at OpenTeams”

Data Scientist @ Gojek — Travis Tang

“I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The [ResNet] model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data).”

Travis Tang. Cleanlab: Correct your data labels automatically and quickly. Towards AI, 2022.

Head of Data Science at Gavagai.io uses Cleanlab Studio for automated quality assurance of their in-house NLP annotation pipeline


Head of Data Science at e2f finds a 5 percent model improvement using Cleanlab, calling it “breakthrough work”


Shands Hospital at the University of Florida uses Cleanlab for real-time AI monitoring of ICU patients

“we have developed pervasive sensing and data processing system which collects data from multiple modalities depth images, color RGB images, accelerometry, electromyography, sound pressure, and light levels in ICU for developing intelligent monitoring systems for continuous and granular acuity, delirium risk, pain, and mobility assessment”

“Our approach is based on the Cleanlab implementation of active learning for data annotation”

“Our datasets include over 18 million depth image frames and 22 million patient face image frames extracted from videos. It is not practical to annotate the entirety of these massive datasets. Active learning is an important machine learning technique that involves an iterative process to choose most informative data samples to be labeled”

“Another important aspect is the annotator quality, which can significantly impact the training effectiveness of the machine learning model.”

Head of AI Engineering @ harrison.ai — Suneeta Mall

“I demonstrate the use of cleanlab, a confident learning implementation, to easily find noise in the data.”

“Confident learning provides a solid foundation for analyzing a dataset of noisy or OOD samples, a technique that’s quite effective for multi-class approaches, with the evolving support for multi-label classification.”

Suneeta Mall. Are Label Errors Imperative? Is Confident Learning Useful? Towards Data Science, 2022.

Researcher in AI for manufacturing at AIMotion Institute (Bavaria)

“My Cleanlab Studio experience was very positive. Very surprised how fast and easy it was to get results. Most work was transforming metadata into CSV file. You have really great product here, formatting the data for upload is really the only work needed to analyze/improve any data.”

“You can take somebody who has no Computer Science background and they can have big impact where they previously could not play with and improve data directly.”

“Customer support experience was also great, all of my questions/issues were quickly resolved by Cleanlab engineering team.”

“I got significantly better results using Cleanlab Studio than the cleanlab open-source package, mainly because it’s so much easier to use.”
- Lukas Lodes

Berkeley Research Group Senior Data Scientist Karl Schliep on Cleanlab

“we’re looking into making Cleanlab a standard processing step whenever we get labels.”
- Karl Schliep, Senior Data Scientist, Berkeley Research Group

Andrew Ng on deeplearning.ai — recognizes Cleanlab founders


Delivery Hero Data Scientist Hagop Dippel - Cleanlab “Works like a charm”

“Works like a charm”

Rubrix software library depends on Cleanlab to find label errors in text data


Amazon AWS Principal Solutions Architect Cher Simon & Chief Evangelist Jeff Barr publish textbook that features Cleanlab in hands-on exercises

“Manually inspecting and fixing potential label errors can be time-consuming. We can train a better model using Cleanlab to filter noisy data.”

This textbook walks through a hands-on coding exercise with Cleanlab.

Cleanlab used to win 4th place (out of 1165 teams) in Kaggle competition: Google - Isolated Sign Language Recognition (with $100k prize)

CleanLab was used to remove approximately 5,000 scenes that were considered noise.

I did some experiments, including some that weren't included in the final submission. The following two points can be made from this evaluation:
- CleanLab is effective
- Large effect of ensembling

Sage AI builds MLOps Pipelines to automatically relabel data using Cleanlab and DVC

Cleaning Data Labels — A Problem for Today and Tomorrow
[…]
since we don’t have an a priori gold label set to evaluate our confidence scores, we’ll use Cleanlab’s processing of the predicted probabilities from cross validation segments to arrive at a reasonable approximation for a set of gold labels. Thus, we can reduce the downstream burden of human evaluation by finding and relabeling the worst performers automatically.
[…]
Our pipeline and Cleanlab’s algorithm detected between 1,354 and 1,993 label issues (depending on the classifier used) which were then relabeled, or moved to an unknown category for further inspection.
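
The idea described above, scoring each example with out-of-sample predicted probabilities and flagging the least trustworthy labels for relabeling, can be sketched roughly as follows. This is a simplified, pure-Python stand-in for confident learning, not Cleanlab's implementation and not Sage AI's pipeline. The function name `rank_label_issues` and the toy data are hypothetical, and the sketch assumes `pred_probs` were produced by cross-validation so that no example is scored by a model trained on it.

```python
def rank_label_issues(labels, pred_probs):
    """Return indices of likely label errors, most suspicious first.

    labels:     list of int class ids (the given, possibly noisy labels)
    pred_probs: list of per-class probability lists, assumed out-of-sample
                (for example, collected from cross-validation folds)
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the mean predicted probability of class k
    # over the examples whose given label is k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [s / c if c else 1.0 for s, c in zip(sums, counts)]

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            # Flag the example when the confident class disagrees
            # with the given label.
            if best != y:
                issues.append(i)

    # Rank by self-confidence: lowest probability of the given label first.
    issues.sort(key=lambda i: pred_probs[i][labels[i]])
    return issues


# Toy data (hypothetical): examples 2 and 5 look mislabeled.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.85, 0.15],
]
issues = rank_label_issues(labels, pred_probs)
print(issues)
```

In the open-source cleanlab package, the comparable entry point is `cleanlab.filter.find_label_issues(labels, pred_probs, return_indices_ranked_by="self_confidence")`, which handles the details this sketch skips. The flagged indices can then be relabeled or moved to an unknown category for inspection, as the passage describes.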

Scale AI ML Engineer Will Levine on Cleanlab

“theoretically justified” and “work in practice”

VAST-OSINT CTO uses Cleanlab to build toxic-language detection model: “results look amazing”


Recognai Co-founder: Cleanlab is a ‘simple trick’ for ML with noisy labels


Explosion AI ML Engineer Vincent advocates Cleanlab to fix datasets


Oracle PM Govind Nair shares model improvements with Cleanlab


OpenMined Engineer Madava Jay — “only a cleanlab away from SOTA”


Red Hat Principal Software Engineer, Manikandan Sivanesan, uses Cleanlab to improve disaster model accuracy from 79% to 85%


ML Engineer at LifeOmic on the usefulness of Cleanlab

Source: MLOps Slack community (comment by: Evan Peterson, Machine Learning Engineer @LifeOmic)

Ping An Insurance (China) uses Cleanlab to find 10% noise in their data, remove bad data, and train their e-commerce product classifier


GitHub user of Cleanlab open-source

“Pure black magic”

Renumics uses Cleanlab and CleanVision for Data-centric AI playbook (useful data curation workflows on unstructured data)


Brandon Rohrer, Principal Data Scientist at iRobot:


Machine Learning Engineer @ BinIt — George Pearse

Recommends Cleanlab as a good way “to get started” with dataset cleaning to build a Data Engine/Flywheel.
George Pearse. Data Engine Design. Medium, 2022.

Kaggle Competitor “Sky walker” - 28th place (out of 757 teams) in Kaggle competition on Single Cell Classification using Cleanlab:

Engineer at Appen Todd Cook uses Cleanlab to create ML-You-Can-Use Wikidata Occupations labeled dataset.

Petco NLP Researcher and Engineer Xiaoyao Xi writes blog post on Cleanlab technology

Tobias Sterbak (Consulting Data Scientist / Machine Learning Engineer) writes blog post on finding label errors in NLP with Cleanlab

Calmcode.io tutorial on label errors features Cleanlab as the go-to solution

AMBL blog: Detect images with Noisy labels with Cleanlab

Checking each image and manually removing the noisy ones takes time and effort, but using cleanlab seems to make it easy. I also find it convenient that it can be used regardless of the framework.

Translated quotes from: https://colors.ambl.co.jp/cleanlab-image-noisy-label/

Canadian government begins using Cleanlab for census data


Ludwig-Maximilians-Universität München student uses Cleanlab for PhD

“Cleanlab is immensely helpful for my work. Thank you for that. Love the story behind the company and your work! Keep going with this great tool!”
- Dietrich Trautmann (while completing the final year of his PhD)

Tencent (Tencent Jarvis Lab) uses Confident Learning for MTCL


Berkeley Research Group on using Cleanlab Studio: “loved the ease and expedient delivery of results”

Computer Vision Engineer at Imagem uses CleanVision to improve the quality of a radar imagery dataset

Translated quotes:

“CleanVision helped me improve the quality of my image data and, as a result, the accuracy of my model.”

“This tool has proven invaluable to me. It is helping me improve the data quality of computer vision projects, allowing us to effectively address a variety of common issues in our imagery dataset.”

Twitter user wannabemonk's thoughts on the CleanVision library

I collected custom image data from the internet for one of my pet projects. When i went through the data, i saw a lot of duplicate images. Initially i was deleting them all manually (not fun at all). This library was a game changer. Just one function and everything is done.

Participant of Data-Centric AI course shares impact of course


ByteDance Computer Vision Engineer uses Cleanlab to deliver high quality video tagging models

“At TikTok, I deploy models for video tagging at an enormous scale. My expertise lies in Large-scale ML Ops operations. I've witnessed the transformative impact of enhancing data quality, often overshadowed by flashier methods. At TikTok, I actively utilize Cleanlab to swiftly identify incorrect annotations, consistently delivering high-quality models on schedule.”
- Computer Vision Engineer, TikTok Video Understanding Team

❤️ Please let us know how you are using Cleanlab via email: team@cleanlab.ai ❤️