How Top Organizations Use Cleanlab Software
Technology
Google used Cleanlab to find and fix label errors in millions of speech samples across different languages, to quantify annotator accuracy, and to provide clean data for training speech models.
“Cleanlab is well-designed, scalable and theoretically grounded: it accurately finds data errors, even on well-known and established datasets. After using it for a successful pilot project at Google, Cleanlab is now one of my go-to libraries for dataset cleanup.”
Amazon AWS Principal Solutions Architect Cher Simon & Chief Evangelist Jeff Barr publish
that features Cleanlab in hands-on exercises.
Manually inspecting and fixing potential label errors can be time-consuming. We can train a better model using Cleanlab to filter noisy data.
Learn how Amazon also uses Cleanlab to improve Alexa.
Financial Services
One of the largest financial institutions in the world, Banco Bilbao Vizcaya Argentaria, uses Cleanlab to
“Cleanlab helped us reduce the uncertainty of noise in the tags. This process enabled us to train the model, update the training set, and optimize its performance. The goal was to reduce the number of labeled transactions and make the model more efficient, requiring less time and dedication. With the current model, we were able to improve accuracy by 28%, while reducing the number of labeled transactions required to train the model by more than 98%.”
Technology Consulting
Berkeley Research Group
using Cleanlab Studio.
“We've started relying on Cleanlab to improve our ML and AI models at Berkeley Research Group LLC for over a month... I have to say, I'm impressed. Here's what we found:
- Increased model accuracy by 15%
- Improved explainability & addressed performance impediments
- Cut out training iterations by one-third
- Overall performance improvement for our Data Science team.”
Business Intelligence
The Stakeholder Company cut the time spent on their ML data workflow by 8x, using Cleanlab to order data by label quality.
“We used Cleanlab to quickly validate one of our classifier models’ predictions for a dataset. This is typically a very time-consuming task since we would have to check thousands of examples by hand. However, since Cleanlab helped us identify the data points that were most likely to have label errors, we only had to inspect an eighth of our dataset to see that our model was problematic. We later realized that this was due to a post-processing error in the dataset — something that would otherwise have taken a much longer time to notice.”
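The idea of ordering data by label quality can be illustrated with a short sketch. It assumes you already have out-of-sample predicted probabilities from a trained classifier (for example, obtained via cross-validation) and scores each example by the probability the model assigns to its given label, a simple heuristic often called self-confidence. This is plain Python, not Cleanlab's API or its full method; the function names `self_confidence_scores`, `rank_by_label_quality`, and `review_queue` and the toy data are hypothetical.

```python
# Hypothetical sketch: rank examples by "self-confidence", i.e. the predicted
# probability of each example's given label. Low scores flag possible label
# errors. This is an illustration only, NOT Cleanlab's implementation.


def self_confidence_scores(labels, pred_probs):
    """Return, for each example, the predicted probability of its given label."""
    return [probs[label] for label, probs in zip(labels, pred_probs)]


def rank_by_label_quality(labels, pred_probs):
    """Return indices ordered from least to most likely to be correctly labeled."""
    scores = self_confidence_scores(labels, pred_probs)
    # sorted() is stable, so tied scores keep their original index order.
    return sorted(range(len(labels)), key=lambda i: scores[i])


def review_queue(labels, pred_probs, fraction=1 / 8):
    """Return the lowest-scoring fraction of examples (at least one) for manual review."""
    ranked = rank_by_label_quality(labels, pred_probs)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical 3-class toy data: given labels and out-of-sample predicted probabilities.
    labels = [0, 1, 2, 0, 1, 2, 0, 1]
    pred_probs = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.05, 0.15, 0.80],
        [0.05, 0.90, 0.05],  # labeled 0, but the model strongly prefers class 1
        [0.20, 0.70, 0.10],
        [0.60, 0.30, 0.10],  # labeled 2, but the model prefers class 0
        [0.70, 0.20, 0.10],
        [0.05, 0.90, 0.05],
    ]
    print(rank_by_label_quality(labels, pred_probs))
    print(review_queue(labels, pred_probs))  # inspect only an eighth of the data
```

With `fraction=1 / 8`, a reviewer inspects only the eighth of the dataset the model finds least consistent with its labels, which mirrors the workflow described in the quote above.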
Healthcare
Shands Research Hospital at UF
to build datasets for real-time AI monitoring of ICU patients.
We have developed a pervasive sensing and data processing system that collects data from multiple modalities (depth images, color RGB images, accelerometry, electromyography, sound pressure, and light levels) in the ICU to develop intelligent monitoring systems for continuous and granular assessment of acuity, delirium risk, pain, and mobility.
Our approach is based on the Cleanlab implementation of active learning for data annotation.
Our datasets include over 18 million depth image frames and 22 million patient face image frames extracted from videos. It is not practical to annotate the entirety of these massive datasets. Active learning is an important machine learning technique that involves an iterative process to choose the most informative data samples to be labeled.
Another important aspect [of active learning] is the annotator quality, which can significantly impact the training effectiveness of the machine learning model.
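The selection step of active learning described above can be sketched with a generic heuristic called least-confidence (uncertainty) sampling: from the unlabeled pool, pick the examples whose top predicted class probability is lowest. This is a swapped-in textbook technique, not the Cleanlab active learning method the team used, which also accounts for annotator quality; the function name `select_for_annotation` and the toy predictions are hypothetical.

```python
# Generic least-confidence sampling, a standard active learning heuristic.
# NOT Cleanlab's active learning implementation (which also weighs annotator
# quality); it only illustrates "choose the most informative samples".

from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the model's top predicted class probability."""
    return 1.0 - max(probs)


def select_for_annotation(
    pred_probs_unlabeled: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the batch_size most uncertain examples, most uncertain first."""
    ranked = sorted(
        range(len(pred_probs_unlabeled)),
        key=lambda i: least_confidence(pred_probs_unlabeled[i]),
        reverse=True,
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical model predictions for 5 unlabeled frames over 3 classes.
    pool = [
        [0.98, 0.01, 0.01],
        [0.40, 0.35, 0.25],
        [0.70, 0.20, 0.10],
        [0.34, 0.33, 0.33],
        [0.90, 0.05, 0.05],
    ]
    # In a full loop: send these to annotators, add the new labels to the
    # training set, retrain the model, recompute predictions, and repeat.
    print(select_for_annotation(pool, batch_size=2))
```

Each iteration labels only a small batch of the pool, which is why this kind of loop suits datasets too large to annotate in full.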
Startups and Innovators
The CEO of OpenTeams and Founder of Anaconda
OpenTeams success using Cleanlab for data preparation.
“I am excited by the cleanlab 2.0 project. We use this successfully at OpenTeams. I haven’t seen something this interesting in the space of data-preparation and labeling since Snorkel.”
Gavagai
to improve text analytics for customer insights.
“At Gavagai, we rely on labeled data to train our models, both publicly available datasets and data we have annotated ourselves. We know that the quality of the data is paramount when it comes to creating machine learning models that can produce business value for our customers.
Cleanlab Studio is a very effective solution to calm my nerves when it comes to label noise! The tool allows me to upload a dataset and obtain a ranked list of all the potential label issues in the data in just a few clicks. The label issues can then be assessed and fixed right away in the GUI.
Cleanlab should be a go-to tool in every ML practitioner's toolbox!”
AI company
uses Cleanlab to improve the quality of heterogeneous datasets across their cutting-edge computer vision platform.
+ many more users who love to use Cleanlab ❤️