Cleanlab

Blog

Company updates, tutorials, research, and more!

CleanVision: Audit your Image Data for better Computer Vision

03/22/2023

Introducing an open-source Python package to automatically identify common issues in image datasets.

  • Sanjana Garg
  • Ulyana Tkachenko
  • Yiming Chen
  • Elías Snorrason
  • Jonas Mueller
ActiveLab: Active Learning with Data Re-Labeling

03/02/2023

ActiveLab helps you optimally choose which data to (re)label, lowering the cost to train an accurate ML model.

  • Hui Wen Goh
  • Jonas Mueller
Handling Mislabeled Tabular Data to Improve Your XGBoost Model

02/06/2023

Learn how to reduce prediction errors by 70% using data-centric techniques with cleanlab.

  • Chris Mauck
CROWDLAB: Simple and effective algorithms to handle data labeled by multiple annotators

10/05/2022

Understanding cleanlab's new methods for multi-annotator data and what makes them effective.

  • Hui Wen Goh
  • Ulyana Tkachenko
  • Jonas Mueller
Cleanlab: The History, Present, and Future

04/01/2022

How an MIT grad student project became a company with tech used by Google, Amazon, Tesla, Uber, Facebook, and companies around the world.

  • Curtis Northcutt
Training Transformer Networks in Scikit-Learn?!

03/08/2023

Learn how to easily make any TensorFlow/Keras model compatible with scikit-learn.

  • Hui Wen Goh
cleanlab 2.3 adds support for Active Learning, TensorFlow/Keras models made sklearn-compatible, and highly scalable Label Error Detection
Automatic Error Detection for Image/Text Tagging and Multi-label Datasets

11/29/2022

Introducing new data quality algorithms for multi-label classification in cleanlab v2.2

  • Aditya Thyagarajan
  • Elías Snorrason
  • Curtis Northcutt
  • Jonas Mueller
Out-of-Distribution Detection via Embeddings or Predictions

10/19/2022

Introducing cleanlab's two new methods to detect outliers and how they perform on real image data.

  • Ulyana Tkachenko
  • Jonas Mueller
A Simple Adjustment Improves Out-of-Distribution Detection for Any Classifier

10/19/2022

Exploring new ways to identify outliers based on probabilistic predictions from a trained classifier.

  • Ulyana Tkachenko
  • Jonas Mueller
  • Curtis Northcutt
Detecting Label Errors in Entity Recognition Data

10/12/2022

Understanding cleanlab's new methods for text-based token classification tasks.

  • Wei-Chen (Eric) Wang
  • Elías Snorrason
  • Jonas Mueller
cleanlab 2.1 adds Multi-Annotator Analysis and Outlier Detection: toward a broad framework for Data-Centric AI

09/21/2022

Highlighting new features available in cleanlab 2.1.

  • Curtis Northcutt
  • Jonas Mueller
How we built Cleanlab Vizzy

08/17/2022

How we built an in-browser visualization of Cleanlab's Confident Learning algorithm.

  • Caleb Chiam
  • Luke Mainwaring
  • Yiming Chen
Handling Label Errors in Text Classification Datasets

05/10/2022

Learn how to find label issues in text datasets and improve NLP models.

  • Wei Jing Lok
  • Jonas Mueller
  • Hui Wen Goh
Finding Label Issues in Audio Classification Datasets

04/27/2022

Learn how to find label issues in any audio classification dataset.

  • Johnson Kuan
  • Jonas Mueller
  • Anish Athalye
Finding Label Issues in Image Classification Datasets

04/21/2022

Learn how to automatically find label issues in any image classification dataset.

  • Wei Jing Lok
  • Jonas Mueller
cleanlab 2.0: Automatically Find Errors in ML Datasets

04/21/2022

Announcing cleanlab 2.0: an open-source framework for machine learning and analytics with messy, real-world data.

  • Curtis Northcutt
  • Jonas Mueller
  • Anish Athalye