⚪️🟠️ Edge#120: How to Leverage Open-Source Data Labeling for your Business
Real World Use Cases
We continue with our new format, which our readers often request:
✔️ Real-World Use Cases
In these additional issues of TheSequence, we will discuss how different machine learning (ML) concepts are applied in real-world solutions implemented by enterprises and startups. The idea is that, by presenting real-world examples of ML implementations, we can help you better understand how different ML technologies can be practically applied in specific scenarios. Send us your feedback!
🏷 Open-Source Data Labeling with Flexible Customization in Real World
In Edge #119, we talked about the challenges that data science teams encounter when performing data labeling, especially when determining whether to build, buy, or download an open-source tool to use. The “build” vs “buy” dilemma gets easier to solve when the solutions available include enhanced customization options, providing the best of both worlds.
The goal of any data labeling process is to produce high-quality data. It follows that the easier it is to craft a data labeling solution that fits your machine learning use case, the better. In today’s Edge, we have asked our partner Label Studio, an open-source data labeling tool with flexible customization, to help us with some real-world use cases. We couldn’t name some of the companies, but the cases are interesting nonetheless.
1️⃣️️️ Use Case: Perform object detection and image classification
Initial setting and a problem to solve: An insurance company that uses images to assess how best to underwrite home insurance policies might struggle to train a machine learning model for the specific types of situations that it encounters.
With a hyper-specific use case and expert in-house annotators, they can combine two kinds of labeling: marking regions to identify objects that might indicate added risk for a policy, and classifying the images according to the insurance category that might be appropriate to recommend.
Using Label Studio, they selected the existing Object Detection with Bounding Boxes template and then modified it to include image classification options, which suited their specific use case and accelerated labeling.
The labeling interface is straightforward to customize using XML-like tags: you can choose between labeling regions, classifying tasks with choices, or even ranking or rating the content of an image or text passage.
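As a rough illustration of those XML-like tags, the sketch below keeps a labeling configuration in a Python string that combines bounding boxes with image-level classification, then parses it to list the options each control offers. The label and choice values are hypothetical and are not taken from the insurer's setup. The tag names (`View`, `Image`, `RectangleLabels`, `Choices`) follow Label Studio's published templates, but check them against the version you run.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling config: bounding boxes plus image-level classification.
# All label and choice values are invented for illustration.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$image"/>
  <RectangleLabels name="risk" toName="image">
    <Label value="Roof damage"/>
    <Label value="Swimming pool"/>
    <Label value="Overhanging tree"/>
  </RectangleLabels>
  <Choices name="policy_type" toName="image" choice="single">
    <Choice value="Standard"/>
    <Choice value="High risk"/>
    <Choice value="Needs inspection"/>
  </Choices>
</View>
"""


def summarize_config(config: str) -> dict:
    """Parse the config and map each control's name to its option values."""
    root = ET.fromstring(config)
    summary = {}
    for control in root:
        options = [child.get("value") for child in control]
        if options:  # skip tags without options, such as <Image>
            summary[control.get("name")] = options
    return summary


if __name__ == "__main__":
    for name, options in summarize_config(LABEL_CONFIG).items():
        print(name, options)
```

Parsing the configuration before pasting it into a project is a cheap way to catch unclosed tags or a misspelled control name.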
2️⃣ Use Case: Correct predictions while labeling
Initial setting and a problem to solve: Bombora, an internet marketing company, built a Natural Language Processing (NLP) model to process customer sentiment on the web about companies and organizations. When they wanted to improve that model, they did not want to start from scratch labeling data for a new challenger model. Instead, they wanted to assess the performance of their existing model by reviewing and correcting predictions as part of the annotation process, then use those corrected predictions for model training.
Using Label Studio, Bombora was able to display predictions in the labeling interface to accelerate labeling and allow their annotators to focus on validating or correcting the lowest-confidence predictions. The flexibility of the labeling interface means that you can improve model quality quickly and easily.
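To make the idea of reviewing low-confidence predictions first more concrete, here is a minimal sketch that builds tasks with model predictions attached and orders them by score. It is not Bombora's pipeline. The field names (`data`, `predictions`, `score`, `result`, `from_name`, `to_name`) follow Label Studio's documented JSON format for pre-annotated tasks as we understand it, while the texts, scores, and the `sentiment-v1` model version are made up.

```python
import json


def make_task(text: str, label: str, score: float) -> dict:
    """Build one task with a single model prediction attached."""
    return {
        "data": {"text": text},
        "predictions": [{
            "model_version": "sentiment-v1",  # hypothetical version tag
            "score": score,                   # model confidence, 0.0 to 1.0
            "result": [{
                "from_name": "sentiment",  # assumed name of the Choices tag
                "to_name": "text",         # assumed name of the Text tag
                "type": "choices",
                "value": {"choices": [label]},
            }],
        }],
    }


def lowest_confidence_first(tasks: list) -> list:
    """Order tasks so the least certain predictions are reviewed first."""
    return sorted(tasks, key=lambda task: task["predictions"][0]["score"])


# Invented examples standing in for real model output.
tasks = [
    make_task("Great onboarding experience.", "Positive", 0.97),
    make_task("It works, I guess.", "Positive", 0.52),
    make_task("Support never answered my ticket.", "Negative", 0.88),
]

if __name__ == "__main__":
    # Write the review queue as JSON, ready to inspect or import.
    print(json.dumps(lowest_confidence_first(tasks), indent=2))
```

Sorting before import is only one option; the same score field can also drive the ordering inside the labeling tool if it supports that.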
More complex labeling tasks are also possible: you can identify relationships between specific labeled regions and build nested labeling interfaces with controls that determine whether a choice is visible, taxonomies of labels, and filters for large numbers of labels.
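One way such a nested interface might look is sketched below: a second set of choices that only appears once a particular first choice is selected. The `visibleWhen`, `whenTagName`, and `whenChoiceValue` attributes are taken from Label Studio's tag documentation as we recall it, so verify them for your version. The `complaint_topic` control and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: the second Choices block is meant to appear only
# after "Negative" has been selected in the first one.
NESTED_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text" choice="single">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
  <Choices name="complaint_topic" toName="text" choice="single"
           visibleWhen="choice-selected"
           whenTagName="sentiment" whenChoiceValue="Negative">
    <Choice value="Pricing"/>
    <Choice value="Support"/>
  </Choices>
</View>
"""


def conditional_controls(config: str) -> dict:
    """Map each conditionally shown control to the (tag, value) that reveals it."""
    root = ET.fromstring(config)
    return {
        element.get("name"): (
            element.get("whenTagName"),
            element.get("whenChoiceValue"),
        )
        for element in root.iter()
        if element.get("visibleWhen") == "choice-selected"
    }


if __name__ == "__main__":
    print(conditional_controls(NESTED_CONFIG))
```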
3️⃣ Use Case: Transcribe ancient manuscripts
Initial setting and a problem to solve: Some folks wanting to give back to their community are working to transcribe ancient manuscripts. Faced with hundreds of documents, optical character recognition was the best way to transcribe the contents. But given the subject matter, they would need to build their own machine learning model to recognize the text in the documents and transcribe it.
Other tools were difficult to adapt to their specialized need for native machine learning model integration for predictions. Label Studio made that straightforward and saved time: annotators draw bounding boxes around regions of text and manually transcribe the text. Double-click bounding box creation and hotkeys are also available to accelerate the more tedious labeling tasks.
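A transcription setup along these lines could be configured roughly as follows. This is a sketch, not the project's actual configuration: the region labels and hotkey assignments are invented, and the `perRegion` text area (one transcription per drawn box) is modeled on Label Studio's optical character recognition template as we understand it.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR-style config: draw a box, then type the text it contains.
OCR_CONFIG = """
<View>
  <Image name="page" value="$image"/>
  <RectangleLabels name="region" toName="page">
    <Label value="Body text" hotkey="1"/>
    <Label value="Marginal note" hotkey="2"/>
  </RectangleLabels>
  <TextArea name="transcription" toName="page"
            perRegion="true" editable="true"
            placeholder="Type the text inside this box"/>
</View>
"""


def hotkey_map(config: str) -> dict:
    """Return a mapping of hotkey to label value for every label that has one."""
    root = ET.fromstring(config)
    return {
        label.get("hotkey"): label.get("value")
        for label in root.iter("Label")
        if label.get("hotkey")
    }


if __name__ == "__main__":
    print(hotkey_map(OCR_CONFIG))
```

Listing the hotkeys this way makes it easy to spot two labels accidentally bound to the same key before annotators start work.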
4️⃣ Use Case: Track online discussions
Initial setting and a problem to solve: A researcher wanting to track the discussions about new technologies on social media to inform purchasing decisions needed a tool that would perform different types of NLP on HTML-based content.
With a limited budget and the possibility of expanding into image-based labeling in the future, the researcher opted to use Label Studio due to its support of common NLP tasks such as named entity recognition on HTML and text, as well as image classification tasks that they might need to perform in the future. Because Label Studio supports a lot of data types including audio, images, plain text, HTML, video, and time series, you have the flexibility to label whatever data is needed for your machine learning project.
5️⃣ Use Case: Improve automated transcripts
Initial setting and a problem to solve: A labeling outsourcing company wanted to expand the services that it offers its clients. The company needed a tool that would support its existing services, such as image segmentation and image classification, and allow for future expansion into services like automatic transcription of customer support calls and interviews.
With support for a multitude of projects and data types, Label Studio was a natural choice for the company, allowing it to support its existing business and plan to expand without needing to change the tool it was using to provide labeling services.
No matter the type of labeling you are performing, open-source solutions tend to be easier to integrate into an existing workflow. You can customize Label Studio to fit your needs and simplify your overall machine learning workflow.