🎙 Hyun Kim/CEO of Superb AI About Challenges with Data Labeling in Computer Vision
It’s so inspiring to learn from practitioners and thinkers. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration. Share this interview if you like it. No subscription is needed.
👤 Quick bio / Hyun Kim
Tell us a bit about yourself. Your background, current role and how did you get started in machine learning?
Hyun Kim (HK): I am the co-founder and CEO of Superb AI, an ML DataOps platform that helps computer vision teams automate and manage the full data pipeline: from ingestion and labeling to data quality assessment and delivery. I initially studied Biomedical Engineering and Electrical Engineering at Duke but shifted from genetic engineering to robotics and deep learning. I then pursued a Ph.D. in computer science at Duke with a focus on Robotics and Deep Learning but ended up taking leave to further immerse myself in the world of AI R&D at a corporate research lab. During this time, I started to experience the bottlenecks and obstacles that many companies still face to this day: data labeling and management were very manual, and the available solutions were nowhere near sufficient.
🛠 ML Work
Could you tell us about the vision and inspiration for the Superb AI Suite platform? What makes data preparation such a challenging problem in computer vision tasks?
HK: When you look at some of the amazing AI technologies being developed in research and compare them to the current array of public applications of AI, one starts to wonder what can be done to accelerate the adoption of cutting-edge AI in real-world environments. That is, in essence, the vision and mission of Superb AI. We still feel, to this day, that a big reason deep tech has not spread faster is the cumbersome nature of data operations. Data is what will continue to drive AI development and deployment, and there is still a lot of uncertainty about how to build highly efficient data pipelines from start to finish. We aim to change that for the better.
Superb AI specializes in computer vision which is based on two fundamental types of datasets: images and videos. What are the fundamental differences and challenges between automated data labeling techniques for image and video datasets?
HK: First, I’d like to preface this with a quick comparison between unstructured and structured data. For structured or non-computer vision use cases, it’s relatively easier to automate labeling. This can come in the form of hand-designing a set of rules or heuristics that can ultimately be used to define labeling functions and/or “weak classifiers” for auto-labeling. We have seen some amazing developments in this space through programmatic labeling, for example.
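The rule-based labeling functions described above can be sketched in a few lines. This is a hypothetical toy example of programmatic labeling for structured (non-vision) data, in the spirit of weak supervision; the rules, field names, and label values are all invented for illustration:

```python
# Toy weak classifiers ("labeling functions") for a structured text record.
# Each rule either votes for a label or abstains; votes are combined.
ABSTAIN, SPAM, HAM = -1, 1, 0

def lf_contains_link(record):
    # Weak heuristic: messages containing URLs are often spam.
    return SPAM if "http" in record["text"] else ABSTAIN

def lf_very_short(record):
    # Weak heuristic: one- or two-word messages are usually benign.
    return HAM if len(record["text"].split()) < 3 else ABSTAIN

def majority_vote(record, lfs):
    # Combine non-abstaining votes; real systems model LF accuracies instead.
    votes = [v for v in (lf(record) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

label = majority_vote({"text": "click http://spam.example now"},
                      [lf_contains_link, lf_very_short])
```

Hand-designing a handful of such rules is feasible for tabular or text data, which is exactly why the same trick breaks down for pixels.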
However, this approach cannot be applied to computer vision, as it's impractical to hand-design rules for object detection. The reason is the massive visual variance within the same object class, which cannot be covered by a set of rules.
Also, the definitions and data labeling specifications differ for every organization, even for the same object class. For example, even in a well-known use case such as autonomous driving, different companies will exhibit organizational nuances in their data labeling and heuristics. Some teams will want to include side mirrors; some organizations are building use cases that need to include open trunks; and so on. These visual variances within what is perceived to be the same object class cannot be covered with a simple set of rules. The environment of the real world constantly changes, and so should the dependencies and requirements of these detection models.
Video adds another layer of challenge because each object needs to be tracked across frames. We have seen some naive approaches to this challenge, such as linear interpolation. However, because objects typically do not move at constant speeds, interpolation does not satisfy the requirements of most use cases. This often leads to more time spent QA'ing errors from this approach. Rather than using linear interpolation, our video labeling AI automatically tracks objects across frames. Especially in circumstances where objects appear and disappear between frames, our AI is able to track those objects, and our platform makes it easy to track those instances as one single object. We also provide the ability to customize our AI based on organizational requirements. We've seen cases where teams need the ability to define temporal variables, such as assigning a max number of frames after an object disappears before the platform assigns that object a new ID.
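To make the naive baseline concrete, here is a minimal sketch of linear interpolation between two keyframe bounding boxes. It assumes constant velocity between keyframes, which is exactly why it drifts whenever an object accelerates, turns, or is occluded; the box format and frame numbers are hypothetical:

```python
# Naive video labeling: linearly interpolate a bounding box (x, y, w, h)
# between two human-labeled keyframes. Assumes constant velocity.
def interpolate_box(box_a, box_b, frame, frame_a, frame_b):
    t = (frame - frame_a) / (frame_b - frame_a)  # fraction of the way from a to b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Object labeled at frame 0 and frame 10; estimate its box at frame 5.
mid = interpolate_box((0, 0, 20, 20), (100, 40, 20, 20), 5, 0, 10)
# mid == (50.0, 20.0, 20.0, 20.0) -- correct only if the object moved uniformly
```

A learned tracker replaces the constant-velocity assumption with appearance and motion cues, which is what allows it to re-identify an object that disappears and reappears.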
Outside of just images and video, we also see a lot of unique challenges with other popular data types, such as 3D point cloud. Of course, we will be releasing automation tools that will help clients build better 3D detection models faster, but like everything else, we are assessing the challenges that 3D point cloud brings and approaching it in a specific way to this use case.
The data labeling space has expanded immensely in the last couple of years. What are Superb AI's core differentiators?
HK: Our core focus has been, and will continue to be, automation. We also think that the terms AI and automation, even in our industry, have been overused, and there is a lot of skepticism in the market because of this. It's a shame, because we feel intelligent automation is the key to the computer vision industry taking its next leap. So market education, or re-education I should say, is a tangential challenge we face today.
Our automation journey started with labeling, which is currently our flagship product. The core tech behind our customizable auto-labels uses few-shot and transfer learning, allowing teams to take simple or complex use cases and train a very capable auto-label using just 100 data points or images. We also layer on uncertainty-estimation AI to help teams quickly identify labels for QA, because an auto-label is only as good as its QA process.
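One common way to implement the uncertainty-estimation idea is to score each prediction's softmax entropy and route high-entropy labels to human review. Superb AI's actual method is not public, so this is only a plausible sketch; the threshold and function names are invented:

```python
# Route uncertain auto-labels to human QA using softmax entropy as the signal.
import math

def entropy(probs):
    # Shannon entropy of a class-probability distribution (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0)

def needs_review(class_probs, threshold=0.5):
    # High entropy => the model is unsure => send the label to a human.
    return entropy(class_probs) > threshold

confident = needs_review([0.98, 0.01, 0.01])  # peaked distribution, low entropy
uncertain = needs_review([0.40, 0.35, 0.25])  # flat distribution, high entropy
```

Scoring every auto-label this way lets a QA team spend its time on the small slice of predictions the model was least sure about, rather than reviewing everything.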
We released this about a year ago, and as expected, there has been some great feedback on how labeling automation has helped address cold start problems, edge case labeling, economies of scale, you name it.
But we are entering the next phase of automation, which is around data quality and curation. It's not enough anymore to just label a large dataset and brute-force it into model training. Teams need to know not just that their data is high quality, but which labels are most beneficial for their model, how to find more of these high-quality and relevant labels, and, most importantly, how to iterate faster. This piece of data preparation is going to be a critical pillar for any DataOps program for computer vision teams and, yes, we will be introducing some exciting new products in this realm that use cutting-edge automation technology.
Regarding data prep/ops, what is one critical aspect that teams are currently overlooking that will eventually be common practice in the future?
HK: We think data curation is massively underrated and something that many teams do not currently do, and if they do, not very efficiently. There are a couple of reasons for this, the main one being that data curation is a very time-consuming step. Curation, at its essence, should enable ML teams to understand the collected data, identify important subsets and edge cases, and curate custom training datasets to put back into their models. This is oversimplified, but the importance of curation cannot be emphasized enough. A sophisticated curation workflow will, in my opinion, guide us to an era of less dependence on large volumes of data and shift the focus to data quality. This is where the power of iteration will be unlocked.
In the case of automated data labeling, there are approaches that use symbolic representations (rules) vs. more sophisticated ML methods. In your experience, is there a clear delineation between symbolic and neural network approaches to data labeling or most scenarios can benefit from a combination of both?
HK: Within computer vision, if one were to use symbolic representations for automated data labeling, a few prerequisites need to be addressed, such as heavy data pre-processing that itself leans on ML/AI methods. Basically, you won't be able to encode rules such as "if pixel color is X, then classify as Y" as a base.
However, deep learning models can create powerful features and embeddings that can be used as metadata to which teams can apply rules and heuristics. So, in short, yes, rule-based or heuristic-based auto-labeling for computer vision may work, but a foundational piece of deep ML will be required. These are concepts our R&D team has been exploring for quite some time, and we should soon be able to determine whether productizing this concept is viable.
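The hybrid idea described above can be sketched as rules applied on top of learned embeddings rather than raw pixels. This is a hypothetical illustration, not Superb AI's implementation: the 2-D embeddings, class centroids, and similarity threshold are all stand-ins for what a real deep model and labeling spec would provide:

```python
# Hybrid auto-labeling sketch: a deep model maps each image to an embedding
# (stubbed out here), and a simple heuristic rule runs on the embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def heuristic_label(embedding, centroids, min_sim=0.8):
    # Rule on top of embeddings: assign the nearest class centroid,
    # but abstain (return None -> human review) when similarity is low.
    best = max(centroids, key=lambda c: cosine(embedding, centroids[c]))
    return best if cosine(embedding, centroids[best]) >= min_sim else None

centroids = {"car": [1.0, 0.1], "pedestrian": [0.1, 1.0]}
pred = heuristic_label([0.9, 0.2], centroids)  # close to the "car" centroid
```

The "if pixel color is X" rule is impossible, but "if the embedding is within distance d of this class centroid" is a perfectly workable heuristic, which is the sense in which a foundational deep model makes symbolic rules usable again.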
Computer vision has become one of the most relevant domains for the applications of techniques such as self-supervised learning or pretrained models which can learn from large volumes of unlabeled data. How do you see the balance between these new types of techniques and traditional supervised learning models that relied heavily on labeled datasets?
HK: I love this question because we have been using self-supervised learning in our stack for quite some time. In research and the real world, it has been evident that models trained on self-supervised learning are more robust and perform better than those pre-trained only with labeled data.
In addition, being able to intelligently handle large amounts of unlabeled data will become very important and mission-critical, especially as teams continue to put an emphasis on economies of scale when building and deploying computer vision systems. More specifically, allowing teams to quickly and accurately understand gaps in datasets, identifying what to collect more of, what is causing poor model performance, how to bridge the gap between model and data observability; these are all things that we think are critical components to curating large volumes of unlabeled data and core pieces to our upcoming Curation product.
By subscribing, you support our mission to simplify AI education, one newsletter at a time. You can also give TheSequence as a gift.
💥 Miscellaneous – a set of rapid-fire questions
Favorite math paradox?
HK: Simpson’s Paradox. I remember being baffled when I first learned about it as a high school student in my statistics class. It also reminds me to be very careful and unbiased when interpreting data.
Is the Turing Test still relevant? Any clever alternatives?
HK: The Turing Test was a simple and elegant way for us to conceptualize AI and build a human-like conversational AI. And I think we’re close to passing the Turing Test with state-of-the-art works like GPT-3. But practically, AGI should be able to do much more than just trick a human evaluator – it should be able to do everything.
I had to do a bit of research, but there are clever alternatives like the Wozniak Test, where a robot makes coffee in a stranger’s home. It’s a funny test, but a true AGI should be able to pass a mixture of all these alternative tests!
What book would you recommend to aspiring data scientists?
HK: Assuming the person already has the math and statistics background, I’d recommend Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville. I’d also recommend PRML (Pattern Recognition and Machine Learning) by Christopher Bishop, but I’ve seen people (including myself, to be honest!) find Deep Learning focused books more interesting than those on classical machine learning.
Does P equal NP?
HK: Probably not. But I think we’re getting better at approximating NP problems with deep learning, so how’s P ≈ NP? :)