In TheSequence Guest Post, our partners explain in detail what machine learning (ML) challenges they help deal with. In this post, Toloka’s team offers you an insightful overview of the data labeling use cases in e-commerce.
AI and e-commerce are now inextricably intertwined – unbeknownst to most, the latter can no longer maintain a competitive edge without the former. According to Statista, this is so much the case that over 70% of all surveyed e-commerce business executives in Europe and North America believe AI to be the main “can’t-do-without” mechanism for all modern online retail.
Data labeling: use cases
Many AI projects are not possible without relevant, accurately labeled data. Work on your ML model as much as you like, but without the right data, there’s only so much your model can do. Because of this, data labeling has become the bedrock of AI, and thus of e-commerce as well.
Much of today’s data is labeled through crowdsourcing, which is considered the quickest and most cost-effective labeling method that also scales. That is part of what makes the alliance between AI and e-commerce so vibrant. With that in mind, let’s look in more detail at how data labeling is improving e-commerce through these recent use cases:
AliExpress and localization
AliExpress is one of the world’s e-commerce leaders. However, serious localization problems arose when AliExpress attempted to translate its platform into Russian, which resulted in numerous inaccurate product descriptions.
With a giant catalog, the company needed a quick and reliable solution. Toloka offered an innovative crowd-based approach: instead of using machine translation (MT) or computer-assisted translation (CAT) combined with human validation, as is normally the case, Toloka’s crowd performers were asked to provide their own translations whenever applicable. Each newly suggested version became a fixed multiple-choice option that the next crowd performer had to either select, thereby approving it, or replace with their own version. The cycle continued until the performers as a group had no further improvements to suggest. The end result proved to be affordable and, above all, very effective.
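The approve-or-replace cycle described above can be sketched in a few lines of Python. This is only an illustration, not Toloka’s actual implementation: the `Performer` interface, the `refine_translation` function, and the stopping rule (a fixed number of consecutive approvals standing in for "no further improvements") are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

# Hypothetical interface: a performer sees the source text and the current
# translation, and returns None to approve it or a string to replace it.
Performer = Callable[[str, str], Optional[str]]


def refine_translation(
    source: str,
    initial: str,
    performers: Iterable[Performer],
    approvals_needed: int = 3,  # assumed stopping rule, not from the article
) -> str:
    """Pass a translation from performer to performer until it stabilizes."""
    current = initial
    streak = 0  # consecutive approvals of the current version
    for performer in performers:
        suggestion = performer(source, current)
        if suggestion is None or suggestion == current:
            streak += 1
            if streak >= approvals_needed:
                break
        else:
            # A new version replaces the old one and must be approved afresh.
            current = suggestion
            streak = 0
    return current


# Toy performers used only to exercise the loop.
def approve(source: str, current: str) -> Optional[str]:
    return None


def fix_typo(source: str, current: str) -> Optional[str]:
    return current.replace("shoos", "shoes")


if __name__ == "__main__":
    crowd = [fix_typo, approve, approve, approve]
    print(refine_translation("кроссовки для бега", "running shoos", crowd))
```

In a real project the number of required approvals would be a tunable trade-off between cost and translation quality.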
Ozon and search relevance
Ozon’s inventory rivals that of AliExpress, with more than 9 million SKUs across 24 different categories.
Toloka was tasked with evaluating the quality of the company’s search and determining the most effective product ranking model for the cataloged items.
The performers had to rate search engine results from best to worst in order to identify filter issues on the website and provide a fine-grained analysis from the perspective of UI and UX. According to Ozon’s tech department, the search engine is now more powerful than it has ever been.
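One common way to turn crowd ratings into a comparison between ranking models is normalized discounted cumulative gain (NDCG). The article does not say which metric Ozon used, so the sketch below is an assumption for illustration; the grade scale, the SKU names, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg(grades: List[float]) -> float:
    """Discounted cumulative gain: higher positions count more."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(ranking: List[str], crowd_grades: Dict[str, float]) -> float:
    """Score a model's ranking against crowd relevance grades (0 = irrelevant)."""
    gains = [crowd_grades.get(item, 0.0) for item in ranking]
    ideal = sorted(crowd_grades.values(), reverse=True)[: len(ranking)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical crowd grades for one search query.
crowd_grades = {"sku_a": 3.0, "sku_b": 2.0, "sku_c": 0.0}
model_1 = ["sku_a", "sku_b", "sku_c"]  # best item first
model_2 = ["sku_c", "sku_b", "sku_a"]  # best item last

if __name__ == "__main__":
    print(ndcg(model_1, crowd_grades), ndcg(model_2, crowd_grades))
```

Averaging such a score over many queries gives a single number per ranking model, which makes it straightforward to pick the more effective one.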
Yandex.Market recommender system sees 6x reduction in errors
Yandex.Market is an e-commerce marketplace with over one million items available in its product catalog. When the company needed to tune its product recommendation engine, it developed a data labeling pipeline in Toloka to get the data it needed.
An effective recommender system needs vast amounts of labeled data to support its ML model. Yandex.Market started out by using automated solutions to train their recommendation model, but the algorithm was not performing well enough. They developed a new strategy using the Toloka platform:
Label products with matching accessories and related items.
Train a gradient boosting model to apply filters based on the labeled dataset.
Measure precision and recall and then retrain the model until satisfied.
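The measurement step can be illustrated with a small sketch that compares a model’s recommended product pairs against crowd-labeled pairs. It assumes recommendations are represented as (product, recommended item) pairs; the function name and the sample data are hypothetical, and the gradient boosting model itself is left out.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (product, recommended item)


def precision_recall(predicted: Set[Pair], labeled_positive: Set[Pair]) -> Tuple[float, float]:
    """Precision: share of recommendations the crowd confirmed.
    Recall: share of crowd-confirmed pairs the model recommended."""
    true_positives = len(predicted & labeled_positive)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled_positive) if labeled_positive else 0.0
    return precision, recall


# Hypothetical crowd labels and model output.
labeled = {("phone", "case"), ("phone", "charger"), ("laptop", "bag"), ("laptop", "mouse")}
predicted = {("phone", "case"), ("laptop", "bag"), ("phone", "sofa")}

if __name__ == "__main__":
    p, r = precision_recall(predicted, labeled)
    print(f"precision={p:.2f} recall={r:.2f}")
```

If either number falls short of the target, the loop returns to labeling and retraining, as in the third step above.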
After integrating Toloka, the accuracy of the Yandex.Market recommender system went from a modest 40% to 90% overall, while recall rose from 20% to a solid 74% for accessories and 90% for related items.
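The "6x reduction in errors" in this section’s heading appears to follow from the overall accuracy figures, assuming the error rate is simply 100% minus accuracy. The helper below is a hypothetical name used only to show the arithmetic.

```python
def error_reduction_factor(accuracy_before_pct: float, accuracy_after_pct: float) -> float:
    """Ratio of error rates, taking error rate as 100% minus accuracy (assumption)."""
    error_before = 100 - accuracy_before_pct
    error_after = 100 - accuracy_after_pct
    if error_after <= 0:
        raise ValueError("accuracy_after_pct must be below 100")
    return error_before / error_after


if __name__ == "__main__":
    # 60% errors before versus 10% errors after
    print(error_reduction_factor(40, 90))  # 6.0
```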
Recommendations also stay up to date: now that the pipeline is in place, it is quick and easy to get new labels and retrain the system whenever a new category of products is introduced. The marketplace is currently looking into using Toloka to boost other aspects of its business.
Neatsy and content collection
Neatsy is a relatively new but already popular app that e-commerce marketplaces and D2C manufacturers use to let their customers 3D-scan their feet and find the best-fitting shoes. To make this feature a reality, the app’s 3D scanner, which is built on a neural network, needed over 50,000 labeled images for training so that it could do a better job of separating human feet from the floor.
Toloka was called to action again, and 3 weeks later the job was done. Neatsy confirmed that the app’s time to market was accelerated significantly, while its 3D scanner became 12% more accurate. The app is now trusted by famous brands like Nike, Reebok, Adidas, Puma, and Vans among others.
Conclusion
Looking at these examples, it becomes clear that AI will play an ever larger role in online retail. As data-labeling techniques, crowdsourcing in particular, become more time- and cost-effective, more scalable, and better at delivering high-quality results, many more business opportunities will open up. The AI and e-commerce markets will expand in parallel, feeding off each other to the point where e-commerce becomes inseparable from the machine intelligence at its core. To grow an e-commerce business, it is therefore increasingly important to stay ahead of the AI curve with high-quality data.
Learn more about intelligent data solutions to transform e-commerce from Toloka.
"Of course, no AI is possible without relevant, accurately labeled data."
That statement is incorrect. Self-supervised learning, such as that used by the GPT-3 model, does not need labeled data.