🚀 The Emerging Market of Data Labeling
Weekly newsletter that discusses impactful ML research papers, cool tech releases, the money in AI, and real-life implementations
📝 Editorial
Metadata management has historically been one of the most boring markets in enterprise software. And it really is, until machine learning comes along. Supervised learning models need labeled datasets for training, and those are expensive to create and maintain. Suddenly, the boring metadata management space found a new purpose, and as a result a new generation of startups emerged to solve the problems of data labeling for machine learning models. Venture capital has been flowing into the data labeling space, making it one of the few areas of the machine learning market in which startups have a chance to compete with technology giants like Google, Amazon, or Microsoft.
Data labeling in machine learning is one of those things that is easy to trivialize until you need to do it at scale. Then the challenges are everywhere. Labeling text datasets is different from labeling images, and that in turn is different from labeling video or audio. Furthermore, inspecting datasets with millions of records and attaching the appropriate labels raises many scaling issues. Finally, data labeling is rarely an isolated process and requires collaboration between multiple teams. Those challenges require a new type of solution, and we are seeing exciting platforms such as Labelbox, Snorkel.ai, and Scale AI drive innovation in the space. One thing is certain: data labeling is becoming a standalone and highly competitive market in the machine learning space.
🗓 Next week in TheSequence Edge
Aug 11, Edge#11: the concept of meta-learning; Berkeley AI Research Lab’s famous paper about a model-agnostic algorithm for meta-learning; a deep dive into Comet.ml, which many people call the GitHub of machine learning.
Aug 13, Edge#12: the concept of model serving; a paper in which Google Research outlines the architecture of a serving pipeline for TensorFlow models; a review of MLflow, one of the most complete machine learning lifecycle management frameworks on the market.
To stay up to date and receive TheSequence Edge every Tuesday and Thursday, please consider joining our community. Until August 15, you can subscribe with a permanent 20% discount. The Sunday edition of TheSequence Scope is always free.
Now, let’s review the most important developments in the AI industry this week.
🔎 ML Research
Advancing Reinforcement Learning in Gaming
Microsoft Research published three different papers detailing advancements in reinforcement learning for gaming scenarios ->read more on Microsoft Research blog
A Better Benchmark for AI Assistants
Researchers from ElementAI and Stanford University published a paper demonstrating that the market needs a better benchmark and methodology for language user interfaces ->read more in the research paper
Fooling Facial Recognition Systems
Researchers from McAfee published a paper showing how CycleGAN, a variant of generative adversarial networks (GANs), can be used to fool a modern face-recognition algorithm into seeing someone who isn’t there ->read more on McAfee Research blog
🤖 Cool AI Tech Releases
DeText
LinkedIn open-sources DeText, a flexible framework for different natural language understanding tasks ->read more in that post from the LinkedIn engineering team
TransCoder
Facebook AI Research open-sources TransCoder, a framework that uses self-supervised learning to translate code between different programming languages ->read more on Facebook AI blog
MediaPipe Iris
Google open-sources MediaPipe Iris, a new machine learning model for iris estimation, which is essential in many vision analysis applications ->read more on Google AI blog
💸 Money in AI
Expert System (founded in 1989), a veteran in natural language understanding (NLU) technologies, raised $29.4 million in funding. Its flagship software, Cogito Discover, uses the company's NLU engine to identify the content of documents in different formats and make it available for analysis and automation.
Health tech startup Infermedica raised $10.25 million in Series A funding. They offer symptom triage and advice to patients based on doctors’ expertise enhanced by their own ML algorithms. They also integrate with chatbots, patient portals, and EHRs.
Big data analytics platform StreetLight Data raised $15 million in its Series D round. It uses smartphones as sensors to measure activity on all streets, applying its ML algorithms to figure out how people move through cities: foot and bicycle traffic, the busiest times for transportation, and so on.
Another big data analytics startup, Isima, raised $10 million in funding to launch a data convergence platform called BiOS. The company asserts its solution can reduce or even eliminate disparate databases while improving overall speed and reliability. Its rival Quantexa recently raised $64.7 million.
Noise-canceling tech startup Krisp raised $5 million in Series A funding. Its ML system is trained to understand what is and isn’t a human voice in streaming audio and remove the rest, making the sound clearer.
Blood diagnostics startup Sight Diagnostics raised $71 million in funding. It “digitizes” blood into over 1,000 high-resolution colored microscope images and analyzes them with its own machine-vision technology, trained on half a petabyte of anonymized data from four years of clinical studies.
Zencity, a data-driven platform for municipalities, has just raised $13.5 million. Its algorithms analyze aggregated feedback from local communities to identify key topics and trends, in order to understand what impacts a community.
Deep learning tech startup Syntiant raised $35 million. It provides hardware that merges machine learning with semiconductor design for always-on voice applications. Put simply, Syntiant makes the processors that handle wake-word, command-word, and event detection, among other tasks, in devices such as your Alexa.
If you find our newsletter useful, please consider supporting our efforts. Subscribe or make it a gift for those who can benefit from it. It’s the last week when you can get it with a permanent 20% discount.
TheSequence is a summary of groundbreaking ML research papers, engaging explanations of ML concepts, and explorations of new ML frameworks and platforms. It also keeps you up to date with the news, trends, and technology developments in the AI field.
5 minutes of your time, 3 times a week, and you will steadily become knowledgeable about everything happening in the AI space.
