👩🏽‍🔧👨🏻‍🔧 Continuous Data Improvements and ML Performance

Feb 28, 2021

📝 Editorial

“Better data leads to better models” is one of the well-established axioms in machine learning (ML). However, correlating the performance of ML models with the composition of training and test datasets is far from trivial. Small inconsistencies and edge cases in datasets regularly alter model performance, and the more complex the model, the harder it becomes to trace those effects back to the data. As a result, data science teams spend countless hours using interpretability tools to hunt for clues that can improve their models when, quite often, the best clues are in the datasets.

Systematically improving training and test datasets is one of the most effective, and most often ignored, ways to improve model performance. Most ML interpretability stacks are not great at sophisticated data exploration, and most data labeling and exploration platforms lack ML interpretability techniques. These two spaces are likely to collide in the near future, producing a new generation of platforms for the continuous improvement of ML models. This week, ML startup Aquarium raised a seed round for what seems to be a very intriguing platform that combines data exploration and interpretability. In modern ML pipelines, continuous dataset improvements and model interpretability should evolve together to enable better model performance.
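To make the idea of "clues in the datasets" concrete, here is a minimal sketch of one common heuristic: rank examples by how strongly a trained model disagrees with their assigned label, then review the top of the list by hand. This is an illustration under stated assumptions, not a description of how Aquarium or any other platform works. The function names and the toy predictions and labels are hypothetical.

```python
import math


def per_example_loss(probs, label):
    """Cross-entropy loss of one prediction against its assigned label."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[label], eps))


def rank_suspicious_examples(predictions, labels, top_k=3):
    """Return indices of the top_k examples with the highest loss.

    A high loss means the model confidently disagrees with the label,
    which points at either a labeling error or a genuine model failure.
    """
    losses = [per_example_loss(p, y) for p, y in zip(predictions, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:top_k]


# Hypothetical held-out predictions (class probabilities) and assigned labels.
predictions = [
    [0.95, 0.05],
    [0.10, 0.90],
    [0.02, 0.98],  # the model is confident in class 1 ...
    [0.60, 0.40],
]
labels = [0, 1, 0, 0]  # ... but this example is labeled 0

suspects = rank_suspicious_examples(predictions, labels, top_k=2)
print(suspects)
```

The ranking cannot tell a wrong label from a wrong model, so each flagged example still needs a human look. That manual step is where data exploration and interpretability meet.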

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#67: the dissection of Neural Architecture Search (NAS); Microsoft’s Project Petridish; and Microsoft’s Archai, an open-source NAS framework.

Edge#68: deep dive into how to effectively virtualize and orchestrate AI workloads on Run:AI’s platform.

Now, let’s review the most important developments in the AI industry this week:

🔎 ML Research

Self-Supervised Pretraining for Data Augmentation

Microsoft Research published a paper exploring the role self-supervised pretraining methods play in data augmentation for image datasets -> read more on the Microsoft Research blog

Self-Supervised Policy Adaptation

Berkeley AI Research Lab (BAIR) published an insightful blog post proposing a self-supervised method for continuing to train reinforcement learning policies after deployment -> read more on the BAIR team's blog

The ML Behind Cinematic Photos

Google Research published a detailed blog post discussing the machine learning techniques that enable cinematic photos in the Google Photos service -> read more on the Google Research blog

🤖 Cool AI Tech Releases

Google’s Project Alto

Google open-sourced Project Alto, a cool toolkit designed to teach developers how they can incorporate machine learning tasks in their next hardware project -> read more on Project Alto’s GitHub

W3C ML Chapter

The W3C proposed a new machine learning chapter to enable inference tasks in the browser -> read more in the W3C documents

💸 Money in AI

For Devs, ML, and data engineers:

AIOps and IT infrastructure monitoring startup ScienceLogic raised $105 million. Its solution sees everything across multi-cloud and distributed architectures, contextualizes data through relationship mapping, and acts on this insight through integration and automation.
Scale-out graph processing, AI, and analytics company Katana Graph raised a $28.5 million Series A. Using high-performance graph algorithms, it extracts actionable insights from massive unstructured datasets, helping people and businesses unleash the potential of their large-scale irregular and unstructured data.
Enterprise data engineering platform Prophecy.io raised $6.75 million. It allows data engineers to maintain separate environments for tests, integration, and production. For enterprises, it modernizes data engineering with open runtimes and hybrid cloud deployments.
Data management startup Aquarium raised $2.6 million in a seed round. Aquarium helps improve model performance by improving the datasets the model is trained on. It makes it easy to find labeling errors and model failures, then helps curate the dataset to fix these problems and optimize model performance.
Cloud observability startup OpsCruise emerged from stealth, announcing a $5 million round of funding. Its deep understanding of Kubernetes, coupled with contextual AI and ML-based behavior profiling, helps teams predict performance degradations and instantly surface their cause.

AI implementation

Cybersecurity startup PerimeterX raised $57 million in growth capital. Its platform leverages over 120 machine learning algorithms and 165 models to profile bot behavior and client-side code activity in real time, identifying and defending against a wide spectrum of threats.
AI transcription startup Otter.ai raised $50 million in Series B. It develops speech-to-text transcription and translation applications using AI and ML.
Cybersecurity startup Armorblox raised $30 million in Series B. Using natural language understanding (NLU) and other AI tools, it analyzes the identity, behavior, and language in emails, protecting against business email compromise and targeted phishing attacks.
Customer data analytics company Blueshift raised $30 million in Series C. Analyzing unified customer data with its proprietary algorithms, it extracts insights about customers’ behavior and uses them to instantly trigger the next best action across channels.
AI-powered SaaS investment management platform FundGuard raised a $12 million funding round. With AI, it automates work patterns, helps identify unstructured financial and operational anomalies, and creates a transparent and intelligent investment administration enterprise.
Insurance tech startup Zelros raised $11 million in Series A. It employs AI to provide advisors and policyholders with advice on choosing the right coverage for their needs.
Medical management solution Medisafe raised $30 million in Series C. Using AI, it provides a digital drug companion that keeps watch over a patient’s medication routine.
AI-powered diabetes management startup January AI raised $8.8 million. It combines real-time glucose monitoring and personalized machine learning to accurately track and predict the effects of diet, exercise, and sleep on health.
