🤓😡👹 Open Source ML from Large Tech Incumbents: The Good, The Bad and the Ugly

Get to know what happened in the ML world this week

May 16, 2021

📝 Editorial

The traditional evolution of technology trends follows Clayton Christensen's famous Innovator's Dilemma, in which most of the innovation in a new trend comes from startups trying to disrupt incumbents. Machine learning is challenging that thesis. In the machine learning space, large technology companies, such as Google, Microsoft, Facebook, Amazon, and Uber, have been at the forefront of research and open-source contributions. This shouldn't come as a surprise, given that these companies face some of the largest and most interesting machine learning scenarios in the current market. As a result, some of their open-source contributions address machine learning requirements that mainstream enterprises and startups would struggle even to envision. From that perspective, many of the machine learning open-source frameworks produced by tech incumbents have delivered a lot of value. But the picture is not so rosy when it comes to using them in real-world production systems.

Almost every week, this newsletter covers new open-source ML tools or frameworks incubated in large technology companies. While most of these frameworks deliver very tangible innovations to machine learning workflows, they are hardly ready for primetime in mainstream scenarios. Keep in mind that most of these open-source ML stacks were incubated by some of the most skillful data science teams in the world, teams that are sometimes disconnected from the realities of ML solutions in most organizations. Complex programming models, limited support and versioning, and difficult integrations with mainstream stacks are some of the most common challenges you will encounter when trying to incorporate those capabilities into real-world machine learning scenarios. Furthermore, many of these projects get abandoned after their original team moves on to other efforts as part of their full-time job. Organizations such as the LF AI & Data Foundation have been very effective at providing a continuous path for many of these projects, including Uber's Ludwig, Horovod, and Pyro, Facebook and Microsoft's ONNX, Google and Gojek's Feast, Lyft's Amundsen, and many others.

While it's definitely seductive to incorporate sexy open-source machine learning frameworks created by large tech companies, you should evaluate them very carefully.

🔺🔻TheSequence Scope – our Sunday edition with the industry’s development overview – is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge 🔺🔻

🗓 Next week in TheSequence Edge:

Edge#89: we discuss what makes some feature representations better than others; explore Uber's architecture for discovering optimal features; and overview three architectures powering feature stores at large companies: Airbnb, Pinterest, and DoorDash.

Edge#90: OpenAI Safety Gym – an environment to improve safety in RL models

Now, let's review the most important developments in the AI industry this week:

🔎 ML Research

Deep Learning for Football

DeepMind published an amazing paper discussing how computer vision, game theory, and statistical techniques can advance decision-making in football (soccer) -> read more on the DeepMind blog

Redefining PCA

DeepMind also published a groundbreaking paper proposing a method that reformulates principal component analysis (PCA), the iconic dimensionality-reduction technique, as a multi-agent game -> read more on the DeepMind blog

Teaching AI to forget

Facebook AI Research (FAIR) published an intriguing paper expanding transformer memory architectures with selective forgetting -> read more on the FAIR blog

🤖 Cool AI Tech Releases

Greykite

LinkedIn open-sourced Greykite, a framework for time-series forecasting -> read more on the LinkedIn blog

Feature Discovery at Uber

The Uber engineering team published a blog post detailing the architecture used to optimize feature discovery in their machine learning infrastructure -> read more on the Uber engineering blog

👩‍💻 Job Openings and something more

Our partner Snorkel.AI is looking for an AI evangelist. As the AI Evangelist, you will operate at the intersection of code, content, and community. You can apply here.
Recently, the editor of TheSequence was invited to become one of the first 100 curators for a new social network called Faves. Faves is a platform where leading creators (like Marissa Mayer, ex-Yahoo) share the content that fuels their thinking. On that platform, we share the articles, podcasts, tweets, etc. that inform our research about ML and AI. You can do two things: apply to become their founding ML engineer (they are also looking for a data & analytics lead) or join the waitlist to get a taste of it (the waitlist has over 10,000 people, but with our invite you can try the platform immediately).

💸 Money in AI

For ML and AI teams:

Cloud data science and analytics platform Zepl has been acquired by DataRobot. The goal is to let data scientists customize AI models developed on DataRobot's platform and collaborate using open-source Apache Zeppelin notebooks. The price of the acquisition has not been disclosed.
ML startup SiMa.ai raised $80 million in a Series B round led by Fidelity Management & Research Company. The company aims to transform the embedded edge market by delivering high-performance computing at the lowest power. It is also working on its first-generation ML SoC (System-on-Chip) platform.
AI analog chip startup Mythic raised $70 million in a funding round led by BlackRock and Hewlett Packard Enterprise. The company plans to use the funds to launch its analog-based AI processors later this year and to ramp up mass production.
Container observability startup Sosivio raised a $4 million seed round led by Seamans Holdings. The company applies AI to do the heavy lifting, providing eventless failure prediction and autonomous failure resolution.

TheSequence