➿ Does Machine Learning Require Interoperability?
Weekly newsletter that discusses impactful ML research papers, cool tech releases, the money in AI, and real-life implementations
📝 Editorial
Machine learning frameworks are proliferating everywhere. From traditional machine learning frameworks like Scikit-Learn to popular deep learning stacks such as TensorFlow or PyTorch, there is a plethora of options for any given machine learning problem. In fact, most large data science infrastructures in the real world end up using multiple frameworks and platforms across different teams. As a result, interoperability between these frameworks can become a real challenge. This line of thinking has triggered efforts within the machine learning community to create standards and frameworks that enable interoperability between machine learning stacks. But is that really necessary at this point?
The interoperability challenge in machine learning frameworks is very tangible from a technical standpoint, but we have to wonder whether it is a problem for most organizations. Interoperating between different machine learning frameworks seems like the type of requirement that only a handful of big tech companies in the world have. Furthermore, most machine learning frameworks are still evolving at a rapid pace, so we barely know what could be relevant in terms of interoperability. Compared to designing interoperability standards, efforts that try to simplify lower layers of the stack, such as data structures, seem to be more relevant. This week, a group of leading tech companies announced the launch of a new consortium (because we don’t have enough of those already 😉) to provide standards for data structures common in machine learning problems.
TheSequence Scope, our Sunday edition with an overview of industry developments, is free. To become truly knowledgeable and stay in sync with machine learning and artificial intelligence, please subscribe to our educational ML newsletter TheSequence Edge, which comes out on Tuesdays and Thursdays.
🗓 Next week in TheSequence Edge:
Edge#15: the concept of machine teaching; a research paper that debates the concept of generative teaching networks and its usage at Uber; a deep dive into Snorkel-Flow, one of the most innovative data labeling platforms in the market.
Edge#16: the concept of probabilistic programming languages; the ideas behind MIT’s Gen, a new generation probabilistic programming language; an overview of some of the most popular probabilistic programming languages on the market.
Now, let’s review the most important developments in the AI industry this week.
🔎 ML Research
A Multi-Language Embedding for BERT
Google researchers published a paper proposing a model called LaBSE – a multi-language BERT embedding model for over 100 languages ->read more on Google Research blog
Games to Boost Conversational AI
Facebook AI Research published a paper proposing a gamified method to boost data collection for conversational AI models ->read more in the original research paper
Deep Learning on Controlled Noisy Data
Google Research published a paper proposing a technique to understand how deep learning models work with synthetic data ->read more on Google Research blog
Computational Inference at Netflix
Netflix Research published a paper introducing the concept of computational inference as an interdisciplinary field across causal inference, algorithm design, and numerical computing ->read more on Netflix Research blog
🤖 Cool AI Tech Releases
Consortium for Data API Standards
A few organizations launched a consortium to build standards for data structures across different Python data science frameworks. Founding sponsors include Intel, Microsoft, the D. E. Shaw group, Google Research, and Quansight ->read more in the announcement blog post
D4RL and CQL
To address the challenges of offline reinforcement learning, Google Research designed and released an open-source benchmarking framework, Datasets for Deep Data-Driven Reinforcement Learning (D4RL), and an offline RL algorithm, called conservative Q-learning (CQL) ->read more on Google Research blog
AI at Duolingo
Although not a tech release, VentureBeat published a very detailed article on Duolingo’s AI practices, containing lots of important lessons for data science practitioners. Fascinating reading ->read the article on VentureBeat
💸 Money in AI
Medical technology startup Exo raised $40 million in a round of funding. Aside from a portable ultrasound device, the team works on computational photography algorithms and is developing AI-powered, cloud-based workflow software that simplifies imaging and guides the user to capture clinically relevant data.
Hour One, a startup that creates AI-driven synthetic characters, raised $5 million in its seed round. The idea is to create digital avatars based on real people and then turn text-based content into engaging, human-centered videos. The team states that such communication makes information more memorable and meaningful, ultimately leading to action. The startup is currently prospecting companies in the e-commerce, education, automotive, HR, and enterprise sectors.
Cybersecurity firm SpyCloud closed a $30 million round. The team builds big data analytics tools to detect and prevent fraud as well as investigate criminals attempting to harm the business.
No-code contract management software company Agiloft raised $45 million in its first round of growth funding since it was founded in 1991. The Agiloft AI core was introduced in February. Its algorithms extract metadata such as contract amount and renewal date from scanned documents, and they help manage compliance and reduce risk by automatically identifying risky contracts and suggesting actions.
Robotic supply chain startup Attabotics raised $50 million in Series C. Inspired by ant colonies, their automation technology condenses rows and aisles of warehouse shelves into single, vertical storage structures. The fleet of robots minimizes human intervention. With the pandemic reducing human presence, investment in robotics and automation is heating up.
Earth observation startup Pixxel raised $5 million in seed funding. They are going to launch a constellation of Earth observation satellites that will provide higher-quality data with analysis driven by Pixxel’s own deep learning models. The team claims to be capable of extracting actionable insights, identifying and potentially predicting impactful events and phenomena.
Conversational AI software company Yalochat has raised $15 million in Series B. Considering that many startups are working on conversational commerce, Yalochat’s CEO shares that their strategy is to offer companies “something super simple but the high value that they could launch in a week.”
Data scientists, scholars, and developers from Microsoft Research, Intel Corporation, Linux Foundation AI, Google, Lockheed Martin, Cardiff University, Mellon College of Science, Warsaw University of Technology, Universitat Politècnica de València and other companies and universities are already subscribed to TheSequence.
TheSequence offers summaries of groundbreaking ML research papers, engaging explanations of ML concepts, and explorations of new ML frameworks and platforms. It also keeps you up to date with the news, trends, and technology developments in the AI field.
5 minutes of your time, 3 times a week, and you will steadily become knowledgeable about everything happening in the AI space.