💰 Edge#33: The Millionaires’ Problem and sMPC

TheSequence is a convenient way to build and reinforce your knowledge about machine learning and AI

Oct 27, 2020

In this issue:

we overview the concept of secure multi-party computation (sMPC);
we explore Microsoft’s CrypTFlow – an architecture for using sMPC in TensorFlow;
we explain Facebook’s CrypTen framework for sMPC implementations in PyTorch.

Enjoy the learning!

💡 ML Concept of the Day: Secure Multi-Party Computation

Continuing our series about privacy, we would like to discuss Secure Multi-Party Computation (sMPC), which has become a foundational technique in private machine learning. sMPC is a cryptographic technique that allows different parties to jointly perform computations over their inputs while keeping those inputs private.

In computer science theory, sMPC is often seen as a solution to the famous Yao’s Millionaires’ Problem, introduced in 1982 by computer scientist Andrew Yao. The problem describes a setting in which two millionaires would like to know which of them is richer without disclosing their actual wealth, and it generalizes naturally to more than two parties. Variations of the Millionaires’ Problem appear in many real-world scenarios such as auctions, elections and online gaming. Conceptually, sMPC replaces the need for a trusted intermediary with secure computation. The goal of sMPC is to enable a group of independent data owners, who do not trust each other or any common third party, to jointly compute a function that depends on all of their private inputs. In the sMPC model, a set of parties with private inputs computes a function in a distributed fashion while security properties such as fairness, privacy and correctness are preserved.
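To make the idea of computing on private inputs concrete, here is a minimal Python sketch of additive secret sharing, one of the simplest building blocks behind sMPC. It is an illustration under our own assumptions, not Yao's original protocol: instead of comparing wealth (which needs a more involved comparison protocol), the parties compute the sum of their private inputs. The names `MODULUS`, `share` and `secure_sum` are hypothetical, and all parties are simulated inside a single process.

```python
import secrets

# Public modulus agreed on by all parties (an illustrative choice).
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their inputs from shares only."""
    n = len(private_inputs)
    # Row i holds the shares that party i creates from its own input;
    # share j of that row is sent to party j.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received. Each share on its own is a
    # uniformly random number and says nothing about any input.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they give the result.
    return sum(partial_sums) % MODULUS


wealth = [3_000_000, 7_500_000, 1_200_000]
print(secure_sum(wealth))  # 11700000
```

No party ever sees another party's input, only random-looking shares and the final total, which is the role a trusted intermediary would otherwise play.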

In the last few years, sMPC protocols have evolved to become viable for use in complex computations, such as the ones required in machine learning models. In the context of machine learning, sMPC enables building models that can perform computations over training datasets without having access to the data in the clear. Although sMPC-based machine learning models are still in a very early stage, we are already seeing applications in regulated industries such as health care, defense and pharmaceuticals.
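As a rough illustration of what computing without clear data can look like in machine learning, the sketch below evaluates a linear model on a feature vector that is secret-shared between two parties. It rests on several assumptions of ours: the model weights are public, real numbers are encoded as fixed-point integers, and both parties are simulated in one process. The names `SCALE`, `encode`, `decode`, `share` and `local_linear` are hypothetical helpers, not part of any library.

```python
import secrets

MODULUS = 2**64   # shares live in the ring of integers mod 2^64
SCALE = 2**16     # fixed-point scale: 16 fractional bits


def encode(x):
    """Map a float to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v, scale=SCALE):
    """Map a ring element back to a float, treating the upper half as negative."""
    v %= MODULUS
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / scale


def share(v):
    """Split a ring element into two additive shares."""
    r = secrets.randbelow(MODULUS)
    return r, (v - r) % MODULUS


def local_linear(weights_fp, feature_shares):
    """Run by each party on its own shares only; weights are public."""
    return sum(w * s for w, s in zip(weights_fp, feature_shares)) % MODULUS


features = [1.5, -2.0, 0.25]   # private input of the data owner
weights = [0.5, 1.0, -4.0]     # public model weights (an assumption)

pairs = [share(encode(x)) for x in features]
party0 = [p[0] for p in pairs]
party1 = [p[1] for p in pairs]
weights_fp = [encode(w) for w in weights]

y0 = local_linear(weights_fp, party0)
y1 = local_linear(weights_fp, party1)

# Multiplying two fixed-point numbers squares the scale, so decode accordingly.
prediction = decode((y0 + y1) % MODULUS, SCALE * SCALE)
print(prediction)  # -2.25
```

Because the model is linear and the weights are public, each party can work on its shares without any communication. Hiding the weights as well requires an interactive multiplication protocol.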

🔎 ML Research You Should Know: Microsoft CrypTFlow – an Architecture for Using sMPC in TensorFlow

In the paper, CrypTFlow: Secure TensorFlow Inference, Microsoft Research proposes a framework to seamlessly convert TensorFlow inference code into secure multi-party computation (sMPC) protocols.

The objective: Present a framework that abstracts the use of sMPC protocols from TensorFlow developers.

Why is it so important: Microsoft Research has an entire group dedicated to advancing sMPC-based machine learning. CrypTFlow is one of the first projects produced by that group.

Diving deeper: sMPC is one of the most efficient techniques in privacy-preserving machine learning scenarios. One of the most common sMPC use cases in machine learning is secure inference, in which both the model and the query must stay hidden from the other participants in the protocol. While conceptually simple, the implementation of these types of sMPC use cases remains complex and dependent on highly specialized knowledge. Furthermore, many of the research examples of sMPC in machine learning are very basic and use very small datasets that rarely resemble real-world scenarios.
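When both the model and the query are hidden, the parties have to multiply two secret-shared values, which cannot be done locally. A common textbook technique for this is a Beaver multiplication triple. The sketch below is a toy two-party version under our own assumptions: a trusted dealer hands out the triple, values are plain integers, and everything is simulated in one process. It is not the protocol used by CrypTFlow, and the names `share`, `reveal`, `beaver_multiply` and `secure_dot` are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # public prime modulus (an illustrative choice)


def share(v):
    """Split v into two additive shares mod MODULUS."""
    r = secrets.randbelow(MODULUS)
    return [r, (v - r) % MODULUS]


def reveal(shares):
    """Recombine shares into the value they hide."""
    return sum(shares) % MODULUS


def beaver_multiply(x_sh, y_sh):
    """Return shares of x*y given shares of x and y."""
    # Dealer (assumed trusted in this sketch) samples a, b and shares c = a*b.
    a = secrets.randbelow(MODULUS)
    b = secrets.randbelow(MODULUS)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % MODULUS)
    # The parties open d = x - a and e = y - b. Since a and b are uniformly
    # random, d and e reveal nothing about x and y.
    d = reveal([(x_sh[i] - a_sh[i]) % MODULUS for i in range(2)])
    e = reveal([(y_sh[i] - b_sh[i]) % MODULUS for i in range(2)])
    # x*y = c + d*b + e*a + d*e, computed share by share.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % MODULUS for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % MODULUS  # public term added by one party only
    return z_sh


def secure_dot(w_shares, q_shares):
    """Dot product of a shared weight vector and a shared query vector."""
    total = [0, 0]
    for w_sh, q_sh in zip(w_shares, q_shares):
        prod = beaver_multiply(w_sh, q_sh)
        total = [(t + p) % MODULUS for t, p in zip(total, prod)]
    return total


weights = [3, 5, 2]   # private to the model owner
query = [4, 1, 6]     # private to the client
result = reveal(secure_dot([share(w) for w in weights], [share(q) for q in query]))
print(result)  # 29
```

Each multiplication costs one triple and one round of communication, which hints at why scaling secure inference to realistic networks takes specialized engineering.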

CrypTFlow presents an architecture to convert TensorFlow inference code into sMPC protocols without requiring any major modifications. The approach is designed to preserve the accuracy of the original TensorFlow model, which represents a major improvement over alternative systems. The architecture of CrypTFlow is organized in three major components that, as a result of a poor marketing choice 😉, are named after the Three Musketeers.

Athos: The first part of CrypTFlow is based on Athos, a compiler that transforms TensorFlow code into a variety of sMPC protocols while preserving accuracy.

Porthos: Porthos is a high-performance three-party sMPC protocol that is used as the sMPC backend for the code generated by Athos.

Aramis: This component modifies the code to guarantee that it is secure against malicious adversaries.

Image credit: Microsoft

CrypTFlow’s research shows the potential of converting TensorFlow inference code into sMPC protocols without the need for major modifications. These ideas could form the foundation of a new generation of sMPC frameworks for the deep learning space.

🤖 ML Technology to Follow: Facebook’s CrypTen Enables sMPC in PyTorch Models

Why should I know about this: Facebook open-sourced CrypTen to streamline research and implementation of sMPC techniques in PyTorch.

What is it: CrypTen is a new, easy-to-use software framework built on PyTorch to facilitate research in secure and privacy-preserving machine learning. CrypTen incorporates security and data privacy techniques as first-class citizens of machine learning models, allowing researchers to leverage these methods without having to become experts in cryptography. The core of CrypTen’s architecture is based on an implementation of sMPC protocols in PyTorch programs. CrypTen enables a very straightforward flow illustrated in the following figure:

Image credit: Facebook AI

Compared to other privacy libraries in the space, CrypTen brings some tangible benefits:

PyTorch-Based: Developers using CrypTen have access to the entire PyTorch stack. Also, CrypTen has been optimized for machine learning scenarios and doesn’t require any special adaptations.

Library-Based: CrypTen is implemented as a native PyTorch library rather than as a compiler, which is the approach most privacy frameworks on the market take.

Real-World Machine Learning: CrypTen was built to address privacy in real-world machine learning scenarios. The framework supports privacy across different structures ranging from basic linear models to complex neural network architectures.

Developers can use CrypTen with a few lines of code, without requiring major modifications to their PyTorch programs. The current implementation of CrypTen is already integrated with large-scale computing infrastructures such as AWS.
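To give a feel for what a library-based design means in practice, the toy class below wraps secret shares in an object that overloads the usual arithmetic operators, so existing code keeps its shape. This is only a sketch of the general idea under our own assumptions. It is not CrypTen's API, it works on single integers rather than tensors, and `SharedValue`, `from_plain` and `reveal` are hypothetical names.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus (an illustrative choice)


class SharedValue:
    """An integer held as additive shares, one share per simulated party."""

    def __init__(self, shares):
        self.shares = list(shares)

    @classmethod
    def from_plain(cls, value, n_parties=2):
        """Secret-share a plain integer among n_parties."""
        shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % MODULUS)
        return cls(shares)

    def __add__(self, other):
        if isinstance(other, SharedValue):
            # Shared + shared: every party adds its two shares locally.
            return SharedValue(
                (a + b) % MODULUS for a, b in zip(self.shares, other.shares)
            )
        # Shared + public constant: only the first party adds the constant.
        first, *rest = self.shares
        return SharedValue([(first + other) % MODULUS, *rest])

    def __mul__(self, public_scalar):
        # Shared * public constant: every party scales its own share.
        return SharedValue((s * public_scalar) % MODULUS for s in self.shares)

    def reveal(self):
        """Recombine all shares; in a real system this needs the parties' consent."""
        return sum(self.shares) % MODULUS


x = SharedValue.from_plain(10)
y = SharedValue.from_plain(32)
z = (x + y) * 2 + 1   # reads like ordinary arithmetic
print(z.reveal())     # 85
```

The appeal of this style is that the private computation is written the same way as the plain one, which matches the promise of using the framework with only a few lines of code.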

How can I use it: CrypTen is open source and available at https://github.com/facebookresearch/CrypTen

🧠 The Quiz

Are you ready to test your knowledge? Every ten quizzes, we randomly choose a few active participants and reward them. Participate!

How is secure multi-party computation (sMPC) relevant to private ML models?
What is the main use case for Facebook’s CrypTen framework?

That was fun! Thank you. See you on Thursday 😉
