🎙 Chat with Justin Harris: "My vision is for people to be compensated for the data they provide while keeping models free to use"
TheSequence speaks to ML practitioners to immerse you in the real world of machine learning and artificial intelligence
👤 Quick bio
Senior Software Developer at Microsoft, Justin specializes in Sharing Updatable Models (SUM) on Blockchain (check out the project on GitHub). Justin is currently using his experience in machine learning and crowdsourcing to implement a framework for ML in smart contracts to collect quality data and provide models that are free to use. A proof of concept has been done using the Ethereum blockchain. Justin is now interested in applying the same incentives for good quality data in a Federated Learning setting. Learn more about Justin at https://aka.ms/hodl. The views expressed below are Harris’s own.
Tell us a bit about yourself. Your background, current role and how did you get started in machine learning research?
JH: I really enjoy skiing and I care a lot about the environment. I just got a new puppy and I named him Skywalker. I’ve always been interested in sci-fi so working on AI was just natural for me. I started at a small startup called Maluuba in 2011 while I was finishing my bachelor’s degree. They needed help and were willing to teach me about ML. Before finishing my bachelor’s, I was interviewing people with PhDs. We developed virtual personal assistants for phones, TVs, cars, etc. I worked on building models and frameworks to support Natural Language Understanding systems in many languages. In early 2017, Microsoft purchased Maluuba and I have been able to continue similar work there at a much larger scale.
A lot of your most prominent work focuses on the area of decentralized AI. Why do you feel decentralization is important for the future of AI and how would you compare it with traditional centralized models?
JH: Decentralizing AI is about democratizing AI. Models are trained using data that we all provide so it makes sense for us all to benefit from these models. The best way to do that is to remove the centralized authority controlling models and for us to find ways to share them. When a model is centralized, it can be easily closed off to the world, e.g., by shutting down an API or removing some code that was once available publicly. This notion is becoming more important with Data Dignity initiatives that are immediately important in many spaces, such as the gig economy, where a worker loses their reputation when switching from one platform to another. Similarly, you want to retain the work you put in when switching from one model to another. Imagine the possibility of using your Alexa home assistant for years and implicitly giving feedback, then confidently switching to another brand, and knowing that at their cores these assistants shared a base model. They could still add their own secret sauce on top but they could share models too, just like how they’re already sharing open-source code and common engineering designs.
Recently, you have published some relevant work about incentive mechanisms for decentralized AI architectures. Could you share some ideas about your vision for this area of decentralized AI?
JH: It would be nice if models could be used for free because they’re trained with data that we’re all providing. Also, with models moving more to devices instead of the cloud, it will become difficult to charge per query for running the model to get a prediction. Instead, it would be great if we could treat models as common goods to be shared. So, with that as the premise, how do we encourage people to contribute to something they get to use for free? The incentive mechanisms aim to solve this problem. Like many solutions in the blockchain space, we are trying to get creative with how to keep our system stable. My vision is for people to be compensated for the data they provide while keeping models free to use. I currently see two main ways to do this: a benevolent person rewards people for good contributions by introducing a pool for rewards, or a zero-sum game where bad data contributors effectively pay good data contributors. Of course, this might not fit every scenario.
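To make the zero-sum idea concrete, here is a minimal sketch in Python. It is an illustration only, not the mechanism implemented in the SUM project: the `Contribution` class, the `settle` function, the deposit amounts, and the `is_good` verdict are all hypothetical, and how a contribution gets judged good or bad is deliberately left out. Each contributor stakes a deposit with their data; deposits attached to bad data are forfeited and split among good contributors in proportion to their own stake.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    contributor: str
    deposit: float   # stake submitted alongside the data (hypothetical)
    is_good: bool    # verdict on the data; how it is reached is out of scope


def settle(contributions):
    """Return the payout per contributor for one settlement round.

    Zero-sum: the total paid out always equals the total deposited.
    """
    payouts = {c.contributor: 0.0 for c in contributions}
    good = [c for c in contributions if c.is_good]
    forfeited = sum(c.deposit for c in contributions if not c.is_good)
    total_good = sum(c.deposit for c in good)

    if total_good == 0:
        # No good stake to reward: refund every deposit so nothing is lost.
        for c in contributions:
            payouts[c.contributor] += c.deposit
        return payouts

    for c in good:
        # Good contributors get their deposit back plus a proportional
        # share of what the bad contributors forfeited.
        payouts[c.contributor] += c.deposit + forfeited * c.deposit / total_good
    return payouts


round_one = [
    Contribution("alice", 10.0, True),
    Contribution("bob", 10.0, False),
    Contribution("carol", 30.0, True),
]
print(settle(round_one))
```

In this round, bob's 10 units are shared between alice and carol according to their stakes, so no outside funding is needed. The benevolent-pool variant would instead add an external reward to the amount being distributed.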
Your research also covers the area of federated learning. What makes federated learning so difficult to implement but also so relevant?
JH: Federated Learning (FL) addresses many of the original intents of SUM such as persistence: promoting sharing models by decentralizing them. Using principles of FL along with a few techniques that normally go with it, such as Differential Privacy (DP) and secure Multi-Party Computation (MPC), keeping data private will be much easier. So, there is a natural fit with SUM and my interests. One big challenge in FL is that you don’t know where the data came from because you only get to see the update to the model and not the actual data that produced this update. It’s difficult to get each component of an FL system right so that data is really private while keeping accuracy high. My concern is that I don’t hear much about encouraging good quality data to be submitted (of course, there is work in filtering out bad updates with techniques like outlier detection). Usually, FL is done in systems where the user gives good data implicitly; for example, in next-word prediction, they tap on the word they would have typed. With models exposed so publicly, we should be concerned that a malicious actor could spoof bad data in order to corrupt the model. I’m hoping that we can find ways to use incentive mechanisms that I’ve introduced or ones inspired by the same principles when adopting FL solutions.
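The outlier-detection idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production FL aggregator: each client update is assumed to be a flat list of weight deltas, the function name `filter_and_average` and the `max_ratio` threshold are hypothetical, and real systems typically use more robust aggregation rules. The server never sees the underlying data, so it can only judge the update itself, here by comparing its L2 norm to the median norm.

```python
import math
from statistics import median


def l2_norm(update):
    """Euclidean length of one client's update vector."""
    return math.sqrt(sum(x * x for x in update))


def filter_and_average(updates, max_ratio=3.0):
    """Drop updates whose norm exceeds max_ratio times the median norm,
    then average the remaining updates coordinate by coordinate.

    Returns (averaged_update, number_of_updates_kept).
    """
    if not updates:
        raise ValueError("need at least one update")
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be >= 1 so at least one update survives")

    norms = [l2_norm(u) for u in updates]
    cutoff = max_ratio * median(norms)
    kept = [u for u, n in zip(updates, norms) if n <= cutoff]

    dim = len(updates[0])
    averaged = [sum(u[i] for u in kept) / len(kept) for i in range(dim)]
    return averaged, len(kept)


client_updates = [
    [0.10, 0.20],
    [0.20, 0.10],
    [0.15, 0.15],
    [50.0, -50.0],  # suspiciously large update, e.g. from spoofed data
]
averaged, kept = filter_and_average(client_updates)
print(averaged, kept)
```

A filter like this only catches updates that look unusual; a bad update with an ordinary norm passes straight through, which is why incentives for submitting good data remain relevant.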
Is federated learning very dependent on mobile deep learning? What do you think are the biggest challenges that mobile deep learning frameworks need to overcome to have mainstream adoption?
JH: Right now, FL does depend on being able to perform deep learning on mobile devices. You often hear about how a crazy infrastructure was needed to train the latest large model. That infrastructure isn’t needed to train on your mobile device because you’re just training with a few samples, not a giant dataset. Still, just running inference (prediction not training) on your mobile device for certain tasks would be computationally expensive because of the size of the model. There is lots of important work being done in trying to distill large models to run them on less powerful devices but training these models will still be difficult. I’m hoping that more work will be done on finding ways to train large scale models more efficiently, maybe with entirely new algorithms. For example, maybe the entire model doesn’t need to be updated? Maybe just one of the matrices? FL adoption means that we can look more into adapting model training paradigms since we won’t be using typical techniques available when training on a large dataset and producing a static model.
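As a rough illustration of the "maybe just one of the matrices" idea, the following Python sketch applies a gradient step to only a chosen subset of a model's parameters and leaves the rest frozen. It is a toy example under assumptions: the model is a plain dictionary of named matrices, and the names `sgd_step_partial`, `hidden`, and `output` are hypothetical. It does not represent any specific algorithm from Justin's work.

```python
def sgd_step_partial(params, grads, trainable, lr=0.1):
    """Return new parameters where only the matrices named in `trainable`
    take a gradient step; all other matrices are copied unchanged.

    params and grads map a name to a matrix stored as a list of rows.
    """
    updated = {}
    for name, matrix in params.items():
        if name in trainable:
            updated[name] = [
                [w - lr * g for w, g in zip(w_row, g_row)]
                for w_row, g_row in zip(matrix, grads[name])
            ]
        else:
            # Frozen matrix: no computation and nothing to send back.
            updated[name] = [row[:] for row in matrix]
    return updated


params = {"hidden": [[1.0, 2.0]], "output": [[0.5], [0.5]]}
grads = {"hidden": [[1.0, 1.0]], "output": [[1.0], [-1.0]]}
new_params = sgd_step_partial(params, grads, trainable={"output"})
print(new_params)
```

On a device, freezing most of the model would shrink both the computation per step and the size of the update that has to be shared.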
Any other areas of deep learning you are excited about these days?
JH: I’m excited to see companies sharing research and code. This would have been unfathomable 10 years ago, when it was considered valuable to keep things secret. I’m looking forward to the day when companies can collaboratively contribute to decentralized models not controlled by one centralized organization, similar to how they do with open-source projects like Linux, Android, git, programming languages, etc. Once they do, and even before that, explainability, privacy, outlier detection, and many other aspects will become very important.
I’m also excited to see deep learning becoming easier to use and being applied to more problems.
💥 Miscellaneous – a set of rapid-fire questions
TensorFlow or PyTorch?
JH: PyTorch. I like TensorFlow because it feels stable in production but most researchers I work with prefer PyTorch.
Favorite math paradox?
JH: There’s no such thing as something that is not interesting. Assume there is; then there must be a smallest item in the set of things that are not interesting, but that’s interesting!
Any book you would recommend to aspiring data scientists?
JH: Invisible Women: Data Bias in a World Designed for Men by Caroline Criado Perez
Does P equal NP?