🎙 Orly Amsalem/cnvrg.io on building developer-first ML products
Can a software developer be transformed into an ML creator?
It’s so inspiring to learn from practitioners and thinkers. Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration.
👤 Quick bio / Orly Amsalem
Tell us a bit about yourself: your background, your current role, and how you got started in machine learning.
Orly Amsalem (OA): I am Orly Amsalem, the VP of AI Innovation and BizDev at cnvrg.io, a machine learning (ML) platform. I lead a team that creates ML solutions that are designed for software developers. We help software developers enhance their applications with ML in their day-to-day work.
I got into machine learning after years of working as a software engineer, developing large-scale information systems and BI systems. It was natural for me to leverage my knowledge of data and data analytics and move on to using machine learning.
🛠 ML Work
Could you tell us about the vision and inspiration for the cnvrg.io platform? What is the importance of having a developer-friendly, infrastructure-agnostic stack for ML pipelines in the current state of the market?
OA: Our vision is to help data scientists build AI easily while focusing on what data scientists were trained to do, which is to solve complex problems. Today, data scientists and ML developers spend the majority of their time on technical complexities. cnvrg.io enables ML developers to spend less time on infrastructure, versioning, and operations in general. The original cnvrg.io classic solution achieves this goal, and we have customers who can testify that the time they spend on non-data-science work has been dramatically reduced.
As many of us know, there is a shortage of data scientists in the world, and the demand for creating and using ML is increasing. This led us to create AI Blueprints, a software developer-friendly ML tool that can practically transform every software developer into an ML creator. Our belief is that software developers can close the gap between the ever-increasing demand for AI in everyday applications and the limited supply of data scientists. As developers ourselves, we know how important flexibility is to the software development process. This is why we offer an open platform that is agnostic to infrastructure, so that every data scientist and every software developer can freely implement ML.
AI Blueprints is a recent addition to the cnvrg.io platform that enables the use of curated ML pipelines in ML solutions. On one hand, AI Blueprints can lower the bar for developers to build ML pipelines. On the other hand, data scientists are typically skeptical about curated recipes, as ML solutions require quite a bit of customization. Could you tell us more about AI Blueprints and how it balances ease of use against developer customization and flexibility?
OA: When we designed AI Blueprints, we thought about exactly those audiences: professional data scientists, software developers, and also business users. We asked ourselves the same question: how can we create one tool that serves audiences with such different preferences? The answer is in the way the blueprints are built. AI Blueprints are open-source, fully customizable ML pipelines. For professional data scientists, we recommend using AI Blueprints as a means to democratize AI in their organization. Data scientists can be the creators and curators of advanced AI models and organizational data sets, and make those models accessible to the entire organization as ready-to-use blueprints. This means that a software engineer can go to the organization's marketplace, find a blueprint that was created by peer data scientists in their organization, and apply it to their own use case or application. For software engineers, we created functionality that enables them to easily consume ML models, and even train ML models on their own data. Because AI Blueprints are built on top of the cnvrg.io MLOps operating system, we can hide all the technical complexity underneath and expose only an abstraction layer through AI Blueprints. At the same time, AI Blueprints are customizable and offer the option to change components and have visibility into the code. That way, whatever your level, you can find value in AI Blueprints.
One of the biggest challenges in real-world ML solutions is the use of new research models that haven’t been battle-tested in production environments or that have complex computation requirements. Can AI Blueprints become a hub that gives developers access to new, cutting-edge models in their ML pipelines?
OA: Yes. AI Blueprints was created for the community. This is why it was so important for us to open-source all the code. As an open platform, we invite contributions from other professional data scientists. This can definitely include new research models to be tested and evaluated by the community as well. You can already browse through our marketplace and find all kinds of ready-to-use AI Blueprints built by cnvrg.io data scientists for object detection, text detection, pose detection, scene detection, and more, all of which can be applied to your application in a few clicks.
Infrastructure portability is one of the aspects of ML solutions that is quite often ignored. This is the problem cnvrg.io is trying to address with Metacloud, which abstracts underlying compute infrastructures such as AWS or Azure away from the lifecycle of ML pipelines. How big of a challenge is infrastructure lock-in for ML solutions in this nascent state of the market?
OA: cnvrg.io Metacloud is about providing an infrastructure marketplace that will simplify the work of data scientists, especially in large-scale machine learning.
Large-scale ML comes with immense costs that can be a major blocker for many small and medium-sized enterprises. The flexibility to have an open market of infrastructure opens the door to leveraging infrastructure on traditional cloud platforms. The flexibility to have other players participate is important for optimizing costs and lowering the bar for companies to leverage large-scale ML. cnvrg.io has been called the “Switzerland for AI computing” for making it effortless for enterprises to run and move workloads across a variety of infrastructure, whether it’s in the cloud, on-premises, or at the edge. As in every field of life, positive competition can be healthy.
cnvrg.io provides a complete stack to build and manage the lifecycle of ML pipelines. How do you see the balance between all-in-one platforms like cnvrg.io and best-of-breed stacks that specialize in a single aspect of the ML solution lifecycle?
OA: Ultimately, the most important thing is that data scientists and software developers are able to deliver amazing applications powered by ML. So, from an ex-developer's point of view, I would say that teams should choose the tools that work for them and make their day-to-day work more efficient. But you also want as few bottlenecks as possible. The different phases of an ML pipeline are tightly coupled and often need to be tracked across the entire lifecycle. Having one fluid pipeline from research to production gives customers a seamless workflow with fewer hiccups. Involving more platforms in the process sometimes adds complexity and disorganizes workflows. For example, a typical ML lifecycle starts with a model being trained on data. The model is then deployed into production, new data flows into the production system, and the model now needs to take this fresh data into account and perhaps trigger a retraining process. In that case, using separate systems may create friction and add complexity to managing the end-to-end ML pipeline. So I would recommend choosing a platform that can manage the full lifecycle of ML models and automate the continuous training and deployment of AI and ML models.
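The train, deploy, monitor, and retrain loop described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not cnvrg.io's API or implementation: the toy `Model`, the `train` and `lifecycle_step` helpers, and the error-threshold retraining trigger are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Model:
    """Toy stand-in for a deployed model: it always predicts one value."""
    version: int
    prediction: float


def train(data: list[float], version: int) -> Model:
    # Hypothetical "training": the model predicts the mean of its training data.
    return Model(version=version, prediction=mean(data))


def mean_abs_error(model: Model, data: list[float]) -> float:
    # Average distance between the model's prediction and the observed values.
    return mean(abs(x - model.prediction) for x in data)


def lifecycle_step(
    model: Model,
    training_data: list[float],
    fresh_data: list[float],
    error_threshold: float,
) -> tuple[Model, list[float]]:
    """Check the deployed model against fresh production data.

    If the error stays within the threshold, keep the current model.
    Otherwise retrain on old plus fresh data and bump the version.
    """
    if mean_abs_error(model, fresh_data) <= error_threshold:
        return model, training_data
    combined = training_data + fresh_data
    return train(combined, model.version + 1), combined


# Initial training and "deployment".
history = [1.0, 2.0, 3.0]
deployed = train(history, version=1)

# Fresh data that looks like the training data: the model is kept.
deployed, history = lifecycle_step(deployed, history, [2.0, 2.5], error_threshold=1.0)
version_after_similar_data = deployed.version

# Fresh data that has shifted: retraining is triggered.
deployed, history = lifecycle_step(deployed, history, [9.0, 11.0], error_threshold=1.0)
version_after_shifted_data = deployed.version
```

When training, monitoring, and deployment live in separate systems, each arrow in this loop becomes an integration point, which is the friction the answer refers to.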
💥 Miscellaneous – a set of rapid-fire questions
What book can you recommend to an aspiring ML engineer?
OA: There are many great books out there, but I think the best way to start is to be practical, roll up your sleeves, and start coding. Especially in ML, it's one thing to read about it and a completely different thing to implement it. I also recommend following leading companies' announcements about the new models and AI technologies they release, like the advanced language models or vision models we recently heard about, and starting to play with them. There are also great communities, much like TheSequence, that discuss these types of topics and offer a great way to learn and interact with other AI developers. You can ask questions when you get stuck and follow along with others’ projects. We also hold an annual community conference called mlcon, which brings together AI experts to share their real-world AI applications and explain important topics in MLOps.
Is the Turing Test still relevant? Any clever alternatives?
OA: We definitely need methods and frameworks to make sure ML is ethical and responsible. We constantly need to ask ourselves where this technology can take us, and what the best and most ethical way to get there is. For example, one practice that companies are adopting is assigning red teams. These teams try to attack the models, find biases that might be perpetuated through them, and make sure our AI is ethical.
Biggest milestone for AI in the next five years?
OA: Responsible AI and making sure we are using ML in an ethical way. This is already a major topic, but finding ways to both democratize AI and govern it ethically is a balance that the community and enterprises will need to strike.