In the TheSequence Guest Post series, our partners explain in detail which ML and AI challenges they help solve. In this article, Fara Hain from Run:ai discusses the problem of ‘shadow AI’: siloed teams each buying their own infrastructure or using cloud infrastructure for AI (along with their own MLOps and data science tools), creating inefficiency and complexity for the teams managing that infrastructure.
How Does Shadow AI Begin?
Much like ‘shadow IT’ in the 2010s, when cloud computing made it easy for rogue teams to purchase and manage their own infrastructure, IT is once again contending with a shadow. This time, the rise of one-off AI initiatives inside organizations is creating Shadow AI. It begins with good intentions, as a way to benefit from AI quickly. Teams rarely have any sinister reason to purchase GPU servers (or use cloud resources) that fall outside the purview of IT. They simply intend to complete test projects or train models, and they begin building their experimentation platform in the simplest way possible.
Unfortunately, these initiatives rapidly become siloed. In addition to being a challenge for IT and InfoSec teams to manage, a decentralized approach results in many AI initiatives never making it to production.
Teams using AI infrastructure in this siloed way may find that they can succeed for a short time with a few data scientists building a few models, but over time the organization needs to rein in the many small AI projects to improve efficiency and shift focus to production-ready AI.
Why Should Organizations Avoid a Siloed Approach to AI?
In Q3 2021, Run:ai completed a survey of over 200 people who have some degree of responsibility for their company’s GPU infrastructure. One of the findings, displayed here, helps answer the question.
Almost two-thirds (63%) of the companies surveyed have research teams of 10 or more people, and they are growing their data science functions across departments. It’s amazing to see that 30% of those surveyed have more than 75 researchers on their data science teams.
But the problem became clear when we asked: What does access to GPUs look like in your organization? Is it easy to get access when you need it? 35% replied simply, “No.” Another 38% said it was only sometimes easy to access resources. While a few GPUs may be enough when teams are building or training small models, scaling AI research or moving models to production requires access to much larger quantities of GPUs.
A siloed approach quickly adds complexity and management challenges for the teams responsible for AI. Researchers sometimes need vast quantities of compute to achieve their goals, and if each team is responsible for its own infrastructure, those resources may not end up in the hands of the teams that need them most.
In addition, GPUs are often sitting idle. I love GPU unboxing videos as much as the next person, but a workstation sitting under a researcher’s desk is essentially an expensive piece of furniture. 75% of the time, that server will be sitting idle.
Why do GPUs sit idle?
According to this recent article written by one of my colleagues, there are many reasons, ranging from the mundane (the researcher who holds access goes to lunch or takes a day off) to more complex causes rooted in how AI models work. As the article puts it,
“Most applications have CPU and I/O work in between launching GPU kernels. The GPU utilization of a deep learning model running solely on a GPU is most of the time much less than 100%. For example, in medical imaging models, each step can have a few minutes of work on the CPU as well. Well-built architecture can avoid this by running CPU and GPU tasks in parallel instead of sequentially.”
There are obvious and sometimes inherent limitations of AI modeling that keep GPUs idle.
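To make the quoted point concrete, here is a minimal PyTorch-style sketch of the common fix. The toy dataset and model are our own, and it assumes a CUDA-capable GPU: background DataLoader workers prepare the next batch on the CPU while the GPU is still busy with the current one, and pinned memory lets the host-to-device copy run asynchronously instead of forcing CPU and GPU work to happen sequentially.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model, purely for illustration.
dataset = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
model = nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# num_workers > 0 keeps CPU-side batch preparation running in the background;
# pin_memory=True makes the asynchronous copies below possible.
loader = DataLoader(dataset, batch_size=256, num_workers=4, pin_memory=True)

for inputs, labels in loader:
    # non_blocking=True overlaps the host-to-GPU copy with GPU work still in
    # flight, instead of making the GPU sit idle while the CPU finishes up.
    inputs = inputs.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
```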
How to avoid Shadow AI?
A siloed approach to AI in an organization will ultimately frustrate researchers and slow down production-ready modeling. It can be avoided in the following ways:
Pool Resources and Manage Centrally Across Teams
The first step to centralized AI is to pool all available CPU, memory, and GPU resources and give data scientists seamless access to those resources. Nodes added to the cluster should also be instantly available to research teams. Ideally, IT will also make on-demand bursting to the cloud part of the pooled setup.
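What “pooling” looks like in practice depends on the stack, but a common pattern is to treat GPUs as schedulable resources in a shared Kubernetes cluster rather than as machines owned by individual teams. The sketch below is only an illustration of that idea, not a description of Run:ai’s platform: it assumes a cluster with the NVIDIA device plugin installed (so GPUs are schedulable as `nvidia.com/gpu`), and the image name, namespace, and labels are placeholders.

```python
from kubernetes import client, config

# Assumes the pooled GPUs live in a shared Kubernetes cluster with the
# NVIDIA device plugin, so GPUs can be requested as 'nvidia.com/gpu'.
config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job", labels={"team": "nlp-research"}),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="registry.example.com/nlp/train:latest",  # hypothetical image
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "2"}  # draw 2 GPUs from the shared pool
                ),
            )
        ],
    ),
)

# The cluster scheduler places the job on whichever node in the pool has free GPUs,
# rather than tying the work to one team's machine.
client.CoreV1Api().create_namespaced_pod(namespace="research", body=pod)
```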
Let IT Admins Set Policies Based on Research Priorities
Once a pool of GPUs is available to all teams and users, automation can be set up. Pre-defining policies across projects, users, and/or departments helps align resource consumption with business priorities. For one Run:ai customer, that meant shifting access to greater quantities of GPUs to the team doing COVID research. Other teams had priority prior to the pandemic, but the research institution was able to automatically shift access to the researchers who needed large quantities of compute at that time.
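To show what a priority policy means mechanically, here is a deliberately simplified sketch of our own; it is not Run:ai’s scheduling logic, and real schedulers also handle preemption, fairness over time, and cloud bursting. The point is simply that raising a project’s priority shifts more of the shared pool toward it on the next allocation pass.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    priority: int       # higher = more business-critical
    requested_gpus: int

def allocate(pool_size: int, projects: list[Project]) -> dict[str, int]:
    """Hand out GPUs from a shared pool in priority order (illustrative only)."""
    allocation = {p.name: 0 for p in projects}
    remaining = pool_size
    for p in sorted(projects, key=lambda p: p.priority, reverse=True):
        granted = min(p.requested_gpus, remaining)
        allocation[p.name] = granted
        remaining -= granted
    return allocation

# Raising a project's priority (e.g. COVID research during the pandemic)
# automatically shifts more of the pool toward it on the next pass.
projects = [
    Project("covid-research", priority=10, requested_gpus=24),
    Project("ads-ranking", priority=5, requested_gpus=16),
    Project("experiments", priority=1, requested_gpus=8),
]
print(allocate(pool_size=32, projects=projects))
# {'covid-research': 24, 'ads-ranking': 8, 'experiments': 0}
```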
For many Run:ai customers, shadow AI is already a thing of the past. The graph below shows how one company’s ability to access pools of GPUs on demand greatly increased utilization, which correlates directly with their ability to build models faster.
With GPUs managed by different teams, the customer was initially getting less than 25% utilization of their expensive on-premises compute resources. As the graph shows, in the first few days after Run:ai’s Atlas Platform was installed, automation enabled researchers to access not only their own GPUs but also any idle GPUs in the pooled environment. That was an eye-opening moment for the customer.
Within two weeks, utilization regularly hit 100%. This enabled IT to see that they actually did need to purchase additional resources. New servers were installed in mid-November, and quite quickly researchers across teams and departments were able to access those GPUs as well.
Conclusion
Shadow IT is mostly a thing of the past, and shadow AI should be as well. By pooling compute resources and automating researchers’ access to idle GPUs, we expect shadow AI to be equally short-lived.