The Sequence Chat: Shreya Rajpal, Co-Founder and CEO of Guardrails AI, on Ensuring the Safety and Robustness of LLMs
The co-creator of one of the most important LLM guardrails frameworks shares her perspectives on building safe, robust and efficient LLM applications, the architecture of Guardrails AI and more.
Quick Bio
Shreya Rajpal is the creator and maintainer of Guardrails AI, an open-source platform developed to ensure increased safety, reliability, and robustness of large language models in real-world applications. Her expertise spans a decade in the field of machine learning and AI. Most recently, she was the founding engineer at Predibase, where she led the ML infrastructure team. In earlier roles, she was part of the cross-functional ML team within Apple's Special Projects Group and developed computer vision models for autonomous driving perception systems at Drive.ai.
🛠 AI Work
You are the creator of Guardrails AI, can you tell us about the vision and inspiration for the project?
Guardrails AI began from my own exploration of building applications with Large Language Models (LLMs). I quickly discovered that while I could reproduce some really exciting GPT application demos, getting practical value out of them consistently was challenging, mostly due to the inherent non-determinism of LLMs.
The production gap with LLM apps had significant parallels to what I had seen in self-driving cars. The new wave of applications people are building with LLMs is more complex and looks more like a self-driving car's system: various specialized models combined to complete individual tasks.
An early hypothesis I had was that the process of building reliable AI applications would be like the strict online verification processes used in self-driving cars. Guardrails AI came out of that hypothesis, with the goal of bridging the reliability gap between AI-driven and more traditional, deterministic software components.
What are the core components of the Guardrails AI architecture and what’s the role of the validators?
The core components of the Guardrails AI architecture are 1) Guards and 2) Validators.
The Guard operates as a sidecar to the LLM. Whenever an LLM call is made, the Guard acts as an inline verification suite that ensures both inputs and outputs of the LLM are correct, reliable and respect specified constraints. If your application has a chain or an agent framework with multiple LLM calls, you will typically instantiate multiple LLM Guards to enforce validation at each stage and mitigate the risk of compounding errors.
A Guard is composed of multiple Validators, where each Validator enforces a specific requirement for “correctness”. For example, one Validator may ensure that the LLM's output adheres to a specific format like JSON. Another might check for hallucinations in the output, while yet another could screen for toxicity or profanity. These Validators programmatically assess LLM-generated outputs against their respective “correctness” criteria, and together they form a comprehensive verification system.
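To make the Guard/Validator relationship concrete, here is a minimal, self-contained sketch of the pattern in Python. It is a conceptual illustration only, not the actual Guardrails AI API; the names (Validator, IsValidJSON, NoProfanity, Guard, call_llm) and the retry behavior are hypothetical.

```python
# Conceptual sketch of the Guard/Validator pattern described above.
# Names are illustrative, not the Guardrails AI API.
import json
from typing import Callable, List

class Validator:
    """A single 'correctness' check applied to an LLM output."""
    def validate(self, output: str) -> bool:
        raise NotImplementedError

class IsValidJSON(Validator):
    """Deterministic check: the output must parse as JSON."""
    def validate(self, output: str) -> bool:
        try:
            json.loads(output)
            return True
        except json.JSONDecodeError:
            return False

class NoProfanity(Validator):
    """Deterministic check against a placeholder blocklist."""
    BLOCKLIST = {"badword1", "badword2"}  # placeholder word list
    def validate(self, output: str) -> bool:
        return not any(word in output.lower() for word in self.BLOCKLIST)

class Guard:
    """Wraps an LLM call and runs every validator on its output."""
    def __init__(self, validators: List[Validator], max_retries: int = 1):
        self.validators = validators
        self.max_retries = max_retries

    def __call__(self, llm: Callable[[str], str], prompt: str) -> str:
        for _ in range(self.max_retries + 1):
            output = llm(prompt)
            if all(v.validate(output) for v in self.validators):
                return output
        raise ValueError("LLM output failed validation after retries")

# Usage (call_llm is any function that maps a prompt string to a completion):
# guard = Guard([IsValidJSON(), NoProfanity()])
# result = guard(call_llm, "Return the user profile as JSON.")
```

In an application with multiple LLM calls, you would instantiate one such Guard per call, matching the sidecar-per-stage setup described above.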
Guardrails AI introduces the Reliable AI Markup Language (RAIL) for expressing validation rules for an LLM, and those rules can be expressed in Pydantic, RAIL, or pure Python depending on the structure of the output. Why a new language versus just using Python validation functions?
One of the early decisions in the development of the open-source project was to design something as user-friendly for non-technical users as ChatGPT-based prompting. I created RAIL as a solution to that problem and designed it to be a variant of XML. Since many non-programmers have at least a basic familiarity with HTML, the learning curve for RAIL is fairly gentle.
At its core however, Guardrails AI is a framework for enforcing constraints on AI applications. As such, both RAIL & Python are essentially interfaces that allow developers to specify which constraints matter to them and choose the interface that best suits their technical expertise.
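For readers who prefer the Python route, the sketch below shows one way output constraints can be declared in plain Python using Pydantic, as an alternative to writing them in RAIL. The model and field names are illustrative, the code requires Pydantic v2, and the exact Guardrails AI integration point may differ by library version.

```python
# Sketch: declaring output structure and constraints in Python with Pydantic,
# rather than in RAIL. Field names and thresholds are illustrative only.
# Requires Pydantic v2.
from pydantic import BaseModel, Field, field_validator

class SupportTicket(BaseModel):
    category: str = Field(description="One of: billing, technical, other")
    summary: str = Field(description="One-sentence summary of the issue")
    urgency: int = Field(ge=1, le=5, description="Urgency rating from 1 to 5")

    @field_validator("category")
    @classmethod
    def category_must_be_known(cls, value: str) -> str:
        allowed = {"billing", "technical", "other"}
        if value.lower() not in allowed:
            raise ValueError(f"category must be one of {allowed}")
        return value.lower()
```

The same constraints could be written as a RAIL spec or as standalone Python validation functions; the point is that all three are interfaces over the same underlying verification framework.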
There is a trend in foundation models about using models to validate other models. How do you see the balance between this approach and the more discrete validation approach proposed by Guardrails AI?
I believe the current trend of using Large Language Models (LLMs) to validate other LLMs stems more from a lack of alternative validation tools than from its efficacy as a validation method, primarily because of these reasons:
Traditional ML metrics are no longer adequate for the new wave of LLM use cases. For example, higher ROUGE scores in text summarization don't correlate with human ratings of better summaries.
LLMs are extremely flexible and adapt reasonably well to new tasks.
However, this creates a 'who-will-guard-the-guards' problem. How can we be sure that the validating LLM is reliable and aligned with our expectations? Anecdotally, I've seen substantial variance using LLM self-validation across repetitive runs. While self-validation is a component of a verification system, developers typically want the bulk of validation to be more deterministic and interpretable.
The Guardrails AI framework takes a much more grounded approach to validation. We break down the validation problem into smaller components and use a mix of heuristic rule-based engines, domain-specific finetuned models, and external systems for a more reliable and interpretable validation process. While scaling this approach takes longer, it builds a greater degree of trust in the overall system.
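As an illustration of mixing deterministic checks with model-based ones, here is a hedged sketch. The helper names (regex_pii_check, llm_judge, validate) and the judge prompt are hypothetical, and in practice the model-based check might be a fine-tuned classifier rather than another LLM call.

```python
# Sketch: combining a deterministic, rule-based check with a model-based check.
# regex_pii_check and llm_judge are hypothetical helpers.
import re
from typing import Callable

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def regex_pii_check(output: str) -> bool:
    """Deterministic check: fail if the output leaks an email address."""
    return EMAIL_RE.search(output) is None

def llm_judge(output: str, question: str, ask_llm: Callable[[str], str]) -> bool:
    """Model-based check: ask a separate model whether the answer is grounded.
    This is the less deterministic part of the verification system."""
    verdict = ask_llm(
        f"Question: {question}\nAnswer: {output}\n"
        "Is the answer fully supported by the question's context? Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def validate(output: str, question: str, ask_llm: Callable[[str], str]) -> bool:
    # Run the cheap, deterministic check first; only then consult the judge.
    return regex_pii_check(output) and llm_judge(output, question, ask_llm)
```

Ordering the checks this way keeps the bulk of validation deterministic and interpretable, with the stochastic judge reserved for the aspects rules cannot capture.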
One of the things I find fascinating about projects like Guardrails AI and the LLM-validation space in general is using the outputs of active validations to create better benchmarks (which I think we desperately need). How do you see Guardrails AI improving the way we test and benchmark foundation models?
Great point! One unintended benefit of the current evaluation crisis is the wealth of valuable metadata generated during testing. Every request yields not only the raw output from the LLM, but also multiple scores and metrics that capture the quality aspects you’re interested in. This data is extremely useful in two ways: it's a great resource for benchmarking existing applications, and it's a rich goldmine for fine-tuning newer, smaller, domain-specific open-source models.
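One way to picture this: every guarded call can emit a structured record of the prompt, the raw output, and each validator's verdict, and those records accumulate into a benchmark or fine-tuning dataset. The sketch below assumes a hypothetical per-validator results dictionary and a JSONL log file; none of these names come from the Guardrails AI library.

```python
# Sketch: logging per-request validation metadata so it can later be used
# for benchmarking or for fine-tuning smaller, domain-specific models.
# The `results` dict is assumed to come from whatever validators were run.
import json
import time
from typing import Dict

def log_validation_record(prompt: str, output: str,
                          results: Dict[str, bool],
                          path: str = "validation_log.jsonl") -> None:
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "output": output,
        "validator_results": results,     # e.g. {"is_valid_json": True, ...}
        "passed": all(results.values()),  # overall verdict for this call
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```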
Are there fundamental differences between building Guardrails for different domains such as language, image, video, or audio? Additionally, what are the core distinctions from unit testing frameworks?
From a system-design perspective, the underlying philosophy of implementing guardrails is fairly consistent across different modalities. However, the actual guardrails implementations can vary significantly. Here’s what I mean by that:
The key challenge with building guardrails lies in encoding “correctness” requirements into actual executable checks that function as a verification system around the foundation model. Because the tasks we use GenAI for are fairly complex when compared to previous generations of ML applications, evaluating their correctness is non-trivial. For example, two summaries could convey completely different information while still being “correct”.
Our solution to this problem is to break down “correctness” into discrete validators, where each validator focuses on one aspect of the model’s performance. For generative text, for example, different validators inspect structured output generation, hallucinations, and toxic content. Generative audio validators, on the other hand, may look at fidelity to the voice being replicated, the continuity of the audio, and the speaking speed.
Compared to traditional unit testing frameworks, guardrails are domain-specific and also inherently stochastic.
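To show how the same validator pattern carries over to another modality, here is a hedged sketch of the speaking-speed check mentioned for generative audio, assuming a transcript and duration are already available. The function name and thresholds are illustrative, not values recommended by Guardrails AI.

```python
# Sketch of a modality-specific validator: checking speaking speed for
# generated audio, given its transcript and duration. Thresholds are
# illustrative placeholders.
def speaking_speed_ok(transcript: str, duration_seconds: float,
                      min_wpm: float = 100.0, max_wpm: float = 180.0) -> bool:
    """Return True if the words-per-minute rate falls in an acceptable band."""
    if duration_seconds <= 0:
        return False
    words = len(transcript.split())
    wpm = words / (duration_seconds / 60.0)
    return min_wpm <= wpm <= max_wpm
```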
What is your favorite area of research outside of generative AI?
Efficient ML! I worked on this in the past, and thinking about how to take very large models and shrink them down to low-latency, low-memory versions of themselves without sacrificing performance is a super exciting problem.
Is the guardrails space substantial enough to create standalone platforms, or will these projects evolve as features of larger AI platforms?
I strongly believe that there is substantial room for specialized guardrails for LLMs. To unlock their full potential, we need to solve key challenges that allow us to use LLMs as reliably as deterministic software APIs today. It’s very hard to solve these challenges purely on the model level, which creates an opportunity for platforms working exclusively on Guardrails.
For instance, consider a financial institution utilizing LLMs for automated customer support. Here, the guardrails must not only ensure data security, lack of hallucinations, etc. but also adhere to specific industry and company regulations.
How do you differentiate Guardrails AI from the work NVIDIA is doing with NeMo Guardrails?
Great question! NeMo Guardrails is primarily a framework for building chatbots that lets you create controlled dialog flows. Guardrails AI, on the other hand, is a framework for performing LLM validation; you can use the Guards created with Guardrails AI as checks in chatbots you build using NeMo Guardrails. Additionally, Guardrails AI applies to all types of LLM applications beyond chatbots and dialog systems.
Your work seems to strongly support open-source initiatives. How do you perceive the balance between open source and closed foundation models? Who ultimately prevails in the end?
I don’t think it's strictly either-or; I strongly believe that there's room for both open-source and commercial models to co-exist. Currently, commercial models lead the way, especially in setting high-performance baselines on new benchmarks. However, the top open-source models are not far behind in terms of raw performance. Companies that choose open-source models will do so for reasons related to cost, privacy, or explainability. At this stage, however, the industry is focused on demonstrating the significant value that LLMs can bring; only then will it make sense for companies to invest in tuning and hosting their own custom models.