The Sequence Chat: Microsoft's Evan Chaki on Semantic Kernel and Combining LLMs with Conventional Programming Languages
A veteran innovator in areas such as low-code is now working on one of the most innovative projects in the LLM space.
👤 Quick bio
Please tell us a bit about yourself, including your background, current role, and how you got started in machine learning.
I moved from Florida to Seattle five years ago, after working in various technology roles across different sectors. At Microsoft, I have been involved in accelerating the Power Apps and Dynamics platforms, which enable low-code / no-code development of business applications.
For the past five months, I have been focused on Semantic Kernel, an open-source project that makes it approachable for app owners to add intelligence to new and existing apps and increase productivity for their users.
I have always been interested in machine learning and how it can help increase user productivity. A few years ago I had the chance to lead the AI Builder team for a brief period, and it was amazing to see the power of the models.
🛠 ML Work
Your recent work at Microsoft has been focused on the Semantic Kernel (SK) framework, which was recently open-sourced. Could you please elaborate on the vision and history of the project?
For several years, Microsoft has been at the forefront of advancements in supercomputing and machine learning, and we've seen the industry train large AI models that can accomplish a wide variety of tasks using natural language, from summarizing and generating text to generating photorealistic images to writing sophisticated computer code. We quickly realized the power of these models and what they will be able to unlock for people, developers, and organizations. As a team, we realized that we needed to share what we learned with a broad open-source community, which we did in March 2023.
It has been exciting to keep learning with the community and see how people around the world are building on top of Semantic Kernel and using it in their organizations to solve real-world business problems.
We will continue to listen to the community and build with them features that will empower the productivity gains we envisioned from the start.
Could you drill down into the different components of the SK architecture and their relevance to LLM-based applications?
Semantic Kernel is a platform that enables developers to create powerful and complex AI applications using large language models (LLMs) as building blocks. It allows developers to leverage the capabilities of LLMs without having to write complex code or train custom models. At its core, Semantic Kernel is an orchestration engine (the kernel) with hooks into multiple other systems.
There are a few core components in Semantic Kernel:
Kernel
Planner
AI Skills
Memories
But it all starts with an Ask from a user (a problem or task that needs to be completed). An Ask might be: “Summarize this text, create a Word document from the summary, email my boss a link to the Word doc, and remind me to follow up on the email in a week.”
Planner: When the Ask comes into the service, it is routed from the kernel to a component called the “Planner.” The Planner’s job is to find AI Skills that can help the user complete that Ask. For the example above, the Planner would find the summarize skill, the Word doc creation skill, the OneDrive creation skill, the Outlook email skill, and so on. When it discovers Skills that look like they can help solve the Ask, the Planner uses an LLM to determine a step-by-step execution plan that uses those AI Skills, and many others, to solve the Ask.
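To make the Planner’s output concrete, here is a small illustrative sketch in Python. The step and skill names (PlanStep, SummarizeSkill.Summarize, and so on) are hypothetical stand-ins, not the actual Semantic Kernel API; in the real framework the Planner asks an LLM to choose and order the registered skills.

```python
from dataclasses import dataclass

@dataclass
class PlanStep:
    skill: str        # which registered skill to invoke, e.g. "SummarizeSkill.Summarize"
    parameters: dict  # inputs, resolved by the Planner or by the output of earlier steps

ask = ("Summarize this text, create a Word document from the summary, "
       "email my boss a link to the Word doc, and remind me to follow up in a week.")

# Hand-written for illustration: the shape of a plan the Planner might produce
# for the Ask above, where each step's output feeds the next step.
plan = [
    PlanStep("SummarizeSkill.Summarize",   {"input": "<original text>"}),
    PlanStep("WordSkill.CreateDocument",   {"input": "<summary>"}),
    PlanStep("OneDriveSkill.GetShareLink", {"input": "<document>"}),
    PlanStep("OutlookSkill.SendEmail",     {"to": "<my boss>", "body": "<link>"}),
    PlanStep("ReminderSkill.Create",       {"when": "in one week"}),
]
```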
AI Skills: The Planner uses AI Skills; we think about them in two flavors:
Semantic
Native
Semantic skills are the plain-language “prompts” people are familiar with from using ChatGPT. There is a lot of power in semantic skills: the more detail you send to the LLM, the better the results you get, just like when you work with a new colleague. The summarize skill is one that can be done as a semantic skill; just ask the model: “I want you to summarize this text and return the results to me.”
Native skills allow a developer to use the programming languages they are already familiar with (C#, Python, Java, TypeScript, and others) to create a mini application that takes one or more input parameters, executes code, and returns an output. These types of skills are ideal for the remaining steps in the example above: calling Microsoft Graph, creating emails, and looking up who the user’s manager is.
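As a minimal sketch of the two flavors, here is what a semantic skill and a native skill can look like in Python. This assumes the early (0.x) `semantic_kernel` Python package from around the time of the open-source release; module and method names may differ in later versions, and the email logic is stubbed out for illustration.

```python
import semantic_kernel as sk
from semantic_kernel.skill_definition import sk_function

kernel = sk.Kernel()
# (an LLM completion service would normally be registered on the kernel here)

# Semantic skill: a plain-language prompt template that the LLM executes.
summarize = kernel.create_semantic_function(
    "Summarize the following text and return the results to me:\n{{$input}}"
)

# Native skill: ordinary code that the Planner can call like any other skill.
class EmailSkill:
    @sk_function(description="Email the given body to the user's manager")
    def email_manager(self, input: str) -> str:
        # A real implementation would call Microsoft Graph / Outlook here.
        return f"Email queued: {input[:60]}..."

kernel.import_skill(EmailSkill(), skill_name="email")
```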
Memories: The final step in the process is to execute the plan’s steps in order. The more context you give an LLM, the better the results you will get, and this is where memories come in. Memories allow you to add context to the LLM that is relevant to the AI skill. This could be context about who I am as a user, what my email writing style is like, what goal I am working on, or even live data pulled in through connectors. One Ask could result in many AI skills that need to be executed to meet the user’s goal. The AI skills are automatically chained together, and the output from one can be passed into the input of the next before the results are returned to the user, just like in the example.
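A rough sketch of that chaining and of memories as extra context, using hypothetical helper names rather than the Semantic Kernel API, shows how the output of one skill becomes the input of the next and where recalled context slots in.

```python
def recall(query: str) -> str:
    """Hypothetical memory lookup: user preferences, goals, or live connector data."""
    return "User prefers short, friendly emails; current goal: Q3 launch."

def summarize(text: str) -> str:
    return f"Summary of: {text[:40]}..."       # stand-in for the semantic skill

def draft_email(summary: str, context: str) -> str:
    return f"Draft ({context}): {summary}"     # stand-in for a native or semantic skill

text = "<long document>"
context = recall("email writing style")  # memories add relevant context to the LLM call
summary = summarize(text)                # output of one skill...
email = draft_email(summary, context)    # ...becomes the input of the next
```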
Knowledge augmentation is one of the main challenges for building foundation model-based applications. SK includes the notion of Connectors, which seems to target this area. What are some of the best practices for knowledge augmentation in LLM applications?
You hit the nail on the head here. Many people forget that the models are pretrained and don’t have context about what you are trying to achieve. There is an abundance of data we all have access to that is piling up, siloed in different applications and services. Using a combination of data connectors to reach that live data, databases to store basic user context, and vector databases to search, discover, and find related items is a critical move toward a 10x or 20x productivity world.
The other powerful thing connectors bring to users is the other side of the equation. Context is particularly important to help draft emails, write reports, and gain insights from data, and there are many scenarios where I need help to get all my work done. The other side is using data connectors to complete my tasks for me. If I need a meeting set up with three colleagues, I have to go into Outlook, pull up the calendar, enter their names, and search around to find the first timeslot that looks acceptable to all of us. Using Microsoft Graph, I can create an Ask that says, “Book a meeting in the next week about X with these people and include a link to this document in the meeting body,” and it can be done for me to review and approve.
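As a hedged sketch of that “complete my tasks for me” side, here is roughly what a native skill wrapping Microsoft Graph could look like. The `findMeetingTimes` and `events` endpoints are real Graph APIs, but the wrapper function, auth handling, and payloads are simplified assumptions, not production code or the Semantic Kernel connector itself.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def book_meeting(token: str, attendees: list[str], subject: str, doc_link: str) -> dict:
    """Find the first slot that works for everyone and create the event for review."""
    headers = {"Authorization": f"Bearer {token}"}
    people = [{"emailAddress": {"address": a}} for a in attendees]

    # 1. Ask Graph to suggest meeting times for the attendees.
    suggestions = requests.post(
        f"{GRAPH}/me/findMeetingTimes", headers=headers, json={"attendees": people}
    ).json()
    slot = suggestions["meetingTimeSuggestions"][0]["meetingTimeSlot"]

    # 2. Create the event with a link to the document in the body.
    return requests.post(
        f"{GRAPH}/me/events",
        headers=headers,
        json={
            "subject": subject,
            "start": slot["start"],
            "end": slot["end"],
            "attendees": people,
            "body": {"contentType": "HTML",
                     "content": f'<a href="{doc_link}">document</a>'},
        },
    ).json()
```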
Hybrid programming using natural language and traditional programming languages is certainly one of the novel ideas with the emergence of foundation models. Could you please provide some examples of use cases that can be unlocked by this form of hybrid, semantic programming?
Semantic Kernel and AI skills can be used to automate and simplify tasks that involve processing text documents, such as creating educational content or querying contract information. Here are two examples of how our customers have benefited from using Semantic Kernel and AI skills for these use cases:
One of our customers is a company that creates learning materials for children to help them augment their education. They used to have a manual and time-consuming process for creating curriculum, which involved multiple teams and steps. With Semantic Kernel and AI skills, they were able to streamline and accelerate this process by using natural language commands or queries. They could pull in the curriculum text from their data source, generate a set of relevant questions and answers for each topic using an AI skill, auto-tag each question based on the sub-topics using another AI skill, and push all the questions and tags back into their core systems. Now they are creating high-quality and engaging learning content in minutes instead of days, while reducing errors and inconsistencies.
Another common use case among our customers is contract Q&A. Many of our customers have multiple contracts that they need to manage and understand for various purposes, such as compliance, negotiation, or service delivery. They used to have to search through lengthy and complex documents manually or rely on experts or lawyers to answer their questions. Now they load their contracts into memory systems, ask questions about the contracts using an AI skill that understands natural language and legal terms, and get accurate and concise answers from the relevant sections of the contracts.
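To illustrate the contract Q&A pattern end to end, here is a hypothetical sketch: chunked contract text is ranked by similarity to the question, and the model answers only from the top passages. The `embed` and `ask_llm` callables are stand-ins for whatever embedding model, vector store, and LLM the application actually uses; in Semantic Kernel this role is played by its memory system and skills.

```python
from typing import Callable

def answer_contract_question(
    question: str,
    contract_chunks: list[str],
    embed: Callable[[str], list[float]],   # e.g. an embedding model
    ask_llm: Callable[[str], str],         # e.g. a semantic skill / chat completion
) -> str:
    q_vec = embed(question)

    def score(chunk: str) -> float:
        # Dot-product similarity; a real app would use a vector database instead.
        return sum(a * b for a, b in zip(q_vec, embed(chunk)))

    top = sorted(contract_chunks, key=score, reverse=True)[:3]
    prompt = (
        "Answer the question using only the contract excerpts below.\n\n"
        + "\n---\n".join(top)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return ask_llm(prompt)
```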
What are some of the other foundational capabilities that are relevant for building LLM-based applications in the real world?
Besides memories, data, and AI skills, there are two more elements required to scale up this technology: effective user interfaces and responsible AI guardrails.
Semantic Kernel can handle the query and deliver the results, but when those results are a long list of CRM records, the app builder needs to make them meaningful to the end user. UI designers play a vital role in this AI era. They need to ensure that the user is not overwhelmed by all the data that comes back, and they can also use AI to rank or filter the most relevant results for the user's needs.
Adding safeguards from the user's input to the output of the process is an essential part of a real-world deployment. Making sure that the query and results are not harmful, offensive, or biased and that they align with human and corporate values and goals is critical. This is what we mean by responsible AI. We recently announced the addition of two new open-source tools designed to make the adoption of responsible AI practices more practical.
💥 Miscellaneous – a set of rapid-fire questions
What is your favorite area of AI research outside generative AI?
I am extremely excited about applied AI in the biomedical field. With an aging world population and new variants of diseases emerging rapidly, I am excited about a world where medicine can be specialized per patient. For instance, I would love to see AI help find a cure for Parkinson's disease, which affects my father and millions of other people around the world. Diseases that impact a small population don't get the funding today to find cures; I am hopeful that with AI and additional computing power we can find cures at a much faster rate.
SK has some similarities with LangChain. How do you see the strengths and weaknesses of each project in the foundational programming space?
We’re fully supportive of other open-source projects in this area. Harrison and the community around LangChain have done some great work helping people ramp up on AI and learn which components are valuable in this new age. We introduced Semantic Kernel to help explore the space, given how much there is to learn about programming with LLMs. Semantic Kernel is focused on enterprise-class, enterprise-scale AI solutions that companies can trust and depend on. We have been onsite with customers almost every week since we launched publicly, learning about their needs and listening to feedback that will help them move their solutions into production to improve their customer experiences and employee productivity.
We are looking forward to learning from and collaborating with LangChain and others in the future!
Foundation models have the potential of transforming the entire development tool space. In your opinion, what other areas of the development stack do you think will be transformed by this new paradigm?
Testing/debugging and deployment/maintenance are two aspects of development that will be radically transformed by foundation models.
I envision a world where testing and debugging are done automatically, not only running tests but also fixing errors. When tests fail or bugs are detected, systems should be able to generate PRs with the corrected code for developers to review; the developers can check and edit the changes as needed. As this advances, I imagine a scenario where, in production, a new mobile phone OS introduces a change that affects some pattern, and the code adapts itself on the spot to work with the new change while maintaining compatibility with existing and previous versions across OSs and frameworks.
Could you please describe the future of programming using foundation models in the two, five, and ten years horizon?
With AI innovation coming so rapidly over the last six months, I think it will take a little time for people to truly understand what they can do with it. ChatGPT has made AI known to the world, and people are starting to think about chat-based use cases that can help them be a little more productive. I see this continuing for the next few years, and I believe semantic AI skills will be at the forefront during that time.
Past that timeframe is when mass-scale adoption will start to occur, and models will continue to advance. I could see semantic AI skills being created on demand, by or in the model, with most native skills no longer being needed. APIs could be called between models to pull and push data as needed.