🎙 Judah Phillips / Squark about No-Code Predictive Analytics
Challenging field of predictive analytics and the future of data science
Getting to know the experience gained by researchers, engineers, and entrepreneurs doing real ML work is an excellent source of insight and inspiration. Share this interview if you find it enriching. No subscription is needed.
👤 Intro / Judah Phillips
Tell us a bit about yourself. What is your background and current role, and how did you get started in machine learning?
Judah Phillips (JP): I’m an entrepreneur who started working in software in the late ’90s. When the dotcom bubble burst, I built my own company, saving enough money to go to grad school (the USA makes you pay). Afterward, I checked the box working in big brands. Then I took the dive and started up some other businesses. Somehow I found the time to write three books, join the faculty at a couple of schools in the USA (Babson and BU), and create a few courses for them. I also did a stint as a Harvard I-LAB VIP and joined other boards, such as the University of Massachusetts Humanities and Fine Arts Advisory Council. Having had my fill of consulting, I wanted to do a SaaS company. So we started Squark.
The genesis of Squark was the realization from about 2008-2016 that to do advanced analytics was too expensive, hard, and time-consuming. Not enough trained people existed to fill the gaps needed in business. The software was esoteric; the algorithms were weak; the hardware was costly; bosses wouldn’t cover training. It was a pain.
I was fortunate to see statistics work. When it worked, it worked well. But it was generally challenging, complex, and slow, with too much data work and not enough outcomes. So I saw a future market to automate it all. I used to say, no one is creating software like Squark. It’s a new category. When the analysts cover this new type of software, we will be in those reports. I was correct. When Forrester finally wrote about “AutoML” and later “automation-focused predictive analytics and machine learning (PAML),” we were in those reports. But like all ideas, other people thought similarly, so a few other vendors were doing similar things in those early reports. However, I knew then what we have proven: Squark has the right balance between power and ease of use.
🛠 ML Work
Squark focuses on the challenging field of predictive analytics and, to make things more interesting, you guys take a no-code approach to it. Tell us about the inspiration for the Squark platform, and its current capabilities.
JP: Long ago, when there was no such word as data science and big data wasn’t a thing people talked about, I worked on teams that did predictive analytics and used algorithms. I even recall KNN for concept extraction being used at my first startup in 1997.
So I watched the world about 5 years ago and thought of the software cycles I had seen. “Web analytics” in the ’90s meant counting log files with UNIX utilities and manually removing robotic traffic. Today anyone can get Google Analytics and a rather powerful capability for online measurement. In the ’00s, I was doing a lot of BI. Talking spindles. Talking copper. Talking about data modelers and huge complex implementations to build cubes. ROLAP. Today you can do some of that same BI work, in context, within a few minutes via Domo, Tableau, Qlik, Power BI. These analytics concepts went from being engineering and code-heavy to being abstracted into a user experience. Sure, sometimes you still need to start with a fact table too. But the idea was to create Excel for AI. Lotus 1-2-3 for ML. Make AI as simple as a spreadsheet.
In the late ’10s and into the ’20s, I became convinced that ML and AI were too hard for clients of one of my companies. I consulted with these global brands on ML strategy and saw the platforms and international consultancies selling expensive services and solutions. I wanted to automate the ML work and do it so that the user could ask predictive questions of any data for any client. And they could simply connect data and then get an instant answer. All the hard data science work was behind the click. I figured if the tool could do in 5 minutes what took two months, then it could be sold for the cost of one month’s work, and everyone would be happy with the results. I jest, but that was the idea.
Squark is an end-to-end automated AI system that focuses on answering supervised machine learning questions. Model classes include classification, regression, time series, clustering, and more. We offer it as a SaaS where users log in and squark their data. We also support other modalities: on-prem deployments from bare metal on up, or VPC deployments. You can use Squark’s API microservice or export the code. It’s a Linux app that runs on Kubernetes and Docker and is effectively a drop-in AI infrastructure, if you’d like. Or just live the dream, and log into the SaaS. Use our intelligent connectors to all sorts of systems (Azure, Snowflake, BigQuery, Amazon, and many more) and let us make the data better. Squark will auto-clean, auto-prep, auto-feature engineer, and auto-feature select. This includes NLP and date factorization and all sorts of proprietary goodness, including rich explainability using SHAP and more at the global, regional, and local levels. Schedule drifting models and replace them. There’s a lot more to the platform too. It’s why many leading companies are using it and even replacing unicorns in the space.
We help analytics and data science teams in mid-market and enterprise companies. Analytically-inclined executives buy us, and their teams use us in several industries: media, SaaS, gaming, logistics, healthcare, finance, insurance, and many more. We focus on a clear line of sight to value, which helps drive business adoption, and users love the fact they have a cutting-edge tool to use. If you want to learn more, reach out.
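Editor's sketch: one of the automated preparation steps mentioned above is date factorization. The interview does not describe how Squark implements it, so the snippet below is only a minimal illustration of the general idea, under the assumption that it means expanding a raw date into numeric features a model can learn from. The function name `factorize_date` and the chosen feature set are hypothetical and are not Squark's API.

```python
from datetime import date


def factorize_date(d: date) -> dict:
    """Expand one date into simple numeric features (illustrative only)."""
    _iso_year, iso_week, _iso_weekday = d.isocalendar()
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),            # Monday = 0 ... Sunday = 6
        "day_of_year": d.timetuple().tm_yday,
        "iso_week": iso_week,
        "quarter": (d.month - 1) // 3 + 1,
        "is_weekend": int(d.weekday() >= 5),   # 1 for Saturday or Sunday
    }


# Hypothetical example row: a transaction dated Friday, 5 November 2021.
features = factorize_date(date(2021, 11, 5))
weekend_features = factorize_date(date(2021, 11, 6))
```

A tree-based model cannot do much with a raw timestamp, but it can split on `day_of_week` or `quarter`, which is why this kind of expansion is a common automated feature-engineering step.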
In recent years, there have been massive advancements in ML model generation areas, such as neural architecture search (NAS) and AutoML. How do you see the influence of these types of techniques in low-code or no-code ML platforms?
JP: I call Squark “auto AutoML” just for fun. We go so far beyond AutoML, it’s almost not fair to the other companies to call us that. We code research advancements as they come out and compare them in our academic labs. Research that works well forms a basis for commercialization when we see demand in our target market. In other cases, we take what already exists in open-source and build on top of it. Other times we simply start from scratch, like how we feature select or do NLP explainability or auto prep and engineering. We realized long ago that simply skinning open-source and using R wouldn’t really deliver the results we desired, so we try to code anything we can and judiciously use open-source. This includes writing our own services in our SaaS infrastructure. Our massively parallelized Squarkitecture runs on NVIDIA GPUs or CPUs with load balancing and auto-scaling. The infrastructure alone, excluding all the Squark ML core engine goodness, is a cloud investment that would be hard and expensive for most companies even to begin to replicate.
Fast-forwarding 3-5 years, how do you see the balance between traditional high-code ML development and low-code or no-code ML solutions? And what do you think the profession of data science will look like in 10 years?
JP: You do have to do both. It’s not an ‘us’ or ‘them’ thing. Some work has to be automated. Fifteen thousand stores need forecasts every day on changing data that drifts. It has to be automated to stay competitive. Yet, the hard-hitting, new problems will still require people with techniques guided by people with domain expertise. And everything still needs to be maintained and evolved by humans. The profession will only grow in importance and value, but I think it won’t always be called “data science.”
Call me old school. But it’s always been “analytics” to me. I remember speaking at a tradeshow conference in 2010. The theme of the show and topic was “advanced analytics.” In 2010, few people used the term “data science.” The vocabulary was still new, but the concepts weren’t. Meanwhile, I had a team of SAS programmers we called the “predictive analytics” team. So 11 years ago, at that show, I asked how many “data scientists” there were in the crowd. One person out of 300 media analysts raised their hand, and many of those analysts had stats, data, and programming chops. Today everyone is a data scientist in a crowd like that. So I think in the future, the names of things in the industry will keep changing. Clearly, new techniques will continue to develop and new algorithms will be released. There’s going to be a lot more ensembling going on, for sure, no matter what we call it. Lots more APIs. But the game is still the same. Scientific principle works. Math works. Process is important. People’s perceptions of analytical delivery matter. Creating value, however you define it, and taking action via the data will still be what people really care about. So in that context, what I think will happen is 80% of all data science work will be entirely automated. And as a result, there will be fewer data scientists. Those that still do it like today will be highly specialized and work with domain experts. We will be creating data experiences from the building blocks of today’s “data science,” likely procedurally using API microservices. Each node in the data experience may have an underlying AI model that self-refreshes based on evidence it is working (or not). Creators will call these best-performing nodes in a common way to perform the desired function on that node that feeds the input to another node, and so on until the data experience is created in, out, and across the metaverse.
Not just decision intelligence with multi-linked models, but something larger and next level, perhaps called experience intelligence created by all the automated decisions being made by ML to improve business and lives.
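Editor's sketch: the answer above mentions forecasts on data that drifts and models that self-refresh based on evidence that they are working (or not). The interview does not say how Squark detects drift. As one possible illustration, the snippet below uses the Population Stability Index (PSI), a common drift measure swapped in here as an assumption, to decide whether a model should be refreshed. The names `psi` and `needs_refresh`, the bin count, the 0.2 threshold (a widely used rule of thumb), and the synthetic data are all hypothetical.

```python
import math
import random


def psi(baseline, current, bins=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def needs_refresh(baseline, current, threshold=0.2):
    """Flag a model for refresh when its input distribution has shifted."""
    return psi(baseline, current) > threshold


# Synthetic daily sales for one store: training data, a stable period,
# and a period where the mean has shifted.
rng = random.Random(0)
training = [rng.gauss(100, 10) for _ in range(5000)]
stable = [rng.gauss(100, 10) for _ in range(5000)]
shifted = [rng.gauss(115, 10) for _ in range(5000)]

refresh_stable = needs_refresh(training, stable)
refresh_shifted = needs_refresh(training, shifted)
```

Run on a schedule across thousands of stores, a check like this lets only the models whose inputs have actually moved be retrained, which is the kind of work that is hard to do by hand at that scale.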
💥 Miscellaneous – a set of rapid-fire questions
Favorite math paradox?
My next investment round.
What book would you recommend to an aspiring ML engineer?
Exploratory Data Analysis, by John W. Tukey
Statistical and Machine-Learning Data Mining, by Bruce Ratner
The Glass Bead Game, by Hermann Hesse
Is the Turing Test still relevant? Any clever alternatives?
Yes. Feigenbaum and Winograd have some cool stuff here.
Does P equal NP?
I don’t think any human knows the answer, but some child alive today may figure it out with a novel approach that doesn’t yet exist. And the person who solves it will likely be using ML, as I understand that is how the Conway knot was solved. Augmented humans with machines.