Iskandar Sitdikov/Provectus: Healthcare has it all: NLP, computer vision, recommendations, and a whole lot more
TheSequence interviews ML practitioners to immerse you in the real world of machine learning and artificial intelligence
There is nothing more inspiring than to learn from practitioners. Getting to know the experience gained by researchers, engineers and entrepreneurs doing real ML work can become a great source of insights and inspiration. Please share this interview if you find it enriching. No subscription is needed.
Quick bio / Iskandar Sitdikov
Tell us a bit about yourself. What is your background and current role, and how did you get started in machine learning?
Iskandar Sitdikov (IS): I am a Principal AI/ML Solutions Architect at Provectus. I lead ML/MLOps and research efforts for the company and a number of clients ranging from healthcare to adtech.
My first degree was in management, but I quickly realized that I wanted to take an applied sciences path, so I got my bachelor's in mathematics and my master's in computer science.
I actually remember the moment I got interested in ML. It was about 2010, and I was talking to my professor and a classmate when the topic of artificial neural networks popped up in our conversation. Dozens of projects, conferences, and meetups later, I am talking to you right now.
ML Work
Provectus works with large organizations, helping them incorporate ML into their digital infrastructure. What are the main challenges that organizations encounter when starting to build ML solutions in real-world environments?
IS: That is a complex question, and the answer is multifaceted. Simply building a model is not enough. Since models are very important components of larger systems, they should meet the same requirements as the rest of the system, and even stricter ones given the specifics of the machine learning domain.
We need to understand the models' behavior, be able to experiment, and keep a record of all actions executed by the models. This is what we call the ML environment. There are two main challenges when building such an environment:
Selecting the right tools for building the environment;
Fostering an engineering culture for using those tools in ML/data science teams. This can mean a steep learning curve, primarily because the concepts behind the environment components are still maturing, and not all ML specialists have enough experience with them.
Also, environments are not built in a vacuum; they rely on existing infrastructures. Some of them are still in a nascent stage. What matters here is how the data gets into the system, and how it is processed, stored, cataloged, and discovered.
There is a wide and growing gap between machine learning research and practical applications in the current market, and many of the techniques published in research papers are simply not practical in real-world applications. What should data science teams consider when trying to adopt cutting-edge ML research methods in practical applications?
IS: The beauty and the shortcoming of research is that it does not have to be linked to real-world applications. The main thing is to continue generating ideas and discussing them in research articles. This is how we build the legacy of our industry.
Not all ideas will get recognition, but certain ideas will generate others, fueling an ongoing evolution in technology. Reading papers and experimenting can give you a new perspective on an existing problem and help you solve it. Neural networks are a good example. At the time this technique was created, it wasn't directly applicable to real-world situations.
All industries have their unique laws. We have deadlines, KPIs, ROIs, and other standard metrics used in business, but that doesn't mean we should follow a single well-trodden path. We need to be a little more cunning, experiment more, collect metrics, and track everything. We need to add new developments to robust, proven methods and test them against existing options. It's important to share the findings and experimental methods as open-source projects.
Our advice is to experiment with new methods, but conduct the experiments in a controlled manner, using a carefully created environment.
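A minimal sketch of what a controlled experiment could look like: a new method is tested against a proven baseline on the same data split with the same seed, and every run is recorded. The toy data, the method names, and the `RunRecord` fields are hypothetical illustrations, not tooling described in the interview.

```python
import json
import random
from dataclasses import dataclass, asdict


@dataclass
class RunRecord:
    """One tracked run: what was tried, under which conditions, with what result."""
    method: str
    seed: int
    n_train: int
    n_test: int
    mae: float


def make_data(seed, n=200):
    """Hypothetical toy data: y = 3x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 1)) for x in xs]


def fit_mean(train):
    """Proven baseline: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Candidate method: ordinary least-squares line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def run(method, fit, seed, log):
    """Train and evaluate one method under fixed conditions, then log the run."""
    data = make_data(seed)
    train, test = data[:150], data[150:]
    model = fit(train)
    mae = sum(abs(model(x) - y) for x, y in test) / len(test)
    record = RunRecord(method, seed, len(train), len(test), mae)
    log.append(asdict(record))
    return record


log = []
baseline = run("mean_baseline", fit_mean, seed=42, log=log)
candidate = run("least_squares", fit_line, seed=42, log=log)
print(json.dumps(log, indent=2))
```

Because both runs share the seed and the split, any difference in the logged error comes from the method itself rather than from the data, which is the point of running the experiment in a controlled environment.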
Provectus seems to do a lot of work in the healthcare space, which I consider one of the most fascinating environments for ML solutions. What makes healthcare so unique and challenging when it comes to ML solutions?
IS: I love the healthcare industry because it has a direct impact on people's lives. But its unique role is also its biggest challenge. We all know the Peter Parker principle, "with great power comes great responsibility," and it applies perfectly to healthcare. Mistakes in healthcare are often costly, so the requirements for high-quality and robust solutions are much higher.
There is also a perception of ML models as black boxes, and this in part prevents machine learning from making its way into highly regulated areas like healthcare. These barriers are being broken down in part by innovations that allow us to "explain" what is going on inside models, and why they behave as they do.
Computer vision seems to dominate the headlines when it comes to ML solutions in healthcare. What other areas of ML are gaining rapid adoption in healthcare environments?
IS: It's true, computer vision does occupy a large place in healthcare. A lot of procedures in medicine rely on the visual inspection of images. These images can be retinal scans or CT scans of the lungs, where the main task is to classify diseases or highlight malignant areas. As it happens, we have learned how to execute these tasks with machine learning. Read this case study to learn how: ML Helps Combat Preventable Vision Loss in Infants.
Document processing is another growing concern in healthcare. In fact, it is one of the largest tasks the industry has encountered to date. Document processing encompasses multiple disciplines, including ML, automation, human interaction interfaces, and others.
For example, medical report processing involves digitizing the document, processing it directly with machine learning tasks such as recognition of text, tables, forms, and images, and a human-in-the-loop step for continuous verification, storage, and integration.
Machine learning tasks for a single medical report would include OCR (extracting text), NLP (semantic understanding for data extraction from raw text data, ontology mapping, etc.), and object detection/segmentation (for medical images in the report). So, to answer your question, healthcare has it all: NLP, computer vision, recommendations, and a whole lot more. Watch this webcast to learn what it takes to Choose the Right Document Processing Solution for Healthcare.
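The flow described above can be sketched as a chain of stages. This is only an illustration under loud assumptions: the `Report` type, the stage functions, the expected field names, and the 0.8 review threshold are all hypothetical, the OCR stage is a stand-in that treats pages as already-recognized text, and the object detection/segmentation stage for embedded images is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical container that each stage enriches in turn."""
    raw_pages: list[str]              # stand-in for scanned page images
    text: str = ""
    fields: dict[str, str] = field(default_factory=dict)
    confidence: float = 1.0
    needs_review: bool = False


def ocr_stage(report):
    # Placeholder: a real system would call an OCR engine on page images here.
    report.text = "\n".join(report.raw_pages)
    return report


def nlp_stage(report):
    # Toy extraction: treat "Key: value" lines as structured fields.
    for line in report.text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            report.fields[key.strip().lower()] = value.strip()
    # Assumed confidence measure: share of expected fields that were found.
    expected = {"patient", "diagnosis"}
    found = expected.intersection(report.fields)
    report.confidence = len(found) / len(expected)
    return report


def review_stage(report, threshold=0.8):
    # Human-in-the-loop: low-confidence reports are flagged for manual review.
    report.needs_review = report.confidence < threshold
    return report


PIPELINE = [ocr_stage, nlp_stage, review_stage]


def process(report):
    """Run a report through every stage in order."""
    for stage in PIPELINE:
        report = stage(report)
    return report


complete = process(Report(raw_pages=["Patient: Jane Doe", "Diagnosis: example"]))
incomplete = process(Report(raw_pages=["Patient: Jane Doe"]))
print(complete.needs_review, incomplete.needs_review)
```

Keeping each stage as a separate function makes it easy to swap a placeholder for a real OCR or NLP component without touching the rest of the chain.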
There are many aspects of ML solutions, such as model compression, model serving, or real-time model execution, that are often ignored until it's too late. In your experience, what are the top 3-5 practical components of modern ML solutions that most data science teams tend to overlook?
IS: Lack of market research is a big one. Whether you are building your own solution or want to buy an existing one, you need quality market research. If you are building a custom solution, you have to consider its advantages and disadvantages compared to similar solutions already on the market. Forrester and Gartner do a great job of outlining markets, but you need to dig deeper and gather specific evaluation metrics for your use case. For example, how do Google's computer vision products compare to yours in terms of the accuracy of information extraction from images?
MLOps elements are key in ML solutions. Aspects such as model monitoring, experiment tracking, and deployment are frequently overlooked by many companies. To make your solution transparent, reproducible, and understandable for everyone, and to enable parallel exploration of multiple use cases, you must check those boxes. Watch this webcast to learn more about MLOps and Deploying Reliable ML Models in Production.
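As one illustration of model monitoring, the sketch below computes a Population Stability Index (PSI) between a feature's training-time values and its live values. PSI is a technique swapped in here as an example and is not named in the interview; the synthetic data is hypothetical, and the often-quoted rule of thumb that values above roughly 0.2 indicate a significant shift should be treated as an assumption to tune for your use case.

```python
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the expected range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)   # clamp out-of-range values to edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]
live_shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]

print("stable :", round(psi(train_feature, live_stable), 4))
print("shifted:", round(psi(train_feature, live_shifted), 4))
```

A check like this could run on a schedule against production inputs, with the score logged alongside the experiment records so that drift and model changes can be compared in one place.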
Documentation is critical. This is one of the most important aspects of any ML project. ML straddles the border between software engineering and research, so you need to make an extra effort to preserve information about how to run your model while also explaining the methodology behind it. And as always, follow engineering best practices; it will never hurt anyone.
Miscellaneous: a set of rapid-fire questions
Is the Turing Test still relevant? Is there a better alternative?
IS: I think it is still relevant, but with some modifications. We can call a modified version an alternative, but I prefer to respect the author by calling the ultimate AI test a Turing test :)
Favorite math paradox?
IS: Since we are talking about Turing, let's pick the Turing paradox, also known as the quantum Zeno effect. It is an effect in which you can "arrest" the time evolution of particles by measuring them frequently. Technically, it is physics, but to me, math and physics are a dynamic duo.
Any book you would recommend to aspiring data scientists?
IS: "Linear Algebra and Analytic Geometry" by Ilyin and Kim, which is one of the first books you study as a mathematics faculty student at Moscow State University.
Third time for Turing, I guess. Try "Computing Machinery and Intelligence" by Alan Turing if you feel philosophical today.
Overall, I think you should read as much literature on the sciences as possible. Some of it will fit your perceptions, and some will challenge them.
Does P equal NP?
IS: A million-dollar question! For now, only Grigori Perelman is qualified to answer those types of questions ;)