🔮 Edge#249: Model-Intrinsic vs. Post-Hoc Interpretability Methods
Model-intrinsic vs. post-hoc interpretability, activation atlas visualizations, and TensorBoard.
In this issue:
We discuss the differences between model-intrinsic and post-hoc interpretability methods.
We present the research behind activation atlases, a brilliant ML interpretability method developed by Google and OpenAI.
We discuss TensorBoard, an essential tool when it comes to ML interpretability.
Have fun ML geeking!
💡 ML Concept of the Day: Model-Intrinsic vs. Post-Hoc Interpretability Methods
In a previous edition of this series, we explored a taxonomy for understanding different ML interpretability methods. Some models, such as linear regression or decision trees, are intrinsically explainable. These models are typically analyzed with interpretability techniques tailored to their specifics, which is what we call model-specific interpretability. In general, model-intrinsic interpretability methods look to leverage unique characteristics of explainable models (illustrated in the short sketch after this list):
Simulatability: The ability to recreate the operations of a model in a reasonable time.
Decomposability: The individual parts of a model can each be explained on their own.
Algorithmic Transparency: Theoretical guarantees about the specific behavior of the algorithm for a given input.
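To make these properties concrete, here is a minimal sketch of model-intrinsic interpretability using scikit-learn (the Iris dataset and the depth limit are illustrative choices, not part of the original discussion): a shallow decision tree exposes its entire decision logic as a few readable rules, so it is simulatable and decomposable by construction.

```python
# Minimal sketch: an intrinsically interpretable model whose learned logic
# can be read directly (illustrative dataset and hyperparameters).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# A shallow tree stays small enough to simulate by hand (simulatability).
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each split is an explainable unit on its own (decomposability):
# the whole model prints as a handful of if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```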
From an ML interpretability standpoint, model-intrinsic methods are very simple but also quite restrictive. It is far more common to interact with ML models whose algorithmic decisions are not easily explainable. Most neural network techniques lack the simulatability and decomposability properties. Moreover, the non-convex optimization problems common in neural networks offer no convergence guarantees under gradient-based optimization methods, so algorithmic transparency is lost as well.
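As a preview of the post-hoc techniques introduced next, the sketch below probes a trained neural network purely from the outside using permutation feature importance, one common model-agnostic method (the dataset, the small MLP, and the accuracy-drop metric are illustrative assumptions rather than anything prescribed in this issue):

```python
# Minimal sketch of a post-hoc, model-agnostic probe: permutation importance
# treats the trained network as a black box (illustrative dataset and model).
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

# The model itself is neither simulatable nor decomposable.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
).fit(X_train, y_train)

# Post-hoc: shuffle one feature at a time and measure the drop in held-out
# accuracy, without ever looking inside the network.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")
```

Because the probe only needs predictions, the same call works for any fitted model, which is exactly what makes it post-hoc.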
Enabling interpretability in models such as neural networks requires post-hoc interpretability techniques, which try to explain the overall behavior of a model without dissecting its internals. Post-hoc interpretability methods can be classified into two main groups: