TheSequence

Edge 453: Distillation Across Different Modalities

Cross-modal distillation is one of the most interesting distillation methods of the new generation.

Dec 03, 2024

In this issue:

  1. Introducing cross-modal distillation.

  2. UC Berkeley’s paper about cross modal distillation for supervision transfer.

  3. Hugging Face’s Gradio for building web-AI apps.

💡 ML Concept of the Day: Cross-Modal Distillation

Most of the distillation techniques explored throughout this series have focused on a single modality. However, an interesting question is whether a teacher can transfer knowledge from one modality to a student primarily built for another.

Cross-modal distillation enables the transfer of knowledge between models operating on different data modalities. This approach extends the concept of knowledge distillation, originally proposed for model compression, to scenarios where the teacher and student models process fundamentally different types of input data. The core idea is to leverage the rich representations learned by a model in one modality (the teacher) to guide the learning process of a model in another modality (the student), even when paired data is scarce or unavailable.
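
To make the idea concrete, here is a minimal sketch of a cross-modal distillation loop in PyTorch: a frozen, pretrained RGB teacher supervises a depth-map student by forcing the student's features to match the teacher's on paired (RGB, depth) inputs. The backbone choice (ResNet-18), the feature-matching MSE loss, and the `rgb_batch`/`depth_batch` tensors are illustrative assumptions, not details taken from a specific paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# Teacher: a pretrained RGB model whose penultimate features supervise the student.
teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
teacher.fc = nn.Identity()              # expose 512-d features instead of class logits
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)             # the teacher stays frozen

# Student: same backbone, but its first conv accepts 1-channel depth maps.
student = models.resnet18(weights=None)
student.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
student.fc = nn.Identity()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def distillation_step(rgb_batch, depth_batch):
    """One cross-modal distillation step on a paired (RGB, depth) batch."""
    with torch.no_grad():
        teacher_feats = teacher(rgb_batch)      # (B, 512) features from the RGB teacher
    student_feats = student(depth_batch)        # (B, 512) features from the depth student

    # Feature-matching loss: the depth student learns to mimic the RGB teacher.
    loss = F.mse_loss(student_feats, teacher_feats)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical paired batch: RGB images and their aligned depth maps (no labels needed).
rgb_batch = torch.randn(8, 3, 224, 224)
depth_batch = torch.randn(8, 1, 224, 224)
print(f"distillation loss: {distillation_step(rgb_batch, depth_batch):.4f}")
```

The main design choices are where to align the two networks (final embedding versus intermediate layers), which loss to use (feature MSE, KL divergence on soft logits, or a contrastive objective), and how much paired data is available; the supervision-transfer work referenced in this issue applies a similar recipe at intermediate layers.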
