CMU Advanced NLP Spring 2025 (21): Multimodal Modeling I

This lecture (by Sean Welleck) for CMU CS 11-711, Advanced NLP, covers:
- Vision architecture basics (ViT)
- Learning image representations (CLIP)
- Combining with a language model