Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw
Multimodal (Large) Language Models expand an LLM's text-only capabilities to include other modalities. Here are three ways to do this; a quick code sketch of the Ollama example is included at the end of this description.
Resources:
📰 Blog: https://medium.com/towards-data-science/multimodal-models-llms-that-can-see-and-hear-5c6737c981d3?sk=d0897db8457c91706170d3043ebdbcf0
▶️ LLM Playlist: https://youtu.be/eC6Hd1hFvos
💻 GitHub Repo: https://github.com/ShawhinT/YouTube-Blog/tree/main/multimodal-ai
References:
[1] Multimodal Machine Learning: https://arxiv.org/abs/1705.09406
[2] A Survey on Multimodal Large Language Models: https://arxiv.org/abs/2306.13549
[3] Visual Instruction Tuning: https://arxiv.org/abs/2304.08485
[4] GPT-4o System Card: https://arxiv.org/abs/2410.21276
[5] Janus: https://arxiv.org/abs/2410.13848
[6] Learning Transferable Visual Models From Natural Language Supervision: https://arxiv.org/abs/2103.00020
[7] Flamingo: https://arxiv.org/abs/2204.14198
[8] Mini-Omni2: https://arxiv.org/abs/2410.11190
[9] Emu3: https://arxiv.org/abs/2409.18869
[10] Chameleon: https://arxiv.org/abs/2405.09818
--
Homepage: https://www.shawhintalebi.com
Introduction - 0:00
Multimodal LLMs - 1:49
Path 1: LLM + Tools - 4:24
Path 2: LLM + Adapters - 7:20
Path 3: Unified Models - 11:19
Example: LLaMA 3.2 for Vision Tasks (Ollama) - 13:24
What's next? - 19:58
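
For reference, here is a minimal sketch of the kind of local vision call demonstrated in the Ollama example. This is a sketch only, not the exact code from the video or repo: it assumes the ollama Python package is installed, the Ollama server is running, the llama3.2-vision model has been pulled, and "example.jpg" is a placeholder path to a local image.

import ollama

# Send an image plus a text prompt to LLaMA 3.2 Vision running locally via Ollama.
response = ollama.chat(
    model="llama3.2-vision",
    messages=[
        {
            "role": "user",
            "content": "Describe what you see in this image.",
            "images": ["example.jpg"],  # placeholder: path to a local image file
        }
    ],
)

# Print the model's text reply (newer client versions also expose response.message.content).
print(response["message"]["content"])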