Train and Deploy a Multimodal AI Model: PyTorch, AWS, SageMaker, Next.js 15, React, Tailwind (2025)

Source code AI Model: https://github.com/Andreaswt/ai-video-sentiment-model
API SaaS: https://github.com/Andreaswt/ai-video-sentiment-saas
Discord & More: https://andreastrolle.com

Hi 🤙 In this video, you'll learn how to train and deploy a multimodal AI model from scratch using PyTorch. The model accepts a video as input and predicts its sentiment and emotion. While training the model, you'll build features like text, video, and audio encoding, multimodal fusion, and emotion and sentiment classification. After training and deploying the model, you'll build a SaaS around it, where users can run inference on their videos through your API. You'll set up invocation of the deployed model with SageMaker Endpoints and manage users' monthly quotas. The SaaS is built with technologies such as Next.js, React, Tailwind, and Auth.js, and is based on the T3 Stack. You'll be able to build along with me from start to finish.

Excalidraw drawing + model files (with and without the class-imbalance fix): https://drive.google.com/drive/folders/1f5tOlIixDUeYtzzIdctQRb_-qllzAMQd?usp=sharing
Dataset MELD: https://affective-meld.github.io/

Features
🎥 Video sentiment analysis
📺 Video frame extraction
🎙️ Audio feature extraction
📝 Text embedding with BERT
🔗 Multimodal fusion
📊 Emotion and sentiment classification
🚀 Model training and evaluation
📈 TensorBoard logging
🚀 AWS S3 for video storage
🤖 AWS SageMaker endpoint integration
🔐 User authentication with Auth.js
🔑 API key management
📊 Usage quota tracking
📈 Real-time analysis results
🎨 Modern UI with Tailwind CSS

💲 Costs + how to follow along for free
One full training-job run costs ~15 USD. Keeping the deployed endpoint up costs ~1.5 USD per hour of uptime. S3 is very cheap, and IAM roles, users, etc. are free.
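The multimodal fusion mentioned above can be sketched as a small late-fusion PyTorch model: each modality is encoded separately, the encodings are concatenated and fused, and two heads classify emotion and sentiment. The layer sizes and class counts here are illustrative assumptions, not the exact architecture built in the video:

```python
import torch
import torch.nn as nn

class MultimodalSentimentModel(nn.Module):
    """Late-fusion sketch: encode each modality, concatenate, classify."""

    def __init__(self, text_dim=768, video_dim=512, audio_dim=128,
                 hidden=256, n_emotions=7, n_sentiments=3):
        super().__init__()
        # One small encoder per modality (illustrative dimensions)
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Fuse the concatenated modality encodings
        self.fusion = nn.Sequential(nn.Linear(hidden * 3, hidden), nn.ReLU())
        # Separate heads for the two prediction tasks
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.sentiment_head = nn.Linear(hidden, n_sentiments)

    def forward(self, text, video, audio):
        fused = self.fusion(torch.cat([
            self.text_enc(text),
            self.video_enc(video),
            self.audio_enc(audio),
        ], dim=-1))
        return self.emotion_head(fused), self.sentiment_head(fused)

model = MultimodalSentimentModel()
emo, sent = model(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 128))
print(emo.shape, sent.shape)  # torch.Size([2, 7]) torch.Size([2, 3])
```

In the actual project the text features would come from BERT, the video features from extracted frames, and the audio features from the extracted waveform; this sketch only shows how the fusion and the two classification heads fit together.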
To follow along without spending money:
- Don't create the S3 bucket; then you also won't need the EC2 instance for downloading the dataset.
- Don't start a training job; instead, download my provided model from Google Drive to play around with locally.
- Don't deploy the endpoint. When building the SaaS part, use the dummy data I write in the video before calling the actual endpoint.
- You can look into AWS free-tier instances, so you can still play around with AWS.
- You can of course still follow the video, learn the concepts, and code along.

📖 Chapters
00:00:00 Demo
00:02:14 Project initialization
00:06:38 What we'll build
00:19:01 Training theory
00:42:42 Fitting
00:47:45 Representing data in ML
00:50:00 Our model
01:13:31 Extracting dataset
01:18:03 Dataset class architecture
01:25:36 Dataset class implementation
02:27:13 Model architecture
02:44:21 Model implementation
03:41:54 Logging with TensorBoard
04:06:25 Counting model parameters
04:12:24 Train script implementation
04:35:51 FFMPEG installation on instance
04:45:15 SageMaker training job creation script
04:50:04 AWS infrastructure
05:11:02 Downloading dataset to S3 with EC2
05:22:40 Creating training jobs
05:35:36 Class weights for class imbalances
05:55:20 Checking TensorBoard logs
06:01:18 Inference script
06:29:00 Local inference
06:33:00 Comparing with state-of-the-art models
06:35:55 Deploying endpoint
06:56:10 IAM user for endpoint invocation
07:00:28 Initializing Next.js project
07:06:19 Auth
07:58:00 Dashboard setup
08:03:17 Database schema
08:14:55 Docs part of dashboard
08:41:30 Endpoint for S3 signed url
08:54:14 Endpoint for inference
09:04:02 Invoke endpoint
09:08:39 API demo in dashboard
09:56:38 Deploying endpoint
09:57:44 End-to-end testing and debugging
10:05:02 Successful E2E example
10:05:32 Fixing up docs
10:08:23 Timeout issue
10:11:06 Closing notes
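Invoking the deployed model from the SaaS (the "Invoke endpoint" chapter) boils down to a `boto3` call against SageMaker Runtime. A minimal sketch, assuming the inference script accepts a JSON body with a `video_path` key pointing at an object in S3 — the endpoint name and payload shape are illustrative, so match them to your own inference script:

```python
import json

def build_payload(s3_key: str) -> str:
    """Build the JSON body sent to the endpoint.

    The {"video_path": ...} shape is an assumption; match it to the
    input format your inference script expects.
    """
    return json.dumps({"video_path": s3_key})

def analyze_video(endpoint_name: str, s3_key: str, region: str = "us-east-1") -> dict:
    """Invoke the deployed SageMaker endpoint (requires AWS credentials)."""
    import boto3  # imported here so build_payload works without boto3 installed
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # e.g. your deployed endpoint's name
        ContentType="application/json",
        Body=build_payload(s3_key),
    )
    # The response body is a streaming object; read and decode it
    return json.loads(response["Body"].read().decode("utf-8"))

# Payload construction alone (no AWS call is made here):
payload = build_payload("inference/demo.mp4")
print(payload)  # {"video_path": "inference/demo.mp4"}
```

In the SaaS, this call would sit behind your API route, after the signed-URL upload to S3 and a check against the user's monthly quota.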