We will fine-tune VLMs to chat with images using Python! Specifically, we'll fine-tune the Qwen2-VL-7B-Instruct model using LoRA and 4-bit quantization. GitHub below ↓
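The two techniques named above (4-bit quantization and LoRA) boil down to two config objects passed at model-load time. A minimal sketch using Hugging Face `transformers` and `peft` — the hyperparameter values here are illustrative placeholders, not necessarily the ones used in the video:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Load the base model weights in 4-bit NF4 to cut memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Train only small low-rank adapter matrices on the attention projections
lora_config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative target layers
    task_type="CAUSAL_LM",
)
```

`bnb_config` would be passed as `quantization_config=` when calling `Qwen2VLForConditionalGeneration.from_pretrained(...)`, and `lora_config` to `peft.get_peft_model(...)`.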
Want to support the channel? Hit that like button and subscribe!
GitHub link to the code
https://github.com/uygarkurt/Fine-Tune-VLMs
Qwen2-VL-7B Model
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
Dataset
https://huggingface.co/datasets/HuggingFaceM4/ChartQA
What should I implement next? Let me know in the comments!
00:00 Introduction
00:50 Install Necessary Libraries
01:49 Imports
03:40 Hyperparameter Definitions
08:12 Dataset Preparation
22:38 Load VL Model and Processor
25:06 Sample Inference
32:18 Configure LoRA
34:05 Training Arguments Configuration
35:42 Data Collator
39:03 Configure Trainer
39:55 Start the VLM Training
40:42 After Training Inference and Evaluation
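One detail worth noting from the Data Collator chapter: when building the `labels` tensor for training, padding tokens and image placeholder tokens are typically replaced with `-100` so PyTorch's cross-entropy loss ignores them and the model only learns from the real text tokens. A minimal sketch of that masking logic (the token IDs below are made up for illustration; `151655` is assumed here to be Qwen2-VL's image pad token):

```python
IGNORE_INDEX = -100  # label value that CrossEntropyLoss skips

def mask_labels(input_ids, special_token_ids):
    """Copy input_ids into labels, hiding pad/image tokens from the loss."""
    return [IGNORE_INDEX if tok in special_token_ids else tok for tok in input_ids]

# Hypothetical sequence: two image placeholders, two text tokens, two pads (id 0)
labels = mask_labels([151655, 151655, 9906, 1917, 0, 0], {0, 151655})
# → [-100, -100, 9906, 1917, -100, -100]
```

The real collator in the video operates on batched tensors from the processor, but the masking idea is the same.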
References
https://huggingface.co/learn/cookbook/en/fine_tuning_vlm_trl
https://huggingface.co/docs/trl/en/sft_trainer
https://huggingface.co/docs/transformers/main/en/tasks/visual_question_answering
Buy me a coffee! ☕️
https://ko-fi.com/uygarkurt