Fine-Tune Visual Language Models (VLMs) - HuggingFace, PyTorch, LoRA, Quantization, TRL

We will fine-tune VLMs to chat with images using Python! Specifically, we'll fine-tune the Qwen2-VL-7B-Instruct model using LoRA and 4-bit quantization. GitHub link below ↓

Want to support the channel? Hit that like button and subscribe!

GitHub Link of the Code
https://github.com/uygarkurt/Fine-Tune-VLMs

Qwen2-VL-7B Model
https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct

Dataset
https://huggingface.co/datasets/HuggingFaceM4/ChartQA

What should I implement next? Let me know in the comments!

00:00 Introduction
00:50 Install Necessary Libraries
01:49 Imports
03:40 Hyperparameter Definitions
08:12 Dataset Preparation
22:38 Load VL Model and Processor
25:06 Sample Inference
32:18 Configure LoRA
34:05 Training Arguments Configuration
35:42 Data Collator
39:03 Configure Trainer
39:55 Start the VLM Training
40:42 After Training Inference and Evaluation

References
https://huggingface.co/learn/cookbook/en/fine_tuning_vlm_trl
https://huggingface.co/docs/trl/en/sft_trainer
https://huggingface.co/docs/transformers/main/en/tasks/visual_question_answering

Buy me a coffee! ☕️
https://ko-fi.com/uygarkurt
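For a quick idea of the "Load VL Model", "Configure LoRA", and quantization steps, here is a minimal config sketch using transformers, bitsandbytes, and peft. The exact hyperparameters (LoRA rank, alpha, dropout, and target_modules) are illustrative assumptions, not necessarily the values used in the video; see the GitHub repo above for the full code.

```python
# Sketch: load Qwen2-VL-7B-Instruct in 4-bit and attach LoRA adapters.
# Assumes a GPU environment with transformers, bitsandbytes, and peft installed.
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"

# 4-bit NF4 quantization keeps the 7B base model small enough for a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# LoRA: train small low-rank adapters instead of the full frozen model.
# r, lora_alpha, dropout, and target_modules below are assumptions for illustration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed choice)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows only a small fraction is trainable
```

The resulting `model` can then be passed to TRL's `SFTTrainer` together with a data collator that turns ChartQA image/question/answer triples into processor-tokenized batches, as outlined in the chapter list above.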