# Ultimate Vision Language Model Showdown: PDF to HTML Conversion Challenge
Sonnet: https://app.promptjudy.com/public-runs?runId=complex-ocr-prompt--1503547373-aws-bedrock%2Fus.anthropic.claude-3-5-sonnet-20241022-v2%3A0%232QpA4Wc9x6nALY9_4YHT3
Qwen: https://app.promptjudy.com/public-runs?runId=complex-ocr-prompt--1503547373-qwen%2Fqwen2.5-vl-32b-instruct%3Afree%23BjtBH0OmwcX_VbqMN0dQT
Mistral: https://app.promptjudy.com/public-runs?runId=complex-ocr-prompt--1503547373-mistral-small-latest%235uTSt7T4w4pPQdV1N5zyL
GPT-4o mini: https://app.promptjudy.com/public-runs?runId=complex-ocr-prompt--1503547373-gpt-4o-mini%23oNtE_OLi0mJ67dQnbYMR2
Test setup:
https://youtu.be/ECJ3ivdKLq8?t=140
Results:
https://youtu.be/ECJ3ivdKLq8?t=538
In this benchmark, we test how well today's leading Vision Language Models convert complex PDF documents into semantic HTML, a critical task for financial analysis and RAG applications.
## Models Tested:
- **Commercial Models**: OpenAI's GPT-4o, GPT-4o mini, and o1; Anthropic's Claude 3.5 Sonnet; Google's Gemini 2.5 Pro
- **Open-Source Challengers**: Mistral's latest mistral-small; Qwen's Qwen2.5-VL-72B and Qwen2.5-VL-32B
## The Challenge:
Convert information-dense financial documents (including annual reports from Apple, Google, NVIDIA, and Toyota) into semantic HTML that preserves all information faithfully and remains usable by text-only models for inference, without relying on absolute positioning.
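
To make the target concrete, here is a minimal Python sketch of what "semantic HTML usable by text-only models" could look like in practice. The sample table, its numbers, and the helper names (`table_rows`, `uses_absolute_positioning`) are all hypothetical illustrations, not taken from the benchmark prompt or its outputs. The sketch assumes that a good conversion keeps tabular data in plain `<table>` markup, so a text-only consumer can recover the rows, and that it avoids `position: absolute` styling.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample output; the figures are made up for illustration.
SAMPLE = """
<table>
  <caption>Net sales by category (in millions)</caption>
  <thead><tr><th scope="col">Category</th><th scope="col">2024</th></tr></thead>
  <tbody>
    <tr><th scope="row">Products</th><td>100</td></tr>
    <tr><th scope="row">Services</th><td>50</td></tr>
  </tbody>
</table>
"""

# Matches inline or stylesheet declarations such as "position: absolute".
ABSOLUTE = re.compile(r"position\s*:\s*absolute", re.IGNORECASE)


class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return table contents as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def uses_absolute_positioning(html):
    """True if the markup relies on absolute positioning anywhere."""
    return bool(ABSOLUTE.search(html))
```

If the rows come back intact from a parser this simple, a text-only model reading the raw HTML has a fair chance of following the table as well.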
## Key Findings:
- Claude 3.5 Sonnet takes the top spot, but Qwen follows surprisingly close behind
- OpenAI's models underperform significantly on this specific task
- Open-source models from Qwen and Mistral show impressive capabilities, beating Gemini 2.5 Pro
- Zero tolerance for hallucination: even a single wrong number resulted in a score of 0
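
The zero-tolerance rule can be sketched as a small Python check. This is an assumption about how such a rule might be automated, not the scoring procedure used in the video: the helper names (`numbers_in`, `strict_score`) are hypothetical, and the sketch only compares the numbers that appear in the visible text of a reference HTML and a candidate HTML. Any missing, extra, or altered number yields 0; a full match is given 1.0 here purely as a placeholder for whatever the real rubric awards.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


class _Text(HTMLParser):
    """Collects visible text only, so attribute values like colspan are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def numbers_in(html):
    """Count each number in the visible text, with commas stripped."""
    parser = _Text()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return Counter(m.replace(",", "") for m in NUMBER.findall(text))


def strict_score(reference_html, candidate_html):
    """Return 0.0 if any number differs from the reference, else 1.0."""
    same = numbers_in(reference_html) == numbers_in(candidate_html)
    return 1.0 if same else 0.0
```

For example, with made-up values, a candidate that renders `1,234.5` as `1234.5` still matches, while one that transposes digits to `1,243.5` scores 0. Signs, units, and the position of each number in the table are not checked by this sketch.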
Watch as we analyze these cutting-edge AI systems handling one of the most challenging VLM tasks: preserving complex tables, hierarchical rows, and financial data with 100% accuracy. The results might surprise you!
#AIBenchmark #VisionLanguageModels #PDFtoHTML #AIComparison #OpenSourceAI #Claude #GPT4o #Qwen #Mistral #Gemini