vLLM Office Hours - June 20, 2024
Happy one-year anniversary, vLLM! In this session, we covered what's new in vLLM v0.5.0, including FP8 weights and activations, speculative decoding, and OpenAI Vision API support. We also dug into new quantization kernels, GPU architecture compatibility, embeddings in the OpenAI API, optimization tips for GPTQ configurations, and handling concurrent requests in the API server. The session slides are available here: https://docs.google.com/presentation/d/1BAGbJ-aGYrAMUugReF758u5JUT9EAJLn
Sign up for bi-weekly vLLM office hours: https://hubs.li/Q02Y5Pbh0