[Bug]: Gemma4-31B-it deployed on vLLM cannot process images in tool message

快速结论：在 vLLM 部署 Gemma4-31B-it 模型时，如果在 tool message 中传入图片，会导致 HTTP 500 错误。优先排查 vLLM 版本是否为 0.23.0 或更早版本，该问题已被 vLLM 开发团队确认并在后续版本中修复。

问题场景

用户在 vLLM 上部署 Gemma4-31B-it 模型，通过 OpenAI 兼容 API（/v1/chat/completions）发送包含图片的 tool message 请求时，服务端返回 HTTP 500 Internal Server Error。环境为 Ubuntu 24.04 + Python 3.12.3 + PyTorch 2.11.0 + CUDA 13.0，使用 8x NVIDIA A100-SXM4-80GB 显卡。

报错原文

APIServer pid=1) ERROR:    Exception in ASGI application
...
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/serve/utils/api_utils.py", line 91, in wrapper
    return handler_task.result()
           ^^^^^^^^^^^^^^^^^^^^^

原因分析

vLLM 开发团队已确认该问题（”Was able to repro it as well”），属于 vLLM 自身缺陷。可能原因包括：

vLLM 在解析 tool message 中的图片内容时，逻辑存在问题，导致模型处理流程中断。
Gemma4-31B-it 的图片处理部分与 vLLM 的 tool call 机制不完全兼容。

环境排查

确认 vLLM 版本（例如，用户报告在 vllm 0.23.0 上仍然复现）
确认 Python 版本（本例为 3.12.3）
确认 CUDA 版本及显卡驱动（本例为 CUDA 13.0，驱动 570.172.08）
确认 PyTorch 版本（本例为 2.11.0）

解决步骤

检查当前 vLLM 版本。如果版本低于 0.23.0，升级到包含修复的版本（该 Issue 已于 2026-05-28 关闭，建议升级至 vLLM 0.24.0 或更高版本）。
如果问题在升级后仍然存在，尝试在部署命令中添加 --trust-remote-code 参数并传入配置文件。
作为临时 workaround，避免在 tool message 中传递图片，改用其他方式（如将图片作为单独的 user message 传递）。
清理 vLLM 的缓存和日志，重启服务后再次测试。

验证方法

使用包含图片的 tool message 调用 API：POST /v1/chat/completions，请求体包含 tool_choice 和 tools 参数。如果返回正常 JSON 响应而不是 HTTP 500，说明问题已解决。

参考来源

vllm-project/vllm #41452

AI 工具推荐

想把多个 AI 模型放在一个入口？

GamsGo AI 集成 ChatGPT、DeepSeek、Gemini、Claude、Midjourney、Veo 等常用模型，适合写作、绘图、视频和日常 AI 工作流。

了解 GamsGo AI

推广链接：通过此链接购买，我可能获得佣金，不影响你的价格。

[Bug]: Gemma4-31B-it deployed on vLLM cannot process images in tool message