unexpected EOF error

Quick take: in Ollama v0.1.3, some models (such as mistral and zephyr) fail on the second prompt with Error: error reading llm response: unexpected EOF. Check first whether GPU VRAM has been exhausted.

The user ran ollama run zephyr. The first prompt succeeded, but the second prompt returned an error, and the logs show a CUDA out of memory error.

Error: error reading llm response: unexpected EOF

At the same time, journalctl -u ollama.service or ~/.ollama/logs/server.log shows:

CUDA error 2 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5487: out of memory
current device: 0
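
The log line above can be searched for directly instead of reading the whole log. The following is a minimal sketch: the helper name has_cuda_oom and the temporary path /tmp/ollama.log are illustrative assumptions, and the pattern is based only on the single log line quoted above, so other Ollama versions may word the message differently.

```shell
# Return success (exit status 0) if the given log file contains a
# CUDA out-of-memory line like the one quoted above.
has_cuda_oom() {
  grep -qiE 'CUDA error.*out of memory' "$1"
}

# Typical use on a systemd host:
#   journalctl -u ollama.service --no-pager > /tmp/ollama.log
#   has_cuda_oom /tmp/ollama.log && echo "GPU ran out of memory"
# Or against the server log file:
#   has_cuda_oom ~/.ollama/logs/server.log && echo "GPU ran out of memory"
```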

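According to the logs, the GPU ran out of VRAM on the second request, which crashed the CUDA backend of llama.cpp; Ollama then received an unexpected EOF from the underlying runner. The user reported that the problem appeared after upgrading from an older version to v0.1.3.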

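Check the systemd journal or server.log to confirm whether an out of memory error is present.
If VRAM is insufficient, try these first:
- Use a smaller model (for example, a 3B model instead of a 7B one).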
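- Limit the number of layers offloaded to the GPU. The issue gives no specific command; the Ollama Modelfile documentation describes a num_gpu parameter for this. An OLLAMA_GPU_LAYERS environment variable, by contrast, does not appear in the official documentation, so verify it against the docs for your version before relying on it.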
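
One documented way to cap GPU offload is the num_gpu parameter in a Modelfile. The sketch below writes such a Modelfile; the helper name write_lowvram_modelfile, the variant name zephyr-lowvram, and the value 20 are assumptions chosen for illustration, not values from the issue. The right layer count depends on the card and the model, so expect to lower it until the OOM error stops.

```shell
# Write a Modelfile that derives a lower-VRAM variant of an existing model.
# $1 = base model, $2 = number of layers to offload to the GPU, $3 = output path
write_lowvram_modelfile() {
  cat > "$3" <<EOF
FROM $1
PARAMETER num_gpu $2
EOF
}

# 20 is an arbitrary starting point, not a recommended value.
write_lowvram_modelfile zephyr 20 ./Modelfile

# Then build and run the variant (requires a running Ollama server):
#   ollama create zephyr-lowvram -f ./Modelfile
#   ollama run zephyr-lowvram
```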
If the problem started with v0.1.3, downgrade to the last version that worked:
```
# Manually download the older binary for your platform (for example, v0.1.2) and replace the installed one
```
Restart the Ollama service to free VRAM: systemctl restart ollama.
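
To confirm that the restart actually released VRAM, the memory counters reported by nvidia-smi can be summarized per GPU. This is a rough sketch that assumes an NVIDIA driver with nvidia-smi on the PATH; the helper name vram_used_percent is hypothetical.

```shell
# Read "used, total" lines (in MiB) from stdin, as printed by nvidia-smi with
# --format=csv,noheader,nounits, and print the percentage in use for each GPU.
vram_used_percent() {
  awk -F', *' '{ printf "GPU %d: %d%% used (%d of %d MiB)\n", NR - 1, $1 * 100 / $2, $1, $2 }'
}

# Typical use:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_used_percent
```

Running it once before and once after the restart shows whether a crashed runner was still holding memory.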

Run ollama run zephyr again and send at least two prompts in a row. If unexpected EOF no longer appears and the logs contain no CUDA OOM error, the problem is resolved.
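
The repeated-prompt check can be scripted roughly as follows. This sketch assumes that passing the prompt as an argument to ollama run sends a single request and exits, and that the server keeps the model loaded between calls so that later requests resemble a second prompt in an interactive session; if either assumption does not hold for your version, test interactively instead. The function name and the prompt text are illustrative.

```shell
# Send the same short prompt several times and stop at the first failure.
# $1 = model name, $2 = number of requests (defaults to 3)
check_repeat_prompts() {
  model="$1"
  count="${2:-3}"
  i=1
  while [ "$i" -le "$count" ]; do
    if ! ollama run "$model" "Reply with the single word: ok" >/dev/null; then
      echo "request $i failed" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $count requests succeeded"
}

# Typical use:
#   check_repeat_prompts zephyr 3
```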
