Imported GGUF (hf.co/…) thinks by default, but reports no `thinking` capability and rejects `think: true`

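Quick summary: This error typically occurs with GGUF models imported from Hugging Face (hf.co/…). The model ships a thinking template and emits thinking output by default at runtime, yet the capability list returned by `/api/show` lacks `thinking`, and API calls with `think: true` fail with a "does not support thinking" error. Start by checking whether the imported model's Modelfile is complete; the `RENDERER` and `PARSER` directives need to be added manually.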

Scenario

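The user imports and runs a Gemma-4 series GGUF model in Ollama via `ollama pull hf.co/…` or `ollama run hf.co/…`. When they try to enable thinking through the API or a client (by setting `think: true`), the request is rejected with HTTP 400, even though the same model does display "Thinking…" and outputs thinking content in a default interactive `ollama run` session.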

Error message

{"error":"\"hf.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF:Q4_K_M\" does not support thinking"}

In addition, the capability list returned by `/api/show` is missing `thinking`:

capabilities: ["tools","completion"]
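The check above can be scripted. The following is a minimal sketch, assuming the `/api/show` response is a JSON object with a `capabilities` array as shown; the helper name `has_thinking` is hypothetical, and it uses a crude text match rather than a real JSON parser.

```shell
# Hypothetical helper: succeeds when the JSON on stdin lists "thinking"
# inside its "capabilities" array. This is a text match, not a JSON parser.
has_thinking() {
    tr -d ' \n' \
        | grep -o '"capabilities":\[[^]]*]' \
        | grep -q '"thinking"'
}
```

To use it against a running server, pipe the response in, for example `curl -s localhost:11434/api/show -d '{"model":"hf.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF:Q4_K_M"}' | has_thinking && echo supported || echo missing`.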

Root cause

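The cause is an incomplete Modelfile for the imported GGUF model. GGUF models pulled directly from Hugging Face often lack required metadata; specifically, the Modelfile has no `RENDERER` or `PARSER` directive (for Gemma-4 models both should be `gemma4`). As a result, although the Ollama runtime correctly parses the thinking channel in the template and displays thinking content, capability inference fails to detect `thinking` support, so `think: true` requests are rejected.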

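Models in the official library (e.g. `gemma4:26b-a4b-it-qat`) have a complete Modelfile and are therefore unaffected. Side-by-side testing shows that the official build (which reports the `thinking` capability) and the HF-imported build (which does not) of the same base model behave differently, confirming that the problem lies in the import path rather than in the model weights.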

Environment checks

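Ollama version: 0.30.11 (the version reported in the issue; other versions may also be affected)
Operating system: macOS (Apple Silicon)
Model: any HF-imported GGUF model in the Gemma-4 series (e.g. `hf.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF:Q4_K_M`, `hf.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF:UD-Q4_K_XL`)
Check whether the imported model's Modelfile contains the `RENDERER` and `PARSER` directives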

Resolution steps

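Run the following command to export the imported model's current Modelfile:
$ ollama show --modelfile hf.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF:Q4_K_M > Modelfile
Inspect the exported Modelfile. If the renderer and parser directives for Gemma-4 models are missing, append these two lines:
$ echo RENDERER gemma4 >> Modelfile
$ echo PARSER gemma4 >> Modelfile
Create a new model from the corrected Modelfile (the model name is up to you):
$ ollama create gemma-4-12B-coder-fable5-composer2.5-v1:Q4_K_M -f Modelfile
Confirm that the new model's capability list now includes `thinking`:
$ ollama show gemma-4-12B-coder-fable5-composer2.5-v1:Q4_K_M
The output should include: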
```
Capabilities
    completion
    tools
    thinking
```
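Running the two `echo` commands more than once would duplicate the directives. The sketch below shows one way to make that step idempotent; `ensure_directive` is a hypothetical helper, not part of Ollama, and it assumes each directive sits on its own line of the Modelfile.

```shell
# Hypothetical helper: append "NAME VALUE" to FILE unless a line already
# starts with NAME. Modelfile instruction names are case-insensitive,
# hence grep -i.
ensure_directive() {
    file=$1; name=$2; value=$3
    if grep -qi "^[[:space:]]*${name}[[:space:]]" "$file"; then
        return 0   # directive already present, leave the file untouched
    fi
    # Add a newline first if the file does not already end with one.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s %s\n' "$name" "$value" >> "$file"
}
```

For the case described here, the calls would be `ensure_directive Modelfile RENDERER gemma4` and `ensure_directive Modelfile PARSER gemma4`, followed by the `ollama create` step.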

Verification

After recreating the model, use the API to test whether `think: true` is accepted:

curl -s localhost:11434/api/chat -d '{"model":"gemma-4-12B-coder-fable5-composer2.5-v1:Q4_K_M","messages":[{"role":"user","content":"hi"}],"stream":false,"think":true}'

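The expected response contains a non-empty `thinking` field instead of an HTTP 400 error. Thinking behavior in interactive `ollama run` sessions should remain consistent (thinking by default, with `nothink` mode still available).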
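The response check can also be automated. This is a rough sketch, assuming the non-streaming `/api/chat` response carries the reasoning text in a `thinking` string field as described above; `has_nonempty_thinking` is a hypothetical name and the match is textual, not a JSON parse.

```shell
# Hypothetical helper: succeeds when the JSON on stdin contains a
# "thinking" key whose string value is not empty. An error body such as
# {"error":"... does not support thinking"} does not match.
has_nonempty_thinking() {
    grep -Eq '"thinking"[[:space:]]*:[[:space:]]*"[^"]'
}
```

Piping the output of the `curl` command into `has_nonempty_thinking` then gives a pass/fail exit status that can be used in a script.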