[Bug]: map_httpcore_exceptions while invoking query_engine.query in colab

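Quick conclusion: This error typically appears when LlamaIndex queries a local Ollama model from Google Colab. The root cause is that the Colab runtime cannot connect to a locally running Ollama service. A hosted Colab notebook runs on a remote VM, so localhost there is not the machine where you may have started Ollama. First check whether Ollama is reachable from inside the Colab runtime, or run the code in a local environment instead.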

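Problem scenario

The user runs the official LlamaIndex starter example (local model version) in Google Colab, with Ollama(model="mistral") as the LLM and BAAI/bge-small-en-v1.5 as the embedding model. Building the index succeeds; the error is raised when query_engine.query() is called.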

Error message

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  ...
httpcore.ConnectError: [Errno 99] Cannot assign requested address

The above exception was the direct cause of the following exception:
...
  File "/content/starter.py", line 18, in <module>
    response = query_engine.query("What did the author write about?")
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/base/base_query_engine.py", line 53, in query
    query_result = self._query(str_or_query_bundle)
  ...
  File "/usr/local/lib/python3.10/dist-packages/llama_index/core/response_synthesizers/base.py", line 241, in synthesize
    response_str = self.get_response(
...

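Root cause analysis

The core error is httpcore.ConnectError: [Errno 99] Cannot assign requested address, a network-level failure: the connection to Ollama is never established. Possible causes include:

Colab network environment: Ollama has to be listening on the address and port that the client targets. On a hosted Colab runtime, localhost refers to the Colab VM itself, so an Ollama instance running on your own computer is not reachable from the notebook.
Ollama service not started or port unreachable: the user may not have started Ollama inside Colab, or the service may be bound to a different address.
Dependency version conflict (less likely): an incompatibility between httpx/httpcore and the Colab runtime's network stack.

Note that LlamaIndex talks to the Ollama API through an HTTP client (httpx) at query time, which is why the traceback starts inside httpx. The network library is doing its job; the error means the connection itself could not be established.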
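
To see which low-level error a connection attempt actually produces, a plain TCP probe can help separate "nothing is listening" from address-level problems such as Errno 99. The following is a minimal sketch using only the standard library; the helper name probe_tcp is hypothetical, and the port 11434 is Ollama's default as mentioned in this article.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a plain TCP connection and describe the outcome in one line."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return f"{host}:{port} is accepting connections"
    except OSError as exc:
        # exc.errno can be None (timeouts) or a resolver code that is not
        # in errno.errorcode, so fall back to "unknown" in those cases.
        name = errno.errorcode.get(exc.errno, "unknown") if exc.errno else "unknown"
        return f"{host}:{port} failed: {name} ({exc})"


if __name__ == "__main__":
    # Probe both spellings, since "localhost" and "127.0.0.1" may resolve differently.
    for host in ("localhost", "127.0.0.1"):
        print(probe_tcp(host, 11434))
```

If both probes fail, retrying or swapping HTTP libraries is unlikely to help, because no service is reachable at that address from the notebook.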

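Environment checks

Python version: the traceback paths show Python 3.10 (the Colab default), which satisfies the LlamaIndex 0.10.x requirement of Python ≥3.8, so the interpreter version is unlikely to be the problem.
Ollama service status: confirm that Ollama is running either inside the Colab runtime or at a remote address reachable from it. The default port is 11434.
Dependency versions: check that llama-index 0.10.27 and the installed httpx and httpcore versions are compatible.
Network reachability: check whether curl localhost:11434, or a request to the actual target address, succeeds from inside Colab.
Resource limits: check whether the Colab process or connection limits have been reached.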
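
The version checks above can be collected in one place. This is a small sketch, not an official diagnostic: the helper names are hypothetical, and the default package list assumes the PyPI distribution names llama-index, llama-index-llms-ollama, httpx, and httpcore.

```python
import sys
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(
    packages=("llama-index", "llama-index-llms-ollama", "httpx", "httpcore"),
) -> dict:
    """Collect the Python version and the versions of the given packages."""
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    for package in packages:
        report[package] = installed_version(package)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Including this output when reporting the problem makes it easier to rule dependency versions in or out.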

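Resolution steps

Confirm that the Ollama service is running:
Start the service from a Colab cell with !ollama serve, or otherwise make sure Ollama is installed and running in the Colab runtime. Note that Ollama must be installed separately, that ollama serve runs in the foreground and blocks the cell unless it is started in the background, and that nothing persists after the Colab runtime restarts.
Test the Ollama connection:
Run !curl -X POST http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "hello"}' and check the response. If it fails, the Ollama service is unreachable and the configuration needs to be adjusted.
Specify the Ollama host address explicitly:
If Ollama runs on the same machine as the notebook, try setting Ollama(model="mistral", base_url="http://127.0.0.1:11434") so that the client does not depend on how localhost is resolved.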
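
The connection test and the explicit base_url can be combined in Python. The sketch below assumes the server exposes Ollama's /api/tags endpoint for listing local models; the helper name list_ollama_models is hypothetical, and request_timeout=120.0 is an arbitrary illustrative value.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url: str = "http://127.0.0.1:11434", timeout: float = 5.0):
    """Return the model names reported by the server, or None if it is unreachable."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection failures, timeouts, HTTP errors, and invalid JSON all land here.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name") for model in payload.get("models", [])]


if __name__ == "__main__":
    base_url = "http://127.0.0.1:11434"
    models = list_ollama_models(base_url)
    if models is None:
        print(f"No Ollama server answered at {base_url}")
    else:
        print(f"Ollama is reachable; local models: {models}")
        # Import LlamaIndex only once the server is known to be reachable.
        from llama_index.llms.ollama import Ollama

        llm = Ollama(model="mistral", base_url=base_url, request_timeout=120.0)
```

If the returned list does not include a mistral entry, the model probably still has to be pulled on the server before the query can succeed.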

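Add a retry mechanism (quick to try):

Wrap the query in a retry loop to ride out transient network failures. A retry will not help if the Ollama service is unreachable in the first place. For example: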

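import time

for attempt in range(1, 4):  # up to three attempts
    try:
        response = query_engine.query("What did the author write about?")
        break  # success, stop retrying
    except Exception as e:  # broad on purpose: this is only a stopgap
        print(f"Attempt {attempt} failed: {e}")
        time.sleep(2)  # brief pause before the next attempt
else:
    # Reached only if no attempt succeeded, so response is never left undefined.
    raise RuntimeError("query_engine.query() failed after 3 attempts")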

This workaround comes from a comment on the issue and is meant only as a stopgap.

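Switch the HTTP client library:
Since the client uses httpx, another option is to force the use of the requests library instead. The Ollama integration does not support this directly, so it would require changing its underlying communication code. This step is speculative.
Leave Colab and run locally:
If all of the steps above fail, move the code to a local environment (for example Linux or macOS) so that Ollama and the code calling it run on the same machine or network.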

Verification

query_engine.query() completes and returns a meaningful text answer without raising httpcore.ConnectError. Print the response to confirm: print(response).

References

run-llama/llama_index #12670
