Tag: LLM

AI News

Misc. bug: Performance from sycl with server-intel on Qwen 3.6 dropped from 32t/s to 25t/s after update server-intel-b9159

The user deploys the ghcr.io/ggml-org/llama.cpp:server-intel image with Docker Compose and runs the Qwen 3.6 35B model (Q4_K_XL_MTP quantization) on an Intel Arc Pro B50 GPU using the SYCL backend. The image, from server-intel

celebrityanime
June 21, 2026

AI News

peg-native: lazy grammar fails to prevent malformed tool-call XML (duplicate ) on Qwen3.6-35B-A3B; parser then drops the whole tool call and

The user loads unsloth/Qwen3.6-35B-A3B (a reasoning MoE model, UD-Q8_K_XL quantization) in llama-server (multi-model routing preset), with an external --chat-template-file (a community-fixed template for the Qwen3.6 family, using the XML tool-call dialect) and with --jinja set

celebrityanime
June 21, 2026

AI News

[Bug]: Hardcoded RERANK_LIMIT logic causes API failures (400) and ignores UI Top-K settings

The user runs the official RAGFlow v0.24.0 image with a Chatbot configured and a Reranker enabled (for example Cohere, or a BGE model hosted on vLLM). After setting Top-N (page_size) to 6 and Top-K to a low value (such as 10) in the UI, running a query triggers a 400 error

celebrityanime
June 21, 2026

AI News

Misc. bug: WebUI CORS proxy requests don’t include API key for MCP connections

The user starts llama-server with both the --api-key-file (API key authentication file) and --webui-mcp-proxy (WebUI CORS proxy) options. After authenticating in the WebUI with the API key, the user adds an MCP server (such as http://127.0.0.1:8581/

celebrityanime
June 21, 2026

AI News

[Bug]: Gemma4-31B-it deployed on vLLM cannot process images in tool message

The user deploys the Gemma4-31B-it model on vLLM. When a request containing an image in a tool message is sent through the OpenAI-compatible API (/v1/chat/completions), the server returns HTTP 500 Internal Server Error. The environment is Ubuntu 24.04 +

celebrityanime
June 21, 2026

AI News

[Bug]: [DeepSeek-V4-Flash][MTP] CUDA invalid argument during profile_run with DP4 + EP + MegaMoE on B_00

When starting the DeepSeek-V4-Flash model with vllm serve, using --speculative-config '{"method":"mtp","num_speculative_tokens":1}' or num_speculative_tokens=3, together with --data-par

celebrityanime
June 21, 2026

AI News

ValueError: Following weights were not initialized from checkpoint: {'visual.blocks.16.norm2.bias', 'visual.blocks.4.norm2.weight', …

After running SFT on the Qwen3.5-9B model with Transformers (v5.2.0 or v5.3.0), the user hits an error when trying to deploy it with vLLM (v0.17.0). The core of the error is an invalid HuggingFace config type: expected vllm.transformers_utils.configs.qwen3_

celebrityanime
June 21, 2026

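AI News

RuntimeError: dictionary keys changed during iteration

The user runs a GraphRAG knowledge graph generation task in RAGFlow with entity resolution (Resolution) and community detection (Community) configured. After entity resolution finishes merging the candidate pairs, the task crashes in the merge stage with RuntimeError: dictionary keys changed during iteration

celebrityanime
June 20, 2026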
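
For readers unfamiliar with this error class, here is a minimal sketch of how CPython (3.8 and later) raises it and one common way to avoid it. This is not RAGFlow's code: the function names and the sample dictionary are hypothetical, chosen only to illustrate the general pattern of replacing keys in a dict while iterating over that same dict.

```python
# Hypothetical illustration; none of these names come from RAGFlow.

def rename_keys_unsafe(d):
    """Replace each key while iterating over the same dict (the failing pattern)."""
    for key in d:
        # pop + insert keeps the dict size constant but changes its key set
        d[key.upper()] = d.pop(key)
    return d


def rename_keys_safe(d):
    """Iterate over a snapshot of the keys so the dict can be mutated freely."""
    for key in list(d):
        d[key.upper()] = d.pop(key)
    return d


def raises_runtime_error(fn, d):
    """Return True if calling fn(d) raises RuntimeError, False otherwise."""
    try:
        fn(d)
    except RuntimeError:
        return True
    return False


unsafe_raised = raises_runtime_error(rename_keys_unsafe, {"a": 1, "b": 2})
safe_result = rename_keys_safe({"a": 1, "b": 2})
```

Taking a snapshot with `list(d)` costs one extra list of keys, which is usually an acceptable trade for mutation-safe iteration; building a new dict instead is another option.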

AI News

Compile bug: out-of-tree build fails to provision UI assets: ENOENT: no such file or directory, mkdir '/src/tools/ui/node_modules'

The user builds llama.cpp with a CMake out-of-tree build, mounting the source code as a read-only directory $srcdir and running cmake and make in a separate build directory. The configuration enables LLAMA_BUILD_UI=ON and LLAMA_USE_PREBUILT_UI=OFF, and during the build it needs to, from the source

celebrityanime
June 20, 2026

AI News

Misc. bug: `llama-server` starts on wrong port

The user runs llama-server (version 9439, built with GNU 16.1.1) on a Linux aarch64 system without specifying --port, expecting the default port 8080, but a request to http://127.0.0.1:8080/app returns a 404 error. The user suspects that llama-s

celebrityanime
June 20, 2026