Eval bug: Qwen3-Embedding model crashes on startup with GGML_ASSERT(block_num_y % num_subgroups == 0) failed.

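A user loads a Qwen3-Embedding GGUF model (0.6B-Q8_0 or 4B-Q4_K_M) with llama-server, and the assertion fails during startup. The affected hardware is Intel Arc series GPUs (B70 / B570), the compiler toolchain is IntelLLVM 2026.0.0, and the GGML backend is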






