How to disable ‘smart memory management’ when using GGUF?

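Quick summary: this issue typically appears when running a GGUF model (such as Scail2 GGUF Q4) in ComfyUI at high resolution (such as 1280×720). ComfyUI keeps offloading weights out of VRAM automatically even though free VRAM remains. First thing to try: launch with the flag combination --reserve-vram -999 --disable-smart-memory to force smart memory management and VRAM reservation off. Note that this combination provides no protection against OOM.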

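Problem scenario

The user runs ComfyUI (v0.24.0, Python 3.12.7, PyTorch 2.7.0+cu128, xformers 0.0.30) with the Scail2 GGUF Q4 model on an NVIDIA GeForce RTX 3090 Ti (24 GB VRAM). When generating at 1280×720, ComfyUI automatically offloads model weights to system memory even though free VRAM remains, which hurts performance.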

Original log output

[INFO] Set vram state to: NORMAL_VRAM
[INFO] Disabling smart memory management
[INFO] Device: cuda:0 NVIDIA GeForce RTX 3090 Ti : cudaMallocAsync
[INFO] Using async weight offloading with 2 streams
[WARNING] Unsupported Pytorch detected. DynamicVRAM support requires Pytorch version 2.8 or later. Falling back to legacy ModelPatcher. VRAM estimates may be unreliable especially on Windows

Although the log explicitly reports Disabling smart memory management, ComfyUI still uses async weight offloading (Using async weight offloading with 2 streams), so model weights are moved to system memory.

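Cause analysis

Possible causes:

Incomplete flag coverage: --disable-smart-memory on its own does not disable every form of VRAM offloading; ComfyUI may still offload weights internally through async streams.
GPU driver/allocator behavior: even with --disable-smart-memory set, offloading may still occur in some scenarios when the cudaMallocAsync memory allocator is in use.
PyTorch version limit: the installed PyTorch 2.7.0 does not support DynamicVRAM (2.8 or later is required), so ComfyUI falls back to the legacy ModelPatcher. Its VRAM estimates are unreliable and may trigger offloading when it is not needed.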
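
The version threshold quoted in the warning can be checked with a short script. This is only a minimal sketch: parse_major_minor and supports_dynamic_vram are hypothetical helper names, not ComfyUI functions, and the 2.8 threshold is taken solely from the log message above.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.7.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_dynamic_vram(version: str, minimum: tuple[int, int] = (2, 8)) -> bool:
    """Return True if the version meets the minimum quoted in the warning.

    The (2, 8) default is an assumption based only on the log text.
    """
    return parse_major_minor(version) >= minimum


# Version string reported in the issue.
print(supports_dynamic_vram("2.7.0+cu128"))
```

In a live environment, the string to pass in would be torch.__version__.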

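Environment checks

ComfyUI version: confirm it is updated to the latest stable release (v0.24.0 at the time the issue was closed).
PyTorch version: confirm whether it is 2.7.0+cu128 (the log reports Unsupported Pytorch detected).
CUDA version: confirm a CUDA 12.8 environment (cu128).
GPU model: NVIDIA GeForce RTX 3090 Ti (24 GB VRAM).
Launch flags: check whether the batch file (.bat) or command line uses --disable-smart-memory alone, without --reserve-vram alongside it.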
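
The launch-flag check can be automated with a few lines of Python. The sketch below assumes the launch command has already been copied out of the .bat file as a single line; missing_flags is a hypothetical helper written for this article, not part of ComfyUI.

```python
import shlex

# Flags this article recommends using together.
REQUIRED_FLAGS = ("--disable-smart-memory", "--reserve-vram")


def missing_flags(command_line: str) -> list[str]:
    """Return the recommended flags that do not appear in the command line."""
    # posix=False keeps Windows-style backslash paths intact.
    tokens = shlex.split(command_line, posix=False)

    def has_flag(flag: str) -> bool:
        # Accept both "--flag value" and "--flag=value" spellings.
        return any(tok == flag or tok.startswith(flag + "=") for tok in tokens)

    return [flag for flag in REQUIRED_FLAGS if not has_flag(flag)]


print(missing_flags("python main.py --disable-smart-memory"))
```

An empty list means both flags are present; anything else names the flag that still needs to be added.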

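Solution steps

Change the launch flags: in the ComfyUI launch batch file or command line, add the following flags (note: this combination provides no protection against OOM; use it at your own risk):
--reserve-vram -999 --disable-smart-memory
For example: python main.py --reserve-vram -999 --disable-smart-memory
Explanation: --reserve-vram -999 forces ComfyUI not to reserve any VRAM (the negative value means nothing is held back), while --disable-smart-memory turns off smart memory management so that ComfyUI tries to load all of a single model's data onto the GPU at a time.
Single-model workloads only: this approach suits workflows that handle one model at a time. Loading several large models at once may crash outright with OOM (Out of Memory).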

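Verification

After starting ComfyUI, check the console log and confirm the following:

Set vram state to: NORMAL_VRAM appears.
Disabling smart memory management appears.
Using async weight offloading with 2 streams, or a similar offloading message, does not appear.
When actually running a high-resolution GGUF workload, VRAM usage stays at a consistently high level and weights are no longer frequently offloaded to system memory.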
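
The three log checks above can be sketched as a small script, assuming the console output has been saved to a string or file. The marker strings are copied from the log shown earlier; check_startup_log is a hypothetical helper, and matching on the substring "async weight offloading" is an assumption about how similar messages are worded.

```python
# Lines that should be present after applying the flags.
EXPECTED_MARKERS = (
    "Set vram state to: NORMAL_VRAM",
    "Disabling smart memory management",
)
# Substrings that suggest offloading is still active.
UNWANTED_MARKERS = ("async weight offloading",)


def check_startup_log(log_text: str) -> dict[str, list[str]]:
    """Report which expected markers are missing and which unwanted ones appear."""
    return {
        "missing_expected": [m for m in EXPECTED_MARKERS if m not in log_text],
        "found_unwanted": [m for m in UNWANTED_MARKERS if m in log_text],
    }


sample_log = """\
[INFO] Set vram state to: NORMAL_VRAM
[INFO] Disabling smart memory management
[INFO] Using async weight offloading with 2 streams
"""
print(check_startup_log(sample_log))
```

Both lists being empty corresponds to the outcome described above; a non-empty found_unwanted list means the offloading message is still present.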

Reference

Comfy-Org/ComfyUI #14481, with thanks to the issue author and to the commenters who provided the --reserve-vram -999 --disable-smart-memory solution.
