AttributeError: 'NoneType' object has no attribute 'to'

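Quick take: when a model without attention sinks (such as Gemma 4) is loaded with Flash Attention 2, `flash_attention_forward` unconditionally calls s_aux.to(query.dtype), which raises this error. First confirm that s_aux is None, then add a guard around the call.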

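Problem scenario

The error is triggered during the forward pass when a model that does not implement attention sinks (for example "google/gemma-4-E4B-it") is loaded through AutoModelForCausalLM with attn_implementation="flash_attention_2".

Affected file: src/transformers/integrations/flash_attention.py, line 84.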

Original error message

File ".../transformers/integrations/flash_attention.py", line 84, in flash_attention_forward
    s_aux=s_aux.to(query.dtype),  # FA only accepts half precision
          ^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'to'

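Root cause analysis

The root cause is that flash_attention_forward declares the s_aux parameter as torch.Tensor | None = None, yet casts it at the call site without a None check. For models without attention sinks, the attention layer's forward pass never passes s_aux, so the parameter keeps its default value of None.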
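The failure mode can be reproduced without a GPU or the library itself. The sketch below is a stand-in rather than the library's code: FakeTensor and forward_unguarded are hypothetical names that imitate a parameter annotated as optional but dereferenced unconditionally.

```python
from __future__ import annotations


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only models .to()."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype

    def to(self, dtype: str) -> "FakeTensor":
        return FakeTensor(dtype)


def forward_unguarded(query: FakeTensor, s_aux: FakeTensor | None = None) -> dict:
    # Imitates the shape of the failing call: the annotation allows None,
    # but .to() is invoked without checking for it.
    return {"s_aux": s_aux.to(query.dtype)}


query = FakeTensor("bfloat16")

# A model with attention sinks passes s_aux, so the cast succeeds.
ok = forward_unguarded(query, s_aux=FakeTensor("float32"))

# A model without sinks never passes s_aux, so it stays None and the cast fails.
try:
    forward_unguarded(query)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The type annotation documents that None is allowed, but nothing enforces a check at the point of use, which is why the error only surfaces for models that omit the argument.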

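A similar problem in flash_paged.py was fixed earlier in Pull Request #40434, but flash_attention.py still contains the same unguarded call. A review of the other integration files shows:

flash_paged.py: already fixed (builds the kwargs dict conditionally with "s_aux" in kwargs)
flex_attention.py: already guarded with if s_aux is not None:
eager_paged.py, npu_flash_attention.py, sdpa_attention.py, sdpa_paged.py: do not handle s_aux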

Environment check

transformers: 5.6.0 (affected version); HEAD commit bc4b330451d0e3e33f4ac63593ed9f245227712e is still affected
Python: 3.12.13
PyTorch: 2.10.0+cu129
CUDA: 12.9
GPU: NVIDIA H100 80GB HBM3 (other GPUs that support Flash Attention may also reproduce it)
Model configuration: confirm whether the model uses attention sinks (Gemma 4, for example, has none)
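To compare a local setup against the versions listed above, the installed package versions can be read with the standard library. This is a minimal sketch: collect_environment and package_version are hypothetical helper names, and flash_attn is assumed to be the installed name of the Flash Attention package.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment(packages=("transformers", "torch", "flash_attn")) -> dict:
    # "flash_attn" is an assumed package name; adjust it to the local install.
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = package_version(name)
    return report


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

GPU and CUDA details are not covered here, since querying them requires the installed PyTorch build.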

Resolution steps

Based on the issue discussion, the recommended minimal change is as follows:

Open src/transformers/integrations/flash_attention.py
Find the offending call near line 84:
```
s_aux=s_aux.to(query.dtype),
```

Change it to a None-guarded form:

s_aux=s_aux.to(query.dtype) if s_aux is not None else None,

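This change relies on the existing s_aux is not None check in _process_flash_attention_kwargs, so passing None through is safe.

Note: unlike the conditional dict construction used in flash_paged.py (needed there because some backends reject the parameter outright), this fix is more concise and fits this call site.

Try this first: replace the line at the call site with the guarded form, restart the application, and retest.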
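The difference between the two guard styles mentioned in the note can be illustrated with stand-ins. All function names below are hypothetical and only imitate the two situations: a backend whose signature has no s_aux parameter, and a backend that accepts s_aux and checks it for None itself. The upper() call merely stands in for the dtype cast.

```python
from __future__ import annotations


def backend_without_sinks(query: str) -> str:
    """Hypothetical backend whose signature has no s_aux parameter."""
    return f"attended({query})"


def backend_with_optional_sinks(query: str, s_aux: str | None = None) -> str:
    """Hypothetical backend that checks s_aux for None before using it."""
    if s_aux is not None:
        return f"attended({query}, sinks={s_aux})"
    return f"attended({query})"


def call_with_conditional_dict(backend, query: str, **kwargs) -> str:
    # Conditional-dict style: forward s_aux only when the caller supplied it,
    # so a backend that does not know the parameter never sees it.
    extra = {"s_aux": kwargs["s_aux"]} if "s_aux" in kwargs else {}
    return backend(query, **extra)


def call_with_inline_guard(backend, query: str, s_aux: str | None = None) -> str:
    # Inline-guard style: always pass the argument, but only transform it
    # when it is not None. upper() stands in for .to(query.dtype).
    return backend(query, s_aux=s_aux.upper() if s_aux is not None else None)


# The inline guard works when the callee tolerates None.
no_sinks = call_with_inline_guard(backend_with_optional_sinks, "q")
with_sinks = call_with_inline_guard(backend_with_optional_sinks, "q", s_aux="s")

# The same style fails against a backend that rejects the parameter.
try:
    call_with_inline_guard(backend_without_sinks, "q")
    rejected = False
except TypeError:
    rejected = True

# The conditional dict avoids passing the parameter at all.
conditional = call_with_conditional_dict(backend_without_sinks, "q")
```

Under these assumptions, the inline guard is the shorter option whenever the downstream function already handles None, which is the situation the fix relies on.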

Verification

Run a forward pass with a model that has no attention sinks and confirm that AttributeError is no longer raised. For example, load hf-internal-testing/tiny-random-Gemma3ForCausalLM and run inference:

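import torch
from transformers import AutoModelForCausalLM

# Load a tiny model without attention sinks and force Flash Attention 2
model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-Gemma3ForCausalLM",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
input_ids = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]], device="cuda")
# A single forward pass is enough to reach the affected call site
with torch.no_grad():
    _ = model(input_ids=input_ids)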

If the forward pass completes without an error, the fix is working.
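When this check is automated, it helps to separate this specific error from unrelated failures. The following is a minimal sketch: hits_s_aux_bug is a hypothetical helper, and broken and patched only simulate the unguarded and guarded call sites rather than running a real model.

```python
from typing import Callable


def hits_s_aux_bug(run_forward: Callable[[], object]) -> bool:
    """Return True only if run_forward fails with the None.to() AttributeError."""
    try:
        run_forward()
    except AttributeError as exc:
        message = str(exc)
        if "NoneType" in message and "'to'" in message:
            return True
        raise  # a different AttributeError: do not mask it
    return False


def broken() -> None:
    # Simulates the unpatched call site.
    s_aux = None
    s_aux.to("bfloat16")


def patched() -> None:
    # Simulates the guarded call site.
    s_aux = None
    _ = s_aux.to("bfloat16") if s_aux is not None else None


print(hits_s_aux_bug(broken), hits_s_aux_bug(patched))
```

With a real model, the same helper could wrap the forward pass, for example hits_s_aux_bug(lambda: model(input_ids=input_ids)).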

References

huggingface/transformers #45588
