[Bug]: shorten_message_to_fit_limit grows content instead of trimming when half_length is 0

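Quick summary: when content has to be trimmed to a very short length (for example new_length < 2), the computed half_length is 0. In Python, content[-0:] returns the entire string, so the "trimmed" result ends up longer than the original. First check whether the call to shorten_message_to_fit_limit was given a model limit that is too small.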

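Problem scenario

The bug is triggered when calling shorten_message_to_fit_limit from the LiteLLM SDK, typically when message content must be trimmed to fit a small token limit. The reproduction script in the Issue shows that when a long string is passed in and trimmed to 1 token, the function does not shrink the content and instead returns a longer result.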

Error message

AssertionError: Content grew: trimmed result is longer than original

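In real-world use this assertion may never fire. The function may simply return the enlarged content, and the subsequent request to the model then fails because it exceeds the context window: context_length_exceeded.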

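Root cause

The root cause is in the trimming logic of shorten_message_to_fit_limit in litellm/utils.py (around line 6962). When the target token count tokens_needed is very small relative to the length of the original content, ratio approaches 0, so new_length = len(content) * ratio also comes out as 0 after integer truncation, and therefore half_length = 0.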
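The arithmetic above can be worked through with concrete numbers. This is a minimal sketch, not the library's code: the helper name half_length_for is hypothetical, the token counts are made up, and the formula ratio = tokens_needed / total_tokens is an assumption about how ratio is derived.

```python
def half_length_for(content_len: int, total_tokens: int, tokens_needed: int) -> int:
    """Return the per-side slice length for one trimming pass (illustrative only)."""
    # Assumed formula: the fraction of the content that is allowed to remain.
    ratio = tokens_needed / total_tokens
    # Truncate to a whole number of characters.
    new_length = int(content_len * ratio)
    # Each side of the ".." separator keeps half of the remaining length.
    return new_length // 2


# 768 characters counted as 128 tokens (made-up numbers), trimmed to 1 token:
# ratio = 1/128, new_length = 6, half_length = 3.
print(half_length_for(768, 128, 1))  # 3

# 200 characters counted as 128 tokens, trimmed to 1 token:
# ratio = 1/128, new_length = int(1.5625) = 1, half_length = 0.
print(half_length_for(200, 128, 1))  # 0
```

Any new_length below 2 lands in the second case, which is the one that triggers the bug.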

In Python, content[-0:] is the same as content[0:] (because -0 is just 0), so it returns the whole string rather than an empty one. The trimming logic therefore becomes:

half_length = new_length // 2   # == 0
left_half = content[:0]         # == '' (correct)
right_half = content[-0:]       # == content (wrong: should be empty)
trimmed_content = '' + '..' + content  # longer than the original

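The trimming loop repeats until the content fits the limit, but each iteration makes the content longer instead (by adding the ".." separator). Once MAX_TOKEN_TRIMMING_ATTEMPTS is reached, the function either raises an exception or returns content that was never shortened.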
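The growth per iteration can be reproduced without LiteLLM. The sketch below copies only the slicing pattern shown above; buggy_trim_once, simulate, and MAX_ATTEMPTS are hypothetical names, MAX_ATTEMPTS is a stand-in value for MAX_TOKEN_TRIMMING_ATTEMPTS, and half_length is held at 0 as a simplification (in the real loop it would be recomputed, but it stays 0 because the content only grows).

```python
MAX_ATTEMPTS = 10  # stand-in for MAX_TOKEN_TRIMMING_ATTEMPTS (real value not assumed)


def buggy_trim_once(content: str, half_length: int) -> str:
    """One trimming pass using the unguarded slices described above."""
    left_half = content[:half_length]
    right_half = content[-half_length:]  # half_length == 0 returns the whole string
    return left_half + ".." + right_half


def simulate(content: str, half_length: int = 0) -> list:
    """Record the content length before the loop and after every pass."""
    lengths = [len(content)]
    for _ in range(MAX_ATTEMPTS):
        content = buggy_trim_once(content, half_length)
        lengths.append(len(content))
    return lengths


lengths = simulate("hello " * 100)
print(lengths[0], lengths[1], lengths[-1])  # 600 602 620
```

Every pass adds exactly two characters, so the loop can never converge on the limit.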

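Environment check

LiteLLM version: v1.86.0 and earlier are affected (confirmed in the Issue)
Python version: any (the behavior comes from built-in slicing semantics)
Key file: litellm/utils.py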

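Resolution steps

Upgrade to a fixed version: install a LiteLLM release that includes the merged PR (for example v1.87.0 or later), or install the latest development build.
Manual patch (if you cannot upgrade):
1. In litellm/utils.py, locate the line half_length = new_length // 2 inside the trimming loop of shorten_message_to_fit_limit.
2. After that line, add a check so that right_half is an empty string when half_length is 0, instead of being taken from content[-half_length:].
3. Save the file and reload the Python module (or restart the process).
Avoid the edge case (temporary workaround): when calling shorten_message_to_fit_limit, make sure the model_limit parameter cannot drive new_length below 2 (that is, pass a token limit of at least 4).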
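The check described in the manual patch could look roughly like the following. This is a standalone sketch of one way to guard the slice, not the fix merged upstream; trim_middle is a hypothetical helper and is not part of LiteLLM.

```python
def trim_middle(content: str, new_length: int) -> str:
    """Keep new_length // 2 characters from each end, joined by '..'."""
    half_length = max(new_length, 0) // 2
    left_half = content[:half_length]
    # Guard: content[-0:] would return the whole string, so handle zero explicitly.
    right_half = content[-half_length:] if half_length > 0 else ""
    return left_half + ".." + right_half


print(repr(trim_middle("hello " * 100, 1)))  # '..'
print(repr(trim_middle("abcdefghij", 4)))    # 'ab..ij'
```

The ".." separator is still always added, so for inputs shorter than two characters the result can exceed the original length even with the guard in place.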

Verification

Run the reproduction script from the original Issue:

from litellm.utils import shorten_message_to_fit_limit
content = "hello " * 100
result = shorten_message_to_fit_limit(content, 1)
assert len(result) < len(content), f"Content grew: {len(result)} > {len(content)}"

If the assertion no longer fires (that is, the result is shorter than the original content), the problem is fixed.

References

BerriAI/litellm #28128
