Gemini returns incomplete responses that do not exceed the 4k token mark

Cody: AI Code Assistant v1.93.1746816255 (pre-release)
I’m using vscode-remote, and I’m not sure if that is causing the issue. However, whenever I use Gemini 2.5 Pro Preview with the latest pre-release version (and the stable version), I often get incomplete, garbled responses, specifically from Gemini (it has not happened with Claude or OpenAI models).

I need to stress that this is not the same as:

In this case, the response just stops randomly.


I’ve been using Gemini 2.5 Pro Exp heavily for over a month, and just today I got a really garbled response about a coding project taking up 600,000 tokens. That hasn’t happened before.

Same issue for me. I’ve been using it for a long time, and in the past couple of days the responses just seem to be cut off in the middle; they are super short and incomplete.

It might be related to the fact that Google reported a degradation of the Gemini 2.5 models in the last couple of days; they say they are working on it.

Could you provide a screenshot of that output so we can see what it looks like?

Thank you


Here’s one example of it.


This is still unusable. I’ve been using the Gemini API directly for a while and haven’t had any issues there.

Especially since, for the most part, Gemini works just fine on the Sourcegraph website (though there are times it hiccups).

I feel like truncation has been more of a problem in the last month. I don’t recall experiencing it before then, and my usage patterns have not changed much.

Hey @esafak, this could be related to the fact that we conducted an A/B test with 50% of our users and provided them with an enhanced context window. However, we rolled it back because there were anomalies in our metrics. If the metrics look good again in the future, we will consider rolling it out to more users slowly.

Any update on the Gemini issue?

No changes since my last post here: Gemini returns incomplete responses that do not exceed the 4k token mark - #9 by PriNova.

I got a truncated response, 1466 characters long, with Gemini again.

Hey @esafak, we expanded the context window for the models last week as an experimental feature for all users. Which model did you use?

I used 2.5 Pro Preview

Yeah, the issue is that the thinking tokens are counted together with the output tokens, so if the model spends too many tokens thinking, that eats into the output token limit and the visible response gets cut short.
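
To make that budget arithmetic concrete, here is a minimal sketch; the 4096-token cap and the 3400 thinking tokens are assumed numbers for illustration, not Cody’s actual limits:

```python
# Illustrative only: for Gemini 2.5 "thinking" models, reasoning tokens and
# visible output tokens are drawn from the same output budget.

MAX_OUTPUT_TOKENS = 4096   # assumed per-response output cap (not Cody's real value)
thinking_tokens = 3400     # hypothetical: tokens the model spends reasoning internally

visible_budget = MAX_OUTPUT_TOKENS - thinking_tokens
print(f"Tokens left for the visible answer: {visible_budget}")
# -> 696, so the answer is cut off well below the 4k mark even though
#    the full output budget was technically consumed.
```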

This is unfortunate but the team is working on a solution.

Makes sense. I regretfully have to share that I am considering alternatives to your product based on the deterioration in quality I have observed.

@esafak sadly, the team is trying to keep up with fixing bugs and providing feature parity with other IDEs, which is very time-consuming.