This happened to me a lot when I tried to run big models with low context windows. It would effectively run out of memory: each new token wasn't actually added to the context, so it would just get stuck in an infinite loop repeating the previous token. It's also possible there was a memory issue on Google's end.
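Here's a toy sketch of that failure mode, not real llama.cpp code; `fake_model` and the token names are made up for illustration. If the context buffer is full and new tokens get silently dropped instead of appended, a deterministic sampler keeps seeing the same frozen state and emits the same token forever:

```python
def fake_model(context: list[str]) -> str:
    """Stand-in for a deterministic LLM step: next token depends only on context."""
    return "token_" + str(len(context) % 7)

def generate(prompt: list[str], n_ctx: int, steps: int) -> list[str]:
    context = list(prompt)
    output = []
    for _ in range(steps):
        tok = fake_model(context)
        output.append(tok)
        if len(context) < n_ctx:
            context.append(tok)  # normal case: context grows with each token
        # BUG: once the context is full, the new token is silently dropped,
        # so `context` never changes and the model repeats itself forever
    return output

print(generate(["hi"], n_ctx=4, steps=8))
# ['token_1', 'token_2', 'token_3', 'token_4', 'token_4', 'token_4', 'token_4', 'token_4']
```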


At least llama.cpp doesn't seem to do that by default. If it overruns the context window, it just blorps.
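For what it's worth, here's a minimal sketch of that "just blorps" behavior through the llama-cpp-python bindings; in my experience they raise a `ValueError` when the prompt plus requested tokens would exceed `n_ctx`, rather than looping. The model path is a placeholder, and the exact error text may differ between versions:

```python
from llama_cpp import Llama

# Small context window on purpose, to trigger the overflow.
llm = Llama(model_path="./models/some-model.gguf", n_ctx=512)

try:
    # Ask for more tokens than the 512-token context can hold.
    out = llm("Tell me a very long story.", max_tokens=4096)
    print(out["choices"][0]["text"])
except ValueError as err:
    # e.g. "Requested tokens (...) exceed context window of 512"
    print("Overran the context window:", err)
```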