This happened to me a lot when I tried to run big models with low context windows. It would effectively run out of memory: each new token wasn't actually added to the context, so it would just get stuck in an infinite loop repeating the previous token. It's also possible there was a memory issue on Google's end.
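Here's a toy sketch of that failure mode, not real llama.cpp code; `fake_model` and the token names are made up for illustration. If the context buffer is full and new tokens get silently dropped instead of appended, a deterministic sampler keeps seeing the same frozen state and emits the same token forever:

```python
def fake_model(context: list[str]) -> str:
    """Stand-in for a deterministic LLM step: next token depends only on context."""
    return "token_" + str(len(context) % 7)

def generate(prompt: list[str], n_ctx: int, steps: int) -> list[str]:
    context = list(prompt)
    output = []
    for _ in range(steps):
        tok = fake_model(context)
        output.append(tok)
        if len(context) < n_ctx:
            context.append(tok)  # normal case: context grows with each token
        # BUG: once the context is full, the new token is silently dropped,
        # so `context` never changes and the model repeats itself forever
    return output

print(generate(["hi"], n_ctx=4, steps=8))
# ['token_1', 'token_2', 'token_3', 'token_4', 'token_4', 'token_4', 'token_4', 'token_4']
```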


At least llama.cpp doesn't seem to do that by default. If it overruns the context window, it just blorps.
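For what it's worth, here's a minimal sketch of that "just blorps" behavior through the llama-cpp-python bindings; in my experience they raise a `ValueError` when the prompt plus requested tokens would exceed `n_ctx`, rather than looping. The model path is a placeholder, and the exact error text may differ between versions:

```python
from llama_cpp import Llama

# Small context window on purpose, to trigger the overflow.
llm = Llama(model_path="./models/some-model.gguf", n_ctx=512)

try:
    # Ask for more tokens than the 512-token context can hold.
    out = llm("Tell me a very long story.", max_tokens=4096)
    print(out["choices"][0]["text"])
except ValueError as err:
    # e.g. "Requested tokens (...) exceed context window of 512"
    print("Overran the context window:", err)
```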