As many of us have already noticed, the latest GPT-4 model is a step backwards. Aider, an AI-powered command-line tool designed for pair programming, has noticed this too.
The recently released GPT-4 Turbo with Vision performs worse on aider’s coding benchmarks than previous GPT-4 models. It scores the lowest in code editing and is more prone to “lazy coding,” omitting necessary code and leaving comments instead.
Aider’s findings on the laziness of the latest GPT-4 model:
An example of “lazy coding” by GPT-4 Turbo:
Takeaways:
- GPT-4 Turbo with Vision scores only 62% on aider’s code editing benchmark, the lowest score among all GPT-4 models.
- The new model is more prone to “lazy coding,” often omitting needed code and leaving comments instead.
- It scores only 34% on aider’s refactoring benchmark, making it the laziest coder of all the GPT-4 models.
- Despite aider’s full support for GPT-4 Turbo with Vision, the tool will continue to use gpt-4-1106-preview by default, as it remains the strongest coder among the GPT-4 models.
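In practice, this means no configuration change is needed to stay on the stronger model. As a sketch of how you could opt in to the new model anyway, aider accepts a `--model` flag; the exact model identifier below follows OpenAI’s naming scheme and is an assumption, not taken from the source:

```shell
# aider defaults to gpt-4-1106-preview, so running it with no model flag
# keeps the strongest GPT-4 coder:
aider

# To try GPT-4 Turbo with Vision anyway, pass the model name explicitly.
# (Model identifier assumed from OpenAI's naming; check `aider --help`
# and OpenAI's model list for the current name.)
aider --model gpt-4-turbo-2024-04-09
```

This is a CLI usage sketch, not a recommendation; per aider’s own benchmarks above, the default model is the better choice for code editing.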
References:
GPT-4 Turbo with Vision is a step backwards for coding | aider