As many of us have already noticed, the latest GPT-4 model is a step backwards. Aider, an AI-powered command-line tool designed for pair programming, has noticed this too.

The recently released GPT-4 Turbo with Vision performs worse on aider’s coding benchmarks than previous GPT-4 models. It scores the lowest in code editing and is more prone to “lazy coding,” omitting necessary code and leaving comments instead.


Aider’s findings on the laziness of the latest GPT-4 model:

An example of “lazy coding” by GPT-4 Turbo:
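The snippet below is a hypothetical illustration of the pattern (not an actual model output from aider’s benchmark): asked to write out a complete function, a “lazy” model leaves a placeholder comment instead of the real logic, alongside what a complete answer would look like. The function names and data shape are invented for this sketch.

```python
# Hypothetical "lazy coding" response: the model acknowledges the task
# in a comment but never writes the actual implementation.
def sort_records(records):
    """Return records sorted by timestamp, newest first."""
    # ... sort the records by timestamp as before ...
    pass


# What a complete, non-lazy answer to the same request would look like.
def sort_records_complete(records):
    """Return records sorted by timestamp, newest first."""
    return sorted(records, key=lambda r: r["timestamp"], reverse=True)
```

The lazy version type-checks and even runs, which is what makes it pernicious: it silently returns `None` instead of doing the work.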


Takeaways:

  • GPT-4 Turbo with Vision scores only 62% on aider’s code editing benchmark, the lowest score among all GPT-4 models.
  • The new model is more prone to “lazy coding,” often omitting needed code and leaving comments instead.
  • It scores only 34% on aider’s refactoring benchmark, making it the laziest coder of all the GPT-4 models.
  • Although aider fully supports GPT-4 Turbo with Vision, the tool will continue to use gpt-4-1106-preview by default, as it remains the strongest coder of the GPT-4 models.

References:

GPT-4 Turbo with Vision is a step backwards for coding | aider