As many of us have already noticed, the latest GPT-4 model is a step backwards. Aider, an AI-powered command-line tool designed for pair programming, has noticed this too.

The recently released GPT-4 Turbo with Vision performs worse on aider’s coding benchmarks than previous GPT-4 models. It scores the lowest in code editing and is more prone to “lazy coding,” omitting necessary code and leaving comments instead.


Aider’s findings on the laziness of the latest GPT-4 model:

An example of “lazy coding” by GPT-4 Turbo:
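The snippet below is a hypothetical illustration of the pattern (not an actual model output from aider’s benchmark): asked to write out a complete function, a “lazy” model leaves a placeholder comment instead of the real logic, alongside what a complete answer would look like. The function names and data shape are invented for this sketch.

```python
# Hypothetical "lazy coding" response: the model acknowledges the task
# in a comment but never writes the actual implementation.
def sort_records(records):
    """Return records sorted by timestamp, newest first."""
    # ... sort the records by timestamp as before ...
    pass


# What a complete, non-lazy answer to the same request would look like.
def sort_records_complete(records):
    """Return records sorted by timestamp, newest first."""
    return sorted(records, key=lambda r: r["timestamp"], reverse=True)
```

The lazy version type-checks and even runs, which is what makes it pernicious: it silently returns `None` instead of doing the work.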


Takeaways:

  • GPT-4 Turbo with Vision scores only 62% on aider’s code editing benchmark, the lowest score among all GPT-4 models.
  • The new model is more prone to “lazy coding,” often omitting needed code and leaving comments instead.
  • It scores only 34% on aider’s refactoring benchmark, making it the laziest coder of all the GPT-4 models.
  • Although aider fully supports GPT-4 Turbo with Vision, the tool will continue to use gpt-4-1106-preview by default, as it remains the strongest coder of the GPT-4 models.

References:

GPT-4 Turbo with Vision is a step backwards for coding | aider