A Statistically Significant Test Proves That OpenAI’s GPT-4 Turbo is Particularly Lazy Over the Winter Breaks

Rohail Saleem

This is not investment advice. The author has no position in any of the stocks mentioned. Wccftech.com has a disclosure and ethics policy.

Don't ask OpenAI's most cutting-edge Large Language Model (LLM), GPT-4 Turbo, to perform exhaustive tasks over the winter holidays. That's the conclusion suggested by a recent test, conducted by an LLM enthusiast, that returned a statistically significant result.

OpenAI claims that GPT-4 Turbo can handle highly complicated tasks packed into a single prompt, courtesy of its much more exhaustive training. The model can also process 128,000 tokens thanks to its expanded context window, which is the amount of text, input and output combined, that an LLM can consider at once. As a refresher, 1,000 tokens are roughly equivalent to 750 words. This means that OpenAI's latest offering is capable of processing an input of around 96,000 words.
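For readers who want to check that conversion, here is a minimal sketch of the arithmetic. It assumes only the rule of thumb quoted above (1,000 tokens to roughly 750 words); the function and constant names are made up for illustration and are not part of any OpenAI tooling. Actual token counts vary with the text and the tokenizer.

```python
# Rule of thumb from the article: 1,000 tokens are roughly 750 words.
# This is an approximation, not an exact property of any tokenizer.
WORDS_PER_1000_TOKENS = 750


def tokens_to_words(tokens: int) -> int:
    """Estimate a word count from a token count using the rule of thumb."""
    return tokens * WORDS_PER_1000_TOKENS // 1000


context_window_tokens = 128_000  # GPT-4 Turbo context window, per the article
print(tokens_to_words(context_window_tokens))  # 96000
```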


Recently, Rob Lynch, an LLM enthusiast, put GPT-4 Turbo through its proverbial paces. To his utter surprise, the LLM produced shorter responses when it was led to believe that the current month was December than when it was led to believe that it was May.

Specifically, Lynch was able to obtain an average output of 4,298 tokens over 477 test runs from GPT-4 Turbo when it was prompted to believe that the current month was May. For December, the LLM gave a significantly shorter mean output of 4,086 tokens, equating to a decrease in productivity of around 5 percent.
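The "around 5 percent" figure can be reproduced from the two averages quoted above. The snippet below is only a back-of-the-envelope check using those reported means. It is not Lynch's test code, and it says nothing about statistical significance, which would require the individual run lengths. The helper name is illustrative.

```python
# Mean output lengths reported in the article, in tokens.
may_mean_tokens = 4298
december_mean_tokens = 4086


def percent_decrease(baseline: float, observed: float) -> float:
    """Return how far `observed` falls below `baseline`, as a percentage."""
    return (baseline - observed) / baseline * 100


drop = percent_decrease(may_mean_tokens, december_mean_tokens)
print(f"{drop:.2f}%")  # 4.93%
```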

Offering a likely explanation for this discrepancy, Ethan Mollick, a professor at Wharton, believes that GPT-4 Turbo learned from the human tendency to do less work in holiday-heavy December. This also suggests that LLMs, despite exhaustive efforts to keep harmful human biases out, remain susceptible to inheriting some of the quirkier human shortcomings from their training data.

This development comes on the heels of earlier reports suggesting that OpenAI's GPT model was becoming progressively lazier, resorting to shortcuts instead of giving complete answers to queries. Some anecdotes suggest that users have been pretending to have a disability to coax complete answers out of the LLM! The situation is apparently dire enough to prompt OpenAI to try to come up with a hotfix.


About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words, and I have been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto, Rotman Commerce Program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
