A Statistically Significant Test Proves That OpenAI’s GPT-4 Turbo is Particularly Lazy Over the Winter Breaks

Dec 12, 2023 at 10:35am EST

Don't ask OpenAI's most cutting-edge Large Language Model (LLM), GPT-4 Turbo, to perform exhaustive tasks over the winter holidays. That is the conclusion suggested by a recent, statistically significant test conducted by an LLM enthusiast.

OpenAI claims that GPT-4 Turbo is capable of handling highly complicated tasks encased within a single prompt, courtesy of its much more exhaustive training. The model also has an expanded context window of 128,000 tokens, which is the maximum amount of text, prompt and response combined, that the LLM can take into account at once. As a refresher, 1,000 tokens are roughly equivalent to 750 words. This means that OpenAI's latest offering is capable of processing around 96,000 words.
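For readers who want to check the arithmetic, here is a minimal sketch of the token-to-word conversion. It assumes the rough 750-words-per-1,000-tokens rule of thumb quoted above; the function name `tokens_to_words` is purely illustrative and not part of any OpenAI tooling.

```python
# Rule of thumb from the article: 1,000 tokens are roughly 750 words.
# This is an approximation, not an exact tokenizer property.
WORDS_PER_1000_TOKENS = 750


def tokens_to_words(tokens: int) -> int:
    """Estimate the word count for a given number of tokens."""
    return tokens * WORDS_PER_1000_TOKENS // 1000


if __name__ == "__main__":
    context_window = 128_000  # GPT-4 Turbo context window, in tokens
    print(tokens_to_words(context_window))  # 96000
```

The actual ratio varies with language and writing style, so the result should be read as a ballpark figure.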

@ChatGPTapp @OpenAI @tszzl @emollick @voooooogel Wild result. gpt-4-turbo over the API produces (statistically significant) shorter completions when it "thinks" its December vs. when it thinks its May (as determined by the date in the system prompt).

I took the same exact prompt… pic.twitter.com/mA7sqZUA0r

— Rob Lynch (@RobLynch99) December 11, 2023

Recently, Rob Lynch, an LLM enthusiast, put GPT-4 Turbo through its proverbial paces. To his utter surprise, the LLM produced shorter responses when it was prompted to believe that the current month was December than when it was prompted to believe that it was May.

Specifically, Lynch obtained an average output of 4,298 tokens over 477 test runs from GPT-4 Turbo when it was prompted to believe that the current month was May. For December, the LLM gave a significantly shorter mean output of 4,086 tokens, equating to a decrease in output length of around 5 percent.
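The "around 5 percent" figure can be reproduced from the two means reported above. The snippet below is only a sketch of that calculation. It does not rerun Lynch's test or assess its statistical significance, and the helper name `percent_decrease` is an illustrative choice.

```python
def percent_decrease(baseline: float, observed: float) -> float:
    """Return how much smaller `observed` is than `baseline`, in percent."""
    return (baseline - observed) / baseline * 100


if __name__ == "__main__":
    may_mean = 4298       # mean output when the model is told it is May
    december_mean = 4086  # mean output when the model is told it is December
    drop = percent_decrease(may_mean, december_mean)
    print(f"{drop:.1f}%")  # 4.9%
```

A difference of 212 on a baseline of 4,298 works out to just under 5 percent, which matches the rounded figure in the text.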

OMG, the AI Winter Break Hypothesis may actually be true?

There was some idle speculation that GPT-4 might perform worse in December because it "learned" to do less work over the holidays.

Here is a statistically significant test showing that this may be true. LLMs are weird.🎅 https://t.co/mtCY3lmLFF

— Ethan Mollick (@emollick) December 11, 2023

Commenting on the likely cause behind this discrepancy, Ethan Mollick, a professor at Wharton, suggests that GPT-4 Turbo may have learned from the human tendency to do less work in holiday-heavy December. If so, it would mean that these LLMs, despite exhaustive efforts to keep out harmful human biases, remain susceptible to inheriting some of the quirkier human shortcomings from their training data.

This development comes on the heels of reports suggesting that OpenAI's GPT model was becoming progressively lazier, resorting to shortcuts instead of giving complete answers to queries. Some anecdotes suggest that users have been pretending to be handicapped to eke out complete answers from the LLM! The situation is apparently dire enough to prompt OpenAI to try to come up with a hotfix.

About the author: Writing is my one incontrovertible passion. Over the past six years, I have authored over 2,200 distinct articles on financial and tech-related topics, spanning nearly 1 million words. I have also been a member of the Wccftech mobile team since 2025. As an alumnus of the University of Toronto, Rotman Commerce Program, I bring nuance, in-depth knowledge, and a unique perspective to every topic that I cover. When I'm not writing, I'm traveling the world, exploring hidden confectioneries and restaurants as an aspiring food connoisseur.
