ChatGPT can't reliably add 61 numbers in random order, giving 12 different wrong answers despite showing convincing fake work. The AI optimizes for speed over accuracy

ChatGPT Is Gaslighting You With Math

ChatGPT can write efficient Python code and draft complex SQL queries in seconds. Heck, it’ll brainstorm your entire marketing campaign if you let it. We’re rushing to trust this “genius,” latest models and all, with our legal documents and the code that runs our business operations.

As a Business Intelligence Analyst with over a decade of experience at companies like Amazon and Microsoft, I was curious. I’ve watched this technology go from a toy to a tool that many claim is ready to replace me. But while everyone tests AI with harder problems, I decided to do the opposite.

The results point to what’s really going on: a cost-versus-performance tradeoff built into the design of the entire system. It’s also a transparency problem, and the gap is big enough to make any data professional skeptical. While AI is great at writing code, in its default mode it can’t be trusted with the most basic building block of my job: simple arithmetic.

And the scariest part? It’s confidently wrong.

The math problem that revealed the trick

My test was a simple list of all the numbers from -3000 to 3000 (at intervals of 100), which came to 61 numbers total. The list was specifically designed so the final, correct answer would be a simple “zero.” This way, I’d know instantly if it was right.
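For readers who want to reproduce the setup, the list can be rebuilt in a few lines of Python. This is a minimal sketch of the list described above, not the exact prompt used in the test; the variable names are illustrative.

```python
# Every multiple of 100 from -3000 to 3000, inclusive.
# range() excludes its stop value, so the stop is 3001 rather than 3000.
numbers = list(range(-3000, 3001, 100))

count = len(numbers)  # how many values are in the list
total = sum(numbers)  # exact integer sum

print(count, total)  # prints: 61 0
```

Every positive value has a matching negative, and the one unpaired value is 0 itself, which is why the total is guaranteed to be zero.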

Here’s where the “shortcut” behavior became clear. When I gave it the list sorted in ascending or descending order, it aced the test every time. It correctly recognized the simple pattern: -3000 cancels 3000, -2900 cancels 2900, and so on. This, as it turns out, tested pattern matching, which is a core AI strength.

But what happens when you remove the pattern? I put those exact same numbers into a random order. This broke the simple shortcut and forced the AI to actually calculate.
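The difference between the two versions of the test can be sketched in Python. In sorted order every value pairs off with its mirror image; after a shuffle the pairing is no longer visible, yet the sum cannot change because addition does not depend on order. The seed value below is an arbitrary assumption for reproducibility, and the shuffled order is not the one used in the original test.

```python
import random

numbers = list(range(-3000, 3001, 100))

# Sorted order: the first value cancels the last, the second cancels
# the second-to-last, and so on. The middle value (0) is left unpaired.
ordered = sorted(numbers)
pair_sums = [ordered[i] + ordered[-1 - i] for i in range(len(ordered) // 2)]

# Random order: copy the list and shuffle it with a fixed, arbitrary seed.
rng = random.Random(42)
shuffled = numbers[:]
rng.shuffle(shuffled)

# With no visible pattern, the only route is to add every value in turn.
running_total = 0
for value in shuffled:
    running_total += value

print(set(pair_sums), running_total)  # prints: {0} 0
```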

It failed, and not by a small margin.

The “performative” failure

The AI didn’t calculate. It put on a performance of calculation.

It’s like watching Noah Wyle on The Pitt or ER: he’s a convincing doctor, but you should never trust him to perform an actual medical procedure on you. In the same way, the AI goes through the motions of calculation. It replies with a “step-by-step” breakdown that looks perfectly logical, but the final answer is confidently and fundamentally wrong.

When I challenged it, it “double-checked” and got a different wrong answer.

The failure was more fundamental than I first thought. The process involved extensive hand-holding, like asking it to break the list into small groups of 10 numbers. It still got the math wrong on each of those simple, 10-integer sums. Even after I gave it the correct answers for each group, it still failed to correctly add the six group subtotals.
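The group-then-total approach is easy to check with code. The sketch below is an illustration under stated assumptions: `chunk` is a hypothetical helper, and the list is used in sorted order because the exact random order and grouping from the test are not given. Note that 61 values split into groups of 10 yield six full groups plus one leftover value.

```python
def chunk(values, size):
    """Split values into consecutive groups of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


numbers = list(range(-3000, 3001, 100))

groups = chunk(numbers, 10)               # six groups of 10, one group of 1
subtotals = [sum(group) for group in groups]
grand_total = sum(subtotals)              # adding the subtotals gives the full sum

print(len(groups), grand_total)  # prints: 7 0
```

Whatever the grouping, the subtotals must add up to the same grand total as the full list, which makes this a quick consistency check on any step-by-step answer.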

In all, I got 12 different incorrect answers from this back-and-forth, all from the same prompt.

This wasn’t a failure to handle a “complex” list. It was a complete failure to perform basic addition. In the end, it apologized for its “mental calculation errors,” a priceless admission that it was simulating calculation rather than doing it.

This shortcut-first design is the same reason the AI has historically flubbed other simple tasks, like counting letters in a word. Its main goal is finding the cheapest shortcut, not necessarily the correct answer.

Then I found the hidden calculator

So, is ChatGPT just hopelessly bad at math? Not exactly. I went back to that very first wrong answer and, instead of replying, I clicked the “Think Longer” option.

Instantly, this happened:

    # Define the list of numbers
    numbers = [
        2100, 800, -1800, 2400, 1000, -2400, 1200, ...
    ]

    # Calculate the sum
    total_sum = sum(numbers)
    total_sum

    # Result
    0

It got the correct answer (zero) in one second. This is the “gotcha.” It had the calculator all along. It was just choosing not to use it.

What’s actually happening here?

This isn’t a bug. What we’re seeing is a design tradeoff. ChatGPT is optimized for speed first and accuracy second in its default mode.

The default, fast path is text-only. It attempts to solve the problem by relying on its vast training data, which includes learned arithmetic patterns. For simple sums (like 2+2), this is reliable. But for tasks that require real precision across many steps, like our 61-number list, it fails. It’s like a person trying to multiply large numbers in their head: they understand the concept of math, but quickly lose track of the intermediate steps and “carries.” This text-only approach is very fast and, more than that, it’s cheap to run.

The thinking path (using the Python calculator) is perfectly accurate, but it’s expensive to run. Don’t get me wrong, this “slow” path still returned an answer in a split second. But for OpenAI, the system cost is enormous. Instead of just predicting the next word, the AI has to do a lot more work: it must spin up a secure Python interpreter, write the code, execute it, and then read the output.

This is the cost-performance tradeoff in action. It’s the why behind everything we just saw.

Here’s where it gets personal

Look, here’s the real risk for data professionals. We assume the AI is analyzing. It’s not. It’s just simulating what the analysis should look like.

Now, I know what the response will be: this is all part of the new “dynamic” system. OpenAI’s own press releases boast about this, calling it a “real-time router” that “quickly decides” whether to respond quickly or to “think longer” for hard problems. This sounds great on paper. But my test shows this so-called smart router can fail on seemingly simple tasks that lack obvious patterns.

When I gave it a 61-number math problem in random order, its internal logic misjudged the difficulty. It seemed to think this was a simple task it could crush. This tells me the router’s heuristics aren’t tuned to catch this kind of “deceptively simple” problem. It’s probably just looking at query length or whether there are math symbols. So, instead of correctly identifying this as a “hard problem” and automatically engaging its thinking model, it chose the fast, text-only path and proceeded to fail.

That’s the career risk right there. Imagine asking the AI to “check the subtotals on this expense report.” It replies with a confident, text-only “Looks correct!” You pass that report to an executive, who quickly does some mental math and realizes your calculations are wrong.

In that moment, you’ve damaged your credibility by relying on a tool for a task it wasn’t designed to handle reliably. The AI’s failure was that it simulated the act of checking instead of actually calculating. And you, the professional, are the one left holding the bag.
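The expense-report scenario above is the kind of check that takes only a few lines to do deterministically. The following is a minimal sketch, assuming a made-up report structure: the categories, amounts, and the `check_subtotals` helper are all hypothetical, and one claimed subtotal is deliberately wrong. It uses Python's `decimal` module so that currency values add exactly.

```python
from decimal import Decimal

# Hypothetical report: line items and the subtotal someone claimed for each.
report = {
    "Travel": {"items": ["412.50", "89.99", "1240.00"], "claimed": "1742.49"},
    "Meals": {"items": ["34.20", "58.75", "21.10"], "claimed": "114.05"},
    "Software": {"items": ["299.00", "49.99"], "claimed": "358.99"},  # wrong on purpose
}


def check_subtotals(report):
    """Return {category: (claimed, computed)} for every subtotal that is off."""
    mismatches = {}
    for category, entry in report.items():
        computed = sum((Decimal(x) for x in entry["items"]), Decimal("0"))
        claimed = Decimal(entry["claimed"])
        if computed != claimed:
            mismatches[category] = (claimed, computed)
    return mismatches


mismatches = check_subtotals(report)
for category, (claimed, computed) in mismatches.items():
    print(f"{category}: claimed {claimed}, computed {computed}")
# prints: Software: claimed 358.99, computed 348.99
```

A check like this either passes or names the exact line that is off, which is a very different guarantee from a text-only “Looks correct!”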

When AI math actually works

So, it’s important to know when AI is reliable for these tasks. My test was designed to hit a specific vulnerability. AI math is generally trustworthy in a few other places:

  • When it explicitly uses its code interpreter (like the “Think Longer” path).
  • For simple, in-context arithmetic (e.g., “I have 3 apples and buy 2 more, how many do I have?”).
  • For symbolic math and explaining concepts (e.g., “Explain the Pythagorean theorem”).

The risk, as my test shows, is not knowing which mode the AI is in.

What you should actually do about this

This doesn’t mean “don’t use AI.” It means we need to use it like a pro, not a novice. We have to treat it like the “shortcut engine” it is. So here’s my practical guide for data analysts based on my findings:

Look, if you remember nothing else: if it shows you code that it actually ran, you can generally trust the result. If it just talks at you, be skeptical.

Use “Think Longer” for any “right-or-wrong” answer. Don’t wait for it to fail first.

Use the right tool for the job. For straightforward arithmetic, use Excel. It’s built for it and is infinitely more reliable. Why make a “creative writing” engine do a calculator’s job? However, for generating an analytical workflow or cleaning data before the calculation, using the AI with its code execution on is a powerful and genuinely useful way to get work done.
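As an illustration of the “clean first, then calculate” workflow, here is a small sketch of the kind of code worth asking for and then running. Everything in it is an assumption for the example: the `parse_amount` helper is hypothetical, the input strings are invented, and treating parentheses as negative numbers follows a common accounting convention rather than anything in the original test.

```python
def parse_amount(text):
    """Convert strings like ' 800', '2,100' or '(1,800)' to an int.

    Parentheses are read as a negative amount (accounting convention).
    """
    cleaned = text.strip().replace(",", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = int(cleaned)
    return -value if negative else value


# Invented, messy input of the sort that gets pasted out of a spreadsheet.
raw = ["2,100", " 800", "(1,800)", "2400", "-2400", "(1,100)"]

amounts = [parse_amount(item) for item in raw]
total = sum(amounts)

print(amounts, total)
# prints: [2100, 800, -1800, 2400, -2400, -1100] 0
```

The cleaning rules are where judgment is needed; once the values are real numbers, the sum itself is a job for the interpreter, not for text prediction.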

This all comes down to transparency. The AI isn’t flawed because it failed the math; it’s flawed because it hid the failure behind a mask of confidence. It has a perfectly good calculator in its back pocket but defaults to the fast, unreliable method without telling you which one it’s using. As data professionals, that’s just not a foundation we can build on. Look, until these systems tell us how they’re getting an answer, the rule is simple: if it’s not in a code block, it’s not an answer. It’s just a performance.
