This study draws attention to a crucial tension in artificial intelligence: the gap between the theoretical expressiveness of Transformer-like models and their ability to reason in practice. The paper shows that deep neural networks and Transformers consistently fail on basic, structured reasoning tasks, despite being formally Turing-complete, meaning they can in principle simulate any Turing machine and thus solve any efficiently computable problem with polynomial resources. A "height comparison" task serves as an example, in which the model must combine multiple given facts to infer a relationship.

The Basic Reasoning Test That Separates Real Intelligence from AI


Abstract and 1. Introduction

1.1 Syllogisms composition

1.2 Hardness of long compositions

1.3 Hardness of global reasoning

1.4 Our contributions

  2. Results on the local reasoning barrier

    2.1 Defining locality and auto-regressive locality

    2.2 Transformers require low locality: formal results

    2.3 Agnostic scratchpads cannot break the locality

  3. Scratchpads to break the locality

    3.1 Educated scratchpad

    3.2 Inductive Scratchpads

  4. Conclusion, Acknowledgments, and References

A. Further related literature

B. Additional experiments

C. Experiment and implementation details

D. Proof of Theorem 1

E. Comment on Lemma 1

F. Discussion on circuit complexity connections

G. More experiments with ChatGPT

F Discussion on circuit complexity connections

On the other hand, with an appropriate setup, deep neural nets, recurrent neural nets, and Transformers with scratchpads are Turing complete. Furthermore, they can simulate a Turing machine using resources polynomial in the number of steps the Turing machine runs for and in the input length. So, with appropriate parameters, these models can efficiently solve any problem that can be solved efficiently at all. A little more precisely, given a neural net whose input bits are 0 or 1, it is fairly easy to set a neuron to compute an AND, OR, or NOT of one or more previous values, so any circuit can be converted into a neural net of at most equal size. Any efficient computation can be performed by a polynomial-sized circuit, so it can also be performed by a polynomial-sized deep neural net. Also, given a Turing machine in a state where all entries in its tape that are more than n steps away from the head or heads are in their initial state, there is a circuit of depth O(1) and size O(n) that computes the next state of the Turing machine. That means that running a Turing machine for T steps on an input of length n can be simulated by a recurrent neural net of size O(T + n) with T recurrences. Conversely, given a neural net with a reasonable activation function and sub-exponential edge weights, one can estimate the output of each neuron to within an exponentially small error in time polynomial in the size of the net.
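To make the circuit-to-network direction concrete, here is a minimal sketch (our own illustration, not code from the paper), assuming 0/1 inputs and a Heaviside threshold activation; the helper names are hypothetical:

```python
import numpy as np

def step(z):
    # Heaviside threshold activation: 1 if z > 0, else 0
    return float(z > 0)

def neuron(weights, bias, inputs):
    # A single neuron: thresholded affine function of its inputs
    return step(np.dot(weights, inputs) + bias)

def AND(bits):
    # Fires only if all k inputs are 1: sum(bits) > k - 0.5
    k = len(bits)
    return neuron(np.ones(k), -(k - 0.5), np.asarray(bits, dtype=float))

def OR(bits):
    # Fires if any input is 1: sum(bits) > 0.5
    k = len(bits)
    return neuron(np.ones(k), -0.5, np.asarray(bits, dtype=float))

def NOT(bit):
    # Fires only if the single input is 0: -bit + 0.5 > 0
    return neuron(np.array([-1.0]), 0.5, np.array([bit], dtype=float))

# The circuit (x1 AND x2) OR (NOT x3), realized with one neuron per gate:
x1, x2, x3 = 1, 0, 0
print(OR([AND([x1, x2]), NOT(x3)]))  # prints 1.0
```

Since every gate maps to exactly one neuron, a circuit of size s yields a network of size at most s, which is the "at most equal size" claim above.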


G More experiments with ChatGPT

Height comparison. For n ≥ 1, we consider 3n + 2 people of different heights. We give the model the 3n + 1 pairwise relations between consecutive people (in order of height), presented in a random order. From this information, one can recover the full height ordering by chaining the given relations. We then ask the model about the relation between person n + 1 and person 2n + 2 in that ordering. An example for n = 1 is

“Omar is taller than Sara. Vlad is taller than David. Farah is taller than Omar. Sara is taller than Vlad. Is Omar taller than Vlad?”

where the answer is true. Note that to answer this question correctly, one has to combine at least n + 1 relations; thus, the locality of the task is always larger than n. (The exact locality depends on the tokenization.) We found that ChatGPT (GPT-3.5) fails at this task even for n = 1, the simplest case. When working with GPT-3.5, we used the following prompt so that the model could use chain-of-thought reasoning: "You can reason if you want but make sure to include yes/no in your answer." Interestingly, GPT-4 performs much better than GPT-3.5. We also observed that when GPT-4 answers the question correctly, it often orders the people by height, much as we do in the scratchpad of the graph task. Motivated by this, we tested one more setting in which we prompted GPT-4 with "Answer only with a yes or no." to suppress chain-of-thought reasoning. In this case, as expected, the model could not solve the height comparison task for n > 1. The results are shown in Figure 11.
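For readers who want to reproduce this setup, here is a minimal sketch of how such instances could be generated (our own reconstruction from the description above; the function name and the random swap of the query order are assumptions):

```python
import random

def height_task(n, names):
    # Sample 3n + 2 distinct people; `people` lists them tallest first.
    people = random.sample(names, 3 * n + 2)
    # The 3n + 1 relations between consecutive people, in random order.
    facts = [f"{people[i]} is taller than {people[i + 1]}."
             for i in range(3 * n + 1)]
    random.shuffle(facts)
    # Query persons n + 1 and 2n + 2 in the height ordering (1-indexed),
    # so at least n + 1 relations must be chained to answer.
    a, b = people[n], people[2 * n + 1]
    if random.random() < 0.5:  # randomize which name is asked about first
        a, b = b, a
    answer = people.index(a) < people.index(b)  # earlier index = taller
    prompt = " ".join(facts) + f" Is {a} taller than {b}?"
    return prompt, answer

names = ["Omar", "Sara", "Vlad", "David", "Farah", "Lina", "Noah", "Maya"]
print(height_task(1, names))
```

Because the queried order is randomly swapped, the correct answer is "yes" about half the time, matching the random baseline referenced in Figure 11.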

Figure 11: For complexity n, we have 3n + 2 people, and there are n people between the two names we query (see the example above). We found that ChatGPT (GPT-3.5) can hardly go beyond the random baseline on this task, even for n = 1, while GPT-4 performs much better. However, when GPT-4 does not use chain-of-thought reasoning, its performance is near random for n > 1. We used 1000 examples for each value of n.


:::info Authors:

(1) Emmanuel Abbe, Apple and EPFL;

(2) Samy Bengio, Apple;

(3) Aryo Lotfi, EPFL;

(4) Colin Sandon, EPFL;

(5) Omid Saremi, Apple.

:::


:::info This paper is available on arXiv under a CC BY 4.0 license.

:::
