Stop Pretending Your AI Can Think

Why Large Language Models Aren't What You Think They Are

AI Insights

#ai

#tech

#ml


John A. De Goes

We are living in extraordinary times.

You can literally describe any scene your mind can imagine, and ask AI to generate a photorealistic picture or even a video clip.

AI can write a story or article about anything, generate any photo or video, and even produce code in any programming language that actually runs... sometimes.

Yet, for all these impressive feats of silicon, AIs can get stumped on problem-solving tasks that any intelligent schoolchild could solve.

Moreover, it’s hard to imagine that, if you asked AI to design a rocket, or to construct a 10-million-line-of-code (LOC) software application, the result would be anything short of disastrous.

The distance between what AI can do, and what true artificial intelligence could do, remains unimaginably vast, and innovations such as OpenAI’s o3 barely move the needle.

Now, to some, closing the distance between where we are now and so-called “Artificial General Intelligence” is merely a matter of more data and more tuning.

Very soon, the True Believers tell us, the days of low-cost and superhuman intelligence will be upon us, without any new breakthroughs required.

I’m here to tell you: the True Believers are all wrong. New approaches are needed to usher in the truly revolutionary age of AI that startups have overpromised and oversold to VCs.

To see why, we have to get a little messy exploring how AI works and what its limitations are.

Large Language Models

What most True Believers don’t realize is that the conceptual basis for much of what is being called “AI” today is the so-called Large Language Model (LLM) and its derivatives–the foundation of today’s AI revolution.

There are many ways to view LLMs, but a simple and fairly accurate way is as statistical auto-complete. Given a linear sequence of tokens (e.g. fragments of words), an LLM can suggest a variety of highly-probable completions based on its training data.

Now, while this is a good model of how LLMs work, it doesn’t accurately convey the power of the approach, nor the range of useful applications.

To speak to the power, an LLM can be built from all human data ever created, which gives it a crude but effective form of “insight” into acquired human knowledge, information, wisdom, and illustrated reasoning.

To speak to the use cases, one need only look at the stunning range of AI-based applications built on LLMs–including generative AI and agents–which employ their own tricks, layered on top of LLMs, to achieve autonomous and general-purpose automation.

Behind the Curtains

How can “statistical autocomplete” be responsible for such power and applications?

Pondering this question has led many a True Believer to assume that, deep within the layers of neural networks that make up a large language model, there must be something more than just tables of probabilities–something “alive”.

Yet, LLMs can be functionally modeled as a flat table mapping prior input tokens to a predicted successor token–albeit an extremely large table. To see how this is possible, let’s take a very simple case: a model with a 1-token context. For simplicity, let’s assume that tokens are words (which is incorrect, but makes the example easier to comprehend).

In this case, we could make a frequency table like so:

Previous word | Next word | Count

To fill in this table, we look at all text ever written, and for every word we find, we look at the next word, and increment the total count of this specific two-word pair in the table.

After feeding our simple algorithm all text, we’ll end up with a table that looks like this:

Previous word | Next word | Count
the           | cat       | 23929321
the           | dog       | 99134989

With this fake data, we can see that it is more common for the word “dog” to follow the word “the” than the word “cat”. So if we were to leverage this information, and our prompt were “the”, we would predict the next word to be “dog” over “cat” (of course, in the full table, there are probably many other words more likely to follow the word “the”).

By sorting the table by probability, you could find the most likely word to follow any word. If you want, you could also introduce some random variation in making your predictions–choosing not just the most likely word, but some highly likely word at random.

Now, although very simple, you can imagine that such a model would be useful in a word-processor application. To make it more useful, you can repeat the process with a 2-token context. To make it even more useful, repeat the process with a context of thousands, or even millions, of tokens.

If you could actually build such a massive table and then efficiently query it, then you would start to see behavior similar to large language models–and suddenly, with nothing more than tokens and counts and a query language like SQL, you might be able to convince the True Believers that the Terminator doomsday scenario is just around the corner.
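To make the toy version concrete, here is a minimal sketch in Python of the 1-token-context model described above: it counts word pairs from a tiny stand-in corpus and then samples a likely next word. The corpus and function names are illustrative only; this is the thought experiment, not how production LLMs are actually built.

```python
import random
from collections import Counter, defaultdict

# A tiny stand-in for "all text ever written".
corpus = "the dog chased the cat and the dog barked at the cat".split()

# Build the (previous word -> next word -> count) table described above.
table = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev][nxt] += 1

def predict(prev_word, top_k=3):
    """Pick a likely next word at random among the top_k most frequent successors."""
    candidates = table[prev_word].most_common(top_k)
    if not candidates:
        return None  # the previous word never appeared in the training text
    words, counts = zip(*candidates)
    return random.choices(words, weights=counts, k=1)[0]

print(dict(table["the"]))   # {'dog': 2, 'cat': 2}
print(predict("the"))       # 'dog' or 'cat', weighted by how often each follows "the"
```

Scaling the same idea up to longer contexts and to all human text is exactly the “massive table” scenario just described.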

Now, while a true LLM can be faithfully (if impractically) modeled with a table, the way this “table” is built is much more sophisticated, and involves “lossy compression”. The lossy compression both supports efficient querying, and, importantly, “forgets” enough information to enable generalization. Through generalization, an LLM can predict the next token in a sequence of tokens that does not occur in the training data.
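A crude analogy for how “forgetting” enables generalization comes from classical n-gram language modeling: backoff smoothing. This is not how neural networks compress their training data, but it shows the principle that discarding detail lets a model respond to inputs it has never seen:

```python
from collections import Counter, defaultdict

corpus = "the dog chased the cat and the dog barked at the cat".split()

pair_counts = defaultdict(Counter)   # the exact (previous word -> next word) table
word_counts = Counter(corpus)        # a lossier summary: word frequencies alone
for prev, nxt in zip(corpus, corpus[1:]):
    pair_counts[prev][nxt] += 1

def predict(prev_word):
    # Use the exact table when the context was seen in training...
    if pair_counts[prev_word]:
        return pair_counts[prev_word].most_common(1)[0][0]
    # ...but "back off" to the coarser statistics when it was not.
    return word_counts.most_common(1)[0][0]

print(predict("the"))      # a word that actually followed "the" in the corpus
print(predict("zebra"))    # never seen in training, yet we still get a prediction: "the"
```

Real LLMs generalize in a far richer way, but the underlying trade is the same: give up exact recall to gain coverage of the unseen.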

If all LLMs can be represented by a giant static table–essentially a function that takes n inputs and produces a prediction–then what does this say about the nature of LLMs?

No De Novo Reasoning

It is my contention that LLMs are incapable of de novo reasoning–and that, moreover, any apparent “reasoning” seen in LLMs is latent or reflective intelligence, originating not from the LLMs, but from the human-authored training data.

Beyond just the fact that they can be modeled as a total function on a finite domain (and are therefore not Turing complete without hacks), there is much evidence this hypothesis is correct, ranging from the autoregressive nature of LLMs (a driving factor in hallucinations) to the extreme sensitivity of LLMs to the specific way a problem is phrased or a prompt is written.

Indeed, a simple thought experiment should convince anyone but a True Believer: if we trained LLMs only on human text that demonstrates fallacious and scatterbrained reasoning, then, when we asked our LLM to solve problems, we should expect to see only fallacious reasoning and unrelated conclusions (and that is exactly what we would see).

LLMs cannot reason, and no amount of data can fix this problem–although there are many tricks to enhance reflective intelligence (such as chain-of-thought prompting).

If LLMs merely reflect human intelligence back to us, then what does this say about the limitations of LLMs? Given the various hacks that can improve such reflection, the limit of LLM “intelligence” is probably roughly on par with the intelligence of the smartest human, augmented with the sum total of human knowledge, but constrained to the domains where humans have already demonstrated reasoning (or domains isomorphic to them).

If we can achieve anywhere close to this limit, then obviously, “AI” will be a tremendous success–not because of how intelligent it is, per se, but because of the variety of tasks we can partially or completely automate using the technology, and how far we can augment human skills and knowledge with these abilities.

At the same time, however, the limitations of LLMs fall infinitely short of the promise of AI, which is to unleash reasoning that is exponentially better than human reasoning (for example, the ability to instantaneously find a grand unified theory, or understand and re-engineer human DNA to eliminate all diseases), and across domains quite dissimilar from those where humans have already demonstrated reasoning.

LLMs are missing something deep: “Artificial Intelligence” is missing the “intelligence”–at least, by any reasonable definition of intelligence.

Despite this, however, LLMs do possess at least one of the crucial ingredients of true intelligence: the ability to abstract.

Abstraction Galore

Although incapable of de novo reasoning, LLMs are extraordinarily adept at abstraction. Abstraction is the art of discarding the ways things are different, so we can clearly see the ways they are the same.

For example, we can look at different breeds of dogs and recognize that they are all dogs. We can note that both triangles and rectangles are simple geometric shapes. We can pick out the pieces of a human face, despite the wide variation of facial features.

Abstraction derives from the generalization inherent in machine learning. It’s closely connected to the “lossy compression” that is exhibited both by human brains, and by digital neural networks (forgetting is a feature–as well as a bug!). The process, both in humans and in machines, seems to require a massive amount of data, because it is only through exposure to such data that we can begin to extract signals from all the noise.

LLMs are so phenomenally skilled at abstraction that they can perform feats human brains struggle with: for example, ignoring the ways languages differ in order to see through to how they are the same (which is why LLMs are fantastic at translation).

Without abstraction, human brains would be overwhelmed by the billion bits per second generated by our sensory systems–more than 10,000 gigabytes per day! Thanks to abstraction, however, we can boil down all of that information to around 10 bits per second, which can then participate in reasoning processes.

Abstraction, then, is crucial for intelligence–and it should indeed be regarded as a component of intelligence. But by itself, abstraction is not intelligence.

Reasoning

To go from abstraction to real intelligence, we need reasoning–and I mean de novo reasoning, not reflective reasoning.

Let me be clear about the distinction: reflective reasoning is what LLMs do when they pattern-match against examples of human reasoning in their training data. De novo reasoning, in contrast, is the ability to construct novel solution paths through a problem space, even in domains where no human has gone before.

We see de novo reasoning everywhere in human intelligence: in how we best our competitors in business; how we play chess and do math; how we prepare, conduct, and learn from physics experiments that reveal to us the hidden structure of the world around us; how we thoughtfully engineer distributed cloud systems that give us reliability and massive scale.

Yet, stated at such a level, divorced from biology or computer science, it's difficult to see how this relates to LLMs or artificial intelligence. To see the relation, we need to be much more precise.

More precisely, then, reasoning is an optimization process capable of driving a system from one state to a desired target state through the application of a known set of steps, each of which may perturb the system in both desired and undesired ways.

For example, some people reading this post probably know how to solve the equation 2x^2 + 4x - 1 = 5 for the variable x. The equation as written is one state, and we want to drive it toward another state–one in which x is isolated–through the application of the laws of algebra.
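Concretely, the path from one state to the other looks like this: subtract 5 from both sides to get 2x^2 + 4x - 6 = 0, divide by 2 to get x^2 + 2x - 3 = 0, factor into (x + 3)(x - 1) = 0, and conclude that x = 1 or x = -3. Each step is a legal move that transforms one state of the equation into the next.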

Not all systems need be mathematical, of course. A coach helping his team win a game of soccer has a mental model of how each player behaves, and of how the team as a whole behaves under different circumstances. Then, equipped with knowledge of the game, the coach provides instructions aimed at helping that team win.

All reasoning is equivalent to graph search–albeit on what is, in most cases, an infinite graph. The nodes in this graph represent different states of the system, while the edges represent application of particular steps.

Reasoning, then, can be thought of as an attempt to find a path from one state to another state, by exploring pathways in a graph. Reasoning is very simple to implement for simple systems (for example, tic tac toe). For infinite graphs, however, the problem is insanely complex.
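To make the graph-search framing concrete, here is a minimal, illustrative breadth-first search over an abstract state space. The caller supplies the graph implicitly: a function listing the legal steps out of a state, and a predicate describing the target. The toy system (drive the number 0 to 12 using the steps “+3” and “*2”) is mine, chosen only to keep the sketch short:

```python
from collections import deque

def search(start, successors, is_goal):
    """Breadth-first search for a sequence of steps from `start` to a goal state."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for step, next_state in successors(state):
            if next_state not in seen:
                seen.add(next_state)
                frontier.append((next_state, path + [step]))
    return None  # the goal is unreachable from this start state

# Toy system: states are integers; the legal steps are "add 3" and "double".
steps = lambda n: [("+3", n + 3), ("*2", n * 2)]
print(search(0, steps, lambda n: n == 12))   # ['+3', '+3', '*2']
```

For a toy system like this, or for tic tac toe, the search terminates quickly; on the infinite graphs that real-world reasoning lives on, a blind frontier like this one explodes, which is precisely the difficulty described next.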

Challenges of Reasoning

Although one can describe the abstract process of reasoning simply (as graph search), implementing a system capable of any degree of “generic” reasoning is difficult.

There are two main challenges:

  1. Model. Reasoning requires a model of both the state of a system and of how valid interactions with the system affect that state. For example, in tic tac toe, we need a model of the game state (the 9 squares), and of how legally placing marks on different squares changes that state.
  2. Search. The model dictates both the topology of the graph and how traversing edges modifies the state of the system. However, it’s not enough to have a model: we need a computationally feasible process that enables us to intentionally direct the state of the system toward a desired outcome (which I call “search” here, even though any feasible implementation would not be based on “search” as most think of it). A sketch of both pieces, using tic tac toe, follows this list.
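Here is a sketch of what those two pieces look like for tic tac toe, under the unrealistic simplification that we only search for some line of play that ends with a win for X, ignoring what an adversarial opponent would actually do. The board representation and function names are mine, for illustration only:

```python
LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    """The model: which interactions are legal, and how each one changes the state."""
    if winner(board):
        return                          # game over: no further moves
    player = "X" if board.count("X") == board.count("O") else "O"
    for i, cell in enumerate(board):
        if cell == " ":
            yield i, board[:i] + (player,) + board[i+1:]

def find_win(board, path=()):
    """The search: depth-first search for any move sequence that ends with X winning."""
    if winner(board) == "X":
        return path
    for square, next_board in moves(board):
        result = find_win(next_board, path + (square,))
        if result is not None:
            return result
    return None

empty = (" ",) * 9
print(find_win(empty))   # a sequence of squares (0-8) ending in a win for X
```

The model fits in a few lines because the state space is tiny and fully known; the search is brute force because the graph is small. Neither property holds for the systems we actually care about reasoning over.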

Model formation from sensory input clearly requires a lot of abstraction, which is a task that LLMs excel at. Indeed, LLMs already have models embedded in them, in the very way their networks are organized–although it is clear that they are currently insufficiently multi-modal to compete with our own models (of the physical world, for example).

To bring reasoning to our “AI”, we need robust methods of model extraction from multi-modal training data sets (which we seem to have made progress on), and we need robust optimization processes capable of incrementally driving systems from one state to another.

At least for the latter, LLMs just aren’t going to cut it.

New Directions

One of the most promising lines of research today involves so-called energy-based models.

Energy-based models assign a scalar "energy" to each possible state of a system. The lower the energy, the more "stable" or "desirable" the state. In physics, this concept appears naturally: a ball rolling down a hill is moving from a high-energy state to a low-energy state. In machine learning, we can train models to assign appropriate energy levels to different states.

What makes energy-based models so promising is that they provide a natural framework for reasoning: the process of moving from one state to another becomes an exercise in energy minimization. Rather than searching through an infinite graph, we can follow the gradient of the energy function toward more desirable states.
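As a toy illustration of that idea (not any particular published energy-based model), suppose the "energy" of a one-dimensional state x is (x - 3)^2, so that the desirable states are those near x = 3. Gradient descent then drives an arbitrary starting state toward a low-energy one:

```python
def energy(x):
    # Lower energy = more desirable; the minimum sits at x = 3.
    return (x - 3.0) ** 2

def energy_gradient(x):
    # Derivative of (x - 3)^2 with respect to x.
    return 2.0 * (x - 3.0)

# Follow the energy downhill instead of blindly exploring a graph of states.
x = -10.0               # arbitrary starting state
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * energy_gradient(x)

print(round(x, 4), round(energy(x), 6))   # x ends up essentially at 3, with near-zero energy
```

In a real system the state is high-dimensional and the energy function is itself learned, but the appeal is the same: the gradient supplies a direction at every state, so we never have to enumerate the neighbors.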

But even with energy-based models, we're still far from true artificial intelligence. What we need is a synthesis of:

  • The abstraction capabilities of LLMs
  • The multi-modal modeling of modern neural networks
  • The optimization framework of energy-based models
  • And probably something we haven't discovered yet

The True Believers want you to think we're on the verge of artificial general intelligence. The reality is we're barely at the starting line. Our current AI systems are impressive mirrors of human intelligence, but they're not intelligent themselves. They can abstract, but they can't reason. They can reflect, but they can't think.

Until we solve these fundamental challenges, AI will remain what it is today: a powerful but fundamentally limited tool for augmenting human intelligence, not replacing it.

-

Ziverge is a proud sponsor of LambdaConf 2025 (May 11 - 12th), where you'll be able to learn from AI experts and build your own agents. Take advantage of our readership code here.
