On the highway towards Human-Level AI, Large Language Model is an off-ramp.
Yann LeCun1
The future depends on some graduate student who is deeply suspicious of everything I have said.
Geoffrey Hinton2
I hold that AI has gone astray by neglecting its essential objective --- the turning over of responsibility for the decision-making and organization of the AI system to the AI system itself.
Richard Sutton3
Lots of articles are predicting what AI will look like in the future. They presume that progress in AI will continue as it has in the past. What if they are wrong? With AI acting stupidly4 and development showing diminishing returns5, the future of Large Language Models (LLMs) and other foundation models is—at best—uncertain. Let's assume AI has reached a dead end and triggered yet another AI winter6. I want to explore what AI might look like when the AI winter thaws and we start over from square one. Even if another AI winter does not occur, it is still useful to ask whether there are better paths to human-level AI (also known as Artificial General Intelligence, or AGI).
An AI-generated image of an AI phoenix rising again.
Appearing Intelligent versus Being Intelligent
Have we focused too much on appearing intelligent? Honey bees, humans, and a lot of animals in between not only appear intelligent, but each has adapted to its unique, sensed environment: they are intelligent. Animals are existence proofs of actual intelligence.
Artificial intelligence appears intelligent without the mechanisms that support intelligence. A clever puppeteer can make a puppet appear intelligent, but that does not make the puppet itself intelligent. Pinocchio cannot hope to become a real boy until the strings to his puppet master are cut.
If we compare current AI with biological intelligence, we can see where AI has come up short and what a reboot might fix. Larger training sets and networks will not solve these shortcomings. We need more than scaling; we need structural change. The only way to get to AGI is to start over from square one.
Fatal Flaws
Each item listed below represents a fundamental, fatal flaw of AI: a capability that natural intelligence has and current AI lacks. I will address each in some detail.
Self-Guided Learning
Adaptation to Change
One-Shot, Feed-Forward Learning
Neural Plasticity
Size, Weight, and Power
Metacognition
Innovation
Reasoning
Emergence of Peircean Semiotics
1. Self-Guided Learning
AI learns no more than a book learns the words on its pages or than raw clay learns to assume the shape of a vase (see my Substack post AI Does Not Learn). A brain is not a container to be filled in the same way you fill your car with gas. Before understanding can occur, an intelligent student must actively engage. Knowledge is more than a list of facts to be memorized; it's a network of interconnected concepts that requires active organization. Using the word 'learn' in an AI context, as in machine learning, supervised learning, or even self-supervised learning, is anthropomorphic hogwash. Those are all examples of programming in which all intelligence derives from a human.
When a worker honey bee is 4 weeks old, she takes her first flight by flying in circles around her own nest so that she can recognize what her home looks like when she returns from a forage site miles distant. Nobody tells her what her home looks like; it could be a hole in a tree or a boxy hive. Worker bees and human infants learn most things experientially (pre-linguistically), without explicit instruction. I doubt that language is even possible in humans who don't first gain a solid foundation of experiential, causal knowledge on their own.
A future AI cannot be intelligent unless it is also emotional (see my Substack post Why Emotion Matters). This is because any intelligent AI needs curiosity: to be self-motivated to gain knowledge and understanding. Emotion is a domain-agnostic mechanism for motivating our actions and guiding what we remember (learn). We have difficulty learning things we do not care about, but we keep memories of things that give us pleasure or cause us pain. I cannot recall the concept of 'honey' without also recalling the pleasure I get from eating it. I cannot recall the concept of 'iron skillet' without also recalling the time I grabbed a hot skillet with my bare hand. Every long-term memory in our brains is associated with a valence (good or bad) and a level of arousal (salient or unimportant to me). Low-arousal memories only become long-term memories if I force them in through repetition.
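To make the idea concrete, here is a toy Python sketch (not a model of any real brain, and not my EvoInfo model) in which an episode is tagged with valence and arousal, and only salient or well-rehearsed episodes are consolidated into long-term memory. The threshold and rehearsal count are illustrative assumptions.

```python
from dataclasses import dataclass

AROUSAL_THRESHOLD = 0.7    # illustrative assumption, not a measured value
REHEARSALS_NEEDED = 5      # low-arousal items need repetition to stick

@dataclass
class Episode:
    concept: str           # e.g. "honey", "iron skillet"
    valence: float         # -1.0 (pain) .. +1.0 (pleasure)
    arousal: float         # 0.0 (unimportant) .. 1.0 (salient)

long_term_memory: dict[str, Episode] = {}
rehearsals: dict[str, int] = {}

def experience(ep: Episode) -> None:
    """Consolidate salient episodes at once; low-arousal ones only after repetition."""
    rehearsals[ep.concept] = rehearsals.get(ep.concept, 0) + 1
    if ep.arousal >= AROUSAL_THRESHOLD or rehearsals[ep.concept] >= REHEARSALS_NEEDED:
        long_term_memory[ep.concept] = ep

experience(Episode("iron skillet", valence=-0.9, arousal=0.95))   # one painful grab
print("iron skillet" in long_term_memory)                         # True: remembered in one shot
```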
2. Adaptation to Change
Paul Samuelson said, "When the facts change, I change my mind. What do you do, sir?" If you are a deep neural network, you do not change or adapt to new information after 'training' and deployment. Fixed weights in their artificial 'neurons' cause neural networks to repeat the same answer, whether or not it is correct.
Intelligence is nature's strategy for survival through adaptation. We are more fit because we adapt to change. Consider my definition of natural intelligence (see my Substack post What Is Natural Intelligence?):
Natural intelligence is an autonomous agent’s strategy for survival and adaptation to change. It works by 1) a small set of innate behaviors and 2) learned behaviors. Learning is a continuous process using sensed data and memory to create plans that improve future outcomes.
That definition could apply to a robot someday, but it disqualifies current AI from being intelligent.
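As a rough illustration only, the definition above can be read as a simple agent loop: a handful of innate reflexes, a memory of learned behaviors, and a continuous sense-plan-act-learn cycle. Every name and reward value below is a hypothetical stand-in, not a claim about how any organism actually works.

```python
import random

# A hypothetical sketch: a few innate reflexes plus a memory that is
# continuously updated from sensed outcomes.

INNATE = {"heat": "withdraw", "sweet": "approach"}   # small set of innate behaviors
memory: dict[str, str] = {}                          # learned stimulus -> action

def sense(world):
    return random.choice(world)                      # sensed data from the environment

def plan(stimulus):
    if stimulus in INNATE:
        return INNATE[stimulus]                      # innate behavior
    return memory.get(stimulus, "explore")           # learned behavior, or explore

def learn(stimulus, outcome):
    if outcome > 0:
        memory[stimulus] = "approach"                # remember what improved the outcome

world = ["heat", "sweet", "flower", "shade"]
for _ in range(100):                                 # learning never stops
    s = sense(world)
    a = plan(s)
    outcome = 1.0 if s == "flower" and a != "withdraw" else 0.0   # toy reward
    learn(s, outcome)

print(memory)   # after a few encounters: {'flower': 'approach'}
```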
3. One-Shot, Feed-Forward Learning
My view is throw [back-propagation] away and start again.
Geoffrey Hinton7
When knowledge engineers feed labeled training samples into an artificial neural network, the network compares its output for each sample with the label assigned to that sample and computes the error. That error is propagated backward through the network to adjust its weights, thus reducing future errors. This algorithm is called back-propagation. Some complex models require over 10 million training samples, and processing them is slow and energy-consuming. There is no evidence that animals perform anything resembling back-propagation.
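For readers who have not seen it, here is a minimal single-layer sketch of the error-driven weight adjustment that back-propagation generalizes to deep networks. The toy data and learning rate are arbitrary; real foundation models repeat this loop over millions of samples and billions of weights.

```python
import numpy as np

# A minimal, single-layer sketch of error-driven weight adjustment. Real deep
# networks propagate these gradients backward through many layers.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                            # 100 labeled training samples
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # labels assigned to samples

W = rng.normal(size=3)                                   # weights start out random
b = 0.0
lr = 0.1                                                 # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):                                # many slow passes over the data
    p = sigmoid(X @ W + b)                               # forward pass: network outputs
    error = p - y                                        # compare outputs with the labels
    grad_W = X.T @ error / len(y)                        # gradient of the loss w.r.t. weights
    grad_b = error.mean()
    W -= lr * grad_W                                     # adjust weights to reduce future error
    b -= lr * grad_b
```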
If a child burns her hand from a candle flame or a hot iron skillet handle, she will not make that mistake again. Her future well-being depends on an ability to learn a lesson in one fateful experience. We call this one-shot learning. There are claims of artificial neural networks with one-shot learning, but they all seem to come with conditions and limitations that biological cognition lacks.
4. Neural Plasticity
Artificial General Intelligence (AGI) seeks to achieve human-level intelligence. We are not there yet for several reasons. One reason is that each AI application today is based on a different fixed foundation model optimized for a particular task. Examples of foundation models include:
Large language models or LLMs such as GPT-4, Gemini, or Claude
Vision models such as YOLO, Vision Transformers (ViT), or Segment Anything Model (SAM-2)
Generative vision models such as DALL-E, Stable Diffusion, and Midjourney
DNA, RNA and protein foundation models such as Evo, AlphaFold, or ESM
These architectures do not change during training or deployment and they can perform unreliably at the margins of their narrow task domain or training set. When this occurs, we say the system is brittle. Each foundation model’s optimization for a narrow target domain prevents it from being general purpose.
Interestingly, GPT-4 is reportedly not one model but eight smaller ones glued together into a “hydra” model. This is one way around the problem inherent in static architectures: just glom a bunch of different models together. For background, see Alberto Romero’s Substack post GPT-4's Secret Has Been Revealed.
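OpenAI has not published GPT-4's internals, so the details remain second-hand. The toy sketch below only illustrates the general "glue several models together" idea as a simple mixture-of-experts router; the expert functions and gating weights are invented stand-ins, not anyone's actual design.

```python
import numpy as np

# A toy mixture-of-experts router: a gating network picks which "expert" model
# answers each input. Everything here is an invented stand-in.

def expert_a(x):
    return np.tanh(x).sum()          # stand-in for one specialized model

def expert_b(x):
    return np.maximum(x, 0.0).sum()  # stand-in for another specialized model

experts = [expert_a, expert_b]
rng = np.random.default_rng(1)
gate = rng.normal(size=(len(experts), 4))        # gating weights: 2 experts x 4 input features

def route(x):
    scores = gate @ x                            # gating network scores each expert
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax over experts
    best = int(np.argmax(probs))                 # send the input to the top-scoring expert
    return experts[best](x)

print(route(np.array([0.2, -1.0, 0.5, 3.0])))
```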
Neural plasticity is the nervous system's capacity to change its structure and function in response to experiential change and injury. This enables adaptation to changing environments, aging, and pathological insult. A few examples of neural plasticity in humans include:
Taxi drivers in London have a larger posterior hippocampus because of the demands of navigating the city's complex streets8.
Literate people are less adept at recognizing faces than illiterate people. Scientists have shown that gaining competence in reading results in a volumetric displacement of brain matter used to recognize faces9.
People blind from birth can hear and feel with greater acuity because their brains repurpose the regions normally used to process vision10.
The architecture of a future AI should not be determined by a narrow foundation model or even multiple models. Rather, the internal architecture of an artificial brain should configure and optimize itself based on the structure of its interaction with its sensed environment—its umwelt. Removing the constraints of a fixed, imposed architecture will go far to reduce brittleness in AI systems.
5. Size, Weight, and Power
In 2011, IBM's Watson defeated the top Jeopardy! human players. Watson required ninety IBM Power 750 servers, each of which required around one thousand watts of power. This estimate does not include the cooling required to remove excess heat from server farms. Contrast Watson’s 90,000+ watts with each human contestant’s brain of roughly 20 watts.
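The arithmetic implied by those figures (cooling excluded) is simple:

```python
# Back-of-the-envelope arithmetic for the comparison above (cooling excluded).
watson_watts = 90 * 1_000          # ninety Power 750 servers at ~1,000 W each
brain_watts = 20                   # a human contestant's brain
print(watson_watts / brain_watts)  # 4500.0 -- roughly 4,500 times more power
```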
Since 2011, the computational complexity of AI has only grown and so has its power demands. Researchers have estimated that a single ChatGPT query requires almost 10 times as much electricity to process as a traditional Google search11. A 2024 Bloomberg headline says it all: "Microsoft AI Needs So Much Power It's Tapping Site of US Nuclear Meltdown"12. Microsoft has agreed to buy all power for 20 years from a restarted Three Mile Island nuclear power plant that was closed in 2019 and slated for decommissioning.
I don't know what else to say on this matter. AI’s growing power demand is a step in the wrong direction. The solution is compartmentalized or contextualized processing (focus of attention) and finding an alternative to back-propagation.
6. Metacognition
I recently chatted with the Claude 3.5 Sonnet AI from Anthropic. It claimed to be the most intelligent LLM in the world in 2024.
That claim shows that Claude is either a master of ironic comedy or, more likely, that it lacks self-awareness. The ability to monitor and evaluate one’s own thoughts is called metacognition: thinking about one's own thinking. Claude lacks metacognition.
Metacognition serves a useful purpose. Sometimes a task carries a reward if successful and a cost if unsuccessful. Weighing that cost-benefit against the uncertainty of success allows an animal to avoid a costly failure, perhaps postponing the task until more knowledge or skill is available. Metacognition increases the overall success rate of learned tasks.
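One way to picture that trade-off is a simple expected-value rule: attempt the task only when the estimated chance of success outweighs the cost of failure. The numbers below are made up for illustration; they are not data from any animal study.

```python
# An illustrative expected-value rule for the trade-off described above.
# The probabilities and payoffs are invented numbers.

def attempt_or_postpone(p_success: float, reward: float, cost: float) -> str:
    expected_gain = p_success * reward - (1.0 - p_success) * cost
    return "attempt" if expected_gain > 0 else "postpone"

# An agent confident in a learned task attempts it; an uncertain one opts out.
print(attempt_or_postpone(p_success=0.9, reward=5.0, cost=3.0))  # attempt
print(attempt_or_postpone(p_success=0.3, reward=5.0, cost=3.0))  # postpone
```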
Honey bees, with brains of less than one million neurons, display at least two examples of metacognition (see my Substack post Are Honey Bees Conscious?). So perhaps a lot of animals—besides humans—have metacognition. Claude would certainly benefit from it.
7. Innovation
An innovation transcends perception, experience, and reality itself. Innovative or transcendental concepts include dreams, inventions, jokes, the humanities, fantasies, categories, our sense of time and space, mathematics, conspiracy theories, tools, and symbols. Yuval Harari captured the essence of transcendental concepts when he said, "There are no gods in the universe, no nations, no money, no human rights, no laws, and no justice outside the common imagination of human beings."
Chimpanzees, crows, and tiny Galapagos woodpecker finches innovate. Ethologists have observed each of these species using twigs to extract protein-rich insects from tiny holes in logs. These animal innovators imagine twigs as things that transcend perceptual concepts — the twigs become tools for getting food.
AI does not innovate. Human knowledge engineers are the sole source of knowledge and creativity in AI. AI is auto-complete on steroids.
There is nothing supernatural about innovation. For more on the mechanics of innovation, see my Substack post The Story of Intelligence.
8. Reasoning
Current deep learning is most successful at perception tasks and generally what are called system 1 tasks. Using deep learning for system 2 tasks that require a deliberate sequence of steps is an exciting area that is still in its infancy.
Yoshua Bengio, Yann LeCun, and Geoffrey Hinton13
A lot of so-called 'reasoning' that takes place in artificial neural networks is memory retrieval or reasoning-by-association. It is not reasoning-by-causality.14
I speculate that our faculty for problem solving, tool making, and symbolic communication evolved from our nonhuman ancestors’ navigational abilities. I do not think it a coincidence that the best navigators of the animal world are also some of the best tool users and social communicators. The table below illustrates how navigational problems map onto other domains, such as baking a cake.
If we posed the baking questions above to Claude or GPT-4, we would get reasonable answers. However, that’s only because the Large Language Models (LLMs) have already processed thousands of answers vacuumed from the internet. AI is parroting answers—it is memory retrieval, not reasoning. Change the baking domain to a domain without answers on the Internet and the LLMs become clueless or hallucinate (see my Substack post Funny Flubs).
In 2015, IBM commercialized Watson, the AI program that had won Jeopardy!, as IBM Watson Health. Its purpose was to assist physicians in the diagnosis of diseases. In January 2022, IBM sold the unprofitable business to a private equity firm. Critics claimed that “its supercomputer-aided analysis of health data merely compiled existing knowledge without producing new insights”15. Game shows like Jeopardy! are based on providing simple answers to short questions. But treating sick people is complex and context-dependent. People may get the same finite set of diseases, but everyone gets sick and responds to medicines differently, so treatment needs to be tailored to each patient. Doctors need to memorize a lot of facts, but they also need to originate creative and reasoned treatments. AI neither innovates nor reasons.
To understand how humans solve problems, we might begin by asking how honey bees navigate (see my Substack posts Beyond AI, A Honey Bee's First Orientation Flight, and Trails in the Sky, Cyborgs, & Empathy). All the navigational problems in the table above presume an allocentric perspective (not an egocentric or transcendental perspective). Honey bees and all animals that navigate use allocentric representations (see my Substack post The Story of Intelligence).
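For readers unfamiliar with the terms, the difference between the two perspectives is easy to show in code: an allocentric position is fixed in a world frame, while an egocentric position must be recomputed from the agent's own location and heading. The coordinates below are arbitrary illustrations.

```python
import numpy as np

# The same landmark in a world-fixed (allocentric) frame and in the agent's own
# (egocentric) frame.

landmark_world = np.array([10.0, 5.0])   # allocentric: fixed map coordinates
agent_position = np.array([2.0, 1.0])
agent_heading = np.pi / 2                # agent faces "north" (90 degrees)

def to_egocentric(point, position, heading):
    """Rotate and translate a world point into the agent's own frame."""
    c, s = np.cos(-heading), np.sin(-heading)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ (point - position)

print(to_egocentric(landmark_world, agent_position, agent_heading))
# The allocentric coordinates never change as the agent moves; the egocentric
# ones must be recomputed after every step and every turn.
```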
9. Emergence of Peircean Semiotics
How does a sunset or a poem bring tears to my eyes? How did language originate? How do symbols and their meaning originate within a non-representational brain? Semiotics is the study of signs and the communication of meaning. But it gets complicated because we are using language to describe how language works.
Philosophy, psychology, and the empirical sciences rely heavily on language. Only physics and chemistry—the hard sciences—seek explanations independent of language. I want the same thing for AI, language, and meaning. To understand language, I can only trust an explanation that does not itself require language (see my Substack post Strange Loops). The answer is a physical model based on the semiotic theories of Charles Sanders Peirce. This, along with the invention of language, will be addressed in a future post.
Scaling
Many AI proponents claim we can scale our current AI technology to AGI—they just need more data and larger language models (and refurbished nuclear power plants). Reviewing the nine issues above reveals that none of them can be solved by scaling. Each one requires a fresh approach to AI. AI relates to natural intelligence in the same way that internal combustion engines relate to electric motors. Each pair shares a similar output (intelligent behavior and torque, respectively), but you cannot scale, tweak, or evolve one into the other. Starting from square one is the only option.
An AI Winter is Coming
In 2025, AI has lots of business and investment people expecting future wealth from their AI investments—they are perhaps the least objective or qualified individuals to evaluate the technology. Because of that, money will continue to pour in to support AI businesses that have no sustainable business model…until one day it doesn't. The Microsofts and Nvidias will survive, but their stock prices will plummet. The companies that are not yet profitable will evaporate. How do I know? I sold my AI software company, Big Science Company, and moved to Silicon Valley right before the last AI Winter (see My AI Investment Strategy). Another AI Winter is inevitable because the current direction of AI technology can never catch up to the hype.
How Do We Get To a Digital Implementation of Natural Intelligence?
At the very least, I hope I’ve shown you we need to rethink our approach to AI completely. But criticism is the simple part. The hard part is providing something that at least has the potential to work better than current AI.
We already have real working models for the nine features of natural intelligence given above. They are called model organisms: round worms, fruit flies, honey bees, rats, chimpanzees, and you and I. All we have to do is to understand how intelligence works in nature. Easier said than done.
Given how little we know about the brain16, a reductionist approach is not the answer. A more promising approach is to seek a high-level framework for understanding cognition. There is some progress in this area. Jeff Hawkins' Thousand Brains Theory of Intelligence comes to mind…though it is not a theory of intelligence as much as it is a theory of how the neocortex works.
My strategy is to first understand cognition in simple brains at algorithmic, computational, and adaptational levels and then trace their evolution into more complex high-level models of cognition. The result is my EvoInfo Model of Cognition.
Later this year, I hope to release Python code that offers one alternative approach to AI that complements the EvoInfo model, implements the first six properties summarized above, and anticipates support for the remaining three properties. I have named the project Nascent Networks. More on that later.
I Seek Your Advice and Collaboration
In the meantime, I am seeking professionals and laypersons interested in following or contributing to a back-to-the-drawing-board redo of AI. Please DM me about yourself, your ideas, questions, articles, and other individuals and groups I should know about.
I also welcome your comments.
1. Yann LeCun (@ylecun), post to X, 2/4/2023.
4. See my Substack post Funny Flubs.
6. An AI Winter is a massive and prolonged withdrawal of investment money from AI technology companies and the subsequent crash of AI businesses. It has occurred at least twice before, when investors finally figured out that hype got ahead of technological promise and financial rewards. I sold my second AI start-up company in 2000 during the last AI boom. I moved employees and family from Atlanta to Sunnyvale. Silicon Valley was in full "reality distortion zone" mode. When we arrived, you could not find a place to live. When I left less than two years later, you could not find a U-Haul truck to save your soul. That AI winter ushered in the demise of logic- and rule-based AI. In spite of that, twenty-five years later, my emotionally-intelligent customer support chatterbot is still a product (see my Substack post The Illusion of Intelligence).
8. Woollett, K., & Maguire, E. A. (2011). Acquiring “the Knowledge” of London’s Layout Drives Structural Brain Changes. Current Biology, 21(24), 2109–2114. https://doi.org/10.1016/j.cub.2011.11.018
9. Dehaene, S., Pegado, F., Braga, L. W., Ventura, P., Filho, G. N., Jobert, A., Dehaene-Lambertz, G., Kolinsky, R., Morais, J., & Cohen, L. (2010). How Learning to Read Changes the Cortical Networks for Vision and Language. Science, 330(6009), 1359–1364. https://doi.org/10.1126/science.1194140
10. Renier, L. A., Anurova, I., De Volder, A. G., Carlson, S., VanMeter, J., & Rauschecker, J. P. (2010). Preserved Functional Specialization for Spatial Processing in the Middle Occipital Gyrus of the Early Blind. Neuron, 68(1), 138–148. https://doi.org/10.1016/j.neuron.2010.09.021
13. Bengio, Y., LeCun, Y., & Hinton, G. (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021.
14. Bishop, J. M. (2021). Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It. Frontiers in Psychology, 11, 513474. https://doi.org/10.3389/fpsyg.2020.513474. Also see my Substack post Does AI Understand?
15. “AI Dream Fails”, Science, Vol. 375, Issue 6579, January 28, 2022.
16. Some might dispute my claim that we know very little about the brain. Sure, there are libraries filled with books about the brain. But after all that, if AI is still the best simulation we can come up with, then we still have a long way to go. If I cannot build it, I should not claim to understand it. See my Substack post Does AI Understand?.