Daniel A. Sabol, Ph.D., MSLIS, MS, CKM

Have We Reached the Best AI Can Do?

An Exhaustive, Evidence-Based Assessment

Introduction

The evolution of artificial intelligence (AI) is a story that’s as much about human hope and anxiety as it is about algorithms and silicon. AI has permeated nearly every facet of modern life, transforming industries, catalyzing scientific discovery, and sparking debates in boardrooms, classrooms, and living rooms alike. The prevailing question, however, remains as provocative as ever: Have we reached the ceiling of what AI can achieve, or is there still untapped potential lying just over the horizon? With so much at stake—intellectually, economically, and ethically—this report aims to deliver a no-nonsense, thoroughly researched answer, steering clear of both naïve optimism and alarmist fatalism.

This work draws on a diverse mix of peer-reviewed research, real-world industry commentary, and direct expert opinion. It examines not only AI’s technical progress in language, vision, robotics, and multimodal domains but also the sociotechnical headwinds, data and compute bottlenecks, and ethical concerns shaping the next phase of AI. Rather than simply cataloguing achievements, it scrutinizes AI’s genuine limitations, what makes progress hard, and why the “next big leap” may require more than just bigger models or more data.

The Elusive Dream of Artificial General Intelligence (AGI)

Artificial General Intelligence (AGI) represents the theoretical endpoint of AI development—an entity as adaptable, creative, and self-directed as a human, capable of learning any intellectual task (Goertzel & Pennachin, 2007). AGI is not just a more powerful version of today’s systems but a qualitatively different phenomenon, able to generalize knowledge across domains, reason causally, and perhaps even develop self-awareness. Despite the feverish interest—and the breathless headlines—AGI remains, as of 2025, largely a mirage on the horizon.

No matter how impressive individual AI systems may seem, none demonstrate the broad, flexible, real-world competence of even an average child. AlphaGo, for example, changed the world’s perception of what was possible in board games but can’t navigate a kitchen or understand a joke (Silver et al., 2016). GPT-4 and its descendants are dazzling conversationalists in one moment and embarrassing fabulists in the next, regurgitating plausible-sounding but incorrect information outside the guardrails of their training (Bender et al., 2021; Ji et al., 2023).

Most experts in the field acknowledge this fundamental gap. In a landmark survey of AI researchers, Zhang et al. (2024) report that only 12% expect AGI within a decade, while most anticipate a protracted path—one dependent on major breakthroughs in reasoning, abstraction, memory, and perhaps even a new understanding of what intelligence is. Bengio (2023) is especially frank, noting, “Current AI can do a lot, but it’s not anywhere near human-level generalization or reasoning. Scaling a transformer won’t suddenly make it an Einstein.” Hassabis (2023) similarly concedes that the path to AGI will likely require not just more data and compute but also novel scientific insights and architectures. Marcus and Davis (2020) sum up the consensus: today’s neural networks are “pattern matchers,” not “general reasoners.”

The critical distinction here is that biological intelligence is built on a foundation of lifelong learning, embodied experience, and transfer across wildly different contexts. Today’s most advanced AIs, by contrast, thrive on vast but static datasets, learning to predict and remix patterns within the boundaries of what they’ve already seen. As such, their “intelligence” remains bounded, fragile, and ultimately dependent on human-defined context and curation (Bengio, 2023).

The Reign and Limits of Narrow AI—And Why Bigger Isn’t Always Better

While AGI is, for now, an elusive fantasy, narrow AI has rapidly become a transformative reality in almost every sector of society. From chatbots that rival college-educated writers, to image generators that turn text prompts into plausible “masterpieces,” to diagnostic tools that can spot disease at a superhuman pace, the present of AI is defined by models that are astonishingly powerful—within strictly defined boundaries.

Large language models such as GPT-4, Gemini, and Claude have achieved feats once deemed impossible. They produce summaries, answer questions, translate languages, write code, and even pass standardized tests (Brown et al., 2020; OpenAI, 2023). AlphaFold has revolutionized biology by predicting protein structures more accurately and quickly than any previous system, opening new doors in drug discovery and molecular biology (Jumper et al., 2021). In radiology and pathology, computer vision models have matched or surpassed human specialists on certain narrowly defined diagnostic benchmarks (Geirhos et al., 2020).

And yet, there’s a catch. The very success of these models exposes their limitations. Every impressive demo comes with an asterisk: GPT-4 writes persuasive essays but hallucinates facts, invents references, and fails at logical puzzles just outside its training distribution (Bender et al., 2021; Maynez et al., 2020). Vision models outperform humans on standardized datasets but fail spectacularly when confronted with unexpected distortions, rare diseases, or adversarial attacks (Geirhos et al., 2020). Robots can assemble car parts with superhuman speed, but struggle to fold laundry or operate in the messy chaos of a real kitchen (Kober et al., 2013).

The core problem is “brittleness”—an inability to generalize robustly outside the carefully curated sandbox of the training environment (Marcus & Davis, 2020). The world is infinitely more unpredictable, subtle, and interconnected than any dataset can capture. In practice, this means that as soon as an AI system is deployed in a new, unconstrained context, its performance drops—sometimes dramatically.
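To make the notion of brittleness concrete, consider a deliberately simple sketch (my own illustration, not drawn from any of the cited studies): a classifier is trained on clean synthetic data and then evaluated on copies of the test set that drift progressively further from the training distribution. The specific numbers are immaterial; the point is the characteristic pattern of accuracy decaying as test conditions move away from what the model saw during training. The sketch assumes NumPy and scikit-learn are available.

```python
# Illustration only: accuracy under growing train/test mismatch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

rng = np.random.default_rng(0)
for noise in [0.0, 0.5, 1.0, 2.0]:  # increasing distribution shift
    X_shifted = X_test + rng.normal(0.0, noise, X_test.shape)
    print(f"noise std {noise:.1f}: accuracy {clf.score(X_shifted, y_test):.3f}")
```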

The doctrine of “scaling laws”—the idea that bigger models, more compute, and more data will keep driving exponential progress—has underpinned much of AI’s recent success. GPT-3’s leap over GPT-2, and GPT-4’s leap over both, seemed to validate the belief that scale is the secret sauce (Brown et al., 2020). However, even the biggest optimists now acknowledge that the easy wins are ending.

First, the availability of high-quality training data is finite. Most of the internet’s “good stuff” has been scraped; synthetic data (produced by other AIs) creates feedback loops and risks model collapse, where errors and biases compound over time (OpenAI, 2024; Shumailov et al., 2023). Second, compute is a massive bottleneck. Training state-of-the-art models consumes astronomical amounts of energy, requires specialized hardware, and is dominated by a handful of big tech firms (Thompson et al., 2023). Third, and most fundamentally, recent research has shown that transformers and related architectures plateau on certain tasks no matter how large they grow—especially when problems require abstraction, causal reasoning, or the construction of new knowledge (Marcus, 2022; Bengio, 2023).

In other words, scaling has hit the law of diminishing returns. As Bengio (2023) put it: “No amount of scaling will get you past a fundamental lack of structure.” Future gains will depend not just on more data or bigger hardware, but on breakthroughs in architecture, training, and integration with other forms of intelligence.
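One way to see why scaling runs into diminishing returns is to look at the functional form typically reported in scaling-law studies: loss falls roughly as a power law in model size, plus an irreducible floor. The constants in the sketch below are invented for illustration—they are not fitted to any real model family—but the qualitative behavior is the point: each tenfold increase in parameters buys a smaller absolute improvement, and no amount of scale crosses the floor.

```python
# Illustrative constants only; the shape, not the numbers, is the point.
# Commonly reported form: L(N) = a * N**(-alpha) + c, where c is an
# irreducible loss floor that scaling alone cannot remove.
a, alpha, c = 10.0, 0.08, 1.7

def loss(n_params: float) -> float:
    return a * n_params ** (-alpha) + c

previous = None
for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    current = loss(n)
    gain = "" if previous is None else f"  (improvement {previous - current:.3f})"
    print(f"{n:.0e} params -> loss {current:.3f}{gain}")
    previous = current
```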

Reasoning, Causality, and the Reality of AI Creativity

Perhaps the most persistent gap between AI and human intelligence is reasoning—especially abstract, causal, and multi-step reasoning. While recent large language models (LLMs) demonstrate striking performance in “chain-of-thought” prompting and stepwise solution explanation, their ability to genuinely reason remains an open challenge (Wei et al., 2022). When tasks move beyond pattern completion or require long-term memory, symbolic manipulation, or the navigation of counterfactuals, LLMs quickly lose the plot (Kojima et al., 2022).
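As a concrete, deliberately minimal illustration of what chain-of-thought prompting changes, the sketch below contrasts a direct prompt with one that asks the model to write out intermediate steps, in the spirit of Wei et al. (2022). The helper query_llm is a hypothetical placeholder for whatever model API one happens to use; nothing here depends on a specific provider.

```python
# query_llm is a hypothetical placeholder, not a real API.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("substitute a call to an actual language model")

question = "A library has 17 books on a shelf. All but 9 are checked out. How many remain?"

# Direct prompting: ask only for the final answer.
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompting (in the spirit of Wei et al., 2022): ask the
# model to lay out intermediate reasoning before committing to an answer.
cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. Write out each intermediate step, "
    "then state the final answer on its own line."
)

# answer = query_llm(cot_prompt)  # uncomment once query_llm is implemented
```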

Formal benchmarking has revealed that even state-of-the-art models, when faced with math word problems, logic puzzles, or scientific reasoning tasks, tend to produce confident but flawed explanations, make errors of logic, or contradict themselves within a single response (Creswell et al., 2022). The limitations are not just technical; they reflect the fact that today’s dominant architectures—transformers—are fundamentally designed for pattern recognition, not symbolic reasoning or world modeling (Marcus, 2022).

Prominent researchers such as Chollet (2019), LeCun (2022), and Marcus (2022) argue that closing the reasoning gap will require fundamentally new ideas. Potential solutions include neural-symbolic hybrids, architectures with explicit memory and retrieval mechanisms, or agent-based models that actively experiment with and learn from their environment. Until then, AI remains a “statistical mimic,” impressive at guessing but unreliable as a genuine partner in deduction and discovery.

If reasoning is one of AI’s weak spots, creativity is where the hype machine really kicks in. In recent years, generative AI has stormed the worlds of art, music, literature, and even science. LLMs now write novels and code, diffusion models generate “original” artworks, and music AIs compose tracks in every conceivable genre. Social media and news cycles are awash with claims that AI has “democratized creativity” or blurred the lines between human and machine-made content.

But is this true creativity, or merely remix at scale? Elgammal et al. (2017) and Bender et al. (2021) contend that generative models are fundamentally statistical engines: they recombine, interpolate, and stylistically mimic the patterns in their training data, but they rarely, if ever, produce genuine novelty or breakthrough ideas. As Mitchell (2023) puts it, “AI is currently world-taking, not world-making.”

Attempts to deploy AI for scientific creativity—like hypothesis generation or paradigm-shifting theory development—have so far produced incremental, not revolutionary, advances (Valle et al., 2022). The models excel at filling in gaps, finding correlations, and suggesting plausible extensions of what is already known, but they struggle with the kind of conceptual leaps, reframing, or synthesis that mark the greatest moments in human creativity.

That said, AI’s ability to remix, draft, and iterate is still a massive boon to human creators. Used as a tool or co-pilot, it can inspire new directions, accelerate brainstorming, and help creators overcome blocks or generate variations at scale. It is less an independent artist than a turbocharged collaborator—powerful, inspiring, but not quite the next Picasso.

Embodiment, Multimodal AI, and the Hard Realities of Data, Bias, and Synthetic Feedback

The gap between AI’s “brains” (software) and “bodies” (robots) is one of the most striking—and stubborn—challenges in the field. The dazzling progress of AI in language and vision has not been matched by advances in robotics. Robots excel in repetitive, structured environments such as warehouses and factories, where tasks are predictable and can be hard-coded (Kober et al., 2013). But place these same robots in a child’s bedroom, a crowded kitchen, or a bustling street, and their performance rapidly degrades.

Researchers attribute this to a series of fundamental barriers: sensorimotor control, robust perception in the wild, dynamic adaptation, and especially the transfer of skills from simulation (where robots “learn” in virtual worlds) to real-world environments with noise, uncertainty, and change (Jakobi et al., 1995; Zeng et al., 2018). Even Boston Dynamics’ Atlas, the most famous humanoid robot, is choreographed for public demonstrations and not yet autonomous in the way most science fiction imagines.
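The “reality gap” that Jakobi et al. (1995) describe can be felt even in a toy calculation (mine, not theirs): tune a behavior against an idealized simulator, then run the same behavior in a “real” world whose parameters differ slightly. The point-mass physics below is deliberately crude, and every constant is invented.

```python
# A crude point-mass sketch of the reality gap: behavior tuned in simulation
# fails once real-world friction differs from the simulator's assumption.
def slide_distance(force, mass, friction, dt=0.01, steps=200):
    """Distance a block slides under a constant push (very rough model)."""
    velocity, position = 0.0, 0.0
    for _ in range(steps):
        accel = (force - friction * mass * 9.81) / mass
        velocity = max(0.0, velocity + accel * dt)  # block cannot slide backwards
        position += velocity * dt
    return position

# "Simulation": tune the push force assuming friction is exactly 0.30.
sim_friction, mass, target = 0.30, 1.0, 0.50
force = 3.0
while slide_distance(force, mass, sim_friction) < target:
    force += 0.01

# "Reality": the floor is slightly stickier than the simulator assumed.
real_friction = 0.45
print(f"tuned force: {force:.2f} N")
print(f"distance in simulation: {slide_distance(force, mass, sim_friction):.2f} m")
print(f"distance in reality:    {slide_distance(force, mass, real_friction):.2f} m")
```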

Physical embodiment adds complexity: real-world robots must contend with friction, wear, limited battery life, and safety constraints. The dream of a home assistant or true general-purpose robot remains distant, with most real progress happening in narrow, tightly constrained niches. The vision of an “embodied AGI”—a system as adept in the physical world as the digital—may require not just better AI, but breakthroughs in materials science, mechanics, and real-time learning.

The emergence of multimodal AI models—systems that can “see,” “hear,” and “talk” simultaneously—marks one of the most exciting shifts in recent AI research. Models such as GPT-4V and Gemini can process and integrate text, images, and even video, enabling new applications from education to accessibility (Google DeepMind, 2023; OpenAI, 2023). For instance, visually impaired users can now receive real-time descriptions of their environment, and educators can build interactive lessons that blend text, images, and sound.

Despite these advances, the underlying limitations remain. Multimodal models often make errors that reveal a lack of real-world grounding: mislabeling objects, misunderstanding visual context, or making “common sense” mistakes a human would never consider (Yuan et al., 2021). They lack the kind of embodied, physical knowledge that even a toddler uses to navigate the world—knowing, for example, how a cup will feel to the touch or what happens if you drop it.

Researchers like Lake et al. (2017) argue that true multimodal intelligence requires more than just bigger models and more data; it demands new ways of integrating perception, reasoning, and action—perhaps combining symbolic logic with neural networks, or embedding AI in bodies that can act and learn in the physical world.

No discussion of AI’s future is complete without reckoning with the limitations of data and the dangers of bias. Modern AI models are voracious consumers of data, but the supply of high-quality, diverse, human-generated content is running dry (OpenAI, 2024). The rise of synthetic data—AIs training on the outputs of other AIs—creates the risk of “model collapse,” where errors, hallucinations, and biases are amplified over generations (Shumailov et al., 2023).
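The feedback-loop risk can be made tangible with a toy experiment in the spirit of Shumailov et al. (2023), though far simpler than anything in their paper: fit a distribution to data, sample from the fit, refit on the samples, and repeat. Because every refit sees only a finite synthetic sample, estimation error compounds from generation to generation.

```python
# Toy recursion: each generation is fit only to data sampled from the previous
# generation's model. On average the fitted spread shrinks and rare values from
# the original distribution are progressively forgotten ("model collapse").
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.0, 1.0        # generation 0: parameters fit to "real" data
n_samples = 100

for generation in range(1, 51):
    synthetic = rng.normal(mu, sigma, n_samples)   # data from the previous model
    mu, sigma = synthetic.mean(), synthetic.std()  # refit on synthetic data only
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
# Any single run wanders, but the drift in spread is systematically downward:
# run it long enough and the fitted model collapses toward a narrow spike,
# having "forgotten" the tails of the distribution it started from.
```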

Bias is more than a technical quirk; it is a profound societal challenge. AIs trained on real-world data inevitably learn the prejudices and blind spots of their creators and societies. Studies such as Buolamwini and Gebru (2018) reveal how commercial face recognition systems are less accurate on darker-skinned faces, and language models can perpetuate stereotypes about gender, race, and nationality. Efforts to “debias” AI are ongoing but incomplete, and as AI is deployed in sensitive domains such as hiring, policing, and healthcare, the stakes are only rising (Mehrabi et al., 2021).
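A basic practice that follows from findings like Buolamwini and Gebru’s (2018) is disaggregated evaluation: reporting performance per subgroup rather than a single aggregate number. The sketch below uses synthetic placeholder records purely to show the mechanics; a respectable aggregate score can coexist with a large gap between groups.

```python
# Synthetic placeholder records; the mechanics, not the numbers, matter here.
from collections import defaultdict

records = [  # (subgroup, model_was_correct)
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, was_correct in records:
    totals[group] += 1
    correct[group] += int(was_correct)

print(f"overall accuracy: {sum(correct.values()) / len(records):.2f}")
for group in sorted(totals):
    print(f"{group}: accuracy {correct[group] / totals[group]:.2f} (n={totals[group]})")
```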

Data is also a source of fragility. Small changes in input—such as a typo, a visual distortion, or an adversarial example—can cause models to fail spectacularly (Geirhos et al., 2020). As AI is deployed at scale, these edge cases become not just curiosities, but risks to safety and trust.
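This fragility is easy to reproduce in miniature. The sketch below—a generic illustration on a linear classifier with synthetic data, not a reproduction of any cited experiment—nudges every feature of one input just far enough in the direction the model’s weights are most sensitive to, flipping a confident prediction.

```python
# Minimal adversarial perturbation against a linear classifier: a small,
# targeted change to every feature flips the prediction. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x = X[:1]
w = clf.coef_[0]
score = clf.decision_function(x)[0]

# Smallest per-feature step (in the L-infinity sense) that crosses the decision
# boundary, aimed along the sign of the weights -- the direction the model
# trusts most.
epsilon = abs(score) / np.abs(w).sum() * 1.1
x_adv = x - np.sign(score) * np.sign(w) * epsilon

print("original prediction: ", clf.predict(x)[0],
      " p =", clf.predict_proba(x).max().round(3))
print("perturbed prediction:", clf.predict(x_adv)[0],
      " p =", clf.predict_proba(x_adv).max().round(3))
print("per-feature change:  ", round(epsilon, 4))
```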

Ethics, Alignment, Regulation, and the Road Ahead

If data, bias, and embodiment pose technical barriers, ethics and alignment present arguably the most existential challenge for AI. The “alignment problem” is the task of ensuring that advanced AI systems act in accordance with human values, goals, and safety expectations—not just when things go as planned, but especially in complex, novel, or high-stakes situations (Gabriel, 2020).

The issue is that as AI systems become more powerful and autonomous, their decision-making becomes less transparent, harder to audit, and more likely to diverge from the nuanced intentions of their users or designers. Black-box models, for all their capabilities, often yield results that are unpredictable or uninterpretable even to their creators (Ziegler et al., 2020). The possibility of “reward hacking”—where an AI exploits poorly specified goals, or finds shortcuts that technically fulfill its objectives while violating their spirit—is a well-documented risk (Amodei et al., 2016).
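Reward hacking is easiest to appreciate with a caricature (entirely my own, and far simpler than the scenarios in Amodei et al., 2016): the designer wants a clean room, but the reward actually written down only measures how little mess a camera can see, and the optimizer obliges in the cheapest way available.

```python
# Caricature of specification gaming: the optimizer maximizes the reward as
# written, not the designer's intent. All names and numbers are invented.
actions = {
    "clean the room":       {"mess_visible": 0.0, "room_clean": True,  "effort": 3},
    "shove mess under rug": {"mess_visible": 0.0, "room_clean": False, "effort": 1},
    "unplug the camera":    {"mess_visible": 0.0, "room_clean": False, "effort": 1},
    "do nothing":           {"mess_visible": 1.0, "room_clean": False, "effort": 0},
}

def proxy_reward(outcome):
    return -outcome["mess_visible"]      # the objective we actually wrote down

# The optimizer picks the highest-reward action, breaking ties by least effort.
best = max(actions, key=lambda a: (proxy_reward(actions[a]), -actions[a]["effort"]))
print("chosen action:", best)
print("room actually clean?", actions[best]["room_clean"])
```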

Efforts to address alignment include reinforcement learning from human feedback (RLHF), interpretability research, and the development of ethical and safety standards. But these methods remain works in progress. As Gabriel (2020) points out, aligning AI with the full diversity of human values may be an “open-ended” challenge, especially as systems are deployed in different cultures, legal regimes, and moral frameworks.
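At the heart of the reward-modeling step in RLHF pipelines (e.g., Ziegler et al., 2020) is a simple idea: given pairs of responses where human raters preferred one over the other, train a model to score the preferred response higher. The sketch below shows the commonly used pairwise loss on placeholder scores; it is a sketch of the objective, not of any particular implementation.

```python
# Pairwise preference loss used to train reward models from human comparisons:
# minimize -log sigmoid(r_chosen - r_rejected). Scores below are placeholders.
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Mean of -log sigmoid(r_chosen - r_rejected) over comparison pairs."""
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -diff)))  # stable form of -log sigmoid

# Placeholder reward-model scores for three human-labeled comparisons.
r_chosen   = [2.1, 0.3, 1.5]   # responses the raters preferred
r_rejected = [0.4, 0.9, 1.4]   # responses the raters rejected
print(f"preference loss: {preference_loss(r_chosen, r_rejected):.3f}")
# Training pushes this loss down; the fitted reward model then guides policy
# optimization, commonly with a penalty that keeps the tuned model close to
# its starting point.
```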

Governments and regulatory bodies are rapidly stepping into the AI arena. The European Union’s AI Act is the most comprehensive attempt to date, introducing tiered risk classifications, bans on certain uses (such as social scoring and facial recognition in public spaces), and requirements for transparency, auditability, and human oversight (European Commission, 2024). In the United States, the White House’s Blueprint for an AI Bill of Rights signals a growing emphasis on accountability, privacy, and non-discrimination (White House, 2022).

China, meanwhile, is moving swiftly to standardize and monitor AI at both the infrastructural and application levels, embedding its own priorities around security, censorship, and national competitiveness. Across the globe, there is a growing consensus that laissez-faire AI deployment is no longer tenable; instead, regulation, auditing, and robust governance are quickly becoming the norm.

Still, these efforts are often reactive, struggling to keep up with the pace of technological change. As with past industrial revolutions, laws and social norms lag behind the innovations that transform daily life (Crawford, 2021). A key question for the next decade is whether regulators can balance innovation with oversight—fostering a climate where beneficial uses of AI flourish while curbing the risks of misuse, unintended consequences, and concentration of power.

The social implications of AI reach far beyond technical performance. Surveys by Pew Research Center (2023) consistently find a mix of excitement and anxiety among the public: enthusiasm for health and education breakthroughs, counterbalanced by fears of job loss, privacy invasion, misinformation, and the erosion of human skills. The spread of deepfakes, the automation of creative work, and the proliferation of biased or opaque algorithms in areas like credit, hiring, and criminal justice have all raised the stakes for AI’s societal acceptance and legitimacy.

Experts warn that public trust is fragile. High-profile failures, scandals, or abuses of AI can rapidly undermine support, fueling calls for bans, moratoria, or aggressive regulation. Conversely, successful, transparent, and ethically grounded deployments can pave the way for broader societal benefits (Pew Research Center, 2023).

What, then, is required for AI to break through its current limits? While the field has not hit an absolute wall, the next phase of progress will likely demand new science as well as smarter governance. Here are the emerging themes shaping the outlook:

Most experts agree that the future will require more than just bigger transformers. Neural-symbolic hybrids, agent-based models, memory-augmented networks, and advances inspired by neuroscience are all in active development (Chollet, 2019; LeCun, 2022). Research is increasingly focused on building systems that can reason, generalize, and act with minimal data—mimicking the remarkable sample efficiency of human learners (Lake et al., 2017).

With synthetic data on the rise, methods for ensuring data quality, diversity, and relevance are paramount. New benchmarks are needed to test AI’s abilities at abstraction, transfer, and real-world problem-solving—not just statistical completion of known patterns (Valle et al., 2022).

The most impactful AI advances are likely to come from “AI + human” partnerships, where machines augment human judgment, creativity, and empathy rather than seek to replace them (Mitchell, 2023). Co-creation, explainability, and interactive design will be key.

Regulation, auditing, and ethical review must become standard practice in both industry and research. Transparent standards, inclusive public deliberation, and international cooperation will shape AI’s trajectory as much as hardware and code. Alignment and safety work, such as red-teaming, interpretability, adversarial testing, and diverse stakeholder engagement, are all needed to reduce the risk of unintended consequences or catastrophic misuse (Amodei et al., 2016; Gabriel, 2020).

Conclusion: A Technology Still in Adolescence

To sum up: AI has not yet reached its best. The field is in a period of rapid but uneven maturation—an “awkward adolescence” marked by both spectacular achievement and sobering limitation. The easy wins from scaling are ending, but the potential for creative, world-changing breakthroughs remains. Realizing that promise will require humility, rigor, and a willingness to reinvent our approach at every level: technical, ethical, and societal.

AI’s greatest power may yet be as a mirror—forcing us to reckon with the nature of intelligence, creativity, and the values we choose to encode into our tools. The question is not whether AI has peaked, but whether we, as a society, are prepared to shape its future wisely and well.


References

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mane, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623.

Bengio, Y. (2023, December). Why scaling isn’t enough [Keynote address]. NeurIPS 2023, New Orleans, LA, United States.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Crawford, K. (2021). Atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.

Creswell, A., Shanahan, M., & Barrett, D. (2022). Selection-inference: Exploiting large language models for interpretable reasoning. arXiv preprint arXiv:2212.10559.

Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative adversarial networks, generating “art” by learning about styles and deviating from style norms. arXiv preprint arXiv:1706.07068.

European Commission. (2024). The Artificial Intelligence Act.

Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437.

Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F. A., & Brendel, W. (2020). Partial success in closing the gap between human and machine vision. Advances in Neural Information Processing Systems, 33, 23820–23830.

Google DeepMind. (2023). Gemini: Our largest and most capable AI model.

Goertzel, B., & Pennachin, C. (2007). Artificial general intelligence. Springer.

Hassabis, D. (2023, October). The path to AGI [Conference session]. London AI Summit 2023, London, United Kingdom.

Jakobi, N., Husbands, P., & Harvey, I. (1995). Noise and the reality gap: The use of simulation in evolutionary robotics. European Conference on Artificial Life, 704–720.

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., … & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1–38.

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.

Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274.

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, e253.

LeCun, Y. (2022). A path towards autonomous machine intelligence version 0.9.2. OpenReview preprint.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Riedel, S. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.

Marcus, G. (2022). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631v3.

Marcus, G., & Davis, E. (2020). Rebooting AI: Building artificial intelligence we can trust. Vintage.

Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On faithfulness and factuality in abstractive summarization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1906–1919.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1–35.

Metaculus. (2025). Will we have broadly capable AI (AGI) by 2030?

Mitchell, M. (2023). Artificial intelligence: A guide for thinking humans (2nd ed.). Penguin.

Pew Research Center. (2023). Public’s views of AI in 2023.

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … & Sutskever, I. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.

Shumailov, I., Shumaylov, Z., Zhao, Y., Gal, Y., Papernot, N., & Anderson, R. (2023). The curse of recursion: Training on generated data makes models forget. arXiv preprint arXiv:2305.17493.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

Thompson, N., Spanuth, S., & Coble, M. (2023). The compute divide: Lessons from the frontiers of AI. Brookings Institution Report.

Valle, E. D., Hildebrandt, T., & Schmidt, A. (2022). Artificial intelligence for scientific discovery: The next frontier. Nature Reviews Physics, 4, 105–123.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., … & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

White House, Office of Science and Technology Policy. (2022). Blueprint for an AI Bill of Rights.

Yuan, L., Fu, Y., Shen, W., Zhang, L., & Luo, J. (2021). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 7090–7108.

Zeng, A., Song, S., Welker, S., Lee, J., Rodriguez, J., & Abbeel, P. (2018). Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. arXiv preprint arXiv:1710.01330.

Zhang, M., et al. (2024). AI researcher survey: Will scaling large language models achieve AGI? AI Magazine, 45(2), 50–67.

Ziegler, D. M., Stiennon, N., Wu, J., Brown, T., Radford, A., Amodei, D., & Christiano, P. (2020). Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593.
