AI progress over the last three years has been incredibly swift. In 2022, OpenAI released ChatGPT, the world's first consumer product demonstrating a command of natural language approaching, though still inferior to, that of humans.
Previous models like GPT-2 had shown the ability to string together words into complete and somewhat coherent sentences, but nothing like an AI-powered chatbot that could hold a reasonably intelligent conversation about nearly anything.
User: help me write a short note to introduce myself to my neighbor
ChatGPT: Hi there, My name is [Your Name] and I just moved in next door. I wanted to reach out and introduce myself. I’m looking forward to getting to know you and the neighborhood. If you ever need anything or just want to chat, please don’t hesitate to stop by or give me a call.
Best,
[Your Name]
User: can you make it more formal?
ChatGPT: Dear Neighbor, I am writing to introduce myself. My name is [Your Name] and I have recently moved in next door. I wanted to take a moment to reach out and say hello. I am looking forward to getting to know you and the neighborhood. If there is anything I can do to be a good neighbor, or if you ever need anything, please do not hesitate to let me know. I am happy to help in any way that I can.
Sincerely,
[Your Name]
Less than four months later, OpenAI released GPT-4, which could write at roughly a high-school level and scored extremely well on standardized tests measuring world knowledge.

Once it was clear that standardized tests like the SAT would no longer be useful for measuring AI capabilities, researchers began relying on standardized tests of their own, known as benchmarks. The Massive Multitask Language Understanding (MMLU) benchmark is a set of about 16,000 multiple-choice questions on 57 topics, from physics and mathematics to history, law, and philosophy. Its creators found that unspecialized humans scored around 34% (only 9 percentage points above the 25% random-guessing baseline), while experts scored around 90% on questions in their domains. By 2024, GPT-4o achieved close to 90% accuracy on MMLU as well, "saturating" the benchmark and rendering it no longer a useful measure of progress.
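The baseline numbers above are easy to sanity-check: MMLU questions have four answer choices, so random guessing scores 25% on average. A quick back-of-the-envelope calculation:

```python
# Sanity-checking the MMLU baseline numbers from the text.
chance = 1 / 4                 # random guessing on 4-way multiple choice
unspecialized_human = 0.34     # reported unspecialized-human accuracy

# How far above chance do unspecialized humans score, in percentage points?
gap_in_points = round((unspecialized_human - chance) * 100)
print(gap_in_points)  # prints 9
```

This is why a 34% human score, which sounds low in absolute terms, is only 9 points better than guessing.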

At this point, it seems reasonable to say that current LLMs have memorized close to all of humanity's documented world knowledge. So, benchmarks have shifted to measuring how well they can use that knowledge to reason and problem-solve in the real world. Benchmarks like SWE-bench (Software Engineering Benchmark) measure a system's ability to fix problems in real codebases. It presents AI systems with real, documented bugs and feature requests in open-source projects, similar to what human software developers see every day. Completing these tasks requires understanding and modifying large codebases spread across many files, rather than writing or editing small code snippets. As of February 14, the record solve rate is about 65% for standard software development problems.
Short-Term Future
There are two main improvements in AI capabilities that have already begun and that leading labs like OpenAI and Anthropic plan to build out further in 2025: inference scaling and agents.
Inference scaling allows LLMs to adjust how much time they spend thinking about a question based on its complexity. This is conceptually similar to giving a model a scratchpad where it can
- plan its approach,
- work through the problem step by step, and
- review and refine its answer.
Currently, this process might take minutes for complex tasks. As technology improves, models will be able to spend hours or even days working on especially challenging problems.
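The scratchpad idea can be sketched in a few lines of code. This is a toy illustration only: the "solver" below is a stand-in, not a real model, and the budgeting rule is an invented placeholder. The point is just the shape of the process: harder questions get more intermediate steps before the final review.

```python
# Toy sketch of inference scaling: budget more "thinking" steps for
# harder questions. The solver is a stand-in, not a real LLM.

def scratchpad_solve(question, difficulty, max_steps=10):
    # Invented budgeting rule: harder questions get more steps, up to a cap.
    budget = min(max_steps, 1 + 3 * difficulty)
    scratchpad = [f"Plan: break down '{question}' into sub-problems"]
    for step in range(budget):
        scratchpad.append(f"Step {step + 1}: work through a sub-problem")
    scratchpad.append("Review: check the work and finalize the answer")
    return scratchpad

easy = scratchpad_solve("2 + 2", difficulty=0)   # 3 scratchpad entries
hard = scratchpad_solve("prove the lemma", difficulty=3)  # 12 entries
```

Real systems implement this with learned behavior rather than a hard-coded budget, but the plan/work/review structure is the same.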

Labs are also building AI agents, which will be able to act independently to perform tasks over long periods. While conceptually similar and complementary to inference scaling, agentic capabilities allow the model to interact with the world through continuous cycles of planning, acting, and adapting based on results.
To interact with the world, agents will have access to tools that let them
- write and execute code,
- search the internet,
- use a computer,
- send emails,
- access private files,
- and more.
These tools will be crucial for AI agents to work on complex projects like writing reports and building end-to-end software products. ChatGPT has been able to search the internet and execute Python code since March 2023, and computer use is currently in a research preview known as Operator.
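The plan-act-adapt cycle described above can be sketched as a simple loop. Again, this is an illustrative toy: the planner and the two tools are invented stand-ins, not a real agent framework.

```python
# Toy sketch of an agent's plan-act-adapt loop. The planner and tools
# are stand-ins for a real model and real integrations.

def run_agent(goal, tools, max_cycles=5):
    history = []
    for _ in range(max_cycles):
        # 1. Plan: choose the next tool given the goal and past observations.
        tool_name = plan_next_action(goal, history)
        if tool_name is None:  # the agent decides the task is done
            break
        # 2. Act: invoke the chosen tool.
        result = tools[tool_name]()
        # 3. Adapt: record the observation so the next plan can use it.
        history.append((tool_name, result))
    return history

def plan_next_action(goal, history):
    # Toy planner: search first, then write code, then stop.
    used = [name for name, _ in history]
    for name in ["search_web", "write_code"]:
        if name not in used:
            return name
    return None

tools = {
    "search_web": lambda: "found relevant docs",
    "write_code": lambda: "wrote and ran a script",
}
trace = run_agent("build a report", tools)
```

A production agent replaces the toy planner with a model call and the lambdas with real tool integrations, but the control flow is essentially this loop.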

This combination of increased intelligence and agency will pave the way for AI to take over much of the white-collar labor currently performed by humans. Once this accelerates the pace of robotics and material science research, blue-collar and manual labor jobs may not be far behind.
The precise extent of AI automation potential is still unknown, but even the most conservative forecasts predict a dramatic shift in the way humans work.
Tech executives are already planning to take advantage of these advancements. In a Joe Rogan interview from January, Mark Zuckerberg claimed that AI will be able to do the work of a “mid-level engineer” at Meta in 2025. Salesforce CEO Marc Benioff has similarly stated that his company does not plan to hire any more programmers, and Anthropic CEO Dario Amodei said in an interview with the Wall Street Journal that over the next three years, we will see models that are “better than us at almost everything.” Once these and other early adopters in tech show the rest of the economy that this is possible, other industries will follow suit.
But even if advances in intelligence stagnate, present technology already has the potential to displace a significant amount of white-collar work. A McKinsey report from 2023 found that in an early takeoff scenario, 50% of current work activities could be automated by 2030, and 90% by 2040. Even under conservative conditions, they expect 50% automation by 2045. Many other commentators like former OpenAI alignment researcher Leopold Aschenbrenner predict similar or even faster automation timelines.
Furthermore, McKinsey’s task-centric perspective is a more representative lens for analyzing AI automation. Current systems are not capable of eliminating entire occupations but do have the ability to automate whole categories of tasks, such as
- scheduling,
- filling out routine paperwork,
- performing routine financial calculations, and
- grading assignments.
Better integrating current technology would not radically alter how we work, but would eliminate a lot of tedium and routine administration, the same way computers and the internet did.

Human Bottlenecks and Possible Worlds
That said, it remains an open question whether artificial intelligence will eventually surpass human abilities in everything or merely almost everything.
If we never achieve superintelligence and some tasks remain at which humans are intrinsically superior, such as creative problem solving, those tasks will become the primary bottlenecks to scientific, technological, and economic progress. But such skills are already scarce at the highest level, and current systems have already matched many mid-level practitioners, so it is unlikely we can sustain a broad workforce on jobs requiring raw intelligence and problem-solving skill.
In this world, people in general will not be the bottleneck to progress; only the rare individuals who can outthink the machines and justify their comparatively enormous cost will be.
(AI systems’ strengths in pattern recognition and simulation make it unlikely humans will maintain advantages in other niche cognitive abilities like spatial reasoning or long-term strategy. Similarly, economic pressure to develop AI that interacts well with humans will drive advances in psychology and neuroscience, leading to systems that match or exceed human emotional intelligence. If there is something that humans remain superior at, it will be a product of raw intelligence.)
Now consider the possible world where humans are not a bottleneck to progress, and human-level or superhuman AI will be able to replace all labor, including the work required to start, run, and manage corporations. Fully automated firms will be able to copy and paste elite workers thousands of times to work on thousands of problems at once. Firms will be able to allocate enormous mental energy to even the most trivial tasks by simply spinning up more instances of highly trained, perfectly attentive AI employees. As Dwarkesh Patel says in "What fully automated firms will look like":
Copying will transform management even more radically than labor. It will enable a level of micromanagement that makes founder mode look quaint. [Alphabet CEO Sundar Pichai] simply doesn’t have the bandwidth to directly oversee 200,000 employees, hundreds of products, and millions of customers. But AI Sundar’s bandwidth is capped only by the number of TPUs you give him to run on. All of Google’s 30,000 middle managers can be replaced with AI Sundar copies. Copies of AI Sundar can craft every product’s strategy, review every pull request, answer every customer service message, and handle all negotiations - everything flowing from a single coherent vision.
In this world, humans become largely disconnected from economic progress, and it is not even obvious what “economic progress” means. If nobody is worth paying to work, then who will have the money to buy what these firms sell?
No one really knows how that would play out, but several possibilities have been floated.
- The government could institute a Universal Basic Income or outlaw certain types of labor automation, guaranteeing human jobs.
- Humans could revert to a subsistence economy made up of families and communities supporting each other.
- Automation and technological development could usher in an era of unimaginable material abundance, such that no one has to work for anything.
- Advancements in neural enhancement technology could allow us to increase our own intelligence and rise to the level of the machines.
There are strong arguments against all of these possibilities, and none of them stand out as particularly likely. But the fact that any of these are remotely feasible shows how much this would reshape the world, even if we do not yet know exactly how.
Conclusion

A common and reasonable objection to these kinds of predictions is that they sound wildly futuristic and would require an impossibly high amount of technological progress over such a short period. On top of that, these claims are so alien to human lived experience that at a gut level they seem unreasonable. Generally, this is good reasoning, because most things that sound this crazy do not happen. But in this case, lived experience is not a good model for technological growth.
Since the Industrial Revolution, we’ve seen rapidly accelerating changes to the way humans live.
- In the 18th century, the mechanization of agriculture began, reducing the energy required to grow food, freeing up more people to work on other things;
- In the 19th century, railways, the telegraph, and steamships drastically cut long-distance travel times, reducing the cost of transportation and allowing for more specialization;
- In the first half of the 20th century, electricity and the rise of mass manufacturing lowered the cost of goods and further increased work efficiency;
- In the 1960s-80s, mainframe and personal computers offloaded routine mathematical calculations;
- In the 1990s, the internet and cell phones began to transform the way we store knowledge;
- In the 2010s smartphones gave us the ability to contact anyone and learn anything from anywhere;
- and now, in 2025, we have machines that win math competitions and are better than most non-experts at tasks solvable by writing text.
The pace of transformative breakthroughs has compressed from centuries to decades, and now even to a handful of years. This acceleration isn’t an anomaly - it’s part of the fundamental pattern of human growth. Looking back, the world of 1900 might as well be an alien planet, and even 2010 feels like a different era entirely. We should expect nothing less dramatic from the decade ahead.