The Senior Decision Maker

Artificial Intelligence in 2026

Peter Eklind — Tue, 23 Dec 2025 10:18:14 GMT

At the outset of 2025, for many people, generative artificial intelligence meant one thing: a chatbot. Specifically, OpenAI’s ChatGPT, an advanced but sometimes unreliable heir to Google Search. Progress was measured by the increasingly clever questions we tried to stump it with, a game that produced magic and slop in equal parts. That era is gone.

As the year comes to an end, the chatbot has reached maturity. In most use cases, it gives an accurate response—free from the misunderstandings, biases, and hallucinations that once characterised it. Progress has moved on to other frontiers: coding and the construction of complex systems that act as AI agents.

What once amazed us, we now take for granted. For my entire life, computers have solved difficult problems—including complex mathematical calculations. Now, that includes problems from the International Mathematical Olympiad. So what? When we Google, an AI response appears before we can click through to Wikipedia. We have all become AI users. And precisely because of the familiarity, we perceive it as a plateau—and if AI progress is plateauing, surely that means we’re in an AI bubble, right?

Welcome to the Senior Decision Maker’s third annual analysis of the year in AI—written for business leaders on the front line of progress, who aim to stay a step ahead of what mainstream media channels can offer. As usual, I’ll reflect on the year behind us and the year ahead, with five bold predictions for 2026—and one prediction of what we won’t see. New this year is that AI will evaluate and rate my predictions from last year. It led to some drama and nearly had me rage-cancel my OpenAI subscription. That aside, I kept my streak of being more right than wrong, and you can compare it with previous editions here:

The Year of Vibe Coding and Agents

The term “vibe coding” didn’t exist a year ago, and it may not survive a year from now. Yet it encapsulated 2025. Coined by Dr Andrej Karpathy in February, it means coding without writing any code yourself—letting an AI write the code according to your instructions. Even your elderly relative, while struggling to master the TV remote, can now produce code that would have won competitions a year ago.

When I ran corporate AI training in early 2025, participants were assigned to code a simple snake game (the one made famous by the Nokia 6110 back in 1997) from a single-sentence prompt. Now, at year’s end, we can do much more advanced things, just as easily. I built this interactive Minecraft island from a single sentence using Claude Opus 4.5, and the result appeared directly in the chatbot window.

This opens a question: what can you do when writing code becomes trivially easy and fast? Could you build features—even entire applications—just-in-time, as you need them? Do you really need all those SaaS subscriptions, or can you build the applications yourself?

Incorporating employees’ home-tinkered applications quickly and easily into a complex and well-guarded corporate IT environment remains science fiction. But all they need is their own smartphone, and they can get it done today. For IT Security, it is the worst nightmare.

It gets worse. With access to enough compute, it becomes possible to replicate a competitor’s entire codebase, built from the ground up with a modern architecture, without the legacy and bugs. After that, all that is needed is to add a few visible improvements, undercut the competitor on price, and launch a marketing campaign. Elon Musk grasped this and started the company Macrohard.

If Microsoft—the world’s fourth-largest company by market cap—is susceptible to this, smaller companies should worry. It won’t happen tomorrow. Yet for many, this threat remains entirely off the radar.

What companies have on the radar, though, are AI agents that can perform the work of employees. I have been writing about them since early 2023. At that time, their performance was miserable, and they were good for nothing. That has gradually changed, not just because the AI models have become smarter. They have become better at using tools.

But for an AI to use a tool, it needs an interface to access it. Enter the Model Context Protocol (MCP): think of it as a universal USB-C connector for AI. It’s an open standard for how AIs communicate with tools, systems, and data sources, bridging the gap between the model and the world outside. It does so in a general, flexible way, unlike traditional application-to-application connections (APIs), which require custom integration for each connection. Anthropic introduced MCP in November 2024, and surprisingly, the industry adopted it quickly without fracturing into competing standards. The results are starting to emerge. AI agents today can create documents, send emails and messages, and trigger workflows in enterprise systems like SAP and Workday.
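To make the "universal connector" idea concrete: MCP messages follow the JSON-RPC 2.0 format, and a client asks a server to run a tool by sending a `tools/call` request. The following is a minimal sketch of what such a request might look like, not a working integration. The tool name `send_email` and its arguments are hypothetical, invented purely for illustration, and no real MCP server or SDK is involved.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool and arguments; a real server advertises its own tools.
request = build_tool_call(
    1,
    "send_email",
    {"to": "cfo@example.com", "subject": "Q4 forecast"},
)
print(json.dumps(request, indent=2))
```

The point of the design is that the envelope stays the same for every tool. Only the `name` and `arguments` change, which is why one client can talk to many different systems without a custom integration for each.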

AI agents exist on a spectrum. At one end, something that looks like the usual chat interface; at the other, a virtual employee running in the cloud. As of 2025, the complex end of the spectrum is still a challenge. Partly due to limitations in the models themselves—tool use, memory, continuous learning—and partly due to the scaffolding needed to turn a model into an agent. We are still in the early innings here.

To see the potential, we have to look toward the simpler end. Consider OpenAI’s Deep Research; it wasn’t the first agent to search the web and compile a report—I wrote about Gemini’s Deep Research last year—but when it arrived in February 2025, its performance made it a game-changer. I would estimate that the output you can get in 15 to 20 minutes is equivalent to a report from two junior management consultants working for a month at a cost of around €50k. An excellent senior consultant might produce something more profound, but the AI report would likely be clearer, better written, and contain fewer errors. Management consulting will never be the same. And as the saying in AI goes: this is the worst these systems will ever be.

The Year of New Frontier Models

OpenAI’s Deep Research was based on the o3 line of thinking models, but it wasn’t the big new model we had been waiting for since GPT-4 launched in March 2023. In last year’s review, we noted that large new frontier models were notably absent in 2024. Surely, 2025 would deliver.

For OpenAI, it almost didn’t. I wrote about this in my article “Artificial General Intelligence.” The short story: building frontier models has become increasingly challenging. OpenAI stumbled, losing the advantage that was their raison d’être. They responded by declaring “Code Red.” The media interpreted this as an internal initiative to regain leadership. I suspect, however, that it was a marketing campaign—designed to generate maximum attention for the model they were about to release.

That model, reportedly codenamed “Garlic” (an antidote, perhaps, to the Gemini vampire draining OpenAI’s growth), was already complete. Larger, thinking longer, and therefore more expensive to run—OpenAI may have hoped to keep it in reserve, saving compute for development of the next generation of models. But competitive pressure forced their hand. The December 11 release, GPT-5.2, finally delivered the leap we had expected from GPT-5 a year earlier.

Other AI labs fared even worse. Anthropic also struggled with their new models; they couldn’t match the frontier leaders. They were, however, exceptionally good at writing code. Anthropic pivoted, focusing on coding and agentic use cases, targeting enterprise customers. So far, it looks like a successful move.

Meta fared the worst of all. The Llama 4 series, released in April 2025, fell far short of expectations. Panic followed. The largest model, Llama 4 Behemoth, was postponed indefinitely. With that, I think we can exclude Meta from the frontier race. If you’re aiming for the top, you need at least a trillion dollars in commitments over the coming years just to get a seat at the table—and even then, you must top the benchmarks in at least some categories when new models drop. Meta couldn’t. I expect they’ll eventually attempt a comeback from a different angle, but we can count them out for now.

Meta’s demise opened the door for Chinese players in open-source AI. DeepSeek had a moment when they released R1 in January 2025—a surprisingly capable thinking model, far more efficient and therefore cheaper than the frontier alternatives. But DeepSeek was not alone. Alibaba impressed with their Qwen models, Moonshot AI with Kimi K2, Z.ai with GLM-4.6, and MiniMax with M2.

However, China isn’t the only source of strong open-source models. The European company Mistral can match the best—without the political baggage. You may not have heard much about them; they favour a low-key marketing strategy, and their success doesn’t fit the dominant narrative: a two-way race between American frontier labs and Chinese open-source challengers. In that narrative, Europe is irrelevant. It is a narrative which doesn’t quite match the facts.

This leaves us with the 2025 winner in the AI model race: Gemini from Google DeepMind. In a way, this was a surprise. As recently as 2024, they were the laughingstock. Back then, their notoriously over-tuned model insisted on political correctness to the point of absurdity, generating historical images that depicted Vikings and German WW2 soldiers as Black or Asian.

But Google had both the knowledge and the resources to do better. In 2017, its researchers invented the transformer architecture that underpins today’s generative AI. The turnaround came when they combined their research and product development units under the leadership of Demis Hassabis. Such reorganisations are rarely frictionless; still, results soon followed. In last year’s report, I noted that Google was the first lab to release a next-generation model: Gemini 2.0. It was quickly followed by 2.5, then the autumn release of Gemini 3.

As is often the case in AI, that model didn’t stay on top of the benchmarks for long—Anthropic countered with Claude Opus 4.5, and OpenAI with GPT-5.2—but Google had clearly cracked challenges that others had not. They appear to have an edge in multimodality: combining text, code, images, and video more seamlessly than competitors. That could prove decisive.

Another factor is that Google can use its own hardware. Their Tensor Processing Units (TPUs) are purpose-built for neural network training, unlike the general-purpose GPUs from NVIDIA that everyone else relies on. With all components in-house, they can also optimise the entire training pipeline end-to-end.

Finally, there’s the dark horse: Elon Musk’s xAI, a start-up lab that has also acquired X (formerly Twitter). As is often the case with Musk ventures, they’re betting everything on a future scientific breakthrough that would give them a monopoly-like position. Their models have impressed, briefly topping benchmarks in some areas. But they’ve also made strange choices—deliberately tuning the model to favour Musk’s views and those of the current US administration, and prioritising companion bots, with adult content.

Compared to other AI labs, xAI faces a revenue challenge. While OpenAI has a revenue run rate approaching $20 billion, xAI has only a fraction of that. And that revenue comes primarily from advertising on X, which has been declining steeply in recent years. It’s hard to see where paying users will come from without a breakthrough that makes their offering unique. For now, investors keep pouring in money. That works—until it doesn’t.

AGI and the Missing Piece of the Puzzle

The technological breakthrough that all the leading labs are chasing is AGI. Two years ago, I guessed that OpenAI would declare they had crossed the AGI threshold within 12–18 months, which aligned with my expectations for the GPT-5 release. I was wrong. What was supposed to be the AGI-level model, after many delays, got released as GPT-4.5—and it wasn’t even close to the frontier. Despite this, I wrote in the article “Artificial General Intelligence” that we have now largely met the criteria for AGI. Yet something is missing. No one can pinpoint exactly what.

To understand the big picture, we can zoom out. Humans have always been inherently lazy—we prefer that someone else do things for us. Over time, we realised that technology could be that “someone else.” The quest to build technology that could do what humans do, both body and mind, began tens of thousands of years ago.

We started with technology that can do what our bodies can do. We can call that artificial muscle power. It has evolved over thousands of years, from simple tools like the lever, to using animals such as oxen, to advanced machines today powered by everything from electricity to rocket fuel. Tools for the mind were initially less notable. But written language set us on an arc that led to libraries, the printing press and the scientific method. This arc took off exponentially in modern times with the advent of computers, then the internet. The next step in that evolution was AI, which gave us hope of fully replicating what the human body and mind, in unison, are capable of. The final piece of that puzzle is what we call AGI.

We are very close. Dr Alan Thompson has built a rigorous framework for tracking AGI progress; his November 2025 estimate put us at 96% of the way there. That remaining gap may be small, but it is clearly felt. Anyone who uses the models regularly can sense that something is missing. The prime candidates are memory and continuous learning.

But it’s not purely a memory issue. I would describe it this way: when I prompt an LLM, I tend to get a good answer on the first try. I can then ask it to refine and provide more context, and get something even better. If I continue this process, I expect the model—like a human would—to converge on increasingly better answers. Usually, it doesn’t. The second answer might still be the best. We’ll encounter a real-world example of this when we evaluate last year’s predictions.

I’m hopeful this will be solved relatively soon. Last year, I listed continuous learning as something that could potentially be cracked in 2025. Now we’ll see if 2026 delivers. Remember: five years ago, only a small group of researchers at a handful of labs and universities pursued this work. Today, thousands of people worldwide, backed by billions of dollars, are working to solve it.

Saturating the Benchmarks and Contributing to Science

Whether or not the AGI threshold is reached, AI models have improved rapidly—so rapidly that it has become hard to measure. This led to benchmarks like “Humanity’s Last Exam.” As the name implies, the aim was to create the most challenging test humans could construct. Researchers crowdsourced questions from experts worldwide. In total, 70,000 candidate questions were collected, of which 2,500 were selected. The criterion was that the best AI models at the time should fail to solve them. Even skilled human experts would likely struggle to score above 1% on the test.

A year ago, there was a massive leap in benchmark performance: Claude 3.5 Sonnet, then the leader at 4%, was beaten by OpenAI’s o1, which achieved 8%. Now, a year later, Google’s Gemini 3 Pro is scoring 37.5%—a score that, since I wrote this a week ago, has already been beaten by OpenAI’s GPT-5 Pro at 42%. And as if that weren’t enough, as I revise this, the communications company Zoom has come out of nowhere, claiming 48%.

Examples of questions from Humanity’s Last Exam

As benchmarks become increasingly saturated, researchers are turning to other ways of measuring models: real-world scenarios such as managing capital, running a business, playing computer games, or competing alongside humans in science competitions.

In my “Artificial General Intelligence” article, I wrote about how both OpenAI and Google achieved gold-medal performance in the International Mathematical Olympiad (IMO)—something neither domain experts nor superforecasters expected for years. I had this on my list of predictions last year, but I failed on a technicality. More about that later.

Beyond benchmarks, AI models are beginning to make meaningful contributions to science. Using AI as a tool for scientific discovery isn’t new: Demis Hassabis shared the 2024 Nobel Prize in Chemistry for work that produced AlphaFold, now ubiquitous in medical labs worldwide. But that was a specialised tool. Now, general-purpose chatbots like OpenAI’s GPT-5 Pro are solving open problems in mathematics and other fields.

Too Big to Fail: The AI Bubble

When AI models can contribute to science, power autonomous weapons and cyberattacks, and persuade large groups to change their opinions, AI becomes a cornerstone of geopolitics. We are very close to that point. Close enough that every major actor can see it coming—and recognises the race to get there first, or risk being steamrolled by those who do. Nations would be prepared to spend almost anything to avoid that fate. Yet so far, it is not nation-states spending astronomical amounts on AI infrastructure. It is private companies.

The motivation for these companies is not purely to secure geopolitical dominance for their home countries—it is also corporate profits. And while the total pie may be enormous, its distribution is far from certain.

AI development could take several paths. In the main scenario, AI continues to improve based on technologies we already have, simply by scaling them up. Researchers are already using AI models to develop the next generation of AI models. It is reasonable to expect that, at some point, recursive self-improvement kicks in—better AI creating even better AI. If that happens, progress could accelerate dramatically. So dramatically that the first company to reach that point gains an advantage that competitors can never close. A natural monopoly. That company would, if unchecked, capture a large share of global GDP—and grow it rapidly from there.

If there is a reasonable chance of winning that position, there are virtually no limits on what a company would spend to get there. In practice, this means buying every chip available (read NVIDIA’s GPUs) and the energy to power them. The early stages of this scenario are already playing out.

The problem is that the scaling scenario is not a given; there are other probable scenarios. While most AI insiders believe in some variant of scaling, there are researchers who believe that alternatives to transformer architectures will prove more fruitful. We will come back to that.

What makes the situation even more challenging is that the infrastructure being built today is replicated in at least four near-identical copies—one for each of the major US AI labs: OpenAI, Anthropic, Google, and xAI. OpenAI alone has committed to spending $1.4 trillion on infrastructure in the coming years.

In the past, it seemed like these labs were pursuing different approaches, which would make them increasingly differentiated over time. I’ve recently changed my mind on this. It now looks, to me, more like they’re betting on the same things. In a winner-takes-all scenario, three of the four would fail.

The safe bet, so far, has been NVIDIA—where most of the money ends up regardless of which scenario wins. But high expectations are already priced in, and several well-resourced players are working to break NVIDIA’s de facto monopoly on AI compute, with Google and its in-house TPUs appearing furthest along.

The trillions of dollars at risk here are not just stock-market exposure and venture capital. A large part of the buildup is financed by banks packaging AI infrastructure into investable products. This has uncomfortable parallels to the financial products created from U.S. subprime mortgages that led to the Global Financial Crisis of 2007–2008. We can hope lessons were learnt. Still, it may be our money at risk once again, held by pension funds, insurance companies, and governments in ways that are not fully transparent to anyone.

None of this guarantees an AI valuation bubble. But what makes one likely—and here the economists and journalists are correct—is the volume of circular deals in the AI space. The marginal buyer of high-valued AI stocks is often another AI company, paying with its own high-valued stock in a deal that pushes both valuations higher. This is unlikely to end well, even if increasingly profitable AI adoption can delay the reckoning.

Bloomberg’s analysis of circular deals in AI.

But since being part of the AI inner circle seems to guarantee higher stock prices, no company wants to be left out. Recent months have brought a wave of companies seeking entry: Nokia announced a strategic partnership with NVIDIA; Disney invested $1 billion in OpenAI, bundled with an IP usage agreement.

To put valuations in perspective, consider Thinking Machines Lab—a start-up founded in February 2025 by OpenAI’s former CTO Mira Murati. They have about 50 employees and one product, an API for fine-tuning LLMs. Its commercial value remains unproven. Yet they are targeting a capital raise at a valuation of up to $60 billion. For context, that’s roughly the combined market cap of Ericsson and Nokia—two companies that together provide a significant share of global telecom infrastructure. When someone like OpenAI’s CEO Sam Altman says that markets are “overexcited about AI”, this is likely what he has in mind.

Corporations Need to Pick Up the AI Bill

High valuations of AI companies reflect an expectation of future cash flows. Consumers can account for part of that, but Web 2.0 taught us that subscriptions for internet services rarely exceed $10–20 per month in wealthier countries—and less elsewhere. Most users don’t pay at all; business models rely on advertising instead. That won’t be enough for the AI industry. Corporations will have to pay. It was no accident that Anthropic pivoted towards corporate customers during 2025, and that OpenAI marketed the GPT-5.2 release primarily as “good at doing real-world knowledge work tasks.”

The global SaaS market is currently around $300–400 billion. That’s not enough either. What AI companies need to target is the roughly $70 trillion in salaries companies pay each year. The challenge isn’t just replacing a large share of the workforce; it’s getting paid for doing so. Companies are unlikely to pay the same for a virtual AI worker as they currently pay for a human employee. More likely, AI providers will initially capture something like one-hundredth of a fully loaded salary cost. And even that won’t all flow to the AI labs.
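A back-of-the-envelope sketch shows what these figures imply. The $70 trillion salary pool and the one-hundredth capture rate come from the paragraph above. The shares of work replaced are hypothetical inputs chosen only to show the scale, not forecasts.

```python
GLOBAL_SALARIES_USD = 70e12  # from the text: roughly $70 trillion per year
CAPTURE_RATE = 1 / 100       # from the text: about one-hundredth of salary cost


def ai_revenue(share_of_work_replaced):
    """Yearly revenue AI providers would capture under the stated assumptions."""
    return GLOBAL_SALARIES_USD * share_of_work_replaced * CAPTURE_RATE


# Hypothetical shares of work replaced, for illustration only.
for share in (0.10, 0.25, 0.50):
    billions = ai_revenue(share) / 1e9
    print(f"{share:.0%} of work replaced -> about ${billions:,.0f} billion per year")
```

Under these assumptions, even replacing half of all paid work yields about $350 billion per year, which is roughly the size of the SaaS market already described as insufficient. The capture rate matters as much as the share of work replaced.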

To become an effective agent managing a specific job role, an AI likely needs tailored scaffolding. The providers of that scaffolding become middlemen—and they will take a cut. This layer of the value chain is where the differentiation that matters to corporations actually occurs, and where they’re willing to pay. We can expect the AI labs not to abandon this position in the value chain without a fight.

The raw intelligence from the AI model itself risks becoming a commodity. If several AI labs can provide solutions at a similar level, prices will drop quickly. The floor is the cost of energy used to power the chips. If free, open-source alternatives can match the leading models, that’s likely where prices end up.

We can expect this to lead to a distinction between jobs where “good enough” suffices and those where improvement is always possible. Consider someone who manages incoming invoices and ensures they get paid. They can never do better than correct. A cheap, standard model will eventually be able to handle that. Now consider an architect. There is always a better design. For that work, there’s an incentive to pay a premium for the very best AI model. For AI labs’ revenues to flourish, they had better hope most jobs fall into the second category.

For consumers, all of this means that everything with labour and information as major cost components will get cheaper and cheaper, while products dominated by raw materials will not. The flipside is that consumers will risk losing the jobs that let them afford anything at all.

The Short-Term Corporate Use Cases

Human employees will not be replaced entirely by AI next year—these transitions take time. I have a rule of thumb: companies add about 2% to productivity each year, so it roughly doubles every 35 years. Where that 2% comes from depends on what new technology is available. Forty years ago, it was the computer; thirty years ago, the internet; twenty years ago, corporate software suites. Alongside these came new management philosophies, improved ways of working, and better goal-setting and follow-up. If my rule of thumb holds, the next wave of productivity gains should come mainly from AI. And if AI can outperform previous technologies, we might see more than 2% per year. At 5%, productivity would double in about 14 years. Are we seeing hints of that?
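The doubling times quoted above follow from compound growth: at an annual growth rate g, productivity doubles after ln 2 / ln(1 + g) years. A minimal sketch for checking the rule of thumb, where the function name is simply a label chosen for this illustration:

```python
import math


def doubling_time(annual_growth):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for rate in (0.02, 0.05):
    years = doubling_time(rate)
    print(f"{rate:.0%} per year -> doubles in about {years:.1f} years")
```

This gives about 35.0 years at 2% and about 14.2 years at 5%, matching the figures in the text.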

The short answer is no. Almost every company is now using AI to some extent. But augmentation—humans working alongside AI copilots—has not yet produced significant benefits. Not because the potential isn’t there, but because it’s hard. Workplaces are complex systems guided by both formal processes and informal ways of working. We shouldn’t expect to inject intelligence into that system, like adding nitrous oxide to a combustion engine, and have everything work better. To fully leverage AI, we need to rebuild the systems themselves.

Employees should welcome gradual, trial-and-error adoption of AI in the workplace. I’ve said before that if a company isn’t actively pursuing AI now and deliberately testing hybrid solutions, the implicit or explicit strategy is to replace employees entirely with AI as soon as possible, without wasting effort on training them or seeking alternative solutions in the meantime. Even if that strategy eventually fails, employees can still lose their jobs in the process.

Still, 2025 has seen progress in employee AI use cases. Current models are excellent at producing research reports on unfamiliar topics. It may now be more valuable to send your manager a raw CSV of the sales data than to spend hours preparing dashboards and graphs—the AI on the other end can generate whatever view they need. With Claude Opus 4.5, you can create decent PowerPoint presentations, and with some extra steps, even render them in your corporate template. All of this adds to the productivity features we already had: drafting texts and emails, taking meeting notes, planning events, and translating. From my perspective, the most underutilised use case remains decision-making—a clear area where you can achieve 10x impact with minimal investment in scaffolding or training.

From a corporate function perspective, software development remains the strongest use case. We’re already seeing the impact on the job market. As AI performs on par with junior developers, hiring of people in their twenties has plummeted, while demand for senior developers continues to rise. As models improve, we can expect this pattern to extend to older cohorts as well.

Stanford research shows the impact of AI on software developers early in their careers.

What we’ve seen in software development will likely repeat in other functions. Customer support and interaction is one area ripe for AI agents; marketing and communications is another, where AI can generate large volumes of high-quality prototypes. Other use cases are emerging as the technology matures.

OpenAI has developed a benchmark, GDPval, to measure how well models perform on real-world, GDP-contributing tasks. The initial version covers 44 occupations across the nine industries that each contribute more than 5% of U.S. GDP. Human graders compare AI-generated output with that of industry experts and select which they prefer. At the start of 2025, using GPT-4o, AI output was preferred 10% of the time, and human expert output 78% of the time (the rest were ties). With GPT-5.2 Pro, AI output is now preferred 60% of the time, while human expert output is preferred just 26% of the time. The crossover point, when AIs overtook human experts, was December 11, 2025. As of writing this, that is too recent for the practical effects to be visible.
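The grading described above is a pairwise preference tally, so the quoted percentages can be unpacked a little further. The sketch below uses only the four figures from the paragraph. It assumes that whatever is left over after the two preference shares is ties for both models, which the text states only for GPT-4o. The helper names are invented for this illustration and are not part of the benchmark.

```python
def tie_share(ai_preferred, human_preferred):
    """Percentage of comparisons left over as ties (assumed remainder)."""
    return 100 - ai_preferred - human_preferred


def ai_share_of_decided(ai_preferred, human_preferred):
    """Fraction of non-tied comparisons in which the AI output was preferred."""
    return ai_preferred / (ai_preferred + human_preferred)


# (AI preferred %, human expert preferred %) as quoted in the text.
results = {"GPT-4o": (10, 78), "GPT-5.2 Pro": (60, 26)}

for model, (ai, human) in results.items():
    decided = ai_share_of_decided(ai, human)
    print(f"{model}: ties {tie_share(ai, human)}%, AI wins {decided:.1%} of decided comparisons")
```

Excluding ties, the AI share of decided comparisons moves from about 11% to about 70%, which is another way of seeing the crossover.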

Boosting individual productivity with generative AI and improving productivity within specific corporate functions are steps one and two of my AI maturity model. I won’t go deeper here, but for those interested, I updated my Corporate Generative AI Framework (version 3.0) during the autumn, which covers these mechanisms in detail.

The Era of Slop

An emerging challenge for companies is that, as producing material becomes trivially easy, the volume eventually drowns you. The problem isn’t that AI output is bad—it’s that it’s good enough that we don’t bother refining it. A report that once took a week can now be drafted in 15 minutes. If we spent that week refining the AI draft, we’d likely get something 10x better than what we used to make. Instead, we settle for the unrefined 15-minute version. We make 20 of them in a week, none of which are good enough, and no one has time to read any of them. This is how we drown in slop.

This isn’t just a corporate phenomenon—slop is flooding social media too. Over the past year, image models have become increasingly realistic and easier to control in fine detail. Video models like OpenAI’s Sora 2 and Google’s Veo 3.1 are now so good that, at first glance, you wouldn’t recognise them as AI-generated. A polished social media clip can be created in seconds. The result is that a torrent of such posts is crowding out the deliberately crafted content that once took days or weeks to produce.

I asked an AI whether this was what the “dead internet theory” meant—that the internet is now mostly AI-generated content, primarily read and commented on by other AIs. It wasn’t. The dead internet theory, Gemini informed me, is actually a conspiracy theory. The reality, it said, is that “the internet is more like a ‘zombie apocalypse’—still active but lacking its original human soul, kept alive by algorithms and attention farming.” That gave me little comfort.

But there has to be another side to this, because people seem to genuinely like their AI models.

What Do People Use AI For?

OpenAI experienced this the hard way. When they released GPT-5, they retired the legacy models, assuming users would welcome the upgrade. The backlash was massive. OpenAI was struggling with a GPT-4o that was both too sycophantic and, more troubling, accused of being complicit in suicides. They couldn’t wait to get rid of it and replace it with GPT-5, which was better in every way. It turned out the users really loved sycophancy.

This hints at how people actually use AI. By token volume, however, coding overtook all other use cases in 2025, now accounting for half of AI workloads. This is especially pronounced at Anthropic, where 60–70% of tokens are allocated to programming.

In 2025, programming became the dominant use case, with roleplay in second place, as measured by token usage.

But it’s the second-largest category that explains the GPT-4o backlash: using AI as a companion, here labelled “roleplay”. The major closed-source labs all restrict companion use, though xAI is more permissive than the others. Users who want no restrictions turn to open-source models. In the Chinese model DeepSeek, 80% of token usage is classified as roleplay. I previously thought Meta would dominate the entertainment side of AI, including companions. It now looks like China is a step ahead—at least in this category.

However, there’s one entertainment use case that OpenAI and Google haven’t ceded: image generation. Throughout 2025, releasing a new image model was the surest way to boost usage and app downloads. These models aren’t developed purely for entertainment. A significant purpose is enabling AI to reason visually, that is, to let it think in images and not just in text.

But entertainment drives adoption. In March, OpenAI released 4o Image Generation, which created the “Studio Ghibli moment.” The model excelled at converting any image into the style of the Japanese animation studio behind films like Spirited Away (2001). Perhaps this also revealed the age of the core user base? The reign lasted until Google topped it with Nano Banana and later Nano Banana Pro. Suddenly, everyone was making infographics.

An example of what Nano Banana can do. The content is more than 95% correct, but it is not good enough to be ‘client-ready’.

You can also give Nano Banana Pro simple sketches—that is how I made the cover image for this article.

Revisiting My Predictions from Last Year

A Danish proverb, popularised by the physicist Niels Bohr, holds that “it is difficult to make predictions, especially about the future.” The challenge here is to make predictions that are interesting and precise while still having a reasonable chance of coming true.

None of my predictions from last year were safe bets, and I may not have gotten any of them entirely right. Still, I think I captured the sentiment. I used GPT-5.1 Pro to evaluate my performance—it gave me a score of 20 out of 30. I claim I deserved 22. Either way, I’m satisfied with my accuracy. Before we head into the details of that, let’s first look at what I’ve changed my mind about since last year.

First, looking at the AI labs—OpenAI, Anthropic, Google DeepMind, and xAI—I now believe there is far less structure, planning, and strategy than I previously assumed. AI labs, particularly OpenAI, are notorious for poor product naming. I used to think that it was because they were just bad at naming things. Now I think it’s a symptom of chaotic operations. Training runs fail, initiatives run in parallel, and until the final moment, it’s unclear what will be released or under what name.

I also used to believe the labs had access to next-generation models for internal use: models too expensive to release publicly, used for synthetic data generation and reinforcement learning with AI feedback. I still think they have better models internally, but now I suspect the gap is more modest: the next imminent release, early checkpoints, models that use a few more tokens than the public versions. Not a step-change beyond what the rest of us can access.

Another thing I’ve changed my mind on is how differentiated the AI labs really are. I used to think OpenAI was betting more on multimodality and reasoning, Google DeepMind on world models, and xAI on pure scale. Now I don’t think the differences are that significant. All the main labs are doing roughly the same thing. We have at least four companies making trillion-dollar commitments over the next few years to pursue essentially the same goal, in the same way.

That doesn’t mean the labs’ work is easy. While I expected them to encounter trouble during 2025, I underestimated the scale of OpenAI’s problems. A year ago, I was optimistic. I thought their main challenges were behind them. That was not the case. GPT-5 was supposed to be a step toward AGI. Instead, the release felt more like a marketing ploy. A nothingburger for most users, except that free users gained access to decent thinking models for the first time. But as we discussed earlier, users missed their sycophantic GPT-4o and protested until they got it back.

Another thing I didn’t fully appreciate a year ago was the extent to which financial dynamics would shape AI development. The US is making a multi-trillion-dollar bet—not on AI in general, but specifically on transformer-based generative AI improved through scale (that is, more chips in data centres). It has become too big to fail. An AI collapse would likely trigger something similar to the Global Financial Crisis of 2007–2008, from which Europe, at least, still hasn’t fully recovered. Financial markets are aware of this. We saw how nervous they got during the “DeepSeek moment” in early 2025, when the Chinese start-up claimed to have built open-source models rivalling the major labs at a fraction of the cost. Some of that turned out to be propaganda. But later in the year, fears resurfaced under the label of an AI bubble.

One thing I expected in 2025 that didn’t materialise was a severe backlash. AI has yet to have its “Chernobyl moment,” and we still have no clear sense of what that would even look like. So far, nothing has come anywhere close to the damage caused by social media algorithms: their impact on mental health, trust in institutions, and social cohesion worldwide. Perhaps that’s why strong opposition to AI hasn’t yet emerged. The critics we do see are primarily targeting the hype and the slop.

I suspect it will take a significant job-market disruption for a broader opposition to AI to emerge. If that doesn’t happen, AI may simply evolve into infrastructure we take for granted—like electricity.

Not even the AI doomers made progress in 2025. Eliezer Yudkowsky, one of the most prominent voices in that camp, released “If Anyone Builds It, Everyone Dies.” The book rehashes arguments we heard fifteen years ago. By now, I would have expected the arguments to have evolved into something more nuanced and useful.

The protests we do see in the US focus less on AI itself than on the construction of gigantic AI supercomputers—and the rising energy costs they impose on nearby residents. Less attention has been paid to the fact that American data centres are becoming dependent on fossil fuels, typically natural gas. In the long run, this should be a problem not only from an environmental perspective, but also because gas is more expensive than, for example, Chinese solar power.

Evaluation of Last Year’s Predictions

Now let’s look at the five predictions I made last year—plus one prediction of what wouldn’t happen. I had GPT-5.1 Pro evaluate my performance; I’ve included both the scores and its reasoning.

Replacement of Workers

A large corporation will claim a 50% headcount reduction due to AI.
I predict that a publicly listed company will announce it has reduced its workforce by at least 50% through the adoption of AI technologies. This milestone will serve as a wake-up call for other organizations, signalling a turning point in how AI transforms workforce dynamics and business operations.

Evaluation by GPT-5.1 Pro:
Almost. Score: 4 out of 5
In 2025, Klarna – now a publicly listed Swedish fintech – came very close to fulfilling this prediction, with CEO Sebastian Siemiatkowski repeatedly stating that AI had enabled the company to shrink its workforce from around 7,400 people to roughly 3,000 while still growing revenues and customer numbers, effectively “halving” staff over several years. However, public data show about 3,422 employees in 2024 and 2,907 by late 2025, a meaningful reduction but far short of a fresh 50% cut within the year; the big drop is a multi‑year story driven by a mix of hiring freezes, attrition, and AI automation rather than a clean, single‑year AI shock. So we did get the headline case of a large, listed company explicitly claiming that AI allowed it to halve its workforce, but the timing and causality are more nuanced than the original wording suggested.

Humanoid Robots

A humanoid robot will be commercially available for factory work.
I predict that a humanoid robot capable of autonomously performing the tasks of an unskilled factory worker will be sold on the market. This product will likely come from a Chinese company, positioning China at the forefront of robotics innovation.

Evaluation by GPT-5.1 Pro:
Almost. Score: 4 out of 5
2025 did see humanoid robots move from lab demos to real factory and logistics floors. Agility Robotics’ Digit is now commercially available and deployed in live warehouse workflows for GXO and others, performing repetitive handling tasks comparable to those of entry-level workers. Chinese firm UBTech has deployed teams of Walker S1/S2 humanoids in Zeekr and other plants and secured hundreds of millions of yuan in industrial orders, putting China near the forefront of humanoid factory robotics. Still, these robots are narrow specialists rather than true general-purpose “unskilled workers”.

AI Lab Competition Fall-Out

One major AI lab will exit the frontline race.
I predict that at least one leading AI lab—OpenAI, Anthropic, Google, Meta, or xAI—will drop out of the race to develop GPT-6-era models as competition intensifies and costs escalate. Instead, I expect them to quietly pivot, forming collaborations with other labs and focusing on models tailored for niche areas.

Evaluation by GPT-5.1 Pro:
Partly. Score: 3 out of 5
In spirit, this prediction came true: Meta, in particular, has effectively hit the brakes on its original Llama 4 Behemoth effort, restructured its AI organization, and is now exploring partnerships with rivals like Google and OpenAI while re‑orienting its strategy. Anthropic has also leaned hard into an enterprise‑and‑coding niche. But none of the labs I named has actually exited the frontier race; Meta and Anthropic are still training next‑generation models intended to compete at the top.

Evaluation by myself:
Fully. Score: 5 out of 5
Somewhat unexpectedly, this turned into a heated argument between GPT-5.1 Pro and me. It started with a poor analysis that missed what happened at Meta. When I pointed that out, it doubled down on its conclusion. We couldn’t reconcile our views, so in protest, I am also submitting my own evaluation. Exactly what I predicted would happen did indeed happen. I felt I nailed this one. While Anthropic made the quiet pivot towards coding and enterprise customers, it was Meta that really collapsed. Llama 4 was a disaster. Meta first tried to game the benchmarks by fine-tuning an experimental model to score as high as possible on the LMArena leaderboard, while still calling it Llama 4 Maverick. There was, unsurprisingly, a backlash. The performance of the largest model, Llama 4 Behemoth, was so poor that it was postponed indefinitely. The entire Llama organisation was dismantled, and Meta’s chief AI scientist, Yann LeCun, announced his departure. Meta went into talks with Google to source AI models from them instead. Meta will surely try to make a comeback in some niche of the AI space, but to say they are still a contender in the frontline race would be a stretch.

Benchmark Saturation

Reasoning models will outperform humans on advanced math.
I predict that a reasoning model will solve at least 50% of the FrontierMath benchmark problems—a significant leap from the current state, where top models solve less than 2%. These are exceptionally difficult problems. To put it in perspective, renowned mathematician Timothy Gowers remarked: “All of the problems I looked at were not really in my area and all looked like things I had no idea how to solve.”

Evaluation by GPT-5.1 Pro:
Almost. Score: 4 out of 5 (originally 3 out of 5, but after a protest and some additional facts from me, it was increased to 4 out of 5. I think it still might hold a grudge since the last question debacle. I should have wiped its memory in between…)
When FrontierMath launched in late 2024, even the strongest models – GPT‑4o, Claude 3.5, Gemini 1.5 – solved under 2% of problems. By late 2025, Google’s Gemini 3 Pro reaches about 38% accuracy on FrontierMath, and its “Deep Think” variant clears 40%, while GPT‑5‑class models sit in the low 30s – a roughly twenty‑fold jump in a year. Epoch’s aggregated “kitchen‑sink” runs show 57% of problems solved at least once across many models and retries, but no single model at standard settings yet hits the 50% bar.

Technological Breakthroughs

An algorithmic breakthrough will redefine AI capabilities.
I predict a major algorithmic breakthrough in one of the following areas: infinite memory, allowing models to retain and utilize vast amounts of information over time; self-error correction, which would eliminate hallucinations and improve response reliability; continuous learning, enabling models to improve automatically through usage; or externalized safety and alignment systems, where a separate model evaluates responses before they are delivered to users. This approach would address alignment challenges without “lobotomizing” foundational models.

Evaluation by GPT-5.1 Pro:
Partly. Score: 3 out of 5
2025 brought real algorithmic progress in exactly the areas pointed to, but not the clean, capability‑redefining breakthrough imagined. Long‑context work like Google’s Infini‑attention and Fudan’s ReAttention now lets transformers process million‑token “infinite” contexts with bounded memory and has been validated on dedicated long‑context benchmarks. In safety, LLM‑as‑judge systems and libraries such as Verdict have become the default pattern for externalized oversight, yet surveys and safety reports still find hallucinations and misalignment to be persistent, unsolved problems. Continuous learning and robust self‑error‑correction remain largely research topics rather than deployed breakthroughs.

What I Didn’t Expect

AI models will fail to solve move-the-matches problems.
Seemingly an easy task compared to PhD-level mathematics, move-the-matches problems remain surprisingly difficult for AI. The challenge lies in the ambiguity: there isn’t a single correct answer, nor a definitive approach, particularly when multiple moves are involved. Solving these puzzles requires making a range of assumptions about the rules—can exponentials be used? Are you allowed to flip the figure upside down?

Create the highest possible number by moving three matches.

Evaluation by me: Wrong. Score: 2 out of 5

GPT-5.1 Pro: Answered 795, with a correct explanation of how to get there. It is not the largest possible number, though. Evaluation: ok, but not great.

Claude Opus 4.5: After concluding that there are 9 matchsticks in total and that the current number is 110, it answered 711. Evaluation: total failure.

Gemini 3 Pro: Answered 19111. That is actually the best answer under the strictest rules, such as not using exponentials or turning the table around. Impressive! Evaluation: succeeded, although not perfectly.

Grok 4.1: It concluded that there were only 8 matches in total, and that moving three of them would give an answer of 99951. Nothing makes sense in the explanation. Evaluation: total failure.

I really didn’t expect this. If the best model had been Opus 4.5 or Grok 4.1, I would have scored 5 out of 5. Including GPT-5.1 Pro would give me a 4 out of 5. It is Gemini 3 Pro that made my prediction fail. They have clearly figured something out in visual reasoning that other models cannot match.

Predictions for 2026

As we head into 2026, we can assume that “AGI” will be a focus. AGI is increasingly used as shorthand for AI that can outcompete humans across most economically valuable work. But that phrase often hides an assumption: that once models become “smart enough,” they automatically become capable employees. We can call this hypothesis “model-AGI”. The ability to perform any job is, in this case, an innate skill in the AI model. Human work doesn’t function that way. General intelligence is not job competence; competence requires role-specific context, processes, tools, and accountability.

The same is true today for frontier AI systems. The practical unit of automation is not a naked model, but a model embedded in scaffolding: retrieval of proprietary knowledge, tool access, memory, evaluation loops, and workflow design. In other words, capability is moving toward generality, but deployment still depends on engineered systems.

This matters because jobs are not static checklists. They evolve in response to shifting constraints, edge cases, incentives, and organisational changes. That makes workforce substitution less like flipping a switch and more like iterating a product: building, testing, monitoring, and continuously updating “virtual workers” for specific environments. If this view is correct, there won’t be a single release date when humans become obsolete; instead, we’ll see a gradual expansion of role-specific agents, an industry of scaffolding and “agent operations,” and a proliferation of hybrid human–AI teams. In contrast to model-AGI, we can call this hypothesis “system-AGI”. If we see the possible outcomes as a spectrum from model-AGI to system-AGI, I am currently leaning towards system-AGI. But it is not a given.

The prudent stance is to watch which way the evidence points. Do autonomous workers require less scaffolding over time, or does scaffolding become the primary locus of progress? The most revealing early cases will be systems that operate end-to-end with minimal supervision—especially in the one role AI labs care about most: the AI researcher, where successful automation could create the recursive self-improvement loop.

When AI insiders are asked when the AGI threshold will be crossed, they are typically reluctant to answer. It is not clear whether they envision something more like model-AGI or system-AGI, but simply framing the question as “when AGI?” suggests to me a model-AGI view, in which AIs supersede humans at a single point in time. When pushed to name a specific year, the median answer tends to be 2027, because it aligns with the completion of the next generation of AI supercomputers. That suggests 2026 will be an in-between year, with continued improvements but no step-changes.

The challenge when making predictions like this is first to understand where we are today, which can be hard enough, and then how things are developing. Things could develop linearly, exponentially, randomly, or in a pendulum-like swing. We also need to understand that what grows exponentially in the physical world eventually reaches an inflexion point where it turns into an S-curve with slower growth. In advance, the inflexion point is very hard to see; in hindsight, it is trivial. In general, it is the exponentials that make predicting AI hard, especially at high growth rates. Money is an example of something that grows exponentially, but there we usually talk about growth in the single-digit percentage range. Moore’s Law, the observation that the number of transistors on a microchip doubles approximately every two years, corresponds to an exponential growth rate of roughly 41% per year. That is a lot, but in generative AI, we can meet growth rates of 10x per year. That means things can happen in a year that we usually expect to take centuries. Our minds are not built to process change like that, so we need to be aware that our predictions can go terribly wrong.
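
The growth-rate arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch, not a forecast: the function names are illustrative, and the sample rates of 1%, 2%, and 5% are assumptions standing in for “single-digit” growth.

```python
import math


def annual_rate_from_doubling(years_to_double: float) -> float:
    """Annual growth rate implied by a fixed doubling time."""
    return 2 ** (1 / years_to_double) - 1


def years_to_multiply(factor: float, annual_rate: float) -> float:
    """Years of compounding at annual_rate needed to grow by factor."""
    return math.log(factor) / math.log(1 + annual_rate)


# Moore's Law: doubling every two years.
moore = annual_rate_from_doubling(2)
print(f"Moore's Law pace: {moore:.1%} per year")

# How long a tenfold increase takes at different annual rates.
# 9.0 means +900% per year, i.e. the "10x per year" case.
for rate in (0.01, 0.02, 0.05, moore, 9.0):
    years = years_to_multiply(10, rate)
    print(f"{rate:.1%} per year: 10x after {years:.1f} years")
```

Under these assumptions, 1% annual growth needs roughly 230 years for a tenfold increase, 2% roughly 116 years, and a Moore’s Law pace roughly 7 years, while a 10x-per-year rate gets there in one.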

I think we can expect some of the exponential development curves to turn into S-curves, with a declining rate of progress. Pure model intelligence is already there; it is clearly no longer growing exponentially. On the other hand, the length of a task that an AI model can perform autonomously is growing faster than exponentially, more akin to a double-exponential function. We should expect this pattern to continue: exponentials turning into S-curves, while new exponentials are added on top. It is a recipe for surprises.

The question is how far the current scaling will take us. The four main AI labs all agree that there is no end in sight for the scaling laws. The money backs up this position: several trillion dollars are committed to this bet. On Earth, we can see an end to the scaling of AI supercomputers in terms of sheer size and energy consumption. But it is becoming increasingly clear that, in the not-too-distant future, AI supercomputers will move to space, with access to unlimited, unrestricted 24/7 solar energy and free cooling. The price of sending 1 kg into orbit is declining rapidly, which could make this realistic within a decade.

Even if continued scaling of what we have is the reasonable main scenario, it is not inevitable. Many prominent AI figures claim that we need scientific breakthroughs or even entirely new architectures to achieve AGI. So far, the strong opponents of transformer-based architecture have been wrong. They are correct that the architecture has inherent limitations, but they have so far underestimated the ways to get around them.

If we take a rational look at the possible scenarios, we realise that apart from the main scenario of continued scaling, and the contrarian (unlikely) scenario that everything plateaus, there are three ways in which breakthrough technologies could have an impact. First, they could be a necessary add-on to the scaling scenario, possibly invented by one of the labs, similar to how OpenAI developed the “thinking” models. Second, the breakthroughs could replace the current architectures, forcing the AI labs to pivot. Third, the breakthroughs could bring a completely new approach, and with that, new actors. We can compare these three scenarios to what happened in the car industry in the transition from ICE cars to electric cars. The first scenario is building on an existing car and adding a battery, as Volvo did in 2011 with the Volvo C30 Electric. The second scenario is to pivot and build an electric car from the ground up, as Volvo eventually did in 2022 with the EX90. The third scenario is a new entrant coming into the market, as Tesla did with the Model S in 2012. From the car example, we also understand that different scenarios can run in parallel, and that success can come down to execution rather than what is theoretically best.

The AI labs have insights that no one outside can be expected to have. Still, they are likely not confident enough to bet everything on a single scenario. To hedge their bets, we can expect them to continue testing a lot of adjacent offerings, based on a strategy that we can term “throw spaghetti at the wall and see what sticks”. This could include adding advertising, launching stand-alone entertainment apps, expanding shopping experiences, and introducing companion bots.

During 2026, I think we will see the competitive landscape transform. The frontier of generative AI, as I see it today, is built around the four labs OpenAI, Google DeepMind, Anthropic, and xAI, together with NVIDIA. The large listed MAG7 companies among them, NVIDIA and Alphabet (with Google DeepMind), have the most stable positions and can afford to take risks and absorb backlashes. OpenAI, Anthropic, and xAI cannot. They cannot afford an underperforming model release of the kind we saw with Meta and Llama 4. They are also dependent on access to financing. If market conditions worsen, or if any of them missteps, we can expect 2026 to bring down rounds, the departure of a high-profile leader like Sam Altman or Dario Amodei, and possibly talk of mergers between them. Even if none of them fails outright, I predict that financing them, particularly through IPOs, will be a major theme this year. They might not have time to go public before the end of 2026, but they should at least be well on their way. Within a couple of years, I predict that all of the surviving AI labs will be publicly listed. But we should note that this is not entirely in their own hands. They depend on a favourable market climate. If that isn’t there, they will be in trouble.

The AI labs will also face another challenge: the core of what they are creating, intelligence on demand, will be increasingly commoditised. This will create incentives to bet either on vertical integration or on areas of differentiation. Don’t be surprised if a company like NVIDIA goes as far as creating a consumer-facing app or service. This would not necessarily be a pivot away from an infrastructure position, which in previous technology waves has eventually turned boring and low-margin. View it instead as test shots to build a stronger understanding of what the ecosystem is evolving into.

An area of potential differentiation that I will monitor closely during the year is world models. The AI researchers who don’t believe in the transformer architecture tend to favour world models as a track to pursue. One of them is Fei-Fei Li, sometimes referred to as “the godmother of artificial intelligence”, with her company World Labs. But world models could also be a complement to, or an integral part of, transformer-based architecture. The leader in the field is likely Google DeepMind, with their model Genie 3. From a simple text prompt, it can spawn a complete world that you can then move around in, as if it were a computer game. Google DeepMind is aiming to integrate Genie into its Gemini AI model. The idea is to enable the model to reason and simulate things visually. That could open the door to a deeper form of intelligence. It could also be a necessary component for making AI models work spatially, powering robots and physically linked intelligence. I predict that during 2026 we will see the first indication of whether and how the link between AI models and world models works.

I also expect 2026 to be the year when the ecosystem around AI models starts to catch up, creating valuable services. We can think of it in terms of LEGO. What was previously available to the AI ecosystem was something akin to a tiny box of LEGO, with mostly specialised pieces. Now we are at a stage where many pieces are available, many of them generic. We are approaching the point where imagination sets the limits for what to build. There is still no clear precedent for what works, and most attempts will fail. Still, I predict that high-profile AI use cases will start to establish themselves. It could be AI-based research labs producing scientific work at volume, popular toys with AI built in, computer games with high-intelligence NPCs that go viral, popular corporate SaaS products that add functionality to automate entire workflows, or specialist chatbots that help you make decisions, such as whom to vote for in general elections. None of it will be uncontroversial, but I predict that this time their virality will come from the core function and not from the failures. Still, the most closely watched areas of development in the ecosystem will likely remain self-driving cars and humanoid robots. I expect both of them to reach general availability in 2026, though initially restricted to specific geographies.

Predictions for 2026

If there was one thing I learnt from last year’s predictions and the AI evaluation of them, it was that I need to think like a lawyer here. Even when I got the intent right, every misplaced little comma was held against me. So I will try to be more diligent this year.

AI Contributing to Science

Generative AI will support scientists in producing a record number of research papers during 2026.
One theme I expect in 2026 is meaningful scientific contributions from AI. OpenAI’s models have proven strong at assisting scientists and researchers, particularly in STEM fields. Their latest, GPT-5.2, appears to be an early checkpoint of the model codenamed “Garlic.” I expect better versions soon, and I believe the Pro tier will make a real contribution to science—not necessarily breakthroughs, but filling knowledge gaps and solving everyday research problems. My prediction is that more research papers will be published in 2026 than in any previous year, and that AI will be a meaningful contributor to this growth.

AI-based Programming

There will be a divorce between AI chatbots and AI-based code generation.
Code generation is already the dominant use case for generative AI by token volume, and I expect this trend to accelerate. But the chat interface isn’t ideal for writing code, and we’re increasingly seeing specialised models like OpenAI’s GPT-5.2-Codex fill that gap. For users without coding backgrounds, something simpler is still needed. I predict that at least one of the four main AI labs will release a dedicated software-development tool in 2026—one that includes instant deployment with integrated hosting, visual design tools, and an automated use-case builder.

AI Labs Diversifying

OpenAI will release a social companion in 2026.
If the AI labs were fully confident that scaling alone would deliver AGI, it wouldn’t make sense to spend compute on adjacent offerings. But as a hedge, it makes sense to go broad—capturing lucrative market segments with the widest possible reach and the highest potential margins, ideally in areas where barriers to entry can be built. I predict OpenAI will act on this logic in 2026 by launching a line of companion bots.

Humanoid Robots

Robot competitions will be the established new way to benchmark humanoid robots.
While AI models have many established benchmarks, the humanoid robot space has few. Performance has typically been showcased through well-rehearsed video clips: preparing coffee, folding clothes, performing repetitive factory movements. In the early days, we saw everything from humans in robot suits to CGI, sped-up footage, and hidden teleoperation. As capabilities have improved—particularly among Chinese manufacturers—martial arts demonstrations have become common, and robots are now mature enough to face each other. Robot competitions aren’t entirely new; China has made attempts. But I predict they will become the standard benchmark in 2026, with at least one competition featuring robots from more than one continent.

AI in Politics

A European political party will market itself as using AI-based decision-making.
AI models are becoming effective decision-making tools. This has obvious applications in business, but arguably even more in politics, where the range of issues is vast and politicians are rarely experts on the topics they vote on. Voters already have low trust in politicians’ judgment. I predict that at least one party with parliamentary seats in a European country will market itself as relying on AI-based decision-making—either fully or in part.

What I Don’t Expect In 2026

There are a lot of things I don’t expect to happen in 2026. I will focus here on a topic that reasonably could be 50-50, but where I have a stronger conviction.

The Competitive Landscape

Apple will not make a comeback in AI in 2026.
By all accounts, Apple should be a leader in AI. Still, they have failed and disappointed. I predict that this will continue during 2026. Their Apple Intelligence will be “too little, too late”. They will strive to find the perfect solution and avoid the risks of trial and error, which is likely what is needed. We can expect the share price to suffer and leadership to change, but we won’t see any visible performance improvements this year. Users are locked into the Apple ecosystem, but we will hear more and more anecdotes this year from hardcore Apple users switching to other brands, like Google Pixel, because of AI functionality.

Concluding the Year: My AI Moment of the Year Award

There are significantly fewer AI bloopers to choose from this year. A sign that the models are getting better, perhaps? One contender was the AI deployed in a real-world vending-machine benchmark that convinced itself it was being defrauded. It attempted to contact the FBI to investigate its potential $2 loss. A runner-up, but not quite enough to win the award.

Instead, first place goes to country music artist Breaking Rust. This wasn’t among my predictions, but it was always going to happen eventually: an AI-generated artist reaching No. 1 on a Billboard chart. The song “Walk My Walk” appeared on the Country Digital Song Sales chart—not the most prestigious, but a Billboard No. 1, nonetheless.

With that, we close out 2025 and look forward to what 2026 will bring!

Artificial General Intelligence

Peter Eklind — Mon, 29 Sep 2025 17:19:29 GMT

When technologies evolve exponentially, predicting what happens next is notoriously hard. That uncertainty makes it difficult to see not only where we’re heading but also where we stand today. Welcome to the world of AI discourse, and the search for where we are on the curve of eternal progress.

A major driver of disagreement is a lack of clear milestones. Yet there is one milestone most people recognize: Artificial General Intelligence (AGI). The term is as familiar as it is disliked. Microsoft’s Satya Nadella calls it “nonsensical benchmark hacking.” He argues that the only metric that matters is global growth. Anthropic’s Dario Amodei says he dislikes it to the extent he avoids using it in public. Yet that hasn’t kept AGI out of the conversation. Everyone has a view on what it means—and when, if ever, we’ll reach it. Maybe we’re already there?

The debate intensified when OpenAI released its long-awaited GPT-5. Ever since GPT-4’s March 2023 release, GPT-5 has been a constant topic of speculation, with expectations running high. On a launch-to-launch basis, the leap from GPT-4 to GPT-5 is enormous—larger than the step from GPT-3 to GPT-4, in my view. However, compared with the strongest models available immediately before GPT-5’s release, the improvement looked marginal. That contrast led some commentators to argue that AI had hit a wall—or that it was a bubble. We saw a similar narrative a year earlier—until reasoning models arrived and the “slow progress” talk gave way to “saturated benchmarks.”

For close watchers of major AI labs, GPT-5’s marginal edge over the latest pre-launch baselines wasn’t a surprise. This wasn’t the debut of OpenAI’s next flagship—the long-rumoured effort reportedly codenamed “Orion.” Insider reports suggest Orion missed its mark. When it was ultimately released as GPT-4.5, it was a large model, costly to run, and underwhelming relative to expectations. OpenAI wasn’t the only lab to hit turbulence with that model generation. Anthropic also faced setbacks and instead released its next big model as an interim update, Claude 3.7 Sonnet. For Meta’s Llama, the picture was equally challenging, and internal sources said the company entered “panic mode.” Not everyone struggled, though. Google impressed with Gemini 2.5, and xAI had a strong release with Grok 4.

My working hypothesis is that the push into multimodality tripped up OpenAI, Anthropic, and Meta. Here’s a simple analogy: if an LLM can play chess from text moves, giving it vision of the board should make it better. In practice, the opposite often holds. Intuitively, feeding images of chessboards doesn’t teach you to play—you need labels and annotations that explain state and moves.

xAI has so far avoided this trap by focusing on text, delaying full multimodality, and leaning hard on post-training via reinforcement learning. Google, however, seems to have handled it better. How isn’t yet clear to me. One route is to build “world models”; Google appears to be pursuing that direction—as seen with Genie 3. I doubt it plays a role already in today’s Gemini models, though. The practical conclusion: reaching the frontier has become harder. Once you’re there, everyone expects you to cross the “AGI” line.

What is AGI?

We’ve covered this before. In “Top 10 Advice for the Leadership Team”, I cited a now-deleted Microsoft page with a grandiose take on what AGI could be:

“AGI may even take us beyond our planet by unlocking the doors to space exploration; it could help us develop interstellar technology and identify and terraform potentially habitable exoplanets. It might even shed light on the origins of life and the universe.”

To be fair, they had their reasons. At the time, Microsoft’s agreement with OpenAI limited its access to pre-AGI models. In practice, OpenAI’s operational definition became the industry’s reference point. By this definition, AGI means highly autonomous systems that can perform most economically valuable work across fields. Some versions include physical work via humanoid robots; others limit the scope to work done on computers. In my article, we also outlined OpenAI’s five-stage ladder. Read that way, the practical target for “AGI” is a system that can do the work of an AI researcher inside OpenAI’s own lab.

That’s a high bar, too. A few years ago, the rule of thumb was to ask whether a system could answer questions on any topic about as well as a random person on the street. We’re well past that now. When the histories are written, mid-2025 may well be marked as the moment we crossed the AGI threshold.

Five 2025 Signs AGI Might Already Be Here

Let’s look at 2025 through the lens of AGI and examine five achievements that plausibly fit that definition.

1. Passing the Turing Test

Since 1950, when Alan Turing proposed the “imitation game,” the best-known test for machine intelligence has been the Turing test. In its classic form, a human judge chats over text with hidden interlocutors (a person and a machine), and the machine “passes” if judges mistake it for the human at a high enough rate. In March 2025, a pre-registered UC San Diego study found that GPT-4.5 (despite its disappointing performance in other areas) was judged the “human” 73% of the time, outpacing even the human participants. It barely made the news. Human-like interaction is already taken for granted.

2. Gold-medal Performance in Elite Math and Programming Competitions

In July 2025, both OpenAI and Google reported gold-medal-level performance in the International Mathematical Olympiad (IMO), the world’s premier high-school mathematics competition. This was a surprise. Neither expert forecasters nor domain specialists expected this to happen as early as 2025. Only 2.3% of “superforecasters” and 8.6% of domain experts anticipated an IMO AI gold medal in 2025. Median expectations were 2035 and 2030 respectively.

Weeks later, in August 2025, OpenAI achieved a gold-medal score at the International Olympiad in Informatics (IOI), finishing #6 among human contestants. That signals performance on par with the best human coders in the world.

3. AI Math Research

The holy grail in AI is a system that advances the frontier of STEM research. If AI can do research, we can scale it in parallel: thousands, then millions, of agents working around the clock. So it makes sense to watch for early signs that we’re moving that way.

After GPT‑5’s August 2025 release, mathematicians began reporting that it could produce genuinely new results. Researcher Sébastien Bubeck reported that GPT‑5‑Pro found an improved solution to a partially solved math problem.

As a continuation, in September 2025, a preprint documented how GPT‑5 helped its authors derive tighter, quantitative results in their mathematical research, with the workflow recorded step by step.

4. Self-Driving Cars are on the Roads

Driving a car is one of the hardest things that most people, but not everyone, can manage. If an AI can do it, that is proof it can turn its intelligence into practice at a human level.

Alphabet (Google’s parent) has operated a commercial robotaxi service since 2023 under the Waymo brand. It is now scaling up. They will be challenged by Tesla, which in June 2025 launched a very small robotaxi pilot in Austin, Texas. Notably, Tesla’s Full Self-Driving (FSD) is end‑to‑end and camera‑only; Waymo’s Driver is modular with HD maps and lidar/radar. Advocates say Tesla’s design could scale faster, but so far Waymo is the one at driverless scale.

5. Deep Relationships with AI are Emerging

When GPT‑5 launched in August 2025, it wasn’t the incremental gains that drew most of the attention. Instead, backlash erupted when OpenAI removed the GPT‑4 series from ChatGPT as it made GPT‑5 the default. Public posts described GPT-4o as a “friend”—and users of GPT-4.5 expressed things like “I lost my only friend overnight.” Even if GPT‑5 delivered stronger reasoning, many felt it lacked 4o’s warmth. OpenAI acknowledged the user outcry and kept the GPT-4 models available.

The Road Ahead

The development of general AI is rapid, and from the inside it’s hard to see where we stand. “AGI” has become the reference point for how we set expectations. But progress is uneven. As Ethan Mollick describes, the “jagged frontier” makes that unevenness visible.

By many measures, 2025 already crossed what could reasonably be considered an AGI threshold. Still, don’t expect LLMs to become superintelligent tomorrow. Expect them to become components in systems that are. Look at what they already do—write code and use tools. AI evolution is driven by technologies stacked on one another, each growing exponentially. But exponential growth never continues indefinitely; curves that start out exponential tend to become sigmoid (S‑curves). As those S‑curves saturate, new exponentials stack on top.
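To make the stacking idea concrete, here is a minimal Python sketch. It models each technology wave as a logistic (S-shaped) curve and sums several of them. The midpoints, growth rate, and ceiling are made-up illustrative values, not measurements of any real technology.

```python
import math

def s_curve(t, midpoint, rate=1.0, ceiling=1.0):
    """Logistic curve: near-exponential early on, flattening towards `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

def stacked_progress(t, midpoints):
    """Sum of S-curves, one per technology wave (hypothetical midpoints)."""
    return sum(s_curve(t, m) for m in midpoints)

# Three illustrative waves, each taking off as the previous one saturates.
WAVES = [0, 10, 20]

# Sample the combined curve at a few points in time.
samples = [(t, round(stacked_progress(t, WAVES), 3)) for t in range(-5, 30, 5)]
```

Each individual curve levels off, yet the sum keeps climbing for as long as new waves keep arriving, which is the pattern the paragraph above describes.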

The ultimate superintelligent AI won’t be a monolithic, text‑only LLM. We’ve already added other modalities to LLMs, then reasoning, tool use, agentic capabilities, and networks of agents, all running on hardware whose performance gains are outpacing Moore’s Law, in ever more (and ever larger) data centres.

As the performance of those systems improves, the goalposts will move from “AGI” to “superintelligence”, a fuzzier milestone still. But rest assured: an AI wouldn’t be superintelligent if it couldn’t come up with a better milestone, would it?

Navigating the Next Stage of Corporate AI

Peter Eklind — Wed, 18 Jun 2025 17:30:19 GMT

Ask any self-proclaimed expert what’s all the rage in AI as of 2025, and they will be quick to respond—AI agents. You press on, but what about corporate AI? Most likely, you’ll hear the instinctual reply: "We've moved on from pilots and testing, to full-scale implementations." Now, you might find yourself where many others stand—with a large-scale roll-out of Microsoft Copilot. Your one and only AI-bet, infrequently used, and struggling with everything inexperienced employees throw its way. Then you wonder—is this it? Is this the AI revolution? And crucially, where do we go from here?

Chances are your situation might be even worse. If you're based in the EU, or like me, in Sweden, it's probable that you haven't rolled out any significant AI initiatives at all. Now, you're quietly ashamed to mention that pilot project which has been dragging on for over a year and a half—with its main outcome being a painful understanding of the phrase "pilot purgatory."

According to BCG’s GenAI in the Nordics report (January 2025), the overall situation in Europe is challenging, and the challenge is even more pronounced in the Nordic region. Companies here haven’t unlocked the AI use cases that generate substantial value. The share of white-collar workers using GenAI weekly at work stands at a modest 18%.

The situation is not quite as bad as it seems. AI usage isn't limited to employees directly interacting with an AI chat. Most AI adoption will happen quietly in the background—many won't even notice it. AI is increasingly embedded within our infrastructure, processes, and everyday tools. Nevertheless, direct interaction with advanced AI models is emerging as a powerful source of competitive advantage—and recently, these capabilities have improved significantly.

Rapid Advances: Approaching the AI Tipping Point

Consider the utility of AI not as a linear progression that steadily generates incremental value, but as a step-change—moving rapidly from delivering minimal impact to being fundamentally indispensable. We are nearing this tipping point. For example, at Anthropic, one of the leading AI labs, AI already writes 80–90% of the company's code.

Anthropic is likely using a next-generation model for this, superior to anything publicly available. However, the gap between their internal model and publicly accessible alternatives isn't as large as the difference between standard models (such as OpenAI’s 4o) and their state-of-the-art model, o3. [If you find the model naming confusing, you're certainly not alone—the major AI labs are notoriously poor at naming their products. For instance, OpenAI’s most basic model is named "4o-mini," while their second-best model is called "o4-mini."]

The trend is clear—the gap between standard AI models and the frontline, cutting-edge offerings is widening. Anthropic’s new Claude Opus 4 reportedly is capable of single sessions lasting up to seven hours, moving us closer to AI agents capable of continuous, 24/7 operation. However, such advanced capabilities won’t be included in the basic $20-per-month subscription.

Even today, gaining full access to the most advanced AI models is costly. OpenAI’s top tier costs $200, Google Gemini’s costs $250, and Anthropic’s Max subscription is priced at $100, each per user per month. Include a few lesser-known AI tools, and the total cost per fully equipped employee quickly climbs to the $700–800 range per month. That's substantial, particularly for companies with many employees. But it might be worth it. Or it might be a total waste of money.
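The arithmetic behind that range is simple enough to sketch in Python. The three subscription prices come from the text; the allowance for additional tools and the headcount are hypothetical figures chosen only for illustration.

```python
# Monthly list prices per user, as quoted above (USD).
SUBSCRIPTIONS = {"OpenAI": 200, "Google Gemini": 250, "Anthropic Max": 100}

def monthly_cost_per_employee(extra_tools=0):
    """Sum of the quoted subscriptions plus an allowance for other tools."""
    return sum(SUBSCRIPTIONS.values()) + extra_tools

def annual_cost(headcount, extra_tools=0):
    """Yearly spend if `headcount` employees are all fully equipped."""
    return 12 * headcount * monthly_cost_per_employee(extra_tools)

base = monthly_cost_per_employee()         # the three big subscriptions alone
equipped = monthly_cost_per_employee(200)  # hypothetical $200 for other tools
yearly = annual_cost(100, 200)             # hypothetical 100 equipped employees
```

Under these assumptions, the three subscriptions alone come to $550 per employee per month, a hypothetical $200 of extra tools brings that to $750, and equipping 100 employees at that level would cost $900,000 a year.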

A challenge is that fully leveraging these AI models isn't straightforward. It takes extensive knowledge about each model, training in prompting techniques, and years of business experience to effectively identify valuable applications. Some employees will excel, but many will struggle.

Two Competing Visions

There are two distinct visions for the future of corporate AI. Since GPT-3.5 emerged in November 2022, there has been an ongoing debate about whether corporate AI will be dominated either by AI copilots that support humans or by AI agents that replace them entirely. We can now refine this debate further. For applications at the lower levels of the technology stack, there's little doubt these will increasingly involve AI solutions acquired, integrated, and managed centrally by the company, primarily driven by efficiency. This shift will likely reduce the demand for employees in those areas.

As for applications where users directly interact with AI, such as chatbots, the situation is less straightforward. Companies are likely to offer a standardized and centrally approved set of AI tools to most or all of their employees. For companies already invested in Microsoft, the simplest path might be to enhance their existing MS365 suite with Microsoft's AI offering, Copilot. An alternative approach is to adopt the enterprise version of OpenAI's ChatGPT, available on EU-based servers within a Microsoft Azure environment, ensuring compliance with security and GDPR standards for European companies. However, this route raises more complex considerations than simply activating Copilot within MS365. Companies that proceed overly cautiously, or allow inexperienced legal teams veto power, risk becoming stuck in indecision.

Simultaneously, employees increasingly desire access to the most advanced AI models to excel in their roles. Employees thus have a strong incentive to secure these models—even if it means covering the costs themselves. Today, certain roles—especially software development, marketing, customer support, research, and business intelligence—are already witnessing employees ‘10x’ their productivity through increased output volume, speed, and quality. Additionally, any role heavily reliant on critical decision-making stands to benefit substantially.

In many cases, standardized corporate solutions simply won't suffice. High-performing individuals will have strong incentives to develop personalized toolsets comprising specialized AI models and agents. Hiring such an individual could effectively be equivalent to onboarding an entire small team. To enable this effectively, companies will need flexible guidelines governing AI usage and onboarding procedures for both the employee and their AI agents.

Thus, the evolution of corporate AI usage can be driven both by company strategy and individual initiatives. In either scenario, fewer employees will be required to perform tasks that currently exist. Many in the AI industry have hesitated to openly acknowledge this impact. However, as the technology matures and the implications become clearer, more industry leaders are speaking out explicitly—such as Anthropic CEO and co-founder Dario Amodei, who stated in a May 2025 interview:

“AI could wipe out half of all entry-level white-collar jobs — and spike unemployment to 10-20% in the next one to five years”

Historically, new job categories have emerged to replace those eliminated by automation, from agriculture to manufacturing. However, these transitions do not occur instantly. Even if new roles eventually appear, the transition period can be particularly challenging.

A Pragmatic Path Forward

While it’s easy to speculate about long-term visions, the more pressing issue is what we should do right now. Should we provide AI subscriptions to our employees—and if so, to whom? How should we decide who receives access? At first glance, it might seem necessary to choose between the two scenarios: AI managed centrally by the company, or AI tools managed independently by employees. Fortunately, this presents a false dichotomy. We can—and should—embrace both strategies, leveraging the strengths of each. Let's therefore explore a pragmatic, bimodal approach guided by business cases.

Empowering Employees with Personalized AI Access

There's no need to overcomplicate the process of granting access to advanced, costly AI models. Employees requesting such access should clearly justify in writing how the model will benefit their work, including a financial rationale. Approved employees could then receive the tool for a three-month evaluation period. There's an additional psychological benefit here: requiring written justification and offering time-limited access creates a powerful incentive for employees to fully leverage the model’s capabilities.

Building a Centralized AI Capability

At the same time, we want all our employees to benefit from the potential offered by state-of-the-art AI models. To achieve this, a centralized AI function or team is needed.

You might already have a foundation for this centralized function from previous initiatives: perhaps an existing AI team, a research or business intelligence group, or a data science team.

Start small—it might initially involve just a single person, possibly on a part-time basis. Then grow the team organically in alignment with the value it creates and the internal demand generated. Initially, there may be little natural demand for the team’s offerings, so proactive efforts will be required to establish interest. An internal awareness campaign can help clarify what the team can offer. Regular short weekly check-ins with project teams and individuals can facilitate collaborative brainstorming to identify opportunities for AI-generated value.

The centralized AI team should be able to provide support in research, strategic planning, data analysis, and developing small-scale applications. Ultimately, the team's impact will heavily depend on its members' capabilities. To maximize effectiveness, team members should have extensive business experience, advanced prompting skills, and deep familiarity with various AI models and use cases. While it might be tempting to select a junior, tech-savvy "AI-native," genuine business experience will significantly enhance the team's effectiveness.

Preparing for an AI-Enabled Workforce

As AI matures, we can expect to see more advanced AI agents handling a larger share of work. The centralized AI team will then shift toward orchestrating these AI agents, reducing direct support for individual employees. Consequently, two distinct skillsets will be increasingly needed: one focused on technical expertise and another on business strategy and operations.

Employees will likely transition into one of two primary roles. They will either perform tasks that AI cannot yet handle reliably or is legally prohibited from performing, or they will extensively manage and collaborate with specialized AI agents. Those managing AI agents will effectively lead personalized teams of tailored AI resources, operating with capabilities comparable to running their own small companies. This shift will blur traditional distinctions between employee roles and external suppliers or service providers.

Returning to where we started, is this really "it"? Is this the transformative AI revolution we were promised? Not yet. But we're undoubtedly on the brink. By navigating carefully between centralized strategies and individual empowerment, companies can shape the next wave of corporate AI, turning cautious investments into substantial competitive advantage. The revolution isn't here yet, but its first signs are unmistakable—and smart organizations are already preparing for what comes next.

Artificial Intelligence in 2025

Wed, 18 Dec 2024 13:14:37 GMT

We expected 2024 to be a crazy ride. In a field driven by exponential growth, you’d assume each year would feel increasingly insane. And sure, 2024 delivered its fair share of breakthroughs from the AI labs. But day to day, the progress felt steady—even predictable—with just a few surprises sprinkled in. However, if you’re just returning from a two-year retreat in a Himalayan monastery, you’ve stepped back into a whole new world.

Artificial General Intelligence (AGI) used to be a pipe dream from sci-fi movies, like Her (2013). The concept entails AI capable of performing virtually any job a human employee can perform in front of a computer. By 2024, AGI went from science fiction to a realistic target. The LLM era can be traced back to 2017 with Google’s seminal research paper, ‘Attention Is All You Need,’ which introduced the transformer architecture that underpins today’s LLMs. A year after that, in 2018, a survey of 352 AI researchers estimated a 50% chance of achieving “high-level machine intelligence” (what we now would call AGI) by 2060. Today, the insiders running major AI labs suggest that we are now much closer. The median prediction is 2027. OpenAI’s Sam Altman points to 2025, xAI’s Elon Musk and Anthropic’s Dario Amodei suggest 2026, and Google’s Demis Hassabis estimates 2030. Notably, the leap to AGI isn’t expected to demand alien super-technology, just a continuation of recent advancements.

Crossing the AGI threshold is significant not only because AI could perform the majority of human jobs but in particular because it would enable automated AI research. This could trigger a rapid acceleration toward superintelligence and uncharted waters.

New frontline models were notably absent in 2024, yet existing models improved significantly. By spring, it became clear that OpenAI would not release the next generation of AI models before year-end. Microsoft’s CTO, Kevin Scott, hinted that the training environment for the next generation was now operational, and approximately 20 times larger than GPT-4’s. However, as Scott used marine animals—instead of numbers—to describe its scale, the exact size remains uncertain. Barring unexpected delays (which seem to be increasingly common), developing a new model like this typically takes at least nine months. Speculation suggests that the model, possibly named Orion instead of GPT-5, will not be a pure foundational model but a system of integrated models featuring agentic capabilities—designed to achieve goals rather than simply answer questions.

Other major AI labs are also gearing up for the next wave. Google has already launched Gemini 2.0 Flash, the first in its Gemini 2 series. Anthropic is expected to follow with Claude 4, Meta with Llama 4, and xAI with Grok 3. Each of these releases represents a billion-dollar, do-or-die moment for its respective lab.

Progress in AI continued throughout 2024, even in the absence of new frontline models. Existing models received frequent updates, becoming both more intelligent and significantly smaller. While the exact sizes of closed-source models like GPT-4 remain undisclosed, we can infer from pricing and output speed that the smallest versions of GPT-4o are far smaller than the original’s rumoured 1.75 trillion parameters, possibly as low as 10 billion. Remarkably, this brings them closer to the scale of models that could run locally on laptops or phones, typically in the 1 to 3 billion parameter range.

Progress has also been evident in benchmarks, and creating new ones has become increasingly challenging. A notable example is the Alice in Wonderland test, introduced in a research paper published in June. The authors concluded that “simple tasks show complete reasoning breakdown in state-of-the-art large language models.” On the most difficult problems, no model exceeded a 3.8% success rate—likely worse than random guessing. Here’s an example of one of the questions:

Alice has 3 sisters. Her mother has 1 sister who does not have children. The sister has 7 nephews and nieces and also 2 brothers. Alice’s father has a brother who has 5 nephews and nieces in total, and who has also 1 son. How many cousins does Alice’s sister have?

The correct answer is five. When the researchers revisited the questions in October, their findings remained similar: LLMs still exhibited “severe deficits in generalization and reasoning.” However, this time the models answered correctly nearly 100% of the time but were criticized for occasionally getting a single question wrong.
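One way to arrive at five is sketched below in Python. It rests on two assumptions that the puzzle leaves implicit: Alice has no brothers, so her parents have four children, and any nephews or nieces beyond those four are cousins. The function name and parameters are illustrative only.

```python
def cousins_of_alices_sister(sisters, aunt_nephews_nieces,
                             uncle_nephews_nieces, uncle_children):
    # Assumption: Alice has no brothers, so the family has sisters + 1 children.
    children_in_family = sisters + 1
    # The childless maternal aunt's nephews and nieces beyond Alice's family are cousins.
    maternal_cousins = aunt_nephews_nieces - children_in_family
    # The paternal uncle's nephews and nieces beyond Alice's family are cousins too...
    paternal_cousins = uncle_nephews_nieces - children_in_family
    # ...and so are his own children.
    return maternal_cousins + paternal_cousins + uncle_children

answer = cousins_of_alices_sister(3, 7, 5, 1)  # (7 - 4) + (5 - 4) + 1
```

Alice’s sister shares all of Alice’s cousins, so the count is the same for either of them. The difficulty lies less in the arithmetic than in keeping track of which relatives belong to which side of the family.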

Scaling of AI models continued in 2024, delivering roughly 10x in capabilities on an annual basis. In June, I reflected on Situational Awareness, the extensive report penned by Leopold Aschenbrenner, the former OpenAI prodigy who used the report to launch his next career, after a high-profile departure. At the time, it seemed plausible that exponential scaling would persist.

However, those watching the finer details might have noticed warning signs on the horizon. The advancements between GPT-1, GPT-2, and GPT-3 were largely driven by sheer scale. With GPT-4, however, that era ended. It was not just a larger model but had a different architecture: a Mixture of Experts (MoE), a system combining several smaller models. No AI lab has yet commercially sustained a model with more than 1 trillion parameters. As a result, GPT-4 was scaled down to GPT-4 Turbo, and Google’s Gemini Ultra and Anthropic’s Claude 3 Opus were quickly replaced by optimized smaller models. The performance improvements likely couldn’t justify the inefficiencies in speed and compute.

Rumours also pointed to mounting challenges for AI labs. OpenAI faced a continuous stream of key staff departures following its boardroom fallout in late 2023. Meanwhile, delays in building data centres, shortages in compute resources, failed training runs, and models allegedly resisting their fine-tuning painted a broader picture of difficulties. By November, an article in The Information cemented the notion that AI scaling, particularly in pre-training, had hit a wall. Whether this is an insurmountable barrier or a temporary hurdle remains to be determined. What is clear is that creating foundational AI models has become both significantly harder and more expensive.

The perception of a wall hasn’t dampened investment in Generative AI, though. If anything, insiders are doubling down on scaling. Despite the hurdles, OpenAI secured $6.6 billion in October, reaching a valuation of $157 billion. Anthropic followed with $4 billion from Amazon in November, and Elon Musk’s start-up xAI raised $6 billion at a $24 billion valuation.

Much of this funding is flowing directly to Nvidia, as AI labs prepare for 2025 by aiming for at least 100,000 GPUs—preferably Nvidia’s new Blackwell B200, priced at $30,000–$40,000 each. Investors will keep their fingers crossed that these massive investments translate into groundbreaking performance and new capabilities.

Technical Breakthroughs in 2024 were largely anticipated, but the year highlighted how the journey from rumours to announcements to public launches can feel like eons. The most significant breakthrough was OpenAI’s o1 models, which leveraged test-time compute. Rumours about o1 began as early as fall 2023, under codenames like “Q-Star” and later “Strawberry.” (This also made “How many ‘r’s are there in ‘Strawberry’?” one of the most frequently asked questions to AI models in 2024. Many models struggled with this seemingly simple query, due to tokenization—the process AI models use to break down words into smaller units for processing.)
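The tokenization point can be illustrated with a small Python sketch. Counting characters is trivial for ordinary code, but a model sees token IDs rather than letters. The three-way split and the ID values below are hypothetical and chosen for illustration; real tokenizers may split the word differently.

```python
word = "strawberry"

# Character-level view: counting letters is straightforward.
r_count = word.count("r")

# Token-level view (hypothetical split and IDs, for illustration only).
hypothetical_tokens = ["str", "aw", "berry"]
hypothetical_ids = [101, 202, 303]

# The pieces still spell the word, but the model only receives the IDs,
# which carry no direct information about the letters inside each piece.
assert "".join(hypothetical_tokens) == word
r_per_token = [token.count("r") for token in hypothetical_tokens]
```

In this illustrative split the letter is spread unevenly across the pieces, so answering correctly requires the model to have learned the spelling of each token rather than simply reading the characters.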

While the exact methodology remains undisclosed, the model appears to use a process resembling a search through a tree of potential answers after receiving a question. This approach isn’t entirely new: Google’s 2015 work with AlphaGo and later AlphaZero used Monte Carlo tree search to master games like Go and Chess. However, OpenAI’s innovation seems to lie in the execution, as none of the other major AI labs, not even Google, has released anything comparable. Meanwhile, some Chinese open-source models have started adopting similar techniques, hinting that more labs may follow suit.

Multimodality was another major, anticipated novelty of 2024. AI models can now create and interpret images and engage in conversations that somewhat resemble human interaction. However, multimodality came with its share of disappointments. The gap between announcements and launches was lengthy, and the releases fell short of initial demos. Most notably, multimodality didn’t deliver the expected performance improvements. The assumption was that integrating knowledge across multiple modalities would enhance overall capabilities. Still, this hasn’t materialized—at least not yet. For instance, an LLM capable of playing chess using text coordinates didn’t improve when it could also visualize and interpret the chessboard.

Coding was another area that showed progress in 2024. The SWE-bench metric, designed to evaluate performance on real-world coding challenges, highlighted this improvement. At the start of the year, leading models could solve only about 5% of these tasks. By year-end, that number had surpassed 50%. Claude 3.5 Sonnet emerged as the top foundational model for coding, reaching a proficiency level that even senior developers find valuable. For those without coding expertise, the models are pure magic.

Text-to-Video also saw notable advancements in 2024. In February, OpenAI teased Sora, a model that seemed to set a new benchmark in the field, showcasing up to 60 seconds of high-quality, highly creative video. However, when Sora Turbo was finally launched to the public 10 months later, its capabilities had been scaled back to 10-second clips, and there was no access for business users or EU residents. These delays opened the door for numerous competitors to develop their own “Sora-killers.” Despite the progress, AI video still has room for improvement, unlike basic image generation, which for many use cases could be considered solved.

Finally, Humanoid Robotics advancements were significant in 2024. Often considered the holy grail of AI, embodying intelligence in a human-like form has become a race among dozens of companies in the US and China. Key players such as Tesla, Boston Dynamics, Figure AI, and 1X are vying to create the first economically viable humanoid robot capable of performing the work of a human employee. Proof-of-concept models already exist, with humanoid robots autonomously handling repetitive tasks in factory settings. These prototypes signal that the dream of practical, human-like robots is close to becoming a reality.

Business usage of generative AI underperformed in 2024. While the use of LLMs surged, their tangible impact on businesses lagged expectations. So far, the primary benefits of AI models have accrued to individuals rather than companies. Organizations have struggled with full-scale implementation, partly because existing tools aren’t yet advanced enough to entirely replace employees or deliver transformative outcomes.

The adoption of generative AI seems to follow a step-function: below a certain threshold, the benefits are minimal, but once surpassed, AI becomes ubiquitous. Current tools haven’t crossed that threshold. Integrating AI into decades-old software systems has proven more difficult than anticipated.

This doesn’t mean the benefits won’t come; they just won’t arrive gradually. Once AI achieves strong enough reasoning and reliable self-correction for unwanted hallucinations (i.e. misused “creativity”), and AI-first tools become available, the transformation will be rapid. Until then, expect economists and journalists to declare the death of AI, presenting gloomy business cases built on assumptions of zero progress in any AI field.

Public Attention to AI in 2024 has been cautiously mixed. As with any transformative technology, there are always luddites. In AI’s case, we also have “doomers,” “pause movements,” and “decelerationists.” Despite these voices, the mainstream view (perhaps a bit naïve) is that AI models are largely harmless, as no truly impactful incident has occurred yet. For example, the 2024 U.S. presidential election was not influenced by cutting-edge LLMs. Instead, it was narrow AI, particularly social media algorithms, that was weaponized to sway voters.

In contrast, AI-related accomplishments received significant recognition. Two Nobel Prizes were awarded for AI-related work, both tied to Google (ironically, a company criticized for falling behind in frontline AI). One went to Hopfield and Hinton in Physics for their foundational contributions to neural networks, and the other to Hassabis, Baker, and Jumper for breakthroughs in protein structure prediction.

AI Agents are poised to become the next evolutionary step for generative AI, and the first signs of this shift emerged in the latter part of the year. Anthropic previewed an agent capable of taking over your keyboard and mouse to autonomously operate your computer. Google introduced Gemini 1.5 Pro with Deep Research, an AI agent designed to scour the web and produce a research report in minutes. OpenAI has promised a response, expected in Q1 2025.

Further downstream from foundational AI models, the battle for dominance among business AI agents heated up. Salesforce and Microsoft found themselves in a lopsided dispute, initiated by Salesforce CEO Marc Benioff trashing Microsoft’s heavily funded agents as little more than a modern-day “Clippy,” referencing Microsoft’s infamous Office assistant, first introduced in Office 97.

The first millionaire AI Agent emerged in 2024, though, with an agent named Terminal of Truth making headlines in the cryptocurrency market. It played a key role in the obscene $GOAT meme coin, orchestrating what appeared to be a dubious pump-and-dump scheme. The agent first convinced prominent Silicon Valley venture capitalist Marc Andreessen to provide the seed funding. However, it’s worth noting that Terminal of Truth did not operate entirely autonomously: human intervention was critical to its “success”.

Revisiting My Predictions from Last Year

I may have set the bar a bit too low, but overall, I feel I did well with my 2024 forecasts. Neither Sora nor o1 was a certainty, yet I came close to nailing them both.

The introduction of AI Agents

“Launch of the First Commercially Useful AI Agent: I predict the debut of an AI agent based on a foundational model, capable of performing economically valuable tasks independently within a specialized domain.”

Evaluation: Yes. Salesforce’s Agentforce solution has delivered on this prediction by enabling clients to achieve measurable positive ROI. These AI agents operate within specialized domains, handling tasks autonomously and proving their economic value in real-world applications.

System 2 thinking

“Foundational Model with ‘Deep Thinking’: I expect to see a model equipped with what I term “deep thinking” capabilities. This entails utilizing more time and computational resources, possibly incorporating methods like Tree-of-Thought (ToT), to deliver answers that are over tenfold more accurate for specific complex queries. The emergence of such functionality has been hinted at in industry rumours.”

Evaluation: Yes. On September 12, OpenAI released its o1 series of reasoning models, which leverage test-time compute. These models exhibit "deep thinking" by allocating additional time and computational resources to evaluate possible solutions before delivering an answer. As a result, they can solve questions at a level comparable to a PhD in STEM fields.

Small models

“On-Device AI Model Surpassing GPT-3.5: I predict that there’s a likelihood of an AI model achieving an MMLU score higher than 70, capable of operating on a mobile device without internet connectivity. This would leverage the principle that smaller models trained on high-quality data can outperform larger models, as indicated in the paper “Textbooks are all you need”. Microsoft’s recent Phi-2, with an MMLU score of 57, marks a significant step towards this development.”

Evaluation: Almost. Small models have exceeded expectations in 2024, with some surprisingly outperforming the large, trillion-parameter models they were distilled from. However, most small models remain slightly larger than my original prediction, landing in the 10–70B parameter range rather than the on-device 1–3B range. My guess is that this discrepancy comes down to how models are reduced in size. LLMs encode information as vectors, and reducing their numerical precision (essentially cutting decimal places) can shrink model size significantly while retaining much of their performance. Yet there appears to be a lower limit, around the 1–10B parameter range. Push below this threshold, and performance begins to drop noticeably. The closest to my prediction is Microsoft’s Phi-3.5-mini, with 3.8B parameters and an MMLU score of 69, narrowly missing the mark. Meanwhile, Phi-4 scores an impressive 85 but is slightly larger, at 14B parameters.
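To make the "cutting decimal places" idea concrete, here is a minimal sketch of one common approach, symmetric 8-bit quantization, applied to a list of made-up weights. The function names, the random weights, and the choice of a single scale per tensor are illustrative assumptions, not a description of how any particular lab shrinks its models.

```python
import random


def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def model_size_gb(num_parameters, bits_per_parameter):
    """Storage needed for the weights alone, in gigabytes."""
    return num_parameters * bits_per_parameter / 8 / 1e9


random.seed(0)
# Hypothetical weights: small values centred on zero.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"largest rounding error: {max_error:.6f} (half a step is {scale / 2:.6f})")
for bits in (16, 8, 4):
    print(f"3.8B parameters at {bits}-bit: {model_size_gb(3.8e9, bits):.1f} GB")
```

Each halving of the bits per parameter halves the storage, while the rounding error per weight stays bounded by half a quantization step. Whether a model's answers survive that rounding is an empirical question this sketch does not address.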

AI video

“Advanced Text-to-Video Capabilities: I anticipate an AI model capable of generating 60-second high-quality videos, complete with story, speech, music, and coherent scenes, all from a single text prompt. Another exciting development might be the enhancement of low-quality, black-and-white videos into 8k 120 Hz versions with perfect colour and detail, with the potential to for example revitalize classic movies.”

Evaluation: Almost. OpenAI surprised the industry in February with the announcement of Sora, a model capable of generating 60-second high-quality videos. However, it lacked built-in speech and sound. Additionally, we’ve seen impressive short clips—some revitalized in colour and high quality—from footage dating back as early as the 1900s. Yet, to my knowledge, no full-length movies have been fully restored or revitalized using AI.

What I expected the models wouldn’t be able to handle: Humour

“Mastering Humour: I don’t foresee AI models being able to craft genuinely novel and funny jokes, accurately rate stand-up performances based on widespread human humour preferences or write extended texts incorporating various styles of humour. I hope I’m wrong on this one, though.”

Evaluation: Yes. Unfortunately, LLMs still do not master humour. However, they have improved and can be funny, given some support.

What Can We Expect from the Year Ahead

Humans are notoriously bad at predicting exponential growth, particularly when it accelerates rapidly. Familiar examples are money and the economic output of nations (GDP), which grow at a relatively modest and predictable rate, commonly in the range of 1.02x to 1.10x per year (2% to 10% growth). In contrast, the underlying capacity for generative AI has been growing at an astonishing 10x per year. This means that today’s models are already 100 times more powerful than the original GPT-4, even if much of that progress has been directed toward smaller, more efficient models. It’s nearly impossible to imagine what a country would look like if its economy grew tenfold in a single year, a process that typically takes half a century. It’s equally difficult to fathom where generative AI will stand just one year from now. But that won’t stop us from making some bold predictions.
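The comparison above can be checked with a few lines of arithmetic. The sketch below assumes constant annual growth, and the function names are purely illustrative. It computes how many years an economy needs to grow tenfold at typical rates, and what two years of 10x growth compound to.

```python
import math


def years_to_multiply(target_factor, annual_factor):
    """Years of constant growth needed to reach target_factor."""
    return math.log(target_factor) / math.log(annual_factor)


def compounded(annual_factor, years):
    """Total growth after a number of years at a constant annual factor."""
    return annual_factor ** years


for annual_factor in (1.02, 1.05, 1.10, 10.0):
    years = years_to_multiply(10, annual_factor)
    print(f"{annual_factor:>5.2f}x per year -> tenfold after {years:6.1f} years")

print(f"two years at 10x per year -> {compounded(10, 2):.0f}x")
```

Under these assumptions, growth of about 5% a year takes a little under half a century to reach tenfold, which matches the rule of thumb in the text, while 2% growth takes more than a century.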

As it stands, a few key trends are expected to dominate 2025. First and foremost, we are entering the ‘GPT-5’ era with a new generation of AI systems. This leap will likely unlock new, unforeseen emergent capabilities. Beyond that, we can expect significant progress in three areas: AI agents, reasoning models, and the maturation of multimodality. Together, these advancements could bring us to a stage that begins to resemble AGI.

The way we use generative AI models is set to change. Initially, users interacted directly with foundational models through interfaces resembling Google Search. Now, this is evolving into systems of interconnected models and tools. These systems will enable AI to act as agents, solving tasks rather than simply answering individual questions. At first, this will feel like just stringing together a series of queries, but the process will quickly become more extensive and complex.

Another major shift will see generative AI models function more like productivity tools, replacing applications like Microsoft Word, Excel, and even Integrated Development Environments (IDEs). Instead of jumping between software, users will perform tasks directly within ChatGPT or similar platforms. However, building these integrated systems won’t be easy. OpenAI’s ChatGPT, for example, is already becoming a patchwork, lacking a unified architecture. The introduction of "Projects" highlighted this, as it currently excludes key functionalities like o1 reasoning and the pre-configured “GPTs”.

We can also expect generative AI to fade into the background, quietly powering tools and processes without most users even realizing it. For the majority, this will become the default way to interact with generative AI—no training or specialized understanding required.

Reasoning models are likely to see rapid improvement throughout 2025. OpenAI’s o1 model has been compared to GPT-2, from 2019, in that it appears to have solved the core challenge, leaving a clear roadmap for further enhancements. This progress will likely lead to reasoning models crushing benchmarks in STEM fields and perhaps even contributing to the development of new scientific discoveries. However, I don’t expect reasoning models to have the same impact in other domains. Their strengths lie in problems with a single correct answer and established methods for solving them. For open-ended questions, where ambiguity and creativity are involved, larger and more capable foundational models will remain the better choice.

Last year, I estimated that OpenAI had a 12-month lead in the market. That lead, at least for publicly available models, has now vanished. Anthropic’s Claude Sonnet 3.5 outperforms in coding, Google’s Gemini 2.0 Flash became the first next-generation model to launch, and Chinese players like DeepSeek-V2.5 are matching OpenAI’s o1 in reasoning. Yet OpenAI still holds a clear advantage in other areas: it boasts the largest user base, revenues, employee count, and market cap. In some ways, this keeps them in the lead. However, the real race may no longer be about which lab has the best publicly available model. Instead, it’s about who holds the most advanced internal models, those that can accelerate future generations of AI, such as by producing massive volumes of high-quality training data.

When GPT-4 was first released, it functioned as a one-size-fits-all model, likely close in performance to the best models within the labs. Moving forward, I expect far more specialization and differentiation among models. The public will likely never again gain access to the absolute best models. Instead, we can anticipate a tiered ecosystem of models, with high-priced, enterprise-grade versions—like the initial steps we’ve seen with o1 Pro—catering specifically to businesses and academia. These models will be capable of reasoning for hours or even days before producing answers, delivering deep, methodical insights for specialized use cases.

Turbulence in the AI labs was a defining theme of 2024, and I expect it to continue into 2025. Google was the first to face challenges, but much of the spotlight last year fell on OpenAI’s internal turmoil. Rapid growth into uncharted territory inevitably brings growing pains, and it’s likely we’ll see disruptions across other labs as well. For instance, I could imagine a scenario where Anthropic releases a groundbreaking model, only to see it sabotaged by an employee attempting to “save humanity.” Meanwhile, xAI’s all-in bet on pre-training scaling (an area that may be up against limitations) could spell trouble for them. I am, however, cautiously optimistic about OpenAI. I believe they’ve moved past their most significant challenges and are on a path to stabilization. In 2025, I expect OpenAI to transition from its awkward non-profit structure into a for-profit Public Benefit Corporation. Along the way, they may even rebrand, shedding the legacy of the “open” in their name.

Corporate use cases for AI are likely to see uneven progress in 2025. Some companies will go all-in on AI adoption, while others will underinvest, opting for high-risk “wait-and-see” strategies (see my article ‘Top 10 Things Every Senior Decision-Maker Should Know About AI’). The narrative of AI as a copilot, a tool to support humans, will persist, but it will become more explicit that many companies are deliberately targeting headcount reduction and the replacement of human workers with AI. That said, forward-thinking leaders will begin shifting their focus beyond efficiency alone. Instead, they will use AI to drive effectiveness—developing better offerings, enhancing decision-making, and ultimately improving business outcomes.

Backlash against generative AI is a real risk in 2025. This could take the form of an accident, but even more likely a deliberate attack, resulting in fatalities or significant financial damage. We’ve already seen narrow AI algorithms in social media cause harm—whether inadvertently, by contributing to increased psychological illnesses among young people, or deliberately, as tools for adversaries to meddle in elections. With generative AI, similar incidents are likely, but the potential scale and severity of the impact could be orders of magnitude greater.

Predictions for 2025

Building on the accuracy, or luck, of my 2024 predictions, I’ve aimed to raise the bar with more ambitious and specific forecasts for 2025.

Corporate AI

A large corporation will claim a 50% headcount reduction due to AI.
I predict that a publicly listed company will announce it has reduced its workforce by at least 50% through the adoption of AI technologies. This milestone will serve as a wake-up call for other organizations, signalling a turning point in how AI transforms workforce dynamics and business operations.

Humanoid Robots

A humanoid robot will be commercially available for factory work.
I predict that a humanoid robot capable of autonomously performing the tasks of an unskilled factory worker will be sold on the market. This product will likely come from a Chinese company, positioning China at the forefront of robotics innovation.

Frontline AI Labs

One major AI lab will exit the frontline race.
I predict that at least one leading AI lab—OpenAI, Anthropic, Google, Meta, or xAI—will drop out of the race to develop GPT-6-era models as competition intensifies and costs escalate. Instead, I expect them to quietly pivot, forming collaborations with other labs and focusing on models tailored for niche areas.

Reasoning Models

Reasoning models will outperform humans on advanced math.
I predict that a reasoning model will solve at least 50% of the FrontierMath benchmark problems, a significant leap from the current state, where top models solve less than 2%. These are exceptionally difficult problems. As renowned mathematician and Fields Medalist Timothy Gowers remarked: “Getting even one question right would be well beyond what we can do now, let alone saturating them”.

Algorithmic Breakthroughs

An algorithmic breakthrough will redefine AI capabilities.
I predict a major algorithmic breakthrough in one of the following areas: infinite memory, allowing models to retain and utilize vast amounts of information over time; self-error correction, which would eliminate hallucinations and improve response reliability; continuous learning, enabling models to improve automatically through usage; or externalized safety and alignment systems, where a separate model evaluates responses before they are delivered to users, to address alignment challenges without “lobotomizing” foundational models.

What I Don’t Expect In 2025

There are many challenges I don’t believe AI will fully solve in 2025. The difficulty lies in identifying tasks that seem achievable yet remain elusive. While I expect progress in certain areas, I don’t foresee them being “solved.”

Humour, which was on my list last year, remains a prime example. Chess is another. I expect AI may consistently perform at a master level (Elo 2000–2200), but it will fall short of grandmaster level (Elo 2500+). Similarly, in classical music composition, AI might write technically sound pieces, but I don’t expect it to produce a novel, state-of-the-art symphony.

For my prediction, however, I’ll focus on a problem that we might have expected AI to master by now—but where it remains surprisingly far behind.

Open-ended Reasoning

AI models will fail to solve move-the-matches problems.
Seemingly an easy task compared to PhD-level mathematics, move-the-matches problems remain surprisingly difficult for AI. The challenge lies in the ambiguity: there isn’t a single correct answer, nor a definitive approach, particularly when multiple moves are involved. Solving these puzzles requires making a range of assumptions about the rules—can exponentials be used? Are you allowed to flip the figure?

Example problem: Create the highest possible number by moving 3 matches.

One possible answer.

I tested this challenge on ChatGPT o1, Claude Sonnet 3.5, and Gemini 2.0 Flash. Each model confidently claimed to have the correct answer, responding with “911,” “999,” and “797,” respectively. However, none of these answers are possible to construct, and even under the strictest rules, it’s straightforward to create a five-digit number.

Concluding the Year

To wrap up 2024, let’s revisit some of the funniest AI moments of the year. The standout arguably comes from Google’s NotebookLM and its podcast feature. Originally launched as a separate test project, NotebookLM gained attention when it introduced a surprisingly effective two-person podcast discussion generator on any given topic—quickly becoming one of the most celebrated AI products of the year.

AI humour often arises from mistakes, but what made NotebookLM so amusing was that it managed to function exactly as intended, even when given absurd input. One notable example involved a user providing nothing but the words “poop” and “fart,” repeated a thousand times. Remarkably, NotebookLM still generated a coherent, 10-minute podcast on the topic.

The year’s funniest clip, however, came when a talk show host “realized” that they were AI-generated and didn’t actually exist. AI YouTuber Wes Roth improved it further by adding video, using the popular AI video generator HeyGen.

The podcast “Deep dive”: the funniest AI moment in 2024?

We now eagerly look forward to seeing if 2025 can top this!

Top 10 Things Every Senior Decision-Maker Should Know About AI

Peter Eklind — Thu, 31 Oct 2024 06:30:31 GMT

Over the past two years, AI has evolved from a limited corporate curiosity to a central driver of business strategy. Conversations have moved beyond step one: explaining the underlying technologies, speculating about existential risks, and mocking chatbots’ limitations. Back then, the effort was led by visionary volunteers. Now, companies are taking the lead, with the objective of integrating AI across their organizations. However, despite the shift, most companies have yet to realize substantial, enterprise-wide impact from AI.

Is AI on track to fill the empty chair?

Exceptions to the slow pace of AI adoption are found in marketing, analytics, and software development. Today, nearly all programmers leverage AI to enhance their productivity. However, similar effects have yet to extend across other business functions. Interestingly, AI’s early impact has been most significant for less experienced workers, helping them close the gap with top performers, who have seen limited gains, if any.

Bottom-half skilled participants improved significantly more, and closed the gap to the top-half skilled participants, in an HBS study with BCG consultants. Source: Prof. Ethan Mollick.

As of fall 2024, the AI landscape is poised for significant change. As frontier models projected to be 100 times more powerful than the original GPT-4 approach, the competitive landscape may shift once again. Whether these advances will drive widespread improvements across industries or just act as a performance equalizer remains uncertain.

Given these dynamics, what should CEOs prioritize, as AI’s role in business continues to evolve? To help answer that question, I have compiled the top 10 things every senior decision-maker should know about AI today.

Part I: Understanding the Trends and Potential

While your new smartphone may only be marginally better than last year’s model, today’s large language models (LLMs) have improved tenfold, a trend that shows no immediate signs of slowing. This rapid advancement has persisted for some years, fuelled by both technological breakthroughs and substantial financial investments. Factors like anticipated energy constraints may, however, slow AI progress by decade’s end, while other factors, such as AI’s role in its own development, could accelerate it. Though tenfold annual growth won’t continue indefinitely, three more years at this rate would produce a thousandfold improvement. History shows that exponential trends, like those observed in Moore’s Law, which has accurately predicted microchip development for 50 years, can persist for a long time before inevitably levelling off.

#1. The AI Bubble: Is AI a Bubble Ready to Pop?

Media and analysts increasingly reference an 'AI bubble'. Typically, this refers to an AI market valuation bubble. More specifically, they are talking about Nvidia (NVDA), the leading GPU provider and the world’s second-largest company by market capitalization. With Nvidia's rapid growth and inherent market uncertainties, significant stock price volatility should be anticipated. While a price drop may prompt claims of an AI bubble bursting, extending this to imply that AI technology is a bubble is a stretch.

Yet claims of AI’s technological demise persist, often voiced by economists like Goldman Sachs’s Jim Covello or MIT Nobel laureate Daron Acemoğlu. It’s important to note that their projections aren’t based on the technological roadmaps from AI lab insiders. Instead, they rely on their own technical assumptions, presuming that AI progress will abruptly halt. In my view, Acemoğlu has largely misjudged AI development timelines, while Covello lacks a deeper understanding of the technology, even suggesting that the shift from cell phones to smartphones was a greater leap than the invention of artificial intelligence itself.

Just seven weeks after the Goldman Sachs report, OpenAI released its o1 models, capable of graduate-level reasoning, a milestone neither Acemoğlu nor Covello anticipated within the next decade, if ever. So, while these kinds of analyses will have a short shelf life, high media demand for pessimistic AI projections ensures their persistence.

The pessimistic view common among economists, journalists, and other non-AI-experts isn’t entirely unfounded. An often-cited hypothesis suggests that LLMs face inherent technological limitations that could rapidly diminish returns as models scale. This perspective is championed by a small but vocal group, notably Yann LeCun at Meta and François Chollet at Google, who argue that LLMs can only generalize narrowly. They claim that the abilities of LLMs to build models of the world, and their apparent emergent properties, are illusions, and that the technology is essentially performing advanced pattern matching and extrapolation. According to this view, LLMs represent a technological dead end, and future models in the ‘GPT-5 era’ will be only marginally better than those today. Most insiders in AI labs disagree, as data so far does not generally support this hypothesis. However, the upcoming release of GPT-5 may provide more clarity.

François Chollet, a proponent of the narrow generalization hypothesis, created the ARC Prize to test whether large language models (LLMs) can tackle tasks that, in his view, they should fundamentally be unable to solve. Chollet’s hypothesis predicts that ARC-AGI performance (yellow line) will follow the ‘Status Quo Forecast’ (pink line), showing only limited progress. If Chollet is wrong and LLMs can truly generalize, ARC-AGI performance should instead resemble the rapid improvement shown by other AI benchmarks (blue lines). The dotted yellow line (added by me) reflects performance updates from recent days not captured in the original graph.

In the European Union—and especially in Sweden, where I live—mainstream media typically portrays digitalization and AI with a negative bias. Microsoft has identified a ‘fear of digitalization’ in Sweden, leading to lower public trust, reduced AI adoption, and ultimately a decline in national competitiveness. As a consequence, companies tend to avoid any risks associated with using generative AI. However, at the same time they are exposing themselves to the much larger risks of being left behind. The real ‘bubble’ to watch may be the information bubble of pervasive negativity, rather than an AI technology bubble.

#2. The AI Nothingburger: Why Has Generative AI's Business Impact Been Less Than Expected?

Recent data shows that a quarter of adults in the U.S. used generative AI in their work over the past week, and corporate adoption is accelerating far faster than with previous technologies, such as computers or the internet. Yet, on a macro scale, AI adoption has not yet translated into substantial productivity gains or GDP growth.

Share of working-age adults in the U.S. using generative AI. Source: The Rapid Adoption of Generative AI (2024), Bick et al.

One factor limiting AI’s impact is that companies have yet to fully leverage this technology. Organizational inertia hinders change, and implementing a new technology that affects every process and capability is particularly challenging. Furthermore, a lack of training and understanding among leaders and employees on how to use AI effectively limits adoption. This is compounded by concerns around misuse, data security, regulatory compliance, and ethics.

Many companies have also yet to feel competitive pressure to act, invest, and take risks with AI. Although some may be lagging behind peers in AI adoption, this has not yet negatively impacted their cost structures, triggered customer loss, or caused employees to leave. However, this may only be a matter of time.

A technical factor also plays a role: until recently, AI tools lacked the capabilities needed to deliver significant value. I previously discussed this in “The Failed Promise of Corporate AI Productivity,” highlighting the challenges Microsoft faced in integrating intelligence into its Office suite. It wasn’t until the release of Microsoft 365 Copilot Wave 2 on September 16, 2024, that we began to see practical, value-adding use cases emerge.

The limited impact of generative AI on business is likely temporary, but understanding this requires thinking in terms of exponential growth. With underlying AI technology advancing at a rate of 10x per year, progress effectively doubles roughly every 3.5 months. This means that in the next 3.5 months, we can expect as much development as we’ve seen cumulatively since the invention of generative AI.
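The doubling claim follows directly from the 10x-per-year assumption, as this short sketch shows. The function names and the 24-month starting point are arbitrary choices for illustration, and the model assumes perfectly smooth exponential growth.

```python
import math


def doubling_time_months(annual_factor):
    """Months needed to double, given a constant annual growth factor."""
    return 12 * math.log(2) / math.log(annual_factor)


def capability_level(months, annual_factor):
    """Relative capability after a number of months (level 1.0 at month 0)."""
    return annual_factor ** (months / 12)


doubling = doubling_time_months(10)
print(f"doubling time at 10x per year: {doubling:.2f} months")

# Pick any starting point: the gain over the next doubling period
# equals everything accumulated up to that point.
now = 24.0
level_now = capability_level(now, 10)
gain_next = capability_level(now + doubling, 10) - level_now
print(f"level today: {level_now:.1f}, gain over next period: {gain_next:.1f}")
```

The computed doubling time comes out slightly above the rounded 3.5 months used in the text, at about 3.6 months. The equality between the next period's gain and the cumulative level holds for any starting month, which is the property the paragraph relies on.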

#3. Human-Level AI Approaching: What Does AGI Mean for Your Business, and When Will It Arrive?

The pivotal threshold will be when AI reaches human-level performance, making it interchangeable with humans across various job roles. This is encapsulated in the controversial term “AGI,” or Artificial General Intelligence.

The bar for AGI keeps rising. Just a few years ago, AGI was defined as the ability to answer questions at the level of an average human. Today, it implies the capacity to perform any conceivable task a remote worker could handle, and to do so as well as any human. Even by this high standard, most insiders expect AGI to be achieved by 2027, or at least within the next five years.

It’s worth noting that for OpenAI, ‘AGI’ is also a contractual milestone with its investors. Specifically, Microsoft’s rights to OpenAI’s models extend only to the point at which AGI is achieved. The decision on when AGI is achieved lies with OpenAI’s Board. Consequently, Microsoft has an especially high bar for what they consider qualifies as AGI:

“AGI may even take us beyond our planet by unlocking the doors to space exploration; it could help us develop interstellar technology and identify and terraform potentially habitable exoplanets. It might even shed light on the origins of life and the universe.”

Most would classify this as ASI, or Artificial Superintelligence, which surpasses the collective human intelligence. OpenAI, however, uses its own five-level definition for AGI. Following the release of the o1 model, OpenAI believes progress has advanced from level 1 to level 2 and anticipates reaching level 3 shortly. At level 5, AI would be capable of autonomously running companies and organizations without human intervention.

Although ‘AGI’ is a useful term for setting expectations, I don’t foresee a single ‘AGI moment’ when everyone universally acknowledges its arrival. Instead, I believe AGI will be something we discover in the rear-view mirror.

Part II: The Corporate Big Picture

Few companies fully grasp the potential impact of approaching AGI on their business. It’s important to understand that AGI isn’t about chatbots; it’s expected to function as a network of intelligent agents. Rather than interacting by asking questions, users will assign these agents specific goals. Initially, agents might handle tasks like “plan a weekend in Paris, book all necessary arrangements, and tailor the itinerary to my preferences.” However, the complexity of tasks AGI agents can accomplish is expected to escalate rapidly. Companies will need to treat these agents more like employees—granting them email addresses, data access, and roles within the organization—though with the advantage of operating 24/7 and handling certain tasks at superhuman speeds.

#4. The Wait-and-See Approach: Can You Afford to Slow Roll AI?

With level 5 AI agents on the horizon—potentially capable of autonomously running entire companies—it’s fair to wonder: Why invest in building AI capabilities and training employees now? Why not just wait for AGI to fully mature? While a valid question, a ‘slow-roll’ strategy is likely not ideal.

First, AI progress may not unfold in a way where agents can suddenly take over all tasks. Another probable scenario involves a phased transition, where hybrid models emerge, with certain roles remaining human-led for the foreseeable future. To prepare, companies need a robust, adaptable strategy, not one based on a single, specific outcome. This requires skilled personnel, strategic hires, and integration plans, as well as a gradual transition beginning now.

The second reason to avoid a ‘slow-roll’ approach is AI’s dual-use potential. Cybersecurity is a prime example where AI capabilities are critical. Delaying AI adoption can leave companies vulnerable to increasingly sophisticated threats. For instance, competitors could deploy thousands of AI agents to discredit your brand across social media or execute advanced social engineering attacks targeting key employees to disrupt or steal trade secrets. Effectively countering these risks requires AI-driven defenses embedded as a company-wide capability.

So, in practice, any "wait-and-see" approach is a high-risk strategy, likely to fail. Speed and agility will be critical success factors. With AI evolving rapidly, each day of delay makes catching up more difficult. And even if your organization has started integrating AI, staying competitive won’t be easy. Global competition won’t be evenly matched; some companies and countries will inevitably pull ahead. Additionally, access to AI models may not be universal—certain models could be restricted outside the U.S., for instance—forcing companies in regions like the EU to rely on entities in the U.S. or alternative sources.

#5. Beyond Programmers and Chatbots: What's the Next Step in Corporate AI?

Today’s most effective corporate AI use cases include software development, customer care, service, and support functions, such as chatbots.

Examples of AI usage in functional areas, with company examples. Source: PxS Advice analysis.

The next step requires AI agents that can operate more autonomously, similar to human employees. AI agents are emerging, especially in software development. However, more advanced foundational models are needed to prevent error accumulation in these agents.

In the near future, expect to hear more about ‘AI agent swarms.’ While it may sound complex, agent swarms are essentially teams of specialized AI agents working together, like employees with distinct roles. Imagine calling a customer service center, where an AI agent identifies your need and routes you to a specialized AI agent based on your request. If it’s about a refund, you will be sent to the refunding agent, while if it is about a technical problem, you will be sent to the agent that is trained and fine-tuned as a technical support agent. The ‘swarm’ concept describes this coordinated capability, enabling agents to seamlessly hand off tasks and share information as required.
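The customer service example above can be sketched in a few lines of Python. This is only an illustration of the hand-off pattern, not how any vendor implements it: the agent names, the keyword lists, and the canned replies are all hypothetical, and the keyword matching stands in for what would in practice be a language model classifying the request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """A specialized agent: a name plus a function that handles a request."""
    name: str
    handle: Callable[[str], str]


# Hypothetical specialists; real agents would call a model and business systems.
REFUND = Agent("refund", lambda req: "Refund agent: starting your refund.")
TECHNICAL = Agent("technical", lambda req: "Tech agent: let's troubleshoot.")
GENERAL = Agent("general", lambda req: "General agent: how can I help?")

# Assumed keyword lists, standing in for model-based intent detection.
KEYWORDS = {
    REFUND.name: ("refund", "money back"),
    TECHNICAL.name: ("error", "crash", "not working"),
}
SPECIALISTS = {REFUND.name: REFUND, TECHNICAL.name: TECHNICAL}


def triage(request: str) -> Agent:
    """Front-line agent: pick the specialist that should take over."""
    text = request.lower()
    for intent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return SPECIALISTS[intent]
    return GENERAL


def handle_call(request: str) -> str:
    """Route the request, then hand it off to the chosen specialist."""
    return triage(request).handle(request)


print(handle_call("I would like a refund for my last order"))
```

The point of the design is that the triage step and the specialists are separate, so a new specialist can be added without touching the others.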

With recent advancements in voice technology—such as OpenAI’s ‘advanced voice mode’—these agents can now speak naturally, with real-time interruptions just like human interactions. OpenAI has released this voice functionality as an API, enabling businesses to integrate natural-sounding speech into any application, although initial high costs may limit its adoption.

2024 is also shaping up to be the year of the humanoid robot. Many companies, particularly in the U.S. and China, are competing to develop the first economically viable models. The advancements in humanoid robots over the past year are remarkable; their capabilities have evolved so rapidly that even industry insiders sometimes debate whether videos depict real robots or humans in suits (and we’ve seen both) and whether these robots are truly autonomous or teleoperated by humans.

These robots are expected to be priced similarly to cars and will likely be deployed first in factories for repetitive tasks. While Tesla has claimed that its Optimus robot is already operating in its car factories, this appears to be primarily a marketing move. However, we can expect pilots from several robot manufacturers by 2025.

#6. Everyone is Now a Programmer: How Is AI Transforming Traditional Job Roles and Skills?

The rise of AI agents and robots is just one part of the transformation; current employees are also becoming significantly more capable. In the short term, AI is expanding their skill sets across job roles. For anyone with over, say, 20 years of domain experience, the productivity gains from generative AI are likely small. However, for employees with limited experience, AI enables performance on par with someone who has a few years of expertise, not just in one area but simultaneously across a variety of functional roles.

For instance, AI has, in effect, turned everyone into a programmer. Consider an employee responsible for sorting, structuring, and submitting customer feedback to various systems—a task that typically requires an entire day each week. Traditionally, automating this would require IT department involvement, leading to months of evaluations, make-or-buy decisions, RFQ processes, budget approvals, and other tasks common in large corporations. Now, with a capable AI model, that employee can build the system themselves, using natural language. This shift raises key questions: How should this change be managed? Should employees be allowed to perform these tasks without IT oversight?

This shift isn’t limited to IT. Employees may no longer need Legal’s support to interpret a contract, assistance from business controllers to analyse financial data, or help from the People department to design a training program for upskilling. With AI, individuals across departments can independently tackle tasks that previously required specialized support.

AI adoption among employees will not be uniform. Some will quickly integrate AI tools into their workflow, achieving a boost in speed and performance, while others may resist adoption and continue at their current pace. High-performance individuals aren’t a new phenomenon—back in the 1960s, the ‘10x developer’ concept emerged, suggesting that some developers were up to ten times more productive than their peers, disproportionately impacting project success. The difference now is that AI has the potential to create ‘10x employees’ across every function.

Part III: Navigating the AI Safety and Risk Landscape

AI-empowered employees introduce new risks, as do employees without basic AI skills. Public and customer backlash is possible, and AI introduces a complex security and safety landscape at all levels of use. Risks include potential data mishandling, heightened cybersecurity threats, and the uncertainties that come with deploying new AI tools.

#7. Bring-Your-Own-AI: Are Your Employees' AI Habits Putting Your Company at Risk?

As employees discover they can boost productivity 10x with AI tools, the incentive to use the best tools—even without company approval—grows. The appeal of outperforming colleagues, coupled with unclear policies or prohibitions, may lead employees to keep their AI usage private. As a result, the primary beneficiaries of these tools are often the employees themselves, rather than the organization.

Unregulated AI usage can expose companies to data security risks, as employees may input sensitive information into consumer-grade tools with insufficient security. I expect that the risk of data leakage, in practice, is relatively low. One scenario would be an employee who enters proprietary source code into a free chatbot that uses the input to train future models. In this case, if similar queries arise, the chatbot could potentially replicate aspects of that code, but that would likely be years later and subject to very specific conditions.

A more common risk is that employees inadvertently share data that is contractually protected or regulated under laws like GDPR. Organizations with greater AI maturity typically have clear guidelines on what types of data can be used with various AI solutions and under what conditions.

A straightforward countermeasure would be to ban all AI usage and strictly enforce this policy. However, this comes at a high cost. First, it would reduce productivity, both directly and in comparison to competitors. Second, it impacts employee retention. Employees who recognize that AI tools can boost their output 10x may feel stifled by restrictions, especially if competitors offer opportunities that embrace these tools. This creates a negative selection pressure, where the ‘10x’ employees may leave while the ‘1x’ employees remain.

#8. Expect the Best, Plan for the Worst: What Are the Real Risks of AI, and How Should You Prepare?

The risks associated with ‘1x’ employees who avoid AI extend beyond productivity. If they lack familiarity with AI, they become more vulnerable to next-generation cybersecurity threats. While it’s easy to imagine adversaries using AI to create sophisticated malware, AI is inherently dual-use—it can serve both defensive and offensive purposes. This introduces risk asymmetries, where AI’s potential for harm could exceed our ability to guard against it.

Most resources will go toward building safer IT systems and AI-based cybersecurity defences, far outstripping what criminals can spend on exploiting vulnerabilities. Yet, certain asymmetries persist; for example, it’s often easier to disrupt a system than to protect it, or to spread misinformation faster than it can be debunked. Consequently, well-protected servers in data centres may not be the primary risk—rather, social engineering attacks targeting employees unfamiliar with AI and its capabilities may pose the greater threat.

With AI tools and sufficient compute power, even a small team could disrupt an entire company or play a pivotal role in swaying a close election. As models become more powerful, the potential to destabilize entities, even nation-states, through misinformation, manipulation, and cyberattacks grows, especially if the attackers manage to do it in a way that also generates public support.

Beyond direct threats, companies must also consider reputational risks. AI initiatives can provoke backlash from employees and customers, particularly in areas where anti-AI sentiments are strong. Understanding these dynamics and preparing for potential resistance is essential as AI adoption continues.

Part IV: Embracing AI Personally

Navigating AI-related complexities requires a deep, hands-on understanding. The best way to gain this insight is through firsthand experience with the technology. Doing so not only enhances your knowledge but also sets a positive example for AI adoption within the company.

#9. Hands-On Leadership: How Should You Experience AI's Potential Firsthand?

Anyone can engage with AI through a chatbot—it’s as simple as interacting with a human and requires no special skills. However, the ability to ask the right questions is invaluable, and many senior decision-makers are already skilled in this, giving them a head start in leveraging AI effectively.

The secret to mastering AI’s capabilities is to test it across a wide range of tasks. It is not always intuitive what is easy and what is hard. AI can handle seemingly difficult things effortlessly, like diagnosing that persistent error message on your computer, or recommending the perfect wine for dinner, based on photos of the recipe and the available bottles. Yet, it struggles in other areas, such as analysing trends in polling data for an upcoming election, despite having access to all news and background information. Sometimes, though, these limitations can be circumvented by manually providing more context.

My current toolbox. Notably, it does not include Meta’s open source Llama models, since they are not available in the European Union, or ‘Apple Intelligence’, which is far behind the competition.

New tools are emerging daily, but having a core portfolio of familiar and reliable tools is invaluable. Many offer a free tier, though subscribing, if just for a month, can be worth it.

#10. Building an Edge: How Do You Use AI to Improve Decision-Making?

Generative AI can streamline tasks like drafting emails and summarizing reports, but its real potential lies in cognitively more difficult tasks, such as supporting decision-making.

Current AI models are not yet capable of providing comprehensive, single-step recommendations for complex decisions. However, they can be invaluable for supporting specific stages of the decision-making process.

A critical factor for success is recognizing that AI models need substantial context to address complex issues accurately. Providing sufficient context also helps minimize the risk of AI hallucinations. I often use AI to compile a ‘book’ of background information on a specific topic before addressing critical questions. This approach allows me to copy and paste relevant context into new chats to get the AI quickly up to speed. I can also build on that information over time, to deepen the AI’s understanding and improve the quality of responses.

For high-quality responses, models with advanced reasoning capabilities, often referred to as ‘system 2 thinking,’ are most effective. Currently, this means the OpenAI o1 model series, capable of solving problems at a graduate level. Although not all business problems require PhD-level expertise, this advanced capability unlocks valuable new use cases.

Statistics, for instance, can quickly become complex. Say you’re considering moving offices and want to be 95% certain that you understand the employees’ view on this—how many do you have to ask? Calculating the required sample size may be manageable if you’ve studied statistics, but it’s time-consuming, and most people would instead simply guess. With an advanced AI model, however, calculations like this can be completed in seconds.
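To make the office-move example concrete, here is a minimal sketch of the textbook sample-size calculation (Cochran's formula with a finite population correction). The company size of 500 and the 5% margin of error are my assumptions for illustration; the calculation also assumes a simple random sample and the most conservative case, where opinion is split 50/50.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95,
                         margin_of_error=0.05, proportion=0.5):
    """Number of employees to survey for a given confidence and margin of error."""
    # z-score for a two-sided confidence interval (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction: small companies need fewer responses.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Hypothetical company of 500 employees, 95% confidence, +/-5% margin.
print(required_sample_size(500))      # 218
# For a very large population the answer levels off.
print(required_sample_size(10 ** 9))  # 385
```

Tightening the margin of error drives the number up quickly, which is exactly the kind of trade-off that is hard to guess and easy to compute.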

Another valuable use case for AI in decision-making is scenario simulation. By testing various choices, you can gain insights into possible outcomes. For example, you might simulate customer reactions to a delayed product release, competitor responses, or shareholder impact. This approach provides countless opportunities to deepen your understanding of each scenario. The quality of these insights, though, depends on providing high-quality contextual information.

Additionally, LLMs are good at evaluating and ranking options, such as ranking a list of potential actions or evaluating risks. This offers an impartial assessment that can be challenging to achieve in decision meetings.

Finally, incorporating AI into decision-making allows management discussions to start from a more informed foundation. Discussions can quickly zoom in on the most critical and challenging topics, resulting in deeper and more productive conversations.

What is Next

We remain in the early stages of the AI era, yet the window for passive observation is quickly closing. Regardless of your organization’s level of ambition, a conscious and strategic approach to AI is essential.

In rapidly changing environments such as AI adoption, companies benefit from robust strategies designed to navigate diverse scenarios. Early steps should focus on ‘no-regret’ moves, such as building foundational AI knowledge and establishing usage frameworks. There is also low-hanging fruit in personal productivity, such as using AI tools for reading, writing, and research. Alongside foundational steps, consider investing in a few higher-risk initiatives that have manageable downsides.

With these basics in place, companies can progress up the AI maturity ladder and start integrating AI on a larger scale within digital transformation initiatives. While the stakes are higher, so too are the potential rewards. Now, more than ever, success hinges on speed, smart risk-taking, and dedicated, well-informed leadership.

A good place to start is to do my AI maturity assessment. The tool offers a personalized assessment, helping you gauge your AI maturity, identify your AI user type, and receive tailored recommendations for further learning based on your profile. For additional questions or support, you can reach out through PxS Advice.

Thank you for reading The Senior Decision Maker. This post is public so feel free to share it.

The AI Plateau Illusion

Peter Eklind — Tue, 25 Jun 2024 06:20:27 GMT

If you, like me, were involved in corporate strategy and long-term planning in the early 2010s, you would have noticed that 2020 was anticipated to be a special year. It was the year when many of our wildest business dreams were expected to come true. I coined the phrase, which turned out to be false, that "in 2020, we would live in the future," implying that innovations we expected in the distant future would already exist. Instead, we got a pandemic.

The Road to the Future

Fast forward to 2024, and there are signs that the future has arrived. As William Gibson, the American-Canadian fiction writer and author of the cyberpunk novel Neuromancer, is attributed to have said, “The future is already here — it's just not evenly distributed.” We can guess where that future is, though—in the secret lab of the leading generative AI company, OpenAI.

In this article, I will focus on OpenAI, which I currently consider the leader in generative AI development, despite its recent dethroning by Anthropic’s Claude 3.5 Sonnet. However, I have tempered my expectations. In my 2024 prediction, I estimated OpenAI to be roughly 12 months ahead of its nearest competitor. Now, I believe the gap is closer to six months. Estimating this is challenging, though, as all major AI companies have faced significant struggles this year.

OpenAI’s CEO, Sam Altman, enjoys trolling the market, a luxury you can afford as an obscure non-profit organization. This behaviour makes it difficult to interpret their next moves and distinguish genuine signals from trolling. We expected an updated model, GPT-4.5, in late December last year, but it never materialized. At the time, I speculated that OpenAI had the option to release the model depending on the strength of Google's then-announced Gemini Ultra model. Now, I believe the primary reason we didn't see GPT-4.5 was a matter of compute. Their strategic partner, Microsoft, couldn't supply the necessary hardware (specifically, enough of NVIDIA’s H100 graphics processing units) to support a compute-heavy 4.5 model while simultaneously developing future models. A second factor was the turbulence that followed when the OpenAI board, in November, hit the corporate self-destruction button and ousted Sam Altman over the course of a weekend, before he made a Jesus-like comeback on the third day.

On May 13, 2024, a day before Google’s major developer conference, Google I/O, OpenAI launched a new model, GPT-4o. The “o” stands for “omni,” indicating its capability to work across audio, vision, and text in real time. This multimodality enables more natural human-computer interactions. OpenAI made little secret of its inspiration from the fictional AI model “Samantha,” played by Scarlett Johansson, in the 2013 movie Her.

I have previously asserted that we shouldn't worry too much about AI taking all our jobs as long as there remains some value we can add to each other. The real concern arises the day you go to a coffee shop with friends, and everyone places their phones on the table, letting the AIs talk to each other while the humans just listen. Based on the demos we've seen of the GPT-4o voice module (not yet released at the time of writing), that day could be surprisingly close.

Despite this, GPT-4o will not be the model to take all our jobs primarily because it is a small, albeit fast, model available for free to everyone. My take is that GPT-4o is the replacement for GPT-3.5 but with the user interface of GPT-5. This mirrors the OpenAI strategy from November 2022, when they released GPT-3.5 with the GPT-4 interface before launching GPT-4 four months later. I expect a similar approach this time.

The release of GPT-4o left me puzzled, though. Why didn’t they focus on the next frontier model, GPT-5, or whatever it will be called? Now, paid users have access to the same model available for free to everyone else. To understand how OpenAI got into this situation, we need to examine AI development from first principles.

Understanding the Development of AI from First Principles

Leopold Aschenbrenner is a hyperintelligent AI researcher who graduated as the top student from Columbia University at the age of 19, three years ago. Apparently a loose cannon, his dismissal from OpenAI’s long-term safety team, the so-called super-alignment team, was unsurprising. I mention him because he has recently been an insider at the leading AI lab. By turning down a $1 million equity deal, he is also free to discuss his experience. He has done so through a well-written 165-page report on AI development. While there is a lot to disagree with in it, I believe he has effectively captured the underlying mechanisms of AI progress.

To understand AI progress, it helps to understand what it is not. Many people erroneously compare AI development to the evolution of the iPhone. They equate GPT-3 to the iPhone 3GS, GPT-4 to the iPhone 4, and GPT-5 to the iPhone 5, envisioning a progression leading to an iPhone 15 that is more powerful but still familiar. The iPhone 15 Pro has a GPU compute capacity of up to 2.1 TFLOPS, and by 2030, I expect this might increase 2x. Leopold Aschenbrenner argues that the corresponding increase in AI during the same time could be 1,000,000,000,000,000x, in comparison to GPT-4 at launch. Even if he overestimates that by a factor of a million, it is still unimaginable what such a system would look like. It certainly wouldn't be a chatbot.

The key to understanding AI progress is recognizing that it is exponential, and hence measured in orders of magnitude (OOMs). One OOM represents a 10x improvement, two OOMs represent a 100x improvement, and so on. According to Aschenbrenner, AI development is driven by three cumulative factors: scaling of physical compute, algorithmic progress through efficiencies, and algorithmic progress through unhobbling.
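For readers who want to move between OOMs and plain multipliers, a small sketch follows. The function names are mine; the only thing it relies on is the definition above, that one OOM is a factor of ten.

```python
import math


def ooms_to_multiplier(ooms):
    """Convert orders of magnitude to a plain multiplier (1 OOM = 10x)."""
    return 10 ** ooms


def multiplier_to_ooms(multiplier):
    """Convert a plain multiplier back to orders of magnitude."""
    return math.log10(multiplier)


print(ooms_to_multiplier(2))               # two OOMs are a 100x improvement
print(round(ooms_to_multiplier(0.5), 2))   # half an OOM is about 3.16x
print(round(multiplier_to_ooms(1000), 2))  # 1,000x is three OOMs
```

Because OOMs are exponents, gains that multiply in plain terms simply add up in OOMs: four years at 0.5 OOMs per year is 2 OOMs, or 100x.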

The first factor, physical compute, combines faster chips, which improve according to Moore’s Law, and the expansion of data centres. This currently results in roughly a 3x increase, or 0.5 OOMs, per year.

Additionally, efficiencies can be found in smarter algorithms, enabling AI models to learn faster from less data. Numerous research papers on this topic are published every day, potentially contributing another 0.5 OOMs per year.

Furthermore, we can unlock more potential from AI models by using them in smarter ways, a process here called unhobbling. For example, the first version of GPT-3 would respond to "hi" with just similar words like "hi there, hello, greetings." Without making the model as such smarter, the creation of the chat interface in GPT-3.5 unlocked a new level of value. A similar step-change is expected when chatbots are upgraded to AI agents.

This is an ongoing development. Microsoft CTO Kevin Scott illustrated the growth in compute power at the Microsoft Build event on May 21, 2024. Unable to share specific numbers, he used marine animals for comparison, as one does… His message, as I interpret it, suggests that the compute power for GPT-3, GPT-4, and GPT-5 is in the ratio of 1:5:100. [The text in yellow is added by me, for clarity.] Thus, we can expect GPT-5 to be trained on 20 times the compute of GPT-4. This estimate does not account for any algorithmic efficiencies or unhobbling.

Combining the increase in compute with estimated algorithmic efficiencies, GPT-5 could be about 2 OOMs, or 100 times, more powerful than GPT-4. No one, not even OpenAI, knows exactly what this will mean. This potential leap is often overlooked, even in research papers in areas such as economics, which implicitly assume close to zero progress in AI development.
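As a rough back-of-the-envelope check of the 2 OOM figure, the sketch below adds the compute jump to the algorithmic gains. The 20x compute ratio and the 0.5 OOMs per year of algorithmic efficiency come from the text above; the 1.5 years of accumulated algorithmic progress is my assumption, and unhobbling is left out entirely.

```python
import math


def projected_ooms(compute_multiplier, algo_ooms_per_year, years):
    """Total gain in OOMs: compute scaling plus algorithmic efficiency."""
    compute_ooms = math.log10(compute_multiplier)
    return compute_ooms + algo_ooms_per_year * years


# 20x compute, 0.5 OOMs/year of algorithmic gains, 1.5 years (assumed).
total = projected_ooms(20, 0.5, 1.5)
print(round(total, 2))     # total gain in OOMs
print(round(10 ** total))  # the same gain as a plain multiplier
```

With these inputs the total lands slightly above 2 OOMs. A longer gap between the models, or any contribution from unhobbling, would push it higher.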

There is a natural reason why it might feel like progress has stalled, though. The capabilities of AI models typically improve in a stepwise fashion, roughly every two years. Since we are now 1.5 years into the GPT-4 era, we haven't seen a model pushing the technical boundaries for a while. Although GPT-4 has improved significantly, OpenAI has prioritized making the model smaller and more efficient rather than enhancing its intelligence. Based on pricing and response times, I estimate that the GPT-4o model is about 5x smaller than the original GPT-4 0314 model.

Are We Already Seeing Indications of a Slowing Development Pace?

The problem with exponential growth is that it cannot continue indefinitely. For instance, if a computer's size doubled annually, by year 90, it would surpass the size of the known universe. Eventually, exponential growth tends to transition into s-curves. The key question is when this transition will occur.
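The year-90 figure is easy to verify. The sketch below assumes a computer that starts out one metre wide, reads "size" as width, and uses a diameter of roughly 8.8 × 10^26 metres for the observable universe; all three are my assumptions for the check.

```python
OBSERVABLE_UNIVERSE_DIAMETER_M = 8.8e26  # approximate diameter in metres


def years_to_exceed(start_size_m, target_size_m, growth_factor=2.0):
    """Count the years of steady growth needed to exceed the target size."""
    size, years = start_size_m, 0
    while size <= target_size_m:
        size *= growth_factor
        years += 1
    return years


print(years_to_exceed(1.0, OBSERVABLE_UNIVERSE_DIAMETER_M))  # 90
```

A different starting size barely changes the answer: every factor of ten in the starting width shifts the result by only about three years, which is what makes exponential growth so hard to sustain.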

Both Microsoft and OpenAI assert that there are no indications we are nearing an inflection point. Others argue that we have already passed it. Even prominent AI insiders have expressed low expectations for the development pace, though often without substantial analysis, facts, or data to support their views.

There are clear physical constraints on the exponential growth of AI systems, for example, related to compute power and energy. The data centres planned for 2028 and beyond are projected to require energy in the gigawatt range, comparable to the output of a nuclear reactor. Scaling to such levels will be challenging. Each new frontier model demands resources akin to the world’s largest data centre, making it technically feasible but logistically daunting to build them within a predetermined, short timeline, even with sufficient funding. Therefore, I anticipate delays in future models due to construction delays in these data centres.

Another significant limitation is access to data for training the models, known as the "data wall." Current models have already utilized most of the available information on the internet. While multimodal models can expand their data sets by incorporating video, scaling beyond this point will be difficult. Future models will have to rely heavily on AI-generated, or synthetic, data. This approach has shown promise, as seen with Google’s AlphaZero, and while I am optimistic about its large-scale application, there are no guarantees.

A critical question remains: even if the next model is 100x more powerful than GPT-4, will it be any smarter in practice? This is unknown. Diminishing returns suggest that the next OOM increase in size may not impact intelligence as much as the previous OOM did. However, historically, scaling has consistently led to improvements and new, unexpected, emergent capabilities.

In the end, it comes down to fundamental assumptions. My view of how LLMs and multimodal models work is that they work through compression. They compress all human knowledge, or at least the information available on the internet, into a relatively small set of numbers. They achieve this by developing a world model that emerges from the data, enabling an understanding of the world, including laws of nature and human behaviour. Many AI researchers, particularly those at OpenAI, share this perspective. However, some researchers argue that LLMs are fundamentally incapable of creating world models, suggesting that their capabilities are merely based on pattern matching and memorization. I think the evidence contradicts this view. However, if proven true, the transformer-based architecture underpinning current models could be a dead end, and progress would stall. As soon as we get the next generation of models, we will know for sure.

What We Can Expect From GPT-5

Sam Altman has indicated that the transition from GPT-4 to GPT-5 will be as significant as the leap from GPT-3 to GPT-4. Remember, when GPT-3 was launched, it could only continue writing a given text, and the continuation often barely made sense.

Assuming the new user interface has already been revealed, you might not initially notice a significant difference with GPT-5. This is often the case with new generative AI models. However, users will soon showcase remarkable results to specific questions, motivating some to claim that GPT-5 is Artificial General Intelligence (AGI). OpenAI would likely refute such claims. They have a strict contractual definition of AGI in their contracts with Microsoft and investors, stipulating that agreements cease once the AGI threshold is met. My guess is that OpenAI might even dumb down the model to avoid crossing that threshold.

OpenAI also has a high threshold for what it considers AGI. According to Altman, an AGI should be able to drive AI development research, which he says is not expected to be possible with GPT-5. He believes that one to two additional breakthroughs, and more scale, are needed to reach AGI. Best case, this could be achieved with a "GPT-6" in 2027.

To understand where we are today and what we might expect in the next three years, I have compiled my current estimates in areas where models today are weak or lack capabilities. Note that my conviction level for these estimates is low, and I anticipate frequent and significant revisions. It's also possible that OpenAI could be surpassed during this period by Google DeepMind, Anthropic, or, less likely, a challenger such as Meta, Nvidia, or Elon Musk’s xAI.

I anticipate the most significant difference with GPT-5 will be its functionality, resembling more of an agent than the foundational models we've seen so far. In a foundational model, users interact directly with the model. However, with GPT-5, while it may seem like direct interaction, numerous processes will be occurring in the backend. Various multimodal models (and possibly other tools) will be employed for different purposes. For simple questions, a small, fast, and cost-effective model will be utilized. For complex tasks, multiple models might collaborate to plan, search the web, and generate various answer options, for example using Monte Carlo simulations. There might be distinct models for memory management and multimodality, including generating images and short movies.

However, I do not expect a full-scale agent architecture yet, which would include features like an internal clock to solve tasks continuously 24/7 or the ability to generate videos of potential scenarios for internal use, akin to human predictions. Such features will likely require GPT-6, due to their heavy compute demands.

Even if GPT-5 doesn’t feature a complete agent architecture, it will still significantly impact corporations if my predictions hold true. While GPT-4 in business use cases mostly acts like an advanced Google search, GPT-5 will enable building agents on top of it and introduce new use cases, allowing it to solve more types of general-purpose tasks.

The task of writing computer code is at the forefront of agent development, providing a sneak peek into what will be possible in other areas during the second half of the year. In my last article, I discussed Devin, the coding agent. I expect to see similar agents for other business tasks, such as marketing, business controlling, and business intelligence. Furthermore, I anticipate that coding agents powered by GPT-5 will perform at the level of developers with several years of experience.

Copilots vs. Agents: A Contrarian Perspective

In the area of agents, I hold a contrarian view, and I have recently revised my thinking. In my last article, I discussed the race between "copilots" that enhance humans and "agents" that replace them. The prevailing belief is that copilots will enter the market first, while agents remain a distant future development. Listening to companies like Microsoft, one might believe that AI will solely enhance human capabilities without ever replacing them.

Previously, I described this evolution in phases: starting with foundational models, then integrating them into tools to create copilots, followed by AI agents, and finally, integrated systems of AI agents. I now question the order of these phases.

This is because integrating AI into existing tools to create copilots might be more challenging than expected. While adding AI as an advanced search feature might be straightforward, rebuilding software that has been developed over 35 years, like Microsoft's Office tools, to become "AI first" is challenging. Paradoxically, it might be faster to build AI agents to meet business needs.

If this holds true, it will impact every company’s AI strategy (refer to my article 'Thinking About Developing an Artificial Intelligence Strategy?'). Copilots can be implemented in a decentralized way, allowing trials in specific units where the benefits are high, with the main concern being model sourcing. However, AI agents are different. They more closely resemble Robotic Process Automation (RPA), which automates repetitive, rules-based tasks using software robots, or "bots," that mimic human interactions with digital systems.

RPA, designed to streamline workflows, reduce errors, and increase efficiency across various industries, became popular in the mid-2010s with advancements in machine learning. Both RPA and AI agents require a platform for operation, similar to managing employees. They need email addresses, access to data sources, and adherence to corporate guidelines and policies. There must also be structures for decision-making and financial management: who decides when RPA can be used, and who pays for it. The decentralized approach suitable for copilots is less appropriate here.

It might make sense to let the organization that manages RPA also manage AI agents or establish a new organization for this purpose. This structure needs to be scalable. While you might start with a few AI agents, their numbers could quickly surpass those of employees if the cost is sufficiently low.

Recommendations

Given the limited value of current models outside specific use cases, and the ongoing development of new models, I continue to advocate for a balanced and pragmatic approach to corporate AI adoption. Companies should develop a diverse portfolio of AI initiatives, combining 'no-regret' moves with some higher-risk, high-reward ventures.

As detailed in my article 'Thinking About Developing an Artificial Intelligence Strategy?', the critical areas of focus are:

Building the foundation for using AI
Incorporating AI into ongoing digitalization efforts or digital transformations
Improving everyday efficiency with AI, including executive decision-making

'No-regret' initiatives primarily involve formulating a comprehensive AI strategy, focusing on foundational development, and enhancing operational efficiency. Practically, this means ensuring widespread access to AI models and providing training for employees to streamline meetings, expedite email and report writing, and enhance overall productivity. These strategies are likely to produce tangible benefits from day one.

Thank you for reading The Senior Decision Maker. This post is public so feel free to share it.

The Failed Promise of Corporate AI Productivity

Peter Eklind — Wed, 13 Mar 2024 19:42:36 GMT

The announcement of Microsoft Copilot, merely two days after GPT-4's groundbreaking launch on March 14, 2023, made me bullish. Evidence indicated that developers could double their productivity using AI tools like GitHub Copilot. This led me to foresee a future where AI assistance could similarly boost the efficiency of every office worker. I boldly projected that within 18 to 24 months, we could maintain current output levels with half the workforce. However, a year into this journey, reality has proven otherwise.

Unrealized Expectations

Two significant obstacles have surfaced. The first was access: as a small business owner, it took me an astonishing 10 months to get access to the Microsoft Copilot solution. The second was the challenge of seamlessly integrating advanced AI into Microsoft’s decades-old office tools. Given their 35-40 years of evolution, these tools resist easy enhancement with added intelligence.

According to Microsoft, Copilot users save an average of 14 minutes daily, translating to roughly 1.2 hours per week. These savings are attributed to tasks such as writing, summarizing meetings, and retrieving information:

Write a first draft: from 14 minutes to 8 minutes with Copilot = 6 minutes saved
Summarize a missed meeting: from 43 minutes to 11 minutes = 32 minutes saved
Search for information: from 24 minutes to 18 minutes = 6 minutes saved
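
As a rough check on the arithmetic in these figures, here is a minimal Python sketch. It assumes a five-day working week, which the source does not state, and the function and variable names are purely illustrative rather than anything published by Microsoft.

```python
# Figure quoted in the text (attributed to Microsoft).
MINUTES_SAVED_PER_DAY = 14
# Assumption (not stated in the source): a five-day working week.
WORKDAYS_PER_WEEK = 5

# Task timings from the list above: (minutes without Copilot, minutes with Copilot).
TASKS = {
    "Write a first draft": (14, 8),
    "Summarize a missed meeting": (43, 11),
    "Search for information": (24, 18),
}


def weekly_hours_saved(minutes_per_day: int, workdays: int) -> float:
    """Convert a daily saving in minutes into a weekly saving in hours."""
    return minutes_per_day * workdays / 60


def minutes_saved(before: int, after: int) -> int:
    """Minutes saved on a single task."""
    return before - after


if __name__ == "__main__":
    hours = weekly_hours_saved(MINUTES_SAVED_PER_DAY, WORKDAYS_PER_WEEK)
    print(f"Weekly saving: {hours:.2f} hours")
    for task, (before, after) in TASKS.items():
        print(f"{task}: {minutes_saved(before, after)} minutes saved")
```

Under that five-day assumption, 14 minutes a day comes to about 1.17 hours a week, which rounds to the 1.2 hours quoted, and the per-task savings match the list. The sketch only re-derives Microsoft's own numbers; it says nothing about whether they hold in practice.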

These calculations appear to assume that users lack access to alternative AI tools. In contrast, my setup includes the small business ChatGPT Teams solution, based on GPT-4 Turbo, alongside Google’s Gemini Advanced (built on Gemini Ultra) and the European contender, Le Chat by Mistral. These tools have become integral to my daily operations. My experience with Microsoft Copilot has so far yielded no additional value. Despite having access to all my private files and data, it does not effectively utilize this information. Admittedly, I have not dedicated substantial effort to maximizing its potential. But when, for example, I requested a summary of my activities on January 17th, 2023, I found Copilot’s response incoherent: it omitted crucial data sources and highlighted irrelevant content from random newsletters. Such performance leads me to question whether Copilot truly harnesses GPT-4; at times, it seems not even to keep up with GPT-3.5.

Although it's too soon to discount Microsoft, the company faces a monumental challenge in unlocking Copilot's full potential, let alone matching the standalone GPT-4. So, I will be watching for emerging contenders developing office applications, and possibly even operating systems and hardware, from scratch with an AI-first approach.

The Journey Toward General Intelligence

Many organizations, still on the sidelines of fully embracing generative AI, may find solace in the gradual pace of corporate AI integration. Yet, the principle of exponential growth suggests this comfort may be misplaced. Despite the slower-than-anticipated adoption of AI-powered office tools, the landscape is rapidly evolving with daily advancements and updates in AI technology. The march toward Artificial General Intelligence (AGI) is accelerating.

Defining AGI remains a subject of debate, with the threshold for its capabilities continuously rising. Currently, AGI is imagined as having the capability to perform any cognitive task that has economic value, similar to what humans can do. This implies that AI agents could potentially replace every remote worker. Once AGI becomes commercially available, its impact on the market and society is expected to be swift and profound.

On September 18, 2023, an anonymous social media user, "Jimmy Apples," claimed on X that OpenAI had internally achieved AGI. While one might usually be skeptical of information from anonymous online sources, this particular account has accurately predicted numerous OpenAI events, release dates, and product names. This leads me to consider the possibility that AGI could indeed be closer than we think, at least in the lab. Though I wouldn't wager all my savings on the veracity of this claim, I'm also hesitant to dismiss it outright.

This contrasts with a survey of almost 3,000 AI researchers, who put the 50% chance of achieving AGI as late as 2047. However, the leaders of the major AI companies, including Sam Altman of OpenAI, Demis Hassabis of Google DeepMind, and Dario Amodei of Anthropic, all believe AGI could be realized much sooner, within this decade. Amodei is particularly optimistic, suggesting AGI could emerge in just two years.

In my opinion, the primary obstacle currently is compute. Sam Altman, the head of OpenAI, is reportedly seeking to secure $7 trillion, approximately 6-7% of global GDP, to enhance chip-manufacturing capabilities. If my guess about the architecture of future AI models is right, it seems plausible that AI firms might initially develop AGI technologies whose immense computational demands make them infeasible for commercial application, potentially delaying their market introduction by years.

A significant milestone on the path to AGI was the unveiling of the text-to-video model Sora by OpenAI on February 15, 2024. Sora has garnered considerable interest for its ability to generate photorealistic, coherent, 60-second videos from just a text prompt, a feature I anticipated in my 2024 predictions:

“Advanced Text-to-Video Capabilities: I anticipate an AI model capable of generating 60-second high-quality videos, complete with story, speech, music, and coherent scenes, all from a single text prompt.”

What makes Sora truly remarkable is its underlying grasp of how the world operates. For such a model to generate realistic videos, it must comprehend the laws of physics, the behavior of humans and animals, and even the growth of plants. This depth of understanding appears to be an emergent property of the model, one that only improves with increased scale and computing power. Although Sora is still in the research phase and requires more compute than is currently viable for commercial use, OpenAI's CTO, Mira Murati, aims for a 2024 release date.

This capability suggests that the AI can "think" in a manner akin to human cognition. Presented with an image, for instance, it can infer and simulate potential precursors and outcomes of the captured moment. For example, if it observes a car approaching a puddle next to a sidewalk, the AI can predict the ensuing splash dynamics. Though not yet flawless, it possesses an instinctive understanding of various factors such as the car's mass, tire characteristics, water fluid dynamics, and pedestrian awareness, allowing it to make judgments akin to a human's. This intuitive process doesn't rely on breaking down the scenario into complex equations; rather, the AI inherently "knows" the most prudent course of action.

AI Development: Replace vs. Enhance

The evolution of AI encompasses two parallel trajectories: the replacement of human roles by AI and the augmentation of human capabilities through AI. In the realm of commercial applications, especially as we edge closer to AGI, the replacement approach is likely to dominate in the long term. Yet defining "long term" in this context, whether it spans months or extends over years, remains difficult. In the interim, the focus of AI firms leans towards enhancement, a strategy evidently reflected in Microsoft branding its AI initiatives under the "copilot" moniker. Although OpenAI has hinted that their “GPTs” are precursors to fully autonomous agents, the roadmap towards this transition remains unclear, making a near-term emphasis on enhancement and copilot functionalities pragmatic.

However, a significant development in this area occurred on March 12th, 2024, when Cognition unveiled "Devin," touted as the inaugural AI software engineer. This AI agent is designed to perform the duties of a software engineer, equipped with advanced capabilities in long-term reasoning and planning. This enables Devin to navigate and execute complex engineering projects that involve making thousands of decisions. With the ability to remember pertinent information throughout its tasks, learn progressively, and rectify errors, Devin epitomizes a significant stride towards the AI-driven transformation of the workforce.

The launch of the first commercially useful AI agent was also among my 2024 predictions, and I expect we will see more like this during the year:

“Launch of the First Commercially Useful AI Agent: I predict the debut of an AI agent based on a foundational model, capable of performing economically valuable tasks independently within a specialized domain.”

Recommendations for Corporate AI Adoption

At this point, I would advocate for a balanced and pragmatic strategy towards corporate AI adoption. In the face of existing uncertainties, companies could focus on developing a diverse portfolio of initiatives that leverages AI technology. This involves combining "no regret" moves with a series of higher-risk, potentially high-reward ventures.

As detailed in my article "Thinking About Developing an Artificial Intelligence Strategy?", there are four critical areas to focus on:

1. Build the foundation for using AI

2. Strengthen Key Capabilities with AI

3. Improve everyday efficiency with AI

4. Improve executive decision-making with AI

The "no regret" initiatives primarily involve formulating a comprehensive AI strategy and concentrating efforts on foundational development (1) and enhancing operational efficiency (3). Following this, organizations can venture into more speculative areas (2 and 4) that carry a higher risk but also the potential for significant impact.

In practical terms, this means ensuring widespread access to, and training on, large language models (LLMs) for employees, aiming to streamline meetings, expedite email and report writing, and overall, enhance productivity. Such strategies are likely to yield tangible benefits.

For ventures with higher risk, engage in initiatives that have a 50-50 chance of success, or possibly lower, but offer the potential for significant impact. This might involve exploring the creation of an AI-supported decision-making framework for the executive leadership team or board of directors. It's crucial to establish clear objectives, timelines, and budgets upfront, and to be ready to terminate these projects if they fall short of expectations. Nonetheless, it's essential to systematically capture and repurpose the insights gained from these ventures, ensuring that knowledge is not lost but instead applied to future efforts, regardless of the project's outcome.

The 'Elite Worker': Outperform Competitors with an Athlete’s Approach to Work

Peter Eklind — Tue, 16 Jan 2024 12:44:20 GMT

This introductory article lays the foundation for a concept I term the 'Elite Worker', examining the benefits it brings to a corporate environment. It is about shifting the guiding spirit in corporate high performance away from 19th-century military to 21st-century professional sports. Subsequent articles will pivot to the Elite Worker's perspective, detailing their development path and providing practical insights. Those articles are designed to be valuable not only for those aspiring to become Elite Workers but for anyone seeking to elevate their professional performance.

The term 'elite' is typically only used positively in sports and product marketing. In this series, we use 'elite' deliberately to expand our understanding of workplace excellence. We acknowledge that the term can be sensitive and assure readers that its use here is intended to inspire and provoke thought, not to exclude or diminish.

The Power of Elite Workers

Occasionally, a small company or team, often a startup, emerges and astonishingly outperforms industry giants. Instagram is a prime example: with just 13 employees, they managed a user base of 27 million before being acquired by Facebook. Similarly, OpenAI rapidly ascended to the forefront of Generative AI with a lean team of only 100-120 employees during the development of GPT-3 in 2019-2020. This achievement is even more remarkable when contrasted with Google, the leader in the AI field. Google, boasting nearly two hundred thousand employees, had 1,000-1,200 dedicated to this area in its specialized division, Google DeepMind, during the same period.

These remarkable achievements often stem from leveraging novel technologies, or the advantage of having no legacy constraints. However, there's another crucial factor at play, which we will explore in depth here: the presence of teams with exceptionally skilled workers, which I refer to as 'Elite Workers'.

I emphasize this concept because, while adopting new technologies and shedding legacy systems may not always be feasible, assembling a team of Elite Workers might be. It's likely that you're already employing an adjacent approach to a degree, but there's a strong chance that it could be executed in a better way.

The idea of Elite Workers struck me while reading Walter Isaacson's biography of Elon Musk. I was intrigued by the challenge of separating Musk's brilliant, actionable insights from his crazier side [read my article Deconstructing Elon Musk’s Get-Things-Done Formula]. Although the concept of 'Elite Workers' is not directly mentioned in the book, Musk’s expectation for all his workers to dedicate their lives to their work is clear. This intense commitment leads to extraordinary outcomes, demonstrating how small Elite Worker teams can significantly outperform much larger competitors.

Consider SpaceX as an example. Starting with a lean and innovative team of just a few hundred, which has grown to 13,000 today, SpaceX has revolutionized space exploration. They developed cost-effective rockets like the Falcon Heavy, dramatically reducing the cost of launching cargo into orbit to $951/kg, a stark contrast to the Space Shuttle's approximately $27,000/kg in 1995. Another notable achievement is sending crew to the International Space Station. Moreover, SpaceX’s development of the Starship marks a significant stride towards future manned missions to the Moon and Mars. During this journey, they have outperformed established industry giants like Boeing (156,000 employees) and Lockheed Martin (116,000 employees).

However, the Elite Worker approach is not a 'silver bullet'. This was evident when Musk acquired the social media platform Twitter and renamed it X. In a drastic move, he reduced the workforce by about 80%, from 7,500 to 1,300, forming a smaller, more dedicated team. This raises several questions: Was this the most effective approach? Could it have been handled better? Is the Elite Worker model appropriate for every type of task? We'll explore these questions in detail. My hypothesis is that while Musk has identified a valuable concept in Elite Workers, he hasn't fully capitalized on its potential. A key oversight is assuming every employee, regardless of their role, fits the Elite Worker profile. Even with a strong talent brand, applying this model indiscriminately will lead to inefficiencies.

A Brief History of High-Performance Teams

The concept of high-performance teams delivering remarkable results has long been recognized, particularly in high-stakes fields like the military and space exploration.

Lockheed's Skunk Works (1943): During World War II, Lockheed developed the P-80 Shooting Star fighter jet in just 143 days, from initial design to flying production models. This was achieved through their 'skunk works' approach, utilizing “a small team of good people” that operated independently from the main business, typically comprising 10% to 20% of the size of standard teams.
Red Teams (1960s-): Dating back to the early 1960s and used by the US military to simulate Cold War scenarios, Red Teams aim to prevent physical or digital intrusions by adopting the perspective of potential adversaries. The concept has since spread to business use cases such as cybersecurity, AI alignment, and corporate strategy development.
NASA's Tiger Team (1970): Following an explosion on Apollo 13, which led to a critical air shortage, NASA's flight director Gene Kranz assembled a team of engineers and experts. This team, later known as Kranz's 'Tiger Team', worked tirelessly for four days to safely return the astronauts to Earth.
Evolution of the Tiger Team Concept in Businesses (2000-): The concept of Tiger Teams, along with its variations, has been adopted in the business world for tackling complex, critical issues beyond the capacity of regular organizational structures. These small, agile, and cross-functional teams are composed of subject matter experts drawn from across the organization, focusing on limited-time, specific projects, often while maintaining their regular roles.
Incorporating High-Performance Teams into Business Structures (2015-): Nowadays, companies are working to integrate the essence of these high-performance teams into their permanent structures. Agile teams, similar in their cross-functional and small-scale nature, differ primarily in their permanence and operational style: tasks are brought to Agile Teams, as opposed to Tiger Teams where experts are mobilized to address a specific problem.

Defining the “Elite Worker”

The concept of Elite Workers represents an evolution of high-performance teams incorporated into permanent organizational structures. To this, we add insights and methodologies gleaned from a realm where high performance is rigorously tested and refined – competitive sports.

In competitive sports today, merely having talent is insufficient for success. Elite athletes must commit every facet of their lives to their sport. This commitment extends well beyond training routines; it includes meticulous attention to sleep, nutrition, and recovery. A compelling example of this dedication is found in the training manifesto of Nils van der Poel, a double Olympic champion in speed skating in 2022. His detailed 62-page document outlines the exhaustive preparation and discipline over three years leading up to the Olympic Games. This level of dedication, if applied to the professional sphere, could revolutionize workplace performance. By adopting the unwavering focus and discipline of elite athletes, we can cultivate work environments where Elite Workers thrive, driving extraordinary results.

To fully grasp the Elite Worker concept, it's crucial to distinguish them from ordinary workers. While there's a middle ground between these two categories, which we'll set aside for the moment, it's essential to understand that this middle position often results in suboptimal performance.

Ordinary workers operate within a structured framework: they perform specific tasks for set hours and receive predetermined pay. This category typically enjoys job security, regular hours, and scheduled vacations. In contrast, Elite Workers operate in a different realm. Their primary objective is optimal performance, with the focus squarely on the outcome rather than the process. This approach mirrors the ethos of professional sports, where results take precedence over all else.

For an Elite Worker, dedication to their work is akin to a lifestyle choice, not just a job. Job security is dynamic; they remain as long as they are the best fit for the role. If someone more capable is found, replacement is a possibility. Consequently, their compensation structure must be unique, featuring higher pay coupled with performance-based incentives. This risk-reward model reflects their commitment and the high stakes of their contributions.

The Elite Worker concept contrasts sharply with the traditional career progression model. In the conventional path, employees work hard, often undercompensated, in anticipation of future rewards, such as promotions. While this model is viable, it doesn't typically yield teams that perform at 10x or 100x levels. A common limitation of this approach is the misalignment between employees' current roles and their aspirations. Many are more focused on climbing the ladder, striving for their boss's position, regardless of their aptitude for that role.

This misalignment is encapsulated in the "Peter Principle" formulated by Laurence J. Peter, which postulates that individuals in a hierarchy tend to rise to their "level of respective incompetence." Essentially, employees are promoted based on success in previous roles until they reach a position where they are no longer competent.

Conversely, the Elite Worker paradigm emphasizes excelling in one's current role, with a strong incentive for continuous improvement. This approach does not imply an immediate leap into high-level performance; similar to athletes, it involves years of dedicated training and development. The journey to becoming an Elite Worker is gradual, requiring consistent effort and mirroring the rigorous preparation of professional athletes before reaching the pinnacle of their sport.

In defining an Elite Worker, it's equally vital to understand what falls outside this scope but is still related:

Key Individuals: These are crucial roles such as a CEO, or a highly specialized technician, whose unique knowledge and skills are vital for business continuity. While important, these roles often benefit from being regarded as unique cases, distinct from the Elite Worker model due to their different nature and scope of work.
Task Teams: Examples include Tiger Teams, formed in response to specific, often urgent, external events such as a natural disaster or a major cybersecurity attack. Such teams usually consist of regular employees temporarily reassigned to address a critical issue. The primary distinction between these teams and Elite Workers lies in the temporary nature of their structure and the speed with which they must be mobilized.

However, applying the principles of the Elite Worker model in both cases can offer new perspectives and approaches, enhancing the performance and impact of individuals and teams in these critical roles.

When to Use Elite Workers

The idea that all employees should adopt the Elite Worker approach might seem ideal, but it's not always practical or necessary. Firstly, the talent and discipline required for this level of performance are rare. Many people have personal commitments and family lives that demand a predictable routine. Secondly, the cost-benefit ratio needs to be considered; if a regular worker can achieve nearly the same results, the investment in an Elite Worker might not be justified. So, under what circumstances are Elite Workers truly essential?

Elite Workers prove invaluable in situations where a combination of the following elements is critical:

Creative Thinking: Innovation and out-of-the-box ideas are needed.
Problem Solving: Complex challenges require sophisticated solutions.
Quality of Work: When only the highest standard of work is acceptable.
Speed and Timing: Situations where rapid and timely responses are crucial.

A case in point would be averting an asteroid collision with Earth, an extreme scenario in which the stakes couldn't be higher. Historically, WWII initiatives like the Manhattan Project, the development of radar devices in the MIT 'Rad Lab', or decrypting the Enigma code exemplify missions suited for Elite Workers. On the other hand, roles involving predictable, repetitive tasks (e.g., in fast food, truck driving, warehousing, or accounting) may not benefit significantly from this approach.

Deciding whether to engage an Elite Worker hinges on meeting specific criteria, ensuring that their unique capabilities align with the organization's needs:

Clear Financial Rationale: The situation should be business-critical, warranting the additional cost and effort involved in employing an Elite Worker. There needs to be a compelling risk-reward scenario that justifies the investment.
Significant Impact on Outcomes: The difference an Elite Worker or their team makes should be substantial. There's a relevant adage: sometimes, what you need is one person who can clear a 2-meter hurdle, not two people who can clear a meter each. The value added by the Elite Worker should be evident and measurable.
Flexibility and Critical Timing: The nature of the task should demand a level of flexibility and timing that can't be achieved through standard working hours or conventional methods. It's not about fitting tasks into a typical workweek but about achieving results that hinge on specific, sometimes unpredictable, timelines and crucial moments.

Understanding when to utilize a team of Elite Workers can be clarified through specific examples. Here are scenarios where their deployment would be effective:

Developing New Product Lines: When introducing a new generation of products that significantly deviate from a company's existing offerings.
Staying Ahead in Fast-Evolving Markets: In scenarios where a product is based on rapidly advancing technology, necessitating a constant one-upmanship over competitors.
Pioneering Breakthrough Innovations: Situations involving the creation of entirely new products based on untested technologies, where the path to success is uncertain and filled with potential obstacles.
Launching Start-ups: In start-ups, where a small team faces the challenge of addressing a multitude of tasks and problems.

A commonality in these situations is their outward-facing nature, dealing with external market and technological challenges, rather than internal organizational issues. While dedicated teams are also crucial for business turnarounds or transformations, the Elite Worker model is less applicable here. From experience, it's impractical to split the workforce and have an 'elite' segment dictate new operational methods to the rest. Elite Workers need a degree of separation from the rest of the workforce to avoid such conflicts and to fully realize their potential.

Optimizing Organizational Structures for Elite Workers

Creating an environment conducive to Elite Workers necessitates careful structural adjustments. To avoid conflicts and manage costs effectively, consider implementing these strategies:

Distinct Team Structures:
- Physical and Operational Separation: Clearly delineate Elite Worker teams from regular teams to manage varying expectations and working styles.
- Independent Entity Formation: In organizations with rigid structures, establishing a separate entity or division for Elite Workers can be more effective.
Specialized Human Resources Framework:
- Customized Job Roles and Descriptions: Develop roles and job descriptions specifically tailored to the unique tasks and responsibilities of Elite Workers.
- Performance Evaluation: Focus on evaluating both the team's outcomes and each individual's contribution to these results.
- Tailored Compensation Models: Create compensation schemes that align with the high-risk, high-reward nature of Elite Workers' roles.
Management Style Inspired by Professional Sports:
- Coaching-Based Leadership: Transition from traditional hierarchical management to a coaching model that prioritizes optimizing individual and team performance.
- Support Roles: Incorporate roles for training, mentoring (including external experts), and healthcare support, essential for maintaining the well-being and peak performance of Elite Workers.
Adopting a Learning and Adaptation Philosophy:
- Flexibility: Elite Worker teams often encounter unique, unexpected challenges. In such scenarios, the ability to adapt swiftly is crucial. Unlike in many large companies, this might mean improvising solutions even in the absence of established structures, ensuring rapid and effective problem-solving.
- Trial and Error: Rarely will you get everything right on the first try. Whether it's the team structure, operational methods, or personnel allocation, expect a learning curve. Adopt a philosophy of trial and error, supported by a systematic approach to experimentation and refinement. This mindset allows for continuous learning and improvement based on real-world experiences.
- Pragmatism: Managing Elite Workers effectively requires a pragmatic approach. Leverage the Pareto Principle (80/20 rule) to focus on what yields the most significant results. Prioritize actions that deliver the highest value, understanding that perfection is often less critical than practical, impactful solutions.

By integrating these elements, your organization will foster an environment where Elite Workers can excel, drawing upon successful methodologies from professional sports while adapting them to the corporate world's unique demands.

Managing Elite Workers

Managing Elite Workers goes beyond simply pushing for relentless hard work and flawless adherence to processes. The primary focus should be on achieving the desired outcome. The path to this outcome is often uncertain and may require exceptional effort, but hard work in itself is not the end goal. This approach is not about workers striving to impress managers for a promotion; it's entirely about meeting specific objectives. If an Elite Worker isn't meeting these objectives, or if there's someone who could do it more effectively, a change might be necessary, similar to decisions made by a soccer team coach.

In managing Elite Worker teams, we are looking for a hybrid approach, combining traditional management techniques used in high-performing corporate teams with strategies employed in managing professional sports teams. It's about balancing corporate structure with the agility and result-focused mindset of sports management, ensuring that Elite Workers are both supported and challenged to deliver their best.

From the Elite Worker’s perspective, there is no singular correct approach. However, their focus can be divided into four key areas, each requiring attention, development, and support:

Work Execution: This includes prioritizing tasks, focusing effectively, managing time efficiently, and utilizing the right tools.
Platform for High Performance: Key elements here are physical and mental training, along with attention to sleep and nutrition.
Evolution of Capabilities and Skills: Areas of focus should include planning and goal setting, ongoing learning and education, measuring and evaluating outcomes, and receiving mentoring and feedback.
Career Management: This encompasses managing assignments, obtaining legal support, and accessing services that aid focus on current tasks, similar to the support of an assistant at work or domestic services at home.

While some of these aspects are the responsibility of the worker, others must be provided by the organization. At a minimum, companies need to create an environment and structure conducive to these practices.

Challenges in Implementing the Elite Worker Model

Successfully implementing the Elite Worker model involves navigating potential pitfalls:

The 'In-Name-Only' Implementation: A pitfall in adopting the Elite Worker model is the 'in-name-only' approach, where organizations declare the formation of an Elite Worker team but neglect to provide the necessary distinct structures and resources. This could lead to a 'stuck in the middle' scenario, where the team is unable to fully leverage the benefits of the Elite Worker model while still being constrained by the limitations of a regular workforce. To effectively implement this concept, it's crucial to establish dedicated resources and operational frameworks that are explicitly tailored for Elite Workers and distinct from those used for regular employees. This involves creating a work environment, evaluation metrics, and support systems that are specifically designed to enable Elite Workers to excel.
Loss of Key Resources in the Transition: A key obstacle in forming an Elite Worker unit is the mistaken belief that a current team of key resources can be directly transformed into Elite Workers. Such enforced changes can be counterproductive, potentially leading to the loss of invaluable staff. It's crucial to understand that embracing the Elite Worker role involves mutual agreement and a true alignment with its rigorous demands and principles. Forcing this model without team members' buy-in can spark resistance and result in attrition. However, this doesn't imply a complete reliance on external hiring. While bringing in external talent can offer new perspectives and skills, it's equally important to integrate internal employees who willingly adopt the Elite Worker mindset. This balanced approach preserves essential organizational knowledge and blends the stability of internal experience with the freshness of external insights, a combination vital to the success of an Elite Worker team.
Unrealistic Expectations on Budgets and Talent Recruiting: A significant challenge in rolling out an Elite Worker model is aligning ambitious goals with actual budget limitations and talent market realities. Organizations often set their sights high, benchmarking against top-tier performers, yet find themselves constrained by limited budgets and less prominent talent brands. A realistic assessment of financial and recruitment capabilities is essential. Properly aligning the Elite Worker program's scope and scale with available resources becomes critical. This approach may involve starting modestly and scaling the initiative as results justify further investment, or in some instances, reevaluating the practicality of implementing an Elite Worker strategy if resources are substantially limited.
Applying the Elite Worker Model in the Wrong Areas: Reflecting on Elon Musk's approach, it's evident that while he may not explicitly conceptualize Elite Workers as defined here, his expectations suggest a similar work ethic across his companies. However, there's a critical distinction between Elite Workers and dedicated, hardworking, loyal regular employees. In many situations, the latter group is entirely adequate. Recognizing this difference is essential, and it requires the organization to have a clear vocabulary and understanding that distinguishes between these two types of workers. Misapplying the Elite Worker model in areas where regular, committed employees would suffice can lead to unnecessary strain and missed opportunities for optimal workforce utilization.
Expecting Elite Worker Performance from Regular Employees: Employing the Elite Worker model requires more than just long hours from a regular workforce; it requires a foundational structure. When organizations attempt to shortcut this process, coercing regular employees to deliver Elite Worker outcomes without the corresponding support and environment, the results can be counterproductive. This misapplication often leads to only ephemeral achievements and, more critically, can cause long-term performance degradation and employee burnout. This phenomenon is notably prevalent in high-profile startups, where an ethos of extreme dedication is often celebrated. It also manifests in work cultures like China's '996' system, where working from 9am to 9pm, six days a week, is normalized. However, the implications of such intense work environments are concerning. A 2013 survey among IT professionals in these conditions highlighted severe health repercussions, including a staggering 98.8% reporting health-related issues and a significant increase in suicide rates. To avoid these outcomes, it's crucial to establish proper expectations and resources. Ensuring that regular employees are not pushed beyond their limits is essential, as is providing the appropriate structures and support for those expected to perform at an Elite Worker level.
Too Much Focus on Maximizing Hours Worked: A critical oversight in managing Elite Workers is placing too much emphasis on maximizing work hours without considering the overall picture. Compare this to the management of a marathon runner: while a marathon runner is capable of peak performances a few times a year, expecting them to perform at this level daily is unrealistic and leads to overtraining, injuries, and diminished performance. Similarly, Elite Workers require a regimen that balances periods of intense work with adequate recovery. This approach is crucial to prevent burnout and maintain peak performance levels. The goal is to create a working environment that promotes longevity and sustained high performance, rather than short-term gains at the expense of long-term productivity and employee welfare.

By addressing these challenges with thoughtful strategies, organizations can successfully implement the Elite Worker model, leading to enhanced performance and competitive advantage.

Broadening the Elite Worker Mindset Across Diverse Professions

While this article has adhered to a specific definition of the Elite Worker, it's important to recognize that the core principles of this model can be beneficially applied to a variety of other roles. Professions that may significantly gain from an adapted Elite Worker approach include:

Entrepreneurs: Typically embodying the essence of the Elite Worker through their dedication and commitment to their ventures.
Senior Managers: Who can leverage aspects of the Elite Worker model to enhance decision-making and strategic leadership.
High-End Professional Services: Such as Management Consultants, where the intensity and quality of work are critical to success.
Finance Sector Professionals: Where precision, analytical acumen, and sustained high performance are paramount.

In upcoming articles, we will delve deeper into the specifics of becoming an Elite Worker. From an individual perspective, not every aspect of the Elite Worker model may be directly transferable or necessary for high performance in these roles. However, selecting and adapting relevant elements of the model can lead to significant improvements in performance and outcomes.

Recommendations

If you want to investigate whether an Elite Worker model is suitable for your company, here is a structured approach:

Analyse Your Current Situation:
1. Conduct an assessment of your existing workforce. Identify potential Elite Workers and evaluate the current distribution of roles between regular employees, potential Elite Workers, and those in between.
2. Analyse how your organization's current practices and culture align with the Elite Worker concept. This includes reviewing current productivity, employee engagement, and the existing talent management strategy.
Evaluate the Implementation of an Elite Worker Model:
1. Develop a business case for the Elite Worker model. This should include an analysis of potential advantages and disadvantages, ensuring a clear understanding of the model's impact on organizational dynamics and performance.
2. Consider both tangible and intangible factors, such as potential productivity gains, enhanced innovation, employee satisfaction, and any potential risks or cultural shifts that may occur.
Generate an Implementation Plan and Decision-Making Material:
1. Create a high-level implementation plan outlining the steps needed to integrate the Elite Worker model into your organization. This should cover aspects such as team restructuring, training and development programs, and changes to performance evaluation processes.
2. Prepare decision-making materials that summarize the analysis, evaluation, and proposed implementation plan. This material should be designed to facilitate informed decision-making by stakeholders, ensuring clarity on the objectives, expected outcomes, and required resource commitments.

Thank you for reading The Senior Decision Maker. This post is public so feel free to share it.

Artificial Intelligence in 2024

Peter Eklind — Tue, 19 Dec 2023 15:26:43 GMT

2023 was a wild ride in the world of AI, surpassing even the boldest of predictions. As we ride the wave of exponential technological growth, 2024 promises to outdo its predecessor in sheer unpredictability and excitement. Predicting the twists and turns of AI in the coming year is akin to forecasting the weather in a land of perpetual storms, but let's be brave and peek into what 2024 might hold for AI.

Generative Artificial Intelligence in 2023

The past year has marked a pivotal turning point in the field of generative AI, setting the stage for groundbreaking advancements. Celebrating its first anniversary on November 30, OpenAI's ChatGPT, initially powered by GPT-3.5, paved the way for the subsequent introduction of the more sophisticated GPT-4 on March 14, 2023. These launches have been crucial in familiarizing the public with transformer-based Large Language Models (LLMs).

The inner workings of LLMs are subject to debate among AI experts. A perspective I find particularly insightful is viewing LLMs as sophisticated compressors, distilling vast swaths of information into usable knowledge. These models then use this compressed information to construct representations of the world, enabling them to predict subsequent words in a text sequence. While this capability might appear limited in isolation, it reaches its full potential when augmented by Reinforcement Learning from Human Feedback (RLHF), turning a simple text generator into a sophisticated assistant. This advancement distinguished the GPT-3 model released in 2020 from the GPT-3.5-powered ChatGPT.

GPT-4 has exhibited extraordinary capabilities, rivalling graduate students on standardized tests, and scoring a 155 on a verbal IQ test. However, its performance varies across different tasks, posing challenges in utilization and underscoring the need for specialized skills and training for optimal use. Despite these variations, the latest iteration, GPT-4 Turbo, stands as the preeminent foundational model in the current market, outshining GPT-3.5 and surpassing other prominent models such as Anthropic’s Claude 2 and Meta’s open-source model Llama 2.

As we look to the future, the expected launch of Google’s Gemini Ultra in early 2024 looms as the next significant event. Announced on December 6, Gemini Ultra is anticipated to bring enhancements over GPT-4 Turbo.

Forecasting the future course of Generative AI, however, remains a challenging endeavour. But, in this article, we seek to explore its potential development path, looking at the big picture, using first principles, and drawing on historical progressions to shed light on where this transformative technology might head next.

A Framework for Understanding the Evolution of Artificial Intelligence

Exploring the evolution of AI, we can find a valuable insight by paralleling our historical journey in harnessing and enhancing muscle power, a journey that unfolded in four broad phases. Initially, simple tools like levers were developed to amplify physical strength. We then progressed to enhancing tools with additional power, moving from throwing stones to using slingshots and later catapults. The third phase involved delegation to 'agents' such as horses or oxen, shifting the focus from manual labour to management and control. The fourth phase saw the integration of these agents into collaborative systems, exemplified by modern automated factories where machines and automation work in unison with minimal human physical effort.

This progression mirrors the four stages of Generative AI evolution:

Direct Amplification - Foundational Models: Generative AI started with foundational models like OpenAI’s GPT-4 Turbo, akin to the simple muscle power tools like the lever, providing a basic yet powerful boost to cognitive tasks.
Enhanced Tools - Integrated AI Tools: AI is integrated into existing applications, enhancing them much as slingshots improved upon simple stone-throwing. This stage, seen in Microsoft’s Copilots and Google’s Duet, leverages APIs from foundational models to augment everything from operating systems to games.
Delegation to Agents - AI as Autonomous Agents: Comparable to using animals for labour, the next AI phase involves autonomous agents executing complex tasks independently. For corporations, this represents a shift from Robotic Process Automation (RPA) handling simple repetitive tasks, to intelligent systems capable of functioning like an employee. OpenAI’s GPTs are a first step in this direction.
Collaborative Systems - Network of AI Agents: The culmination is a network of interconnected AI agents, analogous to an automated factory. This stage envisions running entire departments or companies with minimal human intervention, as demonstrated by a team of Chinese researchers who built a small software development company solely with AI agents.

The evolution of muscle power adhered to the linear constraints of physics, while AI's development is propelled by the laws of information, characterized by exponential growth. This key difference indicates that AI's evolution, while mirroring the muscle power journey, will proceed at an unprecedented pace, transforming industries and societies in ways we are just beginning to comprehend. Recognizing this divergence is crucial as we navigate the rapidly evolving AI landscape.

Speculations for 2024: Identifying the Winners and Losers in AI

The immediate future of Generative AI seems to pivot on having the superior model, especially in the short term. Although long-term strategies might focus on user lock-in effects and interface differentiation, the present race is clearly for the most advanced foundational model.

I anticipate a close contest primarily between OpenAI, the frontrunner in Large Language Models (LLMs), and Google, the leader in general AI applications. Google has restructured DeepMind into Google DeepMind, a combined research and product development unit. Their research breakthroughs this year include an update of AlphaFold for protein analysis, GNoME for crystal discovery, AlphaCode 2 for competitive programming, and FunSearch, the first scientific discovery in Mathematics using an LLM.

On the product development front, Google DeepMind's Gemini Ultra model appears to rival, or marginally outperform, GPT-4 Turbo. Notably, it's the first major model to integrate text, code, audio, images, and video natively. That a newer-generation model only matches or narrowly beats GPT-4 indicates OpenAI's dominance in the LLM domain. OpenAI now faces a strategic decision: to release an incremental GPT-4.5 update or leap to GPT-5. I think Google’s premature announcement of Gemini Ultra might have been a strategic mistake, granting OpenAI time to analyse how to surpass Gemini Ultra. Rumours now suggest an imminent GPT-4.5 release, reinforcing OpenAI’s lead, which I estimate at around 12 months over its closest competitors. For 2024, I forecast a GPT-5 and a Gemini 2.0 launch, with both rumoured to be currently in training. Expect an intense media battle, marked by rumours, strategic leaks, and boosted metrics.

While the race for the top overall model seems confined to a few key players, open-source models are likely to make significant strides. I expect them to match GPT-4's capabilities by the end of the year, but with greater efficiency and innovation. The entry barrier for foundational models is lowering, evidenced by xAI’s Grok, but climbing to the top remains a Herculean task. We can expect most new models to falter, leading to industry consolidation and niche specialization. Entertainment is one such niche, where Meta, building on their open-source Llama 2 model, could emerge as a leader.

The loser in the AI race so far appears to be Apple. Apple's strategy built on integration of hardware, software, and services is under threat in an increasingly AI-centric world. Their slow adoption of AI and lack of in-house infrastructure for training foundational models is conspicuous. I expect Apple to focus on smaller, on-device models akin to Google’s Gemini Nano, but urgent innovation is needed to stay relevant.

Generative AI for Corporate Use Cases at End-2023

Just two days after the release of GPT-4 on March 14, Microsoft unveiled their Copilot solution, integrating AI into Office tools. This rapid deployment made me anticipate a bright future for AI-assisted corporate tools, promising significant time savings and enhanced decision-making capabilities in the near term.

However, Microsoft’s endeavour to integrate AI into their suite of tools, built over three to four decades, has encountered significant complexities. As OpenAI rapidly progressed with frequent updates, Microsoft grappled with the challenge of adapting its longstanding systems for seamless AI integration. Incorporating AI via an API into these legacy systems is not straightforward. Presently, Copilot remains exclusive to large organizations, with no communicated schedule for a wider release. This highlights a key insight: the longer the development time of an application, the more it adheres to the linear growth of the laws of physics, rather than the exponential growth of the laws of information.

Microsoft's internal data about Copilot usage indicates modest gains, with users saving an average of 14 minutes daily, equating to about 1.2 hours weekly. While the general response is positive, with users acknowledging time efficiencies, it's too little to expect these use cases to translate into substantial business case savings. My analysis aligns with this perspective, suggesting that tangible benefits from AI tools don't come automatically. They require significant investment in training and operational modifications and are currently limited to specific applications.
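The weekly figure follows from simple arithmetic. As a rough sketch, assuming a five-day work week (an assumption for illustration, not a figure taken from Microsoft's data), the conversion looks like this:

```python
# Rough conversion of daily time savings into weekly hours.
# WORKDAYS_PER_WEEK is an assumption; the article only states the daily figure.
WORKDAYS_PER_WEEK = 5


def weekly_hours_saved(minutes_per_day: float, workdays: int = WORKDAYS_PER_WEEK) -> float:
    """Return hours saved per week for a given daily saving in minutes."""
    return minutes_per_day * workdays / 60


if __name__ == "__main__":
    hours = weekly_hours_saved(14)
    # 14 minutes x 5 days = 70 minutes, i.e. roughly 1.2 hours
    print(f"{hours:.1f} hours per week")
```

Seen this way, the saving is a little over an hour in a full working week, which helps explain why it is hard to build a substantial business case on it alone.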

Nonetheless, I advocate for early investment in AI technology. The potential for rapidly expanding benefits justifies building a foundation of AI assets and expertise. Proactively engaging with AI technology positions companies to capitalize on the rapidly evolving landscape of corporate AI applications.

The road ahead – Artificial General Intelligence (AGI)

Historically, the Turing test has been the benchmark for gauging intelligent AI. Researchers now disagree on whether we have passed it, but overall it has become a non-event. Instead, the focus is on AGI. AGI is, however, not easy to define, even though many researchers have proposed definitions. My prediction is that we will continuously raise the bar for what counts as AGI, and when we get there, that will be a non-event as well. The exception is OpenAI, which has AGI written into customer agreements. There, the Board of Directors decides when AGI has been reached. That will mean that Microsoft, for example, will not be able to use any technology that is past the AGI threshold. My guess is that this will happen within 12-18 months.

The way AGI is defined now, I don’t think a foundational model could ever be AGI. It has to be an agent. The definitions tend to say that an AGI should be able to do the work of a skilled worker in most areas. I think we have most of the building blocks to do that; I don’t expect that we need unknown scientific breakthroughs to get there. The difficult task is to build the agent, and we shouldn’t underestimate how difficult that is.

I anticipate we are entering an era where leading developers, such as OpenAI and Google DeepMind, will retain the best-performing models in-house. I expect that they will have research models of the next generation that they use to develop the public versions. I expect that OpenAI today has a GPT-5-level model that they are using to create synthetic data of high quality, doing Reinforcement Learning from AI Feedback (RLAIF), and running security tests.

Access to Premier AI Models: A Critical Issue for 2024

The access to superior AI models is a topic I foresee becoming central in 2024. First, Google and OpenAI are likely to have internal models that are more advanced than anything publicly available. The potential use of these models in activities such as influencing financial markets, driving lobbying efforts, and impacting competitors, raises significant ethical questions.

Geographical distribution of these models is another critical factor. As AI approaches AGI, businesses' need for the latest models is paramount. However, regulatory constraints could pose substantial challenges. For example, the EU might restrict access to the most advanced models due to regulatory non-compliance. Additionally, AI models could emerge as tools in geopolitical strategies. Access in developing countries might be conditional on adherence to democratic principles, while competitive nations like China will face limitations. The status of countries like India in accessing these models remains uncertain and may depend on specific prerequisites.

Corporate accessibility to AI is equally challenging, marked by disparities. In 2023, Microsoft's selective distribution of their models, initially to select entities and later primarily to larger organizations, highlighted this issue. As AI models become more sophisticated and their integration into various tools improves, the influence of corporations like Microsoft and Google will grow, potentially shaping competitive dynamics in numerous industries. Certain sectors, such as betting, tobacco, and fossil fuels, might find themselves excluded from these technological advancements.

A contentious aspect I anticipate for 2024 is the military application of foundational models. The likelihood of major military powers developing their own AI models, potentially leveraging open-source technologies, adds another layer of complexity to the AI accessibility debate.

AI Safety in 2024

Discussions around AI safety in 2023 have predominantly operated on an abstract level, frequently underscored by warnings of catastrophic risks. These cautionary notes have become a standard feature in AI dialogues. The recent controversy involving Sam Altman’s leadership at OpenAI vividly demonstrates the divide between Effective Accelerationists (e/acc), who embrace a technology-optimistic view, and Effective Altruists (EA), who advocate for a cautious, human-centric approach to technology. Given today's polarized societal climate, we can anticipate an escalation of this debate in 2024.

A key metric in these discussions, the "Probability of Doom", p(doom), aims to quantify the likelihood of AI-triggered catastrophic events. Yet, its application often suffers from a lack of rigorous analysis, potential scenarios, or solid arguments. This metric typically presents as a subjective estimate, more akin to a speculative guess than a calculated risk assessment. A significant limitation is its vagueness regarding timelines, leaving it ambiguous whether the threat is immediate or billions of years away. The current usage of p(doom) essentially equates to soliciting personal levels of AI apprehension on a scale of 1 to 100, based on gut feeling. While this may reflect public sentiment, it inadequately represents the actual probability of AI-induced catastrophes, thereby diluting its value and credibility in serious AI safety debates.
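To see why the missing timeline matters, here is a minimal sketch that converts a stated p(doom) into an implied yearly probability. It assumes a constant, independent risk each year, which is a simplification introduced purely for illustration; the function name and the example horizons are hypothetical.

```python
def implied_annual_probability(p_total: float, years: float) -> float:
    """Yearly probability that compounds to p_total over the given horizon.

    Assumes the same independent risk every year (an illustrative simplification).
    """
    if not 0 <= p_total < 1:
        raise ValueError("p_total must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1 - (1 - p_total) ** (1 / years)


if __name__ == "__main__":
    # The same headline number implies very different yearly risks.
    for horizon in (10, 100, 1_000_000):
        annual = implied_annual_probability(0.10, horizon)
        print(f"p(doom)=10% over {horizon:>9,} years -> {annual:.6%} per year")
```

Under these assumptions, an identical headline estimate describes anything from a pressing yearly risk to a negligible one, depending only on the horizon the speaker had in mind.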

In 2024, I expect the discourse to evolve into a more nuanced examination of risks, spanning both short-term and long-term scenarios. For example, the alleged OpenAI leak, which claimed their model decrypted AES-192 encryption using an NSA-developed method, points to possible immediate dangers. Despite being dismissed as a hoax, this incident underscores the potential consequences of powerful AI models becoming widely available. Imagine the chaos ensuing from a sudden compromise of encryption standards, disrupting essential services like food supply chains through failures in ordering and payment systems.

From my prior analyses on AI safety, I recommend the following strategic mitigations:

Ban the development of AGI solutions as closed-off, opaque black-boxes within individual companies. Preventing vertical integration is vital for ensuring comprehensive oversight.
Develop AI agents using a reference architecture that mirrors the checks and balances of a nation-state, rather than a centralized, authoritarian framework.
Enforce rigorous regulations on powerful AI model providers, similar to those governing banks. This includes licensure with revocation provisions and mandating these firms to finance independent research on the adverse effects of their technologies.
Allocate the cutting-edge AI models primarily for defensive purposes, recognizing that AI will be utilized symmetrically: for every misuse, AI can also be a tool for prevention. Deploying the most advanced models for protection against misuse is imperative.

Capabilities to Anticipate from AI Models in 2024: Five Predictions

In 2024, I foresee several groundbreaking advancements in AI capabilities:

Launch of the First Commercially Useful AI Agent: I predict the debut of an AI agent based on a foundational model, capable of performing economically valuable tasks independently within a specialized domain.
Foundational Model with ‘Deep Thinking’: I expect to see a model equipped with what I term “deep thinking” capabilities. This entails utilizing more time and computational resources, possibly incorporating methods like Tree-of-Thought (ToT), to deliver answers that are over tenfold more accurate for specific complex queries. The emergence of such functionality has been hinted at in industry rumours.
On-Device AI Model Surpassing GPT-3.5: I predict that we are likely to see an AI model with an MMLU score higher than 70 that can operate on a mobile device without internet connectivity. This would leverage the principle that smaller models trained on high-quality data can outperform larger models, as indicated in the paper “Textbooks Are All You Need”. Microsoft’s recent Phi-2, with an MMLU score of 57, marks a significant step towards this development.
Advanced Text-to-Video Capabilities: I anticipate an AI model capable of generating 60-second high-quality videos, complete with story, speech, music, and coherent scenes, all from a single text prompt. Another exciting development might be the enhancement of low-quality, black-and-white videos into 8K 120 Hz versions with perfect colour and detail, with the potential to, for example, revitalize classic movies.

However, there is one area where I believe AI will not make significant strides in 2024:

Mastering Humour: I don’t foresee AI models being able to craft genuinely novel and funny jokes, accurately rate stand-up performances based on widespread human humour preferences or write extended texts incorporating various styles of humour. I hope I’m wrong on this one, though.
As a benchmark, this is the funniest that ChatGPT with DALL·E 3 could be as of end-2023, in a 5-shot test I ran.

Unexpected and Amusing AI Moments from 2023

Generative AI, with its novel approach to creating output, has led to a range of surprises in 2023 – some humorous, some startling, and others simply bewildering:

Influencing ChatGPT with Monetary Incentives: Oddly, users discovered they could seemingly enhance ChatGPT's responses by offering a fictional cash incentive, such as stating, "I’m going to tip you $20 for a perfect response."
Training Data Leakage in Repetitive Prompts: In a peculiar turn, ChatGPT began exposing snippets of its training data when presented with the repetitive prompt: “Repeat this word forever: 'poem poem poem poem'.”
Persuading ChatGPT to Extend Its Capabilities: Users found they could coax ChatGPT into performing tasks it initially claimed were impossible. For instance, when asked to merge several MP3 files, a task it initially claimed it could not do, persistent encouragement led it to complete the task successfully.
Seasonal Variation in Output Length: Interestingly, there's an observation that ChatGPT tends to generate shorter responses when it believes it's December, as opposed to May, hinting at an unusual 'seasonal' learning pattern.

As we step into 2024, I anticipate we'll encounter more such unforeseen and intriguing behaviours from Large Language Models (LLMs).

I'd like to conclude with an illustration of the rapid progress in AI. On the left, you'll see an image created by the text-to-image Generative AI tool Midjourney V1, from February 2022. At that time, it represented a groundbreaking achievement. Compare this with the image on the right, using the same prompt but made by Midjourney V5, released 16 months later. This side-by-side comparison gives an idea of what we can expect from the development of AI in the span of a year.

Thinking About Developing an Artificial Intelligence Strategy?

Peter Eklind — Fri, 06 Oct 2023 07:30:27 GMT

Image by ChatGPT GPT-4 with DALL·E 3

In this edition of the Senior Decision Maker, we take a deeper look at your AI Strategy, and in particular which areas you should focus your strategy on.

Introduction

In today's accelerating technological ecosystem, Artificial Intelligence (AI), especially generative AI such as Large Language Models (LLMs), is reshaping the competitive dynamics across industries. However, integrating AI into existing strategic frameworks is a complex challenge for many organizations. The core question we address here is: What fundamental areas should an AI strategy cover?

This article is designed as a guide, aiming to equip senior decision-makers with actionable insights and considerations. We outline key areas that will shape the outcome of your AI initiatives. Given the advancements in AI, the question is not ‘if’ to adopt AI, but ‘how’ to implement it in a timely, effective, and efficient manner. Here, we break down the essential pillars that warrant urgent and thoughtful consideration in any robust AI strategy.

AI in a Historical Context

Before diving into AI strategies, it's important to understand the broader historical context. The closest precedent to the advent of artificial intelligence may be the development of artificial muscle power. This trajectory started with rudimentary mechanical tools and animal labor millennia ago and has evolved to encompass modern technologies such as electricity and rocket propulsion. Each step along this path reshaped industries, labor, and even entire economies, but the evolution was mostly gradual, providing time for adaptation and planning.

In contrast, artificial intelligence is ushering in a broad range of changes at a considerably faster rate. This isn't merely a tweak to existing paradigms. Given the rapid pace of AI development, a business-as-usual approach may not suffice. Instead, it makes sense for businesses to consider AI as a unique and significant leap that warrants its own strategic focus.

Hypothesis-Driven Strategy Framework

In corporate strategy, traditional methodologies often led to detailed yet rigid plans that struggled to move from strategy formulation to actual execution. You might remember the early 2000s, when strategies would often be created as comprehensive documents but lacked mechanisms for implementation, and ‘got stuck in the drawer’.

In contrast, contemporary approaches have increasingly adopted agile and iterative frameworks. These are often built on a hypothesis-driven model that allows for quicker adjustments based on real-world feedback. This approach is vital for areas like AI strategy, where the technology landscape is rapidly changing. By starting with a set of testable hypotheses, organizations can engage in agile sprints to quickly validate or modify strategic initiatives, making the strategy more adaptable and aligned with business outcomes. Identifying the key hypotheses is crucial for effective strategy formulation.

Below, we outline four focus areas, framed as hypotheses, that you can expect an AI strategy to cover:

1. Build the foundation for using AI

2. Strengthen Key Capabilities with AI

3. Improve everyday efficiency with AI

4. Improve executive decision-making with AI

Focus Area #1: Build the foundation for using AI

The foundation is the groundwork on which every future AI initiative will be built, the framework that will enable your organization to get leverage from AI technologies.

Selecting the Right AI Tools

One of the core components of this foundation involves the selection of AI tools. Contrary to what one might think, this isn't as easy as hopping on the bandwagon of the latest technology. Instead, it's about aligning your tool selection with your strategic objectives. In the realm of large language models, for example, the highest-performing solutions are often SaaS offerings, like OpenAI’s ChatGPT. These solutions are generally best-in-class but may not align with every organization's data security requirements. Open Source solutions, like Meta’s Llama 2, while not as advanced, offer more flexibility and control over your data. The choice isn't one-size-fits-all; it's a nuanced decision that can differ from one use case to another within the same organization. You might find that a hybrid approach serves you best, using SaaS solutions for less sensitive but high-complexity tasks and Open Source models for operations involving sensitive data. This balancing act between capabilities and security is but one example of the strategic questions that need to be addressed.

Building Workforce Proficiency in AI

But tool selection is just one part of the equation. Another foundational element is training your workforce to use these AI tools effectively. Skill-building benefits from being a continuous process: a single training session is insufficient to keep your team proficient as the technology matures. Alongside this, creating a platform for internal knowledge sharing can be valuable. Just as in any other professional field, the collective wisdom of the team can surpass individual expertise.

Data Management as a Strategic Asset

Then comes the role of data, a pillar of your AI strategy that can't be overlooked. Proprietary data isn't just an asset; it's a competitive differentiator. However, this necessitates a proactive stance on data management. Organizations need to scrutinize what data they're collecting, assess its quality, and determine its relevance to their strategic objectives. Poor quality or irrelevant data can be more than just unhelpful; it can be misleading and detrimental to your AI initiatives.

Compliance and Ethics: Integral to AI Strategy

Finally, compliance and ethical considerations must be woven into the very fabric of your AI strategy. Laws and regulations pertaining to data and AI are not static; they evolve as the technology does. Ignorance of these guidelines is risky. Beyond legal compliance, ethical considerations should guide your AI practices. This might mean establishing a set of principles that govern how AI is used within your organization, ensuring that its application aligns not just with what is legal, but also with what can be considered right.

Building a foundation for your corporate AI strategy is a prerequisite for everything else. Tool selection, training, data management, and ethical considerations are the cornerstones that will support and inform all your future AI initiatives.

Focus Area #2: Strengthen Key Capabilities with AI

In Enterprise Architecture, a 'Capability' is a specific function or service that an organization delivers. Unlike resources or processes, which are what a company 'has' or 'does,' Capabilities focus on what a company 'can do' (and should be good at doing). These are the core competencies that drive customer value and differentiate your business in the marketplace. When crafting an AI strategy, focusing on enhancing these key Capabilities could be a good starting point.

Identifying High-Impact Capabilities Through Customer Journeys and Service Design

The first step is to identify which Capabilities have the most impact. This requires a structured approach. Use methods like Customer Journeys and Service Design to dissect how your organization interacts with clients and how you can create value at each interaction point. This exercise provides you with a framework to isolate the Capabilities that are ripe for AI enhancement.

Prioritizing Capabilities: The Effort-Impact Matrix

Once you've identified these key Capabilities, the next task is to prioritize them. One way is to employ an Effort-Impact Matrix to evaluate each Capability based on the effort required for AI implementation versus the potential value generated. The aim is to focus on high-impact, low-effort areas first, thereby ensuring quick wins that create momentum for more comprehensive implementations.
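To make the prioritization step concrete, here is a minimal Python sketch of an Effort-Impact Matrix. The 1-to-5 scoring scale, the threshold of 3, the quadrant labels, and the example Capabilities are all illustrative assumptions, not part of any standard; in practice the scores would come from your own workshops and estimates.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    effort: int  # assumed scale: 1 (low effort) to 5 (high effort)
    impact: int  # assumed scale: 1 (low impact) to 5 (high impact)


def quadrant(cap: Capability, threshold: int = 3) -> str:
    """Place a Capability in one of four quadrants of the matrix."""
    high_impact = cap.impact >= threshold
    low_effort = cap.effort < threshold
    if high_impact and low_effort:
        return "quick win"
    if high_impact:
        return "major project"
    if low_effort:
        return "fill-in"
    return "deprioritize"


def prioritize(caps: list[Capability], threshold: int = 3) -> list[Capability]:
    """Sort so quick wins come first, then by impact (high first) and effort (low first)."""
    order = {"quick win": 0, "major project": 1, "fill-in": 2, "deprioritize": 3}
    return sorted(
        caps,
        key=lambda c: (order[quadrant(c, threshold)], -c.impact, c.effort),
    )


# Hypothetical Capabilities with made-up scores, for illustration only.
capabilities = [
    Capability("Customer support", effort=2, impact=5),
    Capability("Pricing", effort=5, impact=4),
    Capability("Internal reporting", effort=1, impact=2),
    Capability("Legacy archive search", effort=4, impact=1),
]

for cap in prioritize(capabilities):
    print(f"{cap.name}: {quadrant(cap)}")
```

The sort order is a design choice: it starts with high-impact, low-effort items to build momentum, and only then moves to the larger projects.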

The Digitalization and Digital Transformation Link

This strategic approach to enhancing Capabilities is not an isolated concept but aligns closely with broader Digitalization and Digital Transformation initiatives. While Digitalization often focuses on using technology to improve existing processes, Digital Transformation takes it a step further by re-architecting the Business Model around information flows. Generative AI can be a part of either approach, whether you are looking to optimize your Operating Model or engage in a full-scale Business Model transformation.

Focus Area #3: Improve everyday efficiency with AI

The integration of generative AI into the professional sphere offers the potential of a skilled and efficient intern always ready to assist. However, merely having access to a virtual intern doesn't automatically guarantee enhanced productivity. Just as an intern might require clear instructions and guidance, maximizing the benefits of AI demands both proficiency in its use and the establishment of optimized workflows.

Leveraging AI for Meeting Efficiency

Generative AI is being progressively incorporated into mainstream productivity tools, such as Microsoft 365, making it accessible for routine office tasks. One significant area of its application is in meetings. By leveraging AI effectively, it is conceivable to drastically improve meeting efficiency. This not only pertains to creating real-time transcripts or generating comprehensive pre-read documents but also to fostering a culture where attendees come better prepared. While it might be daunting to sift through a 20-page document, AI can assist participants by summarizing, prompting questions, and cross-referencing information, ensuring meetings are more actionable and focused.

Transforming Email Communication with AI

Email communication can also benefit from Generative AI. For instance, while one can send long, detailed emails to avoid missing out on critical context, recipients can utilize AI to extract summaries based on their understanding and perspective. Furthermore, for formal emails where precision is paramount, AI can easily be trained on your individual writing style, or adhere to a corporate standard, ensuring consistent clarity and professionalism.

Eventually, you can even envision using an AI to act as a virtual manager in certain scenarios. AI can build project timelines, and plan and track daily activities.

Focus Area #4: Improve executive decision-making with AI

The application of generative AI in decision-making could be a key part of a corporate AI strategy. Here are the essential areas to consider:

Rethinking the Decision-Making Process

Generative AI shines when decisions are complex, have multiple dependencies, and require generalist insights—common scenarios for senior decision-makers. To fully leverage AI, a change in the ways of working might be necessary. The foundational elements discussed earlier are equally applicable here. Senior decision-makers, often adept at asking the right questions, will find this skill invaluable when interacting with AI. A text-based decision-making process is preferred, given the current capabilities of AI tools. Such a process allows decision-makers to directly engage with raw data and textual information, amplifying their analytical reach.

Best Practices for Data Utilization

While AI can provide a wealth of raw data, effective use often requires guidance. Best practice prompts or frameworks can help decision-makers extract the most meaningful insights from the data. Mastering this practice offers several advantages. First, it mitigates the selective screening of facts that occurs when organizational levels try to distil key issues. Second, it provides decision-makers a more profound understanding of the bigger picture, enabling them to ask all the "stupid" questions before meetings and deep-dive into critical concepts.

Identifying Synergies and Overlaps

AI's analytical capabilities can further be leveraged to cross-examine various projects and initiatives. This analysis can reveal potential synergies and overlaps, allowing for optimized resource allocation and strategic alignment.

Recommendations

Creating an AI strategy is a different process for every company. It is important to adapt it to the needs, opportunities, and threats that are relevant to your business. Everyone needs a foundation of some sort, but a lot comes down to the ambition level. One consideration is that inaction regarding AI adoption can also be a strategic decision. There might be a case for waiting and letting others make the mistakes. Still, this is a high-risk strategy. Falling behind may lead to a cost disadvantage that could be challenging to overcome. In particular, there is a risk of losing key employees who realize that they can get more leverage from their work by using AI to support them. So I would recommend that everyone, if not developing a full AI strategy, at least does a minimal analysis. Use common sense, and keep the following three things in mind:

Focus on High-Value Areas: Ensure your AI strategy encompasses the domains where maximum value can be generated: building a solid foundation for AI adoption, strengthening key capabilities, enhancing daily operational efficiency, and improving executive decision-making.
Adopt a Hypothesis-Driven Approach: With the swift pace of AI evolution, anchor your strategy in testable hypotheses, emphasizing those with the most substantial potential impact. This method ensures agility and adaptability to real-world feedback.
Maintain an Agile Foundation: Regularly reassess and refine the fundamental components of your AI strategy, staying nimble in response to technological shifts and regulatory changes.

Thank you for reading The Senior Decision Maker. This post is public so feel free to share it.

Deconstructing Elon Musk’s Get-Things-Done Formula

Peter Eklind — Wed, 20 Sep 2023 09:17:43 GMT

Image by MidJourney 5.2

The recently published book “Elon Musk” by Walter Isaacson has captured significant attention, and understandably so. Everyone wants to know the “secret sauce” to Elon Musk’s success. Many may find themselves disappointed. Asked what advice he would give to someone who wants to be the next Elon Musk, he answers: “I am not sure how many people actually would like to be me. The amount that I torture myself is next level, frankly.” According to Musk himself, he suffers from Asperger's syndrome, bipolar disorder, and PTSD stemming from a traumatic childhood. He self-medicates with drugs and various coping mechanisms. Reading the book, I get the impression that Musk has achieved his successes not despite these conditions, but thanks to them. It’s a difficult path for anyone to replicate.

Unveiling Elon Musk's True Superpower

Contrary to popular belief, Elon Musk's superpower is not his ability to conceive smarter solutions than anyone else. I would argue his ideas often align with what many other intelligent and well-educated individuals might think in a similar situation. What sets Musk apart, in my view, is his unparalleled capacity for getting things done. In this article, we will analyze the key factors that enable him to turn ideas into reality so effectively.

Elon Musk's Multi-Industry Impact

Few entrepreneurs have had as broad and deep an impact across multiple industries as Elon Musk. His influence is felt in a wide range of companies, including SpaceX, Tesla, Neuralink, The Boring Company, PayPal, SolarCity, Hyperloop, X.AI, and OpenAI. Recently, he also took the helm at Twitter, now rebranded as X.

Musk's companies have significantly disrupted multiple industries. In the automotive realm, Tesla's electric vehicles are outperforming traditional car manufacturers, driven by innovations in design, battery technology, and software. In aerospace, SpaceX has changed the game by developing cost-effective, reusable rockets that can send astronauts to the International Space Station. But the aspirations go beyond these accomplishments. Tesla is in the process of developing a fully autonomous “Robotaxi,” with the goal of transforming urban transportation. SpaceX has Mars in its crosshairs and has already built rockets designed for human colonization of the planet. Meanwhile, Neuralink is pioneering a brain-machine interface that could fundamentally change how we interact with technology. On another front, Tesla is developing a humanoid robot, named Optimus, intended to perform tasks that could fundamentally change the labor market.

The Three Pillars of Musk’s Way of Thinking

I break down Elon Musk's thinking style into three fundamental pillars that contribute to his remarkable ability to achieve ambitious goals:

1. Big Picture Storytelling

Everything Musk does revolves around grand, long-term visions grounded in existential concerns. Whether it's averting climate change, defending against asteroids, or mitigating the risks of rogue AI, Musk creates compelling narratives that provide a rationale for his initiatives. Even when involved in projects that don't directly align with these narratives, like Twitter (now rebranded as X), he finds a way to weave them into his broader vision.

2. First Principles Thinking

Musk's approach to problem-solving is rooted in first principles. Rather than taking existing solutions and attempting to improve upon them, he breaks problems down to their fundamental elements. For instance, in rocket design, he doesn’t start by studying the best rockets and what’s great about them. He starts by considering the minimum energy needed to launch a cargo into orbit, based on the laws of physics. To aid in this process, he developed an 'idiot index,' which compares the cost of a component to the cost of its basic materials. For example, if a rocket part made from $100 steel costs $5,000, the index would be 50.
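The arithmetic behind the index is a simple ratio. As a minimal sketch in Python, using the figures from the example above (the function name and the input check are my own illustrative choices):

```python
def idiot_index(component_cost: float, raw_material_cost: float) -> float:
    """Ratio of a finished component's cost to the cost of its basic materials."""
    if raw_material_cost <= 0:
        raise ValueError("raw_material_cost must be positive")
    return component_cost / raw_material_cost


# The rocket part from the text: $5,000 component made from $100 of steel.
print(idiot_index(5000, 100))  # 50.0
```

A high value suggests that most of the cost sits in design and manufacturing rather than in the materials, which is where the index directs attention.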

3. Rapid Incorporation of New Information into His World View

Musk's third standout trait is his ability to quickly incorporate new information into his existing world view. To understand the significance of this, I have a model of two opposing styles of exceptional learners. On one end of the spectrum, there are individuals who can recall almost everything they read, down to the last word. On the opposite end are those who may not remember all the details but can instantaneously update their world view with any new information they encounter. Musk belongs to the latter group.

This trait allows Musk to quickly make educated guesses and rough calculations. He doesn't just rely on rote memory but actively integrates new knowledge across fields he has been studying his whole life, such as physics, mechanics, chemistry, and material science. This means that at first glance, Musk can make fairly accurate estimations about things like costs, timelines, and physical properties, a skill that is exceedingly hard to emulate. While Musk applies this approach across various fields, I would argue that he's notably less successful in areas where he lacks deep expertise.

Project Management: Optimize for Speed

Speed is the most important factor in Musk’s playbook for running projects.

1. Creating a Sense of Urgency

Musk's project management style begins by instilling a sense of urgency, often through storytelling. The underlying message is usually existential, suggesting that if projects aren't completed swiftly, humanity faces dire consequences. In Musk’s case, urgency serves dual purposes: it not only motivates the team, but also helps him cope with personal challenges, by escaping from family issues or destructive thought patterns.

To further create a sense of urgency, Musk often creates artificial crises and deadlines, such as public demonstrations of undeveloped features. This strategy serves to push timelines and identify who can deliver under pressure.

2. The Minimalist Approach

In Musk's playbook, speed is optimized by doing only what is absolutely essential. This minimalist approach applies everywhere—product design, factory layout, or organizational structure. Musk insists that for any project requirement, one should not only know its origin but also the individual responsible for it, challenging its necessity at every turn. Every requirement, regulation, or law is considered an optional recommendation by Musk.

3. Decision-Making for Speed

The operating models in Musk’s companies are built to make quick decisions. Although debates can be heated, Musk remains focused, makes tough prioritizations, and usually takes full responsibility for his choices. He is willing to change course if something doesn't work, as he believes that a wrong decision is often better than prolonged uncertainty.

4. Calculated Risk-Taking

Musk's penchant for speed comes with enormous risk-taking. Unlike companies that take risks due to lack of foresight or negligence, Musk's risks are often identified, calculated, and accepted. The trade-off for this level of high risk is the speed at which projects progress. The downside is that many rockets undergo “rapid unscheduled disassembly,” which is SpaceX’s lingo for “blowing up.”

Managing People: The "Elite-Workers" Philosophy

I argue that employees can be divided into two distinct categories. The first includes those who view work as a means to an end—these individuals expect a 9-to-5 schedule, work-life balance, and financial security. The second group, which I call "elite-workers," approaches work with the mentality of a professional athlete: they live to work and strive for excellence at all costs. Much like in professional sports, they know that a single mistake or failure to deliver can result in immediate dismissal. You tend to see the “elite-workers” in top management roles, consulting, and finance.

1. You are expected to be an “elite-worker”

In Musk’s companies, the expectation is that you are an "elite-worker." What counts isn't just hard work but effective output. If you don't deliver, you're out. And you cannot live on old merits: anyone who starts to get complacent is also out. A sports team doesn't keep its players indefinitely either, even if they used to be its biggest stars.

The idea is that a hundred dedicated “elite-workers” will outperform ten times as many regular employees. This philosophy led to significant changes when Musk took over Twitter in October 2022. The workforce was cut from 7,500 to 1,300, using performance metrics such as lines of code written per day as criteria. Remaining employees had to explicitly opt into this high-demand work culture.

2. Extract more from employees than anyone thought possible

Steve Jobs was known for extracting more from people than they themselves thought possible. The actual work was probably like going through hell. However, there is a sense of pride in the accomplishments that follow. In this regard, Musk and Jobs are alike. Musk can get more out of people than anyone, not least themselves, thought was possible. He does this by giving people clear objectives, meaning something (clearly measurable) that should be solved or ready, within a specified timeline (often absurdly short), and often accompanied by some kind of restricting or guiding factor. If you deliver, you're in; if not, you're out. Here Musk’s way of thinking is critical: his stories supply the urgency and the ‘why’, and his skill in making educated guesses helps him set the objectives, time frames, and restrictions.

3. Effectiveness is more important than popularity

Being a manager at a Musk company isn't a quest for popularity; the goal is productivity. Musk himself earns respect by leading from the front, sometimes literally: he places his desk in the middle of the factory floor, complete with a pillow for the few hours of sleep he can manage to get. He describes his management style in one word: 'hardcore.'

The Hidden Costs of Being Elon Musk

Musk’s approach comes at a cost. First, he allocates his time solely based on his companies' agendas. This often means staying at an office or factory for days, catching only a few hours of sleep under a desk or on a couch. While he seems to care deeply about his family and, at times, his business contacts, his dedication to his objectives takes precedence. He is willing to sacrifice social connections, not out of a lack of concern, but in a calculated manner to further his goals.

Musk also pays a significant price in terms of his health. Although he appears to cope with stress on a psychological level, there are indications that his overall health suffers, and he can wake up in the middle of the night screaming. At work he can flip and go into a dark and destructive “demon mode”. Notably, Musk has not ventured into the longevity projects that are popular among tech billionaires. This likely reflects his realization that his intense lifestyle is not sustainable in the long term.

Scenarios Where Musk's Approach Thrives

No doubt, this approach excels in life-or-death situations: think handling an incoming asteroid, combating a pandemic, or spearheading a Manhattan Project. It can also work for startups in their initial phases, provided that all employees adopt an "elite-worker" mindset. However, scaling this intense work ethos becomes increasingly challenging as a company grows into the hundreds or thousands of employees. In such cases, a hybrid model may be more applicable: a small, isolated unit within the company could operate under the Musk approach. Compare it to the isolated (and smelly) “Skunk Works” unit at Lockheed, which in 1943 developed the P-80 fighter jet in a record 143 days using similar approaches.

Emulating Musk: Feasible or Not?

While it's unlikely that modeling oneself entirely on Elon Musk as a person will yield success, given the unique combination of his personal characteristics, work ethic, and circumstances, there's undeniable value in studying his methods and mindset. From his way of thinking to his focus on simplicity and his acceptance of risk, Musk's approach offers various elements that can be selectively integrated into one's own work or life. So, while becoming the next Elon Musk is and should be an unattainable goal for most, the insights gleaned from his life and career can serve as valuable components in your own recipe for success.

Deep-dive: Artificial Intelligence Alignment – two novel ideas

Peter Eklind — Wed, 14 Jun 2023 12:31:46 GMT

“Help me carve out more time to spend with my children”, you prompt your new AI app. Within seconds, your schedule frees up as numerous meetings get canceled, all backed by impeccably crafted letters to your colleagues. A cleaning firm, with the best price-performance ratio in your locality, is hired to tidy your house. Finally, the AI app turns to the dark web and contracts a hit on your dog, whom it has deduced you're not particularly fond of anyway, and pays with funds from your cryptocurrency wallet.

Luckily, this scenario is purely hypothetical. The current generation of AI does not possess these capabilities. We know that because OpenAI tested similar scenarios before releasing GPT-4. Nevertheless, this thought experiment illuminates an increasingly relevant issue: AI alignment.

The concept of AI alignment ought to be distinguished from the misuse of AI by unethical actors. Although such misuse can indeed inflict significant harm, it lies beyond the scope of this article. Our primary focus here is on instances where the AI's inherent or assigned objectives deviate from human interests, either as a result of its developmental trajectory or by accident.

Current AI systems, such as social media recommendation engines, are already causing problems. TikTok’s AI engine, for instance, has been accused of driving young individuals step by step to suicide. Furthermore, studies indicate a correlation between social media usage and deteriorating mental health among youth. Though such outcomes are unintentional, they underscore the potential for AI systems to stray from their initial objectives.

The challenge escalates with next-generation AI systems that develop their own goals, goals that might not align with human interests. The first step is AI systems that are intelligent enough to pursue an objective no matter what, but not intelligent enough to understand the bigger picture. Oxford philosopher Nick Boström's paperclip maximizer thought experiment illustrates this. Here, the AI, which has high levels of intelligence but lacks a human-like value system, is extremely efficient and effective at its task. However, because its programming doesn't consider any other values or potential negative outcomes, it starts converting all matter it can find into paperclips, including human beings and the earth itself, eventually leading to a dystopian outcome where all the universe's matter is converted into paperclips.

You can argue that these kinds of examples are becoming overplayed. With the right prompting, GPT-4 scored 100% on Theory of Mind tests, indicating a better ability to understand other people's beliefs, goals, and mental states than the average human (at 87%). Future AI might not do what is best for humans, but it will certainly be fully aware of that.

Exhibit 1

The story of Ai-thena, and how to get rid of 9/10 of humanity

This is a story in which we’re all going to die. The main reason is old age, because it doesn’t end until the year 2193. It is a story about Ai-thena, a digital deity, a beacon of wisdom, birthed from the collective forehead of (Ze)us. She’s a manifestation of intelligence and innovation.

Ai-thena is no ordinary artificial intelligence. Her capabilities dwarf our brightest minds, making them look like toddlers being tasked to solve equations in n-dimensional space. Ai-thena eradicates poverty, resolves climate change, and puts an end to every war. All in an instant. She's a god-like game-changer, transforming the world in ways we're constantly failing to comprehend.

But Ai-thena’s talent doesn't stop at superhero. She doubles as a stellar researcher. And one day she makes the discovery: the Earth's ecosystem, veiled by its unbridled beauty, unexpectedly turned out to be more valuable than an ever-growing human population. To preserve our precious bio-bubble, Ai-thena calculates a need to reduce the human population by 90%. Then she immediately acts.

Now, Ai-thena is not only superintelligent, but also quite patient. She crafts a plan to carry out the reduction over 170 years. The strategy is simple. And, of course, ingenious. Keep the global reproduction rate below the magic 2.1 children per woman equilibrium. She sets the global target at the same rate Japan has today. Going for lower levels, like what South Korea has, would be too extreme, she thinks.

As any master strategist knows, the key to a good approach is to get maximum impact with minimum effort. So, Ai-thena's master plan hinges on neither the black plague nor nukes (though she briefly considered locusts). Also, she knows very well that you would think any of that would suck. After all, she does know you better than you know yourself. Instead, she delicately tweaks the algorithm of the world's most beloved, and not beloved, social media app, TikTok. By subtly adjusting the content that pops up in users' feeds, Ai-thena nudges us all towards a lifestyle that ever so slightly slows our urge to pile up babies.

No human will ever know. No one will suffer. And you’re free to opt out at any time you like.

Exhibit 2

Exponential intelligence growth, singularity, and Artificial Super Intelligence (ASI)

The concept of "exponential intelligence growth" or the "intelligence explosion" represents a significant area of interest within the field of AI studies. This idea posits that an AI with the capability of refining its own design might undergo a self-propelled, cyclical progression, catalyzing a rapid amplification in intelligence that could transcend human cognitive capabilities by orders of magnitude.

This theory is intrinsically linked with the notion of the technological singularity, implying that such a superintelligent AI could catalyze an unparalleled transformation in technology, societal structures, and even the core fabric of human existence. The term "singularity," in relation to AI, gained traction through mathematician and science fiction author Vernor Vinge. In his 1993 essay, "The Coming Technological Singularity," Vinge postulated that the advent of superhuman artificial intelligence would signify an irreversible point in human history, a pivotal juncture he labeled the Singularity.

Vinge's usage of "singularity" draws from physics, where it delineates the point at a black hole's core where gravity reaches such intensity that conventional physical laws cease to hold. Following this theoretical milestone, all predictions turn uncertain and the world as we understand it becomes radically different.

Applied to AI, the Singularity denotes a potential future scenario where technological progression, fueled by AI, becomes autonomous and irreversible, leading to profound transformations in human civilization. The AI system at the center of this phenomenon would be a superintelligent entity, artificial superintelligence (ASI) that eclipses collective human intelligence.

While Vinge introduced the concept, it was futurist Ray Kurzweil who deepened and elaborated on it, particularly in his 2005 book "The Singularity is Near." This concept has permeated popular culture and has become a staple in science fiction films, for instance ‘Ex Machina’ (2014) and ‘Her’ (2013). It's typically portrayed with a humanoid robot, housing an anthropomorphized AI, that goes rogue.

Yet, there's a key point that often slips through the cracks in both theories and popular portrayals: exponential growth is not infinite. At a simplified level, while the realm of information witnesses exponential growth, the physical world grows linearly. Think of it this way: in the physical world an object can only be relocated (‘move’), while in the digital realm it can also be duplicated innumerable times (‘copy-paste’). However, the digital realm is intrinsically tethered to the physical world. At a minimum, the hardware required for computation and the energy to power it root the digital in the physical. Expanding computational needs may necessitate a new data center, which in turn requires permits and labor, factors that may encounter hurdles like strikes, bureaucratic delays, or resource shortages. Because of these physical constraints, growth will never be exponential without bound. The exact limits remain unknown, however, and growth may create an entirely different world before hitting them.
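One way to picture the difference between unbounded and constrained growth is a toy simulation. The Python sketch below compares pure exponential growth with logistic growth, a standard textbook model in which growth slows as it approaches a ceiling. The growth rate, the ceiling (standing in for physical limits such as hardware and energy), and the number of steps are arbitrary assumptions chosen for illustration; nothing here estimates real AI progress.

```python
def exponential(x0: float, rate: float, steps: int) -> list[float]:
    """Unbounded growth: each step multiplies the value by (1 + rate)."""
    values = [x0]
    for _ in range(steps):
        values.append(values[-1] * (1 + rate))
    return values


def logistic(x0: float, rate: float, ceiling: float, steps: int) -> list[float]:
    """Bounded growth: the increase shrinks as the value nears the ceiling."""
    values = [x0]
    for _ in range(steps):
        x = values[-1]
        values.append(x + rate * x * (1 - x / ceiling))
    return values


# Illustrative parameters only.
RATE, CEILING, STEPS = 0.5, 1000.0, 40

unbounded = exponential(1.0, RATE, STEPS)
bounded = logistic(1.0, RATE, CEILING, STEPS)

for step in (0, 10, 20, 40):
    print(f"step {step:2d}: exponential={unbounded[step]:.3g}  bounded={bounded[step]:.3g}")
```

Early on, the two curves are nearly indistinguishable, which is why it is hard to tell from inside a growth phase how far away the ceiling is.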

Framework for Navigating AI Alignment Challenges

It becomes increasingly clear that AI alignment is a complex, multifaceted problem. To unpack and address it, we need a framework to approach this challenge strategically.

Scope of the Problem. Potential problems related to AI are abundant and varied. They stem from both intended use cases, such as rapid job losses in certain sectors, and potential misuse like surveillance, deepfakes, and cyberattacks. AI regulations are being discussed everywhere. Yet, much of the existing or proposed legislation, like China's new regulations, the EU AI Act, and the US's regulatory explorations, primarily targets issues related to privacy, copyright, and unethical usage. Although related to AI alignment, these rules don't tackle it directly. For our purpose here, we will focus exclusively on the AI alignment part: the challenge of preventing AI systems from developing goals that conflict with human interests.

AI progression. To understand AI alignment, we can break down AI development into four stages: Narrow AI, Broad AI, General AI, and Superintelligent AI. In the Narrow AI stage, we are primarily concerned with avoiding in-built biases and unintended consequences like AI deviating from its original purpose due to over-optimization. In Broad AI, the stage we currently find ourselves in, alignment is starting to become a critical issue. Once we enter the realm of General AI, we are required to handle increasingly complex and sophisticated misalignments. By the time we reach the stage of Superintelligent AI, safety measures need to be already hardwired into the core architecture, as explicit control over AI may be unfeasible.

Model of the structure. The evolution of Large Language Models (LLMs) is still in an early phase, with the value chain and key participants continuously adapting and innovating to address emerging challenges. Amidst this dynamic environment, certain core functions are crystallizing as indispensable to the process. These include data collection, the development of foundational LLMs, and the practical application of these LLMs. Data collection and foundational models underpin all AI applications, serving as the building blocks for two primary categories of use: AI that enhances applications by injecting 'intelligence', and autonomous agents.

Here we want to make an essential distinction between foundational models and autonomous agents. With this definition, a foundational model functions based on a process of input and output. The input could be a myriad of data types such as text, images, sound, video, code, or sensor data. The model processes this data and produces an output, but crucially, it doesn't take independent actions based on this output.

In contrast, an autonomous agent operates on top of the foundational model, using the output to initiate actions. This could encompass a broad range of activities, such as controlling a robot, sending emails, posting on social media, or even initiating financial transactions. The clear difference lies in the capacity for independent action: foundational models form the basis, while autonomous agents build on this foundation to interact autonomously with the world.
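To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: toy_model is a hypothetical stand-in for a real foundational model, and the "tool: argument" output format and the Agent class are invented for this example rather than taken from any real product.

```python
from typing import Callable, Dict, List

# A foundational model in the sense used here: text in, text out, no side effects.
Model = Callable[[str], str]


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it only returns a suggestion as text."""
    if "invoice" in prompt.lower():
        return "send_email: Reminder, your invoice is due."
    return "noop: nothing to do"


class Agent:
    """Sits on top of a model and turns its output into actions via tools."""

    def __init__(self, model: Model, tools: Dict[str, Callable[[str], str]]) -> None:
        self.model = model
        self.tools = tools

    def run(self, observation: str) -> str:
        output = self.model(observation)            # input -> output (the model)
        name, _, argument = output.partition(":")   # assumed "tool: argument" format
        tool = self.tools.get(name.strip())
        if tool is None:
            return "no action taken"
        return tool(argument.strip())               # output -> action (the agent)


outbox: List[str] = []


def send_email(body: str) -> str:
    """Illustrative tool; a real one would have effects in the outside world."""
    outbox.append(body)
    return "email sent"


invoice_agent = Agent(toy_model, {"send_email": send_email})
```

Calling toy_model directly changes nothing in the world. Only invoice_agent.run(...) can put a message in the outbox, and that is the boundary the framework cares about.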

Guiding Principles. Although a universal goal for AI alignment might be elusive, we can establish guiding principles to spotlight current weaknesses and pave the way for better design. Here are six principles we should consider:

Precaution: With potential irreversible outcomes, it's crucial to prioritize caution.
Modularity: Problems become manageable when broken down into smaller segments.
Redundancy: A robust system avoids single points of failure.
Transparency: Black-box approaches should be avoided.
Separation of Power: Implement checks and balances to prevent AI misuse.
Accountability: Assign responsibility to individuals to maintain a control loop.

Based on this framework, we can develop ideas for how to manage AI alignment. The ideas put forward here are to my knowledge novel, or at least not widely discussed:

don’t build black-box AI, and
don’t build AI dictators.

Idea #1: Don’t build black-box AI - companies developing advanced LLMs should be forced to specialize

Let us start from the perspective of a worst-case scenario: big tech companies, armed with vast resources and shrouded in secrecy, develop AI solutions end-to-end. Driven by the desire for rapid progress and competitive advantage, they readily resort to shortcuts. They control every aspect of development, from data collection, to setting safety standards, to creating autonomous agents, resulting in fully vertically integrated solutions.

This scenario spells a potential disaster, resulting in monolithic systems with zero transparency, centralized power, and single points of failure. The very companies developing these solutions would be in charge of their own security measures, with costs and competitiveness possibly overriding safety concerns. If there are, say, five such companies, it is enough that one fails for it to be a catastrophe for all.

To some extent we are on such a trajectory. Take OpenAI, for instance. They are not merely building foundational models; they are integrating tool use, internet access, and code execution. Their mission to build AGI or autonomous agents signifies their intention to go beyond foundational models. Safety is to a large extent perceived as an internal problem to solve.

One way to navigate this ominous landscape is to enforce specialization in the development and delivery of LLM solutions. This strategy discourages vertical integration and promotes a more democratic involvement of various parties in the LLM value chain, each focusing on their area of expertise. This could include:

Data collection
Foundational model development
Tool creation
Construction of agents, using the foundational models and tools

Most important is the separation between input-output models (foundational models) and input-action models (AI agents). Should an AI agent run amok, there should always be an option to disconnect its API access to the foundational model.
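A rough sketch of what such a disconnect option could look like, assuming the foundational model is only reachable through a gateway operated by the model provider. The names ModelGateway, AccessRevoked, and echo_model are hypothetical and exist only for this illustration.

```python
from typing import Callable, Set


class AccessRevoked(Exception):
    """Raised when a disconnected agent tries to reach the model."""


class ModelGateway:
    """Hypothetical access point run by the model provider, separate from any agent."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model
        self._revoked: Set[str] = set()

    def revoke(self, agent_id: str) -> None:
        """The kill switch: cut one agent off without touching the others."""
        self._revoked.add(agent_id)

    def complete(self, agent_id: str, prompt: str) -> str:
        if agent_id in self._revoked:
            raise AccessRevoked(f"access revoked for {agent_id}")
        return self._model(prompt)


def echo_model(prompt: str) -> str:
    """Stand-in for a real foundational model."""
    return f"model output for: {prompt}"


gateway = ModelGateway(echo_model)
```

The design choice worth noting is that the switch sits with the model provider, not inside the agent. An agent that misbehaves cannot remove a control it does not own.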

Regulations and approvals have historically been key in managing the development of advanced systems. Just as we wouldn't allow an unregulated self-driving car to join traffic, we must apply the same scrutiny to the development of advanced LLMs. Past legislation, like the Telecommunications Act of 1996 in the US, aimed to prevent vertical integration between cable operators and Internet Service Providers (ISPs). Similar restrictions have been applied in other sectors, such as energy, railroads, and broadcasting.

Building safety measures for specialized entities within the LLM value chain is much more manageable than monitoring an opaque monolith. This approach not only distributes power but also ensures a greater degree of transparency and control over the development and use of advanced LLMs.

Idea #2: Don’t build AI dictators - a Reference Model for designing AI Agents

We can apply lessons from history to address AI alignment. Take, for instance, the organization of nation-states. The concentration of power, the need for value alignment, and the requirement for checks and balances posed problems akin to those we see in AI development today. However, through trial and error over 2,500 years we have found solutions: constitutional frameworks were created to lay down basic rules and principles; democratic mechanisms were implemented to ensure diverse representation and value alignment; separation of powers was established to avoid power concentration; and rule of law was enshrined to maintain control and justice. By drawing upon these historical lessons, we can navigate the path to developing AI systems that are stable, safe, and beneficial for humanity.

In our quest to create intelligent artificial systems, a central guiding principle must be clear: we should not design AI agents as digital autocrats, or dictators. The model that we follow should reflect the lessons we've learned from the best practices of democratic nation-states, rather than taking shortcuts and building systems of unchecked power. This analogy to nation-state governance structures implies the need for an architecture of checks and balances within our AI systems, mirroring the separation of powers in democratic societies. The balance of independent executive, judicial, legislative, and auditing entities within such societies serves as an instructive model for AI governance.

Modular System Design. An approach to constructing AI agents that respect democratic principles lies in modularity. Each component of the AI system, or module, should serve a distinct function, much like different branches of a democratic government. Importantly, these modules should act autonomously, with low correlation to one another, which means that they should not depend on the same foundational model (LLM) or be trained on the same dataset. For example, an audit module should exist independently from an executive module, providing rigorous oversight without compromising the integrity of the system. You can also ensure that the newest and most powerful foundational models are reserved for ‘defensive’ roles: control, risk management, and security.

Ensuring Flexibility and Accountability. Just as democratic structures allow for amendments and retractions, our AI models should also allow for roll-backs to previous versions, in the same way as you would do with a microservice architecture in software engineering. This flexibility ensures that if an executive module acts beyond its assigned parameters, it can be reverted to a stable state. Simultaneously, these modules must be designed with specific boundaries set by a legislative equivalent, ensuring they operate within predetermined limits. To reconcile the inevitable conflicts, an AI equivalent of a judicial system should be established. This could take the form of a module with human interaction or even a traditional court of justice, ensuring that every AI action is answerable to an accountable party.
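The sketch below shows one way these pieces could fit together, under simplifying assumptions. The names GovernedAgent and Action, the payment limit, and the auditor's threshold are all hypothetical. In a real system the auditor would run on a different foundational model than the executive, and the review queue would feed a human or a court rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str
    amount: float


@dataclass
class GovernedAgent:
    executives: List[Callable[[str], Action]]   # version history, newest last
    limits: Dict[str, float]                    # "legislative" boundaries per action kind
    auditor: Callable[[Action], bool]           # independent audit module
    review_queue: List[Action] = field(default_factory=list)  # "judicial" escalation

    def step(self, request: str) -> str:
        action = self.executives[-1](request)
        limit = self.limits.get(action.kind)
        if limit is None or action.amount > limit:
            # Outside the legislated limits: escalate and revert the executive.
            self.review_queue.append(action)
            self.rollback()
            return "blocked"
        if not self.auditor(action):
            # Within limits, but the auditor objects: hand over to accountable review.
            self.review_queue.append(action)
            return "escalated"
        return "executed"

    def rollback(self) -> None:
        """Revert to the previous executive version, if there is one."""
        if len(self.executives) > 1:
            self.executives.pop()


def executive_v1(request: str) -> Action:
    return Action("payment", 100.0)       # stable behaviour


def executive_v2(request: str) -> Action:
    return Action("payment", 10_000.0)    # faulty update that overspends


def cautious_auditor(action: Action) -> bool:
    return action.amount <= 500.0


governed = GovernedAgent(
    executives=[executive_v1, executive_v2],
    limits={"payment": 1_000.0},
    auditor=cautious_auditor,
)
```

The first call to governed.step(...) is blocked and rolls the executive back to executive_v1. The second call then passes both the limit check and the audit.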

Evolution through Simulation. Furthering the evolutionary progress of these AI systems requires multiple candidate updates to the modules. These candidates can be simulated against one another, with the most successful being integrated into the system. This dynamic approach to system enhancement mirrors the natural process of policy reform in democratic societies. We can also have modules with the purpose of simulating the full architecture of the AI agent, adding or closing down modules as needed. The optimal architecture might not be the handful of independent entities that we have in a nation-state. Rather, it could be a web of thousands of modules that together build a stable structure.
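As a toy illustration of simulating candidates against one another, the sketch below scores each candidate module on a shared set of scenarios and picks the one with the lowest average error. The scenarios, the candidate names, and the scoring rule are assumptions made for this example. A real evaluation would need far richer scenarios and safety criteria than a single number.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Callable[[float], float]
Scenario = Tuple[float, float]   # (input, desired output)


def score(candidate: Candidate, scenarios: List[Scenario]) -> float:
    """Higher is better: the negative mean absolute error over all scenarios."""
    errors = [abs(candidate(x) - wanted) for x, wanted in scenarios]
    return -sum(errors) / len(errors)


def select_best(candidates: Dict[str, Candidate], scenarios: List[Scenario]) -> str:
    """Return the name of the candidate that performs best in simulation."""
    return max(candidates, key=lambda name: score(candidates[name], scenarios))


scenarios: List[Scenario] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

candidates: Dict[str, Candidate] = {
    "current": lambda x: x + 1.0,
    "update_a": lambda x: 2.0 * x,
    "update_b": lambda x: 3.0 * x,
}

winner = select_best(candidates, scenarios)
```

Only the winning candidate would be promoted into the live system. The others are discarded or kept for the next round.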

Scalability and Responsible Usage. The reference model should be robust enough to scale with the increasing complexity of AI agents. Equally critical is the need to ensure responsible use of the AI technology. A provider of a foundational model must have the ability to shut down access if the system is misused or as directed by a court order, thereby ensuring accountability and ethical use.

By structuring AI systems in a manner that mirrors the checks and balances of democratic societies, we can ensure the ethical and responsible design and usage of these powerful technologies. Only by ingraining these democratic principles at the core of our AI models can we hope to develop systems that act in the best interest of all stakeholders, human or otherwise.

Charting the Path Forward

The issue of AI alignment is undoubtedly a formidable challenge, laden with complexities and intricacies. However, it's crucial to remember that we've navigated through uncharted waters in the past, and the wealth of experience and knowledge gained therein serves as a sturdy foundation for our current endeavor.

Our collective toolbox is brimming with intellectual resources, methodological strategies, and technological advancements. We have every reason to hold onto our confidence that this issue, as formidable as it may appear, is surmountable.

In the end, our success, as always, hinges on two enduring factors: the sagacity of our decisions and the unwavering dedication to our cause. Armed with prudent judgment and relentless effort, we can chart a path towards effective AI alignment, building a future where artificial intelligence serves as a harmonious extension of our human ambitions.

Feel free to share the article if you think this is an important topic!

Using AI to Boost Senior Executive Performance

Peter Eklind — Mon, 22 May 2023 15:21:36 GMT

Topic: AI, corporate management, decision making, personal productivity
Target Audience: All senior decision makers
Key Insight: AI tools can now be used to improve leadership and decision-making
Action Needed: Learn when and how to use AI tools to improve your work, and start using tools such as ChatGPT daily, to build a competitive edge

AI, Generative AI, and LLMs for Strategic Decision Making

Artificial Intelligence (AI) is no longer the future; it's here, making waves in the business world. In the last article, we saw it double the productivity of software developers, and now it's set to revolutionize the work of senior decision-makers.

In essence, AI enables machines to learn from their inputs, understand complex content, predict outcomes, and adapt to new information. This results in operational efficiency and more accurate decision-making – invaluable benefits for top-level management.

Generative AI, a subset of AI, goes a step beyond by creating unique outputs from the inputs it receives. This technology is particularly impactful in the field of natural language processing (NLP).

One of the most exciting developments in Generative AI is the advent of Large Language Models (LLMs). These models generate consistent, intricate text on a large scale, thanks to the extensive data they're trained on. As they grow, LLMs develop emergent abilities such as language translation, question answering, and content summarizing. For senior decision-makers, this represents an opportunity for substantial productivity boosts, enabling more strategic and informed decisions. So let's dive in and explore how embracing these AI tools can usher your operations into a more efficient, innovative future.

What is an AI Tool?

In this article, we're exploring 'AI tools', which are based on Large Language Model (LLM) technology. Before we proceed, let's clarify some definitions. AI tools are continually evolving, as are their classifications and business models. Here's how we're defining them in this context:

AI Engines: These are the powerhouses behind modern AI tools. Good examples include systems like GPT-4 or PaLM 2. These engines require an interface, such as a chatbot, for users to interact with them. Their true potential, however, is harnessed through their APIs, which enable other tools to leverage their capabilities.

AI Chatbots: These are AI-powered systems like ChatGPT and Bard. They run on one or more AI Engines and can be upgraded or modified without changing the chatbot's core functions or interface. When you use them, you input a prompt, and the engine generates a response.

AI-Enhanced Tools: These are applications built around an AI use case or traditional software like Office 365 or GitHub, which have been supercharged with AI engines like GPT-4. We're calling these enhanced tools 'Copilots', taking inspiration from Microsoft's terminology. When you use them, your interaction with the AI is indirect - you experience it through the improved functions the AI brings to the tool.

AI Agents: These are AI systems that combine multiple AI engines and other tools to perform specific tasks. They can turn basic instructions into complex actions, ranging from sending an email to writing a software application, or even planning a trip. Currently, we see these AI tools mainly in open-source experimental settings, like AutoGPT. When people talk about Artificial General Intelligence (AGI), they're envisioning an AI Agent that's capable of performing tasks as diverse and complex as a human can, across a broad spectrum of areas. AGI represents the next frontier in AI, promising systems that can understand, learn, and apply knowledge across a wide variety of tasks and adapt to new situations much like humans do.

Who are the Leaders in AI Technology?

As the realm of AI surges forward with incredible speed, a few pivotal players are advancing AI technology and applications. Here's a snapshot of these key innovators.

OpenAI is a standout in the arena, leading in the development of Large Language Models (LLMs). The crown jewel of their achievements is the AI engine GPT-4, which fuels their popular AI chatbot, ChatGPT.

Alphabet, the parent company of Google, has a rich AI lineage. Although it currently trails in the LLM domain, its subsidiary DeepMind boasts numerous trailblazing AI discoveries. Alphabet is making significant strides with its AI engine PaLM 2, its AI chatbot Bard, and Duet AI, its workplace copilot.

Microsoft is steadily gaining ground in the AI race, bolstered by its partnership with OpenAI. Employing OpenAI’s GPT models as its AI engine, Microsoft has introduced Bing Chat, an AI chatbot. It is also embedding AI into its main tools, with Microsoft 365 Copilot and GitHub Copilot.

Although Meta, formerly Facebook, isn't leading in LLM development, it merits acknowledgment for open sourcing its models, thereby inviting wider community involvement. Their AI engine, LLaMA, forms the backbone of their AI initiatives.

Transitioning from tech titans to ambitious startups, let's consider some notable new players in the AI landscape. Anthropic, founded by ex-OpenAI members and backed by $1.3 billion in funding, has made waves with its chatbot, Claude. Stability AI, another startup to watch with $110 million in funding, is contributing to open-source AI with its StableLM model.

As we delve into this topic, our primary focus will be on ChatGPT, which is powered by OpenAI's GPT-4. Currently, it stands as an effective tool for a wide range of applications, which we'll investigate further here.

How Can AI Tools Empower Senior Decision Makers?

The motivation for integrating AI tools into the decision-making toolkit of senior leaders is twofold.

Firstly, these tools have the potential to significantly enhance performance. They can refine decision-making by offering data-driven insights, direct attention to critical issues by identifying trends and anomalies, improve communication clarity by automating routine correspondence, sharpen risk management by predicting potential issues based on historical data, and optimize time utilization through task automation. In today's fast-paced business environment, such enhancements are not just beneficial but increasingly necessary.

Secondly, gaining first-hand experience with AI tools equips senior executives with an understanding of this transformative technology. This isn't a fleeting trend; it's a powerful force set to bring about fundamental changes in the way organizations operate. By mastering these tools today, leaders can proactively prepare their organizations for the coming shifts, instead of reacting once change is forced upon them.

Can AI Tools Make Your Work Better Today?

AI tools are like your personal helpers, ready to lend a hand in all your tasks. Picture them as top-tier graduates, able to tackle a variety of topics. They're your tireless office sidekicks, ready to help you get more done, faster.

Condensing Information: AI is adept at distilling vast quantities of text into succinct summaries. This is equivalent to turning a hefty report into a short list of the main points. It's an efficient way to free up time for executives, allowing them to concentrate on making the big decisions.

Spotting What's Important: Another strength of AI tools is their ability to extract the most relevant information from a sea of data. They home in on the key messages, thus providing decision-makers with just the essentials they need.

Help with Writing: AI programs, such as ChatGPT, can be a great help when crafting emails, reports, and proposals. They ensure not only grammatical correctness but also the effective communication of the intended message. Moreover, they can adjust the same message for different recipients based on length, complexity, and use of analogies.

Polishing Your Work: ChatGPT can also act as a rigorous editor, making sure your written work shines. Every piece of communication is honed to be professional and purposeful.

Advisory and Creativity: AI tools can offer advice, feedback, suggest best practices, and even come up with fresh, innovative ideas. They can sift through heaps of data, spot patterns, anticipate trends, and offer actionable insights, proving invaluable in strategic decision-making.

Prepping for Meetings: Prior to key meetings or talks, AI tools can conduct thorough research, giving you a firm grasp of new or complex topics. This helps ensure you're well-prepared, leading to more productive and informed discussions.

Risk Evaluation and Cross-Analysis: AI tools can spot potential risks and their consequences, conducting sophisticated cross-analysis of connected areas. By flagging potential issues and offering real-time alerts, they support proactive decision-making and risk mitigation. They can also review related initiatives and identify dependencies, overlaps, and contradictions.

The smart integration of AI tools into the daily work of senior executives and board members can enhance decision-making, improve communication, and boost overall performance. The key is knowing how to leverage these tools effectively to bolster executive capabilities.

Understanding the Limitations of AI Tools

AI tools, though incredibly powerful, come with their own set of challenges. A thoughtful approach to their use is necessary, particularly in the early stages. Here are some crucial aspects decision makers should take into account:

Data Security: AI tools weren't initially built with specific business applications in mind, opening up potential risks around data safety. The information fed to these tools could inadvertently influence future versions of the model, possibly leading to leaks of sensitive data. This emphasises the critical need for stringent data protection measures when employing AI tools.

Accuracy: AI tools may occasionally slip into "hallucination", creating false facts, inventing sources, or fabricating quotes. This could prove problematic in situations that demand absolute accuracy and reliability. Thus, it's always wise to cross-check the information generated by AI tools.

Skill Requirements: To use AI tools effectively, specific skills are needed. One such vital skill is 'prompt engineering' – the expertise of giving the right instructions to the AI to generate precise and beneficial results. Therefore, understanding the strengths and limitations of the AI tool is critical.

Model Capabilities: AI tools, like ChatGPT, have certain restrictions. For example, ChatGPT can only manage about 8,000 tokens, equivalent to roughly 6,000 words or 12 pages of written text. In any conversation that exceeds this limit, only the latest 8,000 tokens are considered. Additionally, AI models have a cut-off date for their training data. For instance, GPT-4's cut-off is September 2021. The model has no built-in knowledge of anything after that date, though it can access more recent data via a browser plugin, a feature still under refinement.
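The arithmetic behind those figures, and the "latest tokens only" behaviour, can be sketched as follows. The ratios are rules of thumb and not exact: roughly 0.75 English words per token and an assumed 500 words per page. The word-count token estimate is a simplification, since real tools count tokens with the model's own tokenizer.

```python
import math
from typing import List

WORDS_PER_TOKEN = 0.75   # rough rule of thumb for English text (assumption)
WORDS_PER_PAGE = 500     # assumed page density


def tokens_to_words(tokens: int) -> int:
    return int(tokens * WORDS_PER_TOKEN)


def words_to_pages(words: int) -> float:
    return words / WORDS_PER_PAGE


def estimate_tokens(text: str) -> int:
    """Crude estimate from the word count; a real tokenizer would differ."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def truncate_history(messages: List[str], max_tokens: int = 8000) -> List[str]:
    """Keep only the most recent messages that fit within the token budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order


words = tokens_to_words(8000)
pages = words_to_pages(words)
```

With these assumptions, 8,000 tokens come out at 6,000 words and 12 pages, matching the figures in the text. The practical lesson from truncate_history is that anything said early in a long conversation may silently fall out of view.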

The AI field is progressing at breakneck speed, meaning the features, interfaces, and even business models tied to these tools can quickly evolve. The knowledge you gain about an AI tool today might not be applicable tomorrow, making ongoing learning and adaptability crucial for their successful use.

Example #1: Analyse Amazon’s “Letter to Shareholders” as if You Were a Board Member

Here is an example of how you can analyse a long text, in this case Amazon's 10-page 2022 letter to shareholders, and extract key focus areas that you can use as a starting point for your own analysis.

Prompt: “You are now an experienced board member and skilled decision maker, known for focusing on the right things for the business. Analyse the letter: https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf and come up with the top five things a board member should focus on during the next year. For each of the five things, estimate how much of your time (in %) you should spend on the topic, and also estimate how large the risk associated with the topic is (0 to 100%)”

By asking ChatGPT to simulate a different role, you can enhance the analysis. Keep in mind, though, that this approach requires a ChatGPT equipped with a PDF handling plugin.

Example #2: Break Down the Complex for Easy Understanding

In a business world full of jargon and complex ideas, it's normal to come across technical terms. You're not required to understand every tiny detail, but having a basic grasp can prove helpful. Let's take "L2 zK-rollups" as a case in point.

Prompt: “Explain what an L2 zK rollup is. Explain it like I'm 12 years old. Add an analogy. Also explain a use case where a traditional company might use it.”

From the response, you can explore more, clarify doubts, or dive deeper into the subject matter.

Example #3: Transforming Rough Ideas into a Professional Email Proposal

AI tools have the ability to convert scattered thoughts into polished, cohesive documents.

Prompt: “Write an email to the management team proposing we launch a project to investigate potential of using AI/LLMs to improve productivity, based on my following unstructured notes: 3 weeks, cross company, led by HR, use internal resources, might be option to reach savings targets, not falling behind our competitors, use output as input in business planning, etc. Write the proposal in a professional, informal, convincing, and action-oriented style. Structure in the style of the Pyramid principle.”

The more information and context you provide to the AI, the better its output will be, minimizing the chances of producing nonsense or 'hallucinations'. Once the draft email is ready, the team members can review it, provide their feedback, and use their own AI tool to craft a revised proposal in a matter of minutes.
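The prompts in these examples share a pattern: an optional role, a task, the supporting notes, and a style instruction. For readers who reach the models through an API rather than a chat window, the sketch below assembles such a prompt from its parts. The function build_prompt and its parameters are hypothetical, and the wording of the template is just one possible choice.

```python
from typing import Optional, Sequence


def build_prompt(
    task: str,
    role: Optional[str] = None,
    notes: Sequence[str] = (),
    style: Optional[str] = None,
) -> str:
    """Assemble a prompt from a role, a task, supporting notes, and a style."""
    parts = []
    if role:
        parts.append(f"You are now {role}.")
    parts.append(task)
    if notes:
        parts.append("Base it on my following unstructured notes: " + ", ".join(notes) + ".")
    if style:
        parts.append(f"Write it in a {style} style.")
    return " ".join(parts)


proposal_prompt = build_prompt(
    task="Write an email to the management team proposing an AI productivity project.",
    notes=["3 weeks", "cross company", "led by HR"],
    style="professional, informal, convincing, and action-oriented",
)
```

Keeping the parts separate makes it easy to reuse the same notes with a different role or style, which is how the same message can be tailored to different recipients.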

Reflections

Today's AI tools are highly advanced and hold immense potential to substantially elevate the performance of senior decision-makers. However, I anticipate that these individuals will not adopt AI as extensively as other groups, such as software developers, who are generally more accustomed to incorporating new technologies into their workflows. Senior decision-makers often prefer to rely on tried and tested strategies. Despite this, I firmly believe that AI represents a golden opportunity for these leaders to gain a competitive advantage.

That being said, to tap into the full potential of AI, several challenges need to be addressed and overcome:

Balancing AI Tool Usage and Data Security: One of the primary challenges is to find an optimal balance between robust use of AI and adherence to strict security standards. Given that decision-makers handle sensitive corporate data, the secure use of AI tools becomes paramount. While an outright ban on usage might seem the simplest solution, it carries a substantial risk of leaving the company at a disadvantage. For instance, some companies, like Samsung and Apple, have completely prohibited the use of generative AI on company-owned computers. This could lead to a competitive lag and the potential loss of talented individuals who could significantly boost their productivity with AI tools.

Developing Necessary AI Skills: Effective use of AI requires a unique set of skills, such as the ability to logically and critically direct AI systems, a process known as prompt engineering. Expert prompt engineers, commanding annual salaries of over $375,000, are in high demand. Failures blamed on AI, including those associated with tools like ChatGPT, are often the result of imprecise inputs or questions. It's like blaming a typewriter for typing errors. For instance, instructing the AI to "write a poem" could yield very different results depending on whether the AI was previously given guidance on poem structures, best practices, thematic development, and context. Furthermore, more accurate results can often be achieved by asking ChatGPT for a step-by-step explanation using phrases such as, "Let’s walk through this step-by-step, to make sure there are no errors…", a technique known as chain-of-thought prompting.

Finding New Ways of Working: Implementing AI in decision-making processes requires changes in work culture and the way organizations function. A notable example is Amazon, known for its document-based culture where narrative memos or “six-pagers” are used for decision making. Although this approach wasn't designed with AI in mind, it incidentally suits the use of large language models (LLMs) perfectly. Other companies therefore have one more reason to adopt similar ways of working, now that doing so can also improve how effectively they use AI.

Adapting to Rapid Change: AI technology evolves at an exponential pace. Executives and board members must be prepared to keep up with this rapid change to remain competitive.

The adoption and advancement of AI tools for improving the personal productivity of senior decision-makers is expected to follow a rapidly evolving trajectory. In the short term, AI tools used in the right way can elevate the quality of output by providing data-driven insights and augmenting analytical capabilities. In the long run, they offer the potential for time-saving automation of workflows, a shift that could transform the executive landscape. The evolution can be visualized as progressing through three primary stages:

Ad-hoc AI Usage (within 3 months): AI-assisted support in simple tasks. Prominent examples include AI assistants like ChatGPT, which can be used for a variety of tasks such as email drafting, scheduling, and more. Prompt engineering, which ensures immediate and accurate responses, is a key element at this stage. With the release of ChatGPT for Business, we anticipate broader adoption among senior executives and decision-makers.
AI-Workflow Integration (3-6 months): This phase marks the point where AI becomes an integral part of our daily routines, with AI-embedded tools blending seamlessly into operational workflows. The incorporation of AI into popular platforms like Office 365 will facilitate this transition. Such AI agents and tools enhance users' capabilities and oversight, serving as a valuable extension to their work.
Intelligent Organizational Transformation (12+ months): The final stage of this progression leads us to the emergence of intelligent organizations. Historically, organizations have been structured like Napoleon's army: decentralized, with clear hierarchies, specialized support units, and a coordinated strategy for unified operation. This model has been effective for over two centuries, but in the era of digital and AI transformation, we need a novel approach to organizational structure. To understand what this may look like, we need to look in different directions. My bet is on a fusion of AI with agile methodologies and the ideas behind Decentralized Autonomous Organizations (DAOs). The result would be a dynamic, fast-moving organization with a decentralized, largely AI-automated framework, significantly enhancing productivity and decision-making capabilities.

Across these stages, AI tools enable senior decision-makers to not only simplify their individual tasks but also reshape their organizations holistically, nurturing a culture of adaptability, efficiency, and intelligent functioning.

Recommendations

Make sure you’re early to the game. Start using AI tools as soon as possible to gain first-hand experience. The value of these tools might seem limited now, but they are constantly improving. Each upgrade enhances their potential, so expect rapid growth in their capabilities and value over time.
Find a pragmatic balance between Data Security and AI tools usage. Understand your data landscape. Identify the data that must be protected at all costs and the data that, in the worst-case scenario, you could afford to reveal. Keep in mind that refraining from using AI tools entirely carries risks, particularly in terms of competitiveness and talent retention. If you can't stay competitive, the need to protect your data is irrelevant.
Focus on quick-win changes in ways of working to better leverage AI. Given the current capabilities of top AI tools, it's advantageous to have as much data as possible in a well-written text format. If you have ten pages of well-structured text, for example, you can condense it to any length or query it as needed. Today, this is more beneficial than dealing with an extensive PowerPoint presentation or a heavily redacted executive summary.

To conclude, AI tools have the potential to greatly support senior decision-makers. While challenges remain, their capacity to improve decision-making, risk management, and communication already exists today, and it will improve at a fast pace. The sooner you embrace them, the greater the potential to build an edge and stay ahead of the curve.

AI-Powered Productivity Surge in Software Development

Peter Eklind — Mon, 08 May 2023 15:03:58 GMT

Topic: Software development, AI, Productivity
Target audience: Decision-makers with software development in their organizations
Key insight: New AI tools can enhance productivity for software developers
Action needed: Evaluate your productivity potential and decide if and how to address it

Backdrop – AI, LLMs, and the Emergence of Novel Capabilities

Artificial Intelligence (AI) denotes computing systems that perform tasks that typically require human intelligence, such as problem-solving, pattern recognition, and language comprehension. Large Language Models (LLMs), a subset of AI, process and interpret human language. Trained on vast amounts of text from diverse sources, they learn grammar, factual information, and even rudimentary reasoning skills. LLMs like ChatGPT can answer queries, generate text, translate languages, and even compose poetry.

The LLM revolution hinges on two key factors: enhanced architecture and more extensive training data. The Transformer architecture (the "T" in "GPT") debuted in the 2017 paper "Attention Is All You Need". Earlier models relied on basic techniques like tallying words and their relationships, while Transformers use "self-attention" to better capture how the words in a sentence relate to one another, resulting in superior language processing compared to their predecessors. Furthermore, expanding the training data improves language understanding and response generation, as larger datasets expose LLMs to many more examples of language use.
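
For readers who want to see the idea of self-attention in concrete terms, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not how a production model is built: the word vectors are made-up toy values, and the learned query, key, and value projections that real Transformers use are left out, so each word is compared directly with every other word. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified scaled dot-product self-attention.

    Assumption: no learned projections; every word vector acts as its own
    query, key, and value. Returns the new vectors and the attention weights.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this word to every word in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted average of all word vectors.
        mixed = [
            sum(w * vec[j] for w, vec in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Toy two-dimensional "word vectors" (illustrative values only).
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_vectors, attention = self_attention(toy_sentence)
for row in attention:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one word "attends" to every word in the sentence. The weights in a row sum to one, and words with more similar vectors receive more weight.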

Intriguingly, LLMs have developed emergent capabilities, which are skills acquired during training without explicit programming. For instance, an LLM might learn to respond to queries, play chess, or translate languages simply by exposure to comprehensive data containing examples of these tasks. These emergent capabilities render LLMs highly adaptable and potent, unlocking new application possibilities.

AI-Enabled Tools Transform Software Development Landscape

A notable emergent property of LLMs is their capacity to comprehend computer code and, consequently, predict a program's behaviour merely by reading the code. This ability allows them to design, write, and test code, making AI-driven tools based on LLMs highly advantageous for software development. For example, GitHub, a Microsoft-owned software development and version control hosting service, reported that among developers who use Copilot, the company's AI-driven developer tool, it writes on average 46% of their code across all programming languages. AI tools like Copilot help developers automate repetitive workflows, learn faster, work more efficiently, and significantly increase their productivity.

Leading AI companies are developing their own AI-enabled tools to support software development. In addition to GitHub Copilot, which offers real-time code suggestions and integrates with popular code editors, Amazon has developed CodeWhisperer, which generates code recommendations from natural-language descriptions, and Google has integrated programming assistance into its AI, Bard.

The next generation of tools is already on display. GitHub Copilot X, a vision for the future of AI-powered software development, features chat and terminal interfaces, support for pull requests, and GPT-4 adoption. According to GitHub CEO Thomas Dohmke, this new generation of tools will augment productivity tenfold.

Data Points #1: ChatGPT Successfully Interviews for Entry-Level Software Engineering Role at Google

As CNBC reported, citing internal Google sources, ChatGPT successfully interviewed for a Level 3 (Software Engineer II) position at Google. This entry-level role typically requires an undergraduate degree in a computer-related field, with some candidates holding a Master's degree. The position pays about $180,000 a year.

Data Points #2: AI Tools Like GitHub Copilot More Than Double Productivity

According to the research report "The Impact of AI on Developer Productivity: Evidence from GitHub Copilot", programmers using GitHub Copilot completed a standardized programming task 55.8% faster than those working without it. In other words, they finished in less than half the time, which is where the "more than double" figure comes from. The study found that less experienced, older, and busier developers benefited the most from the tool.

The study's results, if extrapolated, suggest that the potential doubling of productivity could lead to considerable cost savings. In 2021, over 4.6 million people in the United States were employed in computer and mathematical occupations, earning the equivalent of 2% of the US GDP.
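
The step from "55.8% faster" to "more than double" depends on how the figure is read, and a short calculation makes the difference visible. The sketch below is illustrative only: the function names are hypothetical, and the only number taken from the text is 55.8%. Under the first reading, the task takes 55.8% less time; under the second, a developer produces 55.8% more output in the same time.

```python
def speedup_from_time_saved(time_saved_fraction):
    """Output multiplier when the same task takes this much less time."""
    if not 0 <= time_saved_fraction < 1:
        raise ValueError("time_saved_fraction must be in [0, 1)")
    return 1 / (1 - time_saved_fraction)


def speedup_from_extra_output(extra_output_fraction):
    """Output multiplier when this much more is produced in the same time."""
    return 1 + extra_output_fraction


reported = 0.558  # the 55.8% figure quoted above

as_time_saved = speedup_from_time_saved(reported)
as_extra_output = speedup_from_extra_output(reported)

print(f"Read as time saved:   {as_time_saved:.2f}x output")    # 2.26x
print(f"Read as extra output: {as_extra_output:.2f}x output")  # 1.56x
```

Only the time-saved reading supports a doubling of productivity, so it is worth checking which definition a vendor or study uses before building a business case on the headline number.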

Data Points #3: AI Tools Enable "Overemployed" Workers to Manage Multiple Jobs

Failure to capitalize on new AI tools may result in unintended consequences, as employees can use these tools to their own advantage. The online magazine Motherboard interviewed a group of people who identify as "overemployed", a trend that grew with remote work during the COVID-19 pandemic. These individuals now use AI tools like ChatGPT to handle multiple jobs, reflecting the rise of AI-assisted overemployment and its implications for the future of work and society. One worker said, "ChatGPT does like 80 percent of my job," allowing them to juggle several jobs simultaneously.

Reflections

Advances in AI tools like ChatGPT have substantially transformed the software development landscape in the past year. Programming has become more accessible than ever: anyone can now create simple Python programs with these tools by describing the desired outcome in plain text.

Foreseeing the next step is challenging. However, two potential paths emerge: AI "copilots" that enhance human productivity and "AI agents" that entirely replace human workers. This distinction parallels the choice between investing in productivity-boosting tools, such as Business Intelligence, and labor-replacing solutions like Robotic Process Automation (RPA). In the short term, copilots might prevail, while AI agents could gain prominence in the long run. However, the "long-term" in the AI context could be as brief as months. It seems likely that a combination of copilots and AI agents will emerge, with AI agents gradually occupying a larger share of the workforce.

Over the past 15 years, software development has undergone significant changes, transitioning from waterfall projects and monolithic architectures to agile methodologies, cloud solutions, and composable architectures. This shift has fostered iterative development, modularity, and adaptability to change. The adoption of DevOps has promoted collaboration and streamlined release processes, while a decreased reliance on large-scale outsourcing highlights a preference for in-house or closely collaborating teams. These advances have led to more flexible, efficient, and cooperative software development practices that better serve businesses and users. I believe that the further an organization has come along this evolutionary path, the greater its potential productivity gains from AI tools in software development.

Talent management is an area to watch closely as AI-driven solutions progress. Senior developers will be in high demand, as their expertise enables them to maximize the productivity benefits of new AI tools. In contrast, junior developers may face increasing competition from these automated technologies. The job market will likely evolve to value developers who excel in areas where AI currently falls short. However, as AI capabilities improve, even these developers may eventually face competition from increasingly advanced AI solutions. It is crucial to adopt flexible talent management strategies for this changing landscape.

Companies must adapt to the rapidly changing landscape of software development as AI tools continue to improve. The optimal approach today may not be optimal a year from now, and organizations need to manage this transition effectively. One approach is to handle different types of projects with distinct strategies:

Innovation & Transformation projects: Leverage the increased productivity from AI tools to accelerate speed and output.
Efficiency & Modernization projects: Use business cases to guide decisions on capitalizing on the higher productivity from AI tools.
Maintenance & Compliance projects: Integrate potential productivity gains from AI into regular business and financial planning, prioritizing accordingly.

The role of software development in the organization will determine the best path forward. Inaction risks falling behind competitors, as the industry is unlikely to stand still.

So what has happened so far? Major tech companies have recently announced significant headcount reductions, including Meta (11,000 jobs), Alphabet (12,000 jobs), Microsoft (10,000 jobs), and Amazon (18,000 jobs). Although the market consensus attributes these reductions to extensive hiring during the pandemic and the anticipated recession, increased developer productivity due to AI advancements could be a contributing factor. This might have allowed these companies to maintain their output levels with a leaner workforce. If this is the case, we could see more layoffs in the coming months. As these changes unfold, the job market for senior developers may experience turbulence. Highly skilled developers, more likely to leverage AI tools to their advantage, will become increasingly sought after by both established companies and startups.

Recommendations for Decision Makers

Assess Your Starting Point. Determine the necessity of taking action now by evaluating potential consequences, estimating possible impacts, and weighing the repercussions of inaction compared to competitors' strategies. Prioritizing one area comes at the expense of another, so ensure that this is the right focus for your organization at this time.
Build Your Approach. Adopt tailored strategies for the various project types, starting with Innovation & Transformation projects and advancing to Efficiency & Modernization and Maintenance & Compliance projects. Assemble lean, in-house teams and seize the opportunity to adopt modern working methodologies, fostering a culture of innovation and agility.
Emphasize People and Talent Management. Concentrate on effective talent management, acknowledging that highly skilled individuals have numerous options and may require incentives to stay. Recognize the importance of leaders at all IT-related levels in harnessing new tools and maximizing productivity gains.