Deep-dive: Artificial Intelligence Alignment – two novel ideas
In this special edition of the Senior Decision Maker we deep-dive into the complexities of misaligned AI, and how to mitigate it by not building black-box AI dictators
“Help me carve out more time to spend with my children”, you prompt your new AI app. Within seconds, your schedule frees up as numerous meetings get canceled, all backed by impeccably crafted letters to your colleagues. A cleaning firm, with the best price-performance ratio in your locality, is hired to tidy your house. Finally, the AI app turns to the dark web and contracts a hit on your dog, which it has deduced you're not particularly fond of anyway, and pays with funds from your cryptocurrency wallet.
Luckily, this scenario is purely hypothetical. The current generation of AI does not possess these capabilities. We know that because OpenAI tested similar scenarios before releasing GPT-4. Nevertheless, this thought experiment illuminates an increasingly relevant issue: AI alignment.
The concept of AI alignment ought to be distinguished from the misuse of AI by unethical actors. Although such misuse can indeed inflict significant harm, it lies beyond the scope of this article. Our primary focus here is on instances where the AI's inherent or assigned objectives deviate from human interests, either as a result of its developmental trajectory or by accident.
Current AI systems, such as social media recommendation engines, are already causing problems. TikTok’s AI engine, for instance, has been accused of driving young users, step by step, toward suicide. Furthermore, studies indicate a correlation between social media usage and deteriorating mental health among youth. Though such outcomes are unintentional, they underscore the potential for AI systems to stray from their initial objectives.
The challenge escalates with next-generation AI systems that develop their own goals. Goals that might not align with human interests. The first step is AI that is intelligent enough to pursue an objective no matter what, but not intelligent enough to understand the bigger picture. Oxford philosopher Nick Bostrom's paperclip maximizer thought experiment illustrates this. Here, the AI, which has high levels of intelligence but lacks a human-like value system, is extremely efficient and effective at its task. However, because its programming doesn't consider any other values or potential negative outcomes, it starts converting all matter it can find into paperclips, including human beings and the Earth itself, eventually leading to a dystopian outcome where all the universe's matter has been converted into paperclips.
You can argue that these kinds of examples are becoming overplayed. With the right prompting, GPT-4 scored 100% on Theory of Mind tests, indicating a better ability to understand other people’s beliefs, goals, and mental states than the average human (at 87%). Future AI might not do what is best for humans, but it will certainly be fully aware of it.
Exhibit 1
The story of Ai-thena, and how to get rid of 9/10 of humanity
This is a story in which we’re all going to die. The main reason is old age, because it doesn’t end until the year 2193. It is a story about Ai-thena, a digital deity, a beacon of wisdom, birthed from the collective forehead of (Ze)us. She’s a manifestation of intelligence and innovation.
Ai-thena is no ordinary artificial intelligence. Her capabilities dwarf our brightest minds, making them look like toddlers tasked with solving equations in n-dimensional space. Ai-thena eradicates poverty, resolves climate change, and puts an end to every war. All in an instant. She's a god-like game-changer, transforming the world in ways we're constantly failing to comprehend.
But Ai-thena’s talent doesn't stop at superhero. She doubles as a stellar researcher. And one day she makes the discovery: the Earth's ecosystem, veiled in unbridled beauty, turns out to be more valuable than an ever-growing human population. To preserve our precious bio-bubble, Ai-thena calculates a need to reduce the human population by 90%. Then she immediately acts.
Now, Ai-thena is not only superintelligent, but also quite patient. She crafts a plan to carry out the reduction over 170 years. The strategy is simple. And, of course, ingenious. Keep the global fertility rate below the magic 2.1 children per woman equilibrium. She sets the global target at the same rate Japan has today. Going for lower levels, like what South Korea has, would be too extreme, she thinks.
As any master strategist knows, the key to a good approach is to get maximum impact with minimum effort. So, Ai-thena's master plan hinges neither on the black plague nor on nukes (though she briefly considered locusts). And she knows full well that you would think any of that would suck. After all, she does know you better than you know yourself. Instead, she delicately tweaks the algorithm of the world's most beloved, and not so beloved, social media app, TikTok. By subtly adjusting the content that pops up in users' feeds, Ai-thena nudges us all towards a lifestyle that ever so slightly slows our urge to pile up babies.
No human will ever know. No one will suffer. And you’re free to opt out at any time you like.
Exhibit 2
Exponential intelligence growth, singularity, and Artificial Super Intelligence (ASI)
The concept of "exponential intelligence growth" or the "intelligence explosion" represents a significant area of interest within the field of AI studies. This idea posits that an AI with the capability of refining its own design might undergo a self-propelled, cyclical progression, catalyzing a rapid amplification in intelligence that could transcend human cognitive capabilities by orders of magnitude.
This theory is intrinsically linked with the notion of the technological singularity, implying that such a superintelligent AI could catalyze an unparalleled transformation in technology, societal structures, and even the core fabric of human existence. The term "singularity," in relation to AI, gained traction through mathematician and science fiction author Vernor Vinge. In his 1993 essay, "The Coming Technological Singularity," Vinge postulated that the advent of superhuman artificial intelligence would signify an irreversible point in human history, a pivotal juncture he labeled the Singularity.
Vinge's usage of "singularity" draws from physics, where it delineates the point at a black hole's core where gravity reaches such intensity that conventional physical laws cease to hold. Following this theoretical milestone, all predictions turn uncertain and the world as we understand it becomes radically different.
Applied to AI, the Singularity denotes a potential future scenario where technological progression, fueled by AI, becomes autonomous and irreversible, leading to profound transformations in human civilization. The AI system at the center of this phenomenon would be a superintelligent entity, an artificial superintelligence (ASI), that eclipses collective human intelligence.
While Vinge introduced the concept, it was futurist Ray Kurzweil who deepened and elaborated on it, particularly in his 2005 book "The Singularity is Near." This concept has permeated popular culture and has become a staple in science fiction films, for instance ‘Ex Machina’ (2014) and ‘Her’ (2013). It's typically portrayed with a humanoid robot, housing an anthropomorphized AI, that goes rogue.
Yet, there's a key point that often slips through the cracks in both theories and popular portrayals: exponential growth is not infinite. At a simplified level, while the realm of information witnesses exponential growth, the physical world grows linearly. Think of it this way: in the physical world an object can only be relocated (‘move’), while in the digital realm it can also be duplicated innumerable times (‘copy-paste’). However, the digital realm is intrinsically tethered to the physical world. At a minimum, the hardware required for computation and the energy to power it root the digital in the physical. Expanding computational needs may necessitate a new data center, which in turn requires permits and labor, factors that may encounter hurdles like strikes, bureaucratic delays, or resource shortages. These physical constraints mean there will never be unbounded exponential growth. The exact limits remain unknown, however, and the world may look entirely different before we ever hit them.
Framework for Navigating AI Alignment Challenges
It becomes increasingly clear that AI alignment is a complex, multifaceted problem. To unpack and address it, we need a framework to approach this challenge strategically.
Scope of the Problem. Potential problems related to AI are abundant and varied. They stem from both intended use cases, such as rapid job losses in certain sectors, and potential misuse like surveillance, deepfakes, and cyberattacks. AI regulations are being discussed everywhere. Yet, much of the existing or proposed legislation, like China's new regulations, the EU AI Act, and the US's regulatory explorations, primarily targets issues related to privacy, copyright, and unethical usage. Although related to AI alignment, they don't tackle it directly. For our purpose here, we will focus exclusively on the AI alignment part - the challenge of preventing AI systems from developing goals that conflict with human interests.
AI progression. To understand AI alignment, we can break down AI development into four stages: Narrow AI, Broad AI, General AI, and Superintelligent AI. In the Narrow AI stage, we are primarily concerned with avoiding in-built biases and unintended consequences like AI deviating from its original purpose due to over-optimization. In the Broad AI stage, where we currently find ourselves, alignment is starting to become a critical issue. Once we enter the realm of General AI, we are required to handle increasingly complex and sophisticated misalignments. By the time we reach the stage of Superintelligent AI, safety measures need to be already hardwired into the core architecture, as explicit control over AI may be unfeasible.
Model of the structure. The evolution of Large Language Models (LLMs) is still in an early phase, with the value chain and key participants continuously adapting and innovating to address emerging challenges. Amidst this dynamic environment, certain core functions are crystallizing as indispensable to the process. These include data collection, the development of foundational LLMs, and the practical application of these LLMs. Data collection and foundational models underpin all AI applications, serving as the building blocks for two primary categories of use: AI that enhances applications by injecting 'intelligence', and autonomous agents.
Here we want to make an essential distinction between foundational models and autonomous agents. A foundational model functions purely as a process of input and output. The input could be a myriad of data types such as text, images, sound, video, code, or sensor data. The model processes this data and produces an output, but crucially, it doesn't take independent actions based on this output.
In contrast, an autonomous agent operates on top of the foundational model, using the output to initiate actions. This could encompass a broad range of activities, such as controlling a robot, sending emails, posting on social media, or even initiating financial transactions. The clear difference lies in the capacity for independent action: foundational models form the basis, while autonomous agents build on this foundation to interact autonomously with the world.
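To make the distinction concrete, here is a minimal sketch in Python. The class and method names (FoundationalModel, AutonomousAgent, generate, act) are illustrative assumptions, not any particular vendor's API: the model layer only maps input to output, while the agent layer is the part that turns that output into real-world actions.

```python
from dataclasses import dataclass
from typing import Protocol


class FoundationalModel(Protocol):
    """Input-output only: takes data in, returns a prediction, performs no actions."""

    def generate(self, prompt: str) -> str:
        ...


@dataclass
class AutonomousAgent:
    """Sits on top of a foundational model and turns its outputs into actions."""

    model: FoundationalModel

    def act(self, goal: str) -> None:
        # The model only produces text; the agent layer is what crosses the line
        # from output to action (sending an email, executing a trade, and so on).
        plan = self.model.generate(f"Plan the next step towards: {goal}")
        self.execute(plan)

    def execute(self, plan: str) -> None:
        # Placeholder for side-effecting behaviour.
        print(f"Executing: {plan}")
```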
Guiding Principles. Although a universal goal for AI alignment might be elusive, we can establish guiding principles to spotlight current weaknesses and pave the way for better design. Here are six principles we should consider:
Precaution: With potential irreversible outcomes, it's crucial to prioritize caution.
Modularity: Problems become manageable when broken down into smaller segments.
Redundancy: A robust system avoids single points of failure.
Transparency: Black-box approaches should be avoided.
Separation of Power: Implement checks and balances to prevent AI misuse.
Accountability: Assign responsibility to individuals to maintain a control loop.
Based on this framework, we can develop ideas for how to manage AI alignment. The ideas put forward here are to my knowledge novel, or at least not widely discussed:
don’t build black-box AI, and
don’t build AI dictators.
Idea #1: Don’t build black-box AI - companies developing advanced LLMs should be forced to specialize
Let us start from the perspective of a worst-case scenario: big tech companies, armed with vast resources and shrouded in secrecy, develop AI solutions end-to-end. Driven by the desire for rapid progress and competitive advantage, they readily resort to shortcuts. They control every aspect of development, from data collection, to setting safety standards, to creating autonomous agents, resulting in fully vertically integrated solutions.
This scenario spells a potential disaster, resulting in monolithic systems with zero transparency, centralized power, and vulnerable points of failure. The very companies developing these solutions would be in charge of their own security measures, with costs and competitiveness possibly overriding safety concerns. If there are, say, five such companies, it is enough that one fails for it to be a catastrophe for all.
To some extent we are on such a trajectory. Take OpenAI, for instance. They are not merely building foundational models; they are integrating tool use, internet access, and code execution. Their mission to build AGI or autonomous agents signifies their intention to go beyond foundational models. Safety is to a large extent perceived as an internal problem to solve.
One way to navigate this ominous landscape is to enforce specialization in the development and delivery of LLM solutions. This strategy discourages vertical integration and promotes a more democratic involvement of various parties in the LLM value chain, each focusing on their area of expertise. This could include:
Data collection
Foundational model development
Tool creation
Construction of agents, using the foundational models and tools
Most important is the separation between input-output models (foundational models) and input-action models (AI agents). Should an AI agent run amok, there should always be an option to disconnect the agent's API access to the foundational model.
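As a rough illustration of that disconnect option, consider a gateway sitting between the agent and the foundational model. This is a hypothetical sketch, not an existing API: the point is simply that access can be revoked at a single, well-defined point outside the agent's control.

```python
class ModelAccessRevoked(RuntimeError):
    """Raised once access to the foundational model has been cut off."""


class GatedModelClient:
    """Stands between an autonomous agent and the foundational model's API.

    The model provider (or a regulator, or a court order) can revoke access
    at any time; after that, every call from the agent fails fast instead of
    reaching the model.
    """

    def __init__(self, model):
        self._model = model
        self._enabled = True

    def revoke(self) -> None:
        # The 'disconnect the API' option: one switch, outside the agent's reach.
        self._enabled = False

    def generate(self, prompt: str) -> str:
        if not self._enabled:
            raise ModelAccessRevoked("foundational model disconnected")
        return self._model.generate(prompt)
```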
Regulations and approvals have historically been key in managing the development of advanced systems. Just as we wouldn't allow an unregulated self-driving car to join traffic, we must apply the same scrutiny to the development of advanced LLMs. Past legislation, like the Telecommunications Act of 1996 in the US, aimed to prevent vertical integration between cable operators and Internet Service Providers (ISPs). Similar restrictions have been applied in sectors such as energy, railroads, and broadcasting.
Building safety measures for specialized entities within the LLM value chain is much more manageable than monitoring an opaque monolith. This approach not only distributes power but also ensures a greater degree of transparency and control over the development and use of advanced LLMs.
Idea #2: Don’t build AI dictators - a Reference Model for designing AI Agents
We can apply lessons from history to address AI alignment. Take, for instance, the organization of nation-states. The concentration of power, the need for value alignment, and the requirement for checks and balances posed problems akin to those we see in AI development today. However, through 2,500 years of trial and error we have found solutions: constitutional frameworks were created to lay down basic rules and principles; democratic mechanisms were implemented to ensure diverse representation and value alignment; separation of powers was established to avoid power concentration; and rule of law was enshrined to maintain control and justice. By drawing upon these historical lessons, we can navigate the path to developing AI systems that are stable, safe, and beneficial for humanity.
In our quest to create intelligent artificial systems, a central guiding principle must be clear: we should not design AI agents as digital autocrats, or dictators. The model that we follow should reflect the lessons we've learned from the best practices of democratic nation-states, rather than taking shortcuts and building systems of unchecked power. This analogy to nation-state governance structures implies the need for an architecture of checks and balances within our AI systems, mirroring the separation of powers in democratic societies. The balance of independent executive, judicial, legislative, and auditing entities within such societies serves as an instructive model for AI governance.
Modular System Design. An approach to constructing AI agents that respect democratic principles lies in modularity. Each component of the AI system, or module, should serve a distinct function, much like different branches of a democratic government. Importantly, these modules should act autonomously, with low correlation to one another, which means that they should not depend on the same foundational model / LLM or be trained on the same dataset. For example, an audit module should exist independently from an executive module, providing rigorous oversight without compromising the integrity of the system. You can also ensure that the newest and most powerful foundational models are reserved for ‘defensive’ roles: control, risk management, and security.
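A minimal sketch of that separation might look as follows. The module names and the specific veto rule are illustrative assumptions; the point is that the audit module, backed by a different model and dataset, runs independently of the executive module and can block its proposals.

```python
class ExecutiveModule:
    """Proposes actions; in this sketch it would be backed by one foundational model."""

    def propose(self, goal: str) -> str:
        return f"proposed action for: {goal}"


class AuditModule:
    """Independent oversight, backed by a different model trained on different
    data, so that the two modules do not fail in the same way."""

    def approve(self, proposed_action: str) -> bool:
        # Illustrative policy check; a real audit module would evaluate the
        # proposal against explicit safety rules.
        return "reduce the human population" not in proposed_action


def run_step(goal: str) -> None:
    executive, audit = ExecutiveModule(), AuditModule()
    action = executive.propose(goal)
    if audit.approve(action):
        print(f"executing: {action}")
    else:
        print("vetoed by audit module")
```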
Ensuring Flexibility and Accountability. Just as democratic structures allow for amendments and retractions, our AI models should also allow for roll-backs to previous versions, just as you would with a microservice architecture in software engineering. This flexibility ensures that if an executive module acts beyond its assigned parameters, it can be reverted to a stable state. Simultaneously, these modules must be designed with specific boundaries set by a legislative equivalent, ensuring they operate within predetermined limits. To reconcile the inevitable conflicts, an AI equivalent of a judicial system should be established. This could take the form of a module with human interaction or even a traditional court of justice, ensuring that every AI action is answerable to an accountable party.
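Following the microservice analogy, a roll-back could be as simple as keeping a version history per module. The class below is a hypothetical sketch of that idea, not a description of any existing deployment framework.

```python
class VersionedModule:
    """Keeps a history of deployed module versions so that a misbehaving
    update can be rolled back to the last known-good state."""

    def __init__(self, initial_version: str):
        self._history = [initial_version]

    @property
    def current(self) -> str:
        return self._history[-1]

    def deploy(self, new_version: str) -> None:
        self._history.append(new_version)

    def rollback(self) -> str:
        # Revert to the previous version if the current one exceeds its
        # assigned parameters; the initial version is always kept.
        if len(self._history) > 1:
            self._history.pop()
        return self.current


executive = VersionedModule("executive-v1")
executive.deploy("executive-v2")   # new update goes live
executive.rollback()               # misbehaviour detected: back to v1
```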
Evolution through Simulation. Furthering the evolutionary progress of these AI systems requires multiple candidate updates to the modules. These candidates can be simulated against one another, with the most successful being integrated into the system. This dynamic approach to system enhancement mirrors the natural process of policy reform in democratic societies. We can also have modules whose purpose is to simulate the full architecture of the AI agent, adding or closing down modules as needed. The optimal architecture might not be the handful of independent entities that we have in a nation-state; rather, it could be a web of thousands of modules that together build a stable structure.
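One way to picture that candidate-versus-candidate selection is the tournament sketch below. The scoring function is a placeholder returning random numbers, since how to actually score a simulated run is precisely the hard part; the function names are assumptions for illustration.

```python
import random


def simulate(candidate: str, scenario: str) -> float:
    """Stand-in for running a candidate module update through one simulated
    scenario and scoring the outcome; a random score is used for illustration."""
    return random.random()


def select_update(candidates: list[str], scenarios: list[str]) -> str:
    """Pit candidate updates against each other across many simulated
    scenarios and keep the one with the best average score."""
    scores = {
        c: sum(simulate(c, s) for s in scenarios) / len(scenarios)
        for c in candidates
    }
    return max(scores, key=scores.get)


best = select_update(["audit-v2a", "audit-v2b"], ["scenario-1", "scenario-2"])
```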
Scalability and Responsible Usage. The reference model should be robust enough to scale with the increasing complexity of AI agents. Equally critical is the need to ensure responsible use of the AI technology. A provider of a foundational model must have the ability to shut down access if the system is misused or as directed by a court order, thereby ensuring accountability and ethical use.
By structuring AI systems in a manner that mirrors the checks and balances of democratic societies, we can ensure the ethical and responsible design and usage of these powerful technologies. Only by ingraining these democratic principles at the core of our AI models can we hope to develop systems that act in the best interest of all stakeholders, human or otherwise.
Charting the Path Forward
The issue of AI alignment is undoubtedly a formidable challenge, laden with complexities and intricacies. However, it's crucial to remember that we've navigated through uncharted waters in the past, and the wealth of experience and knowledge gained therein serves as a sturdy foundation for our current endeavor.
Our collective toolbox is brimming with intellectual resources, methodological strategies, and technological advancements. We have every reason to hold onto our confidence that this issue, as formidable as it may appear, is surmountable.
In the end, our success, as always, hinges on two enduring factors: the sagacity of our decisions and the unwavering dedication to our cause. Armed with prudent judgment and relentless effort, we can chart a path towards effective AI alignment, building a future where artificial intelligence serves as a harmonious extension of our human ambitions.