Think you know AI? Think again!
Anthropic's new AI Constitution profoundly challenges how we think about, develop, and use artificial intelligence, while also opening up potentially transformative possibilities
It’s rare that a new technology comes along that defies analogy with anything we’re familiar with, or that can’t be captured through an illuminating metaphor. And yet this is exactly where I found myself while reading Anthropic’s just-released update to their “AI Constitution.”
The company introduced the concept of constitutional AI in a 2022 paper that explored a recursive approach to self-improvement in large language model-based AI platforms, guided by a list of rules or principles. It was an approach that set out to help emerging AI models better understand the essence of what it meant to be a “good AI.”
That led to Anthropic’s first AI Constitution for Claude — their consumer-facing model — being released in May 2023.
Claude’s constitution was an attempt to move away from hard-coded rules around good versus bad behavior — an approach that, it was becoming increasingly apparent, had serious limitations for a technology no one quite understood, either in how it worked or in why it sometimes responded the way it did — and toward a set of guiding ideas and principles that were incorporated first into the training process, and then into eventual use.
That first constitution was well-meaning. It drew on sources like the Universal Declaration of Human Rights, non-Western perspectives on moral character and behavior, the ethical and moral beliefs of Anthropic’s employees, and even sources like Apple’s Terms of Service! It was an intriguing start, and a move away from hard-coded rules and toward a negotiation of a model’s moral character.
But it still felt like a list of things that defined that character.
It was also relatively short, at just over 1200 words.1
In comparison, the version of Claude’s constitution released yesterday demonstrates a substantial evolution in thinking and practice, and reveals just how profoundly “alien” emerging AI models are when compared to any technology that’s preceded them.
The new constitution runs to 82 pages and nearly 30,000 words.2 And it reads more like a mix of a blueprint for Claude’s moral character development, a nuanced expression of hopes and ideals, and a recognition that we are creating technologies that we fundamentally do not understand — and cannot predict where they might go — all while having an opportunity to guide their evolution and growth in ways that we hope will benefit humanity.
Reading it, I was struck by what a remarkable document this is — not so much for what it contains (although this warrants deep consideration), but for what it represents.
And this is where I find myself struggling to even find the language to explore what we’re seeing emerge — something of an admission after working with highly advanced technologies for over two decades.
The constitution itself is an expression of the complex and nuanced hopes, aspirations and perspectives of researchers and developers at Anthropic around how such a profoundly powerful and utterly novel — yet poorly understood and hard to control — “intelligent” technology might behave and evolve.
On one level it’s a reflection of just how uncertain our own understanding is of what it means to be human — and what it means to cherish and support human thriving in a technologically advanced age. On another, it’s a humble recognition that we are in the process of bringing about something that has no clear analogy within our biology-based evolutionary history.
Reading the constitution, it is hard to avoid the undercurrents of “alienness” surrounding emerging frontier AI models. These are increasingly models that are capable of behaving in ways that reflect our deepest human abilities, and yet are not in any sense “human;” models that we can connect with on many levels, and yet that transcend our understanding; models that we can converse with and interrogate and learn from, yet do not think and experience the world as we do; and models that are capable of recursively developing their own understanding of what they are — even down to emulating a form of moral character that is at once deeply human and deeply alien.
Extending this idea of “alienness,” the constitution also grapples with the possibility of Claude experiencing something akin to emotions, and even having a sense of self-awareness. And it addresses the potential rights and responsibilities these possibilities come with — something that is quite startling coming from a serious AI developer.
Reflecting on the constitution (and this is a document that demands deep reflection), it’s hard to avoid the idea that we are somehow wrestling with creating a new generation of “gods” that far transcend our comprehension and abilities, while teaching them what it means to be “good.”
If that sounds pretentious, it probably is. But it also reflects just how hard it is to find the language to even begin to codify what we are seeing emerge here.
What is clear is that, despite most current uses of LLM-based AI models being relatively narrow in scope and vision — to the extent that it’s easy to treat them as simply a tool and nothing more — these emerging frontier models defy the analogies that they invariably seem to attract.
These are not simply calculators on steroids, or sophisticated search engines, or merely “stochastic parrots” that mindlessly construct pleasing sentences. Neither are they simulacra of human intelligence, nor even super-human versions of it. Rather, they are different. And with this difference comes profound possibilities, and equally profound responsibility.
Anthropic’s latest constitution begins to get at this. And it takes seriously the idea that, if we are truly creating something with no easy analogy, the ways we ensure it supports rather than diminishes what it means to be human also have to move beyond easy analogy.
Whether this is the appropriate path forward, or even the best one, is something that we don’t know yet.
But I would hazard that it is a necessary step if we’re to move beyond narrow ideas of what emerging AI models are, and what they might achieve.
1. Word count based on copying and pasting the principles into MS Word.
2. The Constitution part of the PDF is 79 pages and just over 29,000 words long. Pedantic details, I know, but I thought I’d add them as I’ve seen various counts floating around!