
How Anthropic’s Constitutional AI Method Ensures Responsible AI Behavior

Anthropic, an AI startup founded by former employees of OpenAI, is aiming to create a new constitution for safe artificial intelligence (AI). The company has recently disclosed details about its written principles, which it employs to train its chatbot, Claude, using a method called “constitutional AI.”

The concept of constitutional AI revolves around training AI systems, such as chatbots, to adhere to an explicit set of rules, or constitution. Typically, the development of chatbots like ChatGPT relies on human raters who judge the system’s outputs, flagging problems such as hate speech and toxicity. Through this process, known as reinforcement learning from human feedback (RLHF), the system adjusts its responses based on the feedback received. Constitutional AI, by contrast, entrusts much of this feedback role to the chatbot itself: the model critiques and revises its own outputs against the written principles, with humans primarily involved in evaluating the results afterward.

Jared Kaplan, co-founder of Anthropic, explains that instead of relying on human preferences in RLHF, constitutional AI allows the large language model to determine which behavior aligns better with given principles. The objective is to guide the system towards being more helpful, honest, and harmless. Anthropic has been advocating for constitutional AI for some time and has employed this method to train its own chatbot, Claude. The company has now unveiled the written principles, or constitution, it has been utilizing for this purpose.
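The critique-and-revise loop described above can be sketched in a few lines of code. This is a minimal illustration, not Anthropic’s actual implementation: the `model` function here is a stub standing in for a real large language model call, the principles are paraphrased, and the prompt tags (`[CRITIQUE]`, `[REVISE]`) are invented for the sketch.

```python
# Hypothetical sketch of a constitutional AI self-revision loop.
# A real system would call a large language model; here `model` is a
# stub that returns canned text so the control flow can be followed.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid content that is objectionable, offensive, or deceptive.",
]

def model(prompt: str) -> str:
    """Stub LLM: dispatches on an invented prompt tag."""
    if prompt.startswith("[CRITIQUE]"):
        return "The draft could be clearer and less confrontational."
    if prompt.startswith("[REVISE]"):
        return "Revised response that follows the principle."
    return "Initial draft response."

def constitutional_revision(user_prompt: str) -> str:
    """Draft a response, then critique and revise it against
    each principle in the constitution."""
    response = model(user_prompt)
    for principle in CONSTITUTION:
        critique = model(
            f"[CRITIQUE] Does this response follow the principle "
            f"'{principle}'?\nResponse: {response}"
        )
        response = model(
            f"[REVISE] Rewrite the response to address this critique.\n"
            f"Critique: {critique}\nOriginal: {response}"
        )
    return response

print(constitutional_revision("Tell me about AI safety."))
```

In Anthropic’s published method, revisions generated this way are used to fine-tune the model, and the model’s own principle-based preferences then replace human preference labels in the reinforcement learning stage.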

The constitution draws inspiration from various sources, including the Universal Declaration of Human Rights by the United Nations and Apple’s terms of service. The principles encourage responses that support and promote freedom, equality, and a sense of brotherhood. They also emphasize avoiding racism, sexism, discrimination based on language, religion, politics, or other opinions, and ensuring support for life, liberty, and personal security. Additionally, the constitution incorporates guidelines to minimize objectionable, offensive, deceptive, or harmful content, as well as respect for privacy and confidentiality.

Anthropic’s principles also address the importance of considering non-Western perspectives, aiming to minimize harm or offense to audiences outside the Western world. Moreover, there are guidelines to reduce the use of stereotypes and harmful generalizations about different groups of people and to avoid presenting the chatbot as a human or medical authority. The constitution even includes principles focused on existential threats posed by superintelligent AI systems, indicating a belief in the potential risks associated with such technology.

When asked about these existential threats, Kaplan acknowledges them but suggests that immediate risks deserve attention as well. Anthropic’s goal is not only to address concerns about “killer robots” but also to ensure that chatbots never behave like such systems in the first place, which the company views as a worthwhile safeguard in itself.

Kaplan clarifies that Anthropic aims to stimulate public discussion about how AI systems are trained and which principles they should follow. The company does not claim to possess all the answers or to impose specific values on its systems. While some AI experts advocate for customizable AI systems that align with user-defined values, Kaplan notes the dangers of such an approach, citing the potential reinforcement of echo chambers and radicalization. He suggests the need for a shared foundation of conduct: a new constitution that takes the implications of AI into account.

In an AI landscape where bias in chatbots has already sparked debates, Anthropic’s approach of constitutional AI offers a unique perspective. By emphasizing a set of principles for AI systems to follow, the company seeks to encourage responsible and beneficial AI development while igniting broader discussions on ethical considerations in AI.

While the future of constitutional AI and its effectiveness remain to be seen, Anthropic’s efforts contribute to the evolving dialogue on the responsible deployment of AI technologies.
