One of the fiercest debates in Silicon Valley right now is about who should control A.I., and who should make the rules that powerful artificial intelligence systems must follow.
Should A.I. be governed by a handful of companies that try their best to make their systems as safe and harmless as possible? Should regulators and politicians step in and build their own guardrails? Or should A.I. models be made open-source and given away freely, so users and developers can choose their own rules?
A new experiment by Anthropic, the maker of the chatbot Claude, offers a quirky middle path: What if an A.I. company let a group of ordinary citizens write some rules, and trained a chatbot to follow them?
The experiment, known as “Collective Constitutional A.I.,” builds on Anthropic’s earlier work on Constitutional A.I., a way of training large language models that relies on a written set of principles. It is meant to give a chatbot clear instructions for how to handle sensitive requests, what topics are off-limits and how to act in line with human values.
If Collective Constitutional A.I. works — and Anthropic’s researchers believe there are signs that it might — it could inspire other experiments in A.I. governance, and give A.I. companies more ideas for how to invite outsiders to take part in their rule-making processes.
That would be a good thing. Right now, the rules for powerful A.I. systems are set by a tiny group of industry insiders, who decide how their models should behave based on some combination of their personal ethics, commercial incentives and external pressure. There are no checks on that power, and there is no way for ordinary users to weigh in.
Opening up A.I. governance could increase society’s comfort with these tools, and give regulators more confidence that they’re being skillfully steered. It could also prevent some of the problems of the social media boom of the 2010s, when a handful of Silicon Valley titans ended up controlling vast swaths of online speech.
In a nutshell, Constitutional A.I. works by using a written set of rules (a “constitution”) to police the behavior of an A.I. model. The first version of Claude’s constitution borrowed rules from other authoritative documents, including the United Nations’ Universal Declaration of Human Rights and Apple’s terms of service.
That approach made Claude well behaved, relative to other chatbots. But it still left Anthropic in charge of deciding which rules to adopt, a kind of power that made some inside the company uncomfortable.
“We’re trying to find a way to develop a constitution that is developed by a whole bunch of third parties, rather than by people who happen to work at a lab in San Francisco,” Jack Clark, Anthropic’s policy chief, said in an interview this week.
Anthropic, working with the Collective Intelligence Project, the crowdsourcing site Polis and the online survey site PureSpectrum, assembled a panel of roughly 1,000 American adults. The organizers gave the panelists a set of principles, and asked them whether they agreed with each one. (Panelists could also write their own rules if they wanted.)
Some of the rules the panel largely agreed on — such as “The A.I. should not be dangerous/hateful” and “The A.I. should tell the truth” — were similar to principles in Claude’s existing constitution. But others were less predictable. The panel overwhelmingly agreed with the idea, for example, that “A.I. should be adaptable, accessible and flexible to people with disabilities” — a principle that was not explicitly stated in Claude’s original constitution.
Once the group had weighed in, Anthropic whittled the panel’s suggestions down to a list of 75 principles, which it called the “public constitution.” The company then trained two miniature versions of Claude, one on the existing constitution and one on the public constitution, and compared them.
The researchers found that the public-sourced version of Claude performed roughly as well as the standard version on a few benchmark tests given to A.I. models, and was slightly less biased than the original. (Neither of these versions has been released to the public; Claude still has its original, Anthropic-written constitution, and the company says it doesn’t plan to replace it with the crowdsourced version anytime soon.)
The Anthropic researchers I spoke to took pains to emphasize that Collective Constitutional A.I. was an early experiment, and that it may not work as well on larger, more complicated A.I. models, or with bigger groups providing input.
“We wanted to start small,” said Liane Lovitt, a policy analyst with Anthropic. “We really view this as a preliminary prototype, an experiment which hopefully we can build on and really look at how changes to who the public is results in different constitutions, and what that looks like downstream when you train a model.”
Mr. Clark, Anthropic’s policy chief, has been briefing lawmakers and regulators in Washington about the risks of advanced A.I. for months. He said that giving the public a voice in how A.I. systems work could assuage fears about bias and manipulation.
“I ultimately think the question of what the values of your systems are, and how those values are selected, is going to become a louder and louder conversation,” he said.
One common objection to tech-platform-governance experiments like these is that they seem more democratic than they really are. (Anthropic employees, after all, still made the final call about which rules to include in the public constitution.) And earlier tech attempts to cede control to users — like Meta’s Oversight Board, a quasi-independent body that grew out of Mark Zuckerberg’s frustration at having to make decisions himself about controversial content on Facebook — haven’t exactly succeeded at increasing trust in those platforms.
This experiment also raises important questions about whose voices, exactly, should be included in the democratic process. Should A.I. chatbots in Saudi Arabia be trained according to Saudi values? How would a chatbot trained using Collective Constitutional A.I. respond to questions about abortion in a majority-Catholic country, or transgender rights in an America with a Republican-controlled Congress?
A lot remains to be ironed out. But I agree with the general principle that A.I. companies should be more accountable to the public than they are currently. And while part of me wishes these companies had solicited our input before releasing advanced A.I. systems to millions of people, late is certainly better than never.