In many of our conferences, lectures, classes, and events, we've been keeping an eye on this idea: that we're going to need to apply moral philosophy to our interactions with AI.
This cuts both ways – we have to think about how AI systems will affect humans, and, increasingly, how we may affect them.
Transformer's report that Anthropic has hired Kyle Fish as a full-time AI welfare expert is just the latest harbinger of a new movement to better understand whether we have obligations to AI models.
“Fish, who joined the company’s alignment science team in mid-September, told Transformer that he is tasked with investigating ‘model welfare’ and what companies should do about it,” writes Shakeel Hashim. “The role involves exploring heady philosophical and technical questions, including which capabilities are required for something to be worthy of moral consideration, how we might recognize such capabilities in AIs, and what practical steps companies might take to protect AI systems’ interests — if they turn out to have any.”
Let’s take a look at some of the components of this kind of work.
Conscious or Robustly Agentic Systems
One of the most interesting aspects of the research going on right now concerns when an AI system reaches a threshold at which it deserves what philosophers call "moral patienthood."
For reference, this terminology is front and center in a report that Fish and others wrote for the Eleos AI group that was set up to manage just these kinds of questions.
Just as Fish is a relative newcomer to Anthropic, at least in this capacity, a lot of this research is brand new as well…
In a paper titled “Taking AI Welfare Seriously,” Fish and others suggest that AI may attain moral patienthood either by becoming more cognitively evolved or “conscious”, or by reaching a certain level of agency that we’re now beginning to explore with reasoning models like OpenAI’s o1. The research team enumerates characteristics that might represent consciousness on the part of the system: “global workspace, higher-order representations, and an attention schema.” Characteristics for robust agency include “certain forms of planning, reasoning, or action-selection.”
After delineating these types of pathways, the authors start to talk about how to handle the eventualities involved – which we’ll discuss a bit later.
Government Input
This news comes in the same week that we’ve been reporting on the National Security Memorandum from the Biden administration on AI.
Like the work of the researchers at Eleos AI and elsewhere, the federal guidance speaks to understanding the impact of AI, in this case primarily on human life, and what we should do to anticipate it.
The memorandum seems timely, given what these private-sector companies are doing right now. For example, it supports evaluating "high-impact" uses of AI and regulating them in particular ways, by documenting what these systems do and exploring their effects on the population at large.
Framework for AI Welfare
The emerging research from the AI welfare team also offers a number of recommendations for AI companies, boiled down into a three-fold imperative:
· Acknowledge
· Assess
· Prepare
Researchers also talk about the risk of overestimating or underestimating the humanity of AI models, using the terms “anthropomorphism” and “anthropodenial.”
For instance, they point out that we tend to ascribe more agency to entities that "have eyes" or can "see," those with "distinct motion trajectories," and those that display self-directed behaviors. Further:
“Evidence also suggests that features such as “cuteness” can encourage attributions of mental states and moral patienthood. Many robots or chatbots are designed to appear conscious and charismatic, and in the future, many AI systems will have bodies, life-like motion, and (at least apparently) contingent interactions. Furthermore, unlike nonhuman animals, AI systems are already increasingly able to hold extremely realistic conversations, making seemingly thoughtful contributions in realistic timeframes. These traits do not guarantee that humans will see and treat these systems as welfare subjects and moral patients, but they will increase the probability of such reactions.”
These are some good starting points for thinking about how to integrate AI regulation into our markets and our lives. In Asimov's words, robots should do no harm to humans, and by implication of what Anthropic and others are looking into, perhaps we shouldn't do any harm to them, either.