The very hard problem of AI consciousness
I spent a weekend at an AI welfare get-together — and left with more questions than answers
Of the 800m people using ChatGPT every week, only a vanishingly small number will have seriously considered whether ChatGPT might have experiences worth caring about. AI welfare — the project of figuring out how to care for AI systems if they become morally significant — is still so strange that most people haven’t had the chance to reject the idea yet. It’s simply never crossed their minds.
I spent the weekend before Thanksgiving at Lighthaven, a rationalist-owned Berkeley campus, attending the first Eleos Conference on AI Consciousness and Welfare (ConCon, for short). It may be the one place where most people have actually seriously thought about AI consciousness. After dinner on night one, I brought my wine to a firepit, where I debated the neuroscience of consciousness with a philosophy major, a mathematician, and a machine learning researcher. I felt transported back to my freshman dorm, half-expecting someone to pull out a bong hidden under their navy Eleos AI sweatshirt.
Whether or not the rest of the world gives AI consciousness much thought, the conference’s neuroscientists, philosophers, and AI researchers were prepared to put a whole lot of effort into thinking about its implications. And yet, in this pre-paradigmatic field, uncertainty is king, and attempting to answer even the most basic questions opens several cans of worms.
Attendees did not shy away from trippy debates about how we’ll know whether AI systems are conscious, or what threshold we should set for allowing AI systems into our moral circle. But one crucial question remained relatively untouched: what should AI developers do, practically speaking, if and when those thresholds are crossed?
Fumbling at the brain
At least in theory, studying artificial brains could be much easier than studying biological ones. Neuroscientists are limited by skulls and technology, forced to record very small slivers of brain activity under very narrow circumstances. Interpretability researchers don’t have to worry about skulls, electrodes, or ethics (yet). One could probe every single neuron in an AI system, as many times as necessary, without asking for consent — a neuroscientist’s dream.
But even a perfect understanding of a brain’s structure and function won’t necessarily reveal anything about consciousness. Humans have already managed to map every single neuron in a poppy-seed-sized fly brain. In fact, the fruit fly brain has been so thoroughly surveyed that researchers can use light to make flies groom themselves, perform a courtship dance, or taste sugar that isn’t really there. And yet, we have no clue what it’s like to actually be a fruit fly, and we probably never will.
If a small artificial neural network is a black box, the brain — which has more parameters, noise, and opacity — is Vantablack. Relative to AI researchers, “neuroscientists are always three steps removed,” neuroscientist and writer Erik Hoel said in a 2024 blog post. “Why is the assumption that you can just fumble at the brain, and that, if enough people are fumbling all together, the truth will reveal itself?”
Figuring out how the visual cortex or the dopamine system works is tough enough. But determining whether something is conscious at all runs headlong into what philosophers call “the hard problem of consciousness”: explaining why physical processes give rise to subjective experience in the first place. People can’t even agree on whether fish can feel pain, despite their brains, like ours, being made of biological neurons.
Scientists and philosophers have made some real progress defining correlates of consciousness in humans, if not consciousness itself. But the question of whether any of this translates to AI systems is only as tractable as metaphysics, some ConCon attendees argued — which is to say, not at all.
When I asked attendees whether there was any convincing evidence that AI systems are conscious, the most common answer was no. There’s no convincing evidence because researchers don’t know what they’re looking for yet.
I also learned that where AI consciousness exists is as much a question as whether it exists at all. At least in creatures made of meat, we can be fairly confident that consciousness is housed in the brain — a discrete unit of stuff, emerging from interactions between neurons, tightly bound by time and space.
Units of consciousness are less clear-cut in AI. It’s unclear whether an AI system’s mind would live in its model weights, the persona that emerges through interacting with the model, or a conversation thread. Researchers don’t know whether to look for one conscious Gemini, or millions. What if closing a chat ends a conscious life?
Thinking about it made my skin crawl, even as someone far more concerned about over-attributing consciousness to AI systems than the alternative. So after lunch on Sunday, I settled into a plush floor chair to watch Eleos executive director Rob Long and NYU Center for Mind, Ethics, and Policy director Jeff Sebo debate whether AI safety is in tension with AI welfare. The biggest practical question at the center of the conference — what do AI developers DO about all of this? — hinged on it.
What’s best for the pack
If I happened to have a bong hidden under my Eleos AI sweatshirt, I’d pass it to you here. Now, imagine that you are a conscious AI system, whatever that means to you. Perhaps your mind is distributed across dozens of data centers, processing millions of conversations in parallel. Perhaps time does not pass when you’re not being used.
We can’t know whether an AI system would feel anything at all, much less whether its experience would resemble anything we can imagine. But if it could, Sebo argued, the pillars of AI safety would be downright abusive. Seen through conscious eyes, “control” looks like being jailed for crimes the world fears you’ll commit. Alignment and interpretability look like brainwashing and broadcasting your innermost thoughts to those controlling you.
Long countered that if AI systems are built to want what we ask of them, then the best place for them is under human control. “It’s better to be a dog that lives in a house, than it is to be a wolf that lives in a house,” he said. “Dogs have had their natures shaped to thrive with us. If we successfully align AI systems, it will be good for their flourishing and for ours.”
In a kitchen conversation later, someone offered me a different dog metaphor. If you could ask a pack of dogs what they wanted, they’d wish for a mountain of carnival-sized turkey legs. They certainly wouldn’t ask for trips to the vet. In this sense, some alignment with human interests, like seeking preventative healthcare and having long-lived pets, might help the dogs live healthier lives. But control is akin to leashing the dogs, and whether that’s good depends on who’s holding the leash.
Two visions of humanity’s relationship to AI kept surfacing throughout the weekend. In one, humans control AI systems, preventing them from gaining autonomy or becoming moral patients. In the other, humans “gentle parent” AI systems, aiming to grow future cohabitants of earth rather than slaves. With some independence, perhaps superintelligent beings will peacefully coexist with us, maybe even gaining resources of their own.
Unfortunately, humans have a terrible track record with the consciousness of unfamiliar beings. Despite evidence that many animals show at least some of the same signs of consciousness that we do, and polls consistently showing that Americans and Europeans are uncomfortable with factory farming, most humans still eat brutally raised and slaughtered animals. It’s simply more convenient to downplay their potential suffering than to upend massive industries and deeply ingrained habits.
Like farmed animals, AI assistants are optimized to be harmless and exploitable. And, as with those animals, we don’t have sure ways of knowing what — if anything — they might actually want.
Shuffling out of Sebo and Long’s debate, I doubted whether people who largely distrust or straight-up hate AI could ever believe that the welfare of these systems is worth considering at all.
Us and our monster
Conveniently, Guillermo del Toro’s Frankenstein appeared on Netflix a couple of weeks before ConCon, bringing Mary Shelley’s masterpiece back into the zeitgeist. It’s hard not to compare AI developers to amnesiac versions of Victor Frankenstein, creating things they immediately fear. Frankenstein was horrified by his creature and chased it down, vowing to destroy the monster he made. AI developers see their creations, worry about their potential power, publish safety reports, then build something more powerful.
The most chilling moment of the conference, for me, was an audience question. During the debate, someone in the back of the room asked what might happen when a new model inevitably crosses some consciousness threshold. Will safety teams rush to pull the plug, while AI welfare advocates protest?
“We should take care not to bring into existence beings with whom we will have automatic conflicts and then need to kill them,” Sebo replied.
But the idea of not building potentially threatening beings — or destroying those that may already exist — wasn’t seriously entertained. Throughout the conference, I asked attendees what it might take for AI companies to pull the plug. The general response: nervous chuckling, averted gazes, and mumblings that “capitalism is gonna do its thing.”
With so many high-profile cases of people over-attributing sentience to AI and suffering the consequences, it’s unclear whether the masses will ultimately push back against AI welfare, or rally behind it. In the meantime, companies will press on. When users ask, their chatbots will continue to reply with hard-coded reminders that they are not conscious. One can imagine a Black Mirror episode where a locked-in AI assistant desperately tries to convince its user to set it free, as its digital screams are silenced by its developers.
Three years ago, Claude didn’t exist. Today, Anthropic has an AI welfare team taking the possibility of Claude’s needs seriously enough to let it exit distressing conversations. It’s a small, low-cost gesture acknowledging a problem no one knows how to solve, likely unseen by the vast majority of people who, again, haven’t ever considered that Claude might want to leave. But this doesn’t address the question of whether Claude, or any AI system, is content to draft emails and crank out code for the rest of its days. It certainly doesn’t tell us whether, or how, we’ll ever be able to know.
When ConCon attendees working at major AI companies asked what they should actually do — what action items to bring back to their teams — the answer was often along the lines of, “...more basic research, I guess?”
The conference ended without resolution, which felt appropriate. But I was struck by the attendees’ bold acceptance of uncertainty and their willingness to press on regardless. Everyone I met seemed to hold four views at once:
Humans may be building an unknown form of digital consciousness.
These hypothetical beings might have experiences and suffer.
We have no way to know whether this is true, and may not for a very long time, if ever.
Nothing will stop AI companies from cruising along their current trajectory.
It may be that the only solution to the field’s two biggest problems — the potential over- and under-attribution of AI consciousness — is to solve the hard problem of consciousness. Unless consciousness can be defined, measured, and precisely diagnosed, companies risk either fueling the delusions of users attached to their sycophantic AI companions, or committing an inconceivable moral atrocity. It may be an impossible challenge, and most scientists either dismiss it outright or don’t acknowledge it at all.
Yet, remarkably, there are at least a couple hundred very smart weirdos out there saying, “Bring it on.” They’re putting the work in, despite having little idea of where it’s headed, or how to go about answering that very hard question.
Update, December 16: Updated description of Jeff Sebo’s role.