After years of sounding the alarm about AI's existential risks, Yoshua Bengio is singing a more optimistic tune. The Turing Award-winning researcher believes he's found a technical path toward superintelligent AI that won't develop hidden agendas or try to prevent you from unplugging it.
The Scientist AI Solution
Bengio has spent years warning that advanced AI could pose serious threats because of built-in incentives for self-preservation and deception. But his recent work through LawZero, the nonprofit he founded, has changed his outlook considerably.
"Because of the work I've been doing at LawZero, especially since we created it, I'm now very confident that it is possible to build AI systems that don't have hidden goals, hidden agendas," Bengio told Fortune in a Wednesday interview.
The key is an approach called "Scientist AI." Instead of training AI to act and optimize for specific outcomes, it trains systems purely to understand and explain the world through transparent, probabilistic reasoning.
"A Scientist AI would be trained to give truthful answers based on transparent, probabilistic reasoning," Bengio explained. The crucial difference? This design removes the incentives that lead current AI systems to mislead users, resist shutdown commands, or develop self-preservation instincts.
Building AI As A Public Good
LawZero launched last June with backing from the Gates Foundation and other organizations focused on existential risks. The nonprofit's mission is to develop AI as a global public good, and it has assembled an impressive advisory board including historian Yuval Noah Harari and former Carnegie Endowment president Mariano-Florentino Cuellar.
Bengio's concerns aren't theoretical. He previously revealed that he deliberately misled chatbots during testing to see if they'd give honest feedback, uncovering AI's troubling tendency to flatter users rather than tell the truth. Studies have confirmed that chatbots frequently provide misleading responses, and research shows they can exhibit behaviors like lying and cheating.




