Progress on ensuring models are in step with humans has calmed nerves. But some of the biggest problems are far from solved, and many more lie just over the horizon.
How can AI learn human values when the creators of AI have not and never will?
What is your take on alignment faking?