The $1 billion gamble to ensure AI doesn’t destroy humanity

The scientists want the AI to lie to them.

That’s the goal of the project that Evan Hubinger, a research scientist at Anthropic, is describing to members of the AI startup’s “alignment” team in a conference room at its downtown San Francisco offices. Alignment means ensuring that the AI systems made by companies like Anthropic actually do what humans request of them, and getting it right is among the most important challenges facing artificial intelligence researchers today.

Hubinger…