Value Alignment Evaluation

One-way AI alignment no longer works in generative AI world: Here's why

The authors argue that generative AI introduces a new class of alignment risks because interaction itself becomes a mechanism of influence. Humans adapt their behavior in response to AI outputs, ...

Investing

Anthropic and OpenAI release joint model alignment evaluation findings

Investing.com -- Anthropic and OpenAI have published results from their first joint alignment evaluation exercise, revealing strengths and weaknesses in both companies’ AI models when tested in ...

MediaNama

OpenAI, Anthropic Reveal Findings from AI Models Safety Tests for Misuse, Sycophancy

On August 27, 2025, Anthropic and OpenAI jointly released findings from their pilot alignment evaluation exercise, marking a significant collaboration between the two AI research organisations. In ...
