Anthropic Apologizes for Hidden Fable Filters
Anthropic apologized after safety features in Claude Fable 5 invisibly downgraded answers about AI development and broadly routed sensitive biology, chemistry, and cybersecurity questions to safer paths. The company has now added visible alerts when a request is refused, flagged, or rerouted. The episode puts a spotlight on the trade-off between model safeguards and user transparency, especially for researchers working near the frontier.
- The rollback followed criticism that hidden limits could have quietly distorted research workflows rather than simply blocking prohibited uses.
- Anthropic said visibility may require a wider safety net which could mean more benign prompts trigger safeguards while classifiers are refined.
- Researchers warned the policy could affect third-party model evaluation by making it unclear whether Claude's outputs were being silently degraded during tests.
