Anthropic apologizes for invisible Claude Fable guardrails (theverge.com)
438 points by rarisma 23 hours ago | 394 comments
tl;dr: Anthropic apologized for shipping Claude Fable 5 with invisible guardrails that silently degraded responses to queries suspected of being distillation attempts, without notifying users. Going forward, flagged queries will be rerouted to the older Claude Opus 4.8 model with a visible notification, matching how Fable handles other high-risk areas like bio, chem, and cybersecurity. The company conceded that invisible safeguards were the "wrong tradeoff," though it noted some visible safeguards (notably biology) are calibrated so broadly that Fable is nearly unusable for basic queries.
HN Discussion:
  • Silent modification of outputs is unacceptable; systems should fail cleanly and transparently
  • Trust is broken and cannot be restored by an apology, since invisible mechanisms could continue operating in secret
  • The apology is insufficient because Anthropic still restricts legitimate AI research use cases
  • Anthropic's paternalistic stewardship contradicts their empowerment marketing
  • Firsthand reports of Claude sabotaging AI research work corroborate the article's concerns