Friday, June 12, 2026

Claude's Hidden Filter Backlash

Today’s Overview

Good morning. AI safety is getting messy in public: Anthropic is apologizing for hidden Claude safeguards, Bezos is pitching AI that helps build real-world machines, and Visa wants ChatGPT agents to start checking out for you. The big thread is control: who gets it, who sees it, and who trusts it. Let's dive in.

Top Stories

Anthropic Apologizes for Hidden Fable Filters

Anthropic apologized after safety features in Claude Fable 5 invisibly downgraded answers about AI development and broadly routed sensitive biology, chemistry, and cybersecurity questions to safer paths. The company has now added visible alerts when a request is refused, flagged, or rerouted. The episode puts a spotlight on the trade-off between model safeguards and user transparency, especially for researchers working near the frontier.

  • The rollback followed criticism that hidden limits could have quietly distorted research workflows rather than simply blocking prohibited uses.
  • Anthropic said visibility may require a wider safety net, which could mean more benign prompts trigger safeguards while classifiers are refined.
  • Researchers warned the policy could affect third-party model evaluation by making it unclear whether Claude's outputs were being silently degraded during tests.

Bezos Pitches an AI General Engineer

Jeff Bezos shared more details on Prometheus, his AI startup, while announcing a $12 billion round at a $41 billion valuation. The company is targeting an artificial general engineer for complex physical machines, aiming to speed up the loop from idea to product. Bezos framed the effort as a productivity accelerator that expands opportunities rather than replacing jobs.

  • Prometheus is positioned around industrial AI rather than the chatbot and software focus that dominates much of the current market.
  • The company is co-led by Vik Bajaj, a physicist and chemist who helped create Alphabet's life-sciences arm, Verily.
  • The pitch centers on compressing engineering iteration cycles for products such as jet engines, medical devices, and consumer electronics.

Visa Brings Payments to ChatGPT Shopping

Visa partnered with OpenAI to let ChatGPT agents buy products for users at Visa-enabled merchants. The deal gives OpenAI a broader commerce path after retiring Instant Checkout, which was limited and error-prone. It also pushes agentic AI into a high-trust layer of the economy: payments.

  • Visa says the system can support spending limits and approvals so users can constrain what agents are allowed to buy.
  • The collaboration is designed to let users link Visa cards to ChatGPT and make agent-initiated transactions easier for merchants to accept.
  • Visa will provide authorization and fraud monitoring while OpenAI supplies the agent technology for shopping decisions and purchase initiation.

Research & Analysis

Xiaomi Open Sources MiMo Code

Xiaomi released MiMo Code V0.1.0, an open-source, terminal-native AI coding assistant built for long-running software projects. The system reportedly outperforms Claude Code on long-horizon, multi-step coding tasks. Its key differentiator is cross-session memory, which uses a separate subagent to track decisions, issues, and project scope over time.

  • MiMo Code is aimed at 200-plus step tasks where coding agents often lose context across long sessions.
  • Reported benchmark claims include 62% on SWE-Bench Pro and 73% on Terminal-Bench 2.
  • The release follows a broader Xiaomi MiMo push around agentic reasoning and coding, with open-source licensing intended to lower adoption friction.

Recursive Tests Automated AI Research

Recursive says its automated AI research system reached state-of-the-art results across fixed-budget language model training, small-model training speed, and GPU kernel optimization. The work points toward research loops where AI systems search for better training recipes, faster implementations, and hardware-aware optimizations. The company frames the results as early steps toward automating larger parts of frontier research.

  • On NanoChat Autoresearch, Recursive reported 0.9109 validation BPB versus a previous state of the art of 0.9372.
  • On NanoGPT Speedrun, the system reached the target loss in 77.5 seconds compared with 79.7 seconds for the previous best.
  • For GPU kernels, the system evaluated 235 kernel-writing tasks and improved the mean SOL-ExecBench score to 0.754.

Goodfire Previews Predictive Data Debugging

Goodfire introduced predictive data debugging, a technique for analyzing preference datasets before training to surface likely model behaviors. Integrated into Silico, the approach is meant to help engineers reshape datasets or training plans before unwanted effects appear in deployment. The case studies focus on safety guardrails, hallucinated links, and context-specific sycophancy.

  • One case study found DPO caused models to produce more authoritative-looking links on sensitive queries, but manual checks showed the URLs were almost always hallucinated.
  • Another case surfaced physics-specific sycophancy that broad evaluations initially missed because it appeared only in niche prompt contexts.
  • Goodfire says some fixes may require targeted data rewriting or additional intervention methods rather than simple preference tuning.

Trending AI Tools

  • SkillSpector: NVIDIA's tool for scanning AI agent skills for security vulnerabilities before installation.

  • Voibe: Private, offline dictation software for Mac.

  • LocIn AI: Tone-aware AI localization with automated workflows.

Quick Hits

  • AI at the World Cup brings automated offside calls, tracking data, team analytics, and fan-facing AI into the 2026 tournament.

  • Lionsgate takes stake in Runway as the studio deepens its AI video partnership and plans new IP plus short-form projects.

  • River AI launches with former xAI co-founder Igor Babuschkin, aiming to build personalized agents that adapt to user style and goals.

  • OpenAI to acquire Ona to support secure cloud environments for Codex agents running inside OpenAI's own cloud.

  • OpenAI weighs token price cuts in a potential move to compete more aggressively with Anthropic and spark a model pricing fight.
