Monday, June 29, 2026

Open Coding Models Learn The Scaffold

Today’s Overview

Good morning. Open models are getting more ambitious: DeepReinforce is pushing coding agents that can build their own RL scaffolds, while Anthropic and OpenAI are both threading powerful cyber-capable models through limited trusted-access rollouts. There is also fresh research on robot world simulators and hash-based language models. Let's dive in.

Top Stories

Anthropic Reopens Mythos 5 for Trusted U.S. Orgs

Anthropic has partially restored access to Mythos 5, its strongest cybersecurity model, after the company’s two most powerful models were pulled offline under a U.S. government directive. Access is back for roughly 100 trusted U.S. organizations, while general API users and international customers remain limited to older Claude models.

  • The restored access is focused on cybersecurity and critical infrastructure rather than broad commercial availability.
  • Project Glasswing members cleared for access include Apple, Google, Cisco, Nvidia, and Microsoft, among other trusted U.S. organizations.
  • Anthropic is still negotiating broader availability while Fable 5 remains offline for general use.

OpenAI Previews GPT-5.6 Sol, Terra, and Luna

OpenAI introduced GPT-5.6 as a three-model family designed around different speed, cost, and capability tradeoffs. Sol is the flagship for complex coding, security research, and long multi-step tasks, Terra targets GPT-5.5-level workloads at lower cost, and Luna is positioned for high-volume summarization and routine automation.

  • The new family adds a max reasoning effort for Sol and an ultra mode that uses subagents for complex work.
  • OpenAI says Sol set a new state of the art on Terminal-Bench 2.1 for command-line workflows requiring planning, iteration, and tool coordination.
  • The rollout starts with select trusted partners through the API and Codex, with broader access through ChatGPT, Codex, and the API to follow.

DeepReinforce Open-Sources Ornith-1.0 Coding Models

DeepReinforce released Ornith-1.0, an open-source family of self-improving coding models built for agentic coding. The models are trained on Gemma 4 and Qwen 3.5 foundations, with weights and a technical report available on Hugging Face for teams that want to run or study them directly.

  • The release spans from a 9B dense model for edge deployment to a 397B MoE model for frontier-scale work.
  • Its RL loop lets the model generate both solution rollouts and task scaffolds instead of relying only on human-designed harnesses.
  • DeepReinforce describes a three-layer anti-gaming setup with a trust boundary, deterministic monitor, and frozen LLM judge to reduce reward-hacking risks.

Research & Analysis

PhysisForcing Makes Robot World Models More Physical

PhysisForcing is a training framework for embodied world simulation that targets physically implausible robot manipulation videos. It focuses supervision on physics-informative regions through pixel-level trajectory alignment and semantic-level relational alignment, improving both video generation quality and closed-loop robot performance.

  • The method supervises DiT features using reference point trajectories for pixel-level alignment.
  • Its semantic loss aligns features with inter-region relations extracted from a frozen video understanding encoder.
  • Experiments span R-Bench, PAI-Bench, and EZS-Bench to test improvements over strong embodied video generation baselines.

MultiHashFormer Shrinks Token Embeddings With Hash Signatures

MultiHashFormer proposes a hash-based autoregressive language modeling framework that replaces conventional token embeddings with unique hash signatures. A Hash Encoder compresses each signature into a latent vector for the Transformer decoder, while a Hash Decoder predicts the next token’s signature before mapping it back to text.

  • The paper frames standard embedding matrices as a scaling problem because they grow linearly with vocabulary size.
  • Its core fix is to avoid many-to-one collisions by using multi-ID hash signatures for each token.
  • The authors evaluate the approach at 100M, 1B, and 3B parameters and report gains over standard Transformer language models.
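To make the collision point concrete, here is a minimal sketch of multi-ID hash signatures. It is not the paper's implementation: the function names, the BLAKE2 hash, the salt-based rehashing on collision, and the parameter-count comparison (which assumes one small embedding table per hash position) are all illustrative assumptions.

```python
import hashlib


def _bucket(token_id: int, seed: int, salt: int, num_buckets: int) -> int:
    """Hash one token into one bucket for a given hash position (seed)."""
    data = f"{seed}:{salt}:{token_id}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def assign_signatures(vocab_size: int, num_hashes: int, num_buckets: int):
    """Give every token a distinct tuple of `num_hashes` bucket IDs.

    Hypothetical scheme: if a signature is already taken, bump a salt and
    rehash, so no two tokens end up sharing a full signature.
    """
    if num_buckets ** num_hashes < vocab_size:
        # Pigeonhole: too few possible signatures, collisions are unavoidable.
        raise ValueError("signature space too small for unique signatures")
    used = set()
    signatures = []
    for token_id in range(vocab_size):
        salt = 0
        while True:
            sig = tuple(
                _bucket(token_id, seed, salt, num_buckets)
                for seed in range(num_hashes)
            )
            if sig not in used:
                break
            salt += 1  # collision: try a different salt for this token
        used.add(sig)
        signatures.append(sig)
    return signatures


def embedding_params(vocab_size: int, dim: int) -> int:
    """Standard embedding matrix: grows linearly with vocabulary size."""
    return vocab_size * dim


def hashed_params(num_hashes: int, num_buckets: int, dim: int) -> int:
    """Assumed layout: one (num_buckets x dim) table per hash position."""
    return num_hashes * num_buckets * dim
```

With a 5,000-token vocabulary, three hash positions, and 256 buckets each, every token gets its own signature while the assumed tables hold 3 × 256 × 64 = 49,152 parameters versus 5,000 × 64 = 320,000 for a standard embedding matrix at the same width. A single hash with 256 buckets could not separate 5,000 tokens at all, which is the many-to-one problem the bullet above refers to.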

Trending AI Tools

  • Codex app: Adds faster scrolling and smarter navigation in a minor app update.

  • Codebase-memory: Indexes the Linux kernel in 3 minutes and is described as cutting agent tool calls by 2x.

  • Qwen-AgentWorld: An Apache 2.0 open-weight world model for simulating agent environments across seven domains.

Quick Hits

  • Grok 4.5 private beta is reportedly running at SpaceX and Tesla, with Elon Musk claiming performance comparable to Claude Opus.

  • Nous Mixture of Agents lets multiple models answer in parallel before an aggregator produces a final response, with internal benchmarks claiming gains over Claude Opus 4.8 and GPT-5.5.
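The parallel-then-aggregate pattern in the last item can be sketched in a few lines. This is a generic illustration, not Nous's code: `mixture_of_agents` and `majority_aggregator` are hypothetical names, the proposers are plain Python callables standing in for model calls, and a majority vote stands in for the aggregator, which in a real system would more likely be another model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mixture_of_agents(prompt, proposers, aggregator):
    """Run every proposer on the prompt in parallel, then aggregate the drafts."""
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        # pool.map preserves the order of `proposers` in the returned drafts.
        drafts = list(pool.map(lambda model: model(prompt), proposers))
    return aggregator(prompt, drafts)


def majority_aggregator(prompt, drafts):
    """Stand-in aggregator: return the most common draft answer."""
    return Counter(drafts).most_common(1)[0][0]
```

Called with three stub proposers where two answer "4" and one answers "5", the majority aggregator returns "4". Swapping in real model clients only requires callables that take a prompt and return a string.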
