Monday, June 29, 2026

Open Coding Models Learn The Scaffold

Today’s Overview

Good morning. Open models are getting more ambitious: DeepReinforce is pushing coding agents that can build their own RL scaffolds, while Anthropic and OpenAI are both threading powerful cyber-capable models through limited trusted-access rollouts. There is also fresh research on robot world simulators and hash-based language models. Let's dive in.

Top Stories

Anthropic Reopens Mythos 5 for Trusted U.S. Orgs

Anthropic has partially restored access to Mythos 5, its strongest cybersecurity model, after the company’s two most powerful models were pulled offline under a U.S. government directive. Access is back for roughly 100 trusted U.S. organizations, while general API users and international customers remain limited to older Claude models.

The restored access is focused on cybersecurity and critical infrastructure rather than broad commercial availability.
Project Glasswing members cleared for access include Apple, Google, Cisco, Nvidia, and Microsoft, among other trusted U.S. organizations.
Anthropic is still negotiating broader availability while Fable 5 remains offline for general use.

OpenAI Previews GPT-5.6 Sol, Terra, and Luna

OpenAI introduced GPT-5.6 as a three-model family designed around different speed, cost, and capability tradeoffs. Sol is the flagship for complex coding, security research, and long multi-step tasks; Terra targets GPT-5.5-level workloads at lower cost; and Luna is positioned for high-volume summarization and routine automation.

The new family adds a max reasoning-effort setting for Sol and an ultra mode that uses subagents for complex work.
OpenAI says Sol set a new state of the art on Terminal-Bench 2.1 for command-line workflows requiring planning, iteration, and tool coordination.
The rollout starts through the API and Codex for select trusted partners, with broader access through ChatGPT, Codex, and the API to follow.

DeepReinforce Open-Sources Ornith-1.0 Coding Models

DeepReinforce released Ornith-1.0, an open-source family of self-improving coding models built for agentic coding. The models are trained on Gemma 4 and Qwen 3.5 foundations, with weights and a technical report available on Hugging Face for teams that want to run or study them directly.

The release ranges from a 9B dense model for edge deployment to a 397B MoE model for frontier-scale work.
Its RL loop lets the model generate both solution rollouts and task scaffolds instead of relying only on human-designed harnesses.
DeepReinforce describes a three-layer anti-gaming setup with a trust boundary, deterministic monitor, and frozen LLM judge to reduce reward-hacking risks.

Research & Analysis

PhysisForcing Makes Robot World Models More Physical

PhysisForcing is a training framework for embodied world simulation that targets physically implausible robot manipulation videos. It focuses supervision on physics-informative regions through pixel-level trajectory alignment and semantic-level relational alignment, improving both video generation quality and closed-loop robot performance.

The method supervises DiT features using reference point trajectories for pixel-level alignment.
Its semantic loss aligns features with inter-region relations extracted from a frozen video understanding encoder.
Experiments span R-Bench, PAI-Bench, and EZS-Bench to test improvements over strong embodied video generation baselines.

MultiHashFormer Shrinks Token Embeddings With Hash Signatures

MultiHashFormer proposes a hash-based autoregressive language modeling framework that replaces conventional token embeddings with unique hash signatures. A Hash Encoder compresses each signature into a latent vector for the Transformer decoder, while a Hash Decoder predicts the next token’s signature before mapping it back to text.

The paper frames standard embedding matrices as a scaling problem because they grow linearly with vocabulary size.
Its core fix is to avoid many-to-one collisions by using multi-ID hash signatures for each token.
The authors evaluate the approach at 100M, 1B, and 3B parameters and report gains over standard Transformer language models.
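
To make the collision argument concrete, here is a minimal Python sketch of the multi-ID idea. It is not the paper's implementation: the seeded SHA-256 hashing, the bucket count, the toy vocabulary, and the helper names `hash_signature` and `count_collisions` are all assumptions chosen for illustration. It only shows why several small hash IDs per token can separate tokens that a single hash ID cannot.

```python
import hashlib


def hash_signature(token: str, num_hashes: int = 3, num_buckets: int = 64) -> tuple:
    """Return one bucket ID per seeded hash function (hypothetical scheme)."""
    ids = []
    for seed in range(num_hashes):
        # Salting the token with the seed gives a different hash per slot.
        digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
        ids.append(int.from_bytes(digest[:8], "big") % num_buckets)
    return tuple(ids)


def count_collisions(tokens, num_hashes: int, num_buckets: int) -> int:
    """Count tokens whose signature duplicates that of another token."""
    signatures = {hash_signature(t, num_hashes, num_buckets) for t in tokens}
    return len(tokens) - len(signatures)


vocab = [f"tok{i}" for i in range(1000)]  # toy vocabulary, not from the paper

single_id = count_collisions(vocab, num_hashes=1, num_buckets=64)
multi_id = count_collisions(vocab, num_hashes=3, num_buckets=64)

# One ID per token can name at most 64 tokens, so at least 936 of 1000 collide.
# Three IDs allow 64**3 = 262,144 signatures while using only 3 * 64 = 192 bucket IDs.
print(single_id, multi_id)
```

In this toy setup the signature space grows exponentially with the number of IDs while the number of distinct bucket IDs grows only linearly, which is the intuition behind swapping a vocabulary-sized embedding table for something much smaller. A few chance collisions can still occur with random hashing, so how the paper guarantees fully unique signatures is left to the original work.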

Trending AI Tools

Codex app: Adds faster scrolling and smarter navigation in a minor app update.
Codebase-memory: Indexes the Linux kernel in 3 minutes and is described as cutting agent tool calls by 2x.
Qwen-AgentWorld: An Apache 2.0 open-weight world model for simulating agent environments across seven domains.

Quick Hits

Grok 4.5 private beta is reportedly running at SpaceX and Tesla, with Elon Musk claiming performance comparable to Claude Opus.
Nous Mixture of Agents lets multiple models answer in parallel before an aggregator produces a final response, with internal benchmarks claiming gains over Claude Opus 4.8 and GPT-5.5.
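
For readers who want to picture the parallel-then-aggregate pattern, here is a rough Python sketch. It is not Nous's code: the `mixture_of_agents` function, the stub models, and the majority-vote aggregator are all hypothetical stand-ins. A real setup would presumably call actual LLM endpoints and use a model as the aggregator rather than a vote.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Model = Callable[[str], str]


def mixture_of_agents(prompt: str, proposers: List[Model],
                      aggregator: Callable[[str, List[str]], str]) -> str:
    """Query every proposer in parallel, then let the aggregator pick the answer."""
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        # pool.map returns drafts in the same order as the proposers.
        drafts = list(pool.map(lambda model: model(prompt), proposers))
    return aggregator(prompt, drafts)


def majority_vote(prompt: str, drafts: List[str]) -> str:
    """Stand-in aggregator: return the most common draft."""
    return Counter(drafts).most_common(1)[0][0]


# Stub "models" that return canned answers; real ones would call an LLM.
stub_models = [lambda p: "4", lambda p: "4", lambda p: "5"]
answer = mixture_of_agents("What is 2 + 2?", stub_models, majority_vote)
print(answer)  # prints 4
```

The thread pool is a natural fit here because each proposer call would mostly be waiting on a network response, so the drafts arrive in roughly the time of the slowest model rather than the sum of all of them.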
