Monday, April 27, 2026

OpenAI Rebuilds Codex Around GPT-5.5

OpenAI Rebuilds Codex Around GPT-5.5

Today’s Overview

Good morning, OpenAI is reshuffling its coding stack around GPT-5.5 while Google makes a huge bet on Anthropic. On the product side, there are fresh tools for clinics, workspaces, and image generation, plus a few useful signals on how teams are evaluating and scaling agentic systems. Let's dive in.

Top Stories

OpenAI folds Codex into GPT-5.5

OpenAI is folding Codex into GPT-5.5, which now serves as its main system for agentic coding and general reasoning. Microsoft is also rolling the model into GitHub Copilot, Microsoft 365, and Azure Foundry to support more complex workflows. OpenAI says the model is more economical overall, even with a higher API price, because it uses fewer tokens to finish tasks.

  • OpenAI says the model completes work with 40% fewer tokens even after a 20% API cost increase.
  • Developers are being told to rebuild prompts from scratch because older prompts can hold performance back.
  • GPT-5.5 also reclaimed the $11 spot on the AI index while posting mixed results on factual reliability and hallucinations.

Google may invest up to $40 billion in Anthropic

Google plans to invest between $10 billion and $40 billion in Anthropic, with the final amount tied to performance targets. The deal values Anthropic at $350 billion and is meant to help narrow the gap between AI compute demand and supply. Anthropic recently also received a major investment from Amazon.

Research & Analysis

How to monitor LLM behavior in production

Monitoring LLM behavior calls for an evaluation stack that separates deterministic checks from model-based evaluations. The approach pairs offline regression testing with human-reviewed golden datasets and online drift monitoring once systems are live. Production telemetry then feeds back into the pipeline so models can adapt as user behavior changes.

  • Use deterministic tests for syntax and routing integrity so basic failures are caught early.
  • Reserve model-based checks for semantic quality where judgment is less binary.
  • Close the loop with production telemetry to spot drift and keep performance aligned with real usage.

Meta's long-horizon coding agent trick

Meta describes a test-time scaling framework for coding agents that turns past rollouts into structured representations. By summarizing earlier work, the system can select and reuse prior steps more effectively. The result is improved benchmark performance.

Trending AI Tools

  • OpenAI Workspace Agents Codex-powered shared agents for Business and Enterprise plans, with Slack, Salesforce, Notion, and Google Drive support.

  • ChatGPT Images 2 OpenAI's new image model with 2K resolution, stronger text rendering, and web search during generation.

  • DeepSeek V4 China's model is expanding globally as competition with Western labs becomes more geopolitical.

  • Hy3 preview Tencent's first model from a rebuilt training stack, with competitive coding and search-agent scores.

  • Odyssey-2 Max A world model focused on improved physical accuracy.

Quick Hits

Keep reading for free

Enter your email. If you're already subscribed, we'll send a sign-in code. If not, you'll subscribe in the next step.

Free access. Subscribe once, then use the same email on future issues.

Free to read. Subscription just unlocks the full issue.