Thursday, May 7, 2026

⚡ Anthropic’s Huge Compute Surprise

Today’s Overview

Good morning. Claude just got a bigger runway, OpenAI is pushing deeper into hardware, and the research stack keeps moving fast beneath both of them. There’s also a fresh batch of tools and benchmarks aimed at making agents more capable and easier to build. Let’s dive in.

Top Stories

Anthropic Raises Claude Limits With SpaceX Compute

Anthropic is lifting Claude usage limits after striking a compute partnership with SpaceX that gives it access to more than 220,000 NVIDIA GPUs. The company says the expansion supports wider enterprise use and international growth, including in regulated industries.

  • Anthropic is tapping more than 220,000 NVIDIA GPUs through its new SpaceX compute partnership.
  • The extra capacity is meant to support higher Claude usage limits for more customers.
  • Anthropic also says the move fits its push toward enterprise and international expansion including regulated markets.

OpenAI Accelerates Its First AI Phone

Ming-Chi Kuo says OpenAI is targeting mass production of its first AI phone in the first half of 2027, earlier than previously reported. He says the device will lean on image sensing and dual AI processors, with MediaTek as the sole chip supplier.

  • Kuo says OpenAI now wants mass production in the first half of 2027 instead of a later timeline.
  • He says the phone’s standout hardware will be an enhanced image signal processor built to improve visual sensing for agents.
  • Kuo also says MediaTek will be the sole chip supplier and the device will use two AI processors.

Codex Jumps Ahead In The Agent Race

OpenAI’s Codex is being framed as a stronger option than Claude Code after its GPT-5.5 integration and app performance gains. The shift is less about a single launch and more about how the product now fits into real workflows, from strategy docs to recruiting.

  • Codex is now being described as ahead of Claude Code after the GPT-5.5 integration.
  • The gains are tied to better app performance rather than one standalone feature.
  • Users are applying it to tasks like strategy documents and recruiting based on career trajectories.

Research & Analysis

ProgramBench Measures Agents On Rebuilding Software

ProgramBench is a benchmark that asks agents to recreate software executables without source code, using only documentation and experimentation. It covers 200 tasks and more than 248,000 behavioral tests, spanning everything from terminal utilities to compilers and libraries.

  • The benchmark asks agents to rebuild executables using only documentation and experimentation with no source code.
  • It includes 200 tasks across a wide range of software.
  • The suite contains more than 248,000 behavioral tests to measure how well agents hold up.
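
For context, a “behavioral test” in this setting typically runs the agent-rebuilt binary and a reference binary on the same input and compares their observable behavior. The sketch below illustrates that idea only; the binary paths, arguments, and comparison rules are placeholders, not the published ProgramBench harness.

```python
# Hypothetical sketch of a single behavioral test in this style: run the
# agent-rebuilt executable and a reference on the same input and compare
# observable behavior (exit code and stdout). Paths and inputs are
# illustrative assumptions.
import subprocess

def behavioral_test(rebuilt_bin: str, reference_bin: str,
                    args: list[str], stdin_data: bytes) -> bool:
    """Return True if the rebuilt binary matches the reference on this case."""
    def run(binary: str) -> subprocess.CompletedProcess:
        return subprocess.run(
            [binary, *args],
            input=stdin_data,
            capture_output=True,
            timeout=30,
        )
    got, want = run(rebuilt_bin), run(reference_bin)
    return got.returncode == want.returncode and got.stdout == want.stdout

# Example usage (placeholder paths):
# ok = behavioral_test("./rebuilt/wc", "/usr/bin/wc", ["-l"], b"a\nb\nc\n")
```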

Tuna-2 Pushes Multimodal Benchmarks Forward

Tuna-2 is outperforming Tuna-R and Tuna across a range of multimodal benchmarks by using pixel embeddings. Meta plans to release only a foundation checkpoint, with some layers removed from the LLM backbone and diffusion head.

  • Tuna-2 is beating earlier versions by using pixel embeddings across multimodal benchmarks.
  • Meta says it will release only a foundation checkpoint instead of the full production-trained weights.
  • The release keeps most components intact, though some layers will be removed from the backbone and diffusion head.
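
Tuna-2’s pixel-embedding design hasn’t been detailed publicly. The sketch below shows one common, ViT-style way to turn raw pixels into embedding tokens via a linear patch projection, purely as an illustration of the general technique, not Tuna-2’s actual architecture.

```python
# Minimal sketch of a linear patch projection (ViT-style pixel embeddings).
# Patch size and embedding dimension are illustrative assumptions.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 768):
        super().__init__()
        # A strided conv is equivalent to flattening non-overlapping patches
        # and applying a shared linear projection to each one.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) -> (batch, num_patches, dim)
        return self.proj(x).flatten(2).transpose(1, 2)

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))  # -> (1, 196, 768)
```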

TokenSpeed Targets Faster Agent Inference

TokenSpeed is a high-performance LLM inference engine built for agentic workloads. It claims faster throughput than TensorRT-LLM for coding agents and adds compiler-backed scheduling plus Blackwell-focused optimizations.

  • TokenSpeed is designed for agentic workloads with a focus on low-latency inference.
  • It claims faster throughput than TensorRT-LLM for coding agents.
  • The system adds compiler-backed scheduling along with Blackwell-focused optimizations.
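
Throughput claims like this usually come down to tokens generated per wall-clock second under an agentic workload. The snippet below is an engine-agnostic sketch of that measurement; the generate callable is a placeholder, not part of TokenSpeed’s or TensorRT-LLM’s API.

```python
# Engine-agnostic throughput sketch: `generate(prompt)` is a placeholder that
# returns the number of tokens produced for that prompt by whatever engine is
# under test.
import time

def tokens_per_second(generate, prompts: list[str]) -> float:
    start = time.perf_counter()
    total_tokens = sum(generate(p) for p in prompts)
    return total_tokens / (time.perf_counter() - start)
```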

vLLM V1 Tightens Correctness For RL

The vLLM V1 update fixed several inference discrepancies, including logprob computation, runtime defaults, inflight weight updates, and final projection precision. The goal was to preserve expected RL performance without needing objective-side corrections.

  • The update fixes discrepancies in logprob computation and runtime defaults that affected inference behavior.
  • It also aligns weight updates and final projection precision with vLLM V0 behavior.
  • The result is meant to preserve expected RL performance without extra objective-side corrections.
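
These fixes matter because RL pipelines typically assume the logprobs the inference engine reports match what the trainer’s forward pass would compute for the same tokens. The sketch below is a generic consistency check of that kind; the tolerance and the two logprob sources are illustrative assumptions, not code from the vLLM repository.

```python
# Generic trainer-vs-engine logprob consistency check for a sampled sequence.
# Tolerance is an assumption; tune it to the precision regime in use.
import math

def logprobs_consistent(trainer_lp: list[float], engine_lp: list[float],
                        atol: float = 1e-3) -> bool:
    """True if per-token logprobs from the trainer and the inference engine
    agree within tolerance for the same sampled tokens."""
    if len(trainer_lp) != len(engine_lp):
        return False
    return all(math.isclose(a, b, abs_tol=atol)
               for a, b in zip(trainer_lp, engine_lp))
```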

Trending AI Tools

Quick Hits
