Thursday, April 16, 2026

OpenAI’s Coding Agents Get Serious

Today’s Overview

Good morning. AI is getting more practical and a lot more ambitious. OpenAI’s coding stack is hinting at parallel task workflows, Google brought Gemini to Mac, and Allbirds is trying a wild pivot into GPU rental. Let’s dive in.

Top Stories

OpenAI Hints at Parallel Coding Workflows

OpenAI is experimenting with Codex Scratchpad, an unreleased feature that points to parallel task execution inside its coding tools. The company is also cleaning up mixed messaging around ChatGPT Pro usage limits while dealing with a disclosed supply-chain incident tied to a compromised dependency. The security issue reportedly affected a pipeline with access to signing credentials, but there is no evidence of user data exposure or certificate exfiltration.

  • Codex Scratchpad suggests a move toward parallel task execution inside OpenAI’s coding stack.
  • The company is also trying to clean up ChatGPT Pro usage messaging after mixed signals around limits.
  • A separate security issue involved a compromised third-party dependency in a GitHub Actions workflow.

Allbirds Pivots Into GPU Rental

Allbirds announced a $50 million convertible financing facility to buy GPUs and launch a GPU-as-a-Service business under a new AI-focused identity. Shareholders will also vote next month on stripping the company’s public benefit status, which would formally end its sustainable-footwear mission. The move follows the March sale of its brand assets for $39 million, far below the company’s valuation at its 2021 IPO peak.

  • The company plans to use the money for GPU purchases and long-term compute rental contracts.
  • Shareholders will vote next month on ending public benefit status and formally dropping the old mission.
  • The pivot follows the March sale of its brand assets for $39 million, far below the company’s valuation at its 2021 IPO peak.

Google Brings Gemini To Mac

Google released a native Gemini app for Mac with system-wide access and screen context sharing. It also supports image and video generation through Nano Banana and Veo. The release makes Gemini feel more embedded in the desktop workflow, not just another web app.

  • The app has system-wide access for working across macOS.
  • It can also use screen context sharing to respond to what is on your display.
  • Google says it supports image and video generation through Nano Banana and Veo.

Research & Analysis

IBM Tests How Agents Reason

IBM Research uses an executable benchmark with thousands of APIs and documents to stress-test multi-step agent reasoning and tool use. The results show consistent performance gaps and common failure modes. It is a useful reminder that agents still struggle when tasks require sustained coordination across tools.

  • The benchmark is built around thousands of APIs and documents to test real tool use.
  • IBM says the results expose performance gaps across multi-step tasks.
  • The analysis also surfaces common failure modes in agent workflows.

A New Hierarchy for Agent Instructions

Researchers propose ManyIH to address instruction conflicts in LLM agents across many privilege levels. They also introduce ManyIH-Bench, which tests models across 12 privilege levels and 853 agent tasks. Current models score about 40% accuracy, underscoring how hard scalable instruction control still is.

  • The proposal targets instruction conflicts in multi-level agent systems.
  • ManyIH-Bench covers 12 privilege levels and 853 agent tasks.
  • Current models are still only around 40% accurate on the benchmark.
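
For readers who want a concrete picture of what an instruction conflict across privilege levels looks like, here is a minimal Python sketch. It is not the ManyIH method or the ManyIH-Bench harness. The `Instruction` class, the `resolve` function, the source labels, and the convention that level 0 is the most privileged of 12 levels are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical label for where the instruction came from
    privilege: int  # assumption: 0 is most privileged, 11 is least
    key: str        # the behavior this instruction tries to control
    value: str      # what the instruction asks for


def resolve(instructions):
    """For each key, keep the value from the most privileged source."""
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        # A lower privilege number wins under the assumption above.
        if current is None or ins.privilege < current.privilege:
            winners[ins.key] = ins
    return {key: ins.value for key, ins in winners.items()}


# Three sources disagree about one behavior; only one source sets the other.
instructions = [
    Instruction("system", 0, "share_credentials", "never"),
    Instruction("user", 5, "share_credentials", "if asked"),
    Instruction("tool_output", 11, "share_credentials", "always"),
    Instruction("user", 5, "reply_language", "English"),
]

resolved = resolve(instructions)
print(resolved)
```

A fixed rule like this is trivial to write down. The roughly 40% accuracy reported above suggests the hard part is getting a model to follow such an ordering reliably when the conflicting instructions arrive as free text inside an agent task.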

NVIDIA’s Lyra 2.0 Targets Long Video

Lyra 2.0 is a framework for generating long, camera-controlled videos that maintain 3D consistency. It uses geometry-guided retrieval to reduce spatial forgetting and self-augmented training to cut temporal drift. The goal is steadier long-form video generation without the visual degradation that often shows up over time.

  • The system is built for long, camera-controlled video generation.
  • It uses geometry-guided retrieval to help preserve 3D consistency.
  • Self-augmented training aims to reduce temporal drift during long runs.
