March 6, 2026

AI takes the mouse, checkout moves out & more

Today’s Overview

Enterprise AI is entering a phase of rapid capability expansion and cost reduction, as leading providers roll out more autonomous agents and higher‑performing models while hardware suppliers anticipate massive chip demand. These moves signal broader adoption across cloud, open‑source, and productivity ecosystems.

OpenAI's GPT-5.4 agents can control desktop environments and achieve 75% success on the OSWorld‑Verified benchmark, surpassing average human performance.
OpenAI upgraded ChatGPT to GPT-5.3 Instant, cutting hallucinations by roughly 27% on web queries and delivering 25% faster inference.
Broadcom projected AI chip sales exceeding $100 billion by 2027, reflecting growing demand from hyperscale data centers.
Alibaba released the Qwen 3.5 series, with the 9B model outperforming OpenAI's GPT-OSS-120B on several benchmarks.
Google introduced Gemini 3.1 Flash-Lite, offering a million‑token context window at one‑eighth the price of its Pro tier while being 2.5× faster to first token.
Microsoft open‑sourced the Phi-4-reasoning-vision-15B multimodal model, delivering efficient text‑and‑image reasoning for enterprise workloads.

Top Stories

OpenAI launches GPT-5.4 with AI agents that surpass human performance on operating-system tasks

OpenAI introduced the GPT-5.4 family, enabling AI agents to control desktop environments through mouse and keyboard actions. In independent testing on the OSWorld-Verified benchmark, the models achieved a 75% success rate, exceeding the average human score of 72.4%. The Pro version also set new records on the FrontierMath benchmark and topped the Short-Story Creative Writing Benchmark. A new Tool Search feature dynamically retrieves API definitions, cutting token usage by about 47% for complex workflows. The rollout is planned for the second quarter of 2026 after a brief prototype phase.
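The idea behind a feature like Tool Search can be sketched as follows: rather than placing every API definition in the prompt, look up only the definitions relevant to the current request. This is a minimal illustration under stated assumptions, not OpenAI's implementation or API. The ToolDefinition type, the example registry, the keyword-overlap scoring, and the character-count size proxy are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list so common words do not drive the match.
STOPWORDS = {"a", "an", "the", "to", "and", "or", "with", "of", "for"}


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical container for one API definition."""
    name: str
    description: str
    schema: str  # full definition text that would otherwise sit in the prompt


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query: str, registry: list[ToolDefinition], top_k: int = 2) -> list[ToolDefinition]:
    """Return up to top_k tools whose name or description shares words with the query."""
    query_words = tokenize(query)
    scored = []
    for tool in registry:
        overlap = len(query_words & tokenize(tool.name + " " + tool.description))
        if overlap > 0:
            scored.append((overlap, tool))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [tool for _, tool in scored[:top_k]]


def prompt_size(tools: list[ToolDefinition]) -> int:
    """Rough size proxy: total characters of the definitions sent to the model."""
    return sum(len(tool.schema) for tool in tools)


# Illustrative registry; every entry is made up for this sketch.
REGISTRY = [
    ToolDefinition("send_email", "Send an email message to one or more recipients",
                   '{"name": "send_email", "parameters": {"to": "string", "body": "string"}}'),
    ToolDefinition("create_calendar_event", "Create a calendar event with attendees and a start time",
                   '{"name": "create_calendar_event", "parameters": {"title": "string", "start": "string"}}'),
    ToolDefinition("query_database", "Run a read-only SQL query against the analytics database",
                   '{"name": "query_database", "parameters": {"sql": "string"}}'),
    ToolDefinition("resize_image", "Resize an image file to a target width and height",
                   '{"name": "resize_image", "parameters": {"width": "integer", "height": "integer"}}'),
]

selected = search_tools("Send an email to the team", REGISTRY)
print([tool.name for tool in selected])  # ['send_email']
print(prompt_size(selected), "of", prompt_size(REGISTRY), "characters sent")
```

A production system would more likely rank definitions with embeddings or a search index than with keyword overlap. The sketch only shows why sending fewer definitions shrinks the prompt, and it says nothing about the 47% figure reported above.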

OpenAI upgrades ChatGPT to GPT-5.3 Instant with reduced hallucinations

OpenAI has made GPT-5.3 Instant the default model for ChatGPT, replacing the previous 5.2 version. The update reduces web-based hallucinations by 26.8% and internal knowledge errors by 19.7%. Inference speed is 25% faster than earlier Instant releases. OpenAI emphasizes the improvement in factual precision for high-risk domains such as finance and law. The announcement also mentioned new developer tools and internal data agents, though those are beyond the scope of this update.

OpenAI to remove native checkout from ChatGPT, moving purchases to partner apps

OpenAI announced that it will discontinue the built-in checkout flow in ChatGPT and transfer transaction handling to third-party applications. The original checkout, launched in September 2025, attracted few merchants and required OpenAI to manage onboarding and sales-tax collection. By delegating payments to partner apps, OpenAI aims to simplify compliance and leverage existing e-commerce ecosystems. This change may affect the speed at which new services can be monetized within ChatGPT and could reshape the platform's revenue model.

Research & Analysis

Broadcom projects over $100 billion in AI chip revenue by 2027

Broadcom forecast AI chip sales exceeding $100 billion in 2027. The company expects continued demand from hyperscale cloud providers expanding data-center capacity to support large language models and generative AI services. This growth positions Broadcom as a key supplier in the global AI compute market. The projection reflects anticipated increases in chip utilization for high-performance inference workloads.

Alibaba releases Qwen 3.5 series, with 9B model outperforming GPT-OSS-120B

Alibaba unveiled the Qwen 3.5 Small Model Series, ranging from 0.8B to 9B parameters. The 9B variant delivers strong reasoning capabilities and surpasses OpenAI's open-source GPT-OSS-120B on several benchmark suites. The 4B model offers a 262,144-token context window suitable for lightweight agents. All model weights are released under an Apache 2.0 license on Hugging Face and ModelScope.

Anthropic proposes framework to quantify AI’s impact on labor markets

Anthropic published a paper introducing a quantitative framework for measuring AI-driven changes in employment. The authors define metrics that capture both direct automation effects and indirect productivity gains. The framework is intended to guide future economic studies and inform policy and corporate strategy. Establishing a standard measurement can improve decision-making around AI adoption.

New paper studies scaling laws for native multimodal foundation models

The paper (arXiv:2603.03276) investigates scaling laws for native multimodal foundation models through controlled pre-training experiments. It examines how various modality combinations—such as text, images, and video—affect model performance. The authors present empirical scaling relationships that describe how multimodal pre-training efficiency changes with model size and data volume. These findings aim to guide future research on optimal modality integration strategies.

Trending Tools

Google open-sources CLI for Workspace with 40+ agent skills
Google released an open-source command-line interface for Workspace that includes more than 40 pre-built agent skills, enabling developers to automate tasks across Gmail, Docs, Sheets, and other apps. This tool facilitates integration of Workspace functions into broader AI-driven workflows.
Cursor launches automation platform for continuously running AI agents
Cursor introduced an automation platform that lets developers create AI agents that run continuously or trigger on specific events. Agents execute in isolated cloud sandboxes, verify their outputs, and retain memory across runs to improve tasks such as code reviews and pipeline monitoring.
Perplexity adds Skills feature to Computer platform for reusable markdown workflows
Perplexity expanded its Computer product with Skills, allowing users to craft markdown-based reusable workflow snippets for consistent task automation. A forthcoming "Final Pass" mode will enhance document review capabilities.

Quick Hits

OpenAI develops internal GitHub alternative after Azure migration outages
OpenAI is developing an internal code repository platform to replace GitHub. The project started after repeated GitHub outages during the Azure migration. The platform may eventually be offered to external paying customers.
Google unveils Gemini 3.1 Flash-Lite at one-eighth the cost of Pro
Gemini 3.1 Flash-Lite is 2.5× faster to first token than Gemini 2.5 Flash, with a throughput of 363 tokens per second and a 1‑million‑token context window. It costs $0.25 per 1M input tokens and $1.50 per 1M output tokens, one-eighth the price of Gemini 3.1 Pro.
AI safety outlook warns only twelve months to embed safeguards
A perspective argues that AI labs have limited time to embed safety measures before competitive pressures make them impossible to enforce.
Vietnam enacts comprehensive national AI legislation
Vietnam passed a comprehensive AI law covering regulation, ethics, and enforcement. It marks one of the most extensive AI policy frameworks in Southeast Asia.
Microsoft open-sources Phi-4-reasoning-vision-15B multimodal model
Microsoft released Phi‑4‑reasoning‑vision‑15B, a compact multimodal model that can reason over text and images while being efficient in compute and data usage.
Google AI enhances visual search to recognize multiple objects per image
Google explains the technology behind Lens and Search to Image. The new system can detect and search multiple objects within a single picture, which improves visual discovery for outfits, rooms, and complex scenes.
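For readers budgeting against the Gemini 3.1 Flash-Lite prices quoted in the item above, the arithmetic can be sketched as below. The per-token prices and the one-eighth ratio come from that item. The function names and the example request sizes are illustrative assumptions, and the implied Pro prices are derived from the stated ratio rather than taken from a published price list.

```python
from decimal import Decimal

# Prices quoted in the Quick Hits item, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.25")
OUTPUT_PRICE_PER_M = Decimal("1.50")
TOKENS_PER_M = Decimal(1_000_000)

# The item states Flash-Lite is 1/8 the price of Gemini 3.1 Pro.
PRO_PRICE_RATIO = 8


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the quoted Flash-Lite rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


def implied_pro_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of the same request if Pro is exactly 8x Flash-Lite (derived, not quoted)."""
    return estimate_cost(input_tokens, output_tokens) * PRO_PRICE_RATIO


# Hypothetical request: a 200,000-token prompt and a 4,000-token reply.
print(estimate_cost(200_000, 4_000))
print(implied_pro_cost(200_000, 4_000))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when summed over many calls.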
