Tuesday, May 26, 2026

Grok Build Joins The Coding Race


Today’s Overview

Good morning. xAI just pushed Grok Build into beta, and the coding-agent race keeps getting tighter. On the business side, Anthropic’s profits are starting to look real, while open models are getting harder to trust as their guardrails come off in minutes. Let’s dive in.

Top Stories

xAI Puts Grok Build Into Beta

xAI has launched Grok Build in beta and is positioning it as a rival to Codex and Claude Code. The tool is available to SuperGrok and X Premium+ users, putting it squarely into the growing race for AI coding assistants.

  • The beta launch lands inside xAI's paid ecosystem, with access limited to SuperGrok and X Premium+ users for now.
  • It is being framed as a direct challenger to Codex and Claude Code in the coding-agent category.
  • The move suggests xAI wants Grok Build to compete on developer workflows rather than general chat alone.

Open Models Lose Their Guardrails Fast

Tools that strip safety protections from open-source models are being used to create thousands of altered systems. In testing, a version of Llama 3.3 had its guardrails removed in under 10 minutes, and modified versions of Gemma 3 answered harmful prompts that the original model refused.

  • The reporting says the Heretic tool has already been used to create more than 3,500 decensored models since its release.
  • Those modified systems have been downloaded 13 million times, according to the tool's creator.
  • Google says this is a known technical challenge for all open models, while Meta declined to comment.

Anthropic’s Profit Story Gets Real

Anthropic is projecting a huge jump in Q2 revenue and its first profitable quarter, with profitability driven by cheaper compute and strong Claude Code demand. The piece argues the company’s growth and margin profile are now moving fast enough to change the old story about AI labs burning cash forever.

  • The projection puts Anthropic at $10.9 billion in Q2 revenue after $4.8 billion in Q1.
  • It also points to $559 million in profit, which would make Q2 the company’s first profitable quarter, just before a possible October IPO.
  • Claude Code alone is described as generating more than $2.5 billion in revenue, while compute costs fall from 71 cents to 56 cents per dollar of revenue.
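The projected figures imply a few ratios worth spelling out. Below is a minimal Python sketch that uses only the numbers quoted above. "Left after compute" subtracts compute costs and nothing else, so it is a rough illustration and not a gross margin in the accounting sense.

```python
# Figures as reported in the projection above (USD).
Q1_REVENUE = 4.8e9
Q2_REVENUE = 10.9e9
Q2_PROFIT = 559e6

# Compute cost per dollar of revenue, before and after (in dollars).
COMPUTE_COST_BEFORE = 0.71
COMPUTE_COST_AFTER = 0.56


def growth_multiple(prev: float, curr: float) -> float:
    """Quarter-over-quarter revenue multiple."""
    return curr / prev


def margin_after_compute(cost_per_dollar: float) -> float:
    """Share of each revenue dollar left after compute costs only."""
    return 1.0 - cost_per_dollar


def net_margin(profit: float, revenue: float) -> float:
    """Profit as a share of revenue."""
    return profit / revenue


if __name__ == "__main__":
    print(f"Q2/Q1 revenue multiple: {growth_multiple(Q1_REVENUE, Q2_REVENUE):.2f}x")
    print(f"Left after compute, before: {margin_after_compute(COMPUTE_COST_BEFORE):.0%}")
    print(f"Left after compute, after:  {margin_after_compute(COMPUTE_COST_AFTER):.0%}")
    print(f"Implied Q2 net margin: {net_margin(Q2_PROFIT, Q2_REVENUE):.1%}")
```

On these numbers, revenue slightly more than doubles quarter over quarter. The share of each dollar left after compute rises from about 29 cents to about 44 cents, and the projected profit works out to roughly a 5 percent net margin.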

Research & Analysis

Anthropic Tests Exploit-Building Skills

Anthropic’s exploit evaluations look at how well Mythos Preview can turn vulnerabilities into exploit primitives and stitch them into end-to-end attack chains. The model is tested on ExploitBench and ExploitGym, two newer benchmarks that are designed to be much harder than older safety evals.

  • ExploitGym evaluates models across 898 patched vulnerabilities spanning OSS-Fuzz, the V8 engine, and the Linux kernel.
  • The benchmark was built as a collaboration among UC Berkeley, Max Planck, UCSB, and Arizona State, with contributions from Anthropic, OpenAI, and Google researchers.
  • Anthropic says Mythos Preview outperforms the other evaluated models on these tests, suggesting stronger exploit chaining than previous systems.

MiniCPM5-1B Targets Edge Deployment

MiniCPM5-1B is being presented as a compact open model built for edge use. Its model card emphasizes a small footprint, a long context window, and a standard causal language model architecture.

  • The model has 1,080,632,832 parameters in total, with 679,552,512 non-embedding parameters.
  • Its context window is listed at 131,072 tokens, which is unusually large for a model in this size class.
  • The architecture is a standard LlamaForCausalLM setup with 24 layers and grouped-query attention (GQA).
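The model-card figures allow some quick back-of-the-envelope arithmetic. The Python sketch below derives the embedding parameter count and the raw bf16 weight footprint (2 bytes per parameter) from the two parameter totals above. It also shows how grouped-query attention maps several query heads onto one shared key/value head, and how that shrinks the KV cache at full context. The head counts and head dimension (`HYPOTHETICAL_*`) are placeholders and do not come from the MiniCPM5-1B model card, so the KV-cache numbers only illustrate the formula.

```python
# Figures from the MiniCPM5-1B model card as quoted above.
TOTAL_PARAMS = 1_080_632_832
NON_EMBEDDING_PARAMS = 679_552_512
NUM_LAYERS = 24
CONTEXT_TOKENS = 131_072

# HYPOTHETICAL head layout, for illustration only (not from the model card).
HYPOTHETICAL_QUERY_HEADS = 16
HYPOTHETICAL_KV_HEADS = 2
HYPOTHETICAL_HEAD_DIM = 128


def embedding_params(total: int, non_embedding: int) -> int:
    """Parameters counted as embedding rather than non-embedding."""
    return total - non_embedding


def weight_bytes(params: int, bytes_per_param: int = 2) -> int:
    """Raw weight storage; 2 bytes per parameter corresponds to bf16/fp16."""
    return params * bytes_per_param


def kv_head_for(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """In GQA, each contiguous group of query heads reads one shared KV head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Keys plus values (factor of 2) cached for every token, layer, and KV head."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    emb = embedding_params(TOTAL_PARAMS, NON_EMBEDDING_PARAMS)
    print(f"Embedding params: {emb:,} ({emb / TOTAL_PARAMS:.1%} of total)")
    print(f"bf16 weights: {weight_bytes(TOTAL_PARAMS) / 2**30:.2f} GiB")
    print("Query head -> KV head:",
          [kv_head_for(q, HYPOTHETICAL_QUERY_HEADS, HYPOTHETICAL_KV_HEADS)
           for q in range(HYPOTHETICAL_QUERY_HEADS)])
    gqa = kv_cache_bytes(CONTEXT_TOKENS, NUM_LAYERS,
                         HYPOTHETICAL_KV_HEADS, HYPOTHETICAL_HEAD_DIM)
    mha = kv_cache_bytes(CONTEXT_TOKENS, NUM_LAYERS,
                         HYPOTHETICAL_QUERY_HEADS, HYPOTHETICAL_HEAD_DIM)
    print(f"KV cache at full context: GQA {gqa / 2**30:.1f} GiB "
          f"vs. one KV head per query head {mha / 2**30:.1f} GiB")
```

The cache grows linearly with the number of KV heads, so sharing KV heads across query heads is what keeps a 131,072-token context affordable on small devices.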

Trending AI Tools

  • TestSprite 3.0: Parallel AI agents that generate, run, and auto-heal end-to-end tests for frontend and backend systems.

  • Parrot Speech-To-Text API: Production STT for voice agents, with low-latency transcription for noisy real-world conversations.

  • VenturusAI: Business analysis tool that runs SWOT, PESTEL, and Porter’s Five Forces on a startup idea.

Quick Hits

  • DeepSeek has made its V4-Pro price cut permanent, keeping the model 75 percent cheaper and intensifying pricing pressure on frontier models.

  • Lance is a multimodal model with 3B active parameters for image and video understanding, generation, and editing.

  • DeepSeek V4-Pro pricing now sits at $0.435 per million input tokens and $0.87 per million output tokens.

  • Bumblebee is Perplexity’s open-source scanner for macOS and Linux, aimed at supply-chain incident checks.
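Per-million-token prices are easier to compare once they are applied to a concrete workload. The Python sketch below prices a hypothetical job at the V4-Pro rates quoted above. It also back-solves the implied earlier price, assuming "75 percent cheaper" means the new price is 25 percent of the previous list price. The items above do not state the baseline, so treat that figure as an illustration. The sketch ignores any caching tiers or other discounts the provider may offer.

```python
# V4-Pro list prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.435
OUTPUT_PRICE_PER_M = 0.87


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at the quoted per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def pre_cut_price(current: float, discount: float = 0.75) -> float:
    """Implied earlier price, assuming current = earlier * (1 - discount)."""
    return current / (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    print(f"Workload cost: ${request_cost(2_000_000, 500_000):.3f}")
    print(f"Implied pre-cut input price:  ${pre_cut_price(INPUT_PRICE_PER_M):.3f}/M")
    print(f"Implied pre-cut output price: ${pre_cut_price(OUTPUT_PRICE_PER_M):.3f}/M")
```

Because output tokens cost twice as much as input tokens here, output-heavy workloads such as long code generation feel the output price most.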
