Wednesday, July 1, 2026

Claude Science Enters the Lab

Today’s Overview

Good morning. AI is moving straight into the research stack: Anthropic has a new science workbench for biomedical teams, Meta is turning non-invasive brain signals into text with a big accuracy jump, and tabular AI models are getting a tougher reality check. Let’s dive in.

Top Stories

Claude Science brings AI into the research workbench

Anthropic launched Claude Science, a beta AI workbench for Pro, Max, Team, and Enterprise users on macOS and Linux. The app pulls fragmented scientific workflows into one environment, with native rendering for 3D protein structures, genome browser tracks, and chemical structures.

  • The workbench includes 60-plus curated skills and connectors for genomics, single-cell analysis, proteomics, structural biology, cheminformatics, and related domains.
  • Claude Science can run where labs already work, including local machines, remote servers over SSH, or HPC login nodes, rather than forcing teams to move sensitive data into a new cloud workflow.
  • Alongside the beta, Anthropic is offering up to 50 AI for Science projects and credits to selected research efforts, initially focused on biology and biomedical work.

Anthropic’s Fable 5 gets cleared to return

The US Department of Commerce lifted export controls on Anthropic’s top-tier systems, clearing a path for Claude Fable 5 and Mythos 5 to return after a multi-week ban tied to jailbreak concerns. Anthropic may also introduce strict KYC identity checks for Fable 5 access as part of its regulatory response.

  • The decision covers Anthropic’s top-tier systems, so this is more than a routine product availability update.
  • The reported restriction centered on jailbreak concerns, which makes access controls and monitoring part of the product story.
  • Anthropic’s related product messaging says Fable 5 returns globally July 1 alongside a proposed industry framework for scoring jailbreak severity.

Anthropic starts an AI drug discovery push

Anthropic is starting an internal drug discovery program to build AI tools for drugmakers. The effort focuses on neglected diseases that traditional biopharmaceutical companies would not consider attractive targets, though the company has not made clear what it will do if it identifies promising candidates.

  • The program’s target area is neglected diseases where commercial incentives often make conventional drug development harder to justify.
  • The work expands Anthropic’s science effort from tooling into internal discovery activity while still centering on tools for drugmakers.
  • A key unanswered question is who would own and develop a candidate if Anthropic finds a promising drug lead through the program.

Research & Analysis

Meta open-sources a brain-to-text decoder

Meta released Brain2Qwerty v2, an open-source brain-to-text decoder trained on 22,000 sentences from 9 volunteers wearing MEG devices while typing. The system reached 61% average word accuracy, 78% for its best participant, and moved beyond character decoding toward whole words and meaning.

  • Meta describes v2 as non-invasive and real-time, which positions it against brain-computer interfaces that require surgical implants.
  • The system improves sharply on the 8% word accuracy reported for other non-invasive methods in Meta’s comparison.
  • The release includes full training code for v1 and v2, while BCBL is releasing the v1 dataset to support outside neuroscience research.
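Word-accuracy figures like the 61% above are commonly computed as one minus the word error rate (WER): the word-level edit distance between the decoded sentence and the reference, divided by the reference length. The announcement does not spell out Meta’s exact metric, so this is a minimal sketch assuming the WER-based definition; `word_edit_distance` and `word_accuracy` are illustrative names, not part of the Brain2Qwerty release.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions, insertions, and deletions to turn hyp into ref."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j] (previous DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word missing from hypothesis
                          curr[j - 1] + 1,     # extra word in hypothesis
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - WER, floored at 0 so long garbage outputs score 0."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    wer = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer)


# One wrong word out of six gives 5/6 word accuracy under this definition.
print(word_accuracy("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, whole-word decoding is rewarded only when entire words come out right, which is why moving beyond character decoding matters for the headline number.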

Frontier LLMs take on open math problems

Researchers tested whether frontier LLMs can do frontier theory research, not just routine mathematical tasks. They used GPT-5.5 Pro as a solver and Claude Opus 4.7 as a verifier in a prover-verifier workflow, then stress-tested the setup on open problems across areas with different levels of model familiarity.

  • The workflow separated proving from verification so one frontier model generated candidate solutions while another checked them.
  • The benchmark included open theory problems rather than only closed-form exercises or classroom-style math tasks.
  • The headline result is that the system resolved a list of open questions, suggesting a possible role for LLMs in active mathematical research.
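The prover-verifier split can be pictured as a simple loop: one model proposes, a second model checks, and any objection is fed back for another attempt. This is a minimal sketch of that control flow only. The `Prover` and `Verifier` callables, the `max_rounds` retry budget, and the feedback-passing step are assumptions for illustration; the researchers’ actual prompts, model calls, and stopping rules are not described here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool
    feedback: str


# Hypothetical interfaces: in the reported setup a frontier model fills each role,
# but plain callables let the control flow run on its own.
Prover = Callable[[str, str], str]        # (problem, feedback) -> candidate proof
Verifier = Callable[[str, str], Verdict]  # (problem, candidate) -> verdict


def prove_with_verification(problem: str, prover: Prover, verifier: Verifier,
                            max_rounds: int = 3) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None after max_rounds."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = prover(problem, feedback)
        verdict = verifier(problem, candidate)
        if verdict.accepted:
            return candidate
        # Pass the verifier's objection back to the prover for the next attempt.
        feedback = verdict.feedback
    return None


# Toy stand-ins: the first attempt is rejected, the revised one is accepted.
def toy_prover(problem: str, feedback: str) -> str:
    return "proof v2" if feedback else "proof v1"


def toy_verifier(problem: str, candidate: str) -> Verdict:
    ok = candidate == "proof v2"
    return Verdict(ok, "" if ok else "step 3 is unjustified")


print(prove_with_verification("toy lemma", toy_prover, toy_verifier))
```

Keeping the roles in separate functions (and, in the study, separate models) means a candidate is never accepted on the prover’s own say-so.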

Tabular foundation models face a harder benchmark

TabArena researchers introduced BeyondArena, a unified benchmark for tabular data, plus Data Foundry for curating datasets. Across 11 models and 142 curated datasets, the paper finds that tabular foundation models perform best on tiny- to medium-sized IID data, while traditional methods still lead on harder non-IID, large, and high-dimensional cases.

  • BeyondArena expands evaluation beyond standard IID settings into temporal and grouped tasks that better reflect non-IID real-world prediction problems.
  • The benchmark spans datasets from 100 to 1 million rows and from 3 to 22,000 features.
  • It also tests hard feature regimes such as free text and high-cardinality categoricals, which are often excluded from simpler tabular benchmarks.
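The IID versus non-IID distinction largely comes down to how rows are split between training and testing. The sketch below contrasts a random row-level split with grouped and temporal splits of the kind the benchmark describes. The function names and signatures are illustrative assumptions, not BeyondArena’s or Data Foundry’s API, and the benchmark’s actual split protocols may differ.

```python
import random
from typing import Sequence


def iid_split(n: int, test_frac: float, seed: int = 0) -> tuple[list[int], list[int]]:
    """Random row-level split: assumes rows are exchangeable (IID)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[: n - n_test], idx[n - n_test:]


def grouped_split(groups: Sequence[str], test_groups: set[str]) -> tuple[list[int], list[int]]:
    """Hold out whole groups (e.g. patients or stores) so none leak into training."""
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def temporal_split(timestamps: Sequence[float], cutoff: float) -> tuple[list[int], list[int]]:
    """Train on the past, test on the future: rows at or after cutoff are test-only."""
    train = [i for i, t in enumerate(timestamps) if t < cutoff]
    test = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train, test


groups = ["a", "a", "b", "b", "c", "c"]
times = [1, 2, 3, 4, 5, 6]
print(grouped_split(groups, {"c"}))
print(temporal_split(times, 5))
```

A model that scores well on `iid_split` can still stumble on the other two, because unseen groups and future periods are exactly the shifts a random split hides.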

BlockPilot adapts speculative decoding per input

BlockPilot tackles diffusion-based speculative decoding by predicting the optimal block size for each sample from the prefilling representation. The method frames block size selection as a lightweight policy learning problem and reports an acceptance length of 5.92 with a 4.20x speedup on Qwen3-4B at temperature 1.

  • The paper argues fixed block sizes are suboptimal because optimal values vary across samples and materially affect speculative decoding performance.
  • Its decision space is simplified by local structure around the training block size, which keeps the policy lightweight.
  • The prediction is made once after prefilling, so the method can be integrated without repeated selection overhead during generation.
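To make “lightweight policy” concrete, here is a sketch assuming the policy is a linear scorer over a fixed-size prefill representation that picks from a few block sizes near the training block size. `TRAIN_BLOCK_SIZE`, the candidate offsets, the linear head, and `choose_block_size` are all hypothetical; the paper’s actual policy architecture and candidate set are not described here.

```python
from typing import Sequence

# Hypothetical: the block size the diffusion drafter was trained with.
TRAIN_BLOCK_SIZE = 16
# Hypothetical local candidate set around the training block size.
CANDIDATES = [TRAIN_BLOCK_SIZE + d for d in (-8, -4, 0, 4, 8)]


def choose_block_size(prefill_repr: Sequence[float],
                      weights: Sequence[Sequence[float]],
                      biases: Sequence[float]) -> int:
    """Pick one block size per sample from its prefill representation.

    A linear head scores each candidate and the highest score wins. This runs once,
    after prefilling, and the chosen size is reused for the whole generation.
    """
    if len(weights) != len(CANDIDATES) or len(biases) != len(CANDIDATES):
        raise ValueError("need one weight row and one bias per candidate")
    if any(len(row) != len(prefill_repr) for row in weights):
        raise ValueError("each weight row must match the representation size")
    scores = [sum(w * x for w, x in zip(row, prefill_repr)) + b
              for row, b in zip(weights, biases)]
    best = max(range(len(CANDIDATES)), key=scores.__getitem__)
    return CANDIDATES[best]


# Toy usage with made-up numbers (not trained weights).
prefill_repr = [0.5, -1.0, 2.0]
weights = [[0, 0, 0], [0, 0, 0], [0.1, 0.1, 0.1], [1, 0, 1], [0, 1, 0]]
biases = [0.0] * 5
print(choose_block_size(prefill_repr, weights, biases))
```

Because the choice is made once per sample, its cost is one small dot product per candidate, amortized over every decoding step that follows.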

Trending AI Tools

  • Cursor for iOS: A public beta control center for launching, steering, and reviewing AI coding agents from a phone.

  • DeepSpec: DeepSeek’s open-source decoding improvement, which claims up to 400% higher LLM throughput.

  • Hosted X MCP: A hosted MCP server for connecting AI tools to the X API for reading and taking actions.

Quick Hits

  • Meta limits Claude and Codex over reported concerns about AI model distillation and tighter competitive defenses.

  • OpenAI fixes ChatGPT search crashes after tracing Rockset pipeline failures to faulty Azure hardware and an old GNU libunwind race condition.

  • Claude Science beta connects Claude models to HPC clusters and 60 scientific databases for biomedical data pipelines.

  • Claude Sonnet 5 adds a lower-cost Sonnet model with stronger agentic performance across planning, tool use, coding, and knowledge work.

  • Arena reaches $100M, showing the commercial value of AI benchmarking and evaluation infrastructure.
