Nvidia targets agentic inference with Blackwell Ultra
Nvidia says Blackwell Ultra delivers up to 50x higher performance and 35x lower costs for agentic inference, the workload behind long-lived AI agents with persistent context memory.
Nvidia introduced the Blackwell Ultra platform this week, stating it can deliver up to 50x higher performance and 35x lower costs for agentic inference compared with prior generations. The company positioned the system to run long-lived AI agents that retain persistent context memory.
Agentic inference systems ingest data from multiple sources, reason through chains of logic, and act on conclusions while preserving context over time. These workloads shift emphasis from single-query GPU throughput to a balance of compute power, memory capacity, memory bandwidth and low-latency access to stored context.
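Nvidia has not published code for this pattern; as a rough illustration of why context persistence matters, the following is a minimal Python sketch of the ingest-reason-act loop described above. All names here (AgentContext, run_agent) and the toy data sources are hypothetical, not Nvidia's software.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentContext:
    """Persistent context the agent accumulates across steps (hypothetical)."""
    history: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.history.append(fact)

def run_agent(
    sources: list[Callable[[], str]],
    reason: Callable[[list[str]], str],
    act: Callable[[str], None],
    ctx: AgentContext,
    steps: int = 3,
) -> None:
    """One ingest-reason-act cycle per step; context survives between steps."""
    for _ in range(steps):
        # Ingest: pull fresh data from every source into persistent context.
        for source in sources:
            ctx.remember(source())
        # Reason: draw a conclusion over the whole accumulated context,
        # not just the current step's inputs.
        conclusion = reason(ctx.history)
        ctx.remember(conclusion)
        # Act: execute on the conclusion; nothing is discarded afterward.
        act(conclusion)

if __name__ == "__main__":
    # Toy stand-ins for real data feeds, a model call, and an actuator.
    ctx = AgentContext()
    run_agent(
        sources=[lambda: "sensor reading", lambda: "log entry"],
        reason=lambda history: f"conclusion over {len(history)} items",
        act=print,
        ctx=ctx,
        steps=2,
    )
```

Because ctx.history only grows, each reasoning step reads more stored context than the last, which is the access pattern that rewards memory capacity and bandwidth over raw single-query throughput.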
Nvidia described Blackwell Ultra as a hardware and software stack co-designed so chips, memory and storage work together for multi-step autonomous tasks. The company used the phrase “the agentic chapter is different” in its announcement.
VAST Data announced an inference architecture optimized for Nvidia’s platform to provide persistent context storage. Cloud providers have been adapting services to run agentic workloads on Nvidia hardware; DigitalOcean expanded infrastructure in partnership with Workato to support enterprise agentic inference.
For cloud operators, the requirements for agentic systems can affect procurement and data center design. Enterprises building autonomous agents may request integrated inference stacks that keep context readily available and reduce latency between compute and memory layers.
Nvidia cited the 50x performance and 35x cost figures as improvements over prior generations specifically for agentic inference, and said Blackwell Ultra focuses on persistent context memory and integrated compute stacks for production deployments of autonomous models.