Open-source AI narrows gap; value shifts to inference

Open-weight models from Zhipu, Alibaba and others have closed performance gaps with proprietary systems. Investors now target inference, edge delivery and custom silicon.

Open-source and open-weight language models from Chinese and global developers have closed much of the performance gap with proprietary systems, and market participants are shifting their attention to the cost of running those models. Zhipu AI, Alibaba and several private labs released high-capability models in the first half of 2026 that match proprietary systems on many practical coding and reasoning benchmarks at a lower per-token cost.

Zhipu AI published GLM-5.2 on June 13, 2026 under an MIT license with a one-million-token context window. Moonshot AI released a trillion-parameter model focused on coding a week earlier. Alibaba’s Qwen family passed one billion downloads in January 2026 and accounted for a majority of open-model downloads on public model repositories in early 2026.

Two techniques have supported the rapid uptake of open weights. Distillation uses large “teacher” models to generate examples that smaller “student” models train on, transferring capability with less compute. Self-improving harnesses let models propose, solve and grade their own tasks, enabling iterative improvement without sending internal data to external providers. Companies also deploy multiple specialized sub-agents in parallel, splitting complex jobs into parts and merging the results for multi-step workflows.
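The distillation idea can be illustrated with a deliberately tiny sketch. Here the "teacher" is a stand-in function rather than a real language model, and the "student" is a two-parameter linear model fit by gradient descent on teacher-generated labels. All names and numbers are hypothetical and chosen only to show the data flow: the student never sees ground truth, only what the teacher produces.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model that we can only query."""
    return 3.0 * x + 1.0


def distill(n_examples: int = 200, epochs: int = 500,
            lr: float = 0.1, seed: int = 0) -> tuple[float, float]:
    """Fit a small student (w, b) to examples labeled by the teacher."""
    rng = random.Random(seed)
    # Step 1: the teacher generates the training set.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_examples)]
    ys = [teacher(x) for x in xs]

    # Step 2: the student trains on those examples (mean squared error).
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * 2.0 * grad_w / n_examples
        b -= lr * 2.0 * grad_b / n_examples
    return w, b


student_w, student_b = distill()
```

In practice the teacher's outputs would be text or token probabilities and the student a smaller neural network, but the pattern of querying the teacher and then training the student on the results is the same.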

In a June 14 essay, Microsoft CEO Satya Nadella framed company AI strategy as building “human capital and token capital.” He warned against “tokenmaxxing,” a practice he defined as routing every task through an expensive frontier model when a cheaper specialized model would suffice.
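One way to picture the alternative to "tokenmaxxing" is a router that sends each task to the cheapest model able to handle it. The sketch below is illustrative only: the model names, skill labels and per-token prices are invented for the example and do not describe any vendor's catalog or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_million_tokens: float  # hypothetical price in USD
    skills: frozenset


# Hypothetical catalog: two cheap specialists and one expensive generalist.
CATALOG = [
    Model("small-code-model", 0.20, frozenset({"code"})),
    Model("small-summarizer", 0.10, frozenset({"summarize"})),
    Model("frontier-model", 10.00,
          frozenset({"code", "summarize", "reasoning"})),
]


def route(task_type: str) -> Model:
    """Return the cheapest model in the catalog that covers the task type."""
    candidates = [m for m in CATALOG if task_type in m.skills]
    if not candidates:
        raise ValueError(f"no model handles task type {task_type!r}")
    return min(candidates, key=lambda m: m.cost_per_million_tokens)


print(route("code").name)       # a specialist is enough
print(route("reasoning").name)  # only the generalist qualifies
```

A real router would also weigh quality, latency and context length, but the cost-first rule captures the point: the frontier model is the fallback, not the default.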

The economics of AI operation are shifting from training to serving. Inference now represents about two-thirds of AI compute, up from roughly one-third in 2023. When model weights are available under permissive licenses, the remaining costs concentrate on serving models at scale and placing them close to users.

Nebius Group operates a managed serving platform called Token Factory that hosts more than 40 open models and reports customer inference-cost reductions of up to 26 times versus proprietary alternatives; one customer on the platform runs as many as 200 billion tokens per day.

Cloudflare routes inference across more than 70 models from data centers in hundreds of cities and expanded its model catalog with the April 2026 acquisition of Replicate to increase deployment options. Locating models nearer to end users reduces latency and bandwidth use for many enterprise applications.

Custom silicon and AI-specific server hardware are taking a larger share of AI deployments. TrendForce forecasts ASIC-based AI servers will account for about 27.8% of AI server shipments in 2026. MediaTek projects more than $1 billion in AI ASIC revenue for 2026. Suppliers of AI connectivity and interconnect components reported strong growth: Astera Labs’ first-quarter revenue rose 93% to $308 million, and Credo Technology guided to roughly 120% revenue growth for fiscal 2026.

The ROBO Global Artificial Intelligence Index (THNQ) includes both model developers such as Meta, Alibaba and Microsoft and infrastructure firms that provide serving, edge delivery and silicon. Index managers and analysts are tracking three measurable indicators (the share of compute used for inference, ASIC penetration in AI servers and enterprise adoption of agent-style automation) to identify which companies capture recurring revenue tied to open-source model adoption.
