AI reasoning does not necessarily require spending huge amounts on frontier models. Instead, smaller models can yield ...
Kubernetes has become the leading platform for deploying cloud-native applications and microservices, backed by an extensive community and comprehensive feature set for managing distributed systems.
Akamai Technologies Inc. is expanding its developer-focused cloud infrastructure platform with the launch of Akamai Cloud Inference, a highly distributed foundation for running large language models ...
Jim Fan is one of Nvidia’s senior AI researchers. The shift could mean many orders of magnitude more compute and energy needed for inference to handle the improved reasoning in the OpenAI ...
At Google Cloud Next, Google announced its eighth-generation Tensor Processing Units (TPUs), introducing two purpose-built architectures: TPU 8t and TPU 8i. These chips are designed to support ...
If the hyperscalers are masters of anything, it is driving scale up and costs down so that a new type of information technology becomes cheap enough to be widely deployed. The ...
The major cloud builders and their hyperscaler brethren – in many cases, one company acts as both a cloud and a hyperscaler – have made their technology choices when it comes to deploying AI ...
Fastest inference coming soon: AWS and Cerebras are partnering to deliver the fastest AI inference available through Amazon Bedrock, launching in the next couple of months. Industry-leading speed and ...
In recent years, the big money has flowed toward LLMs and training, but this year the emphasis is shifting toward AI inference. LAS VEGAS — Not so long ago — last year, let’s say — tech industry ...