All posts tagged: GPUS

FOMO is why enterprises pay for GPUs they don’t use — and why prices keep climbing

Published by skeptic

Enterprises can’t fix their GPU waste problem because the fix makes the problem worse. Releasing idle capacity would improve utilization, but the same shortage driving GPU prices up is exactly why no team will give capacity back. So the fleet sits mostly idle, billed by the hour, and the cycle tightens. That pressure, repeated across thousands of enterprises over the past two years, is the reason most companies are now running their GPU fleets at roughly 5% utilization, according to Cast AI’s 2026 State of Kubernetes Optimization Report, which measured actual production clusters rather than relying on surveys. It’s also the reason nobody releases the idle capacity. Cast AI co-founder and President Laurent Gil has been tracking the dynamic for two years. “Many of the neoclouds are not cloud,” he told VentureBeat. “They are neo-real estate.” Five percent is about six times worse than a no-effort baseline. Gil puts a reasonable human-managed target at around 30% once you factor in day cycles, weekends and normal business patterns. Five percent means enterprises are running …

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Published by skeptic

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin. The obvious workaround is spot GPU markets: renting spare capacity to whoever needs it. But spot instances mean the cloud vendor is still the one doing the renting, and engineers buying that capacity are still paying for raw compute with no inference stack attached. FriendliAI’s answer is different: run inference directly on the unused hardware, optimize for token throughput, and split the revenue with the operator. FriendliAI was founded by Byung-Gon Chun, the researcher whose paper on continuous batching became foundational to vLLM, the open-source inference engine used across most production deployments today. Chun spent over a decade as a professor at Seoul National University studying efficient execution of machine learning models at scale. That research produced a paper called Orca, which introduced continuous batching. The technique processes inference requests dynamically rather than waiting to fill a fixed batch before executing. It is …

Intel will start making GPUs, a market dominated by Nvidia

Published by skeptic

As Intel continues to try to turn itself around, its CEO promised that the company will start producing a new type of chip, one that has been made very popular by rival Nvidia. At the Cisco AI Summit on Tuesday, Intel CEO Lip-Bu Tan announced that the company will start producing graphics processing units (GPUs). These are more specialized processors than the CPUs Intel traditionally produces, and are used for gaming and tasks like training artificial intelligence models. TechCrunch reached out to Intel for more information. The project will be overseen by Kevork Kechichian, the executive vice president and general manager of Intel’s data center group, according to reporting from Reuters. Kechichian was hired in September as part of a slew of new engineering-focused hires. Intel also hired Eric Demers for the effort in January. Demers was previously at Qualcomm for more than 13 years, most recently serving as a senior vice president of engineering. This initiative seems to be in relatively early stages, as Tan said the company plans to develop its strategy around customer …

Skeptic Society Magazine

for honest conversations
