All posts tagged: GPUS

Meet ZAYA1-8B, a super efficient, open reasoning model trained on AMD Instinct MI300 GPUs

Even as leading AI providers like OpenAI and Anthropic battle over the compute to train and release ever larger, more powerful models, other labs are going in a different direction: developing smaller, more efficient models and often open sourcing them. The latest worth paying attention to comes from the lesser-known Palo Alto startup Zyphra, which this week released its new mixture-of-experts (MoE) reasoning language model, ZAYA1-8B, with just over 8 billion parameters and only 760 million active, far fewer than the trillions estimated for the big labs' models. Yet ZAYA1-8B retains competitive performance on third-party benchmarks against GPT-5-High and DeepSeek-V3.2. It can be downloaded from Hugging Face now, free of charge, under the permissive, enterprise-friendly Apache 2.0 license, so enterprises and indie developers can begin using and customizing it immediately. Individual users can also test it for free at Zyphra Cloud, the startup’s inference solution. But the real headline is what ZAYA1-8B was trained on: a full stack of AMD Instinct …

FOMO is why enterprises pay for GPUs they don’t use — and why prices keep climbing

Enterprises can’t fix their GPU waste problem because the fix makes the problem worse. Releasing idle capacity would improve utilization, but the same shortage driving GPU prices up is exactly why no team will give capacity back. So the fleet sits at roughly 5% utilization, billed by the hour, and the cycle tightens. That pressure, repeated across thousands of enterprises over the past two years, is why most companies now run their GPU fleets at roughly 5% utilization, according to Cast AI’s 2026 State of Kubernetes Optimization Report, which measured actual production clusters rather than surveying them. It’s also the reason nobody releases the idle capacity. Cast AI co-founder and President Laurent Gil has been tracking the dynamic for two years. “Many of the neoclouds are not cloud,” he told VentureBeat. “They are neo-real estate.” Five percent is about six times worse than a no-effort baseline. Gil puts a reasonable human-managed target at around 30% once you factor in day cycles, weekends and normal business patterns. Five percent means enterprises are running …

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin. The obvious workaround is spot GPU markets, which rent spare capacity to whoever needs it. But spot instances mean the cloud vendor is still the one doing the renting, and engineers buying that capacity are still paying for raw compute with no inference stack attached. FriendliAI’s answer is different: run inference directly on the unused hardware, optimize for token throughput, and split the revenue with the operator. FriendliAI was founded by Byung-Gon Chun, the researcher whose paper on continuous batching became foundational to vLLM, the open-source inference engine used across most production deployments today. Chun spent over a decade as a professor at Seoul National University studying efficient execution of machine learning models at scale. That research produced a paper called Orca, which introduced continuous batching. The technique processes inference requests dynamically rather than waiting to fill a fixed batch before executing. It is …
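
The difference between the two scheduling styles can be illustrated with a toy simulation. This is only a minimal sketch under simplifying assumptions, not Orca's or vLLM's actual scheduler: each request is reduced to the number of decode iterations it needs, all requests are queued up front, and prefill and memory limits are ignored. The function names `static_batching_steps` and `continuous_batching_steps` are hypothetical, chosen for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Iterations needed when each fixed batch runs until its longest request ends."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        # Every slot in the batch is held until the slowest request finishes.
        total += max(lengths[i:i + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Iterations needed when freed slots are refilled before every iteration."""
    waiting = deque(lengths)  # queued requests (assumed positive token counts)
    running = []              # remaining tokens for each in-flight request
    steps = 0
    while waiting or running:
        # Admit queued requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One iteration produces one token per in-flight request;
        # requests with nothing left are dropped, freeing their slots.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    lengths = [2, 8, 3, 1, 5, 4]  # illustrative values, not measured data
    print(static_batching_steps(lengths, batch_size=2))      # 16
    print(continuous_batching_steps(lengths, batch_size=2))  # 12
```

With these made-up request lengths, the fixed-batch schedule spends 16 iterations because short requests wait on long ones, while the dynamic schedule finishes in 12 by refilling slots as soon as they free up.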

Intel will start making GPUs, a market dominated by Nvidia 

As Intel continues to try to turn itself around, its CEO promised that the company will start producing a new type of chip, one that has been made very popular by rival Nvidia. At the Cisco AI Summit on Tuesday, Intel CEO Lip-Bu Tan announced that the company will start producing graphics processing units (GPUs). These are more specialized processors than the CPUs Intel traditionally produces, and are used for gaming and tasks like training artificial intelligence models. TechCrunch reached out to Intel for more information. The project will be overseen by Kevork Kechichian, the executive vice president and general manager of Intel’s data center group, according to reporting from Reuters. Kechichian was hired in September among a slew of new engineering-focused hires. Intel also hired Eric Demers for the effort in January. Demers was previously at Qualcomm for more than 13 years, most recently serving as a senior vice president of engineering. This initiative seems to be in its relatively early stages, as Tan said the company plans to develop its strategy around customer …