Red Hat, the global leader in open source software, has released llm-d, a new open source project designed to solve the key challenges of generative AI by running large AI models efficiently at scale. By combining Kubernetes with vLLM technology, llm-d enables fast, flexible, and cost-effective AI inference across a variety of clouds and hardware.
CoreWeave, Google Cloud, IBM Research, and NVIDIA are founding contributors to llm-d. Partners such as AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI are also on board, and the project is supported by researchers at the University of California, Berkeley, who developed vLLM, and the University of Chicago, who developed LMCache.
A new era of flexible and scalable AI
Red Hat’s goal is clear: let enterprises run any model, on any accelerator, in any cloud, without being locked into expensive or complex systems. Just as Red Hat helped make Linux a standard for enterprises, it wants to make vLLM and llm-d the new standard for running AI inference at scale.
By building a strong and open community, Red Hat aims to make AI easier, faster, and more accessible.
What llm-d brings to the table
llm-d introduces a variety of new technologies to speed up and simplify AI workloads:
- vLLM integration: a widely adopted open source inference server that supports the latest AI models and runs on many hardware types, including Google Cloud TPUs (a basic usage sketch follows this list).
- Disaggregated prefill and decode: splits inference into two phases, prompt processing (prefill) and token generation (decode), and runs them on different machines to improve performance (sketched after the list).
- Smarter memory usage (KV cache offloading): moves cached attention data out of expensive GPU memory into inexpensive CPU or network memory using LMCache (see the tiered-cache sketch below).
- Efficient resource management with Kubernetes: balances compute and storage demands in real time to keep serving fast and smooth.
- AI-aware routing: sends requests to servers where the relevant data is already cached, speeding up responses (a minimal routing sketch follows the list).
- Faster data sharing between servers: moves data quickly between systems using high-performance transfer libraries such as NVIDIA’s NIXL.
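For context, here is what the vLLM engine that llm-d builds on looks like in basic use. This is a minimal offline-inference sketch using vLLM’s public Python API; the model name is just an example, and any vLLM-supported model would work.

```python
from vllm import LLM, SamplingParams

# Load a model into the vLLM engine (model name is an example).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain Kubernetes in one sentence."], params)

for output in outputs:
    print(output.outputs[0].text)
```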
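The prefill/decode split can be illustrated with a self-contained toy sketch. This is not llm-d’s actual code; all names are hypothetical. It only shows why the two phases have different resource profiles and can therefore run on separate pools of machines.

```python
# Toy illustration of disaggregated serving (not llm-d's real implementation).
# Prefill processes the whole prompt in one batched pass and produces a KV
# cache; decode then generates tokens one at a time against that cache.

def prefill(prompt_tokens: list[int]) -> list[tuple[int, int]]:
    """Compute-bound phase: one pass over all prompt tokens.
    Returns a stand-in 'KV cache' (here just token/position pairs)."""
    return [(tok, pos) for pos, tok in enumerate(prompt_tokens)]

def decode(kv_cache: list[tuple[int, int]], max_new_tokens: int) -> list[int]:
    """Memory-bandwidth-bound phase: generate one token per step,
    extending the cache after each step."""
    generated = []
    for _ in range(max_new_tokens):
        # A real model would attend over kv_cache here; we fake a token.
        next_token = sum(tok for tok, _ in kv_cache) % 50_000
        kv_cache.append((next_token, len(kv_cache)))
        generated.append(next_token)
    return generated

# In a disaggregated deployment, prefill() runs on one pool, the cache is
# shipped to another machine (e.g. via NIXL), and decode() runs there.
cache = prefill([101, 2054, 2003, 102])
print(decode(cache, max_new_tokens=5))
```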
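The idea behind KV cache offloading can be sketched as a two-tier cache. This is a hypothetical illustration in the spirit of LMCache, not its real API: hot entries stay in scarce GPU memory, and cold entries spill to cheap CPU memory instead of being discarded and recomputed. The slot count and names are illustrative.

```python
from collections import OrderedDict

GPU_SLOTS = 2  # illustrative capacity of the "hot" GPU tier

class TieredKVCache:
    """Hypothetical two-tier KV cache: GPU (hot, bounded) + CPU (cold, spill)."""

    def __init__(self):
        self.gpu = OrderedDict()  # hot tier, LRU-ordered
        self.cpu = {}             # cold tier, offload target

    def put(self, prefix_hash: str, kv_blocks) -> None:
        self.gpu[prefix_hash] = kv_blocks
        self.gpu.move_to_end(prefix_hash)
        while len(self.gpu) > GPU_SLOTS:
            evicted, blocks = self.gpu.popitem(last=False)
            self.cpu[evicted] = blocks  # offload instead of discarding

    def get(self, prefix_hash: str):
        if prefix_hash in self.gpu:
            self.gpu.move_to_end(prefix_hash)
            return self.gpu[prefix_hash]
        if prefix_hash in self.cpu:
            # Promote from CPU back to GPU; much cheaper than re-running prefill.
            self.put(prefix_hash, self.cpu.pop(prefix_hash))
            return self.gpu[prefix_hash]
        return None  # true miss: prefill must be recomputed

cache = TieredKVCache()
cache.put("prefix-1", "kv-blocks-1")
cache.put("prefix-2", "kv-blocks-2")
cache.put("prefix-3", "kv-blocks-3")  # evicts prefix-1 to the CPU tier
print(cache.get("prefix-1"))          # promoted back: "kv-blocks-1"
```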
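Finally, AI-aware routing can be approximated with a prefix-hash scheme: requests that share a prompt prefix (for example, the same system prompt) land on the same replica, so its existing KV cache for that prefix can be reused. llm-d’s actual scheduler is far richer than this; the replica names and prefix length here are illustrative.

```python
import hashlib

# Hypothetical sketch of cache-aware routing (not llm-d's real scheduler).
REPLICAS = ["llm-pod-a", "llm-pod-b", "llm-pod-c"]

def route(prompt: str, prefix_chars: int = 256) -> str:
    """Route requests sharing a prompt prefix to the same replica,
    so that replica's cached KV blocks for the prefix can be reused."""
    digest = hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).digest()
    return REPLICAS[digest[0] % len(REPLICAS)]

system_prompt = "You are a helpful assistant. "
# Both requests share the system prompt, so both land on the same replica.
print(route(system_prompt + "Summarize this article."))
print(route(system_prompt + "Translate this to French."))
```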
Taken together, llm-d is a powerful new platform for running large AI models quickly and efficiently, helping businesses adopt AI without high costs or performance slowdowns.
Conclusion
With the release of llm-d, Red Hat takes a major step toward making generative AI practical and scalable. By combining the power of Kubernetes, vLLM, and advanced AI infrastructure strategies, llm-d enables businesses to run large language models more efficiently on any cloud, hardware, or environment. With strong industry support and a focus on open collaboration, Red Hat is not only solving the technical barriers of AI inference but also laying the foundation for a flexible, affordable, and standardized AI future.