XDP LightningAI Is An Ideal Match for Multi-Turn Conversation and Task Agent Applications; Significantly Improves All Models - Including Those by DeepSeek

XDP LightningAI

Pliops XDP LightningAI easily connects to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed KV service. This solution revolutionizes LLM performance by delivering end-to-end efficiency gains while significantly reducing cost, power, and computational requirements. By enabling vLLM to process each context only once, Pliops is setting a new standard for scalable and sustainable AI innovation.


SAN JOSE, Calif., Feb. 05, 2025 (GLOBE NEWSWIRE) -- With the growing demand for generative AI applications, optimizing large language models (LLM) inference efficiency and reducing costs have become essential. Pliops is empowering developers to tackle these challenges head-on. At AI DevWorld next week, Pliops will showcase its innovative XDP LightningAI solution, which revolutionizes LLM performance by delivering end-to-end efficiency gains while significantly reducing cost, power, and computational requirements. By enabling vLLM to process each context only once, Pliops is setting a new standard for scalable and sustainable AI innovation.

As LLMs continue to grow in size and sophistication, their demands for computational power and energy also increase significantly. This growth introduces challenges such as longer time-to-first-token, driven by the need to process extensive context before a response can begin. Notably, up to 99% of context data - such as conversation history, books, and domain-specific text - may be processed repeatedly during LLM inference. This repetition is inefficient: the models must recompute their key-value (KV) caches for information that has not changed.
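The inefficiency described above can be illustrated with a minimal sketch (hypothetical code, not Pliops' or vLLM's actual API): without caching, the expensive prefill over a shared conversation history runs on every turn; with a prefix-keyed cache, it runs once.

```python
import hashlib

# Hypothetical illustration: cache prefill results keyed by a hash of the
# shared context, so unchanged history is processed only once.
kv_cache = {}        # prefix hash -> precomputed key/value tensors (stubbed)
prefill_calls = 0    # counts expensive full-context prefills

def prefill(context_tokens):
    """Stand-in for the expensive attention prefill over the whole context."""
    global prefill_calls
    prefill_calls += 1
    return f"kv-for-{len(context_tokens)}-tokens"  # placeholder for real tensors

def generate(context_tokens, new_prompt):
    key = hashlib.sha256(" ".join(context_tokens).encode()).hexdigest()
    if key not in kv_cache:            # compute KV only once per unique context
        kv_cache[key] = prefill(context_tokens)
    kv = kv_cache[key]
    return f"reply using {kv} + '{new_prompt}'"

history = ["long", "conversation", "history"] * 1000
generate(history, "turn 1")
generate(history, "turn 2")
generate(history, "turn 3")
print(prefill_calls)  # the shared context was prefilled once, not three times
```

In a real serving stack the cached values are large GPU tensors rather than strings, which is precisely why capacity beyond HBM matters.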

Pliops LightningAI: A Boost for vLLM 

Pliops XDP LightningAI, a revolutionary accelerated KV distributed smart node, introduces a new petabyte tier of memory below high-bandwidth memory (HBM) for GPU compute applications. It utilizes cost-effective, disaggregated smart storage to retain computed KV caches, allowing them to be retrieved if discarded from HBM. When serving a pre-processed context, the saved KV caches are efficiently loaded from storage, allowing vLLM to generate new content significantly faster.
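The tiering idea can be sketched as a two-level cache (an illustrative toy, with the storage tier standing in for the disaggregated smart storage; none of these names are Pliops APIs): entries evicted from the limited fast tier are retained below it and reloaded on demand instead of being recomputed.

```python
from collections import OrderedDict

# Hypothetical two-tier KV cache: a small "HBM" tier with LRU eviction,
# backed by a capacious "storage" tier that retains evicted entries.
class TieredKVCache:
    def __init__(self, hbm_capacity):
        self.hbm = OrderedDict()   # fast tier, limited capacity
        self.storage = {}          # large tier (stands in for disaggregated storage)
        self.hbm_capacity = hbm_capacity

    def put(self, key, kv):
        self.hbm[key] = kv
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            evicted_key, evicted_kv = self.hbm.popitem(last=False)
            self.storage[evicted_key] = evicted_kv   # retain instead of discard

    def get(self, key):
        if key in self.hbm:                 # hit in fast memory
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.storage:             # reload from storage, no recompute
            kv = self.storage[key]
            self.put(key, kv)
            return kv
        return None                         # true miss: caller must recompute

cache = TieredKVCache(hbm_capacity=2)
cache.put("ctx-a", "kv-a")
cache.put("ctx-b", "kv-b")
cache.put("ctx-c", "kv-c")      # evicts ctx-a from the fast tier into storage
print(cache.get("ctx-a"))       # still retrievable without recomputation
```

The design point is that a KV-cache miss in HBM becomes a storage read rather than a full prefill, trading a fabric round-trip for GPU compute.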

Pliops' LLM inference solution is optimal for AI autonomous task agents, an emerging use case for LLMs. These agents function autonomously and are adept at addressing a diverse array of complex tasks through strategic planning, sophisticated reasoning, and dynamic interaction with external environments. 

Pliops' AI DevWorld demo features multi-turn conversations, a workload that directly underpins autonomous task agents. At the show, Pliops' CTO and co-founder, Moshe Twitto, will deliver a presentation providing an overview of this groundbreaking capability. The session will take place on Thursday, February 13 at 10 a.m. PST - with a virtual session to follow on Thursday, February 20 at 10 a.m. PST.

XDP LightningAI fully saturates the fabric (including 400G and beyond), even when handling traffic with extremely small random I/O sizes for both read and write operations. It also facilitates seamless sharing of KV caches across multiple GPUs, vLLM instances, and users. With virtually unlimited storage capacity, any portion of the cached context can be reused without re-computation, unlocking new levels of scalability and efficiency. 

XDP LightningAI easily connects to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed KV service. XDP LightningAI outperforms traditional filesystem (FS) and DRAM-based solutions, addressing critical limitations in handling modern AI workloads. 

Pliops' technology is highly versatile and effective, supporting all advancements in LLMs. The recent announcement by DeepSeek and its innovations further reinforce Pliops' competitive edge. Each of DeepSeek's major architectural innovations either enhances or maintains the advantages of Pliops' KV cache offloading solution.

  • MLA (KV compression) reduces KV cache size but does not lower compute, resulting in a net gain for Pliops. 
  • Speculative decoding reduces HBM bandwidth per token, making batching more efficient, which strengthens Pliops' benefits. 
  • Prefill-decode disaggregation aligns with Pliops' expected market direction, where its solution delivers up to 8x efficiency gains.
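The first bullet can be made concrete with a back-of-envelope sizing comparison (illustrative model dimensions only, not DeepSeek's exact published configuration): latent compression shrinks the bytes a KV store must hold per token, while the prefill compute that caching avoids is unchanged.

```python
# Illustrative KV-cache sizing (hypothetical model shape, fp16 = 2 bytes/elem).
layers, heads, head_dim, seq_len, bytes_per = 60, 128, 128, 32_768, 2

# Standard multi-head attention: full keys and values for every head.
mha_kv_bytes = 2 * layers * heads * head_dim * seq_len * bytes_per

# MLA-style latent compression: one shared compressed vector per token per
# layer (assumed latent width of 576, in the spirit of DeepSeek's design).
latent_dim = 576
mla_kv_bytes = layers * latent_dim * seq_len * bytes_per

print(f"MHA cache: {mha_kv_bytes / 2**30:.1f} GiB")
print(f"MLA cache: {mla_kv_bytes / 2**30:.1f} GiB")
print(f"reduction: {mha_kv_bytes / mla_kv_bytes:.0f}x")
```

Smaller cached entries mean more contexts fit per byte of offload capacity and less fabric bandwidth per reload, which is why compression compounds with, rather than competes against, KV offloading.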

DeepSeek's advancements underscore the robustness of Pliops' shared KV store solution. As new models emerge, the fundamental bottlenecks in memory bandwidth and I/O persist, ensuring that Pliops remains a critical enabler for high-performance AI inference. 

Live at AI DevWorld 

Pliops has focused on LLM inferencing, a crucial and rapidly evolving area within the GenAI world that demands significant efficiency improvements. The company's demo at AI DevWorld is centered around accelerating LLM inferencing applications. This same memory tier is seamlessly applicable for other GenAI applications that Pliops plans to introduce over the next few months. 

"As the world's largest artificial intelligence dev event, AI DevWorld provides the perfect platform to showcase how our solutions are transforming AI infrastructure, enabling developers to build faster, more sustainable, and scalable AI applications," said Ido Bukspan, Pliops CEO. "We're excited to share how our technology is paving the way for faster, more efficient, and cost-effective AI innovation." 

Highlights at Pliops booth #912 on the AI DevWorld show floor of the Santa Clara Convention Center include:

  • Pliops XDP LightningAI running with Dell PowerEdge servers 
  • Pliops XDP enhancements for AI VectorDB 

For more information about Pliops, please visit www.pliops.com.

Connect with Pliops 

Read Blog 


Visit Resource Center - XDP LightningAI Solution Brief 

Connect on LinkedIn 

Follow on X 

About Pliops 

Winner of the FMS 2024 Most Innovative AI Solution award, Pliops is a technology innovator focused on making data centers run faster and more efficiently. The company's Extreme Data Processor (XDP) radically simplifies the way data is processed and managed. Pliops overcomes I/O inefficiencies to massively accelerate performance and dramatically reduce overall infrastructure costs for data-hungry AI applications. Founded in 2017, Pliops has been named one of the 10 hottest semiconductor startups multiple times. The company has raised over $200 million to date from leading investors including Koch Disruptive Technologies, State of Mind Ventures Momentum, Intel Capital, Viola Ventures, SoftBank Ventures Asia, Expon Capital, NVIDIA, AMD, Western Digital, SK hynix and Alicorn. For more information, visit www.pliops.com.

Media Contact: 

Stephanie Olsen 

Lages & Associates 

(949) 453-8080 

[email protected]

A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/8dec8c19-acf9-41c9-be7c-305ddbeb9825