XDP LightningAI Is An Ideal Match for Multi-Turn Conversation and Task Agent Applications; Significantly Improves All Models - Including Those by DeepSeek
As LLMs continue to grow in size and sophistication, their demands for computational power and energy also increase significantly. This growth introduces challenges, such as longer time-to-first-token for a response, because extensive context must be processed before generation begins. Notably, up to 99% of context data - such as conversation history, books, and domain-specific text - may be processed repeatedly during LLM inference. This repetition is inefficient, as the models must recompute their key-value (KV) caches for information that has not changed.
Pliops LightningAI: a Boost for vLLM
Pliops XDP LightningAI, a revolutionary accelerated distributed smart node for KV caching, introduces a new petabyte tier of memory below high-bandwidth memory (HBM) for GPU compute applications. It uses cost-effective, disaggregated smart storage to retain computed KV caches, allowing them to be retrieved after they are discarded from HBM. When serving a previously processed context, the saved KV caches are loaded from storage, allowing vLLM to generate new content significantly faster.
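The core idea - retain the KV cache for a context and reload it instead of recomputing the prefill - can be sketched in a few lines of Python. The store, key scheme, and `compute_kv` helper below are illustrative assumptions for this sketch, not Pliops or vLLM APIs:

```python
import hashlib

class KVCacheStore:
    """Hypothetical prefix-keyed store: maps a hash of the token
    prefix to its (mock) KV cache, standing in for the storage tier."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tokens):
        # Key the cache on the exact token prefix.
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, tokens, compute_kv):
        key = self._key(tokens)
        if key in self._store:
            self.hits += 1        # context seen before: skip prefill compute
        else:
            self.misses += 1      # first time: run the (expensive) prefill
            self._store[key] = compute_kv(tokens)
        return self._store[key]

def compute_kv(tokens):
    # Mock "prefill": pretend each token yields one KV entry.
    return [(t, t) for t in tokens]

store = KVCacheStore()
history = [101, 102, 103]                    # shared conversation history
store.get_or_compute(history, compute_kv)    # miss: computed once
store.get_or_compute(history, compute_kv)    # hit: loaded, not recomputed
print(store.hits, store.misses)              # → 1 1
```

Because the store is keyed only on the context tokens, the second request for the same history becomes a lookup rather than a recomputation - the same property that lets a cache shared across GPUs and vLLM instances avoid redundant prefill work.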
Pliops' LLM inference solution is optimal for AI autonomous task agents, an emerging use case for LLMs. These agents can function autonomously and are adept at addressing a diverse array of complex tasks through strategic planning, sophisticated reasoning, and dynamic interaction with external environments.
Pliops' AI DevWorld demo, featuring multi-turn conversations, directly supports autonomous task agents. At the show, Pliops' CTO and co-founder, Moshe Twitto, will deliver a presentation detailing this groundbreaking capability. The session will take place on Thursday, February 13 at 10 a.m. PST - with a virtual session to follow on Thursday, February 20 at 10 a.m. PST.
XDP LightningAI fully saturates the fabric (including 400G and beyond), even when handling traffic with extremely small random I/O sizes for both read and write operations. It also facilitates seamless sharing of KV caches across multiple GPUs, vLLM instances, and users. With virtually unlimited storage capacity, any portion of the cached context can be reused without re-computation, unlocking new levels of scalability and efficiency.
XDP LightningAI connects easily to GPU servers by leveraging the mature NVMe-oF storage ecosystem to provide a distributed KV service. It outperforms traditional filesystem (FS)- and DRAM-based solutions, addressing critical limitations in handling modern AI workloads.
Pliops' technology is highly versatile and effective, supporting all advancements in LLMs. The recent announcement by DeepSeek and its innovations further reinforce Pliops' competitive edge. Each of DeepSeek's major architectural innovations either enhances or maintains the advantages of Pliops' KV cache offloading solution.
- MLA (KV compression) reduces KV cache size but does not lower compute, resulting in a net gain for Pliops.
- Speculative decoding reduces HBM bandwidth per token, making batching more efficient, which strengthens Pliops' benefits.
- Prefill-decode disaggregation aligns with Pliops' expected market direction, where its solution delivers up to 8x efficiency gains.
Live at AI DevWorld
Pliops has focused on LLM inferencing, a crucial and rapidly evolving area within the GenAI world that demands significant efficiency improvements. The company's demo at AI DevWorld is centered around accelerating LLM inferencing applications. This same memory tier is seamlessly applicable for other GenAI applications that Pliops plans to introduce over the next few months.
"As the world's largest artificial intelligence dev event, AI DevWorld provides the perfect platform to showcase how our solutions are transforming AI infrastructure, enabling developers to build faster, more sustainable, and scalable AI applications,” said Ido Bukspan, Pliops CEO. "We're excited to share how our technology is paving the way for faster, more efficient, and cost-effective AI innovation.”
Highlights at Pliops booth #912 on the AI DevWorld show floor of the Santa Clara Convention Center include:
- Pliops XDP LightningAI running with Dell PowerEdge servers
- Pliops XDP enhancements for AI VectorDB
Connect with Pliops
Visit Resource Center - XDP LightningAI Solution Brief
About Pliops
A winner of the FMS 2024 most innovative AI solution award, Pliops is a technology innovator focused on making data centers run faster and more efficiently. The company's Extreme Data Processor (XDP) radically simplifies the way data is processed and managed. Pliops overcomes I/O inefficiencies to massively accelerate performance and dramatically reduce overall infrastructure costs for data-hungry AI applications. Founded in 2017, Pliops has repeatedly been named one of the 10 hottest semiconductor startups. The company has raised over $200 million to date from leading investors including Koch Disruptive Technologies, State of Mind Ventures Momentum, Intel Capital, Viola Ventures, SoftBank Ventures Asia, Expon Capital, NVIDIA, AMD, Western Digital, SK hynix and Alicorn. For more information, visit www.pliops.com.
Media Contact:
Stephanie Olsen
Lages & Associates
(949) 453-8080
A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/8dec8c19-acf9-41c9-be7c-305ddbeb9825