Data Processing Solutions

Data Processing

Data engines ready for AI.


Overview

New Data Demands

AI agents that transform your enterprise need continuous access to your data, which strains data infrastructure that was not designed for agentic reasoning loops.

By accelerating unstructured and structured data processing with NVIDIA cuDF and NVIDIA cuVS, enterprises can meet the new volume and velocity of data demands from AI, while leveraging the data infrastructure they've invested in for years.

The world's most popular data engines run on the accelerated computing platform—helping agents access structured data living in tables and unstructured data living as PDFs, emails, images, and videos across the enterprise.

NVIDIA cuDF and cuVS Adopted by World's Leading Data Platforms

Learn how leading data platforms are using NVIDIA cuDF and cuVS to accelerate structured analytics and unstructured vector search for AI-ready data.

Benefits

Transform Your Data for AI

Massive Performance Gains

The accelerated computing platform delivers up to a 20x speedup for data processing, so enterprises can act on their data faster and take on new use cases.

Significant Cost Savings

Organizations running on the NVIDIA-optimized stack have saved 80% or more in costs, helping their data infrastructure do more with less.

Easy to Adopt

The world’s most popular analytics and vector data engines have drop-in accelerators to make adoption straightforward, including Apache Spark, OpenSearch, and more.

AI-Ready Data

NVIDIA cuVS provides context from the 90% of enterprise data stored in PDFs, messages, and emails, and NVIDIA cuDF provides ground truth from terabytes of structured data processed in minutes, so your data is ready for agentic AI.

Products

CUDA-X for Data Processing

cuDF and cuVS are CUDA-X™ toolkits, built on highly optimized CUDA® primitives, to accelerate the data processing ecosystem.

cuDF for Structured Data

  • Accelerates analytics engines on NVIDIA GPUs
  • Includes drop-in accelerators for Apache Spark, Presto, Polars, and DuckDB
  • Cuts analytical query times from hours to minutes

cuVS for Unstructured Data

  • GPU-accelerated vector search and index building for RAG and AI pipelines
  • Integrates with OpenSearch, Elastic, Milvus, and more
  • Reduces vector index build times from hours to minutes
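
To make the vector search workload concrete, the sketch below shows exact (brute-force) top-k search by cosine similarity in plain Python. It is a CPU reference for the kind of operation a vector search library accelerates, not the cuVS API. The function names `cosine_similarity` and `top_k` and the toy vectors are assumptions made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    best = heapq.nlargest(k, scored)
    return [i for _, i in best]


# Hypothetical toy embeddings; real embeddings have hundreds or thousands of dimensions.
documents = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

nearest = top_k(query, documents, k=2)
print(nearest)
```

This version compares the query against every stored vector, so its cost grows with the size of the collection. Building an index ahead of time is what lets a search engine avoid that full scan, which is why index build time appears in the list above.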

Adopters

Data Processing Ecosystem

From analytical SQL queries to vector search, organizations are integrating NVIDIA's accelerated computing platform into their existing data platforms to accelerate AI-ready pipelines.

Data Processing on NVIDIA Vera

For enterprises running agentic AI workloads at scale, AI agents dramatically increase the number of concurrent, continuous, small-scale queries against structured enterprise data. NVIDIA Vera has 1.2 TB/s of memory bandwidth and a high-speed on-chip fabric, which together provide the per-core performance, high throughput, and predictability under load needed to support this increased volume and velocity of queries. With the Starburst analytics engine, NVIDIA Vera processed queries 3x faster than x86, reducing query execution from minutes to seconds. The Redpanda streaming engine saw a 6x improvement in p99 latency versus x86, which improves the reliability of the data engine.
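
A p99 figure is a tail metric: the value at or below which 99% of measurements fall. As a minimal sketch, assuming the nearest-rank definition of a percentile and made-up latency samples (neither comes from the benchmarks cited here), it could be computed like this. The function name `percentile_nearest_rank` is hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow outliers.
latencies_ms = [5.0] * 98 + [250.0, 400.0]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(p50, p99)
```

In this toy data the median stays at the fast value while the p99 lands on a slow outlier, which is why tail percentiles are often used to describe predictability under load.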

Coming soon.

Resources

The Latest in Data Processing

NVIDIA cuDF and cuVS Adopted by World's Leading Data Platforms

NVIDIA's accelerated computing platform is fueling modern enterprise data processing. Integrated with the world's most widely used open source data engines—downloaded over 200 million times monthly by developers—these libraries are harnessed across enterprise data platforms, databases, and data lakes.

How Snap Scaled A/B Testing With NVIDIA cuDF

Snap processes 10+ petabytes daily for A/B testing across 940M+ users. Accelerating Apache Spark with NVIDIA cuDF on Google Cloud delivered 4x faster runtimes and 76% cost savings.

Accelerating Large-Scale Analytics With Velox and NVIDIA cuDF

IBM and NVIDIA integrate cuDF with the Velox execution engine, enabling GPU-native query execution for Presto and Apache Spark—delivering up to 12x faster analytics than CPU-only systems.

Data Is the Ground Truth and Context for AI

Hear CEO Jensen Huang's thoughts on the role of the data processing ecosystem in the age of agentic AI.

IBM Reinvents Data Processing

The Presto SQL analytics engine in IBM watsonx.data is accelerated by cuDF, delivering a 5x speedup and 83% cost savings.

Processing 100 Million Rows of Data in Under 2 Seconds With Polars

The Polars GPU engine executes Polars code on GPUs for massive speedups.

Next Steps

Ready to Learn More?

Get the latest on data processing news, content, and events.

cuDF

Open source toolkit for structured data that uses GPU parallelism and memory bandwidth to accelerate data processing and analytics workflows.

cuVS

Open source library for unstructured vector search and data clustering that enables faster vector searches and index builds.
