Vortex is a next-generation columnar file format and toolkit designed for high-performance data processing. It is the fastest and most extensible format for building data systems backed by object storage. It provides:
- Blazing Fast Performance
  - 100x faster random access reads (vs. modern Apache Parquet)
  - 10-20x faster scans
  - 5x faster writes
  - Similar compression ratios
  - Efficient support for wide tables with zero-copy/zero-parse metadata
- Extensible Architecture
  - Modeled after Apache DataFusion's extensible approach
  - Pluggable encoding system, type system, compression strategy, & layout strategy
  - Zero-copy compatibility with Apache Arrow
- Open Source, Neutral Governance
  - A Linux Foundation (LF AI & Data) Project
  - Apache-2.0 Licensed
- Integrations
  - Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, & more
  - Apache Iceberg (coming soon)
🟢 Development Status: Library APIs may change from version to version, but we now consider the file format stable. From release 0.36.0, all future releases of Vortex should maintain backwards compatibility of the file format (i.e., be able to read files written by any earlier version >= 0.36.0).
- Logical Types - Clean separation between logical schema and physical layout
- Zero-Copy Arrow Integration - Seamless conversion to/from Apache Arrow arrays
- Extensible Encodings - Pluggable physical layouts with built-in optimizations
- Cascading Compression - Support for nested encoding schemes
- High-Performance Computing - Optimized compute kernels for encoded data
- Rich Statistics - Lazy-loaded summary statistics for optimization
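To make "cascading compression" concrete, here is a minimal, self-contained sketch in which one encoding's output is fed into another: a string column is dictionary-encoded, and the resulting integer codes are then run-length encoded. The function names (`dict_encode`, `rle_encode`, `decode`) are hypothetical and written only for this illustration; they are not part of the Vortex API, and Vortex's actual encodings may differ in detail.

```rust
use std::collections::HashMap;

/// Dictionary-encode a string column (illustrative only, not the Vortex API).
/// Returns the distinct values in first-seen order and one code per input value.
fn dict_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for &v in values {
        let code = *index.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}

/// Run-length encode the dictionary codes into (code, run length) pairs.
fn rle_encode(codes: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &c in codes {
        if let Some((last, len)) = runs.last_mut() {
            if *last == c {
                *len += 1;
                continue;
            }
        }
        runs.push((c, 1));
    }
    runs
}

/// Undo both layers: expand the runs, then look each code up in the dictionary.
fn decode(dict: &[String], runs: &[(u32, usize)]) -> Vec<String> {
    let mut out = Vec::new();
    for &(code, len) in runs {
        for _ in 0..len {
            out.push(dict[code as usize].clone());
        }
    }
    out
}

fn main() {
    let column = ["us", "us", "us", "eu", "eu", "us"];
    let (dict, codes) = dict_encode(&column);
    let runs = rle_encode(&codes);
    println!("dict = {:?}, runs = {:?}", dict, runs);

    // The cascade is lossless: decoding restores the original column.
    let expected: Vec<String> = column.iter().map(|s| s.to_string()).collect();
    assert_eq!(decode(&dict, &runs), expected);
}
```

The point of the sketch is only that each layer sees the previous layer's output as ordinary data, so encodings can be nested freely.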
Vortex strictly separates logical and physical concerns:
- Logical Layer: Defines data types and schema
- Physical Layer: Handles encoding and storage implementation
- Built-in Encodings: Compatible with Apache Arrow's memory format
- Extension Encodings: Optimized compression schemes (RLE, dictionary, etc.)
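The separation described above can be illustrated with a small sketch: one logical type, two physical representations, and a compute function that runs directly on the encoded form. `LogicalType` and `PhysicalArray` are hypothetical names invented for this example and are not Vortex types; the real crate's design is richer than this.

```rust
/// Logical layer: what the values mean, independent of storage.
/// (Hypothetical type for illustration, not the Vortex API.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    Int64,
}

/// Physical layer: how the values are stored.
/// Both variants represent the same logical Int64 column.
#[derive(Debug)]
enum PhysicalArray {
    /// Flat, uncompressed buffer.
    Plain(Vec<i64>),
    /// Run-length encoded (value, run length) pairs.
    RunLength(Vec<(i64, usize)>),
}

impl PhysicalArray {
    /// The logical type does not depend on the chosen encoding.
    fn logical_type(&self) -> LogicalType {
        LogicalType::Int64
    }

    /// Decode to the flat representation.
    fn to_plain(&self) -> Vec<i64> {
        match self {
            PhysicalArray::Plain(v) => v.clone(),
            PhysicalArray::RunLength(runs) => runs
                .iter()
                .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
                .collect(),
        }
    }

    /// Sum computed on the encoded data: the run-length branch
    /// multiplies each value by its run length instead of decoding.
    fn sum(&self) -> i64 {
        match self {
            PhysicalArray::Plain(v) => v.iter().sum(),
            PhysicalArray::RunLength(runs) => {
                runs.iter().map(|&(v, n)| v * n as i64).sum()
            }
        }
    }
}

fn main() {
    let plain = PhysicalArray::Plain(vec![7, 7, 7, 2, 2]);
    let rle = PhysicalArray::RunLength(vec![(7, 3), (2, 2)]);

    // Same logical type and values, different physical layout.
    assert_eq!(plain.logical_type(), rle.logical_type());
    assert_eq!(plain.to_plain(), rle.to_plain());
    assert_eq!(plain.sum(), rle.sum());
    println!("sum = {}", rle.sum());
}
```

Callers that only depend on `logical_type`, `to_plain`, and `sum` are unaffected when the physical encoding changes, which is the property the logical/physical split is meant to provide.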
All features are exported through the main vortex crate.
# Rust crate
cargo add vortex
# Python package
uv add vortex-data
For browsing the structure of Vortex files, you can use the vx command-line tool.
# Install latest release
cargo install vortex-tui --locked
# Or build from source
cargo install --path vortex-tui --locked
# Usage
vx browse <file>
# Optional but recommended dependencies
brew install flatbuffers protobuf # For .fbs and .proto files
brew install duckdb # For benchmarks
# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup
# Initialize submodules
git submodule update --init --recursive
# Setup dependencies with uv
uv sync --all-packages
For optimal performance, we suggest using MiMalloc:
use mimalloc::MiMalloc; // requires the `mimalloc` crate as a dependency
#[global_allocator]
static GLOBAL_ALLOC: MiMalloc = MiMalloc;
Licensed under the Apache License, Version 2.0.
Vortex is an independent open-source project and not controlled by any single company. The Vortex Project is a sub-project of the Linux Foundation Projects. The governance model is documented in CONTRIBUTING.md and is subject to the terms of the Technical Charter.
See CONTRIBUTING.md for guidelines.
If you discover a security vulnerability, please email vuln-report@vortex.dev.
Copyright © Vortex a Series of LF Projects, LLC. For terms of use, trademark policy, and other project policies please see https://lfprojects.org
The Vortex project benefits enormously from groundbreaking work from the academic & open-source communities.
- BtrBlocks - Efficient columnar compression
- FastLanes & FastLanes on GPU - High-performance integer compression
- FSST - Fast random access string compression
- ALP & G-ALP - Adaptive lossless floating-point compression
- Procella - YouTube's unified data system
- Anyblob - High-performance access to object storage
- ClickHouse - Fast analytics for everyone
- MonetDB/X100 - Hyper-Pipelining Query Execution
- Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
- The FastLanes File Format - Expression Operators
- Apache Arrow
- Apache DataFusion
- parquet2 by Jorge Leitao
- DuckDB
- Velox & Nimble