Convert CSV To Parquet
CC2P (Convert CSV To Parquet) is a high-performance command-line tool written in Rust that efficiently converts CSV files to the Apache Parquet format. Parquet is a columnar storage file format that offers efficient data compression and encoding schemes, making it ideal for big data processing.
Why Use CC2P?
- Performance: Leverages Rust's speed and multi-threading for fast conversions
- Memory Efficiency: Processes files with minimal memory footprint
- Flexibility: Supports various CSV formats with different delimiters and header options
- Schema Inference: Automatically detects column types from your data
- Batch Processing: Convert multiple CSV files in a single command
Features
- Columnar Storage: Parquet's columnar format provides better compression and faster query performance compared to row-based formats like CSV
- Efficient Compression: Uses Snappy compression for a good balance between compression ratio and speed
- Schema Handling: Automatically infers data types and handles duplicate column names
- Parallel Processing: Multi-threaded conversion using Tokio runtime
- Progress Tracking: Real-time progress indication with indicatif progress bars