Parquet

Parquet is an open-source columnar file format: values from the same column are stored together rather than row by row. Parquet files are typically smaller than the equivalent CSV files and are usually faster to read, especially when only a few columns are needed. Parquet also supports nested data structures, which makes it well suited to storing complex data.
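To make "storing data in columns" concrete, here is a minimal pure-Python sketch of the idea. It is not Parquet's actual on-disk format; the sample records, field names, and the helpers `to_columns` and `to_rows` are all made up for illustration. It pivots row records into per-column lists, reads a single column without touching the others, and prints the gzip-compressed size of each layout so the two can be compared.

```python
import gzip
import json

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"id": i, "country": "US" if i % 2 == 0 else "DE", "amount": i * 10}
    for i in range(1000)
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


def to_rows(columns):
    """Inverse of to_columns: rebuild the list of row dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


columns = to_columns(rows)

# A query on one column only needs that column's values.
total_amount = sum(columns["amount"])

# Serialize both layouts the same way and compress them with gzip.
row_bytes = gzip.compress(json.dumps(rows).encode("utf-8"))
col_bytes = gzip.compress(json.dumps(columns).encode("utf-8"))
print("row layout:", len(row_bytes), "bytes; column layout:", len(col_bytes), "bytes")
```

In the column layout the field names appear once instead of once per record, and similar values sit next to each other, which is the property that columnar formats build their encodings on.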

Parquet is a popular file format for big data processing frameworks such as Hadoop, Spark, and Impala, and it is compatible with most data processing frameworks in the Hadoop ecosystem.

Parquet uses efficient compression and encoding schemes for fast data storage and retrieval. Parquet with "gzip" compression is slightly quicker to export than CSV, and importing is about 2x faster than CSV.

It is commonly used with Databricks and Apache Spark.

Links to This Note

2023-06-30