A data lake is a storage repository that holds a large amount of raw data in its native format until it is needed. The data is kept in its natural form, usually as object blobs or files. This raw data may include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).
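To make "raw data in its native format until it is needed" concrete, here is a minimal sketch, assuming a local directory stands in for the object store a real data lake would use. The layout `raw/<source>/<filename>` and the helpers `ingest_raw` and `read_json_when_needed` are hypothetical names chosen for illustration. Files are written byte-for-byte with no parsing on the way in; a JSON file is only interpreted when a consumer reads it.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Hypothetical helper: write payload unchanged to <lake_root>/raw/<source>/<filename>.

    Nothing is parsed or validated here, so CSV, JSON, text, and images
    all stay in their native format.
    """
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)
    return target


def read_json_when_needed(path: Path) -> dict:
    """Hypothetical helper: interpret a stored JSON file only when a consumer asks for it."""
    return json.loads(path.read_text(encoding="utf-8"))


# Usage: land several kinds of raw data side by side in a throwaway directory
# (a local stand-in for an object store).
lake = Path(tempfile.mkdtemp())
ingest_raw(lake, "crm", "customers.csv", b"id,name\n1,Ada\n")                  # semi-structured (CSV)
event = ingest_raw(lake, "app_logs", "event.json", b'{"user": 1, "action": "login"}')  # semi-structured (JSON)
ingest_raw(lake, "mail", "support_email.txt", b"Hi, my order is late.")       # unstructured
ingest_raw(lake, "media", "logo.png", b"\x89PNG\r\n\x1a\n")                   # binary (PNG signature bytes only)

print(sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()))
print(read_json_when_needed(event)["action"])  # prints "login"
```

The design choice illustrated is that ingestion imposes no schema: each consumer decides how to interpret a file at read time, which is what lets one repository hold relational exports, logs, documents, and media together.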