Hello everyone, welcome to Learn Data Lake From Storage!
Data lakes are complex systems with varying specifications, formats, and engines. The foundation of every data lake, however, is the storage layer. Observing how each project organizes its metadata and data on this layer, and how it optimizes around file design, gives a clearer picture of how it actually works. Seen from the storage layer, the engines built on top of a data lake are essentially implementation details.
This project seeks to explore various data lake projects by deploying them and analyzing their storage behaviors to gain insights into their functionality and design.
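As a rough illustration of what "analyzing storage behaviors" can mean in practice, the sketch below walks a directory and groups the files it finds by extension. It assumes the table has been written to a local path; the function name `summarize_storage` is hypothetical and is not part of this repository or of any data lake project.

```python
import os
from collections import Counter


def summarize_storage(root):
    """Return {extension: (file_count, total_bytes)} for all files under root.

    Hypothetical helper for illustration only; it assumes the data lake
    table lives on a local filesystem path.
    """
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1] or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return {ext: (counts[ext], sizes[ext]) for ext in counts}
```

Running this before and after a write or a compaction is one simple way to see which metadata and data files an operation adds or removes.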
This project is organized into a set of questions and a set of data lake projects, each project in its own directory. Every project includes a README.md file that describes the project and provides deployment instructions.
You can navigate straight to the directory of the project you are interested in, or follow the questions in the Questions section to explore the data lake projects step by step.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0