What are Data Lakes?

  1. Collect
  2. Organize
  3. Analyze
  4. Infuse

What is a data lake?

What are the benefits of a data lake?

Scalability: Data lakes can scale massively, up to the exabyte range. This matters because when you create a data lake you generally don’t know in advance how much data it will need to hold, and traditional data storage systems can’t scale in this way. Many data lakes are built on the Hadoop framework, which supports distributed processing of huge data sets across clusters of machines using simple programming models. Hadoop scales from a single server to thousands of machines, each offering local computation and storage. It supports very large clusters while keeping the cost per unit of work roughly constant as the cluster grows. To accommodate more data, you just add new nodes to the cluster.
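To make "simple programming models" and "local computation at each node" more concrete, here is a minimal sketch of the MapReduce idea in plain Python. It is not Hadoop code and uses no Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample blocks are hypothetical, chosen only for illustration, and each string in the list stands in for a block of data stored on a different node.

```python
from collections import defaultdict


def map_phase(block):
    """Map step: would run locally on the node that stores this block.

    Emits one (word, 1) pair per word in the block.
    """
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_outputs):
    """Shuffle step: group the values from all nodes by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(blocks):
    """Run map, shuffle, and reduce over a list of data blocks.

    In a real cluster each block would be mapped in parallel on a separate
    node; here the "nodes" are simulated with an ordinary loop.
    """
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: three blocks, as if stored on three nodes.
blocks = ["data lakes scale", "lakes hold raw data", "data data"]
counts = word_count(blocks)
print(counts)
```

In this sketch, adding more data only means adding more entries to the blocks list, and the map step never needs to see more than one block at a time. That property is what lets a framework of this kind spread work across additional machines as the data grows.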



