Introduction to Big Data
In today’s world, everything we do, whether it is scrolling endlessly through a social media site or taking a walk in a park, leaves a digital trace. We now produce more data in a few years than we did in previous decades. Every 60 seconds, 400 hours of video are uploaded to YouTube. Every day we produce a whopping 2.5 quintillion bytes of data; in layman’s terms, that is enough to fill roughly 100 million single-layer (25 GB) Blu-ray discs. Ever thought about what happens to this much data? We can’t just store it in ordinary databases or on hard drives. One thing we can say for sure is that data this large, fast, and complex is impossible to process or store using traditional methods. So it got me thinking: how do these huge organizations process quintillions of bytes to get insights into user behavior?
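Numbers like these are easier to trust once you run them yourself. Here is a rough back-of-the-envelope sketch in Python. The 25 GB disc capacity (a single-layer Blu-ray disc) is my assumption, and a different capacity changes the disc count proportionally, so treat the result as an order-of-magnitude figure.

```python
# Back-of-the-envelope check of the daily data volume quoted above.
BYTES_PER_DAY = 2_500_000_000_000_000_000   # 2.5 quintillion bytes (2.5 * 10**18)
BLU_RAY_BYTES = 25 * 10**9                  # assumption: single-layer disc, 25 GB

YOUTUBE_HOURS_PER_MINUTE = 400              # figure quoted in the text


def discs_needed(total_bytes: int, disc_bytes: int) -> int:
    """Number of full discs required to hold total_bytes."""
    return total_bytes // disc_bytes


discs_per_day = discs_needed(BYTES_PER_DAY, BLU_RAY_BYTES)
video_hours_per_day = YOUTUBE_HOURS_PER_MINUTE * 60 * 24

print(f"Blu-ray discs filled per day: {discs_per_day:,}")          # 100,000,000
print(f"Hours of video uploaded per day: {video_hours_per_day:,}")  # 576,000
```

Even with generous rounding, that is far more than anyone could store, let alone watch or analyze by hand.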
Before we dive into methods of processing such huge, exponentially growing volumes of data, we must first understand the concept of Big Data.
According to Gartner,
"Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insights, decision making, and process automation."
(Source: Definition of Big Data, Gartner Information Technology Glossary)
The above definition revolves around three words: Velocity, Volume, and Variety. These three Vs are the defining characteristics of big data.
Velocity: In today’s dynamic world, with the rise of technologies like IoT and 5G, data is streaming in at an unprecedented rate. Big data can therefore be described in terms of velocity: data is generated at extremely high rates and needs to be processed in real time, or close to it, to yield insights.
Volume: The name ‘Big data’ itself suggests that the size of the data is enormous. With the growing number of data sources and increasingly scalable infrastructure, the amount of data being generated is increasing exponentially.
Variety: The data generated by our day-to-day activities comes in the form of text, images, audio, and video. Big data systems are therefore not only capable of processing structured data but can also handle semi-structured and unstructured data.
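To make the idea of variety a little more concrete, here is a small Python sketch that handles one tiny sample of each kind of data. All the sample values (the user record, the tags, the comment) are made up by me for illustration, and real big data systems work at a vastly larger scale with dedicated tools.

```python
import csv
import io
import json

# Structured: fixed columns, fits neatly into a relational table.
structured = "user_id,age,city\n101,34,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, fields can be nested or missing.
semi_structured = (
    '{"user_id": 101, "tags": ["travel", "food"], "device": {"os": "android"}}'
)
record = json.loads(semi_structured)

# Unstructured: free text with no predefined schema at all.
unstructured = "Loved the park today, the weather was perfect!"
word_count = len(unstructured.split())

print(rows[0]["city"])          # Pune
print(record["device"]["os"])   # android
print(word_count)               # 8
```

Notice that each kind of data needs a different way of reading it. The structured row maps directly onto named columns, the JSON record has to be navigated key by key, and the free text offers nothing beyond raw words until we analyze it further.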
So, to sum up, big data is a collection of huge amounts of data that keeps growing exponentially, which makes it impractical to store or process using traditional methods. Instead, big data is stored and processed using software tools and frameworks such as Hadoop, Apache Spark, and Hive, which we will discuss in upcoming blogs.
Big data doesn’t simply revolve around how much data you have. Its true value is unlocked by how we use it. By analyzing the data, we can find ways to improve efficiency, optimize products, and streamline resource management. It can also help us make smarter decisions that contribute to revenue growth.
The world of big data is exciting, and I can’t cover everything in one blog, so this post is only an introduction. I will cover topics like the MapReduce algorithm, Hadoop, and Apache Spark in my upcoming blogs. Please stay tuned.