I would like to start this blog with an interesting statement I read during my research:
Data is everything and data is much more than “just data”.
If you are a data science and data analytics enthusiast like me, you have probably read many blogs discussing how data is transformed, integrated, and processed to produce analytical insights and predictions. But while reading all these blogs and books, one question always stayed in my mind: “What actually is data?”
If we go by the dictionary meaning of the word data:
“Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things”
If we want to express it in more formal terms, data is defined as:
“A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.” [Reference: Open Archival Information System (OAIS)]
Here the phrases “representation of information” and “collection of facts” carry much more weight than one might think; we will get to that in a minute.
For now, let’s focus on what counts as data. Put simply, data can be anything: observations taken during an experiment, computer simulation results, or even physical artifacts and relics. Note that I am not saying that all observations or recordings are relevant. To make this concrete, suppose you are building a neural network to identify the notes played by a pianist during a concert. The sound of the audience, or even of the other instruments, is completely irrelevant to you; such irrelevant data is called noise. So now we know what noise is and what data is, but we still haven’t discussed why information is different from data.
Information is data that has been structured, processed, and presented in a meaningful manner. To understand this, consider the following example.
Consider the following unorganized, raw data:
UT, 1234, Joe, Circle, SIC, 8015553211, 84084, Smith
This appears to be just a random string of words separated by commas, but when we process and organize it, we get a result like this:
Name: Joe Smith
Address: 1234 Circle, Salt Lake City, UT, 84084
Contact number: (801) 555-3211
This structured, processed, and organized data gives much more insight than the raw data. That is why it can be called information.
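For readers who like to see this in code, here is a minimal Python sketch of the same transformation. It rests on a few assumptions that are mine, not part of the example itself: the order of the fields in the raw string is inferred from the organized result, the city code is assumed to stand for Salt Lake City, and the names `Contact`, `CITY_NAMES`, and `organize` are made up for illustration.

```python
from dataclasses import dataclass

# Assumption: the city code in the raw string stands for Salt Lake City,
# as the organized result suggests. This lookup table is hypothetical.
CITY_NAMES = {"SIC": "Salt Lake City"}


@dataclass
class Contact:
    name: str
    address: str
    phone: str


def organize(raw: str) -> Contact:
    """Turn the comma-separated raw string into a structured record."""
    # Assumption: the fields always arrive in this order.
    state, house_no, first, street, city, phone, zip_code, last = [
        field.strip() for field in raw.split(",")
    ]
    # Expand the city code if we know it, otherwise keep it unchanged.
    city = CITY_NAMES.get(city, city)
    return Contact(
        name=f"{first} {last}",
        address=f"{house_no} {street}, {city}, {state}, {zip_code}",
        phone=f"({phone[:3]}) {phone[3:6]}-{phone[6:]}",
    )


raw = "UT, 1234, Joe, Circle, SIC, 8015553211, 84084, Smith"
record = organize(raw)
print("Name:", record.name)
print("Address:", record.address)
print("Contact number:", record.phone)
```

Notice that the code adds no new facts. Everything in the output was already present in the raw string; the only thing supplied from outside is knowledge of what each position means. That knowledge of structure is exactly what turns data into information.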
So we can say: “When data is formatted into a specific form, it is called information.”
We can conclude by saying that data is a “collection of facts” that “represents information”, and when processed into the required structure, it gives a lot of power to a person or entity.
If you liked this blog, you may also like my other blogs from the same series: