Until the emergence of the concept of “Big Data”, data were mainly processed locally in data warehouses consisting of several structured databases. Gradually, data sources diversified, became relatively heterogeneous, and came to reside mainly on the Internet.
Analysts are projecting the future of customer data analysis, and several points of attention are highlighted. While most companies collect, store and analyze data, the majority of them are struggling with their big data projects and with the IT challenges associated with this framework.
New Scientist, in association with Microsoft, released an infographic showing how big data techniques seek to gain new insight by analyzing very large data sets. The proliferation of data sources, together with the 3Vs (volume, variety, velocity), has contributed to big data’s growth in recent years. According to the infographic, data is arriving in a growing variety of new and often unstructured forms such as text, video and sensor readings. Data sets are growing to around a petabyte (one million gigabytes), which until recently would have been unmanageable with standard hardware and software.
By 2020, the digital data created, replicated and consumed per year will reach 40,000 exabytes. Connected devices, including pocket calculators, personal computers, mobile phones, servers, mainframes and videogame consoles, have contributed more than 10 million instructions per second.
Today, only 0.5% of the data in the digital universe is analyzed, a figure that could rise to 23% if data were tagged and analyzed properly. Moreover, in 2020, 13% (5,208 EB) of all digital data will be stored in the cloud, 24% will be processed or transmitted in the cloud but not stored there, and over 63% (25,030 EB) will remain unprocessed in the cloud.
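As a rough sanity check on these shares, the sketch below applies the three quoted percentages to the 40,000 EB total. The category labels and variable names are illustrative only, and the total is assumed to be exactly 40,000 EB. The rounded percentages give 5,200 EB and 25,200 EB, close to but not identical to the 5,208 EB and 25,030 EB quoted above, presumably because the published percentages are rounded.

```python
# Assumption: the yearly total is exactly 40,000 EB, as quoted in the text.
TOTAL_EB = 40_000

# Rounded shares quoted in the text; the labels are illustrative.
shares_percent = {
    "stored in the cloud": 13,
    "processed or transmitted, not stored": 24,
    "unprocessed": 63,
}

# Integer arithmetic avoids floating-point rounding noise.
volumes_eb = {
    label: TOTAL_EB * percent // 100
    for label, percent in shares_percent.items()
}

for label, volume in volumes_eb.items():
    print(f"{label}: {volume:,} EB")

print("sum of shares:", sum(shares_percent.values()), "%")
```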
Big Data Techniques
The infographic states that a wide variety of techniques have been developed and adapted to visualize, analyze, manipulate and aggregate big data, making data at this volume tractable. These techniques include data fusion, crowdsourcing, time series analysis, A/B testing, network analysis, cluster analysis, ensemble learning, association rule learning, machine learning and many more.
A/B testing compares different options against a control group to determine which treatments improve a given objective. Cluster analysis classifies objects by splitting a diverse group into smaller groups of similar objects. Ensemble learning combines multiple predictive models to obtain better predictive performance than a single model would.
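To make the A/B testing idea concrete, here is a minimal sketch of one common way to compare a treatment against a control group: a two-proportion z-test on conversion counts. The infographic does not prescribe a particular test, so the choice of test, the function name and the sample numbers are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns the z statistic and the two-sided p-value under the
    normal approximation. Hypothetical helper for illustration.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: control converts 200 of 5,000, treatment 260 of 5,000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between treatment and control is unlikely to be due to chance alone. In practice the sample size and significance threshold should be fixed before the experiment starts.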
Network analysis examines the connections between nodes in a network and the strength of those connections. Lastly, machine learning, a branch of artificial intelligence, automatically learns to recognize complex patterns and make intelligent decisions based on data.
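As a small illustration of network analysis, the sketch below counts each node's connections (its degree) and sums the weights of those connections (its strength) in an undirected weighted graph. The edge list, node names and function name are invented for the example. Real analyses would typically use a graph library and richer measures.

```python
from collections import defaultdict


def degree_and_strength(edges):
    """Return (degree, strength) dicts for an undirected weighted graph.

    `edges` is an iterable of (node_u, node_v, weight) tuples.
    Hypothetical helper for illustration.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, weight in edges:
        # Each undirected edge counts once for both of its endpoints.
        degree[u] += 1
        degree[v] += 1
        strength[u] += weight
        strength[v] += weight
    return dict(degree), dict(strength)


# Made-up data: weights could stand for messages exchanged between users.
edges = [
    ("alice", "bob", 3.0),
    ("alice", "carol", 1.0),
    ("bob", "carol", 2.0),
    ("carol", "dave", 5.0),
]

degree, strength = degree_and_strength(edges)
for node in sorted(degree):
    print(node, degree[node], strength[node])
```

Here the node with the most connections is also the one with the highest total strength, but the two measures can diverge: a node with a single heavy connection can outrank a node with several light ones.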