Technology is changing how we interact with data and computation. Businesses are increasingly using big data to analyse information in minutes rather than days, giving decision-makers valuable real-time information. Big data refers to datasets so large that traditional data-analysis tools cannot process them within a reasonable time frame.
Startups and enterprises are also collecting more data than ever to gain business insights, improve process efficiencies and better target customers. According to the International Data Corporation, the amount of data enterprises create and store doubles every 18 months. Consequently, businesses are struggling to keep up, often swimming in more data than they can analyse and action.
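To put the IDC figure in perspective, a volume that doubles every 18 months (1.5 years) grows exponentially. Taking D_0 as today's data volume and t as elapsed time in years:

```latex
D(t) = D_0 \cdot 2^{t/1.5}, \qquad D(3) = D_0 \cdot 2^{2} = 4D_0, \qquad D(6) = D_0 \cdot 2^{4} = 16D_0
```

If that rate holds, an enterprise would be storing roughly four times as much data in three years and sixteen times as much in six.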
Below, we predict which data trends are likely to emerge in 2017 and how they will affect business: the Internet of Things (IoT), Hadoop and Apache Spark, machine learning and cybersecurity.
The Internet of Things (IoT) is the idea of having everything connected to the Internet: smartphones, vehicles, buildings, household appliances, micro-chipped animals and more. The IoT offers significant benefits for both personal and professional use. You could connect your alarm clock to your coffee machine and toaster so that they communicate with one another and have breakfast ready as you wake up. The diagram below (Figure 1) illustrates how the IoT could change technology inside the home. On a professional level, companies could use proximity-based advertising or track goods within a supply chain.
Interconnectivity produces more data, which makes processes more transparent. For example, a distribution channel has many areas that could be optimised. You could tag a product and record location data along its route from the factory to the warehouse and then to stores. This data would give valuable insight into the inefficiencies of the supply chain, allowing you to draw reliable, data-driven conclusions instead of relying on guesswork and intuition.
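As a minimal illustration, assuming each tagged product produces a timestamped scan at the factory, the warehouse and the store, the Python sketch below computes the average time spent on each leg of the route, which points directly at the slowest part of the chain. The product IDs, locations and times are invented for the example, not real tracking data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical tracking log: (product_id, location, scan_time).
scans: List[Tuple[str, str, datetime]] = [
    ("A1", "factory",   datetime(2017, 1, 2, 8, 0)),
    ("A1", "warehouse", datetime(2017, 1, 3, 14, 0)),
    ("A1", "store",     datetime(2017, 1, 6, 9, 0)),
    ("B7", "factory",   datetime(2017, 1, 2, 10, 0)),
    ("B7", "warehouse", datetime(2017, 1, 3, 10, 0)),
    ("B7", "store",     datetime(2017, 1, 5, 10, 0)),
]

def average_leg_hours(scans: List[Tuple[str, str, datetime]]) -> Dict[str, float]:
    """Average hours spent between consecutive scans, grouped by route leg."""
    by_product = defaultdict(list)
    for pid, loc, t in scans:
        by_product[pid].append((t, loc))
    legs = defaultdict(list)
    for route in by_product.values():
        route.sort()  # put each product's scans in chronological order
        for (t0, src), (t1, dst) in zip(route, route[1:]):
            legs[f"{src}->{dst}"].append((t1 - t0).total_seconds() / 3600)
    return {leg: mean(hours) for leg, hours in legs.items()}

legs = average_leg_hours(scans)
slowest = max(legs, key=legs.get)  # the leg with the highest average duration
print(legs)
print("slowest leg:", slowest)
```

With real data covering thousands of products, the same grouping would show whether a delay is a one-off or a systematic bottleneck.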
As firms better understand the enormous value of collecting and analysing data from consumers, more will invest in IoT technology. As a direct consequence, businesses will accumulate huge stores of data that they will need to analyse efficiently, and the infrastructure for processing big data will continue to mature in turn.
There are two main challenges with big data: storage and processing. The market leader for both is Hadoop (formally Apache Hadoop), an open-source framework for storing and processing large datasets across clusters of computers. As such, it has become almost synonymous with big data. Enterprises have widely adopted Hadoop over the last few years, and many third-party applications have emerged for systems running it. Going forward, however, most enterprises will shift their focus from adopting Hadoop to putting big data to good use.
To understand why storage is a challenge, consider a data file so large that no single hard drive can hold it. The only way to store the file is to break it up and save the pieces across multiple hard drives, an approach known as distributed storage. Hadoop provides this through the Hadoop Distributed File System (HDFS), so enterprises can store enormous files efficiently on the platform. In the past, businesses had to process such data with custom software before storing it, a far more expensive option.
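A minimal sketch of the idea in Python: a file's bytes are cut into fixed-size blocks, and each block is copied to several nodes so the file survives the loss of one machine. The node names, the tiny block size and the round-robin placement are illustrative assumptions. HDFS by default uses far larger blocks (128 MB in recent versions), keeps three replicas of each block, and chooses locations with a rack-aware policy rather than simple round-robin.

```python
from typing import Dict, List

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks: int, nodes: List[str], replicas: int) -> Dict[int, List[str]]:
    """Assign each block to `replicas` distinct nodes using simple round-robin.

    This placement rule is a hypothetical simplification, not HDFS's actual policy.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for b in range(num_blocks):
        start = b % len(nodes)  # rotate the starting node for each block
        placement[b] = [nodes[(start + r) % len(nodes)] for r in range(replicas)]
    return placement

# Usage: a 1,000-byte "file", 256-byte blocks, four hypothetical nodes.
data = b"x" * 1000
blocks = split_into_blocks(data, 256)
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(len(blocks), nodes, replicas=3)
print(len(blocks), placement)
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block available.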
Processing big data is also a challenge. Traditionally, data had to be transferred from the database to a separate computer for analysis, much as a web page is downloaded from the Internet to a browser. However, moving large amounts of data over a network is extremely slow. Rather than moving the data to the analytics software, Hadoop moves the analytics code to the machines that hold the data, which makes processing much faster.
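The sketch below simulates this "move the code to the data" idea in a single Python process, using the map-and-reduce pattern behind Hadoop's MapReduce engine. Each hypothetical node counts words in the block it already stores, and only the small per-node counts travel to a coordinator to be merged. In a real cluster, `local_count` would execute on the remote machine; here it is an ordinary function call, and the node names and log contents are made up.

```python
from collections import Counter
from typing import Dict

# Hypothetical cluster: each node already stores one block of a large log file.
node_blocks: Dict[str, str] = {
    "node1": "error ok ok error timeout",
    "node2": "ok ok ok error",
    "node3": "timeout ok error error",
}

def local_count(block: str) -> Counter:
    """Map step: runs where the block lives, so only this small summary leaves the node."""
    return Counter(block.split())

def run_job(blocks: Dict[str, str]) -> Counter:
    """Reduce step: the coordinator merges per-node summaries instead of pulling raw data."""
    total = Counter()
    for node, block in blocks.items():
        total.update(local_count(block))  # simulated remote execution on `node`
    return total

print(run_job(node_blocks))
```

The network traffic scales with the size of the summaries (a handful of word counts per node) rather than with the size of the raw blocks, which is where the speed-up comes from when blocks are gigabytes in size.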
Recently, another project called Apache Spark has drawn attention as an alternative to Hadoop's processing engine, MapReduce.