Federal agencies, like organizations in virtually every sector, are handling more data than ever before.
According to Cisco Systems, global IP traffic is expected to more than double in only a few years, growing to a monthly per-capita total of 25GB by 2020 (up from 10GB per capita in 2015).
This data boom presents a massive opportunity to find new efficiencies, detect previously unseen patterns and increase levels of service to citizens, but Big Data analytics can’t exist in a vacuum. Because of the enormous quantities of data involved in these solutions, they must incorporate a robust infrastructure for storage, processing and networking, in addition to analytics software.
While some organizations already have the capacity in place to absorb Big Data solutions, others will need to expand their resources to accommodate these new tools, or add further capacity to maintain a surplus of resources. This truly is a situation in which the chain is only as strong as its weakest link: if storage and networking are in place but the processing power isn’t there, or vice versa, a Big Data solution simply won’t be able to function properly.
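The weakest-link point can be illustrated with a rough sketch. The function name and the capacity figures below are hypothetical and chosen only for illustration. The sketch assumes that each tier's capacity can be expressed as a sustained rate in gigabytes per hour, which simplifies how real workloads behave.

```python
def pipeline_throughput(storage_gb_per_hr, network_gb_per_hr, processing_gb_per_hr):
    """Return the limiting tier and the rate the whole pipeline can sustain.

    Simplifying assumption: data flows through all three tiers in sequence,
    so the slowest tier caps the overall rate.
    """
    tiers = {
        "storage": storage_gb_per_hr,
        "network": network_gb_per_hr,
        "processing": processing_gb_per_hr,
    }
    # The tier with the smallest capacity is the weakest link.
    bottleneck = min(tiers, key=tiers.get)
    return bottleneck, tiers[bottleneck]


# Hypothetical figures: ample storage and networking, limited processing.
tier, rate = pipeline_throughput(
    storage_gb_per_hr=500,
    network_gb_per_hr=400,
    processing_gb_per_hr=120,
)
print(f"Bottleneck: {tier} at {rate} GB/hour")
```

In this hypothetical case, adding storage or network capacity would not raise throughput at all, because processing remains the limiting tier.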
Often, organizations already possess enough storage in-house to support a Big Data initiative. (After all, the data that will be processed and analyzed via a Big Data solution is already living somewhere.) However, agencies may decide to invest in storage solutions that are optimized for Big Data. While not necessary for all Big Data deployments, flash storage is especially attractive due to its performance advantages and high availability.
Large users of Big Data — companies such as Google and Facebook — utilize hyperscale computing environments, which are made up of commodity servers with direct-attached storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered network-attached storage (NAS).
Cloud storage is an option for disaster recovery and backups of on-premises Big Data solutions. While the cloud is also available as a primary source of storage, many organizations — especially large ones — find that the expense of constantly transporting data to the cloud makes this option less cost-effective than on-premises storage.
Servers intended for Big Data analytics must have enough processing power to support this application.