Much of the recent big data experience has been a bare-metal affair, meaning Hadoop has run largely on non-virtualized servers. That could change as containers and microservices gain traction in application development circles.
Both containers and microservices break up monolithic application code into more finely grained pieces. That streamlines development and makes for easier testing, which is one of the keys to more flexible application deployment and code reuse.
It is still early for such techniques to be applied to big data, but for new jobs like data streaming, microservices show promise. For a technology manager at a leading European e-commerce company, the microservices approach simplifies development and enables code reuse.
With microservices, "you can very much economize on what you're doing," according to Rupert Steffner, chief platform architect for business intelligence systems at Otto GmbH, a multichannel retailer based in Hamburg, Germany. He goes further: For some types of applications, not using microservices "is stupid. You're building the same functionality over and over again."
The types of applications Steffner is talking about are multiple artificial intelligence (AI) bots that run various real-time analytics jobs on the company's online retail site. Otto uses a combination of microservices, Docker containers and stream processing technologies to power these AI bots.
Containers and microservices, oh my
Cloud computing has been one of the drivers edging Hadoop, Spark and other big data technologies toward virtualization, containers and microservices. There is still much infrastructure to build out, but companies are working on technologies to ease the evolution.
"Hadoop was largely run on bare metal, but it runs also on virtual machines; for example, on the Amazon cloud and Azure cloud and via OpenStack. Now it is moving to containers," said Tom Phelan, co-founder and chief architect at BlueData Software Inc., maker of a platform that automatically spawns Hadoop or Spark clusters.
"It used to be that performance of Hadoop clusters on bare metal was better, but that is changing," he said. Containers need to gain maturity, he acknowledged, adding that Hadoop, as it was originally designed, is not a microservices-style architecture.
Santa Clara, Calif.-based BlueData recently updated its software to improve container support, rolling out automated Kerberos setups for Hadoop clusters and Linux privileged access management tools.