This month’s column is a transcript of a fascinating conversation I had with two executives at Hadoop solution provider MapR: Jim Scott, director of enterprise strategy and architecture, and Jack Norris, senior VP of data and applications. Our subject was microservices and scaling Big Data.
SD Times: So, we know that scaling data can be a hassle. What is the impact of microservices on this issue?
Jack Norris: There are some complementary technologies that really are game-changing in terms of how to take advantage of [microservices]. The underlying data layer is an incredible enabler of microservices. If you’re doing microservices that are ephemeral and don’t require a lot of stateful data, then I think it’s pretty well understood and people can be quite successful with it. But the data issues drive a lot of complexity for the developers and for the administrators, and that’s an area that Jim has championed for quite a while, and his experience as an architect and a developer allowed him to grasp this and see it early on.
Jim Scott: There are two different ways to look at it when you look at the more ephemeral services. If you were to take just kind of a general front-end service that’s handling the primary load of a consumer-facing application, it’s probably not going to be doing a lot of work. It’s probably going to be handing off the workload to other services that are sitting behind it. Those services sitting behind it are the ones that are more likely to fall into this model. So, if you were to imagine companies building websites like Amazon, where it consists of 100-plus different service calls to a bunch of different back-end services, there’s the need to compile all the different information to bring back and build a user experience.
When you start looking at those services, being able to have a linearly scalable back-end data flow is pretty important. As you scale out your services, which are going to be doing some of the work, they need to figure out who the user is and what information is relevant to them, and then they need to give that information back to the front end so it can render the page for the user. The compilation of those different data sets is pretty important. Being able to scale out that tier that is intelligent, where it’s clearly doing some level of computational work, is one thing. But in the same vein, without the data that it depends on, it can’t really do anything.
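The fan-out pattern Scott describes, where a thin front end hands work to back-end services and compiles their answers, might be sketched roughly as follows. This is only an illustration: the three service functions (`fetch_profile`, `fetch_recommendations`, `fetch_cart`) and their payloads are hypothetical stand-ins for real network calls, not anything named in the conversation.

```python
import asyncio


# Hypothetical back-end services; each sleep stands in for a network call.
async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"user_id": user_id, "name": "example user"}


async def fetch_recommendations(user_id: str) -> list:
    await asyncio.sleep(0.01)
    return ["item-1", "item-2"]


async def fetch_cart(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"items": 0}


async def build_page(user_id: str) -> dict:
    """Front-end service: does little work itself, fans out, then compiles."""
    profile, recommendations, cart = await asyncio.gather(
        fetch_profile(user_id),
        fetch_recommendations(user_id),
        fetch_cart(user_id),
    )
    return {"profile": profile, "recommendations": recommendations, "cart": cart}


if __name__ == "__main__":
    print(asyncio.run(build_page("u1")))
```

Because the calls run concurrently, the page takes roughly as long as the slowest back-end service, which is one reason the data layer behind those services matters so much.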
So, as you scale that service up, you will see how much work each instance of that microservice can perform. You know your scaling factors, and then you know based off of how many different services you have what your workloads are on your back-end data platform, and so when you exceed the total capabilities, you just add another server to that cluster. The same goes for whether it’s a streaming capability, a database capability or a file system capability.
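The capacity arithmetic in that answer can be made concrete with a small sketch. Everything here is an assumption for illustration: the `Service` fields, the sample throughput numbers, and the 20 percent headroom are invented, and a real cluster would be sized from measured figures.

```python
import math
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    instances: int                  # running instances of this microservice
    requests_per_instance: float    # measured peak requests/sec per instance
    backend_ops_per_request: float  # data-platform operations per request


def backend_load(services) -> float:
    """Total operations/sec the services place on the data platform."""
    return sum(
        s.instances * s.requests_per_instance * s.backend_ops_per_request
        for s in services
    )


def nodes_needed(total_ops: float, ops_per_node: float, headroom: float = 0.2) -> int:
    """Nodes required if each node is only run up to (1 - headroom) of capacity."""
    usable = ops_per_node * (1 - headroom)
    return math.ceil(total_ops / usable)


if __name__ == "__main__":
    services = [
        Service("profile", instances=4, requests_per_instance=500, backend_ops_per_request=2),
        Service("recommendations", instances=6, requests_per_instance=200, backend_ops_per_request=5),
    ]
    load = backend_load(services)  # 4*500*2 + 6*200*5 operations/sec
    print(load, nodes_needed(load, ops_per_node=3000))
```

When the computed node count exceeds the current cluster size, that is the point at which, in Scott's words, you add another server.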
Imagine for just a moment that you start deploying those microservices. If you are the software engineer, you need to have visibility into your services. That is to say: How fast are they performing? Are there bottlenecks? Are there certain types of requests coming in that are causing errors? So, when you look at performance and application monitoring, you must be able to emit data from these instances of microservices so that you can troubleshoot. In the old troubleshooting model, we typically did that by completely isolating different servers; each server had its own logs, and you could just trace it that way. Trouble is, that doesn’t scale very well from a cost perspective.
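One way to picture "emitting data from these instances" is a wrapper that times each request and sends a structured event to a shared sink instead of a per-server log file. The sketch below is a guess at the shape of such instrumentation, not a description of any MapR API: the `instrumented` decorator, the event fields, and the `sink` callable (here `print`, in practice perhaps a stream or topic) are all hypothetical.

```python
import functools
import json
import time
import uuid

# Hypothetical identifier so events from different instances can be told apart.
INSTANCE_ID = uuid.uuid4().hex[:8]


def instrumented(service: str, request_type: str, sink=print):
    """Wrap a handler so every call emits one JSON event to `sink`."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return handler(*args, **kwargs)
            except Exception as exc:
                status = "error:" + type(exc).__name__
                raise  # the caller still sees the original failure
            finally:
                sink(json.dumps({
                    "service": service,
                    "instance": INSTANCE_ID,
                    "request_type": request_type,
                    "status": status,
                    "latency_ms": (time.perf_counter() - start) * 1000.0,
                }))
        return wrapper
    return decorate


@instrumented("catalog", "lookup")
def lookup(item_id: str) -> dict:
    return {"item_id": item_id}


if __name__ == "__main__":
    lookup("item-1")  # prints one JSON event describing the call
```

With events like these collected in one place, the questions above (how fast, where the bottlenecks are, which request types fail) become queries over a single data set rather than a hunt through each server's logs.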
The great thing is, if you imagine how it was done last year, or five years ago, or 10 years ago, however far back you want to go, they were the equivalent of multipurpose applications, monolithic if you will, and those applications had a long life cycle to be able to get updates into them. And the scaling factors for them were all or nothing.