I want to quickly paint the picture in my head about distributed systems (maybe it’s a sloppy picture, but nevertheless). When we talk about microservices, we talk about using them as a vehicle for building Agile IT systems, that is, systems that allow a business to change more quickly, build new functionality, experiment, and stay ahead of its disruptors and competition (startups, etc.).
When autonomous systems interact with each other to provide business agility, we also need to consider what happens when parts of these systems fail and how a system reacts to overcome failure. A central prerequisite to being able to build Agile, failure-tolerant systems is autonomy. Autonomous systems can evolve independently from each other because they tend to shed dependencies on other systems, teams, and processes. Changes to service A shouldn’t force service B to change, nor should they cause any other ripple effects. If service A (on which service B depends) goes down, service B should not just blow up.
Where do we have examples of this autonomy in other systems outside of microservices? Well, if you follow the real reasons why microservices are a success, then you know it’s not the technology per se that enables the Netflixes and Amazons of the world to be successful with microservices: it’s the organizational structure.
Some examples of these same types of Agile systems include open-source communities, cities, stock markets, ant colonies, flocks of birds, and countless others. They can evolve, react, and even continue on in the face of massive failure. In fact, they’re a well-studied bunch in the field of Complex Adaptive Systems theory. The underlying common themes between these systems? Purpose, autonomy, and reaction to their environments. These autonomous agents react to events.
When something happens, an autonomous agent (ant, person, service) can do something or do nothing, but it’s these events that drive the behavior in complex adaptive systems. Think about how you (as an autonomous person) do things throughout the day. You wake up, you dress based on the temperature (an event or fact), and you get in your car and drive to work (stopping at stop lights, avoiding the people driving erratically, and partaking in other events). These are all responses to events. You get emails in your inbox, you respond. You get a text from your wife to pick up dinner on the way home, etc. We live our entire life responding to events. IT systems built on events can be made to be equally autonomous, scalable, and resilient to failures.
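To make the idea of agents reacting to events a little more concrete, here is a minimal sketch of an in-process event bus. Everything in it (`EventBus`, the `temperature_observed` event name, the handler functions) is a hypothetical name I made up for illustration, and a real system would use a message broker across the network rather than a Python list. The point is only that the publisher states a fact and each agent decides on its own what to do, so one agent failing does not stop the others.

```python
class EventBus:
    """Toy in-process event bus (illustrative only, not a real broker)."""

    def __init__(self):
        self._handlers = {}   # event type -> list of handler callables
        self.errors = []      # failures recorded instead of propagated

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know or care who is listening.
        for handler in self._handlers.get(event_type, []):
            try:
                handler(payload)
            except Exception as exc:
                # A failing agent is isolated; the others still get the event.
                self.errors.append((event_type, exc))


def demo():
    bus = EventBus()
    decisions = []

    def choose_clothes(event):
        # An autonomous agent reacting to a fact about its environment.
        decisions.append("coat" if event["celsius"] < 10 else "t-shirt")

    def broken_agent(event):
        raise RuntimeError("this agent is down")

    bus.subscribe("temperature_observed", broken_agent)
    bus.subscribe("temperature_observed", choose_clothes)
    bus.publish("temperature_observed", {"celsius": 4})
    return decisions, len(bus.errors)
```

Calling `demo()` publishes a single event: the broken agent fails, the failure is recorded, and the other agent still makes its decision.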
In most distributed systems implementations I’ve seen, we tend to extend the notion of building systems within a single address space to building across an unreliable network. This is a bad idea for many reasons, but many times, it appears to be the simpler approach. We tend to invoke remote objects to prod them to do something, or we call a remote service to look up data. Maybe the tax service is the canonical location for anything to do with tax calculations. If we’re a shopping cart service, we need to calculate the final price for the items in a shopping cart during checkout. So, the shopping cart service calls the pricing service. The pricing service may also call the tax service to do some other adjustments to the price based on shipping location (country, state, city, etc.). The tax service may call the catalog service (taxes may be different depending on product). The shipping service may also call the inventory service, etc.
We may end up with these long strings of calls (which may be okay in a monolithic application, where all these objects live in the same address space). We’re following the authority pattern of accessing data: we call the service that has authority over the data.
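The chain of calls described above can be sketched roughly as follows. This is a simplified, in-process stand-in for what would be network calls; the class names, method names, prices, and tax rates are all assumptions made up for the illustration, and `ServiceDown` simulates an unreachable remote service.

```python
class ServiceDown(Exception):
    """Raised when a (simulated) remote service is unreachable."""


class CatalogService:
    def __init__(self):
        self.available = True  # flip to False to simulate an outage

    def product_category(self, sku):
        if not self.available:
            raise ServiceDown("catalog")
        return {"book-1": "books"}.get(sku, "general")


class TaxService:
    def __init__(self, catalog):
        self.catalog = catalog

    def tax_rate(self, sku, state):
        # The tax service asks the authority on products: the catalog.
        category = self.catalog.product_category(sku)
        return 0.0 if category == "books" else 0.07  # made-up rates


class PricingService:
    def __init__(self, tax):
        self.tax = tax
        self.base_prices = {"book-1": 10.0, "lamp-1": 20.0}  # made-up prices

    def final_price(self, sku, state):
        # The pricing service asks the authority on taxes: the tax service.
        rate = self.tax.tax_rate(sku, state)
        return round(self.base_prices[sku] * (1 + rate), 2)


class ShoppingCartService:
    def __init__(self, pricing):
        self.pricing = pricing

    def checkout_total(self, skus, state):
        # Cart -> pricing -> tax -> catalog: one long synchronous chain.
        return round(sum(self.pricing.final_price(s, state) for s in skus), 2)


def build_chain():
    catalog = CatalogService()
    cart = ShoppingCartService(PricingService(TaxService(catalog)))
    return cart, catalog
```

With everything up, `cart.checkout_total(["book-1", "lamp-1"], "AZ")` walks the whole chain and returns a total. Set `catalog.available = False` and the same call raises `ServiceDown` all the way up in the shopping cart, even though the cart never talks to the catalog directly.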