You need to implement a fault-tolerant, scalable, soft, real-time system with requirements for high availability. It has to be event driven and react to external stimulus, load, and failure. It must always be responsive. You have heard many success stories that suggest Erlang is the right tool for the job. And indeed it is—but while Erlang is a powerful programming language, on its own, it’s not enough to group these features together and build complex reactive systems. To get the job done correctly, quickly, and efficiently, you also need middleware, reusable libraries, tools, design principles, and a programming model that tells you how to architect and distribute your system.
In this tutorial, we will look at the steps needed to design scalable and resilient systems. The lessons learnt apply to Erlang, but are in fact technology agnostic and could be applied to most stacks, including Scala/AKKA, Elixir/OTP and others.
We will focus on:
Distribution: This section covers how to break up your system into manageable microservices. How do you collect these micro services into nodes, which together form distributed architectural patterns, giving you your end-to-end system? What network connectivity do you use to let them communicate with each other?
Interfaces and state: This section covers how you define your service interfaces. What data and state do you distribute across your nodes, clusters, and data centers? And if requests fail across nodes, what is your recovery strategy?
Availability: You need at least two computers to make a fault-tolerant system. When dealing with fault tolerance, you have to make decisions about resilience and reliability. This section covers techniques needed to make sure your system never fails and the trade-offs you need to make in your design.
Scalability: When you picked your distributed pattern, decided how to distribute your data, and made choices on fault tolerance, resilience, and reliability, you also made trade-offs on scalability. This section covers the decisions you have to make and how they affect scalability, as well as how to deal with capacity planning, load regulation, and back pressure.
Visibility: This section covers the importance of visibility on both a business level and a system level. To achieve five-nines availability, you need preemptive support and automation. To trigger automation, you need to know the state of your system and be able to react to it as quickly as possible. This includes metrics, alarms, and notifications.