The official definition, from the Principles of Chaos Engineering website, describes Chaos Engineering as “the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent conditions in production”. Put in simpler terms, Chaos Engineering breaks parts of a software system on purpose, to identify failures before they become outages that impact the system and, eventually, the business.
The goal of Chaos Engineering, like that of Performance Testing, is to improve the stability and resilience of your systems. The two practices differ in their approach: rather than evaluating only the impact of load-related stress, Chaos Engineering tests a broader range of failure scenarios. For example, what would happen to my system if one of my third-party service providers released a defect that broke its service? The two practices are complementary rather than disconnected.
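The third-party provider question can be made concrete with a small fault-injection sketch. Everything in it is hypothetical and only illustrative: `fetch_rate` stands in for a real provider call, `faulty` is a toy fault injector, and `price_in` is an imaginary piece of business logic with a fallback. It is not how any specific chaos tool works.

```python
from typing import Callable, Tuple


class ProviderError(Exception):
    """Raised when the (simulated) third-party provider fails."""


def fetch_rate(currency: str) -> float:
    """Hypothetical third-party call; stands in for a real HTTP request."""
    return {"EUR": 1.0, "USD": 1.1}[currency]


def faulty(func: Callable) -> Callable:
    """Toy fault injector: wraps a dependency so that every call fails."""
    def wrapper(*args, **kwargs):
        raise ProviderError(f"injected fault in {func.__name__}")
    return wrapper


def price_in(currency: str, amount_eur: float,
             provider: Callable[[str], float] = fetch_rate,
             fallback_rate: float = 1.0) -> Tuple[float, bool]:
    """Convert an amount, degrading to a fallback rate if the provider fails.

    Returns (converted_amount, degraded), where degraded is True when the
    fallback rate was used instead of the provider's answer.
    """
    try:
        return amount_eur * provider(currency), False
    except ProviderError:
        return amount_eur * fallback_rate, True


# Normal conditions: the provider answers.
healthy = price_in("USD", 100.0)

# Chaos experiment: the provider is broken on purpose.
broken = price_in("USD", 100.0, provider=faulty(fetch_rate))

print("healthy:", healthy)
print("broken: ", broken)
```

In this sketch the system survives the injected fault by returning a degraded answer instead of crashing. A real experiment would inject the fault at the network or infrastructure level and observe production-like metrics.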
Companies are moving to the cloud or rearchitecting their systems to be cloud-native. This makes their systems more distributed and increases the likelihood of unplanned failures or unexpected outages. Sooner or later, failures will happen, and it is up to you to anticipate them and mitigate their business impact. This is not an easy task.
Chaos Engineering aims to simplify this task by helping you identify potential faults caused by infrastructure, dependencies, configurations, and processes.
Common use cases for Chaos Engineering could be:
Chaos Engineering increases the reliability of systems by:
As previously mentioned, Chaos Engineering enables companies to compare what they think will happen with what actually happens in their systems. Running a chaos experiment means applying the scientific method to IT systems, not just launching a random attack against a random system.
Chaos Engineering is usually implemented with a stepwise approach:
If not well planned and organized, Chaos Engineering may fail to deliver the expected benefits or may even be dangerous. The general recommendation is to start small (by limiting the so-called “blast radius”) and to gain confidence gradually with the tools needed to run the experiments and analyze their data before expanding their scope.
Starting small, fixing what does not work, and repeating the experiment quickly adds up. This way, systems become better at handling real-world events that can’t be controlled or prevented, which is the goal of Chaos Engineering.
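The idea of starting with a small blast radius and widening it only while the system behaves as expected can be sketched as a simple loop. This is a toy simulation under stated assumptions, not a real chaos tool: the system under test is faked by `handle_request`, the assumed 90% retry recovery rate is invented for illustration, and all names (`Result`, `run_experiment`, `ramp_up`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    blast_radius: float    # fraction of requests exposed to the fault
    error_rate: float      # observed fraction of failed requests
    hypothesis_held: bool  # True if error_rate stayed within the limit


def handle_request(fault_active: bool, rng: random.Random) -> bool:
    """Toy system under test. Returns True when the request succeeds."""
    if not fault_active:
        return True
    # Assumption: a retry recovers 90% of the requests hit by the fault.
    return rng.random() < 0.9


def run_experiment(blast_radius: float, max_error_rate: float,
                   requests: int = 10_000, seed: int = 0) -> Result:
    """Inject the fault into a fraction of requests and check the hypothesis
    that the overall error rate stays at or below max_error_rate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(requests):
        fault_active = rng.random() < blast_radius
        if not handle_request(fault_active, rng):
            failures += 1
    error_rate = failures / requests
    return Result(blast_radius, error_rate, error_rate <= max_error_rate)


def ramp_up(radii: List[float], max_error_rate: float) -> List[Result]:
    """Run experiments with a growing blast radius; stop at the first one
    where the hypothesis is violated."""
    results = []
    for radius in radii:
        result = run_experiment(radius, max_error_rate)
        results.append(result)
        if not result.hypothesis_held:
            break  # halt here: fix the weakness, then repeat the experiment
    return results


results = ramp_up([0.01, 0.05, 0.25, 1.0], max_error_rate=0.01)
for r in results:
    status = "held" if r.hypothesis_held else "VIOLATED"
    print(f"blast radius {r.blast_radius:.2f}: "
          f"error rate {r.error_rate:.4f} -> hypothesis {status}")
```

With these illustrative numbers, the ramp-up stops at a blast radius of 0.25, before the fault is ever applied to all traffic. In practice, the hypothesis would be expressed against real service-level metrics and the halt condition wired to an automated rollback.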
Moviri constantly evolves its services catalog to allow its customers to adopt cutting-edge technology in the IT space.
We are pleased to announce that our Performance Engineering team has added Chaos Engineering to its service portfolio.
Chaos Engineering complements Moviri’s Design & Validation offering, further increasing the value that Moviri Performance Engineering services bring to customers. Moviri’s Chaos Engineering proposition has strong synergies with the existing services, providing a unified framework that can address a wide set of use cases.
As explained above, a chaos experiment that isn’t well planned and organized may cause more harm than good. The experiment needs to follow a strict plan, and you need to start small. This is where the more than 20 years of expertise of Moviri’s Performance Engineering team comes into play.
Here are some interesting use cases in which Moviri experts can help businesses.
We can help you implement a state-of-the-art performance engineering framework to deliver the best performance to your business.
Don’t wait any longer. Improve your service resilience now!