To explain chaos engineering, we begin with a simple story that sets the stage for our expert whitepaper on the topic.
Imagine that it’s 1952, and Dr. Jonas Salk is about to inject his newly developed polio vaccine into his first human test subject. The patient is told that she will be injected with a small, inactivated sample of the poliovirus. She is told that she will not get sick. Instead, her body will build up an immunity to the virus, protecting her against polio now and in the future.
The patient is understandably nervous and may be wondering whether the health risks outweigh the possible benefits. However, she ultimately decides to be vaccinated. And, of course, she is much better off in the long term than those who fearfully refuse the new treatment.
This is a metaphorical illustration of the basis of chaos engineering. Computer systems have limits and multiple possible points of failure. By injecting a system with conditions that are likely to cause disruption, the disaster recovery team can identify areas of vulnerability and weakness. The team can then determine the step-by-step solutions and protocols that will make the computer systems more resilient and fault-tolerant.
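The inject-and-observe loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the method from the whitepaper or any real chaos tool: the `Replica`, `Service`, `success_rate`, and `run_experiment` names, the in-memory "service", and the 0.99 threshold are all assumptions made up for this example.

```python
import random


class Replica:
    """A hypothetical service instance that either answers or is down."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """Sends each request to the first replica that answers (simple failover)."""

    def __init__(self, replicas):
        self.replicas = replicas

    def handle(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # try the next replica
        raise RuntimeError("no healthy replica available")


def success_rate(service, n_requests=100):
    """Steady-state metric: fraction of requests that get an answer."""
    ok = 0
    for i in range(n_requests):
        try:
            service.handle(f"req-{i}")
            ok += 1
        except RuntimeError:
            pass
    return ok / n_requests


def run_experiment(service, rng, threshold=0.99):
    """Inject one failure and report whether the steady state held."""
    baseline = success_rate(service)
    victim = rng.choice(service.replicas)
    victim.healthy = False  # the injected fault (assumed: one replica outage)
    try:
        during = success_rate(service)
    finally:
        victim.healthy = True  # always roll the fault back
    return {
        "victim": victim.name,
        "baseline": baseline,
        "during": during,
        "resilient": during >= threshold,
    }
```

Calling `run_experiment(Service([Replica("a"), Replica("b"), Replica("c")]), random.Random(0))` on this toy reports the service as resilient because the other replicas absorb the outage, while the same experiment on a single-replica service exposes the weakness that a recovery plan would need to address.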
Where many companies fail is in adapting and modifying existing disaster recovery plans (DRPs) to incorporate these newly discovered cause-and-effect findings. In fact, many managers may be surprised to learn that effective, well-tested chaos engineering principles can streamline the entire DRP exercise process while rendering certain portions of the documentation obsolete.
As a direct result, business continuity of services can strengthen, customer service ratings can climb, and corporate profitability can improve.
The whitepaper is available for free below or for purchase on Amazon Kindle.
What's in our Chaos Engineering whitepaper?
Our expert whitepaper contains 10 key sections:
- Introduction. In the introduction, we lay out a brief argument as to why this approach is important – and how it strengthens business continuity & disaster recovery strategies.
- Chaos Monkey, the Simian Army, and other Chaos Engineering Tools. In this section, we build on the introduction and illustrate effective disaster recovery management using the real-world example of Netflix and its 2010 implementation of chaos engineering as it moved its systems into the cloud.
- Chaos Engineering Exercise Strategy. In this section, we explain the strategy for testing in a chaos engineering environment – with detailed explanations of the types of testing that can be conducted.
- Chaos Engineering vs. Regular Testing. In this section, we compare the approaches used in chaos engineering with more traditional disaster recovery testing to highlight the differences.
- Benefits of Chaos Engineering. In this section, we explain our view of the benefits of this approach and how they may apply to your organization.
- Chaos Engineering Testing: Development and Implementation. In this section, we provide a detailed explanation of how to develop and implement chaos-based testing within your organization.
- Chaos Engineering: Best Practices. In this section, we discuss several best practices that should be considered as you evaluate how this approach may best fit into your disaster recovery strategies.
- Getting Started: Basic Examples of Chaos Engineering Experiments. In this section, we provide a couple of examples of tests that may be helpful in designing your first experiment.
- The Netflix Method: Real-World Chaos Testing in Production Environments. In this section, we provide a brief overview of Netflix’s methodology for conducting chaos testing in its cloud-based production environments.
- Conclusion. In the conclusion, we tie the whitepaper together and provide a brief closing commentary. A bibliography is also included with links to the primary sources for this whitepaper.
Click to get the FREE report!
About the Author
Our report on Chaos Engineering was authored by Bryghtpath Principal & CEO Bryan Strawser.
Learn more about Bryan and his background in disaster recovery, business continuity, and crisis management in his biography.