In the US, the holiday season accounts for anywhere between 30% and 60% of annual e-commerce sales, depending on the merchandise category. As the share of e-commerce in retail sales continues to grow rapidly worldwide, it is hard to overstate how critical seasonal peak shopping events are for the retail industry. Black Friday, and similar events, can make or break your year’s business performance.
In this article, we are going to talk about the importance of performance engineering, and specifically, performance testing, to guarantee that e-commerce websites, payment apps, and digital logistics infrastructure operate without disruption during peak season.
The COVID-19 pandemic “increased online sales’ share of total retail sales from 16% to 19% in 2020,” according to estimates in a UNCTAD report. Forced to spend more time at home, unable to travel or dine out at restaurants, consumers put more of their discretionary spending into physical goods delivered to their homes. This is an acceleration of trends that were already at play, and there is initial evidence that the acceleration forced by COVID-19 is, at least in part, permanent, even as economies reopen.
Running a modern, successful e-commerce business requires complex capabilities and sound infrastructure. A priority among them is ensuring a variety of website components perform as expected.
Product search and detail view functionalities rely on fast database calls and static resources loaded from cached data and CDNs. The shopping cart experience involves tasks such as checking and updating SKU stock and looking up shipping methods and their costs through external API calls. For payment processing, the application must interact reliably with different payment systems, update the stock, and trigger notifications.
And these are only a few of the components that need to be tested and optimized to deliver the best performance and availability, even under stress.
Architecture matters too. The level of complexity and the orchestration required increase when e-commerce applications migrate from monolithic architectures to microservices-based architectures, whose components can run in the cloud or on-premises.
Anybody who has ever bought anything online is likely to have experienced some technical or performance issue that prevented them from completing a transaction. While outright failures are increasingly rare, at least during normal operations, we continue to see “site/app down” reports at times of peak demand.
Even more frequently, performance issues that do not necessarily break an application, but simply slow it down, continue to exact a heavy toll on e-commerce businesses.
According to research by Google, if a website takes more than 3 seconds to load, 40% of customers will abandon it. On the other hand, research by Amazon found that revenue increased by 1% for every 100ms of page speed improvement.
With the total loss of potential sales due to abandoned shopping carts estimated at $18 billion/year, retailers can ill afford performance problems that frustrate shoppers even further.
Performance and availability are absolutely crucial during peak season events like Black Friday and Cyber Monday. According to Adobe Analytics, in 2020, on Black Friday, online sales in the US reached $9 billion, a 22% increase from 2019. Also, during last year’s Thanksgiving, online shopping set a record of $5.1 billion, 21.5% higher than the previous year.
How can you then ensure that your e-commerce apps won’t crash or underperform under peak-season load?
Companies can take different approaches to safeguard the performance of their e-commerce operations, especially during peak season events.
Traditionally, companies implement a reactive approach. In essence, they manage system performance by reacting as fast as possible when problems are detected and taking action to fix them, either manually (e.g. bug fixing, configuration changes) or automatically (e.g. autoscaling).
This approach relies heavily on monitoring and alerting systems. Depending on the level of detection accuracy and automation, operations and development teams may or may not be notified in time to intervene.
Every company should implement a good monitoring and alerting system, and the reactive approach is useful for managing day-to-day performance problems. It is suboptimal, however, when dealing with out-of-the-ordinary problems like those that typically hit applications during peak season events. Traffic and transaction loads are by definition exceptional, and business-as-usual countermeasures are often too slow to prevent service degradation and, ultimately, lost business.
Organizations must invest resources into understanding beforehand the different scenarios that they may have to face during peak season, so that their systems can robustly handle extreme load peaks, without performance degradation, and with the appropriate infrastructure sizing. This is one of the main goals of the proactive approach.
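The automatic reaction mentioned above, autoscaling, usually boils down to a proportional rule. As a minimal sketch, assuming CPU utilization is the only signal and that the function name, target, and replica limits are illustrative choices rather than anything prescribed here, the rule can be written as follows. It mirrors the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler, but it is not tied to any specific platform.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=2, max_replicas=20):
    """Return how many replicas a proportional autoscaling rule would request.

    All names and default limits are hypothetical, chosen for illustration.
    """
    # Scale the replica count by the ratio of observed to target utilization.
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas running at 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The clamp is the weak point during peak events: if real demand needs more than `max_replicas`, the rule reacts correctly but still cannot keep up, which is why limits and sizing have to be validated before the event.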
The proactive approach builds on monitoring and response capabilities, by introducing some important preliminary activities that have proven to be far more effective in managing problems during peak season events.
Data gathered from previous years and similar events, combined with data describing expected demand in the upcoming events, allows performance engineers to model load patterns, understand what led to problems in the past, and predict possible issues.
Based on the predictions derived from this analysis, performance engineers can make assumptions about IT infrastructure sizing and about the configuration of the e-commerce application and its components in order to manage the expected load.
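As a rough illustration of how such sizing assumptions can be derived, the sketch below projects this year's peak from last year's measured peak and an expected growth rate, then adds headroom. The function name, the parameters, and all numbers are hypothetical, and the linear capacity model (each instance handles a fixed request rate) is a simplifying assumption that a performance test would need to confirm.

```python
import math


def estimate_instances(last_peak_rps, expected_growth, per_instance_rps,
                       headroom=0.3):
    """Estimate the instance count needed for the upcoming peak.

    last_peak_rps:    peak requests per second measured at the previous event
    expected_growth:  forecast growth as a fraction (0.25 means +25%)
    per_instance_rps: sustainable requests per second for one instance
    headroom:         safety margin on top of the forecast, as a fraction
    """
    expected_peak = last_peak_rps * (1 + expected_growth)
    required = expected_peak * (1 + headroom) / per_instance_rps
    # Round up: a fraction of an instance still requires a whole one.
    return math.ceil(required)


if __name__ == "__main__":
    # Hypothetical inputs: 1,200 rps last year, 25% growth, 200 rps per instance.
    print(estimate_instances(1200, 0.25, 200))
```

The result is only a starting hypothesis. Its main value is that it makes the assumptions explicit, so that testing can prove or disprove each of them.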
How can performance engineers evaluate if their assumptions and sizing are correct? Thanks to performance testing, engineers simulate, in a controlled environment and following proven methodologies, the conditions that should occur during the peak load event, stress-testing the system and monitoring how it responds.
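To make the idea of simulating load concrete, here is a minimal sketch of a load generator built only on the Python standard library. The `fake_checkout` function is a hypothetical stand-in for a real request to the system under test, and its 2 ms delay and failure pattern are invented for illustration. In a real project a dedicated load testing tool would take the place of this harness.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_checkout(request_id):
    """Hypothetical stand-in for a real request to the system under test."""
    time.sleep(0.002)  # pretend the service needs about 2 ms to respond
    return request_id % 50 != 49  # simulate one failure every 50 requests


def timed_call(request_id):
    """Run one request and return (latency in seconds, success flag)."""
    start = time.perf_counter()
    ok = fake_checkout(request_id)
    return time.perf_counter() - start, ok


def run_load(total_requests, concurrency):
    """Issue total_requests calls using `concurrency` parallel workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(total_requests)))


if __name__ == "__main__":
    results = run_load(total_requests=200, concurrency=20)
    failures = sum(1 for _, ok in results if not ok)
    slowest = max(latency for latency, _ in results)
    print(f"requests={len(results)} failures={failures} slowest={slowest:.4f}s")
```

Recording a latency and a success flag for every request, rather than only an average, is what later allows the results to be checked against percentile and error rate targets.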
This step is critical to the process and relies on a range of well-established techniques. Which one to use is a decision that depends on the test’s goal. For example, a stress test is performed when the goal is to find an upper bound for the load a system can manage, an endurance test is appropriate when potential memory leaks need to be verified, and load testing ensures that the expected heavy load patterns can be effectively managed by the system.
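The three test types above differ mainly in the shape of the load they apply over time. The sketch below expresses each one as a list of concurrent users per minute. The function names, parameters, and numbers are hypothetical, and real tools describe these shapes in their own configuration formats.

```python
def load_profile(peak_users, ramp_minutes, hold_minutes):
    """Load test: ramp linearly to the expected peak, then hold it."""
    ramp = [peak_users * minute // ramp_minutes
            for minute in range(1, ramp_minutes + 1)]
    return ramp + [peak_users] * hold_minutes


def stress_profile(start_users, step_users, step_minutes, steps):
    """Stress test: keep raising the load in steps to find the upper bound."""
    profile = []
    for step in range(steps):
        profile += [start_users + step * step_users] * step_minutes
    return profile


def endurance_profile(users, hours):
    """Endurance test: hold a constant load long enough to expose leaks."""
    return [users] * (hours * 60)


if __name__ == "__main__":
    print(load_profile(peak_users=100, ramp_minutes=4, hold_minutes=2))
    print(stress_profile(start_users=100, step_users=100, step_minutes=2, steps=3))
    print(len(endurance_profile(users=50, hours=8)), "minutes at constant load")
```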
Of course, all these activities must be supported by an effective monitoring platform, which helps in understanding the behavior of the system, evaluating test results, identifying problems, and planning future actions. If the outcomes of the tests highlight performance problems, it is possible to apply the appropriate countermeasures, redo the simulation to assess their effects, and repeat the cycle until satisfactory results are obtained.
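One way to decide whether a test run is satisfactory is to compare it against explicit targets. The following sketch checks the 95th percentile latency and the error rate of a run against thresholds. The `Sample` type, the nearest-rank percentile method, and the threshold values are assumptions made for illustration, not targets recommended here.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    ok: bool


def p95(latencies):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(latencies)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def evaluate(samples, max_p95_ms, max_error_rate):
    """Return a list of violated targets; an empty list means the run passed."""
    violations = []
    observed_p95 = p95([s.latency_ms for s in samples])
    error_rate = sum(1 for s in samples if not s.ok) / len(samples)
    if observed_p95 > max_p95_ms:
        violations.append(f"p95 latency {observed_p95} ms exceeds {max_p95_ms} ms")
    if error_rate > max_error_rate:
        violations.append(f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}")
    return violations


if __name__ == "__main__":
    # Hypothetical run: latencies of 1..100 ms, with two failed requests.
    run = [Sample(float(i), ok=(i % 40 != 0)) for i in range(1, 101)]
    for problem in evaluate(run, max_p95_ms=90, max_error_rate=0.05):
        print(problem)
```

After each countermeasure, the same check can be rerun on the new results, which turns the repeat-until-satisfied cycle into a concrete pass or fail criterion.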
Chaos testing, a practice with a long history that has developed into a structured and systematic process only in the last few years, is predicated on simulating unexpected and random infrastructure failures.
These failures, such as components becoming unavailable, network delays, and even the taking offline of entire data centers, can be simulated to assess system, application, and service resilience from both functional and non-functional (i.e., performance) points of view.
For example, what would happen if one of the payment methods went down? Or if a portion of the CDN were taken out by a cyberattack? Chaos engineering is increasingly used by organizations to inform strategies that increase resilience against adverse events that are hard to model or predict.
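The payment example can be turned into a tiny experiment. The sketch below disables one randomly chosen payment provider and checks that checkout still succeeds through another one. The `PaymentProvider` class, the failover function, and the provider names are all hypothetical, and a real chaos experiment would inject the failure into running infrastructure rather than into an in-memory object.

```python
import random


class PaymentProvider:
    """Hypothetical payment integration that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.available = True

    def charge(self, amount_cents):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name}:charged:{amount_cents}"


def charge_with_failover(providers, amount_cents):
    """Try each provider in order and return the first successful receipt."""
    for provider in providers:
        try:
            return provider.charge(amount_cents)
        except ConnectionError:
            continue  # fall through to the next provider
    raise RuntimeError("no payment provider available")


def chaos_experiment(providers, rng, amount_cents=1999):
    """Take one random provider offline and attempt a checkout."""
    victim = rng.choice(providers)
    victim.available = False
    try:
        return victim.name, charge_with_failover(providers, amount_cents)
    finally:
        victim.available = True  # always restore the system afterwards


if __name__ == "__main__":
    providers = [PaymentProvider("card"), PaymentProvider("wallet")]
    victim, receipt = chaos_experiment(providers, random.Random())
    print(f"disabled={victim} receipt={receipt}")
```

The experiment states a hypothesis (checkout survives the loss of any single provider), injects the failure, observes the outcome, and restores the original state, which is the basic loop of a structured chaos test.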
Chaos Engineering is a broad and fast-developing field, in which our teams are increasingly involved. Stay tuned for more articles on the subject 😉
Over the past 20 years, the Moviri Performance Engineering team has helped hundreds of enterprise organizations implement performance testing, capacity management, observability, automation, and chaos engineering solutions. As more and more organizations depend on the resiliency, performance and cost-effectiveness of their digital operations during peak season events, Moviri professionals’ deep expertise helps them ensure that their e-commerce, payment, and fulfillment applications don’t buckle under pressure and deliver the user experience today’s consumers expect.