Wednesday, January 19, 2022

How to avoid or at least mitigate the risk of software and hardware component failures?

Last Thursday, my Firefox web browser stopped working during a regular Zoom meeting with my team. Today, thanks to The Register, I realized that it was caused by the Foxstuck software bug. For further details about the bug read

My troubleshooting was pretty quick. Both Chrome and Safari worked fine, so it was evident that this was a Firefox issue.

I tried various classic tricks to solve the Firefox problem (clearing the cache and cookies, reinstalling the latest version of the software, etc.), but nothing helped in the 10 minutes I was willing to invest. I decided I didn't have time for further experiments and, after about a year of using Firefox, I switched back to Chrome.

The switchover was just a matter of transferring important data from Firefox to Chrome. I use an external password manager (thank god), so the only important data in Firefox were my bookmarks. Exporting bookmarks from Firefox and importing them into Chrome took a matter of seconds.

Problem solved. Hurrah!

But it's clear that a similar software bug may hit Chrome or Safari in the future, so it's only a matter of time before I am forced to switch to another web browser again. Actually, Chrome has made me angry in the past, and that was the reason I switched to Firefox.

So what is the moral of this story?

The only way not to be affected by such software bugs is a dual-, triple-, or even multi-vendor strategy (in this case Firefox, Chrome, Safari), combined with the art of quickly identifying a problematic component and replacing it with another.

This blog is about data centers, data center infrastructure, and software-defined infrastructure. Does it apply here? I think so.

In the hardware area, we can implement the MULTI-VENDOR strategy using compute, storage, and network virtualization, where VMware is the industry leader. Server virtualization (ESXi) gives us hardware abstraction, so we can use HPE, Dell, or Lenovo servers in the same way. Storage virtualization (vSAN, vVols) gives us storage abstraction and independence from storage vendors. Network virtualization does the same for network components like switches, routers, firewalls, and load balancers.

When we virtualize all hardware components we have a software-defined infrastructure. If we do not want to plan, design, implement and operate software-defined infrastructure by ourselves, we can outsource it to cloud providers and consume it as a service. This is IaaS cloud infrastructure.

If we consume IaaS cloud infrastructure, we can implement the MULTI-VENDOR strategy using MULTI-CLOUD. The MULTI-CLOUD strategy is based on the assumption that if one IaaS cloud provider fails, the other cloud providers will not fail at the same time. Such a strategy therefore has a positive impact on availability and/or recoverability.
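To put a rough number on that assumption, here is a minimal sketch of the arithmetic. It assumes provider failures are fully independent (which real outages, such as a shared DNS or a common software bug, can violate), and the 99.9% availability figure is purely illustrative, not a quote from any provider's SLA. The function names are my own.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def combined_availability(availabilities):
    """Probability that at least one provider is up.

    Assumes failures are statistically independent, which is the
    core (and optimistic) assumption behind a multi-cloud strategy.
    """
    # The service is down only if every provider is down at once.
    all_down = prod(1 - a for a in availabilities)
    return 1 - all_down


def yearly_downtime_minutes(availability):
    """Expected downtime per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


# Illustrative figure only: each provider is assumed to be 99.9% available.
single = 0.999
dual = combined_availability([single, single])

print(f"single cloud: {yearly_downtime_minutes(single):.1f} min/year")
print(f"dual cloud:   {yearly_downtime_minutes(dual):.2f} min/year")
```

Under these assumptions, one cloud means roughly 8.8 hours of expected downtime per year, while two independent clouds bring it down to about half a minute. The gain is real only if the application can actually fail over between them.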

And if we have already adopted a MULTI-CLOUD strategy, then all we lack are modern applications that can automatically detect an infrastructure failure at one cloud provider and recover from it with a fast application failover to another cloud. Kubernetes can help with multi-cloud from an infrastructure point of view, but in the end it is all about the application architecture having self-healing built natively into the application's DNA. An application architected for MULTI-CLOUD is, at least for me, the CLOUD NATIVE APPLICATION: an application that is able to live in the cloud and survive inevitable failures.

This is exactly how the human body works and how human civilizations migrate between regions. That's why we have multi-site and multi-region architectures, and cloud-native applications are able to recognize where the best place to live is, do some cost analysis, and migrate if it makes sense. Isn't it similar to humans?
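The "find the best place to live" idea can be sketched in a few lines. This is only an illustration under my own assumptions: the CloudSite type, the is_healthy probe, and the hourly_cost field are hypothetical names, and a real application would plug in actual health checks and pricing data and would also have to move its data, not just pick a site.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CloudSite:
    """Hypothetical description of one place the application can run."""
    name: str                        # illustrative label, e.g. "cloud-a"
    is_healthy: Callable[[], bool]   # health probe supplied by the operator
    hourly_cost: float               # illustrative cost figure


def choose_site(sites: Sequence[CloudSite]) -> Optional[CloudSite]:
    """Return the cheapest site whose health probe passes, or None."""
    healthy = []
    for site in sites:
        try:
            if site.is_healthy():
                healthy.append(site)
        except Exception:
            # A probe that raises is treated the same as a failed site.
            continue
    # Simple "cost analysis": prefer the cheapest of the healthy sites.
    return min(healthy, key=lambda s: s.hourly_cost, default=None)


# Usage with stubbed probes: cloud-a is down, so the cheaper of the
# two remaining healthy sites is selected.
sites = [
    CloudSite("cloud-a", lambda: False, 1.0),
    CloudSite("cloud-b", lambda: True, 1.4),
    CloudSite("cloud-c", lambda: True, 1.2),
]
active = choose_site(sites)
print(active.name if active else "no healthy site")  # prints "cloud-c"
```

Running such a selection periodically, and migrating when the answer changes, is the simplest form of the detect-and-recover loop described above.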

And that's it. Easy to write, isn't it? ... The real implementation of MULTI-CLOUD architecture is a bit trickier, but with today's technology, it's feasible.
