Redundancy and Resiliency

I remember reading this once: ‘Whatever your mission or passion is, you have to understand that your skills, message and knowledge are extremely valuable. Sharing your opinions and professional expertise with the world not only helps you to achieve an effective collaboration but also encourages your followers to learn from your experiences.’ Inspired by this, my friend Tarleton and I, Domenico, started our own venture: Coffee Next Tuesday, a series of podcasts in which we talk about some really cool IT and Telco stuff over our favourite coffees at a cafe in Richmond.

 

Our first conversation:

In our very first conversation, Tarleton and I gave our opinions on the subject of Redundancy and Resiliency from Telco and IT perspectives, our respective fields of expertise.

 

Why do we need to know about Redundancy and Resiliency?

 

Well, if we have just launched a new enterprise, we need to assess and understand all the elements of risk involved in running the business so that we can work out a plan in case of an outage. Apart from that, it is crucial for us to know how to actually prevent an outage, even in the worst cases. This is where the concepts of Redundancy and Resiliency come into the picture.

The terms Redundancy and Resiliency are often confused with each other, but as a matter of fact it is impossible to have one without the other. As Tarleton says, resiliency is the inbuilt strength of the network, which is about preventing things from failing, and redundancy is the “Oh crap, we have an issue! What are we going to do about it?” For example, the IT service providers I work with build redundant networks for our clients to load balance and to ensure there is a failover mechanism if a network device or Internet connection fails.

 

What I have learnt from my experiences:

 

In the 10 years that I’ve worked with IT service providers, I have embraced a certain perspective towards different elements of risk. When we have to respond to a tender, we check whether it’s written for redundancy and resiliency. We write our own SLAs, which make it possible to evaluate whether the architecture meets the business requirements or not.

As a managed service provider, my services sit on top of infrastructure, so the SLA works both ways: it makes sure that if anything happens to us, we’re still available and accountable for the outcome.

What Tarleton has learnt from his experiences:

 

If we look at it from the Telco perspective, it’s basically moving risk from your provider to your business, but you’ve still got to estimate what the risk is worth. Tarleton says that when he talks to heads of IT, ITMs and CIOs, he feels that they might understand what downtime looks like in an emergency but, in hindsight, they still don’t have a specific cost for the business. According to research he did the other day, downtime costs mid-size companies in Australia an average of $74,000 an hour. That is a large number, although it’s rare that you lose an entire company network; far more often you lose a branch site.
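To make that number a bit more concrete, here is a minimal Python sketch of how you might put a rough cost on a single outage. The $74,000 an hour is the figure Tarleton quoted; the function name, the four-hour duration and the assumption that one branch carries 10% of the business are purely illustrative and not from our conversation.

```python
def outage_cost(hourly_cost: float, hours_down: float,
                share_of_business_affected: float = 1.0) -> float:
    """Rough cost of one outage.

    share_of_business_affected is a fraction between 0 and 1:
    1.0 means the whole company network is down, a smaller value
    means only part of the business (e.g. one branch site) is affected.
    """
    if hours_down < 0:
        raise ValueError("hours_down must not be negative")
    if not 0.0 <= share_of_business_affected <= 1.0:
        raise ValueError("share_of_business_affected must be between 0 and 1")
    return hourly_cost * hours_down * share_of_business_affected


HOURLY_COST = 74_000  # average hourly downtime cost quoted in the post

# Illustrative assumptions: a 4-hour outage, and a branch worth 10% of the business.
whole_network = outage_cost(HOURLY_COST, hours_down=4)
one_branch = outage_cost(HOURLY_COST, hours_down=4, share_of_business_affected=0.1)

print(f"Whole network down for 4 hours: ${whole_network:,.0f}")
print(f"One branch down for 4 hours:    ${one_branch:,.0f}")
```

Even a crude estimate like this gives the head of IT a specific figure to bring to the business, which is exactly what Tarleton finds is usually missing.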

 

What are the different elements of risk that companies are protecting themselves against?

 

According to Tarleton, if we look at it from a retail industry perspective, the primary requirement that retail companies need to fulfil is drafting their own POS (point of sale). They can pretty much function without anything else as long as they have POS. Most POS systems can settle transactions under $100 offline or fall back to 4G. However, in the field of finance (banks, gambling or stock exchanges) we start talking about increased latency as lost money, which massively changes things at a network level.

 

Do retail companies benefit from the resiliency banks build into the POS system?

 

From a telco perspective, retail companies definitely benefit from the resiliency banks have in their POS system, but they also benefit from the resiliency of their own network. The fact that they can operate offline for two weeks, while far from ideal, means they have built a resilient network.

Talking about it from an IT perspective, I think the big-name vendors such as Cisco, HP and Dell have done a really good job of building redundancy and resiliency into their platforms. When we talk about the efficiency of a virtual machine, we look at the application’s ability to fail over quickly to a second data centre; that is building in resiliency. It’s a high availability deployment. Instead of pushing a button that yells ‘we’re in a redundancy situation’, the application itself has a solution built into it, so that even if there is an outage, the user wouldn’t know about it or wouldn’t necessarily be impacted by it.

 

When that platform is being designed, we make strategic investments to make sure we’re doing our bit of due diligence. We never buy things in ones but always in twos, because we’ve got dual data centres. However, it’s pointless having two of something if the application can’t actually make use of the redundancy. That is why we never opt for a vendor who doesn’t share this perspective.
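Why twos rather than ones? A back-of-the-envelope way to see it is the standard formula for components running in parallel: if either copy can carry the load, the pair is only down when both are down. The sketch below assumes the two copies fail independently and that the application really does fail over cleanly, which is the very point made above. The 99% availability figure and the function names are my own illustrative assumptions.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` identical, independent components
    where any one of them can carry the full load."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group is unavailable only if every copy is unavailable at once.
    return 1.0 - (1.0 - single) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


one_device = 0.99  # assumed availability of a single device (illustrative)
two_devices = combined_availability(one_device, copies=2)

print(f"One device:  {expected_downtime_hours(one_device):.1f} hours down per year")
print(f"Two devices: {expected_downtime_hours(two_devices):.2f} hours down per year")
```

If the application can’t fail over, the second device adds cost but none of this benefit, so the formula simply doesn’t apply.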

 

What does a business owner need to know at the time of an outage?

 

I think you as a business owner need to learn that it’s not a matter of ‘if it goes down’ but of ‘when it goes down’.

You should always keep these three things in mind at the time of an outage:

A- How we are going to be impacted

B- How much it is going to cost us

C- What we can do to protect ourselves against this

In my view, we need to understand what the impact can be and how much that will cost the company. This determines whether you need to look at two telcos, two managed service providers or completely redundant and resilient N+1 infrastructure.
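One rough way to weigh up those options is to add what each one costs per year to what its expected downtime would cost you, and compare the totals. The Python sketch below does just that. Every figure in the `options` list is made up for illustration, and the option names and function names are hypothetical; only the $74,000 an hour comes from our conversation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str
    annual_cost: float              # what you pay per year for this setup
    expected_downtime_hours: float  # downtime you still expect per year


def total_annual_cost(option: Option, hourly_downtime_cost: float) -> float:
    """Spend on the option plus the expected cost of the downtime it leaves."""
    return option.annual_cost + option.expected_downtime_hours * hourly_downtime_cost


def cheapest(options: List[Option], hourly_downtime_cost: float) -> Option:
    """Option with the lowest total annual cost."""
    return min(options, key=lambda o: total_annual_cost(o, hourly_downtime_cost))


# All numbers below are illustrative assumptions, not real quotes.
options = [
    Option("single telco", annual_cost=20_000, expected_downtime_hours=20),
    Option("two telcos", annual_cost=45_000, expected_downtime_hours=2),
    Option("N+1 infrastructure", annual_cost=150_000, expected_downtime_hours=0.5),
]

for hourly in (5_000, 74_000):
    best = cheapest(options, hourly)
    print(f"Downtime at ${hourly:,}/hour -> {best.name}")
```

With these made-up numbers the answer changes as the hourly cost of downtime goes up, which is why A and B have to be worked out before C.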

 

When companies are building networks and IT deployments, should they prioritise network resiliency over redundancy?

 

Retail companies want to prevent an outage from occurring at any time because even if there is an outage and companies have to enter a redundancy situation, there is still going to be that window of failover. Therefore, if there’s resilience built into it, you never experience an outage.

 

What are the key points from a networking perspective that you’d advise a customer to look at in terms of resiliency and redundancy?

 

Tarleton says that from a telco perspective:

  1. You need to understand how much you can spend and what downtime is worth to your business.
  2. When you’re talking about large enterprises with key sites and multiple data centres, you have to use diverse paths and connect them with dark fibre.
  3. Your service provider should provide you with KMZ files so that you can see what the paths look like. From an internet perspective, you need your services to be delivered through two different head ends, a primary and a secondary provider, because that is a stronger solution.
  4. You should test your redundancy system, because there’s no point in having resiliency built into your network if you don’t know the status of your BGP configuration.
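
The last point, testing your redundancy, can be pictured as a simple drill: take each path out in turn and check that something is still left to carry the traffic. The Python sketch below only simulates that idea. The path names, the `status` table and both functions are hypothetical; a real test would pull a link or shut a BGP session during a maintenance window rather than flip a value in a dictionary.

```python
from typing import Callable, Dict, List, Optional


def select_path(paths: List[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy path in priority order, or None if all are down."""
    for path in paths:
        if is_up(path):
            return path
    return None


def failover_drill(paths: List[str],
                   is_up: Callable[[str], bool]) -> Dict[str, Optional[str]]:
    """For each path, pretend it has failed and record which path takes over."""
    results: Dict[str, Optional[str]] = {}
    for failed in paths:
        results[failed] = select_path(
            paths, lambda p, failed=failed: p != failed and is_up(p)
        )
    return results


# Hypothetical setup: two internet services from two different head ends.
paths = ["primary", "secondary"]
status = {"primary": True, "secondary": True}

drill = failover_drill(paths, lambda p: status[p])
for failed, takeover in drill.items():
    print(f"If {failed} fails, traffic moves to: {takeover}")

# A secondary that is quietly broken only shows up when you run the drill.
status["secondary"] = False
broken = failover_drill(paths, lambda p: status[p])
print(f"With a dead secondary, losing primary leaves: {broken['primary']}")
```

The second run is the case you want to catch in a test rather than in a real outage: the redundancy was paid for, but nothing is there when the primary goes away.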

 
