High Availability Systems

Sep 19, 2020

As the services provided by software systems become more critical, unplanned or prolonged downtime must be avoided. High availability (HA) systems aim to sustain an agreed level of operational performance, usually uptime, over long periods of time.

Uptime can be affected either by failures, which can happen at any time, or by maintenance tasks, which are scheduled and have a known impact on the system. HA aims to mitigate the downtime caused by failures. The downtime required for maintenance tasks, and the nature of those tasks, are accounted for when uptime is calculated.
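
To make the uptime calculation concrete, here is a minimal sketch of how availability could be computed for a measurement period. The function name, its parameters and the `exclude_maintenance` switch are assumptions made for illustration; whether planned maintenance counts as downtime or is excluded from the measured period depends on how a given team or SLA defines uptime.

```python
def availability_percent(period_hours, failure_downtime_hours,
                         maintenance_hours=0.0, exclude_maintenance=False):
    """Return availability as a percentage of the measured period.

    Hypothetical helper: if exclude_maintenance is True, planned maintenance
    is removed from the measured period; otherwise it counts as downtime.
    """
    if exclude_maintenance:
        scheduled_hours = period_hours - maintenance_hours
        downtime_hours = failure_downtime_hours
    else:
        scheduled_hours = period_hours
        downtime_hours = failure_downtime_hours + maintenance_hours
    if scheduled_hours <= 0:
        raise ValueError("measured period must be longer than maintenance time")
    return 100.0 * (scheduled_hours - downtime_hours) / scheduled_hours


# A 30-day month (720 hours) with 0.5 hours of failures and 2 hours of maintenance.
strict = availability_percent(720, 0.5, maintenance_hours=2)
lenient = availability_percent(720, 0.5, maintenance_hours=2, exclude_maintenance=True)
print(f"maintenance counted: {strict:.3f}%, maintenance excluded: {lenient:.3f}%")
```

The two results differ only in how the maintenance window is treated, which is why that treatment should be agreed on before an uptime target is set.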

Principal methods to achieve higher uptime include:

Eliminating single points of failure by adding redundancy at different levels, which increases the reliability of the system.
Reliable crossover mechanisms that allow the system to recover from a component failure without any data loss.
Monitoring of components so that failures are detected as soon as they occur. The monitoring itself must also be highly available to ensure accurate measurements.
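
One common way to detect component failures is a heartbeat: each component reports in periodically, and a component that stays silent for too long is treated as failed. The sketch below is only an illustration of that idea; the `HeartbeatMonitor` class, its method names and the timeout value are hypothetical and not taken from any particular monitoring tool.

```python
import time


class HeartbeatMonitor:
    """Hypothetical failure detector based on missed heartbeats."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the monitor can be tested
        self.last_seen = {}         # component name -> time of last heartbeat

    def record_heartbeat(self, component):
        """Called whenever a component reports that it is alive."""
        self.last_seen[component] = self.clock()

    def failed_components(self):
        """Return components whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Usage with a fake clock so the example does not need to sleep.
fake_time = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=5.0, clock=lambda: fake_time[0])
monitor.record_heartbeat("db")
monitor.record_heartbeat("cache")
fake_time[0] = 3.0
monitor.record_heartbeat("cache")   # only the cache keeps reporting
fake_time[0] = 6.0
print(monitor.failed_components())
```

In a real deployment the monitor would itself run as more than one instance, since a single monitor would be a single point of failure.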

Redundancy in software systems:

Passive Redundancy: Multiple identical units perform the same function. The failure of a single unit leads to a decline in performance but has no impact on the availability of the system. E.g., load balancing.
Active Redundancy: Multiple identical units perform the same function. The failure of a single unit is identified immediately and followed by automatic reconfiguration to bypass the failed unit, so neither performance nor availability is affected. E.g., failover (automatic switching from a failed unit to a redundant/standby unit).
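
The difference between the two can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production design: `LoadBalancedPool` and `FailoverPair` are hypothetical classes, units are plain strings, and failure detection is assumed to happen elsewhere.

```python
class LoadBalancedPool:
    """Passive redundancy: requests rotate over the units that are still healthy."""

    def __init__(self, units):
        self.units = list(units)
        self.healthy = set(self.units)
        self._next = 0

    def mark_failed(self, unit):
        # Remaining units absorb the load, so capacity drops but service continues.
        self.healthy.discard(unit)

    def pick(self):
        # Round-robin over all units, skipping any that have been marked failed.
        for _ in range(len(self.units)):
            unit = self.units[self._next % len(self.units)]
            self._next += 1
            if unit in self.healthy:
                return unit
        raise RuntimeError("no healthy units left")


class FailoverPair:
    """Active redundancy: a standby unit replaces the active unit when it fails."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def on_failure_detected(self, unit):
        # Automatic reconfiguration: promote the standby if the active unit failed.
        if unit == self.active and self.standby is not None:
            self.active, self.standby = self.standby, None

    def route(self):
        return self.active


pool = LoadBalancedPool(["a", "b", "c"])
pool.mark_failed("b")
print([pool.pick() for _ in range(4)])   # traffic is shared by the two survivors

pair = FailoverPair("primary", "standby")
pair.on_failure_detected("primary")
print(pair.route())                       # the standby now serves all traffic
```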

Ways to ensure HA:

Scale the database using either slaves (replicas) or shards.
Distribute components across data centres.
Maintain data backups and define methods for data recovery.
Use clusters to provide immediate failover mechanisms.
Use network load balancing.
Define failover mechanisms and recovery plans.
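
As an illustration of the sharding option in the list above, the following sketch routes each record to a shard based on a hash of its key. The `shard_for` function and the key format are assumptions made for this example; real databases usually provide their own sharding or partitioning mechanisms.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in the range [0, shard_count).

    Hypothetical helper: a stable hash is used so that the same key always
    lands on the same shard, across processes and restarts.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for user_id in ["user-1", "user-2", "user-3"]:
    print(user_id, "-> shard", shard_for(user_id, shard_count=4))
```

Simple modulo hashing like this moves most keys to a different shard when the number of shards changes, so systems that expect to add shards often use consistent hashing instead.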

Written by Priyanka Saxena